Neuroscience

Specific lexico-semantic predictions are associated with unique spatial and temporal patterns of neural activity

  1. Lin Wang (corresponding author)
  2. Gina Kuperberg
  3. Ole Jensen
  1. Harvard Medical School, United States
  2. Massachusetts General Hospital, United States
  3. Tufts University, United States
  4. Centre for Human Brain Health, University of Birmingham, United Kingdom
Research Article
Cite this article as: eLife 2018;7:e39061 doi: 10.7554/eLife.39061

Abstract

We used Magnetoencephalography (MEG) in combination with Representational Similarity Analysis to probe neural activity associated with distinct, item-specific lexico-semantic predictions during language comprehension. MEG activity was measured as participants read highly constraining sentences in which the final words could be predicted. Before the onset of the predicted words, both the spatial and temporal patterns of brain activity were more similar when the same words were predicted than when different words were predicted. The temporal patterns localized to the left inferior and medial temporal lobe. These findings provide evidence that unique spatial and temporal patterns of neural activity are associated with item-specific lexico-semantic predictions. We suggest that the unique spatial patterns reflected the prediction of spatially distributed semantic features associated with the predicted word, and that the left inferior/medial temporal lobe played a role in temporally ‘binding’ these features, giving rise to unique lexico-semantic predictions.

https://doi.org/10.7554/eLife.39061.001

Introduction

After reading or hearing the sentence context, ‘In the crib there is a sleeping …', we are easily able to predict the next word, ‘baby’. In other words, we are able to access a unique lexico-semantic representation of <baby> that is different from the lexico-semantic representation of any other word (e.g. <rose>), ahead of this information becoming available from the bottom-up input. In the present study, we used Magnetoencephalography (MEG), in combination with Representational Similarity Analysis (RSA), to show that the prediction of specific words is associated with distinct spatial and temporal patterns of neural activity before the predicted word is actually presented.

Prediction is hypothesized to be a core computational principle of brain function (Clark, 2013; Mumford, 1992). During language processing, probabilistic prediction at multiple levels of representation allows us to rapidly understand what we read or hear by giving processing a head start (see Kuperberg and Jaeger, 2016a, for a review). The strength of prediction, and the precise level of representation at which it occurs, are likely to depend on many factors (see Kuperberg and Jaeger, 2016a, section 3.4). However, there is now clear neural evidence that, at least in highly constraining sentence contexts, we are able to predict the semantic features of upcoming words.

This evidence comes from several sources. First, a large body of studies shows that the N400, an event-related potential (ERP) component that reflects semantic processing, is reduced in response to words whose semantic features match semantic predictions generated by highly predictable (versus less predictable) contexts. For example, the N400 elicited by ‘baby’ is smaller in the constraining context ‘In the crib, there is a sleeping …’ than in the less constraining context ‘Under the tree, there is a sleeping …’ (Federmeier and Kutas, 1999; Kutas and Federmeier, 2011; Kuperberg, 2016b).

Second, several studies have reported differential modulation of brain activity following highly predictable versus less predictable sentence contexts, prior to the onset of predicted words. These include larger negative-going ERP effects (Freunberger and Roehm, 2017; Grisoni et al., 2017; León-Cabrera et al., 2017; Maess et al., 2016), increases in theta power (Dikker and Pylkkänen, 2013; Piai et al., 2016), and the suppression of alpha/beta power (Piai et al., 2014; Piai et al., 2015; Rommers et al., 2017; Wang et al., 2018). These anticipatory effects have been neuroanatomically localized to both neocortical (e.g. left frontal and temporal regions; Dikker and Pylkkänen, 2013; Piai et al., 2015; Wang et al., 2018) and subcortical (e.g. hippocampus and cerebellum; Bonhage et al., 2015; Lesage et al., 2017; Piai et al., 2016; Wang et al., 2018) regions. They have been attributed to the process of generating predictions, to accessing the lexico-semantic representations that correspond to the predicted words themselves, or to both. Importantly, however, previous studies have averaged across items that predict different upcoming words. It therefore remains unclear whether the brain produces unique patterns of neural activity that correspond to the prediction of item-specific lexico-semantic representations. For example, does the particular pattern of neural activity produced following the context ‘In the crib there is a sleeping …’ differ from the pattern produced following the context ‘On Valentine’s day, he sent his girlfriend a bouquet of red …’?

Multivariate Pattern Analysis (MVPA) provides one way of addressing this question (Kriegeskorte et al., 2008; Staudigl et al., 2015; Stokes et al., 2015a). Correlational approaches were first applied to fMRI data to identify spatial patterns of brain activity representing object categories in the ventral stream (Haxby et al., 2001). They later evolved into Representational Similarity Analysis (RSA). The basic assumption of RSA is that items with similar representations evoke similar patterns of brain activity. Spatial RSA has been used to identify unique patterns of spatial activity during perception, cognition and action (Kriegeskorte et al., 2007; Kriegeskorte and Kievit, 2013). More recently, it has been applied to MEG and EEG data, whose excellent temporal resolution can reveal precisely when such spatially specific patterns of neural activity are activated relative to the appearance of bottom-up input (Stokes et al., 2015a). Moreover, the high temporal resolution of MEG/EEG also allows for an analogous RSA approach that probes temporal rather than spatial patterns of neural similarity (Staudigl et al., 2015; Michelmann et al., 2016). Both spatial and temporal RSA approaches have been successfully used in combination with MEG and EEG to decode representationally specific visual information during the perception of bottom-up input (Cichy et al., 2014), as well as during its maintenance in working memory in the absence of bottom-up input (Wolff et al., 2017).

In the present study, we used MEG, together with both spatial and temporal RSA, to ask whether, under experimental conditions that are known to encourage specific lexico-semantic prediction, distinct words are associated with distinct spatial and temporal patterns of neural activity, prior to the appearance of the predicted input. Participants read 240 sentences, all with highly constraining contexts that predicted a specific word (Figure 1A). The sentences were visually presented at a slow rate of 1000 ms per word. This ensured the generation of specific lexico-semantic predictions and guaranteed sufficient time to detect any representationally specific neural activity before the onset of the predicted word. We constructed these sentences in pairs (120 pairs) such that each member of a pair predicted the same word, even though their contexts differed (e.g. ‘In the crib, there is a sleeping …' and ‘In the hospital, there is a newborn …'). During the experiment, sentences were presented in a pseudorandom order, with at least 30 other sentences (on average 88 sentences) in between each member of a given pair.
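The ordering constraint (at least 30 other sentences between the two members of a pair) can be checked programmatically for any candidate presentation order. The following is a minimal Python sketch, not the authors' code; it assumes the order is represented as a list of hypothetical pair labels, each appearing exactly twice, and reports the minimum and mean number of intervening sentences.

```python
from collections import defaultdict
from statistics import mean


def pair_separation(order):
    """Return (minimum, mean) number of sentences that intervene between
    the two members of each pair.

    `order` lists a (hypothetical) pair label for each sentence in
    presentation order; every label must occur exactly twice.
    """
    positions = defaultdict(list)
    for index, pair_label in enumerate(order):
        positions[pair_label].append(index)

    gaps = []
    for pair_label, indices in positions.items():
        if len(indices) != 2:
            raise ValueError(f"pair {pair_label!r} occurs {len(indices)} times")
        first, second = indices
        gaps.append(second - first - 1)  # sentences strictly between the two members
    return min(gaps), mean(gaps)


def satisfies_min_gap(order, min_gap=30):
    """True if every pair is separated by at least `min_gap` other sentences."""
    return pair_separation(order)[0] >= min_gap


# Toy order with 3 pairs (labels are illustrative only)
toy_order = ["p1", "p2", "p3", "p1", "p2", "p3"]
print(pair_separation(toy_order))
print(satisfies_min_gap(toy_order, min_gap=2))
```

Note that simply reshuffling until the constraint holds is impractical at this scale (240 sentences, gap of 30), so a checker like this would typically be paired with a constructive or repair-based ordering procedure.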

The experimental procedure and approach for Representational Similarity Analyses.

(A) Trials began with a blank screen (1600 ms). Sentences were presented in Chinese (translated here into English), word-by-word (200 ms per word; 800 ms blank interval between words). Sentences were followed either by ‘NEXT’ (2000 ms) or by a probe question (1/6th of trials, randomly). We constructed sentences in pairs such that the same word could be predicted from the context (e.g. S1-A and S2-A’; S3-B and S4-B’), although during presentation, members of each pair were presented separately, with at least 30 other sentences in between. One member of each pair ended with the predicted word (e.g. S1–A, S3–B) and the other member ended with a plausible but unpredicted word (e.g. S2–A’, S4–B’). Before the onset of the predicted word, we compared brain activity associated with the prediction of the same word (within-pairs) and a different word (between-pairs). (B) Spatial representational similarity analysis. Left: The pattern of MEG data over sensors was correlated between each sentence pair (e.g. S1–A and S2–A’) at each time sample $t_j$. Right: The average spatial correlation values of pairs ($R^{1}_{\mathrm{within}}, R^{2}_{\mathrm{within}}, \ldots$) in which the same word was predicted formed the within-pair spatial correlation time series ($\frac{1}{N}\sum_{i=1}^{N} R^{i}_{\mathrm{within}}$, shown in red; $N$ is the number of sentence pairs). The average spatial correlation values of pairs ($R^{1}_{\mathrm{between}}, R^{2}_{\mathrm{between}}, \ldots$) in which different words were predicted formed the between-pair spatial correlation time series ($\frac{1}{2N(N-1)}\sum_{i=1}^{2N(N-1)} R^{i}_{\mathrm{between}}$, shown in blue). (C) Temporal representational similarity analysis. Left: The temporal pattern of MEG activity was correlated between sentence pairs, at each sensor (sensor space) or at each grid point (source space). Right: The average temporal correlation values of pairs ($R^{1}_{\mathrm{within}}, R^{2}_{\mathrm{within}}, \ldots$) in which the same word was predicted formed the within-pair temporal correlation topographic/source maps. The average temporal correlation values of pairs ($R^{1}_{\mathrm{between}}, R^{2}_{\mathrm{between}}, \ldots$) in which different words were predicted formed the between-pair temporal correlation topographic/source maps.
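The normalizing factors $\frac{1}{N}$ and $\frac{1}{2N(N-1)}$ used for the within-pair and between-pair averages in panel (B) follow from counting unordered sentence pairs. Assuming that all $2N$ sentences are retained (i.e. before any artifact rejection) and that each sentence belongs to exactly one pair, the study's $N = 120$ pairs give 120 within-pairs and 28,560 between-pairs:

$$
\binom{2N}{2} = N(2N-1) = \underbrace{N}_{\text{within-pairs}} + \underbrace{2N(N-1)}_{\text{between-pairs}},
\qquad
N = 120:\quad \binom{240}{2} = 28{,}680 = 120 + 28{,}560 .
$$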

https://doi.org/10.7554/eLife.39061.002

There is some evidence that the various semantic features and properties associated with words and concepts are represented in the brain across spatially distributed multimodal networks (Damasio, 1989; Price, 2000; Martin and Chao, 2001), which can be detected using spatial RSA (e.g. Devereux et al., 2013). For example, the particular set of semantic features and properties associated with the concept <baby> (e.g. <human>, <small>, <cries>) might be represented by a particular spatially distributed pattern of neural activity, whereas the semantic features and properties associated with the concept <rose> (e.g. <plant>, <scalloped petals>, <fragrant smell>) might be represented by a different spatially distributed pattern. If, following a constraining context (e.g. ‘In the crib, there is a sleeping …’), the prediction of a unique lexico-semantic item (<baby>) is represented by a unique spatial pattern of brain activity, then this spatial pattern should be more similar following another context that predicts the same word, that is, within-pair (e.g. ‘In the hospital, there is a newborn …’), than following another context that predicts a different word, that is, between-pair (e.g. ‘On Valentine’s day, he sent his girlfriend a bouquet of red …’). This should hold when we average across all within-pair sentences and compare them with all between-pair sentences. Importantly, this effect should be evident prior to the onset of the predicted word.

To test this hypothesis, we correlated the spatial pattern of MEG data across all sensors, between all possible pairs of sentences, at all time points over the last three words of the sentences. We were particularly interested in activity following the first word at which specific lexico-semantic predictions of upcoming words could be generated: the word before the sentence-final word (SFW). We asked whether, at this point, the resulting spatial similarity values were larger following sentence contexts that constrained for the same word (within-pairs) than following those that constrained for a different word (between-pairs; see Figure 1B).
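The spatial RSA step can be sketched for one participant in plain Python. This is a minimal illustration, not the authors' pipeline: it assumes each sentence's epoch is stored as `data[time][sensor]` (a hypothetical layout) and that each sentence is labelled with the word its context predicts, so that pairs sharing a label count as within-pairs and all others as between-pairs. Filtering, artifact rejection and the group-level average are omitted.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences
    (raises ZeroDivisionError if either sequence is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spatial_rsa(trials, predicted_word):
    """Within- and between-pair spatial similarity time series.

    trials: sentence id -> data[time][sensor]
    predicted_word: sentence id -> word predicted by that sentence's context
    Returns (within, between): one mean R value per time sample.
    """
    ids = sorted(trials)
    n_times = len(trials[ids[0]])
    within, between = [], []
    for t in range(n_times):
        w_rs, b_rs = [], []
        for s1, s2 in combinations(ids, 2):
            # Correlate the two sentences' sensor patterns at the same time sample
            r = pearson(trials[s1][t], trials[s2][t])
            if predicted_word[s1] == predicted_word[s2]:
                w_rs.append(r)
            else:
                b_rs.append(r)
        within.append(sum(w_rs) / len(w_rs))
        between.append(sum(b_rs) / len(b_rs))
    return within, between


# Toy example: 2 pairs, 3 sensors, 1 time sample (values are made up)
trials = {
    "S1": [[1.0, 2.0, 3.0]],
    "S2": [[2.0, 4.0, 6.0]],
    "S3": [[3.0, 2.0, 1.0]],
    "S4": [[6.0, 4.0, 2.0]],
}
predicted_word = {"S1": "baby", "S2": "baby", "S3": "rose", "S4": "rose"}
within, between = spatial_rsa(trials, predicted_word)
print(within, between)
```

With two pairs, the loop visits 2 within-pairs and 4 between-pairs, matching the $N$ and $2N(N-1)$ counts for $N = 2$.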

A classic hypothesis of how spatially distributed semantic information becomes bound together to represent specific concepts in the brain is through a process of ‘temporal synchrony’ (Damasio, 1989). If, following a highly constraining context, the prediction of a unique lexico-semantic item is instantiated through a unique temporal pattern of brain activity, then the temporal patterns of neural activity should be more similar following pairs of sentence contexts that constrain for the same word (within-pairs) than following pairs that constrain for a different word (between-pairs). To test this hypothesis, we correlated the temporal pattern of MEG data evoked within the prediction period (before the onset of the predicted word) between all possible pairs of sentences at each MEG sensor, and asked whether there were any sensors in which the resulting temporal similarity values were larger for within-pair than for between-pair sentences.
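Temporal RSA swaps the roles of time and sensors: at each sensor, the two sentences' time courses within the prediction window are correlated. The sketch below is illustrative only; it assumes the same hypothetical `data[time][sensor]` layout and labelling as a spatial RSA would, and a window given as a range of sample indices.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def temporal_rsa(trials, predicted_word, window):
    """Within- and between-pair temporal similarity at each sensor.

    trials: sentence id -> data[time][sensor]
    window: iterable of time-sample indices (e.g. the prediction interval)
    Returns (within, between): one mean R value per sensor, i.e. a topography.
    """
    window = list(window)
    ids = sorted(trials)
    n_sensors = len(trials[ids[0]][0])
    within, between = [], []
    for ch in range(n_sensors):
        w_rs, b_rs = [], []
        for s1, s2 in combinations(ids, 2):
            # Correlate the two sentences' time courses at the same sensor
            x = [trials[s1][t][ch] for t in window]
            y = [trials[s2][t][ch] for t in window]
            r = pearson(x, y)
            if predicted_word[s1] == predicted_word[s2]:
                w_rs.append(r)
            else:
                b_rs.append(r)
        within.append(sum(w_rs) / len(w_rs))
        between.append(sum(b_rs) / len(b_rs))
    return within, between


# Toy example: 2 pairs, 1 sensor, 3 time samples (values are made up)
trials = {
    "S1": [[1.0], [2.0], [3.0]],
    "S2": [[2.0], [4.0], [6.0]],
    "S3": [[3.0], [2.0], [1.0]],
    "S4": [[6.0], [4.0], [2.0]],
}
predicted_word = {"S1": "baby", "S2": "baby", "S3": "rose", "S4": "rose"}
within, between = temporal_rsa(trials, predicted_word, range(3))
print(within, between)
```

Subtracting the two returned lists sensor by sensor gives a within-minus-between difference topography; in source space the same function could be applied to beamformer time series instead of sensors.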

Damasio (1989) also hypothesized that temporal binding occurred within so-called ‘convergence zones’ of the brain. Although it is still a matter of debate whether multiple convergence zones exist, parts of the temporal lobe, including anterior, ventral and medial regions, have been identified as ‘semantic hubs’ that bring spatially distributed semantic information together to form single concepts (Patterson et al., 2007; Ralph et al., 2017). If these regions play a functional role in instantiating the prediction of specific lexico-semantic items through temporal binding, then unique temporal patterns of prediction (Figure 1C) should localize to these regions. To test this hypothesis, we used source localization techniques to determine the neuroanatomical source of any increased temporal similarity following sentence contexts that predicted the same word (within-pairs) versus a different word (between-pairs).

Results

Twenty-six participants read 240 sentences, presented at a rate of one word per second, while MEG data were acquired. The sentences were constructed in pairs (120 pairs) that strongly predicted the same sentence-final word (SFW), although, during presentation, members of the same pair were separated by at least 30 (on average 88) other sentences. As an example (Figure 1A), sentences S1 (e.g. ‘In the crib there is a sleeping…') and S2 (‘In the hospital, there is a newborn…') both predicted the word ‘baby’. To avoid repetition of the predicted word across sentence pairs, one member of each pair ended with the predicted word (e.g. in S1, ‘baby’) while the other member ended with an unpredicted but plausible word (e.g. in S2, ‘child’). Participants were asked to read each sentence carefully and to answer yes/no comprehension questions following 1/6th of the sentences. Comprehension accuracy was high (98% ± 2.0%). We compared both spatial and temporal similarity patterns of sentence pairs that predicted the same SFWs (within-pairs) to those that predicted different SFWs (between-pairs), before the SFW actually appeared. All trials were included in the analysis.

Spatial RSA: The spatial pattern of neural activity was more similar in sentence pairs that predicted the same versus different words, and this effect began before the onset of the predicted word

In each participant, we quantified the degree of spatial similarity of MEG activity (30 Hz low-pass filter) produced by pairs of sentences that predicted either the same SFW (i.e. within-pairs, for example S1-A vs. S2-A’) or a different SFW (i.e. between-pairs, for example S1-A vs. S3-B) by correlating the spatial pattern of signal across sensors at each sampling point from −2000 ms to 1000 ms relative to the onset of the SFW. We then averaged the resulting time series of spatial correlations (R-values), first within each participant and then across participants (Figure 2B). Both the within- and between-pair group-averaged time series of spatial correlation values showed a sharp increase at ~100 ms after the onset of the penultimate word (SFW-1; at −1000 ms) that lasted ~400 ms before sharply decreasing again (Figure 2A). The same pattern was observed around the onset of the previous word (SFW-2) and around the onset of the SFW itself. We attribute this general increase in spatial similarity to the visual onset and offset of each word. This general increase in spatial correlation was largest (R > 0.04) from −880 to −485 ms before the onset of the SFW, and from −897 to −507 ms before the onset of SFW-1.
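Windows such as −880 to −485 ms are defined by where the combined R time series exceeds R = 0.04. Assuming the R values are sampled on a known time axis, the contiguous supra-threshold runs can be extracted as below. This is a hypothetical helper for illustration; the toy R values are invented, and how the within- and between-pair series were combined is not specified here.

```python
def runs_above(times, values, threshold):
    """Return (first, last) time of every contiguous run with value > threshold."""
    runs, start, prev = [], None, None
    for t, v in zip(times, values):
        if v > threshold:
            if start is None:
                start = t  # a new run begins at this sample
        elif start is not None:
            runs.append((start, prev))  # the run ended at the previous sample
            start = None
        prev = t
    if start is not None:
        runs.append((start, prev))  # run still open at the end of the series
    return runs


# Toy time axis in ms relative to SFW onset, with made-up R values
times = [-1000, -900, -800, -700, -600, -500]
values = [0.01, 0.05, 0.06, 0.03, 0.05, 0.05]
print(runs_above(times, values, threshold=0.04))
```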

Figure 2 (with 4 figure supplements).
Results of the Spatial Representational Similarity Analysis.

(A) The time series of spatial similarity R values combined across the within-pair and between-pair correlations. The horizontal line indicates a threshold of R = 0.04 where the general increase in spatial correlation was largest. (B) The time series of spatial similarity R values for pairs in which the same word was predicted (within-pairs, shown in red) and in which a different word was predicted (between-pairs, shown in blue). Both the within- and the between-pair spatial similarity time series showed a sharp increase at ~100 ms and a decrease at ~500 ms after the onset of each word. Between −880 and −485 ms before the onset of the final word, the spatial similarity was greater when the same word was predicted than when different words were predicted (within-pairs > between-pairs: t(25) = 3.751, p < 0.001). (C) Scatter plots of spatial similarity values averaged between −880 and −485 ms before the onset of the final word in 26 participants. In most participants (18/26) the within-pair spatial correlations were greater than the between-pair spatial correlations. (D) Cross-temporal spatial similarity matrices for the within- and between-pair correlations (red: positive correlations; blue: negative correlations). Left and middle: Both sets of pairs showed increased spatial similarity along the diagonal, with greater similarities for the within- than the between-pairs in the −900 to −500 ms interval prior to the onset of the final word. Right: The matrix shows the cluster with a statistically significant difference between the within-pair and between-pair spatial correlations (p = 0.002, cluster-randomization approach controlling for multiple comparisons over time). The absence of ‘off-diagonal’ correlations suggests that the spatial pattern of neural activity associated with the predicted word was reliable but changed over time.

https://doi.org/10.7554/eLife.39061.004

Averaged across the interval from −880 to −485 ms before the onset of the SFW (corresponding to 120–515 ms after the onset of SFW-1), we found that the spatial pattern of neural activity was more similar in sentence pairs that predicted the same SFW (within-pairs: R = 0.074 ± 0.02) than in pairs that predicted different SFWs (between-pairs: R = 0.067 ± 0.02): t(25) = 3.751, p < 0.001, see Figure 2B. Figure 2C shows a scatter plot of the averaged R-values per participant within this interval. Eighteen out of 26 participants had R-values below the diagonal, that is, larger values for the within-pair than the between-pair spatial correlations. In contrast, there was no difference between the within-pair (R = 0.066 ± 0.02) and between-pair (R = 0.068 ± 0.02) spatial correlation values averaged across the interval from −897 to −507 ms before the SFW-1 (corresponding to 103–493 ms after the onset of SFW-2): t(25) = −0.937, p = 0.358.
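The group comparison is a paired test across participants (26 participants, hence t(25)), with one within-pair and one between-pair mean R value per participant. A minimal standard-library sketch of the paired t statistic follows; the input values are invented for illustration and are not the study's data.

```python
import math
from statistics import mean, stdev


def paired_t(within, between):
    """Paired-samples t statistic and degrees of freedom for
    per-participant mean R values (within minus between)."""
    if len(within) != len(between):
        raise ValueError("need one within- and one between-pair value per participant")
    diffs = [w - b for w, b in zip(within, between)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))  # mean difference / standard error
    return t, n - 1


# Toy values for 3 hypothetical participants
t, df = paired_t([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
print(t, df)
```

The standard library has no t distribution, so the p value is not computed here; in practice, `scipy.stats.ttest_rel(within, between)` returns the same t statistic together with a two-sided p value.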

This difference in spatial similarity prior to the SFW cannot be explained by differences in the number of within-pair and between-pair sentences used to compute these mean spatial correlation values. The number of trials per condition affects the variance of the estimated mean, but not its expected value. Given that we carried out statistical analyses on the estimated mean values, the different numbers of within-pairs and between-pairs should not affect statistical inference at the participant level (Groppe et al., 2011; Thomas et al., 2004). Nonetheless, to address this concern directly, we repeated the analysis using a randomly selected subset of between-pair correlations that matched the number of within-pair correlations. This analysis confirmed that the within-pair spatial correlation values remained significantly greater than the between-pair correlations (t(25) = 2.393, p = 0.025; Figure 2—figure supplement 1).
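The matched-subset control and the argument behind it (subset size changes the variability of the estimated mean, not its expected value) can be illustrated with a short simulation. This sketch assumes sampling without replacement and uses invented R values; the exact subsampling scheme used in the study is not specified here.

```python
import random
from statistics import mean


def matched_between_mean(within_rs, between_rs, rng):
    """Mean of a random subset of between-pair R values, with as many
    elements as there are within-pair R values (sampled without replacement)."""
    subset = rng.sample(between_rs, k=len(within_rs))
    return mean(subset)


# Toy demonstration with made-up values: repeated matched subsets
# scatter around the full between-pair mean rather than shifting it.
rng = random.Random(0)
within_rs = [0.1, 0.2, 0.3]
between_rs = [float(v) for v in range(10)]  # full mean = 4.5
draws = [matched_between_mean(within_rs, between_rs, rng) for _ in range(2000)]
print(mean(between_rs), mean(draws))
```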

The difference in spatial similarity prior to the SFW cannot be explained by differences in lexical processing of the word before the SFW (SFW-1): this word always differed within sentence pairs, and any differences in its lexical properties (visual complexity, word frequency and syntactic class) between members of a pair were matched between pairs that constrained for the same SFW (within-pairs) and pairs that constrained for a different SFW (between-pairs). The spatial similarity effect also cannot be explained by differences in the predictability of the SFW-1: the cloze probability of these words was fairly low (11% on average) and any difference in cloze probability between members of a pair was matched between pairs that constrained for the same SFW (within-pairs) and pairs that constrained for a different SFW (between-pairs).

However, to fully exclude the possibility that the spatial similarity effect was driven by lexical processing of the SFW-1 rather than anticipatory processing of the SFW itself, we carried out an additional control analysis. First, we selected a subset of 31 pairs of sentences that contained exactly the same SFW-1 but nonetheless predicted different SFWs (43 unique SFWs; this between-pairs subset can be found in Figure 1—source data 1). Then we selected sentence pairs that constrained for these same SFWs (within-pairs) but which differed in the SFW-1. These constituted 43 within-pair sentences (also shown in Figure 1—source data 1). Various global contextual properties (such as the number of words, number of clauses, and syntactic complexity), as well as the cloze probability of the SFW-1, were matched between these within-pair and between-pair subsets (all ps > 0.05). We then compared the spatial similarity between these two subsets of sentence pairs. If the increased spatial similarity associated with the within-pairs versus between-pairs was due to lexical processing of the SFW-1, then the spatial similarity should be greater in sentence pairs containing exactly the same SFW-1 (i.e. in the subset of between-pairs) than in sentence pairs that predicted the same SFW (i.e. in the subset of within-pairs). We found no evidence for this. Instead, the spatial similarity averaged across the interval from −880 to −485 ms before the onset of the SFW (corresponding to 120–515 ms after the onset of SFW-1) remained larger for the within-pair sentences (R = 0.072 ± 0.02) than for the between-pair sentences (R = 0.063 ± 0.03): t(25) = 1.81, p = 0.08 (Figure 2—figure supplement 2). In this subset analysis, however, the difference only approached significance, likely because of limited statistical power (on average there were only 40 within-pairs and 29 between-pairs after artifact rejection).
Interestingly, the spatial correlation values averaged across the interval from −897 to −507 ms before the SFW-1 (corresponding to 103–493 ms after the onset of SFW-2) were larger for the between-pairs (R = 0.069 ± 0.03) than for the within-pairs (R = 0.058 ± 0.03): t(25) = −2.295, p = 0.03. It is possible that this difference was driven by the prediction of the same SFW-1 in the between-pairs. However, this interpretation is speculative.
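The subset selection for this control analysis amounts to two filters over sentence metadata: between-pairs that share the SFW-1 but predict different SFWs, and within-pairs that predict those same SFWs but differ in the SFW-1. The sketch below is a hypothetical illustration; the record keys (`id`, `sfw1`, `predicted`) and the English glosses are invented, and matching on cloze probability and contextual properties is not shown.

```python
from itertools import combinations


def shared_sfw1_between_pairs(sentences):
    """Between-pairs whose SFW-1 is identical but whose predicted SFW differs."""
    return [
        (a["id"], b["id"])
        for a, b in combinations(sentences, 2)
        if a["sfw1"] == b["sfw1"] and a["predicted"] != b["predicted"]
    ]


def matching_within_pairs(sentences, target_words):
    """Within-pairs that predict one of `target_words` but differ in SFW-1."""
    return [
        (a["id"], b["id"])
        for a, b in combinations(sentences, 2)
        if a["predicted"] == b["predicted"]
        and a["predicted"] in target_words
        and a["sfw1"] != b["sfw1"]
    ]


# Hypothetical sentence records (English glosses for illustration only)
sentences = [
    {"id": "S1", "sfw1": "sleeping", "predicted": "baby"},
    {"id": "S2", "sfw1": "newborn", "predicted": "baby"},
    {"id": "S5", "sfw1": "sleeping", "predicted": "cat"},
    {"id": "S6", "sfw1": "fluffy", "predicted": "cat"},
]
between_subset = shared_sfw1_between_pairs(sentences)
targets = {s["predicted"] for s in sentences for pair in between_subset if s["id"] in pair}
within_subset = matching_within_pairs(sentences, targets)
print(between_subset, within_subset)
```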

During sentence presentation, we avoided repeating the SFW (e.g. ‘baby’) within pairs: one member of each pair ended with the predicted SFW and the other member ended with an unpredicted but plausible word (e.g. ‘child’). However, one might argue that, after encountering the predicted word (‘baby’), participants retained this item in memory, and that the increased spatial similarity of brain activity when reading the other member of the pair was due to anticipatory retrieval of this item, facilitated by its previous presentation as a SFW. To address this concern, we divided the sentence pairs into two subsets according to whether the sentence with the expected or the unexpected SFW was presented first. We then applied the spatial similarity analysis to both subsets (Figure 2—figure supplement 3) and compared their spatial similarity values. A repeated measures ANOVA with the factors Order (Expected SFW first, Unexpected SFW first) and Pairs (Within-pair, Between-pair) showed no main effect of Order (F(1,25) = 0.747, p = 0.396, η² = 0.029) and no interaction between Order and Pairs (F(1,25) = 1.804, p = 0.191, η² = 0.067). We conclude that previously encountering a sentence ending with the expected SFW did not inflate the spatial similarity between sentence pairs that predicted the same SFW.
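In a 2 × 2 fully within-subject design, each effect has one numerator degree of freedom, so its F(1, n−1) equals the square of a one-sample t statistic computed on a per-participant contrast score, and partial η² can be recovered as F·df_effect / (F·df_effect + df_error). The sketch below illustrates this equivalence; the cell keys are hypothetical. Plugging the reported F values into the formula reproduces the reported η² values (0.029 and 0.067), which is consistent with, though it does not by itself confirm, these being partial η².

```python
import math
from statistics import mean, stdev


def contrast_F(contrast_scores):
    """F(1, n-1) for a one-df within-subject effect: the square of a
    one-sample t statistic on the per-participant contrast scores."""
    n = len(contrast_scores)
    t = mean(contrast_scores) / (stdev(contrast_scores) / math.sqrt(n))
    return t * t, (1, n - 1)


def partial_eta_squared(F, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dfs."""
    return F * df_effect / (F * df_effect + df_error)


def rm_anova_2x2(cells):
    """cells: one dict per participant mapping (order, pairs) -> mean R,
    with order in {"exp", "unexp"} and pairs in {"within", "between"}."""
    order = [(c["exp", "within"] + c["exp", "between"])
             - (c["unexp", "within"] + c["unexp", "between"]) for c in cells]
    pairs = [(c["exp", "within"] + c["unexp", "within"])
             - (c["exp", "between"] + c["unexp", "between"]) for c in cells]
    interaction = [(c["exp", "within"] - c["exp", "between"])
                   - (c["unexp", "within"] - c["unexp", "between"]) for c in cells]
    return {name: contrast_F(scores) for name, scores in
            (("Order", order), ("Pairs", pairs), ("Order x Pairs", interaction))}


# Reported F(1,25) values map onto the reported effect sizes
print(round(partial_eta_squared(0.747, 1, 25), 3))
print(round(partial_eta_squared(1.804, 1, 25), 3))
```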

We then asked whether the increased spatial similarity associated with the within-pair versus between-pair sentences reflected the prediction of semantic features over and above the prediction of a general syntactic category (it is known that nouns and verbs are associated with distinct spatial patterns of activity; Vigliocco et al., 2011). To do this, we calculated within-category spatial similarity values by averaging the spatial similarity between all pairs of sentences that predicted the same syntactic category of words (i.e. nouns or verbs) and compared these values to the within-pair spatial similarity values. We found that the spatial similarity associated with pairs that predicted the same specific words (within-pair spatial similarity: R = 0.074 ± 0.02) was significantly larger than the spatial similarity associated with pairs that predicted the same category (within-category spatial similarity: R = 0.068 ± 0.02), (t(25) = −3.559, p = 0.002; Figure 2—figure supplement 4). This suggests that the greater within-pair versus between-pair spatial similarity effect was not simply reducible to the prediction of general syntactic category.
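The within-category average can be sketched as a filter over all pairwise similarity values. This is an assumption-laden illustration: the text does not state whether pairs predicting the identical word were excluded from the within-category average, so this sketch includes them; the sentence ids, categories and R values are invented.

```python
from statistics import mean


def within_category_mean(similarity, category):
    """Mean similarity over all sentence pairs whose predicted words share a
    syntactic category (same-word pairs are included in this sketch).

    similarity: (sentence id, sentence id) -> R value for each pair
    category: sentence id -> category of the predicted word, e.g. "noun"/"verb"
    """
    return mean(r for (a, b), r in similarity.items() if category[a] == category[b])


# Toy values (made up)
category = {"S1": "noun", "S2": "noun", "S3": "verb", "S4": "verb"}
similarity = {("S1", "S2"): 0.8, ("S3", "S4"): 0.6, ("S1", "S3"): 0.1, ("S2", "S4"): 0.3}
print(within_category_mean(similarity, category))
```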

Finally, to further characterize the time course of brain activity reflecting unique lexico-semantic predictions, we correlated the spatial pattern of activity (across sensors) between each sentence (e.g. S1-A) at each time sample (e.g. t1) with that of its paired sentence (e.g. S2-A’) at all time samples (e.g. from t1 to tn) in each participant (see also King and Dehaene, 2014; Stokes et al., 2015a), yielding a cross-temporal within-pair similarity matrix. We also calculated between-pair cross-temporal similarity matrices and averaged these within each participant and then across participants (Figure 2D). As expected, both the within- and between-pair group-averaged cross-temporal spatial similarity matrices showed that the spatial similarity was strongest around the diagonal in the first 500 ms after the onset of SFW-1. This was also the case for the difference between the within-pair and between-pair matrices (cluster-based permutation test: p = 0.002). This effect along the diagonal is consistent with the spatial similarity difference reported in Figure 2B and C. Moreover, the absence of an effect off the diagonal suggests that the spatial patterns associated with prediction changed over time.
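The cross-temporal (temporal generalization) matrix correlates the sensor pattern of one sentence at time i with that of its partner at every time j; the diagonal (i = j) recovers the ordinary spatial RSA time series. A minimal sketch, assuming the hypothetical `data[time][sensor]` layout; whether the matrices were symmetrized across the two members of a pair is not stated, so this version leaves them asymmetric.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cross_temporal_matrix(a, b):
    """M[i][j] = spatial correlation of sentence a at time i with sentence b
    at time j; a and b are data[time][sensor]."""
    return [[pearson(a_i, b_j) for b_j in b] for a_i in a]


def average_matrices(matrices):
    """Element-wise mean of equally sized matrices (e.g. over all within-pairs)."""
    n = len(matrices)
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(m[i][j] for m in matrices) / n for j in range(cols)]
            for i in range(rows)]


# Toy example: 2 time samples x 3 sensors (values are made up)
a = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
b = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
M = cross_temporal_matrix(a, b)
print(M)
```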

Temporal RSA: The temporal pattern of neural activity was more similar in sentence pairs that predicted the same versus different words, and this effect localized to left inferior temporal regions

As described above, across the interval from −880 to −485 ms prior to the onset of the SFW, we observed a general increase in spatial similarity between all pairs of sentences, regardless of whether they constrained for the same SFW (within-pairs) or a different SFW (between-pairs). We next asked whether, within this time window, there were any brain regions in which the temporal pattern of neural activity was more similar for within-pairs than between-pairs. Note that this temporal RSA approach is fairly conservative in that it was limited to the time window that showed a spatial similarity effect, and so it may not have captured more extended temporal similarity effects that were not accompanied by a spatial similarity effect. The reason we took this approach is that we were interested, a priori, in any functional relationship between these measures, that is whether the spatial similarity effect reflected brain activity associated with the prediction of spatially distributed semantic representations, and whether the temporal similarity effect reflected brain activity associated with temporal binding of these spatially distributed representations. However, in order to fully exploit the spatiotemporal pattern of the data, future studies could examine the spatial and temporal patterns simultaneously using a spatiotemporal searchlight approach (Nili et al., 2014; Su et al., 2012; Su et al., 2014).

In each participant, we quantified the degree of temporal similarity of MEG activity produced by sentence pairs that predicted the same versus a different SFW by correlating the temporal pattern of signal produced within this time window at each sensor, yielding topographic maps of temporal correlations. We then averaged these topographic maps of temporal similarity within each participant and then across participants (Figure 1C). The group-averaged temporal similarity maps revealed a general increase in temporal similarity over bilateral temporal and posterior sensors, regardless of whether sentences predicted the same or a different SFW. When comparing the within- and between-pair temporal similarity topographic maps (Figure 3A), the temporal pattern of neural activity was more similar in pairs that predicted the same versus different words over central and posterior sensors (cluster-based randomization test: p = 0.002; Figure 3A, right panel). The comparison with a randomly selected subset of between-pair correlations that matched the number of within-pair correlations showed a similar, albeit slightly reduced, effect (marginally significant cluster: p = 0.0679; cluster-randomization approach controlling for multiple comparisons over sensors; see Figure 3—figure supplement 1A for the group-averaged temporal similarity maps).
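The idea behind a cluster-based randomization test over sensors can be sketched as follows: compute a t value per sensor on the within-minus-between differences, group neighboring supra-threshold sensors into clusters, and compare the largest observed cluster mass with a null distribution built by randomly flipping the sign of each participant's difference map. This is a simplified, one-sided sketch and not the exact procedure used in the study; the neighbor graph, the threshold `t_crit`, the number of permutations and the toy data are all assumptions.

```python
import math
import random
from statistics import mean, stdev


def sensor_t_values(diffs):
    """One-sample t per sensor; diffs holds one list per participant of
    per-sensor (within minus between) temporal similarity differences."""
    n = len(diffs)
    t_vals = []
    for s in range(len(diffs[0])):
        column = [d[s] for d in diffs]
        t_vals.append(mean(column) / (stdev(column) / math.sqrt(n)))
    return t_vals


def positive_cluster_masses(t_vals, neighbors, t_crit):
    """Sum of t over each connected group of sensors with t > t_crit."""
    above = {s for s, t in enumerate(t_vals) if t > t_crit}
    seen, masses = set(), []
    for start in sorted(above):
        if start in seen:
            continue
        seen.add(start)
        stack, mass = [start], 0.0
        while stack:  # depth-first search over the sensor neighbor graph
            s = stack.pop()
            mass += t_vals[s]
            for nb in neighbors[s]:
                if nb in above and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        masses.append(mass)
    return masses


def cluster_permutation_test(diffs, neighbors, t_crit, n_perm=1000, seed=0):
    """Largest observed positive cluster mass and its Monte Carlo p value,
    using random sign flips of each participant's difference map."""
    rng = random.Random(seed)
    observed = positive_cluster_masses(sensor_t_values(diffs), neighbors, t_crit)
    if not observed:
        return None, 1.0
    obs_max = max(observed)
    exceed = 0
    for _ in range(n_perm):
        flipped = []
        for d in diffs:
            sign = rng.choice((1, -1))
            flipped.append([sign * x for x in d])
        null = positive_cluster_masses(sensor_t_values(flipped), neighbors, t_crit)
        if null and max(null) >= obs_max:
            exceed += 1
    return obs_max, (exceed + 1) / (n_perm + 1)


# Toy data: 12 participants, 4 sensors on a line (0-1-2-3); sensors 0 and 1 carry an effect
gen = random.Random(1)
diffs = [[1.0 + gen.gauss(0, 0.5), 1.0 + gen.gauss(0, 0.5),
          gen.gauss(0, 0.5), gen.gauss(0, 0.5)] for _ in range(12)]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
mass, p = cluster_permutation_test(diffs, neighbors, t_crit=2.2, n_perm=200)
print(mass, p)
```

Here `t_crit = 2.2` approximates the two-tailed 0.05 critical value for 11 degrees of freedom; real MEG analyses typically derive the neighbor graph from sensor geometry.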

Figure 3 (with 2 figure supplements).
Results of the Temporal Representational Similarity Analysis.

The Temporal Representational Similarity Analysis was carried out between −880 and −485 ms before the onset of the final word. (A) Temporal similarity topographic maps at the sensor level. Left and middle: Both the within- and between-pair correlations revealed increased temporal similarity over bilateral temporal and posterior sensors. Right: the difference map revealed greater temporal similarity when the same word was predicted (within-pairs) than when a different word was predicted (between-pairs) over central and posterior sensors. The sensors where this difference was significant at the cluster level are marked with black asterisks (p = 0.002; a cluster-randomization approach controlling for multiple comparisons over sensors). (B) Temporal similarity difference map in source space. The correlation values were interpolated on the MNI template brain and are shown both on the coronal plane (Talairach coordinate of peak: y = −19.5 mm) and the sagittal plane (Talairach coordinate of peak: x = −39.5 mm). This revealed significantly greater temporal similarity between sentence pairs that predicted the same word (within-pairs) than pairs that predicted a different word (between-pairs) within the left inferior temporal gyrus, extending into the medial temporal lobe including the left fusiform, hippocampus and parahippocampus as well as left cerebellum (p = 0.006; a cluster-randomization approach controlling for multiple comparisons over grid points).

https://doi.org/10.7554/eLife.39061.010
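The cluster-based randomization tests reported in this section can be illustrated with a generic sign-flip cluster permutation sketch over sensors. Several choices here are assumptions and are not taken from the study: the input is a set of per-participant within-minus-between difference maps, sensor neighbours come from a hypothetical boolean `adjacency` matrix, the cluster-forming threshold is t > 2.0, only positive clusters are tested, and cluster mass is defined as the sum of t-values. The permutation scheme relies on the null hypothesis of no within/between difference, under which the sign of each participant's difference map is exchangeable.

```python
import numpy as np

def one_sample_t(x):
    """One-sample t-value per sensor; x has shape (n_subjects, n_sensors)."""
    return x.mean(axis=0) / (x.std(axis=0, ddof=1) / np.sqrt(x.shape[0]))

def find_clusters(mask, adjacency):
    """Group supra-threshold sensors into connected clusters (depth-first search)."""
    seen = np.zeros(len(mask), dtype=bool)
    clusters = []
    for start in np.flatnonzero(mask):
        if seen[start]:
            continue
        seen[start] = True
        stack, comp = [start], []
        while stack:
            k = stack.pop()
            comp.append(int(k))
            for nb in np.flatnonzero(adjacency[k] & mask):
                if not seen[nb]:
                    seen[nb] = True
                    stack.append(nb)
        clusters.append(sorted(comp))
    return clusters

def cluster_masses(diff, adjacency, t_thresh):
    """Positive clusters and their masses (sum of t-values)."""
    t = one_sample_t(diff)
    clusters = find_clusters(t > t_thresh, adjacency)
    return clusters, [float(t[c].sum()) for c in clusters]

def cluster_permutation_test(diff, adjacency, t_thresh=2.0, n_perm=1000, seed=0):
    """Sign-flip permutation test; each observed cluster is compared with
    the null distribution of the maximum cluster mass."""
    rng = np.random.default_rng(seed)
    clusters, masses = cluster_masses(diff, adjacency, t_thresh)
    null_max = np.empty(n_perm)
    for p in range(n_perm):
        # flip the sign of each participant's whole difference map
        signs = rng.choice([-1.0, 1.0], size=(diff.shape[0], 1))
        _, perm_masses = cluster_masses(diff * signs, adjacency, t_thresh)
        null_max[p] = max(perm_masses, default=0.0)
    pvals = [(np.sum(null_max >= m) + 1) / (n_perm + 1) for m in masses]
    return clusters, masses, pvals
```

Comparing each observed cluster against the maximum cluster mass in every permutation is what controls the family-wise error rate across sensors. The same logic carries over to grid points in source space once an adjacency over grid points is defined.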

In order to estimate the underlying neuroanatomical source of the increased temporal similarity associated with the within-pair versus between-pair sentences, we repeated this analysis in source space. We first discretized the full brain volume using a grid. At each grid point, we constructed spatial filters using a ‘beamforming approach’ (a linearly constrained minimum variance technique; Van Veen et al., 1997) and applied them to the MEG data. We then performed the temporal similarity analysis on the time series from the spatial filters. The differences in the temporal similarity R-values were mapped onto the grid in each participant. These difference values were then morphed to the MNI brain and averaged across participants. The source localization of the difference (corresponding to the difference of the topographic distribution, see Figure 3A: right panel) is shown in Figure 3B. The temporal pattern of neural activity was more similar in sentence pairs that predicted the same versus different words within a cluster over the left hemisphere (cluster-randomization controlling for multiple comparisons: p = 0.006). The strongest effect was found in the left inferior temporal lobe, as indicated by the grid points reaching 85% of the maximum within- versus between-pair difference in temporal correlation values within the statistically significant cluster (see Figure 3—figure supplement 2A). The source extended medially into the left fusiform cortex, parahippocampus and hippocampus, as well as posteriorly into the left cerebellum. Comparing the within-pair correlations with a randomly selected subset of between-pair correlations, matched in number to the within-pair correlations, confirmed this finding (cluster-randomization controlling for multiple comparisons: p = 0.034; see Figure 3—figure supplement 1B for the statistically significant cluster and Figure 3—figure supplement 2B for the corresponding 85%-of-maximum map).
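For readers unfamiliar with beamforming, the core of an LCMV spatial filter at a single grid point is the weight matrix W = (Lᵀ C⁻¹ L)⁻¹ Lᵀ C⁻¹. Here L is the lead field of that grid point and C is the sensor covariance. W passes activity from that location with unit gain while minimizing the total output variance. The sketch below makes two assumptions. First, it takes the lead field as given; in practice this comes from a forward model, and in the usage check it is simply random. Second, it adds a small Tikhonov regularization whose 5% value is arbitrary and is not a parameter reported in this study.

```python
import numpy as np

def lcmv_weights(leadfield, data_cov, reg=0.05):
    """LCMV filter for one grid point (sketch).

    leadfield: (n_sensors, n_orient) forward field of the grid point.
    data_cov:  (n_sensors, n_sensors) sensor covariance.
    reg: regularization as a fraction of mean sensor variance (assumed value).
    Returns W with shape (n_orient, n_sensors) and W @ leadfield = identity.
    """
    n = data_cov.shape[0]
    c = data_cov + reg * np.trace(data_cov) / n * np.eye(n)
    c_inv = np.linalg.inv(c)
    gram = leadfield.T @ c_inv @ leadfield
    # W = (L^T C^-1 L)^-1 L^T C^-1, computed with a linear solve
    return np.linalg.solve(gram, leadfield.T @ c_inv)

def source_time_series(data, leadfield, reg=0.05):
    """Apply an LCMV filter to sensor data of shape (n_sensors, n_times)."""
    w = lcmv_weights(leadfield, np.cov(data), reg)
    return w @ data
```

With a three-column lead field (free dipole orientation), the filter returns three time series per grid point. How these are reduced to the single time series used for the temporal similarity analysis (for example, by projecting onto the dominant orientation) is a further choice that this section does not specify.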

Discussion

We asked whether the prediction of distinct words in highly constraining contexts is associated with distinct spatial and temporal patterns of neural activity before the appearance of new bottom-up input. To this end, we used MEG in conjunction with an RSA approach to index brain activity as participants read sentences in which the final word was highly predictable from the context. Based on a spatial correlation measure, we were able to provide evidence that the prediction of specific individual words produced unique spatial patterns of brain activity. This activity was evident between 120 and 515 ms following the word prior to the predicted sentence-final word (SFW-1). Moreover, within this time window, using a temporal correlation measure, we show that the prediction of specific individual words produced distinct temporal patterns of neural activity, which localized to the left inferior temporal region and neighboring areas. To the best of our knowledge, this is the first study to show that unique spatial and temporal patterns of neural activity are associated with the prediction of distinct words during language processing.

Unique spatial patterns of neural activity are associated with the prediction of specific words, prior to the appearance of new bottom-up input

We found that the spatial pattern of neural activity (across sensors) was more similar in sentence pairs that predicted the same SFW (within-pairs) than in pairs that predicted a different SFW (between-pairs). This spatial similarity effect began at around 120 ms after the onset of the word before the SFW (i.e. the SFW-1). We interpret this finding as reflecting the greater spatial similarity in the pattern of brain activity produced by the prediction of the same word than the prediction of a different word. Before discussing this interpretation further, we first consider alternative possibilities.

One set of alternative interpretations is that, instead of reflecting similarity in the representation of the predicted SFW itself, the within- versus between-pair spatial similarity effect reflected a greater similarity in the pattern of brain activity evoked by the contexts of the sentence pairs that predicted the same word (within-pairs) versus a different word (between-pairs). The spatial similarity effect only became apparent after the onset of the word immediately preceding the SFW (i.e. after the SFW-1). This suggests that it is unlikely to have reflected any general differences in these contexts (the full set of words prior to the SFW). Indeed, as noted in the Materials and methods, the two contexts within each pair were composed of distinct words, and any differences between members of a given pair in length (number of words) and complexity (number of clauses and syntactic complexity) were matched between pairs that constrained for the same SFW (within-pairs) and pairs that constrained for a different SFW (between-pairs). It is also unlikely that the spatial similarity effect reflected lexical differences of the SFW-1 itself. The SFW-1 always differed within pairs, and any differences in the visual complexity, frequency or syntactic class of the SFW-1 between the members of a given pair were again matched between pairs that constrained for the same word (within-pairs) and pairs that constrained for a different word (between-pairs). Finally, it is unlikely that the spatial similarity effect reflected differences in the predictability of the SFW-1. The cloze probability of the SFW-1 was low (11% on average), and any difference between the members of a given pair in the cloze probability of the SFW-1 did not differ between pairs that constrained for the same SFW (within-pairs) and pairs that constrained for a different SFW (between-pairs).

Nonetheless, to rule out any possibility that the observed effect was driven by the lexical properties of the SFW-1 rather than by the prediction of the SFW itself, we carried out an additional control analysis. This analysis compared two subsets of sentence pairs. The first consisted of sentence pairs that had the same SFW-1 but predicted a different SFW (a subset of the between-pair sentences). The second consisted of sentence pairs that constrained for these same SFWs but differed in the SFW-1 (a subset of the within-pair sentences). If the within-pair versus between-pair spatial similarity effect was driven by lexical similarities of the SFW-1, then we should have seen greater spatial similarity in pairs that contained exactly the same SFW-1, even though they constrained for a different SFW, than in pairs that contained a different SFW-1, even though they constrained for the same SFW. We found no evidence for this. Indeed, just as in our main analysis, the spatial patterns produced by the sentence pairs that predicted the same SFW (i.e. within-pairs) appeared to be more similar than those produced by the sentence pairs that predicted a different SFW (i.e. between-pairs), even though the between-pairs contained the same SFW-1. This strongly suggests that the observed effect reflects the prediction of the SFW rather than lexical processing of the SFW-1.

A second set of alternative interpretations might acknowledge that the increase in spatial similarity detected in the within-pair sentences reflects activity related to the prediction of a specific SFW. However, instead of attributing the effect to the predicted representation itself, they might attribute it to participants’ recognition of a match between the word that they had just predicted and a word that they had actually seen as the SFW earlier in the experiment. This seems unlikely because we found that the spatial similarity effect was just as large when the unexpected SFW of a pair was presented before the expected SFW, as when the expected SFW was presented first (see Figure 2—figure supplement 3). It is, however, conceivable that participants recognized a match between the word that they had just predicted and a word that they had predicted earlier in the experiment (even though this predicted word was never observed). For example, there is some evidence that a predicted SFW can linger in memory across four subsequent sentences, even if it is not actually presented (Rommers and Federmeier, 2018). This seems less likely to have occurred in the present study, however, where each member of a sentence pair was separated by at least 30 (on average 88) other sentences.

Our favored interpretation of the greater spatial similarity in brain activity produced by sentence pairs that constrained for the same word (within-pairs) versus a different word (between-pairs) is that it reflected activity associated with the predicted SFW itself. This is by no means the first study to show evidence of lexico-semantic prediction before the onset of new bottom-up input during sentence processing. Several previous studies have reported evidence of such anticipatory processing following constraining relative to non-constraining contexts (see Kuperberg and Jaeger, 2016a, section 3.1), at least under experimental conditions that encourage the generation of high-certainty lexico-semantic predictions. What distinguishes the present study from this previous work is that it provides neural evidence that these lexico-semantic predictions are item-specific: different predicted words are associated with spatially distinct patterns of neural activity.

This raises the question of exactly what type and grain of lexical information was reflected in these item-specific spatial patterns. In theory, an increased spatial similarity in association with sentence pairs that predicted the same upcoming word could have reflected greater similarity between the predicted word’s syntactic, semantic, phonological and/or orthographic features. For example, a particular spatial pattern associated with a predicted word ‘baby’ could, in theory, reflect activity at the level of its syntactic category (e.g. <noun>), its lexico-semantic features (e.g. <human>, <small>, <cries>), its particular orthographic form (/b-a-b-y/) and/or its particular phonological form (e.g. /’beibi/).

We were able to exclude the possibility that our analysis simply picked up on syntactic category similarities between predicted words, such as whether they represented nouns or verbs, which are known to have distinct neuroanatomical representations (e.g. Vigliocco et al., 2011). This is because we designed our study such that 50% of the predicted SFWs were nouns and 50% were verbs, allowing us to calculate the within-category spatial similarity between all pairs of sentences that predicted the same syntactic category. We compared these within-category spatial similarity values with the within-pair spatial similarity values, in which the predicted SFWs shared both syntactic category and lexico-semantic features. We found that the within-category spatial similarity values were significantly smaller than the within-pair spatial similarity values (Figure 2—figure supplement 4). These findings suggest that the within-pair spatial similarity effect did not simply reflect the prediction of the broad syntactic category of the SFW.

Instead, we suggest that the spatial similarity effect reflected similarities at the level of the semantic properties and features that defined the meanings of the predicted words. As noted in the Introduction, the multimodal semantic properties and features associated with words are thought to be represented within regions that are spatially distributed across the cortex (Damasio, 1989; Price, 2000; Martin and Chao, 2001). We suggest that our analysis picked up distinct spatially distributed patterns of neural activity that corresponded to the particular sets of features associated with distinct predicted words. For example, the prediction of the particular set of semantic properties and features corresponding to the word <baby> (e.g. <human>, <small>, <cries>) may have been reflected by the activation of a particular spatially distributed network that differed from the network reflecting the prediction of the particular set of semantic features corresponding to a different predicted word, <roses> (e.g. <plant>, <scalloped petals>, <fragrant smell>).

It is also possible that the increased spatial similarity in association with sentence pairs that predicted the same word reflected similarities of predictions generated at a lower phonological and/or orthographic level of representation. On this account, the prediction of semantic features led to the top-down pre-activation of information at these lower levels of the linguistic hierarchy before new bottom-up information became available to these levels (see Kuperberg and Jaeger, 2016a, sections 3 and 5 for discussion). The present study cannot directly speak to this hypothesis. This is because, for the most part, there is a one-to-one correspondence between the semantic features and the phonological or orthographic forms of words. However, the methods described here provide one way of addressing this question in future studies. For example, by examining the spatial similarity of sentence pairs that constrain for words that share orthographic features but that differ in their meanings (homonyms), it should be possible to dissociate the prediction of orthographic/phonological representations from the prediction of semantic features associated with a given lexico-semantic item.

In addition to suggesting that the prediction of specific words is associated with unique spatial patterns of neural activity, our findings also provide some information about the time course of such activity in relation to the appearance of new bottom-up input. As noted above, the spatial similarity effect began at 120 ms after the onset of the word before the predicted SFW (i.e. SFW-1), even though the effect was not driven by the lexical properties or the predictability of the SFW-1 itself. This provides evidence that the prediction of the SFW was generated at the first point in time at which participants had sufficient information to unambiguously generate this prediction. For example, in the sentence ‘In the crib, there is a sleeping …', as comprehenders accessed the meaning of the word <sleeping>, they may have also predicted the semantic features of <baby>. This type of account follows from a generative framework of language comprehension in which, following highly constraining contexts, comprehenders are able to predict entire events or states, along with their associated semantic features, prior to the appearance of new bottom-up input (e.g. Kuperberg and Jaeger, 2016a, sections 4 and 5; Kuperberg, 2016b; St. John and McClelland, 1990; Rabovsky et al., 2018). Importantly, however, we conceive of the spatial similarity effect detected here as primarily reflecting similarities at the level of semantic features (e.g. <human>, <small>, <cries>) associated with the predicted word (‘baby’), rather than similarities of the entire predicted events/states (e.g. the <baby sleeping in the crib> event versus the <newborn baby in the hospital> event) (see Kuperberg, 2016b). As noted above, we cannot tell from the current findings whether these predicted semantic features, in turn, led to the top-down pre-activation of specific phonological or orthographic word-forms.

Despite its early onset, the spatial similarity effect was fairly short-lived: it lasted until 515 ms following the onset of the SFW-1 (corresponding to 315 ms following its offset at 200 ms), and then dropped off in the second half of the interval before the onset of the SFW (see Figure 2B). This was confirmed by the cross-temporal spatial similarity matrix (Figure 2D). The precise reason for this transient pattern is unclear. It is possible that the predicted information was not maintained over the relatively long interstimulus interval used in the present study (SOA: 1000 ms per word). On the other hand, a failure to detect neural activity over a delay does not necessarily imply that this information is not present. This idea has recently been discussed in relation to the notion of ‘activity-silent’ working memory, which holds that representations within working memory can be maintained in a silent neural state, instead of being accompanied by persistent delay activity (Stokes, 2015b; Wolff et al., 2017). Such content-specific silent activity can only be detected if it is in the focus of attention and task-relevant. On this account, the predicted information may still have been present during the interstimulus interval in the present study, despite our failure to detect it, and only became available once new bottom-up input was encountered. Of course, this interpretation is speculative, particularly given our use of a very slow presentation rate. It will be important for future work to determine whether similar dynamics are associated with the prediction of upcoming words when bottom-up inputs unfold at faster, more naturalistic rates.

Finally, we note that the cross-temporal spatial similarity matrix showed that the increased spatial similarity for the within-pair sentences was only found along the diagonal, rather than generalizing across time points. This suggests that the unique spatial pattern of brain activity associated with the prediction of specific words changed over time. We speculate that this may be because different properties associated with particular words became available at different times. For example, the different semantic features (e.g. <human>, <small>, <cries>) associated with the prediction of a specific word (e.g. ‘baby’) might have been recruited at different time points. As we discuss next, temporal binding may play a role in integrating these dynamically evolving spatial patterns of activity to instantiate specific lexico-semantic predictions.
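The cross-temporal spatial similarity matrix referred to above can be sketched as follows. The sketch assumes the same hypothetical data layout as before: a (sentences, sensors, time points) array together with `sfw_labels`. Entry [t1, t2] is the correlation across sensors between one sentence's topography at t1 and the other sentence's topography at t2. The diagonal therefore reproduces the time-resolved spatial similarity, and the off-diagonal entries test whether a pattern generalizes across time. Because the order of the two sentences in a pair is arbitrary, each pair's matrix is symmetrized; this is an assumption of the sketch.

```python
import numpy as np
from itertools import combinations

def cross_temporal_spatial_similarity(data, sfw_labels):
    """Sketch: mean within- and between-pair cross-temporal matrices.

    data: array (n_sentences, n_sensors, n_times).
    Returns (within, between), each of shape (n_times, n_times).
    """
    data = np.asarray(data, dtype=float)
    n_sent, n_sens, n_times = data.shape
    # z-score each topography (across sensors) at every time point
    centered = data - data.mean(axis=1, keepdims=True)
    z = centered / centered.std(axis=1, keepdims=True)
    sums = {True: np.zeros((n_times, n_times)), False: np.zeros((n_times, n_times))}
    counts = {True: 0, False: 0}
    for i, j in combinations(range(n_sent), 2):
        r = z[i].T @ z[j] / n_sens  # r[t1, t2]: sentence i at t1 vs sentence j at t2
        r = 0.5 * (r + r.T)         # pair order is arbitrary, so symmetrize
        same = bool(sfw_labels[i] == sfw_labels[j])
        sums[same] += r
        counts[same] += 1
    return sums[True] / counts[True], sums[False] / counts[False]
```

If the predicted-word pattern changes from moment to moment, the within-pair matrix is elevated only along its diagonal. This is the signature described above for Figure 2D.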

Unique temporal patterns of neural activity within the left inferior and medial temporal lobe are associated with the prediction of specific words

In addition to being associated with unique spatial patterns of neural activity, we also found evidence that the prediction of specific words was associated with unique temporal patterns of neural activity. Specifically, across the time window that showed the increased spatial similarity effect, a cluster of MEG sensors revealed a greater similarity in the temporal pattern of brain activity in pairs of sentences that predicted the same SFW (within-pairs) versus a different SFW (between-pairs). Moreover, we localized the source of this effect to the left ventral and medial temporal lobe.

This observation is in line with a recent study that applied temporal RSA to intracranial EEG signals, and reported that the temporal pattern of neural activity within the left inferior temporal lobe encoded item-specific representations during picture naming (Chen et al., 2016). The current findings extend these previous results by suggesting that temporal similarity patterns corresponding to unique lexico-semantic items can be detected before new bottom-up input becomes available.

The precise functional significance of the temporal similarity effect is unclear. However, we suggest that it is consistent with a classic theory proposed by Damasio (1989). On this theory, multimodal semantic features, represented in widely distributed regions of the cortex, become bound together through a process of ‘temporal synchrony’, and this binding occurs within ‘convergence zones’ that unify these features into a discrete whole. In the present study, it is possible that the unique temporal patterns of neural activity within the left ventral/medial temporal lobe played a functional role in binding the unique sets of semantic features that were represented by the unique spatial patterns of neural activity. Speculatively, these unique temporal patterns of neural activity may have also played a role in binding this information as it became available dynamically over time (as opposed to becoming available all at once). For example, a particular temporal signature may have tracked (or perhaps even orchestrated) the particular time course of accessing the distributed brain regions that represent the semantic features of <baby> (e.g. <human>, <small> and <cries>). In doing so, it may have bound the dynamically evolving and spatially distributed pattern of neural activity into a coherent lexico-semantic representation, corresponding to participants’ subjective experience of predicting a specific word.

The localization of the temporal similarity effect to the left ventral temporal regions (left inferior temporal lobe and fusiform cortex) is consistent with the well-established role of these regions in lexico-semantic processing (Lüders et al., 1991; Lüders et al., 1986; McCarthy et al., 1995; Mummery et al., 1999; Nobre and McCarthy, 1995; Visser et al., 2010). In particular, it is consistent with the proposed role of these regions as ‘hubs’ that bring together widely distributed semantic features across the cortex (Patterson et al., 2007; Ralph et al., 2017). Such hubs may function as a ‘dictionary’ by mediating between widely distributed conceptual knowledge and specific word forms (orthographic and phonological knowledge) (Caramazza, 1996; Damasio et al., 1996), and/or they may play a more domain-general role in semantic processing (Nobre et al., 1994; Patterson et al., 2007; Reddy and Kanwisher, 2006; Shimotake et al., 2015). Our findings show that this region can encode unique temporal patterns of neural activity that correspond to unique lexico-semantic predictions. In doing so, they shed further light on how these regions might actually instantiate this type of binding.

Notably, the activity within the inferior temporal cortex extended into the medial temporal lobe (the parahippocampal gyrus and the hippocampus). While MEG source-modeling results within medial and subcortical regions should be interpreted with caution, the possible involvement of the hippocampus is interesting given other work implicating it in binding representations to generate predictions. A large literature based on recordings in rats demonstrates that the hippocampus represents upcoming spatial locations as the rat navigates (Gupta et al., 2012), and the physiological mechanisms supporting such predictions are relatively well understood (Lisman and Redish, 2009). There is also growing evidence that these predictive mechanisms might generalize to the human hippocampus (Chen et al., 2011; Davachi and DuBrow, 2015; Harrison et al., 2006; Hindy et al., 2016; Schiffer et al., 2012). Moreover, a recent study found that temporal patterns in higher frequency bands recorded within the hippocampus were similar between a pre-picture interval and the picture itself (Jafarpour et al., 2017), suggesting a role in representing pre-activated non-verbal semantic information. Given these findings, it is conceivable that the hippocampus plays an analogous role in language prediction. Indeed, Piai et al. (2016) used intracranial recordings in humans to demonstrate predictive effects in the hippocampus in a language task in which the sentence-final word had to be produced.

In addition to the medial temporal lobe, the activity also included the left cerebellum. Again, given the limited spatial resolution of MEG, this activation should be interpreted with caution. However, previous studies have reported bilateral cerebellar activation (right-dominant) during language prediction (Bonhage et al., 2015; Lesage et al., 2017; Wang et al., 2018). Our findings raise the possibility that the cerebellum may also play a role in generating item-specific predictions during language processing.

Before concluding, we emphasize that this study does not speak to the debate about whether neural evidence of anticipatory processing, particularly at the level of specific word-forms (rather than semantic features), can be detected during sentence comprehension under conditions that do not encourage predictive processing (DeLong et al., 2005; Nieuwland et al., 2018; Yan et al., 2017). Nor does it address the question of whether such predictions are probabilistic in nature. In the present study, we deliberately used highly constraining contexts to encourage participants to generate high-certainty, specific lexico-semantic predictions. We also used a long interstimulus interval between words to ensure that we would be able to detect any representationally specific neural activity if it was present. We see the unique contribution of our study as providing evidence that, when we know that item-specific lexico-semantic predictions are generated, they are associated with unique spatial and temporal patterns of neural activity. These findings pave the way for using these methods to determine whether and when such item-specific lexico-semantic representations become available as language unfolds more rapidly in real time, in both the visual and auditory domains.

Conclusion

In conclusion, we used MEG to show that unique patterns of neural activity are associated with the prediction of specific lexico-semantic items during language processing. We showed that unique spatial patterns became active at around 120 ms after a word was unambiguously predicted and that their activation was transient and dynamic. In addition, we showed that the prediction was accompanied by unique temporal patterns of brain activity that localized to the left inferior and medial temporal lobe.

Materials and methods

Design and development of stimuli

We developed a stimulus set of 120 pairs of sentences in Mandarin with highly constraining contexts. The two contexts within each pair were distinct from one another, and they had no content words in common (with the exception of five pairs), but they each strongly predicted the same sentence-final word (SFW). For example, in Figure 1A, both sentences S1 and S2 predicted the word, ‘baby’. In half of these sentences, the expected final word was a noun and in the other half, it was a verb.

To select and characterize this final set of sentences, we began with an initial set of 208 pairs and carried out a cloze norming study in 30 participants (mean age: 23 years; range: 18–28 years old; 15 males), who did not participate in the subsequent MEG study. In this cloze study, sentence contexts were presented without the SFW (e.g. ‘In the crib there is a sleeping …') and participants were asked to complete the unfinished sentence by writing down the most likely ending. The two members of each sentence pair were counterbalanced across two lists (with order randomized within lists), which were each seen by half the participants. Testing took approximately 40 min per participant.

To calculate the lexico-semantic constraint of each sentence context, we tallied the number of participants who produced the most common completion for that context. We retained 66 pairs in which, for each member of the pair, at least 73% of participants produced the same SFW; that is, at least 11 of the 15 participants who saw a given sentence filled in the same word. We then revised 103 sentences (54 sentences in list 1 and 49 in list 2) to make them more constraining and re-tested them in the same group of participants. After this second round of cloze testing, we selected the final set of 120 sentence pairs for the MEG experiment. In the final set of stimuli, the lexico-semantic constraints of 109 pairs were above 70%, and the constraints of the remaining 11 pairs were slightly lower (mean: 58%; SD: 12). Across all pairs, the mean lexico-semantic constraint was 88% (SD: 12).
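The lexico-semantic constraint measure is the proportion of cloze participants who produced the modal completion. The sketch below uses hypothetical responses and assumes completions are already normalized. Ties are broken by first occurrence, which is also an assumption. With 15 participants per sentence, the retention criterion of at least 11 identical completions corresponds to 11/15 ≈ 0.733.

```python
from collections import Counter

def lexico_semantic_constraint(responses):
    """Return (modal completion, proportion of participants producing it).

    responses: one completion per cloze participant (assumed normalized).
    Ties are broken by first occurrence, as Counter.most_common does.
    """
    counts = Counter(r.strip() for r in responses)
    word, n = counts.most_common(1)[0]
    return word, n / len(responses)

def meets_criterion(responses, min_count=11):
    """Retention rule sketched from the text: at least 11 identical completions."""
    counts = Counter(r.strip() for r in responses)
    return counts.most_common(1)[0][1] >= min_count
```

The criterion is checked on counts rather than on the rounded 73% figure, which avoids floating-point edge cases at the threshold.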

We then generated full sentences by adding a SFW to each member of a pair. In one member of each pair, this SFW was highly predictable: it was the most common completion produced by the cloze participants (e.g. ‘baby’ following context S1, ‘In the crib there is a sleeping…'). In the other member of the pair, we selected a word that was semantically related to the highly predicted word but was not produced by any of the participants in the cloze norming, while keeping the whole sentence plausible (e.g. ‘child’ following context S2, ‘In the hospital, there is a newborn…'). Thus, for this sentence, the lexical cloze probability was zero (see Figure 1A for examples). All sentence contexts (e.g. S1 and S2) were combined with both lexically predicted (e.g. A: ‘baby’) and unpredicted (e.g. A’: ‘child’) SFWs (e.g. S1-A, S1-A’, S2-A, S2-A’). These sentences were then counterbalanced across two lists, ensuring that, in the MEG session, each participant saw both members of each sentence pair, but no participant saw the same SFW twice. Within each list, sentences were pseudo-randomized so that participants did not encounter more than three expected or unexpected SFWs in succession. The two members of each pair were presented apart from each other, with at least 30 (on average 88) sentences that predicted different words in between. All Mandarin sentences, together with their English translations, are available in Figure 1—source data 1.
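The counterbalancing scheme and the two ordering constraints can be made concrete with a short sketch. The `Trial` record and the function names are hypothetical. The checker only verifies a candidate order: no more than three expected or unexpected SFWs in a row, and at least 30 intervening sentences between the two members of a pair. It does not generate an order, because naively reshuffling 240 sentences until both constraints hold is unlikely to succeed in reasonable time.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Trial:
    pair_id: int    # index of the sentence pair (hypothetical field)
    context: str    # "S1" or "S2"
    expected: bool  # True: predicted SFW (A); False: unpredicted SFW (A')

def counterbalance(n_pairs):
    """Two lists: each contains both contexts of every pair, but A and A'
    are split so that no list repeats the same SFW."""
    list1, list2 = [], []
    for p in range(n_pairs):
        list1 += [Trial(p, "S1", True), Trial(p, "S2", False)]
        list2 += [Trial(p, "S1", False), Trial(p, "S2", True)]
    return list1, list2

def check_order(trials, max_run=3, min_gap=30):
    """True if the order satisfies both pseudo-randomization constraints."""
    run = 1
    for prev, cur in zip(trials, trials[1:]):
        run = run + 1 if cur.expected == prev.expected else 1
        if run > max_run:
            return False
    positions = {}
    for idx, t in enumerate(trials):
        positions.setdefault(t.pair_id, []).append(idx)
    # number of sentences strictly between the two members of each pair
    return all(b - a - 1 >= min_gap for a, b in positions.values())
```

In practice, an order satisfying both constraints would be built constructively or with a constrained search, and then verified with a check of this kind.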

We measured a number of properties of the sentence contexts up until the SFWs and determined whether these properties differed systematically between pairs of contexts that predicted the same SFWs (i.e. within-pairs) and pairs of contexts that predicted different SFWs (i.e. between-pairs). We counted the number of words in each sentence context (ranging from 4 to 12 words) and the number of clauses within each sentence (ranging from 1 to 4 clauses). We also marked whether each sentence contained an embedded dependency. Then, for each possible pair of sentences, we coded whether their contexts differed (marked as 1) or not (marked as 0) on each of these three measures. These values were used as the dependent variable in independent-samples t-tests (within-pairs: N = 120; between-pairs: N = 120 × 119 × 2 = 28560). The tests showed that any differences in the number of words, number of clauses, and syntactic complexity were matched between pairs that constrained for the same word (within-pairs) and pairs that constrained for a different word (between-pairs): all ps > 0.20.
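The pairwise matching tests can be sketched as follows. The sketch assumes 240 sentences identified by a hypothetical `pair_ids` sequence, in which the two members of a pair share an ID. Enumerating all 240 × 239 / 2 = 28680 sentence pairs gives 120 within-pairs and 28560 between-pairs. A pooled-variance (Student's) t-test therefore has 28678 degrees of freedom, matching the t(28678) reported for the SFW-1 cloze comparison. Categorical measures are coded as 1/0 mismatch indicators and continuous measures as absolute differences. With this many degrees of freedom, the sketch approximates the two-sided p-value with the normal distribution, which is an assumption rather than the study's stated procedure.

```python
import math
from itertools import combinations
import numpy as np

def student_t(a, b):
    """Pooled-variance independent-samples t-test: (t, df, approx. two-sided p)."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / df
    t = (a.mean() - b.mean()) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    p = math.erfc(abs(t) / math.sqrt(2))  # normal approximation for large df
    return t, df, p

def pairwise_matching_test(values, pair_ids, categorical=True):
    """Compare within- vs between-pair differences in one stimulus property."""
    within, between = [], []
    for i, j in combinations(range(len(values)), 2):
        if categorical:
            d = float(values[i] != values[j])           # 1 = differ, 0 = same
        else:
            d = abs(float(values[i]) - float(values[j]))  # e.g. stroke counts
        (within if pair_ids[i] == pair_ids[j] else between).append(d)
    result = student_t(np.array(within), np.array(between))
    return result, len(within), len(between)
```

For example, `pair_ids = [k // 2 for k in range(240)]` encodes 120 consecutive pairs. Passing per-sentence word counts with `categorical=True` reproduces the 1/0 mismatch coding described above.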

We also examined several lexical properties of the SFW-1 to make sure that any observed spatial or temporal similarity effect could not be explained by lexical processing of the SFW-1 itself. The Chinese SFW-1s, as well as their English translations, can be found in Figure 1—source data 1. We coded the syntactic class of the SFW-1 as either a content word (verb, noun, adjective, adverb) or a function word (pronoun, classifier, conjunction, particle, prepositional phrases). We then marked whether the syntactic class of the SFW-1 differed (marked as 1) or not (marked as 0) between the members of each possible pair of sentences. In Chinese, the SFW-1 could be either a word or a phrase, containing between 1 and 5 characters. We assessed the visual complexity of each SFW-1 by summing the number of strokes across all of its characters and, for each possible pair of sentences, calculated the absolute difference in the number of strokes of the SFW-1. We also extracted word frequency values for each SFW-1 (measured as the log10-transformed n-gram frequency per million) from Sun (2003) for 82% of the stimuli and from Da (2004) for 10% of the stimuli. The remaining 8% of SFW-1s, whose frequency did not appear in either database, were assigned a value of zero. For each possible pair of sentences, we then calculated the absolute difference in the word frequency of the SFW-1. These values were used as the dependent variable in independent-samples t-tests (within-pairs: N = 120; between-pairs: N = 120 × 119 × 2 = 28560). The tests showed that any differences in the syntactic class, the visual complexity and the frequency of the SFW-1 were matched between pairs that constrained for the same word (within-pairs) and pairs that constrained for a different word (between-pairs): all ps > 0.40.

Finally, we assessed the cloze probability of the SFW-1 in a new group of 30 participants (mean age: 24 years; range: 19–28 years old; 15 males). In this test, sentence contexts were presented up until the SFW-2 (e.g. ‘In the crib there is a …'), and the 120 pairs of sentences were counterbalanced across two lists, which were each seen by 15 participants. The cloze probability of the SFW-1 was calculated by tallying the number of participants who produced the SFW-1 for a given context. Overall, the cloze probability of the SFW-1 was low (11.33% ± 20.25% on average). For each possible pair of sentences, we again calculated the absolute difference in the cloze probability of the SFW-1 and carried out an independent-samples t-test. Once again, any differences in cloze probability were matched between pairs that constrained for the same word (within-pairs: 17.00% cloze difference) and pairs that constrained for a different word (between-pairs: 17.28% cloze difference), t(28678) = −0.136, p = 0.89.

Participants in the MEG study

The study was approved by the Institutional Review Board (IRB) of the Institute of Psychology, Chinese Academy of Sciences. Thirty-four students from the Beijing area were initially recruited by advertisement. They were all right-handed native Chinese speakers without histories of language or neurological impairments. All gave informed consent and were paid for their time. The data of eight participants were subsequently excluded because of technical problems, leaving a final MEG dataset of 26 participants (mean age 23 years, range 20–29; 13 males).

Procedure

MEG data were collected while participants sat in a comfortable chair within a dimly lit shielded room. Stimuli were presented in grey on a black background on a projection screen (visual angle ranging from 1.22 to 2.44 degrees). As shown in Figure 1A, each trial began with a blank screen (1600 ms), followed by each word with a stimulus onset asynchrony (SOA) of 1000 ms (200 ms presentation followed by an 800 ms inter-stimulus interval, ISI). The final word ended with a period and was followed by a 2000 ms inter-trial interval. After one-sixth of the trials, selected at random, participants read either a correct or an incorrect statement that referred back to the semantic content of the sentence they had just read (for example, S1-A and S2-A’ in Figure 1A might be followed by the incorrect statement, ‘There is an old man.'). Participants were instructed to judge whether the statement was correct by pressing one of two buttons with their left hand; this helped ensure that they read the sentences for comprehension. In all other trials, the Chinese word '继续' (meaning 'NEXT') appeared, and participants were instructed simply to press another button with their left hand within 5000 ms to proceed to the next trial.
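To make the trial timing concrete, the sketch below computes word onsets and offsets within one trial from the parameters given above (1600 ms blank, 200 ms word duration, 1000 ms SOA, 2000 ms inter-trial interval). The function name is hypothetical, and it assumes, purely for illustration, that the inter-trial interval starts at the offset of the final word, which the text does not state explicitly. Relative to SFW onset, SFW-1 begins at −1000 ms and SFW-2 at −2000 ms, which is why the analysis epochs start at −2000 ms.

```python
def trial_timeline(n_words, blank_ms=1600, word_ms=200, soa_ms=1000, iti_ms=2000):
    """Return (onset, offset) in ms for each word, plus the trial end time.

    Assumption: the inter-trial interval begins at the offset of the final word.
    """
    words = []
    for k in range(n_words):
        onset = blank_ms + k * soa_ms
        words.append((onset, onset + word_ms))
    trial_end = words[-1][1] + iti_ms
    return words, trial_end


words, trial_end = trial_timeline(n_words=6)
sfw_onset = words[-1][0]
# Onsets of SFW-2, SFW-1 and SFW relative to the SFW onset.
print([onset - sfw_onset for onset, _ in words[-3:]])  # [-2000, -1000, 0]
```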

The 240 sentences were divided into eight blocks, each lasting about 8 min. Between blocks there was a short break during which participants were told that they could relax and blink but should keep their heads still. Participants started the next block by informing the experimenter verbally. The whole experiment lasted about 1.5 hr, including preparation, instructions and a short practice session of eight sentences.

MEG data acquisition

MEG data were collected using a CTF Omega System with 275 axial gradiometers at the Institute of Biophysics, Chinese Academy of Sciences. Six sensors (MLF31, MRC41, MRF32, MRF56, MRT16, MRF24) were non-functional and were therefore excluded from the recordings. The ongoing MEG signals were low-pass filtered at 300 Hz and digitized at 1200 Hz. Head position relative to the sensor array was monitored continuously with three coils placed at anatomical landmarks (fiducials) on the head (forehead, left and right cheekbones). Averaged across participants, total head movement over the whole experiment was 8 mm. In addition, structural magnetic resonance images (MRIs) were obtained for 25 participants using a 3.0T Siemens system. During MRI scanning, markers were attached at the same positions as the head coils, allowing later alignment between the MRIs and the MEG coordinate system.

MEG data processing

MEG data were analyzed using FieldTrip, an open-source MATLAB toolbox (Oostenveld et al., 2011). To minimize environmental noise, we applied third-order synthetic gradiometer correction during preprocessing. The MEG data were then segmented into 4000 ms epochs, time-locked from the onset of the word two positions before the SFW (SFW-2) until 2000 ms after the onset of the SFW. Trials (i.e. whole epochs) contaminated with muscle or MEG jump artifacts were identified and removed using a semi-automatic routine. We then carried out an Independent Component Analysis (ICA; Bell and Sejnowski, 1997; Jung et al., 2000) and removed components associated with eye-movement and cardiac activity from the MEG signal (about five components per participant). Finally, we inspected the data visually and removed any remaining artifacts. On average, 96% ± 3.4% of trials were retained.
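The epoch definition amounts to simple sample arithmetic at the 1200 Hz sampling rate: 2000 ms before and after SFW onset correspond to 2400 samples each, so every epoch spans 4800 samples. The sketch below cuts such epochs from a continuous (channels × samples) NumPy array, given hypothetical SFW onset sample indices. In the study this step was done in FieldTrip; `cut_epochs` is an illustrative helper, not a FieldTrip function, and the recording here is random noise.

```python
import numpy as np


def cut_epochs(continuous, sfw_onsets, fs=1200.0, tmin_ms=-2000.0, tmax_ms=2000.0):
    """Cut fixed-length epochs around each SFW onset.

    continuous: array of shape (n_channels, n_samples).
    sfw_onsets: sample indices of SFW onsets (hypothetical input).
    Returns epochs of shape (n_trials, n_channels, n_epoch_samples) and the
    time axis in ms relative to SFW onset.
    """
    start = int(round(tmin_ms * fs / 1000.0))   # -2400 samples at 1200 Hz
    stop = int(round(tmax_ms * fs / 1000.0))    # +2400 samples at 1200 Hz
    epochs = np.stack([continuous[:, s + start:s + stop] for s in sfw_onsets])
    times_ms = np.arange(start, stop) * 1000.0 / fs
    return epochs, times_ms


rng = np.random.default_rng(0)
data = rng.normal(size=(10, 40_000))          # fake continuous recording, 10 channels
epochs, times_ms = cut_epochs(data, sfw_onsets=[10_000, 20_000, 30_000])
print(epochs.shape)                            # (3, 10, 4800)
```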

Spatial Representational Similarity Analysis

Calculation of spatial similarity time series

A schematic illustration of the spatial representational similarity analysis (RSA) approach is shown in Figure 1B. First, we detrended the MEG data and applied a 30 Hz low-pass filter. Next, in each participant, for each trial and at each time sample, we extracted a vector of MEG data representing the spatial pattern of activity across all 269 functioning MEG sensors (6 of the 275 sensors were not operational). We then quantified the spatial similarity of MEG activity produced by the two members of each sentence pair predicting the same SFW (e.g. between S1-A and S2-A’ in Figure 1A) by correlating their spatial vectors at each time sample across the 4000 ms epoch. This yielded a time series of correlations (Pearson’s r values) reflecting the degree of spatial similarity at each time sample between sentences that predicted the same SFW (e.g. time series $R_{\mathrm{within}}^{1}$ and $R_{\mathrm{within}}^{2}$; see Figure 1B). We refer to these as within-pair spatial similarity time series. After artifact rejection, each participant had, on average, N = 111 ± 8 complete within-pair spatial similarity time series. We then averaged these time series to yield an average within-pair spatial similarity time series for each participant, $\frac{1}{N}\sum_{i=1}^{N} R_{\mathrm{within}}^{i}$ (Figure 1B).
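A minimal NumPy sketch of the within-pair spatial similarity time series follows, assuming epochs stored as an array of shape (sentences × sensors × time samples) and a hypothetical list of index pairs that predict the same SFW. At every time sample, Pearson's r is computed across the sensor vectors of the two sentences; the per-pair time series are then averaged, which corresponds to $\frac{1}{N}\sum_{i=1}^{N} R_{\mathrm{within}}^{i}$. Function names and the toy data are illustrative only.

```python
import numpy as np


def spatial_similarity(a, b):
    """Pearson r across sensors at each time sample.

    a, b: arrays of shape (n_sensors, n_times) for the two sentences of a pair.
    Returns an array of shape (n_times,).
    """
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    num = (a * b).sum(axis=0)
    den = np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0))
    return num / den


def mean_within_pair_similarity(epochs, pairs):
    """Average the spatial similarity time series over all within-pairs.

    epochs: (n_sentences, n_sensors, n_times); pairs: iterable of (i, j) indices.
    """
    series = np.array([spatial_similarity(epochs[i], epochs[j]) for i, j in pairs])
    return series.mean(axis=0)


rng = np.random.default_rng(0)
epochs = rng.normal(size=(4, 269, 50))
epochs[1] = 2.0 * epochs[0] + 1.0        # sentence 1 is a linear copy of sentence 0
r = mean_within_pair_similarity(epochs, pairs=[(0, 1)])
print(np.allclose(r, 1.0))                # True
```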

We then repeated this entire procedure, but this time we correlated spatial patterns of MEG activity between pairs of sentences that predicted a different SFW, for example, between S1-A and S3-B (Figure 1A). This yielded 2N(N−1) between-pair spatial correlation time courses, for example $R_{\mathrm{between}}^{1}$ and $R_{\mathrm{between}}^{2}$ (Figure 1B). We again averaged these together to yield a time series of R-values within each participant, $\frac{1}{2N(N-1)}\sum_{i=1}^{2N(N-1)} R_{\mathrm{between}}^{i}$ (Figure 1B), which reflected the degree of similarity between spatial patterns of activity elicited by sentences that predicted different SFWs at each time sample (i.e. the between-pair spatial similarity time series). Figure 2B shows the averages, across all participants, of the within-pair and the between-pair spatial similarity time series (see Figure 2—source data 1).
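The between-pair average can be sketched in the same way. Assuming (hypothetically) that sentences 2k and 2k+1 form within-pair k, the pairs of sentences predicting different SFWs are all remaining sentence pairs, and there are exactly $\binom{2N}{2} - N = 2N(N-1)$ of them. The sketch accumulates a running sum so that thousands of between-pair time series never have to be held in memory at once; names and data are illustrative.

```python
from itertools import combinations

import numpy as np


def spatial_similarity(a, b):
    """Pearson r across sensors (axis 0) at each time sample (axis 1)."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    return (a * b).sum(axis=0) / np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0))


def mean_between_pair_similarity(epochs):
    """Average spatial similarity over all pairs of sentences predicting different SFWs.

    epochs: (2N, n_sensors, n_times), with sentences 2k and 2k+1 forming pair k.
    """
    n_pairs = epochs.shape[0] // 2
    total = np.zeros(epochs.shape[2])
    count = 0
    for i, j in combinations(range(2 * n_pairs), 2):
        if i // 2 == j // 2:          # same SFW: a within-pair, skip it
            continue
        total += spatial_similarity(epochs[i], epochs[j])
        count += 1
    assert count == 2 * n_pairs * (n_pairs - 1)
    return total / count, count


rng = np.random.default_rng(1)
epochs = rng.normal(size=(10, 32, 25))     # N = 5 pairs of fake sentences
mean_r, count = mean_between_pair_similarity(epochs)
print(count)                               # 40, i.e. 2 * 5 * 4
```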

Calculation of cross-temporal spatial similarity matrices

To characterize how temporally sustained the spatial patterns were (see also King and Dehaene, 2014; Stokes et al., 2015a), in each participant and for each sentence pair that constrained for the same SFW, we correlated the spatial pattern vector of one member of the pair (e.g. S1-A) at a particular time sample (e.g. t1) with that of the other member (e.g. S2-A’) at all time samples (e.g. from t1 to tn). This produced a cross-temporal similarity matrix for each within-pair, in which each entry represents the spatial similarity between the two sentences at two time samples (e.g. R(i,j) is the correlation between S1-A at time i and S2-A’ at time j). To increase computational efficiency, we down-sampled the data to 300 Hz, and we smoothed the resulting correlation values in time with a Gaussian kernel (40 ms time window, SD: 8 ms). We then averaged the cross-temporal similarity matrices across all within-pairs within each participant, and then across participants (Figure 2D: left). The R-values along the diagonal reflect the spatial similarity at corresponding time samples (R(i,j) with i = j, i.e. the time series of similarity R-values described in Figure 1B), while the R-values off the diagonal reflect cross-temporal spatial similarity. We then repeated this entire procedure for pairs of sentences that predicted a different SFW (between-pairs). To increase computational efficiency, we randomly selected N between-pairs to match the N within-pairs for averaging (Figure 2D: middle). Figure 2D (right) shows the difference between the group-averaged within-pair and between-pair cross-temporal spatial similarity matrices.
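The cross-temporal similarity matrix for one sentence pair can be obtained with a single matrix product after z-scoring each time sample across sensors. The sketch below assumes data already down-sampled to 300 Hz (for example with `scipy.signal.decimate` by a factor of 4, one possible choice) and smooths with `scipy.ndimage.gaussian_filter`. At 300 Hz an SD of 8 ms is 2.4 samples, and truncating the kernel at 2.5 SD gives a half-width of 6 samples, i.e. a window of roughly 40 ms. Smoothing along both time axes is an assumption, since the text only says that the values were smoothed in time.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def cross_temporal_similarity(a, b, fs=300.0, sd_ms=8.0, truncate=2.5):
    """R[i, j] = Pearson r across sensors between a at time i and b at time j.

    a, b: arrays of shape (n_sensors, n_times). Returns the raw and the
    Gaussian-smoothed (n_times, n_times) matrices (smoothing on both axes assumed).
    """
    za = (a - a.mean(axis=0)) / a.std(axis=0)
    zb = (b - b.mean(axis=0)) / b.std(axis=0)
    r = za.T @ zb / a.shape[0]          # mean product of population z-scores = Pearson r
    sigma = sd_ms * fs / 1000.0         # 8 ms at 300 Hz = 2.4 samples
    return r, gaussian_filter(r, sigma=sigma, truncate=truncate)


rng = np.random.default_rng(0)
a = rng.normal(size=(269, 60))
b = rng.normal(size=(269, 60))
raw, smoothed = cross_temporal_similarity(a, b)
print(np.isclose(raw[3, 7], np.corrcoef(a[:, 3], b[:, 7])[0, 1]))   # True
```

The diagonal of `raw` is the ordinary spatial similarity time series; averaging such matrices over pairs and participants gives the group-level maps.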

Statistical testing

As can be seen in Figure 2A, the averaged within-pair and between-pair spatial similarity time series both showed a sharp increase in R-values at around 100 ms after the onset of the word before the SFW (SFW-1), lasting for about 400 ms (i.e. until 300 ms into the ISI after SFW-1 offset) before sharply decreasing again. This pattern of a sharp increase and decrease in spatial correlations was also seen in association with the previous word (SFW-2) and the following word (SFW). To objectively quantify the time window over which this general increase in spatial similarity R-values was sustained during the prediction period, we compared the averaged within-pair and between-pair spatial similarity time series against a threshold of R = 0.04, chosen on the basis of visual inspection of the R-values in the prediction time window. R-values were above this threshold from −880 ms to −485 ms relative to SFW onset (i.e. 120 ms to 515 ms relative to SFW-1 onset), as well as from −1897 ms to −1507 ms relative to SFW onset (i.e. 103 ms to 493 ms relative to SFW-2 onset) (Figure 2B). Similar results were found with a threshold of R = 0.03.
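Finding the windows in which an averaged similarity time series stays above a fixed threshold (R = 0.04 in the study) reduces to locating the rising and falling edges of a boolean mask. A minimal sketch with a hypothetical helper name and a toy time series on a coarse 100 ms grid standing in for the real data:

```python
import numpy as np


def suprathreshold_windows(times_ms, r, threshold=0.04):
    """Return (first, last) time of every contiguous run where r > threshold."""
    above = np.concatenate(([False], np.asarray(r) > threshold, [False]))
    edges = np.flatnonzero(np.diff(above.astype(int)))
    starts = edges[::2]            # False -> True transitions
    stops = edges[1::2] - 1        # last index before each True -> False transition
    return [(float(times_ms[s]), float(times_ms[e])) for s, e in zip(starts, stops)]


times_ms = np.arange(-1000, 0, 100)                  # toy 100 ms grid
r = np.array([0.0, 0.05, 0.06, 0.05, 0.01, 0.0, 0.045, 0.0, 0.0, 0.0])
print(suprathreshold_windows(times_ms, r))  # [(-900.0, -700.0), (-400.0, -400.0)]
```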

We then averaged R-values across the −880 to −485 ms interval (relative to SFW onset) and carried out paired t-tests to determine whether, collapsed across this time window, the spatial patterns of MEG activity produced by sentence pairs that predicted the same SFW were significantly more similar than those produced by sentences that predicted different SFWs (i.e. within-pair vs. between-pair spatial correlation R-values). We repeated the same analysis for the −1897 to −1507 ms interval relative to SFW onset (i.e. −897 to −507 ms relative to SFW-1 onset).
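The window-averaged comparison is a standard paired t-test across participants. The sketch below uses `scipy.stats.ttest_rel`, assuming per-participant within-pair and between-pair similarity time series stored as (participants × time samples) arrays and a time axis in ms relative to SFW onset; the helper name and the synthetic data are for illustration only.

```python
import numpy as np
from scipy import stats


def window_paired_test(times_ms, within, between, t_start=-880.0, t_stop=-485.0):
    """Average each participant's R values over [t_start, t_stop] and run a paired t-test.

    within, between: arrays of shape (n_participants, n_times).
    """
    mask = (times_ms >= t_start) & (times_ms <= t_stop)
    within_mean = within[:, mask].mean(axis=1)
    between_mean = between[:, mask].mean(axis=1)
    return stats.ttest_rel(within_mean, between_mean)


rng = np.random.default_rng(0)
times_ms = -2000.0 + np.arange(4800) / 1.2          # 1200 Hz time axis
between = rng.normal(0.04, 0.01, size=(26, times_ms.size))
within = between + 0.005 + rng.normal(0.0, 0.002, size=between.shape)
result = window_paired_test(times_ms, within, between)
print(result.statistic > 0, result.pvalue < 0.05)
```

The earlier window is tested by calling the same function with `t_start=-1897.0, t_stop=-1507.0`.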

To test for cross-temporal differences in the spatial similarity patterns produced by sentence pairs that predicted the same versus different SFWs, while controlling for multiple comparisons over time, we applied a cluster-based permutation approach (Maris and Oostenveld, 2007). We first carried out paired t-tests at each time point of the cross-temporal spatial similarity matrices within the 1000 ms interval between the onset of SFW-1 and the onset of SFW. Adjacent data points with an uncorrected p-value of 0.05 or less formed temporal clusters. The individual t-statistics within each cluster were summed to yield a cluster-level test statistic, the cluster mass statistic. We then randomly re-assigned the spatial similarity R-values across the two conditions (i.e. within-pair and between-pair) at each data point within the matrix, within each participant, and calculated cluster-level statistics as described above. This was repeated 1000 times. For each randomization, we took the largest cluster mass statistic (i.e. the summed t-values), thereby creating a null distribution for the cluster mass statistic. We then compared the observed cluster-level test statistic against this null distribution. Any temporal clusters falling within the highest or lowest 2.5% of the distribution were considered significant.
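The cluster-based permutation test over the cross-temporal matrices can be sketched as follows. It assumes within- and between-pair matrices stacked as (participants × time × time) arrays and follows the usual Maris and Oostenveld (2007) scheme for a paired design, in which a random permutation swaps the two condition labels for a whole participant; this is equivalent to flipping the sign of that participant's difference matrix. Clusters are built from adjacent points (4-connectivity via `scipy.ndimage.label`, an assumption) with two-sided uncorrected p < 0.05, and positive and negative clusters are compared with the permutation distributions of the largest and smallest cluster mass, each at the 2.5% level. This is an illustrative implementation with hypothetical function names, not the FieldTrip code used in the study.

```python
import numpy as np
from scipy import ndimage, stats


def paired_t(diff):
    """Paired t statistic at each point; diff has shape (n_participants, ...)."""
    n = diff.shape[0]
    return diff.mean(axis=0) / (diff.std(axis=0, ddof=1) / np.sqrt(n))


def find_clusters(t_map, t_crit, sign):
    """Label contiguous points with sign * t > t_crit; return labels and cluster masses."""
    labels, n_clusters = ndimage.label(sign * t_map > t_crit)
    masses = np.array([t_map[labels == k].sum() for k in range(1, n_clusters + 1)])
    return labels, masses


def cluster_permutation_test(within, between, n_perm=1000, alpha=0.05, seed=0):
    diff = within - between
    n_sub = diff.shape[0]
    t_crit = stats.t.ppf(1 - alpha / 2, df=n_sub - 1)    # two-sided p < .05
    t_obs = paired_t(diff)
    pos_labels, pos_masses = find_clusters(t_obs, t_crit, +1)
    neg_labels, neg_masses = find_clusters(t_obs, t_crit, -1)

    rng = np.random.default_rng(seed)
    null_max, null_min = np.zeros(n_perm), np.zeros(n_perm)
    shape = (n_sub,) + (1,) * (diff.ndim - 1)
    for p in range(n_perm):
        # Swapping condition labels within a participant flips the sign of its difference.
        t_perm = paired_t(diff * rng.choice([-1.0, 1.0], size=n_sub).reshape(shape))
        _, pos = find_clusters(t_perm, t_crit, +1)
        _, neg = find_clusters(t_perm, t_crit, -1)
        null_max[p] = pos.max() if pos.size else 0.0
        null_min[p] = neg.min() if neg.size else 0.0

    pos_p = np.array([(null_max >= m).mean() for m in pos_masses])
    neg_p = np.array([(null_min <= m).mean() for m in neg_masses])
    return (pos_labels, pos_masses, pos_p), (neg_labels, neg_masses, neg_p)


rng = np.random.default_rng(0)
between = rng.normal(size=(20, 30, 30))
within = rng.normal(size=(20, 30, 30))
within[:, 5:15, 5:15] += 1.0                      # simulated within-pair advantage
(pos_labels, pos_masses, pos_p), _ = cluster_permutation_test(within, between, n_perm=200)
```

A positive cluster is significant when its p-value is below 0.025, matching the "highest 2.5%" criterion.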

Temporal Representational Similarity Analysis

Construction of temporal similarity maps at sensor level

In each participant, at each sensor and for each trial, we considered the MEG time series in the −880 to −485 ms interval relative to SFW onset, that is, the time window over which we observed the general increase in spatial similarity R-values during the prediction period (see Figure 2A). At each sensor, we then correlated the time series within this window between the two members of each sentence pair that predicted the same SFW (e.g. between S1-A and S2-A’ in Figure 1A) to yield an R-value representing the degree of temporal similarity: an R-value of 1 implies that the two time series are in perfect synchrony, an R-value of 0 implies that they are uncorrelated, and an R-value of −1 implies that they are anti-correlated. Together, these R-values across sensors yielded a within-pair temporal similarity topographic map for each pair (for example, topographic maps $R_{\mathrm{within}}^{1}$ and $R_{\mathrm{within}}^{2}$; see Figure 1C). We then averaged all the within-pair temporal correlations at each sensor to yield an average within-pair temporal similarity topographic map for each participant, and then averaged across participants (see Figure 1C).

We then repeated this procedure, but this time correlating time series from MEG sensors produced by pairs of sentences that predicted a different SFW, yielding topographic maps of the between-pair temporal correlations (e.g. $R_{\mathrm{between}}^{1}$ and $R_{\mathrm{between}}^{2}$ in Figure 1C). These maps were again averaged together to yield an average topographic map of R-values within each participant, and then averaged across participants (Figure 1C).
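The sensor-level temporal RSA differs from the spatial RSA only in the axis over which the correlation is taken: here Pearson's r is computed over the time samples of the −880 to −485 ms window, separately at each sensor. A sketch, assuming (sentences × sensors × time) epochs, a time axis in ms, and hypothetical lists of within- and between-pair index tuples; the same function produces both maps.

```python
import numpy as np


def temporal_similarity(a, b):
    """Pearson r over time at each sensor; a, b: (n_sensors, n_window_samples)."""
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    return (a * b).sum(axis=1) / np.sqrt((a ** 2).sum(axis=1) * (b ** 2).sum(axis=1))


def temporal_similarity_map(epochs, times_ms, pairs, t_start=-880.0, t_stop=-485.0):
    """Average per-sensor temporal similarity over the given sentence pairs."""
    mask = (times_ms >= t_start) & (times_ms <= t_stop)
    maps = [temporal_similarity(epochs[i][:, mask], epochs[j][:, mask]) for i, j in pairs]
    return np.mean(maps, axis=0)          # one R value per sensor (a topographic map)


rng = np.random.default_rng(0)
times_ms = -2000.0 + np.arange(4800) / 1.2
epochs = rng.normal(size=(4, 16, times_ms.size))
epochs[1] = -epochs[0]                     # perfectly anti-correlated at every sensor
within_map = temporal_similarity_map(epochs, times_ms, pairs=[(0, 1), (2, 3)])
between_map = temporal_similarity_map(epochs, times_ms, pairs=[(0, 2), (0, 3), (1, 2), (1, 3)])
```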

Construction of temporal similarity maps at source level

We also constructed temporal similarity maps at the source level. We estimated the MEG signals at the source level by applying a spatial filter at each grid point using a beamforming approach (Van Veen et al., 1997). We computed a linearly constrained minimum variance (LCMV; Van Veen et al., 1997) spatial filter on the 30 Hz low-pass filtered (and linearly detrended) data from onset of SFW-1 to 1000 ms after onset of SFW (i.e. −1000–1000 ms relative to SFW onset). The LCMV approach estimates a spatial filter from a lead field matrix and the covariance matrix of the data from the axial gradiometers. To obtain the lead field for each participant, we first spatially co-registered the individual anatomical MRIs to the sensor MEG data by identifying the fiducials at the forehead and the two cheekbones. Then a realistically shaped single-shell head model was constructed based on the segmented anatomical MRI for each participant (Nolte, 2003). Each brain volume was divided into a grid with 10 mm spacing and the lead field was calculated for each grid point. Then the grid was warped on to the template Montreal Neurological Institute (MNI) brain (Montreal, Quebec, Canada). The MNI template brain was used for one participant whose MRI image was not available. The application of the LCMV spatial filter to the sensor-level data resulted in single-trial estimates of time series at each grid point in three orthogonal orientations. To obtain one signal per grid point we projected the time series along the direction that explains most variance using singular value decomposition. In order to construct temporal similarity maps in source space, we followed the same procedures as above, by correlating the time series at each grid point. The grand-average similarity values were interpolated onto the MNI template brain (Figure 3B; see Figure 3—source data 1).
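The core of the source reconstruction can be written in a few lines of linear algebra. For a grid point with lead field $L$ (sensors × 3 orientations) and data covariance $C$, the LCMV filter is $W = (L^\top C^{-1} L)^{-1} L^\top C^{-1}$, which satisfies the unit-gain constraint $WL = I$. The sketch then projects the three-orientation source time series onto the first singular vector of the (mean-centred) source signal, i.e. the direction explaining most variance. The diagonal loading parameter `reg` is an assumption added for numerical stability, since the text reports no regularization; head modelling, co-registration and grid warping are outside its scope, and it is not the FieldTrip implementation. The simulated lead field and signal are made up.

```python
import numpy as np


def lcmv_filter(leadfield, cov, reg=0.05):
    """LCMV spatial filter W = (L' C^-1 L)^-1 L' C^-1 for one grid point.

    leadfield: (n_sensors, 3); cov: (n_sensors, n_sensors).
    reg: diagonal loading as a fraction of the mean sensor variance (assumed).
    """
    n = cov.shape[0]
    cov_reg = cov + reg * np.trace(cov) / n * np.eye(n)
    cov_inv = np.linalg.inv(cov_reg)
    gram = leadfield.T @ cov_inv @ leadfield               # (3, 3)
    return np.linalg.solve(gram, leadfield.T @ cov_inv)    # (3, n_sensors)


def dominant_orientation_timeseries(filt, data):
    """Project the 3-orientation source signal onto its direction of maximal variance."""
    source = filt @ data                                   # (3, n_times)
    centered = source - source.mean(axis=1, keepdims=True)
    u, _, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] @ source                                # (n_times,)


rng = np.random.default_rng(1)
leadfield = rng.normal(size=(20, 3))
orientation = np.array([0.6, 0.8, 0.0])
signal = np.sin(2 * np.pi * 10 * np.linspace(0, 1, 600))
data = np.outer(leadfield @ orientation, signal) + 0.01 * rng.normal(size=(20, 600))
W = lcmv_filter(leadfield, np.cov(data))
projected = dominant_orientation_timeseries(W, data)
```

The sign of the projected time series is arbitrary (singular vectors are defined up to sign), which does not affect the correlation-based similarity maps computed afterwards.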

Testing for significant differences between the within- and between-pair temporal similarity maps

To compare the within-pair and between-pair temporal correlation R-values statistically, at both the sensor and the source level, we used a cluster-based permutation approach that controls for multiple comparisons over sensors or grid points (Maris and Oostenveld, 2007). At each sensor/grid point, in each participant, we computed the mean difference in temporal similarity R-values between sentence pairs predicting the same word (i.e. within-pair) and those predicting a different word (i.e. between-pair). Sensors within 40 mm of each other whose mean difference exceeded the 95th percentile of the mean differences were grouped into clusters. We used the mean difference for thresholding the clusters in order to account for the overall R-value difference across participants. In source space, clusters were formed by contiguous grid points. Within each cluster, we then summed the mean differences in R-values across sensors/grid points to yield a cluster-level test statistic, the cluster mass statistic. Next, we randomly re-assigned the R-values across the two conditions (i.e. within-pair and between-pair) at each sensor/grid point within each participant, and calculated cluster-level statistics as described above. This was repeated 1000 times. For each randomization, we took the largest cluster mass statistic (i.e. the summed mean difference within a cluster), thereby creating a null distribution for the cluster mass statistic. We then compared the observed cluster-level test statistic against this null distribution. Any clusters falling within the highest or lowest 2.5% of the distribution were considered significant.
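The sensor-level cluster test differs from the time-by-time test in how clusters are formed: the threshold is the 95th percentile of the mean-difference map rather than a t-value, neighbours are sensors within 40 mm of each other, and the cluster statistic sums mean differences. A sketch under those rules, using `scipy.sparse.csgraph.connected_components` on the neighbourhood graph. Several details are assumptions of this sketch rather than statements from the text: the mean difference is taken across participants, sensor positions are given in mm, the percentile threshold is recomputed in every permutation, and only the upper tail is tested (the percentile threshold selects the largest positive differences). Function names and data are hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components
from scipy.spatial.distance import cdist


def percentile_clusters(mean_diff, adjacency, pct=95.0):
    """Clusters of neighbouring sensors whose mean difference exceeds the pct-th percentile."""
    idx = np.flatnonzero(mean_diff > np.percentile(mean_diff, pct))
    if idx.size == 0:
        return [], np.array([])
    graph = csr_matrix(adjacency[np.ix_(idx, idx)].astype(float))
    n_comp, labels = connected_components(graph, directed=False)
    clusters = [idx[labels == c] for c in range(n_comp)]
    return clusters, np.array([mean_diff[c].sum() for c in clusters])


def sensor_cluster_test(within, between, positions_mm, radius_mm=40.0,
                        n_perm=1000, seed=0):
    """within, between: (n_participants, n_sensors) temporal-similarity R values."""
    adjacency = cdist(positions_mm, positions_mm) <= radius_mm
    diff = within - between
    clusters, masses = percentile_clusters(diff.mean(axis=0), adjacency)

    rng = np.random.default_rng(seed)
    null = np.zeros(n_perm)
    for p in range(n_perm):
        # Swap condition labels within randomly chosen participants (sign flip).
        flips = rng.choice([-1.0, 1.0], size=diff.shape[0])[:, None]
        _, perm_masses = percentile_clusters((diff * flips).mean(axis=0), adjacency)
        null[p] = perm_masses.max() if perm_masses.size else 0.0

    p_upper = np.array([(null >= m).mean() for m in masses])
    return clusters, masses, p_upper


rng = np.random.default_rng(0)
positions = np.column_stack([np.arange(40) * 10.0, np.zeros(40), np.zeros(40)])
between = rng.normal(0.0, 0.1, size=(20, 40))
within = between + rng.normal(0.0, 0.1, size=(20, 40))
within[:, :4] += 0.5                                   # simulated effect at sensors 0-3
clusters, masses, p_upper = sensor_cluster_test(within, between, positions, n_perm=200)
```

For source space, the same logic applies with grid-point adjacency (contiguous grid points) in place of the 40 mm sensor neighbourhood.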

References

Da J. 2004. A corpus-based study of character and bigram frequencies in Chinese e-texts and its implications for Chinese language instruction. In: The Studies on the Theory and Methodology of the Digitalized Chinese Teaching to Foreigners: Proceedings of the Fourth International Conference on New Technologies in Teaching and Learning Chinese. pp. 501–511.
Lisman J, Redish AD. 2009. Prediction, sequences and the hippocampus. Philosophical Transactions of the Royal Society B: Biological Sciences 364:1193–1201. https://doi.org/10.1098/rstb.2008.0316
Maess B, Mamashli F, Obleser J, Helle L, Friederici AD. 2016. Prediction Signatures in the Brain: Semantic Pre-Activation during Language Comprehension. Frontiers in Human Neuroscience 10. https://doi.org/10.3389/fnhum.2016.00591, PMID: 27895573
Su L, Fonteneau E, Marslen-Wilson W, Kriegeskorte N. 2012. Spatiotemporal searchlight representational similarity analysis in EMEG source space. In: 2012 International Workshop on Pattern Recognition in Neuroimaging (PRNI).
Su L, Zulfiqar I, Jamshed F, Fonteneau E, Marslen-Wilson W. 2014. Mapping tonotopic organization in human temporal cortex: representational similarity analysis in EMEG source space. Frontiers in Neuroscience 8. https://doi.org/10.3389/fnins.2014.00368, PMID: 25429257
Sun M. 2003. Chinese lexicon. 973 Project, ID G1998030501A-03.

Decision letter

  1. Matthew H Davis
    Reviewing Editor; University of Cambridge, United Kingdom
  2. Joshua I Gold
    Senior Editor; University of Pennsylvania, United States

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "Specific lexico-semantic predictions are associated with unique spatial and temporal patterns of neural activity" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Joshua Gold as the Senior Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

The manuscript describes an MEG study which shows greater spatial and temporal similarity of neural responses prior to the onset of an identical target word (e.g. "baby") in two sentences that share the same predicted final word ("In the crib, there is a sleeping…" vs "In the hospital, there is a newborn…") compared to equivalent responses in entirely unrelated sentences.

This greater similarity is seen over an extended time period during and after presentation of the penultimate word of the sentence ("sleeping" and "newborn"), though neural similarity is no longer apparent for around 400 ms prior to the onset of the sentence final word, which was either as predicted ("baby") or a synonym ("child"). Source localisation of sensors showing maximum similarity localises this effect to an extended set of left inferior and medial temporal regions, "consistent with the well-established role of these regions in lexico-semantic processing". Results are interpreted as providing evidence in favour of "pre-activation of distinct words during language processing" and hence predictive computations during sentence comprehension.

I think that this is an informative and potentially important paper. It introduces new methods and findings into an area of ongoing theoretical interest concerning the role of predictive processes in sentence comprehension. As the authors are aware, however, such strong conclusions concerning predictive processes remain controversial. The reviewers of the paper were all in agreement that additional analyses were required to rule out some alternative factors that might also explain the observed findings.

Essential revisions:

1) The most critical point which I'd ask you to address is to rule out some alternative explanations of your observations. The manuscript already includes some additional analyses in which comparisons are made between "within-pair" vs "within-category/between-pair" sentences. However, the reviewers were concerned that this is only one of many alternative factors that could explain the observations.

The reviewers suggested, and I agreed, that there are a number of additional factors that could lead to similar activation on the penultimate word of the sentence (SFW-1), and these must be ruled out if you are to conclude that your effect is driven by prediction of the sentence final word (SFW). The factors suggested by reviewers include:

– form-based properties of the penultimate word (word length, orthographic or phonological similarity, etc.)

– lexical/semantic properties of the penultimate word (syntactic class, word frequency, imageability/concreteness, etc.)

– incidental properties of the sentence up to and including the penultimate word (e.g. number of words up to that point, syntactic complexity such as number of clauses, embedded dependencies, etc.)

To address this point you can do two things: (1) assess whether these factors are more similar for your within-sentence pairs than for your between-sentence pairs, (2) assess whether increased neural similarity would still be observed in your MEG data when these other factors are matched.

As you might be thinking, though, performing these second analyses will be difficult using the methods in the present manuscript. You face a severe loss of power/sensitivity as you progressively divide the materials into smaller and smaller subsets. (Incidentally, I felt that you don't need to match for the number of items compared, though I'm sure there are others who would be reassured by these analyses).

I'd therefore like to suggest an alternative method for running these analyses and one which doesn't require you to run analyses on subsets of materials. The method was (AFAIK) introduced in a paper by Carlin et al. (2011, Current Biology):

https://doi.org/10.1016/j.cub.2011.09.025

This is a method in which you partial out extraneous or unmatched factors while performing RSA analysis. In Carlin's case for face perception, this was achieved using partial Spearman correlations to rule out physical features when comparing gaze direction. However, no doubt other statistical methods (e.g. multiple linear regression) can also be applied. This comes under the rubric of "representational geometry analysis".

I think the more that you can do to ensure that other aspects of your sentence pairs, and the penultimate word of these sentence pairs, do not explain your observations, the more satisfied readers of the paper will be that the only plausible explanation of your findings is that there is a neural signature of the predicted final word.

2) Even if these analyses confirm that other factors can't explain your findings, you still need to explain what predictive pre-activation means given that this neural effect is absent for the ~400 ms immediately prior to the onset of the sentence final word. To my mind, this does not negate your conclusions regarding lexico-semantic prediction, but it does mean that a more nuanced mechanism must be involved. In particular, the idea that this reflects "activity silent working memory" or pre-activation of specific lexical-semantic properties of target words seemed like a stretch to one of the reviewers, and I would agree. It might be, though, that by considering similarities between orthographic or semantic properties of the SFW in different sentences the authors could provide additional evidence in this regard. In the absence of this, though, I think that the authors should explain that their findings are consistent with pre-activation while acknowledging the degree to which other interpretations (e.g. integration) might be possible.

3) One further point that two of the reviewers were confused by – and I think must be clarified – concerns the order of presentation and blocking of sentence presentation. Is it the case that the sentence pairs were presented on successive trials? I would hope not, since this is a serious confound for RSA analyses in fMRI and could also be problematic for MEG. I think it's the case that trials for the within-sentence pairs are no closer together in time than a randomly selected between-sentence pair, but I couldn't see this unambiguously stated in the manuscript. I'd like for you to confirm this and (if possible) report further analyses in which temporal distance or temporal order of trial pairs is excluded as a nuisance factor in neural similarity analyses.

These three points should be the main focus of a revision to the manuscript. In addition to these main points, I've also appended the three reviews that I've received. These include many other minor points, suggested changes and requests for clarification which you would do well to heed. There's always scope to clarify methodological aspects of a complex study like this.

However, one methodological concern (from reviewer 3), which I'll not insist on you addressing, concerns the separation of spatial and temporal RSA methods. I agree that spatio-temporal analyses in single-subject source space could detect neural effects that are missed by their sensor space and time-based analyses. However, to my mind this methodological concern could explain the absence of some effects in the existing analyses, but not the presence of reliable effects. Since there's a lot of work involved, and substantial correction for multiple comparisons would reduce sensitivity, I'm going to give you the option of not performing these analyses and instead discussing potential limitations and future directions that could be taken in similar future work.

Reviewer #1:

This manuscript by Wang et al. reports an MEG study in the field of neurocognition of language, investigating whether or not lexico-semantic properties of specific words are 'pre-activated' in a constraining sentence context. To this end, the authors used an analysis approach based on representational similarity analysis. Participants read sentences, presented one word at a time (stimulus presentation time 200 ms; SOA 1000 ms). Sentences were presented in pairs, and constructed such that both sentences of a pair constrained for the same sentence final word (e.g., baby). Analyses focused on brain signals from −2000 to +1000 ms relative to the onset of the sentence-final word. A spatial similarity time course was calculated for each pair by correlating, at each time point, the vectors of activation values across channels for the two sentences of the pair ('within pair' condition). The same was repeated for each sentence relative to each other sentence, i.e., relative to all sentences not in the pair, yielding similarity time courses for pairs of sentences predicting different SFWs ('between pair' condition). The authors' expectation was that these similarity indices should be higher in the within condition, i.e., between the two sentences constraining for the same word, as compared to the 'between' condition. There are two results in this paper: The authors first observed an increased similarity independent of condition, roughly between 100 and 500 ms post onset of each word (i.e., for the sentence final word/SFW itself as well as the two preceding words falling into the analysis time window, i.e., SFW-1 and SFW-2). Investigating these time windows further, the authors observed an increased similarity for within as compared to between pairs on the word preceding the critical word, i.e., SFW-1 (120 to 515 ms post onset of SFW-1, or −880 to −485 ms relative to SFW; Figure 2B).
By generalizing this analysis over time, i.e., by correlating each time point of the epoch with activity vectors from every other time point of the epoch, the authors show that this increased similarity was temporally restricted and did not persist into the 800 ms inter-stimulus interval preceding the sentence final word (Figure 2D). As control analyses, the authors demonstrate that the same results are obtained (albeit somewhat weaker) when matching the number of different pairs to the number of similar pairs. Also, the result could not be accounted for by noun vs. verb (i.e., word category level) similarity. Finally, the authors implemented a similar approach for temporal similarity, i.e., the similarity over time of activation time courses, and localized this effect to lateral and medial left temporal regions, extending into cerebellum. The authors conclude that they demonstrate the activation of unique spatial and temporal patterns of brain activity associated with the pre-activation of specific lexico-semantic items (i.e., words) during language processing, and that this pre-activation was transient and dynamic.

General Evaluation

This study approaches an important and also timely research question, as it is currently strongly debated whether and how predictive processes are involved in language processing. The application of RSA to this question is also innovative and, as far as I can tell, technically implemented in an excellent way. However, this manuscript leaves several questions open which I will detail below, and certain aspects of the study design, in my view, call into question the validity of interpreting the reported results as a predictive pre-activation of words.

Major Points

Most importantly, I think that the authors provide no evidence to actually support their claim that the observed increased similarity among within-pair sentences, at the pre-final word, actually reflects the pre-activation of the sentence final word. I think the most plausible account is that context-dependent constraint is already high on the pre-final word, so that alternative interpretations like ease of integration of the pre-final word are at least as likely as the pre-activation account that the authors try to propose here. At the very least, these two alternative accounts have to be discussed equally; if the authors give preference to the pre-activation account, this should be grounded in reliable additional empirical evidence. (In the Discussion section, the authors argue that 'greater similarity between sentence contexts' is an implausible account for their result, given that the two sentences of each pair were composed of different words and, in particular, the SFW-1 prefinal word always differed. However, the sentences were constructed such that the constraint was high, so this does not rule out the possibility of expectation / ease of integration effects on the prefinal word, in my opinion.)

Related to this, it is inconsistent with their interpretation that the similarity effect disappears with the offset of the word-induced brain response to the pre-final word (SFW-1), i.e., around 500 ms prior to the onset of the sentence final target word. A true prediction / pre-activation should persist. The a posteriori interpretation based on 'activity silent working memory' is a vast over-interpretation of the results and is not based on any data. Likewise, the proposal that the pre-activation may involve a sequence of activations of different lexico-semantic properties of the target word is speculative, goes beyond what the results can support, and is not grounded in any data.

Also related to this, I think that the control analysis testing for noun/verb differences is not sufficient to warrant the general claim that 'higher order grammatical and semantic' effects (or differences) cannot influence the present result. The noun/verb category difference is just one of many such features. I would find it much more convincing if the authors could show, in their stimulus materials, that no such differences exist at the target position in the critical as compared to the control contrasts, or at the positions preceding the sentence final word. Also, I think behavioral testing could easily quantify the degree of expectancy/constraint on the pre-final words. This kind of additional data would allow for a more empirically grounded interpretation of the similarity effect on the pre-final word.

Also related to this, I wonder whether there should not also be similarity effects on the target words themselves. Even though different words were presented at the target position (e.g., baby and child in a sentence pair both constraining for baby), the similarity between those is still substantially higher than, e.g., between baby and fridge (example from Figure 1).

Even more so, should not the pre-activation lead to increased similarity between the pre-word period and the brain activation elicited during the actual presentation of the word itself?

It is unclear to me why sentences were presented in pairs. The pairwise presentation of 120 sentence pairs, each constraining for the same sentence-final word, can without doubt induce strategic expectation effects towards the end of the sentence, which have nothing to do with the kind of highly automatized predictive processes postulated in the predictive coding framework. I think that this problem is not solved by the fact that only one of the two sentences contained the constrained-for target word at the SFW position. Actually, the fact that one sentence in each pair contained an unexpected word could even increase the strategic handling of these sentence pairs.

It is also unclear to me why the very slow and unnaturalistic presentation rate was chosen. Again, I think that this can induce strategic processes, as well as increased working memory load, which may influence the RSA results. (I also tend to think that this design was chosen because the ITI preceding the SFW is the most obvious time window in which to search for predictive pre-activation; see my first point above. Given that no results were reported for this 'silent' pre-word time window, I tend to be very critical about interpreting the results as predictive pre-activation of the sentence final word.)

Combined, these points suggest to me that the authors interpret their results too strongly. Predictive pre-activation is claimed in the title, Abstract, and Discussion. I think the authors should generally tone down these claims and provide a more realistic and balanced account for their interesting result.

In their control analysis, the authors demonstrate that the within-pair similarity is also higher than the between-pair similarity calculated among the remaining sentences whose target words belonged to the same word category (nouns or verbs). They use this to claim that their result cannot spuriously result from higher-order syntactic or semantic effects. I think this control analysis is nice, but its interpretation goes way too far, as the authors only tested one of many possible such linguistic features. Also, it is unclear why this post-hoc analysis is necessary at all if (as I expect) the authors controlled stringently for such obvious differences in their item construction. Even if the latter were not the case, should it not be possible to select, a posteriori, the sentences for the between-pair analysis, out of all possible combinations, such that they are optimally matched to the 120 critical pairs?

Concerning the source analysis of the temporal similarity analysis: I am not expert enough in MEG beamforming to really judge this, but is the source localization shown in Figure 3B really a plausible generator of the scalp distribution of the difference effect shown in the left-most panel of Figure 3A?

A lot of information about the stimulus construction and item materials is missing. The authors describe how sufficiently high cloze probability was ensured in the sentences of the 120 pairs. However, many further aspects are important, such as word category, word frequency, concreteness, etc. In particular, I think it is important (a) to ensure that such obvious lexical and semantic properties are balanced between the within-pair and the between-pair comparisons, as these form the final statistical contrast on which all interpretations are based; (b) to provide similar information for the two words preceding the sentence-final word; and (c) to provide data on the cloze probability of the pre-final words, particularly given that this is where the effect is found (see also my first point above).

Parts of the Discussion section and interpretation of the data are far too speculative, including the discussion of specific semantic properties that might be activated. For example, the authors write that "These findings provide strong evidence that unique spatial patterns of activity, corresponding to the pre-activation of specific lexical items, can be detected in the brain." I think this is not warranted given the presented data. I pick out a few further examples below:

The authors make several claims as to the specific nature of lexico-semantic pre-activation, which are also not supported by the reported study: "… the particular spatial pattern of brain activity associated with the pre-activation of the word baby may have reflected the pre-activation of spatially distributed representations of semantic features such as little, cute, and chubby, while … the pre-activation of the word roses may have reflected the pre-activation of semantic properties such as red and beautiful." This, in my view, is overly speculative and at the same time suggests to the superficial reader a level of detail that is by no means reached in this study.

Another example involves the claim that "this may be because different properties associated with particular words became available at different time. For example, the different semantic features (little, cute, chubby) associated with … baby might have been recruited at different time points.", offered as a possible account of why there were only effects along the diagonal. However, again, this is not grounded in any empirical data, and it tends to overlook the fact that, even along the diagonal, the effect did not persist into the final 500 ms before word onset.

With respect to the neural mechanism, the authors state that "the absence of an effect off the diagonal suggests that the spatial patterns associated with pre-activation evolved dynamically over time". However, there is no evidence to support this claim. Particularly when considering that there is also no persistent effect along the diagonal, the absence of an off-diagonal effect most likely indicates that there was no sustained pre-activation over time.

Reviewer #2:

This manuscript presents research aimed at investigating the hypothesis that specific words are pre-activated in the brain given a constraining semantic context. The authors test this hypothesis by presenting highly constraining sentences such that the final word in each sentence can be easily predicted. Moreover, they do so such that pairs of sentences are likely to be predicted to finish with the same word. They then examine the similarity of spatial and temporal patterns in MEG preceding the presentation of the final words. In particular they compare the similarity in these patterns between pairs of sentences with the same final predicted word and pairs of sentences with different final words. They find that both the spatial patterns and temporal patterns are MORE similar for sentences where the same final word is predicted than for sentences where different final words are predicted. They take this as evidence that specific lexico-semantic predictions are made by the brain during language comprehension.

This was a very well designed piece of research with interesting and compelling results. The manuscript was well written and the discussion seemed reasonable.

I have a few relatively minor comments and queries:

1) The nice study design included ensuring that the paired sentences didn't actually finish in the same word and that sometimes the sentence with an unexpected word would appear first and sometimes the sentence with the expected word would appear first. The authors argue that this means the results are not simply explainable on the basis that subjects might retain the expected final word in memory when reading the second sentence of a pair. However, it seems to me that, even though the unpredicted word has a much lower cloze probability, the subject might still retain that unexpected word in memory when hearing the second sentence of a pair. It doesn't seem that likely to me, but it's conceivable. I mean that when a subject hears the unexpected word 'child', they might be more likely to retrieve that word when they are next presented with a sentence for which 'baby' is the "correct" prediction, but for which 'child' is a reasonable final word. So, much as I like the design, I do think that retrieval of a previously stored word remains possible. One thing that I was unclear on (and sorry if I just missed it) was the actual order of sentence presentation. Did the two members of a pair of sentences always appear consecutively? If so, this would make the idea of retrieval even more likely. If the 120 sentences are all just presented in a random order, then I guess it is unlikely. Again, sorry if I missed that.

2) A minor query – were there different numbers of words in the sentences? Or always the same? And, relatedly, did the subject always know when the final word was going to appear? It's just that a pet worry of mine is the generalizability of language research done on isolated sentences that are very regular in their makeup. I imagine subjects get into an unusual mindset, with linguistic processes overlapping with more general decision-making strategies that may confound things. I don't think that's an issue here for two reasons: 1) it wouldn't explain why the data are more similar within sentence pairs than between them, and 2) subjects didn't have to make deliberative decisions at the end of each sentence. But still, it would be nice to get a sense of the variability (or lack of it) in the structure of the sentences.

3) Very minor – in subsection “Design and development of stimuli” there seem to be 109 pairs of sentences above 70% cloze and 12 that were lower. That makes 121, not 120.

4) In subsection “MEG data processing” the authors say "Within this 4000ms epoch, trials contaminated…were…removed…" How is there a trial within this epoch? Is the trial not the entire epoch? Or am I misunderstanding what you mean by a trial?

Reviewer #3:

This study investigates the process of predictive access to upcoming strongly constrained words during reading. It does so by combining spatial and temporal resolution of the MEG with the representational similarity analysis (RSA) and asking whether the sentences that strongly constrain to semantically synonymous words show greater similarity of activity patterns (within-pairs) than those constraining to semantically unrelated words (between-pairs). Results point to the LH inferior and medial temporal areas showing greater activation similarity for within-pairs before the perceptual onset of the critical word and are therefore candidate locations for the word-related pre-activations. This approach constitutes a novel and exciting contribution to the study of predictive processing/coding in the language domain.

My two main concerns are as follows. First, the specific method used for the RSA analysis, which separated the spatial and temporal dimensions of the data and used one dimension (spatial) to narrow down the testing time window of the other (temporal), also narrowed down the scope of the effects uncovered. Second, the sentences within and between pairs were insufficiently matched in terms of the syntactic and lexico-semantic characteristics of the words directly preceding the critical predicted word, and the observed effects could have been related to those differences rather than to pre-activation. These two issues would need to be addressed before the strength and interpretation of the current set of effects could be fully evaluated.

Major points:

1) My main concern in terms of analysis methods is that the separation of the temporal and spatial components of the RSA analysis unnecessarily limited the kinds of effects that were uncovered. For the calculation of both the spatial similarity time series and the cross-temporal spatial similarity matrices, all MEG sensors were included; hence the effects would be greatest at time points where many sensors simultaneously show similar activity for pairs of sentences. This means that effects that are strong and extended in time but spatially localised (or insufficiently distributed) might be missed. This is especially so because, to determine the significant time windows, vectors were averaged across subjects, which means that localised effects had even less chance of surviving, given that the same effects in different subjects could appear in different sensors due to differences in head shape, etc. A further concern for the temporal RSA is that the time window in which the effects were tested (-880 to -485 ms, SFW-aligned) was derived on the basis of the spatial analysis, in which spatially continuous differences between within- and between-pair similarity values were found after applying an arbitrary cut-off (r > 0.04).

To avoid these issues, a spatiotemporal RSA could be carried out directly in source space, or first in sensor space (across sensors and time points), after which the significant spatiotemporal clusters could be source localised. For example, if beamforming is used to derive single-trial source estimates, then data RDMs can be derived using a modified version of the searchlight approach (e.g., Nili et al., 2014; Su et al., 2012 and 2014). For every trial, at every grid point and every time step (every 1/5/10 ms), a 3D data matrix is extracted consisting of activation from n neighbouring grid points and n time samples. Then, for each pair of sentences predicting the same word, these data matrices are correlated. The resulting correlations are averaged across all within-pairs, producing grid point by time point spatiotemporal correlation values for the within-pair condition. The same can be repeated for the between-pairs. A paired t-test can then compare within- and between-pair values across time and grid space, and significant spatiotemporal clusters of differences can be determined with cluster-based permutation testing. If no major effects have been missed by separating the spatial and temporal dimensions of the data, then a spatiotemporal RSA would further validate the current set of results.
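A minimal sketch of the similarity step of the searchlight analysis suggested above is given here. It assumes single-trial source estimates stored as lists indexed [grid_point][time] and a precomputed neighbourhood for each grid point (both hypothetical; the reviewer does not specify a data format), and uses Pearson correlation between the two trials' flattened neighbourhood-by-window patches. The per-subject paired t-test and the cluster-based permutation step are omitted.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def searchlight_similarity(trial_a, trial_b, neighbours, half_window):
    """Spatiotemporal searchlight similarity between two single-trial source estimates.

    trial_a, trial_b: lists indexed [grid_point][time] (hypothetical layout).
    neighbours: dict mapping each grid point to its neighbourhood (including itself).
    half_window: number of samples on each side of the centre time point.
    Returns sim[g][t]; time points whose window would run off the epoch stay None.
    """
    n_grid, n_time = len(trial_a), len(trial_a[0])
    sim = [[None] * n_time for _ in range(n_grid)]
    for g in range(n_grid):
        for t in range(half_window, n_time - half_window):
            window = range(t - half_window, t + half_window + 1)
            # Flatten the neighbourhood x time-window patch into one vector per trial.
            patch_a = [trial_a[n][s] for n in neighbours[g] for s in window]
            patch_b = [trial_b[n][s] for n in neighbours[g] for s in window]
            sim[g][t] = pearson(patch_a, patch_b)
    return sim


def condition_mean(sim_maps):
    """Average searchlight maps over the sentence pairs of one condition."""
    n_grid, n_time = len(sim_maps[0]), len(sim_maps[0][0])
    out = [[None] * n_time for _ in range(n_grid)]
    for g in range(n_grid):
        for t in range(n_time):
            values = [m[g][t] for m in sim_maps if m[g][t] is not None]
            if values:
                out[g][t] = sum(values) / len(values)
    return out
```

Running condition_mean once over within-pair maps and once over between-pair maps, per subject, would yield the two grid-by-time maps that the suggested paired t-test compares.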

2) I have several questions about the experimental stimuli. Firstly, were the experimental sentences, both between and within pairs, controlled for sentence length (number of words) and syntactic complexity (number of clauses, presence of embedded dependencies)? An issue would arise if, for example, all within-pairs happened to have the same syntactic structure/complexity, while between-pairs had mismatching structure/complexity. Then the increased similarity before the SFW for within-pairs could potentially be attributed to similar demands of grammatical/syntactic processing, while the decreased similarity for between-pairs would be driven by differences in these processing demands. The authors cover this potential caveat in subsection “Unique spatial patterns of neural activity are associated with the prediction of specific words, prior to the appearance of new bottom-up input” of the Discussion, and argue that in this case we would see the within- versus between-pair difference arise earlier. However, while such differences could have been building up, they could also have become significant only closer to the end of the sentence. To exclude this option, the differences between within- and between-pair sentences on these measures should be reported.

Secondly, were the SFW-1 words (the words directly before the SFW) controlled across conditions for any of the following characteristics: syntactic class, frequency, or semantic characteristics such as imageability and concreteness? Again, if the within-pairs were more closely matched in terms of SFW-1 characteristics than the between-pairs, effects in the 'prediction' time window could be driven by similarities in SFW-1 processing and not by SFW pre-activation. Since the critical claim of this paper is that the increases in spatial and temporal correlation of neuronal activity for the averaged within-pairs are driven by pre-activation of the SFW, it is critical to exclude all of the effects described above.

3) This point is related to the conclusions drawn by the authors in the Discussion section about the nature of the pre-activated representations. The authors suggest that the effects observed in the pre-SFW window can be driven by orthographic or phonological features of the predicted words. Have any of the analyses they proposed (subsection “Unique spatial patterns of neural activity are associated with the prediction of specific words, prior to the appearance of new bottom-up input”) been carried out? Since sentences used for this study were indeed very constraining, SFW pre-activation of the perceptual features of strongly predicted words would be expected under the predictive processing/coding approach.

[Editors' note: further revisions were requested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Specific lexico-semantic predictions are associated with unique spatial and temporal patterns of neural activity" for further consideration at eLife. Your revised article has been favorably evaluated by Joshua Gold as Senior Editor and a Reviewing Editor. The reviewing editor writes:

I've now read the manuscript and author rebuttal in detail, and I'm pleased to see that the authors have addressed the technical and methodological challenges that were raised by the reviewers of the original manuscript. I'm now satisfied that the results establish that information conveyed by a predictable sentence final word is activated at an early stage during processing of the penultimate word in a sentence. I think that this finding makes an important contribution to the literature by more firmly establishing early lexical-semantic activation of predicted words as a key property of human sentence processing, and by demonstrating a novel method by which these time-limited lexical-semantic predictions can be shown in neural data.

While the manuscript has been much improved there are two remaining issues that I think need to be addressed before acceptance, as outlined below:

1) There are too many places in the Introduction and Discussion in which I think the authors aren't thinking critically enough about whether it is only their preferred "generative and predictive" view that could explain the present findings. My view is that many other accounts could also explain their findings. Specifically, any model which: (i) activates a cumulative semantic representation of sentence meaning, and (ii) emphasises processing speed and efficiency such that semantic representations that are strongly implied by the words read so far, but not yet directly expressed in words are activated – can also account for the current findings. There are many such models in the literature, but most notable (to my mind) is the "sentence gestalt" model from St John and McClelland, 1990, that has been recently updated by Rabovsky et al., 2018, and can predict the magnitude of EEG N400 responses in a wide range of sentence processing paradigms. To my knowledge this is not a model which is explicitly "generative and predictive" and yet I think it very likely that RSA analysis of the sentence gestalt representations generated by this model could simulate the results of the present study. While I don't think that the authors need to do the work to explore whether the model *can* simulate their findings, I do think that it is in their interests to offer a more balanced overview of the literature and to more precisely explain what sort of computational model is implied by their findings.

2) I had one other minor question about the method used to compare cloze probabilities between and within item pairs, which could be addressed at the same time. This point is described in more detail in the rebuttal letter than in the manuscript. However, I think that this issue deserves a little more attention in the manuscript, given the known importance of cloze probability in predicting the magnitude of EEG/MEG signals during sentence processing. Specifically, in the rebuttal letter the authors report analyses of the difference between cloze probabilities for sentence pairs. However, if my understanding of this analysis is correct, it should be conducted not on the difference between cloze probabilities, but rather on the absolute difference between cloze probabilities for within- and between-item pairs. I think that otherwise the average difference between cloze values would always be zero. I'd like the authors to report this analysis in the manuscript, including a description of the method used for conducting it.
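The editor's point about signed versus absolute differences can be illustrated with a short sketch using made-up cloze values. When each pair is counted in both orders, signed differences cancel and their mean is zero, whereas absolute differences do not. The permutation test used below to compare within-pair and between-pair absolute differences is only one possible choice, not the analysis the authors reported; it also treats pairs as exchangeable, which ignores the fact that the same sentence can enter several between-pair comparisons.

```python
import random


def mean(values):
    return sum(values) / len(values)


def signed_differences(cloze, pairs):
    """cloze[i] - cloze[j] for each ordered pair (i, j)."""
    return [cloze[i] - cloze[j] for i, j in pairs]


def absolute_differences(cloze, pairs):
    """|cloze[i] - cloze[j]| for each pair; insensitive to the order within a pair."""
    return [abs(cloze[i] - cloze[j]) for i, j in pairs]


def permutation_test(within, between, n_perm=5000, seed=0):
    """Two-sided permutation test on the difference in means (within minus between)."""
    rng = random.Random(seed)
    observed = mean(within) - mean(between)
    pooled = list(within) + list(between)
    n_within = len(within)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_within]) - mean(pooled[n_within:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical cloze values for four sentence contexts.
cloze = [0.9, 0.7, 0.8, 0.6]
ordered_pairs = [(i, j) for i in range(4) for j in range(4) if i != j]
```

In practice, the within-pair and between-pair lists passed to permutation_test would be the absolute_differences computed over the respective sets of sentence pairs.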

https://doi.org/10.7554/eLife.39061.018

Author response

Essential revisions:

1) The most critical point which I'd ask you to address is to rule out some alternative explanations of your observations. The manuscript already includes some additional analyses in which comparisons are made between "within-pair" vs "within-category/between-pair" sentences. However, the reviewers were concerned that this is only one of many alternative factors that could explain the observations.

The reviewers suggested, and I agreed, that there are a number of additional factors that could lead to similar activation on the penultimate word of the sentence (SFW-1) and which must be ruled out if you are to conclude that your effect is driven by prediction of the sentence final word (SFW). The factors suggested by the reviewers include:

– form-based properties of the penultimate word (word length, orthographic or phonological similarity, etc.)

– lexical/semantic properties of the penultimate word (syntactic class, word frequency, imageability/concreteness, etc.)

– incidental properties of the sentence up to and including the penultimate word (e.g. number of words up to that point, syntactic complexity, such as number of clauses, embedded dependencies, etc.)

To address this point you can do two things: (1) assess whether these factors are more similar for your within-sentence pairs than for your between-sentence pairs, (2) assess whether increased neural similarity would still be observed in your MEG data when these other factors are matched.

As you might be thinking, though, performing these second analyses will be difficult using the methods in the present manuscript. You face a severe loss of power/sensitivity as you progressively divide the materials into smaller and smaller subsets. (Incidentally, I felt that you don't need to match for the number of items compared, though I'm sure there are others who would be reassured by these analyses).

We fully agree that it is important to rule out the possibility that differences in processing of the penultimate word of the sentence (SFW-1) led to the pattern of results we observed. We have now carefully addressed this possibility and we think we can make a strong case that the lexical properties or the predictability of the penultimate word cannot account for our findings. We have made several major changes to the manuscript as described below:

A) In the Materials and methods, we now state that we measured: (1) the number of words, the number of clauses, and the syntactic complexity of the sentence context up until and including SFW-1; (2) various lexical properties of the SFW-1 (visual complexity, word frequency, syntactic class); and (3) the predictability (as operationalized by cloze probability) of the SFW-1. We showed that none of these factors differed systematically between pairs of contexts that predicted the same SFW (i.e. within-pairs) and pairs of contexts that predicted a different SFW (i.e. between-pairs).

We were unable to examine the orthographic or phonological features of the SFW-1 as a whole, because, in Chinese, the characters within each word/phrase are associated with distinct orthographic and phonological features. Also, as shown in the full set of stimuli (Figure 1A—source data 1), the SFW-1 could either be a content word (verb, noun, adjective, adverb) or a function word (pronoun, classifier, conjunction, particle, prepositional phrases). Concreteness values for these words were not available in existing Chinese corpora. However, given the heterogeneity of the SFW-1, we think that the concreteness of the SFW-1 is unlikely to account for the observed effect.

B) In the Results, we describe a new control analysis that we carried out in order to fully exclude the possibility that the increased spatial similarity associated with sentence pairs that predicted the same SFW versus a different SFW was driven by processing of the SFW-1 rather than anticipatory processing of the SFW itself. In this control analysis, we selected a subset of between-pair sentences that contained exactly the same SFW-1, but nonetheless predicted a different SFW. We then selected sentences that constrained for these same SFWs (within-pairs), but which differed in the SFW-1. We then compared the spatial similarity between these two subsets of sentence pairs. If the increased spatial similarity associated with the within-pairs versus between-pairs was due to the lexical processing of the SFW-1, then the spatial similarity should be greater in sentence pairs containing exactly the same SFW-1 (i.e. in the subset of between-pairs) than in sentence pairs that predicted the same SFW (i.e. in the subset of within-pairs). We found no evidence for this. Instead, the spatial similarity remained larger for the within-pairs than the between-pairs (although in this subset analysis, the difference only approached significance due to limited statistical power).
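The subset selection in this control analysis can be sketched as follows. The sketch assumes each sentence is described by a hypothetical (sentence_id, sfw_minus_1, predicted_sfw) tuple, and it is only one reading of the selection described above; the exact criteria used (for example, whether a sentence could enter more than one pair) may differ.

```python
from itertools import combinations


def control_subsets(sentences):
    """Select the two subsets for the SFW-1 control analysis.

    sentences: list of (sentence_id, sfw_minus_1, predicted_sfw) tuples (hypothetical layout).
    Returns (between_pairs, within_pairs) as lists of sentence-id pairs:
      between_pairs: same SFW-1 word, different predicted SFW;
      within_pairs:  same predicted SFW (restricted to SFWs in between_pairs), different SFW-1.
    """
    by_id = {s[0]: s for s in sentences}
    between = [(a[0], b[0]) for a, b in combinations(sentences, 2)
               if a[1] == b[1] and a[2] != b[2]]
    # SFWs predicted by any sentence that entered the between-pair subset.
    target_words = {by_id[sid][2] for pair in between for sid in pair}
    within = [(a[0], b[0]) for a, b in combinations(sentences, 2)
              if a[2] == b[2] and a[2] in target_words and a[1] != b[1]]
    return between, within
```

If lexical processing of the SFW-1 drove the effect, spatial similarity should be higher for the between-pair subset (identical SFW-1) than for the within-pair subset (different SFW-1, same predicted SFW).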

C) In the Discussion, we now explicitly discuss all these methods (described above) to address the possibility that differences in processing of the penultimate word of the sentence (SFW-1) led to the pattern of results we observed.

I'd therefore like to suggest an alternative method for running these analyses and one which doesn't require you to run analyses on subsets of materials. The method was (AFAIK) introduced in a paper by Carlin et al. (2011, Current Biology):

https://doi.org/10.1016/j.cub.2011.09.025

This is a method in which you partial out extraneous or unmatched factors while performing RSA. In Carlin et al.'s study of face perception, this was achieved using partial Spearman correlations to rule out physical features when comparing gaze direction. However, other statistical methods (e.g. multiple linear regression) can no doubt also be applied. This comes under the rubric of "representational geometry analysis".

I think the more that you can do to ensure that other aspects of your sentence pairs, and the penultimate word of these sentence pairs, do not explain your observations, the more satisfied readers of the paper will be that the only plausible explanation of your findings is that there is a neural signature of the predicted final word.

We thank the editor for the suggestion. We carefully read the paper recommended. However, these methods are not easily adapted for the way we chose to carry out our analysis. Specifically, in Carlin et al.’s 2011 study, the authors correlated item pairwise dissimilarity matrices. One matrix reflected the dissimilarity of the brain activity for all pairs of stimuli, and the other matrix reflected the dissimilarity of the factor of interest (i.e. qualitative gaze direction) for all pairs of stimuli. They also built dissimilarity matrices for other factors (such as grayscale intensities, head view, quantitative differences between angles of left and right gaze). This then allowed them to run partial Spearman correlations between the matrix reflecting the brain activity and the matrix reflecting the factor of interest, while controlling for the other factors on each item.
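For readers unfamiliar with the Carlin et al. approach described above, a minimal sketch of a partial Spearman correlation between a neural dissimilarity matrix and a model dissimilarity matrix, controlling for a single nuisance dissimilarity matrix, is given below. It rank-transforms the vectorized upper triangles and applies the first-order partial correlation formula; with several nuisance matrices, one would instead regress them out of the ranked vectors (not shown). The function names are hypothetical and the sketch is not taken from either study.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def upper_triangle(matrix):
    """Vectorize the off-diagonal upper triangle of a square dissimilarity matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def partial_spearman(x, y, z):
    """Spearman correlation of x and y controlling for z (one covariate)."""
    rx, ry, rz = rank(x), rank(y), rank(z)
    r_xy, r_xz, r_yz = pearson(rx, ry), pearson(rx, rz), pearson(ry, rz)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

For example, x could be the upper triangle of a neural dissimilarity matrix, y a model matrix coding whether two sentences predict the same SFW, and z a matrix of SFW-1 word-length differences (all illustrative). This item-wise formulation is exactly what the averaging approach described in the next paragraph does not provide.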

However, in the current study, we calculated the means across items of the brain pattern similarity values based on whether the same SFW was predicted by pairs of sentences (within-pair: the same SFW was predicted; between-pair: a different SFW was predicted). Our analysis approach has the advantage of increasing the signal-to-noise ratio of our correlation values, since the correlation values produced by random noise would be canceled out after averaging across items. Power is a particularly important consideration given that neural activity associated with the prediction of complex lexico-semantic representations (examined here) is likely to be smaller than activity associated with the perception of lower-level stimuli (probed in Carlin et al.’s study). However, our analysis approach makes it difficult to run correlation analyses that partial out other extraneous variables. Nevertheless, the additional control analysis described above, along with the additional measures of the contexts and the SFW-1, increase our confidence in claiming that the pattern of results we observed was driven by anticipatory processing of the SFW itself.
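The condition-averaging approach described in this reply can be sketched as follows, assuming a symmetric sentence-by-sentence matrix of pattern correlations and a list of the words each sentence constrains for (hypothetical inputs). Each off-diagonal entry is assigned to the within-pair or between-pair condition and each condition is averaged separately, which is how uncorrelated noise is expected to average out.

```python
def within_between_means(similarity, predicted_word):
    """Average pairwise pattern similarity separately for sentence pairs that
    predict the same final word (within-pair) and a different one (between-pair).

    similarity: symmetric matrix of pattern correlations between sentences.
    predicted_word: the sentence-final word each sentence constrains for.
    """
    within, between = [], []
    n = len(predicted_word)
    for i in range(n):
        for j in range(i + 1, n):
            # Only the upper triangle is used, so each sentence pair is counted once.
            target = within if predicted_word[i] == predicted_word[j] else between
            target.append(similarity[i][j])
    return sum(within) / len(within), sum(between) / len(between)
```

Because each condition collapses to a single mean, no item-level dissimilarity vector remains to which a partial correlation could be applied, which is the limitation acknowledged above.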

2) Even if these analyses confirm that other factors can't explain your findings, then you still need to explain what predictive pre-activation means, given that this neural effect is absent for the ~400 ms immediately prior to the onset of the sentence final word. To my mind, this does not negate your conclusions regarding lexico-semantic prediction, but it does mean that a more nuanced mechanism must be involved. In particular, the idea that this reflects "activity silent working memory" or pre-activation of specific lexical-semantic properties of target words seemed like a stretch to one of the reviewers, and I would agree. It might be, though, that by considering similarities between orthographic or semantic properties of the SFW in different sentences the authors could provide additional evidence in this regard. In the absence of this, though, I think that the authors should explain that their findings are consistent with pre-activation while acknowledging the degree to which other interpretations (e.g. integration) might be possible.

We agree that the timing of the spatial similarity effect deserved more discussion. To our minds, there were two interesting features of this timing. The first is that the spatial similarity effect began to appear soon after the onset of SFW-1 (rather than at the offset of the SFW-1). The second is that the effect was not seen immediately prior to the appearance of the SFW itself. We consider each of these below:

A) The early onset of the spatial similarity effect.

The fact that the spatial similarity effect began to appear immediately following the SFW-1 raises the obvious question of whether the effect was driven by the lexical properties of the SFW-1 itself, or the predictability of the SFW-1 itself. For example, if the predictability of the SFW-1 differed systematically between pairs of contexts that predicted the same SFW (i.e. within-pairs) and pairs of contexts that predicted a different SFW (i.e. between-pairs), then the spatial similarity effect might have been driven by the “integration” of the SFW-1 rather than the prediction of the SFW. However, given that, as discussed above, the cloze values of the SFW-1 did not differ at all between pairs of contexts that predicted the same SFW (i.e. within-pairs) and pairs of contexts that predicted a different SFW (i.e. between-pairs), this seems unlikely. Moreover, the control analysis described above excludes the possibility that the spatial similarity effect was driven by bottom-up processing of the SFW-1 itself. This is discussed in subsection “Unique spatial patterns of neural activity are associated with the prediction of specific words, prior to the appearance of new bottom-up input”.

We now make it clear in the Discussion that, rather than reflecting processing of the SFW-1 itself, the early appearance of the effect “provides evidence that the prediction of the SFW was generated at the first point in time at which participants had sufficient information to unambiguously generate this prediction. For example, in the sentence “In the crib, there is a sleeping …”, as comprehenders accessed the meaning of the word, <sleeping>, they may have also predicted the semantic features of <baby>. This type of account follows from a generative framework of language comprehension in which, following highly constraining contexts, comprehenders are able to predict entire events or states, along with their associated semantic features, and incorporate such predictions into their mental models prior to the appearance of new bottom-up input (see Kuperberg and Jaeger, 2016; Kuperberg, 2016).”

B) The disappearance of the spatial similarity effect immediately prior to the appearance of the SFW itself.

In the Discussion, we now more explicitly state that the precise reason for this is unclear. First, we acknowledge that it is possible that the predicted information was not maintained over the relatively long interstimulus interval used in the present study. On the other hand, we also point out that a failure to detect neural activity over a delay does not necessarily imply that this information was not present. There is now quite compelling evidence from related fields challenging traditional ideas of how information is represented in the brain over delays. Having re-read the papers that we cited, we continue to think that they provide an interesting and possible explanation for why we did not see any effect in the delay period, one that we would like our readers to consider. We have tried to be more explicit about this, explaining that “representations within working memory can be maintained in a silent neural state, instead of being accompanied by persistent delayed activity (Stokes, 2015; Wolff et al., 2017). Such content-specific silent activity can only be detected if it is in the focus of attention and task-relevant. On this account, in the present study, despite the fact that we were not able to detect it, the predicted information was still present during the interstimulus interval, and it only became available once new bottom-up input was encountered.” Moreover, we point out that “Of course, this interpretation is speculative, particularly given our use of a very slow presentation rate. It will be important for future work to determine whether similar dynamics are associated with the prediction of upcoming words when bottom-up inputs unfold at faster, more naturalistic rates.”

3) One further point that two of the reviewers were confused by – and I think must be clarified – concerns the order of presentation and blocking of sentence presentation. Is it the case that the sentence pairs were presented on successive trials? I would hope not, since this is a serious confound for RSA analyses in fMRI and could also be problematic for MEG. I think it's the case that trials for the within-sentence pairs are no closer together in time than a randomly selected between-sentence pair, but I couldn't see this unambiguously stated in the manuscript. I'd like for you to confirm this and (if possible) report further analyses in which temporal distance or temporal order of trial pairs is excluded as a nuisance factor in neural similarity analyses.

We apologize for the confusion.

A) We have now made this clearer in the Introduction, Results as well as in the Materials and methods. In the Introduction, we state that “During the experiment, sentences were presented in a pseudorandom order, with at least 30 other sentences (on average 88 sentences) in between each member of a given pair.” In the Results, we state that “The sentences were constructed in pairs (120 pairs) that strongly predicted the same sentence-final word (SFW), although, during presentation, members of the same pair were separated by at least 30 (on average 88) other sentences.” In the Materials and methods, we state that “the two members of each pair were presented apart from each other, with at least 30 (on average 88) sentences that predicted different words in between.”

B) As mentioned above, our analysis methods make it difficult to explicitly account for the temporal distance between members of within-pair sentences as a nuisance factor. However, as noted in the previous version of the manuscript, we did carry out a control analysis, which “found that the spatial similarity effect was just as large when the unexpected SFW of a pair was presented before the expected SFW, as when the expected SFW was presented first (see Figure 2—figure supplement 3).”

C) We also state in the Discussion that “It is, however, conceivable that participants recognized a match between the word that they had just predicted and a word that they had predicted earlier in the experiment (even though this predicted word was never observed). For example, there is some evidence that a predicted SFW can linger in memory across four subsequent sentences, even if it is not actually presented (Rommers and Federmeier, 2018). This seems less likely to have occurred in the present study, however, where each member of a sentence pair was separated by at least 30 (on average 88) other sentences.”

These three points should be the main focus of a revision to the manuscript. In addition to these main points, I've also appended the three reviews that I've received. These include many other minor points, suggested changes and requests for clarification which you would do well to heed. There's always scope to clarify methodological aspects of a complex study like this.

We appreciate your careful summary of the reviewers’ concerns, and we hope that we have addressed them clearly.

However, one methodological concern (from reviewer 3), which I will not insist that you address, concerns the separation of spatial and temporal RSA methods. I agree that spatio-temporal analyses in single-subject source space could detect neural effects that are missed by your sensor-space and time-based analyses. However, to my mind, this methodological concern could explain the absence of some effects in the existing analyses, but not the presence of reliable effects. Since there is a lot of work involved, and substantial correction for multiple comparisons would reduce sensitivity, I'm going to give you the option of not performing these analyses and instead discussing potential limitations and directions for future work.

We thank the editor for the suggestion. In the Results, we now point out that the analysis approach we took is fairly conservative. Specifically, we explain that “it was limited to the time window that showed a spatial similarity effect, and so it may not have captured more extended temporal similarity effects that were not accompanied by a spatial similarity effect.” We also point out that “The reason we took this approach is that we were interested, a priori, in any functional relationship between these measures, i.e. whether the spatial similarity effect reflected brain activity associated with the prediction of spatially distributed semantic representations, and whether the temporal similarity effect reflected brain activity associated with temporal binding of these spatially distributed representations. However, in order to fully exploit the spatiotemporal pattern of the data, future studies could examine the spatial and temporal patterns simultaneously using a spatiotemporal searchlight approach (Nili et al., 2014; Su et al., 2012; Su et al., 2014).”

Reviewer #1:

Major Points

Most importantly, I think that the authors provide no evidence to support their claim that the observed increased similarity among within-pair sentences, at the pre-final word, actually reflects the pre-activation of the sentence-final word. I think the most plausible account is that context-dependent constraint is already high on the pre-final word, so that alternative interpretations like ease of integration of the pre-final word are at least as likely as the pre-activation account that the authors propose here. At the very least, these two alternative accounts have to be discussed equally; if the authors give preference to the pre-activation account, this should be grounded in reliable additional empirical evidence. (In the Discussion section, the authors argue that 'greater similarity between sentence contexts' is an implausible account for their result, given that the two sentences of each pair were composed of different words and, in particular, the SFW-1 pre-final word always differed. However, the sentences were constructed such that the constraint was high, so this does not rule out the possibility of expectation / ease of integration effects on the pre-final word, in my opinion.)

We thank the reviewer for encouraging us to consider these potential confounds more carefully. We have carried out several additional analyses and have made several changes to the manuscript to address the concern that differences in processing of the pre-final word (SFW-1) led to the pattern of results we observed.

A) Perhaps most relevant to the reviewer’s point that the results could be driven by “the possibility of expectation/ease of integration on the pre-final word”, we ran a new cloze test to examine the probability of the SFW-1 (now described in Materials and methods). We found that the cloze probability of the SFW-1 was relatively low: 11% on average across all items. Moreover, the difference in cloze probability between members of sentence pairs was matched between pairs that constrained for the same SFW (within-pairs: 17.00% cloze difference) and pairs that constrained for a different SFW (between-pairs: 17.28% cloze difference): t(28678) = -0.136, p = 0.89. We think that this makes it unlikely that the observed effect was driven by the expectation or ease of integration of the SFW-1.
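The degrees of freedom reported here follow from the design. With 120 sentence pairs (240 sentences), there are 120 within-pair combinations and 240 × 239 / 2 - 120 = 28,560 between-pair combinations, so a two-sample Student's t-test with pooled variance has 120 + 28,560 - 2 = 28,678 degrees of freedom. The sketch below assumes that this was the test used and that the "difference in cloze probability" is the absolute difference between the two sentences' SFW-1 cloze values; neither detail is stated in this response, and the cloze values are random placeholders, not the study's data.

```python
import random
from itertools import combinations
from math import sqrt
from statistics import fmean, variance


def pooled_t(a, b):
    """Two-sample Student's t-test with pooled variance; returns (t, df)."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (fmean(a) - fmean(b)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, df


def cloze_differences(cloze, pair_id):
    """Absolute SFW-1 cloze difference (assumed) for every sentence combination, by pair type."""
    within, between = [], []
    for i, j in combinations(range(len(cloze)), 2):
        diff = abs(cloze[i] - cloze[j])
        (within if pair_id[i] == pair_id[j] else between).append(diff)
    return within, between


# Hypothetical cloze values for 120 pairs (240 sentences).
rng = random.Random(1)
cloze = [rng.random() for _ in range(240)]
pair_id = [k // 2 for k in range(240)]
within, between = cloze_differences(cloze, pair_id)
t, df = pooled_t(within, between)
print(len(within), len(between), df)  # 120 28560 28678
```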

B) In the Materials and methods, we now state that we extracted: (1) the number of words, the number of clauses, and the syntactic complexity of the sentence context up to and including the SFW-1; and (2) various lexical properties of the SFW-1 itself (visual complexity, word frequency, and syntactic class). We show that none of these factors differed between pairs of contexts that predicted the same SFW (i.e. within-pairs) and pairs of contexts that predicted a different SFW (i.e. between-pairs).

C) In the Results, we now describe a new control analysis that we carried out in order to exclude the possibility that the spatial similarity effect was driven by lexical processing of the SFW-1, rather than by anticipatory activity related to the prediction of the SFW itself. In this control analysis, we selected a subset of between-pair sentences (i.e. pairs that predicted different SFWs) whose two members contained exactly the same SFW-1. We then selected a subset of within-pair sentences that constrained for these same SFWs, but whose two members differed in the SFW-1. We then compared the spatial similarity between these two subsets of sentence pairs. If, in our original analysis, the increased spatial similarity associated with the within-pair sentences relative to the between-pair sentences had in fact been driven by lexical processing of the SFW-1, then the spatial similarity should be greater in sentence pairs containing exactly the same SFW-1 (i.e. in the subset of the between-pair sentences) than in sentence pairs that predicted the same SFW (i.e. in the subset of within-pair sentences). We found no evidence for this. Instead, the spatial similarity remained greater in the sentence pairs that predicted the same SFW than in sentence pairs that predicted a different SFW (although in this subset analysis, the difference only approached significance due to the limited statistical power). We further discuss this finding in the Discussion in subsection “Unique spatial patterns of neural activity are associated with the prediction of specific words, prior to the appearance of new bottom-up input”.
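The subset selection for this control can be sketched as follows. The data layout is hypothetical (a list of dicts with `sfw` and `sfw1` fields, filled with made-up words), and the sketch only illustrates the selection logic described above; the spatial similarity values of each subset would then be averaged and compared in the same way as in the main analysis.

```python
from itertools import combinations


def control_subsets(sentences):
    """Select the two subsets of sentence combinations used in the SFW-1 control.

    sentences: list of dicts with hypothetical keys "sfw" (predicted final word)
    and "sfw1" (pre-final word).
    Returns (same_sfw1_between, diff_sfw1_within) as lists of index pairs.
    """
    index_pairs = list(combinations(range(len(sentences)), 2))
    # Between-pair: different predicted SFW, identical SFW-1.
    same_sfw1_between = [
        (i, j) for i, j in index_pairs
        if sentences[i]["sfw"] != sentences[j]["sfw"]
        and sentences[i]["sfw1"] == sentences[j]["sfw1"]
    ]
    # SFWs predicted by any sentence in that between-pair subset.
    involved = {sentences[k]["sfw"] for pair in same_sfw1_between for k in pair}
    # Within-pair: same predicted SFW (one of those above), different SFW-1.
    diff_sfw1_within = [
        (i, j) for i, j in index_pairs
        if sentences[i]["sfw"] == sentences[j]["sfw"]
        and sentences[i]["sfw"] in involved
        and sentences[i]["sfw1"] != sentences[j]["sfw1"]
    ]
    return same_sfw1_between, diff_sfw1_within


sentences = [
    {"sfw": "baby", "sfw1": "sleeping"},
    {"sfw": "baby", "sfw1": "newborn"},
    {"sfw": "cat", "sfw1": "sleeping"},
    {"sfw": "cat", "sfw1": "purring"},
    {"sfw": "bread", "sfw1": "fresh"},
    {"sfw": "bread", "sfw1": "sliced"},
]
print(control_subsets(sentences))  # ([(0, 2)], [(0, 1), (2, 3)])
```

Note that the "bread" pair is left out of both subsets, because no between-pair combination involving it shares an SFW-1.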

Related to this, the fact that the similarity effect disappears with the offset of the word-induced brain response to the pre-final SFW-1 word, i.e., around 500 ms prior to the onset of the sentence-final target word, is inconsistent with their interpretation. A true prediction / pre-activation should persist. The a-posteriori interpretation based on 'activity silent working memory' is a vast over-interpretation of the results, based on no data.

We agree that the timing of the spatial similarity effect deserved more discussion. As the reviewer points out, there are two interesting features of this timing. The first is that the spatial similarity effect began to appear soon after the onset of the SFW-1 (rather than at the offset of the SFW-1). The second is that the effect then disappeared and was not detected immediately prior to the appearance of the SFW itself. We consider each of these points below:

A) The early onset of the spatial similarity effect.

The fact that the spatial similarity effect began to appear soon after the onset of the SFW-1 raises the obvious question of whether the effect was driven by either the lexical properties of the SFW-1 itself, or the predictability of the SFW-1 itself. As discussed above, we found no evidence that this was the case. Please see Results, Discussion and Materials and methods for details.

Rather, in the Discussion, we suggest that the early appearance of the effect “provides evidence that the prediction of the SFW was generated at the first point in time at which participants had sufficient information to unambiguously generate this prediction. For example, in the sentence “In the crib, there is a sleeping …”, as comprehenders accessed the meaning of the word, <sleeping>, they may have also predicted the semantic features of <baby>. This type of account follows from a generative framework of language comprehension in which, following highly constraining contexts, comprehenders are able to predict entire events or states, along with their associated semantic features, and incorporate such predictions into their mental models prior to the appearance of new bottom-up input (see Kuperberg and Jaeger, 2016; Kuperberg, 2016).”

B) The disappearance of the spatial similarity effect immediately prior to the appearance of the SFW itself.

In the Discussion, we now state more explicitly that the precise reason for this is unclear. First, we acknowledge that it is possible that the predicted information was not maintained over the relatively long interstimulus interval used in the present study. On the other hand, we also point out that a failure to detect neural activity over a delay does not necessarily imply that this information is not present. There is now quite compelling evidence from related fields challenging traditional ideas of how information is represented in the brain over time delays. Having re-read the papers that we cited, we continue to think that they provide a possible explanation for why we didn’t see any effect in the delay period, one that we’d like our readers to consider. We have tried to be more explicit about this, explaining that the contents of working memory can be maintained in a silent neural state, instead of being accompanied by persistent delayed activity (Stokes, 2015; Wolff et al., 2017). We also now discuss the idea that content-specific silent activity can only be detected if it is in the focus of attention and task-relevant. On this account, in the present study, despite the fact that we were not able to detect it, the predicted information was still present during the interstimulus interval, and it only became available once new bottom-up input was encountered. Of course, this interpretation is speculative, particularly given our use of a very slow presentation rate. Therefore, we have emphasized that “It will be important for future work to determine whether similar dynamics are associated with the prediction of upcoming words when bottom-up inputs unfold at faster, more naturalistic rates.”

C) Finally, it is possible that, in the earlier version of the manuscript, there was some ambiguity in our use of the term “pre-activation” (of a lexico-semantic representation).

Some people have used the term “pre-activation” to refer specifically to the pre-activation of specific phonological or orthographic word-forms. We did not make this assumption. Rather, as we make clear in the Discussion, our assumption is that multiple different types and grains of information can be encoded (and therefore predicted) within a predicted lexical representation, and that it is unclear exactly what information was detected by our analysis.

To avoid any such ambiguity, in the revised version of the manuscript we now use the more general term “prediction” throughout. We explain what we mean by this at the very beginning of the Introduction: “After reading or hearing the sentence context, “In the crib there is a sleeping …”, we are easily able to predict the next word, “baby”. In other words, we are able to access a unique lexico-semantic representation of <baby> that is different from the lexico-semantic representation of any other word (e.g. <rose>), ahead of this information becoming available from the bottom-up input.” We also rephrased the introduction of the hierarchical generative framework of language comprehension: “strong beliefs about the underlying message that is being communicated can lead to the prediction of associated semantic features and sometimes to the top-down pre-activation of information at lower levels of the linguistic hierarchy (e.g. orthographic and/or phonological form) before new bottom-up information becomes available.”

In the Discussion, we state that “It is also possible that the increased spatial similarity in association with sentence pairs that predicted the same word reflected similarities of predictions generated at a lower phonological and/or orthographic level of representation. On this account, comprehenders not only predicted the semantic features of words, but they also pre-activated their word-forms. The present study cannot directly speak to this hypothesis.”

Later, we further re-iterate this point by stating that, while our findings provide evidence that the prediction of the semantic features of SFWs was generated at the first point in time at which participants had sufficient information to unambiguously generate this prediction, “we cannot tell from the current findings whether this, in turn, led to the top-down pre-activation of specific phonological or orthographic word-forms.”

Also the proposal that the pre-activation may involve a sequence of activation of different lexico-semantic properties of the target word is speculative beyond the interpretation of results, and not grounded in any data.

We found that the increased spatial similarity in the within-pair sentences, relative to the between-pair sentences, was only evident along the diagonal line; it did not generalize across time points. This suggests that the unique spatial pattern of brain activity associated with the prediction of specific words changed over time. In order to interpret this result, we speculated that “this may be because different properties associated with particular words became available at different times. For example, the different semantic features (e.g., <human>, <small>, <cries>) associated with the prediction of a specific word (e.g. “baby”) might have been recruited at different time points.”

Also related to this, I think that the control analysis testing for noun/verb differences is not sufficient to warrant the general claim that 'higher order grammatical and semantic' effects (or differences) cannot influence the present result. The noun/verb category difference is just one of many such features. I would find it much more convincing if the authors could show in their stimulus materials that no such differences exist at the target position in the critical as compared to the control contrasts, or at the positions preceding the sentence-final word. Also, I think behavioral testing could easily quantify the degree of expectancy/constraint on the pre-final words. This kind of additional data would allow for a more empirically grounded interpretation of the similarity effect on the pre-final word.

We apologize for any confusion about this analysis.

A) This was not intended to be a “control” analysis. Instead, its purpose was to help us determine the type and grain of predicted information that might have been reflected by the item-specific unique spatial patterns. In our study, 50% of the predicted SFWs were nouns and 50% were verbs. Thus, in theory, an increased spatial similarity in association with sentence pairs that predicted the same upcoming word could have reflected greater similarity between the predicted word’s general syntactic category (a noun or a verb).

This analysis aimed to exclude this possibility. It therefore did not test for any difference in the prediction of nouns versus verbs. Rather, we averaged the spatial similarity values of sentence pairs that predicted nouns and verbs together, and we extracted the spatial similarity values of sentence pairs that predicted the same syntactic category (whether this was a noun or a verb), i.e. within-category sentence pairs. We then compared these within-category spatial similarity values with the original item-specific within-pair spatial similarity values. We found that the within-category spatial similarity values were significantly smaller than the within-pair spatial similarity values. These findings suggest that “the greater within-pair (versus between-pair) spatial similarity effect was not simply reducible to the prediction of general syntactic category”. We now explain this more clearly in the Results section.
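The comparison can be sketched as below, under two assumptions: `sim` is a hypothetical lookup of pairwise spatial similarity values, and sentence combinations that predicted the same word are excluded from the within-category set so that the two measures do not overlap (the response does not state whether this exclusion was made). The words and values are toy data.

```python
from itertools import combinations
from statistics import fmean


def pair_type_means(sim, sfw, category):
    """Mean similarity for same-word pairs and for same-category, different-word pairs.

    sim: dict mapping (i, j), i < j, to a similarity value (hypothetical input).
    sfw, category: per-sentence predicted word and its syntactic category.
    """
    same_word, same_category = [], []
    for i, j in combinations(range(len(sfw)), 2):
        if sfw[i] == sfw[j]:
            same_word.append(sim[(i, j)])
        elif category[i] == category[j]:
            # Assumption: same-word combinations are excluded from this set.
            same_category.append(sim[(i, j)])
    return fmean(same_word), fmean(same_category)


sfw = ["baby", "baby", "cat", "cat", "eat", "eat"]
category = ["noun", "noun", "noun", "noun", "verb", "verb"]
# Toy similarity values: higher for same-word combinations.
sim = {(i, j): (0.3 if sfw[i] == sfw[j] else 0.1)
       for i, j in combinations(range(len(sfw)), 2)}
word_mean, category_mean = pair_type_means(sim, sfw, category)
print(word_mean > category_mean)  # True
```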

B) We ruled out the possibility that the item-specific prediction effect was driven by the differences in the pre-final words (SFW-1) or the preceding contexts, as we also discussed above. Detailed revisions were made in the Results, Discussion and Materials and methods.

C) We thank the reviewer for suggesting that we quantify the degree of expectancy/constraint on the pre-final words (SFW-1s). As noted above, we now report the results of a cloze probability test that examined the predictability of the SFW-1 (in the Materials and methods). We found that the cloze probability of the SFW-1 was relatively low: 11% on average across all items. Also, the difference in cloze probability of the SFW-1 was matched between the within-pair sentences (17.00% cloze difference) and the between-pair sentences (17.28% cloze difference): t(28678) = -0.14, p = 0.89. We believe that this provides strong evidence that the observed effect was driven by the prediction of the SFW instead of the expectation or ease of integration of the SFW-1.

Also related to this, I wonder whether there should not also be similarity effects on the target word itself. Even though different words were presented at the target word position (e.g., baby and child in a sentence pair both constraining for baby), the similarity between those is still substantially higher than, e.g., between baby and fridge (example from Figure 1).

We understand the reviewer’s point. However, there is a potential confound: it is well established that words that violate strong lexico-semantic predictions produce a larger amplitude response between 300-500ms (a larger N400), even when they are semantically related to predicted words (e.g. Federmeier et al., 1999). This was true in the present MEG data, where we saw clear evidence of an increased N400 amplitude on unexpected SFWs over the left temporal sensors. An engagement of left temporal regions in processing unexpected SFWs in both members of a pair (regardless of whether these words are the same or different from each other) would inflate the estimate of the spatial similarity value on these words. This would confound the comparison of the spatial similarity values between within-pair versus between-pair SFWs, because in 25% of the between-pair sentences the SFW of both members of the pair was unexpected (i.e. N*(N-1)/2 out of 2*N*(N-1) between-pair combinations, where N is the number of sentence pairs), whereas this was not true in any of the within-pair sentences. This would have inflated our estimate of the spatial similarity values of the between-pair sentences, thereby reducing our power to detect a significant difference between the within-pair and the between-pair SFWs.
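The 25% figure can be checked by enumerating every combination of two sentences. This sketch assumes only what the design states: N pairs, with one member of each pair ending in the expected SFW and the other in an unexpected SFW.

```python
from itertools import combinations


def count_pair_types(n_pairs):
    """Count within-pair, between-pair, and both-unexpected between-pair combinations."""
    # Each sentence is (pair index, ending); one member of each pair ends unexpectedly.
    sentences = [(p, e) for p in range(n_pairs) for e in ("expected", "unexpected")]
    within = between = both_unexpected = 0
    for (p1, e1), (p2, e2) in combinations(sentences, 2):
        if p1 == p2:
            within += 1
        else:
            between += 1
            if e1 == e2 == "unexpected":
                both_unexpected += 1
    return within, between, both_unexpected


within, between, both_unexpected = count_pair_types(120)
print(within, between, both_unexpected)  # 120 28560 7140
print(both_unexpected / between)         # 0.25
```

The ratio is exactly 1/4 for any N, since (N(N-1)/2) / (2N(N-1)) = 1/4.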

Nonetheless, because examination of Figure 2B suggests that, after the onset of the SFW, the spatial similarity values indeed appeared to be slightly greater in sentence pairs that predicted the same SFW (within-pairs) than in sentence pairs that predicted a different SFW (between-pairs), we went ahead and compared the averaged spatial similarity between the within-pair and between-pair sentences within the time window of 109 – 588ms (defined by our cutoff threshold: R > 0.04) after the onset of the SFW. However, the difference was not significant: t(25) = 1.388, p = 0.177. Given the confound and the complexity of any interpretation, we decided to focus the manuscript itself on activity prior to the onset of the SFW.
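The window-and-test step can be sketched as follows. Two points are assumptions: that the window is the longest contiguous run of samples on which the thresholded similarity time course exceeds R = 0.04 (the response does not say exactly which time course was thresholded), and that t(25) comes from a paired t-test across participants, which for a paired design implies 26 participants. The sample values are toy numbers.

```python
from math import sqrt
from statistics import fmean, stdev


def window_above(times, values, threshold):
    """Return (start, end) times of the longest contiguous run with values > threshold."""
    best, best_len, run_start = None, 0, None
    for k, v in enumerate(values):
        if v > threshold:
            if run_start is None:
                run_start = k
            if k - run_start + 1 > best_len:
                best_len = k - run_start + 1
                best = (times[run_start], times[k])
        else:
            run_start = None
    return best


def paired_t(a, b):
    """Paired t-test on per-subject values a and b; returns (t, df)."""
    d = [x - y for x, y in zip(a, b)]
    return fmean(d) / (stdev(d) / sqrt(len(d))), len(d) - 1


times = [0, 100, 200, 300, 400, 500]            # ms after SFW onset (toy)
values = [0.01, 0.05, 0.06, 0.03, 0.05, 0.02]   # toy similarity time course
print(window_above(times, values, 0.04))        # (100, 200)
```

Each participant's within-pair and between-pair similarity would then be averaged over the selected window and passed to `paired_t` as two lists of equal length.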

Even more so, should not the pre-activation lead to increased similarity between the pre-word period and the brain activation elicited during the actual presentation of the word itself?

This is an interesting question and one that we considered. However, addressing it runs into the same issues as those described above: any brain activity measured following the actual presentation of the SFWs is likely to reflect both information corresponding to the semantic features associated with the item-specific SFW itself, as well as more general processing of the SFW (regardless of its precise identity) in relation to its preceding context; that is, a SFW that is not predicted will evoke a larger N400 than a SFW that is predicted. This makes it tricky to interpret any similarities between brain activity produced prior to the SFW and brain activity produced during the actual presentation of the SFW, especially for unpredicted SFWs.

Despite this caveat, we carried out an exploratory analysis to examine the relationship between spatial patterns of activity produced during the prediction period and spatial patterns of activity produced following the onset of the SFW itself. We constructed two cross-temporal similarity matrices — one for expected SFWs (Author response image 1: left) and one for unexpected SFWs (Author response image 1: middle) — by correlating the spatial pattern of brain activity produced at each time point during the prediction window (-1000 to 0ms) with the spatial pattern of brain activity produced at each time point after the onset of the SFWs (0 to 1000ms). In order to determine whether there were any differences in the spatial pattern produced by unexpected SFWs and expected SFWs, we subtracted these two matrices (Author response image 1: right) and carried out a cluster-based permutation analysis. This revealed two effects (the two clusters shown in Author response image 1: right).
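The construction of these matrices can be sketched as below. It assumes, for illustration, that each trial is a sensors × time array (nested lists), that a cross-temporal matrix is computed per trial and then averaged within each condition, and that spatial similarity is the Pearson correlation across sensors; the response does not specify these implementation details, and the cluster-based permutation step is not shown.

```python
from statistics import fmean


def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant sequences."""
    mx, my = fmean(x), fmean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    return num / (sum(a * a for a in dx) * sum(b * b for b in dy)) ** 0.5


def cross_temporal(trial, pre_idx, post_idx):
    """Spatial correlations between every pre-onset and every post-onset time point.

    trial: list of sensors, each a list of samples over time.
    Rows follow pre_idx (prediction window); columns follow post_idx (after SFW onset).
    """
    def pattern(t):
        return [sensor[t] for sensor in trial]

    return [[pearson(pattern(tp), pattern(tq)) for tq in post_idx] for tp in pre_idx]


def mean_matrix(mats):
    """Element-wise mean of equally shaped matrices (e.g. one per trial)."""
    return [[fmean(vals) for vals in zip(*rows)] for rows in zip(*mats)]


def difference(a, b):
    """Element-wise a - b, e.g. expected-SFW minus unexpected-SFW matrices."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Toy trial: 3 sensors x 4 samples; sample 2 repeats the pattern of sample 0.
trial = [[1, 5, 1, 0], [2, 1, 2, 1], [3, 2, 3, 0]]
m = cross_temporal(trial, pre_idx=[0, 1], post_idx=[2, 3])
print(m[0][0])  # 1.0: identical spatial patterns before and after onset
```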

The first effect, shown in blue (Author response image 1: right), was driven by a stronger “pre-post” correlation in the sentences ending with expected than unexpected SFWs (cluster-level p < 0.001, 10000 permutations). Specifically, the spatial pattern of predictive activity that began just before the onset of the SFW, continuing until 200ms after its onset (-300 to 200ms) correlated with the spatial pattern of activity produced between 300-800ms after the onset of expected SFW. We speculate that, at -300ms prior to the onset of the SFW, the semantic features that had been predicted earlier were further activated in anticipation of the SFW, remaining active for 200ms after its onset; then, as the semantic features of the bottom-up expected input became available at around 400ms, they matched these predicted features thereby driving the increased spatial similarity values to the expected SFWs.

The second effect, shown in red (Author response image 1: right), was driven by a stronger “pre-post” correlation in the sentences ending with unexpected SFWs (cluster-level p = 0.025, 10000 permutations). Specifically, the spatial pattern of predictive activity that was originally produced following the SFW-1 (around 400-700ms prior to the onset of the SFW) correlated with the spatial pattern of brain activity produced between 400-800ms following the onset of unexpected SFWs. We speculate that the detection of the lexical violation on the SFW (during the N400 time window) triggered a re-activation of the originally predicted words, leading to the increased spatial similarity values within this later time window.
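The cluster-level p-values above come from a cluster-based permutation test, which can be sketched as follows. For brevity this sketch works on one-dimensional time courses rather than the two-dimensional pre × post matrices analyzed here (a 2D version would label connected clusters in the matrix instead of contiguous runs), and it builds the null distribution of the maximum cluster mass by randomly flipping the sign of each subject's difference scores. The threshold, the 200 permutations (the analyses above used 10000), and the simulated data are illustrative choices, not the authors' settings.

```python
import random
from math import sqrt
from statistics import fmean, stdev


def t_values(diffs):
    """One-sample t at each time point; diffs is a subjects x time list of lists."""
    n = len(diffs)
    return [fmean(col) / (stdev(col) / sqrt(n)) for col in zip(*diffs)]


def cluster_masses(tvals, thresh):
    """Summed t within each contiguous supra-threshold run of the same sign."""
    masses, current, sign = [], 0.0, 0
    for t in tvals:
        s = 1 if t > thresh else (-1 if t < -thresh else 0)
        if sign != 0 and s != sign:  # the running cluster ends here
            masses.append(current)
            current = 0.0
        if s != 0:
            current += t
        sign = s
    if sign != 0:
        masses.append(current)
    return masses


def cluster_permutation_test(diffs, thresh, n_perm=1000, seed=0):
    """Return observed cluster masses and a permutation p-value for each.

    Null distribution: the largest |cluster mass| after randomly flipping the
    sign of each subject's whole difference time course.
    """
    rng = random.Random(seed)
    observed = cluster_masses(t_values(diffs), thresh)
    null_max = []
    for _ in range(n_perm):
        flipped = []
        for row in diffs:
            s = rng.choice((-1, 1))
            flipped.append([s * x for x in row])
        masses = cluster_masses(t_values(flipped), thresh)
        null_max.append(max((abs(m) for m in masses), default=0.0))
    pvals = [(1 + sum(nm >= abs(m) for nm in null_max)) / (1 + n_perm)
             for m in observed]
    return observed, pvals


# Simulated data: 20 subjects x 20 time points, with an effect at points 5-9.
rng = random.Random(42)
diffs = [[rng.gauss(0, 1) + (2.0 if 5 <= t < 10 else 0.0) for t in range(20)]
         for _ in range(20)]
observed, pvals = cluster_permutation_test(diffs, thresh=2.09, n_perm=200)
```

Using the maximum cluster mass per permutation is what controls the family-wise error rate across all time points at once.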

While these findings are interesting, these interpretations are obviously very speculative. Therefore, in the manuscript itself, we decided to focus on the well-motivated prediction effect preceding the SFW. However, we welcome the opportunity to share these preliminary data and our speculation here in this response to the reviewer’s question.

It is unclear to me why sentences were presented in pairs. The pair-wise presentation of 120 sentence pairs, each constraining for the same sentence-final word, can without doubt induce strategic expectation effects towards the end of the sentence, which have nothing to do with the kind of highly automatized predictive processes postulated in the predictive coding framework. I think that this problem is not solved by the fact that only one of the two sentences contained the constrained-for target word at the SFW position. Actually, the fact that one member of each pair contained an unexpected word could even increase the strategic handling of these sentence pairs.

A) We constructed these sentences in pairs (120 pairs) such that each member of a pair predicted the same word, even though their contexts differed (e.g. “In the crib, there is a sleeping …” and “In the hospital, there is a newborn …”). This was essential to the logic of our design — that the spatial similarity in brain activity produced prior to the onset of the SFW would be greater between members of a pair that predicted the same SFW than between members of a pair that predicted a different SFW. However, as we now emphasize in the Introduction, Results and Materials and methods, during the experiment itself, sentences were presented in a pseudorandom order, with at least 30 other sentences (on average 88 sentences) in between each member of a given pair.

B) Of course, even with this gap between the presentation of members of the same pair, it was important to avoid repetition confounds. This is why, during presentation of the stimuli, we replaced the SFW in one member of the sentence pair with an unpredicted word (therefore avoiding repetition of the SFW).

C) We also considered the possibility that participants may have been more likely to predict a particular word having previously seen this word. This is why we carried out the control analysis (subsection “Spatial RSA: The spatial pattern of neural activity was more similar in sentence pairs that predicted the same versus different words, and this effect began before the onset of the predicted word”), which showed that the spatial similarity effect was just as large when the unexpected SFW of a pair was presented before the expected SFW was presented as when they were presented in the opposite order (see Figure 2—figure supplement 3). In the Discussion (subsection “Unique spatial patterns of neural activity are associated with the prediction of specific words, prior to the appearance of new bottom-up input”), we now discuss in detail how our design might have affected the interpretation of the results.

D) We think that it is unlikely that the inclusion of these unexpected SFWs increased strategic prediction towards the end of any given sentence. Previous studies have shown reduced effects of prediction when the validity of a predictive cue is low (e.g. during semantic priming: Lau et al., 2013; Delaney-Busch et al., 2017; during sentence comprehension: Brothers et al., 2016; Brothers et al., 2017). If anything, then, the inclusion of unexpected SFWs would have reduced our ability to detect an effect in the present study.

E) Having made these specific design points, we want to emphasize that we agree with the reviewer that these were far from naturalistic experimental conditions. As noted above, we see the unique contribution of our study as “providing evidence that, when we know that item-specific lexico-semantic predictions are generated, they are associated with unique spatial and temporal patterns of neural activity”.

In the Introduction, we are now more careful to emphasize up-front that our aim was to use “MEG, together with both spatial and temporal RSA, to ask whether, under experimental conditions known to encourage specific lexico-semantic prediction, distinct words are associated with distinct spatial and temporal patterns of neural activity, prior to the appearance of the predicted input.” And in the Discussion, we state that “these findings pave the way towards the use of these methods to determine whether and when such specific lexico-semantic representations become available as language, in both visual and auditory domains, unfolds more rapidly in real time.”
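The within-pair versus between-pair logic described in point A can be sketched as follows. This is a hypothetical illustration under stated assumptions, not the study's pipeline: each sentence's pre-SFW data are assumed to be a sensors × time list, spatial similarity at each time point is taken to be the Pearson correlation across sensors between two sentences, and every combination of two sentences is labelled within-pair (same predicted SFW) or between-pair. The names `pair_of`, `spatial_similarity` and `within_between` are illustrative.

```python
from itertools import combinations
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (nan if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else math.nan


def spatial_similarity(a, b):
    """One r per time point: correlate the two sensor patterns at that time.

    a and b are data[sensor][time] for two sentences (hypothetical layout).
    """
    n_time = len(a[0])
    return [pearson([ch[t] for ch in a], [ch[t] for ch in b]) for t in range(n_time)]


def within_between(trials, pair_of):
    """Average spatial-similarity time series for within- and between-pair combinations.

    trials: sentence id -> data[sensor][time]
    pair_of: sentence id -> label of the predicted SFW (same label = same pair)
    """
    within, between = [], []
    for i, j in combinations(sorted(trials), 2):
        r = spatial_similarity(trials[i], trials[j])
        (within if pair_of[i] == pair_of[j] else between).append(r)

    def average(curves):
        return [sum(c[t] for c in curves) / len(curves) for t in range(len(curves[0]))]

    return average(within), average(between)
```

In a real analysis the two averaged time series would be computed per participant and then compared statistically across participants; that step is not shown.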

References:

Lau, E. F., Holcomb, P. J., and Kuperberg, G. R. (2013). Dissociating N400 Effects of Prediction from Association in Single-word Contexts. Journal of Cognitive Neuroscience, 25(3), 484-502.

Delaney-Busch, N., Morgan, E., Lau, E., and Kuperberg, G. R. (2017). Comprehenders Rationally Adapt Semantic Predictions to the Statistics of the Local Environment: a Bayesian Model of Trial-by-Trial N400 Amplitudes. CogSci.

Brothers, T., Dave, S., Hoversten, L. J., Traxler, M., and Swaab, T. Y. (2016). Expect the unexpected: Speaker reliability shapes online lexical anticipation. Poster presented at the 8th Society of Neurobiology of Language Conference, London, England.

Brothers, T., Swaab, T. Y., and Traxler, M. J. (2017). Goals and strategies influence lexical prediction during sentence comprehension. Journal of Memory and Language, 93, 203-216. https://doi.org/10.1016/j.jml.2016.10.002

It is also unclear to me why the very slow and un-naturalistic presentation rate was chosen. Again, I think that this can induce strategic processes, as well as increased working memory load, which may influence the RSA results. (I also tend to think that this design was chosen as the ITI preceding the SFW is the most obvious time window to search for predictive pre-activation, see my first point above.

Again, we agree with the reviewer that the experimental conditions in this study were unnatural. We have been more explicit in the Introduction that “the sentences were visually presented at a slow rate of 1000ms per word. This ensured the generation of specific lexico-semantic predictions and guaranteed sufficient time to detect any representationally specific neural activity before the onset of the predicted word.”

We return to this in the Discussion and have specifically pointed out that “It will be important for future work to determine whether similar dynamics are associated with the prediction of upcoming words when bottom-up inputs unfold at faster, more naturalistic rates.”

Given that no results were reported for this 'silent' pre-word time window, I tend to be very critical about interpreting the results as predictive pre-activation of the sentence final word.)

Please see above for our discussion of the timing of the observed effect. Note that we have made it clearer in the Discussion that “the predicted information was not maintained over the relatively long interstimulus interval used in the present study (SOA: 1000ms per word).”

Combined, these points suggest to me that the authors interpret their results too strongly. Predictive pre-activation is claimed in the title, Abstract, and Discussion. I think the authors should generally tone down these claims and provide a more realistic and balanced account for their interesting result.

As noted above, we no longer use the term “pre-activation”, as some readers may interpret it as reflecting the pre-activation of a specific phonological or orthographic lexical form of a word. Instead, we use the more general term “prediction” throughout the revised manuscript. Based on the design and all the control analyses we carried out, we think that our findings provide strong evidence that the prediction of semantic features associated with individual words produced unique spatial patterns of brain activity that were evident before new bottom-up input (i.e. the SFW itself) became available.

In their control analysis, the authors demonstrate that the within-pair similarity is also higher than the similarity calculated on the remaining sentences whose target words belong to the same syntactic class (nouns or verbs). They use this to claim that their result cannot spuriously result from higher-order syntactic or semantic effects. I think this control analysis is nice, but its interpretation goes way too far, as the authors only tested one of many possible such linguistic features. Also, it is unclear why this post-hoc analysis is necessary at all if (as I expect) the authors controlled stringently for such obvious differences in their item construction. Even if the latter were not the case, should it not be possible to select, a posteriori, the sentences for the between-pair analysis out of all possible combinations such that they are optimally matched to the 120 critical pairs?

In the Discussion, we have laid out several possible interpretations of the greater similarity associated with the within-pair versus between-pair sentences, ranging from the prediction of syntactic category, to semantic features, to lower-level word-form features.

A) As discussed above, instead of controlling for the general syntactic category of the predicted SFWs, we explicitly manipulated this factor (i.e. the predicted SFWs could be verbs or nouns) so that we could ask whether the observed within-pair effect reflected only the prediction of syntactic category. We calculated the within-category spatial similarity between all pairs of sentences that predicted the same category of SFW. We compared these within-category spatial similarity values with the within-pair spatial similarity values. We found that the within-category spatial similarity values were significantly smaller than the within-pair spatial similarity values. These findings suggest that the within-pair versus between-pair spatial similarity effect did not simply reflect the prediction of these words’ broad syntactic category, but instead reflected prediction at the level of the semantic properties and features that defined the meanings of the predicted words, and perhaps lower-level orthographic and/or phonological features. This has been stated in the Discussion.

B) In the Discussion, we also further explain why the current study cannot address the question of whether or not the observed effects reflect the prediction of just the semantic features, or also the orthographic or phonological properties of the predicted words: “This is because, for the most part, there is a one-to-one correspondence between the semantic features and the phonological or orthographic forms of words. However, the methods described here provide one way of addressing this question in future studies. For example, by examining spatial similarity of sentence pairs that constrain for words with shared orthographic features but differing in their meanings (such as homonyms), it should be possible to dissociate the prediction of orthographic/phonological representations from the prediction of semantic features associated with a given lexico-semantic item.”
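The within-category control in point A amounts to regrouping the same pairwise similarity values by a coarser label. The sketch below is hypothetical: it assumes a precomputed similarity value for every combination of two sentences (for example, r averaged over the pre-SFW window), and it assumes that combinations from the same pair are excluded from the within-category group, which the text above does not specify. The names `sim`, `pair_of` and `category_of` are illustrative.

```python
from itertools import combinations
from statistics import mean


def grouped_similarity(sim, pair_of, category_of):
    """Mean similarity and count for three kinds of sentence combinations.

    sim[(a, b)]: similarity for sentence ids a < b (hypothetical input)
    pair_of[a]: predicted-SFW label; category_of[a]: e.g. 'noun' or 'verb'
    """
    groups = {"within_pair": [], "within_category": [], "different_category": []}
    for a, b in combinations(sorted(pair_of), 2):
        r = sim[(a, b)]
        if pair_of[a] == pair_of[b]:
            groups["within_pair"].append(r)
        elif category_of[a] == category_of[b]:
            # Same syntactic category of predicted SFW, but a different pair.
            groups["within_category"].append(r)
        else:
            groups["different_category"].append(r)
    return {name: (mean(vals), len(vals)) for name, vals in groups.items() if vals}
```

The comparison of interest is then `within_pair` versus `within_category`: if only the syntactic category were predicted, the two means should not differ.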

Concerning the source analysis of the temporal similarity analysis: I am not expert enough in MEG beamforming to really judge this, but it appears to me that the source localization shown in Figure 3B does not seem to be a plausible generator of the scalp distribution of the difference effect shown in Figure 3A, left-most panel?

The source localization in Figure 3B shows the difference between the temporal similarity associated with the within-pair and the between-pair sentences. The scalp distribution shown in the left-most panel of Figure 3A indicates only the distribution of the temporal similarity values for the within-pair sentences. Therefore, the source localization in Figure 3B (left inferior and medial temporal, extending to the cerebellum) should be compared with the scalp distribution of the difference shown in the right-most panel of Figure 3A (most prominent over central and posterior regions). In the revised Results, we have made this explicit: “The source localization of the difference (corresponding to the difference of the topographic distribution, see Figure 3A: right panel) is shown in Figure 3B.”
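The distinction between a within-pair map and a difference map can be made concrete with a small hypothetical sketch. It assumes, for illustration only, that temporal similarity is computed per sensor by correlating two sentences' time courses within the analysis window, so that each sensor receives a within-pair value, a between-pair value and their difference. The toy data are made up: sensor 0 carries pair-specific time courses, while sensor 1 has the same time course in every sentence, so it shows high within-pair similarity but no difference.

```python
from itertools import combinations
from statistics import mean
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (nan if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else math.nan


def temporal_similarity(a, b, start, stop):
    """One r per sensor: correlate the two time courses within samples [start, stop)."""
    return [pearson(ca[start:stop], cb[start:stop]) for ca, cb in zip(a, b)]


def sensor_maps(trials, pair_of, start, stop):
    """Per-sensor within-pair map, between-pair map, and their difference."""
    within, between = [], []
    for i, j in combinations(sorted(trials), 2):
        r = temporal_similarity(trials[i], trials[j], start, stop)
        (within if pair_of[i] == pair_of[j] else between).append(r)
    n_sensors = len(within[0])
    w = [mean(m[s] for m in within) for s in range(n_sensors)]
    b = [mean(m[s] for m in between) for s in range(n_sensors)]
    return w, b, [x - y for x, y in zip(w, b)]


rising, falling, shared = [1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0], [1.0, 3.0, 2.0, 4.0]
trials = {  # data[sensor][time]; sensor 0 is pair-specific, sensor 1 is shared
    "s1": [rising, shared], "s2": [rising, shared],
    "s3": [falling, shared], "s4": [falling, shared],
}
pair_of = {"s1": "baby", "s2": "baby", "s3": "roses", "s4": "roses"}
within_map, between_map, difference_map = sensor_maps(trials, pair_of, 0, 4)
```

Here both sensors have a within-pair value of 1, but only sensor 0 has a non-zero difference, which is why a source estimate of the difference need not resemble the within-pair topography.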

A lot of information about the stimulus construction and item materials is missing. The authors describe how sufficiently high cloze probability was assured in the sentences of the 120 pairs. However, many further aspects are important, like word category, word frequency, concreteness, etc. In particular, I think that it is important to assure that such obvious lexical and semantic properties are (a) balanced between the within-pair and the between-pair comparisons, as these are the final statistical contrast on which all interpretations are based; (b) that similar information is provided for the two words preceding the sentence-final word. (c) Furthermore, I think it is important to also provide data for the cloze probability of the pre-final words, in particular given that this is where the effect is found (see also my first point above).

A) We do not report the lexical features of the predicted SFW itself because any systematic difference in the lexical features of the within-pair predicted SFW (mean difference = 0) and the between-pair sentences (the mean difference will depend on the variability across items) is an intrinsic feature of our design.

B) As mentioned above, we extracted various lexical properties of the SFW-1 itself in all 240 of our sentences (visual complexity, word frequency, syntactic class). None of these factors differed systematically between pairs of contexts that predicted the same SFWs (i.e. within-pairs) and pairs of contexts that predicted different SFWs (i.e. between-pairs). We were unable to examine the concreteness or imageability of the SFW-1. This is because, as shown in the full set of stimuli (Figure 1A—source data 1), the SFW-1 could either be a content word (verb, noun, adjective, adverb) or a function word (pronoun, classifier, conjunction, particle, prepositional phrases). Concreteness values for these words were not available in existing Chinese corpora. However, given the heterogeneity of the SFW-1, we think that the concreteness of the SFW-1 is unlikely to account for the observed effect.

C) As explained above, we also carried out a control analysis with a subset of stimuli with exactly the same SFW-1. This analysis revealed no evidence of an increased spatial similarity effect associated with lexical processing of the SFW-1.

D) As also noted above, we ran a separate cloze norming study to examine the probability of the SFW-1. We found that the cloze probability of the SFW-1 was relatively low: 11% on average across all items. Also, the difference in cloze probability of the SFW-1 was matched between the within-pair sentences (17.00% cloze difference) and the between-pair sentences (17.28% cloze difference): t(28678) = -0.14, p = 0.89. We hope this provides sufficient evidence to rule out the possibility that the observed effect was explained by the expectation or ease of integration of the SFW-1.

E) Given that (1) the cloze study above suggested that the contextual constraint only became strong after the presentation of the SFW-1, and (2) we did not actually see any evidence of a spatial similarity effect following the SFW-2, we did not extract the lexical characteristics of the SFW-2 in all our sentences. However, we did extract the number of words, the number of clauses and the syntactic complexity of the sentence contexts up until and including SFW-1. Again, none of these factors differed systematically between pairs of contexts that predicted the same SFWs (i.e. within-pairs) and pairs of contexts that predicted different SFWs (i.e. between-pairs).
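The degrees of freedom reported in point D, and again for the sentence-length comparison in the response to Reviewer #2, t(28678), appear consistent with treating every combination of two of the 240 sentences as one observation and comparing the 120 within-pair combinations with all remaining between-pair combinations using a pooled-variance (Student's) independent-samples t-test. This is our reading of the numbers rather than a description taken from the methods; the arithmetic and a stdlib-only version of that test are sketched below (`pooled_t` is an illustrative name).

```python
from math import comb, sqrt
from statistics import mean, variance

n_sentences = 240
n_combinations = comb(n_sentences, 2)      # every pair of sentences: 28680
n_within = n_sentences // 2                # sentences were built in pairs: 120
n_between = n_combinations - n_within      # all other combinations: 28560
df = n_within + n_between - 2              # Student's t-test df: 28678


def pooled_t(x, y):
    """Student's independent-samples t statistic (pooled variance) and its df."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / sqrt(pooled_var * (1 / nx + 1 / ny))
    return t, nx + ny - 2
```

Because each sentence enters 239 of these combinations, the observations are not fully independent; the sketch only reproduces the arithmetic behind the reported degrees of freedom.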

Parts of the Discussion section and interpretation of the data are far too speculative, including the discussion of specific semantic properties that might be activated. For example, the authors write that "These findings provide strong evidence that unique spatial patterns of activity, corresponding to the pre-activation of specific lexical items, can be detected in the brain." I think this is not warranted given the presented data. I am picking out a few examples in the following:

The authors make several claims as to the specific nature of lexico-semantic preactivation, which are also not supported by the reported study: "… the particular spatial pattern of brain activity associated with the pre-activation of the word baby may have reflected the pre-activation of spatially distributed representations of semantic features such as little, cute, and chubby, while.… the pre-activation of the word roses may have reflected the pre-activation of semantic properties such as red and beautiful." This, in my view, is overly speculative and at the same time suggests to the superficial reader a level of detail that is by no means reached in this study.

In the revised version of the manuscript, we have been more careful in our wording to explain what we think that we can and cannot infer from these data. We continue to think that the data provide compelling evidence that “unique spatial and temporal patterns of neural activity are associated with distinct lexico-semantic predictions”. As discussed above, we now provide additional data and analyses to support this interpretation and to rule out an interpretation that the similarity effects observed were driven either by the lexical features or the predictability of the SFW-1.

As in any cognitive neuroscience study, we interpret our findings in relation to the prior literature. The reason why we designed this study in the first place — and why we think that the question is interesting — is because there was a priori reason to believe that the particular sets of semantic features associated with different words — or different predicted words — are associated with distinct patterns of spatial activity. As noted in the Introduction: “the various semantic features and properties associated with words and concepts are represented in the brain across spatially distributed multimodal networks (Damasio, 1989; Price, 2000; Martin and Chao, 2001) … For example, the particular set of semantic features and properties associated with the concept, <baby> (e.g. <human>, <small>, <cries>), might be represented by a particular spatially distributed pattern of neural activity, whereas the semantic features and properties associated with the concept, <rose> (e.g. <plant>, <scalloped petals>, <fragrant smell>) might be represented by a different spatially distributed pattern of neural activity.” This idea has a long history in cognitive neuroscience, and, as we also note, there is interesting evidence that it may be possible to capture evidence of distributed representations using spatial RSA (e.g. Devereux et al., 2013).

In the revised Introduction, we hope that we have explained the logic of our design more clearly, pre-empting our interpretation in the Discussion: “If, following a constraining context (e.g. “In the crib, there is a sleeping …”), the prediction of a unique lexico-semantic item (<baby>) is represented by a unique spatial pattern of brain activity, then this spatial pattern should be more similar following another context that predicts the same word, i.e. within-pair (e.g. “In the hospital, there is a newborn …”) than following another context that predicts a different word, i.e. between-pair (e.g. “On Valentine’s day, he sent his girlfriend a bouquet of red …”)”.

In the Discussion itself, we are more careful to make it clear that this is an interpretation of our results: “Instead, we suggest that the spatial similarity effect reflected similarities at the level of the semantic properties and features that defined the meanings of the predicted words … We suggest that our analysis picked up distinct spatially distributed patterns of neural activity that corresponded to the particular sets of features associated with distinct predicted words. For example, the prediction of the set of semantic properties and features corresponding to the word <baby> (e.g. <human>, <small>, <cries>) may have been reflected by the activation of a particular spatially distributed network that differed from the network reflecting the prediction of the set of semantic features corresponding to a different predicted word, <roses> (e.g. <plant>, <scalloped petals>, <fragrant smell>).”

We also offer other interpretations, e.g. that the patterns may have reflected the pre-activation of unique representations of the orthographic or phonological form of specific predicted words (Discussion section).

Another example involves the claim that "this may be because different properties associated with particular words became available at different times. For example, the different semantic features (little, cute, chubby) associated with.… baby might have been recruited at different time points", offered as a possible account of why there were effects only along the diagonal. However, again, this is not grounded in any empirical data, and it tends to overlook the fact that, even along the diagonal, there was no persistent effect beyond 500 ms before word onset.

With respect to the neural mechanism, the authors state that "the absence of an effect off the diagonal suggests that the spatial patterns associated with pre-activation evolved dynamically over time". However, there is no evidence to support this claim. In particular, given that there is also no persistent effect along the diagonal, the pattern most likely indicates that there was no sustained pre-activation over time.

As noted above, we have acknowledged that the interpretation of the dynamically evolving spatial patterns remains speculative.
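The diagonal versus off-diagonal point raised above can be illustrated with a toy cross-temporal similarity matrix, in which the sensor pattern of one sentence at time t1 is correlated with the sensor pattern of another sentence at time t2. This is a hypothetical sketch with made-up data, not a reanalysis, and it does not show which account applies to the present data: when a shared pattern changes from one time point to the next, similarity is high only on the diagonal (t1 = t2), whereas a sustained pattern also yields high off-diagonal values.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (nan if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else math.nan


def cross_temporal_similarity(a, b):
    """M[t1][t2] = correlation across sensors between a at t1 and b at t2.

    a and b are data[sensor][time] for two sentences (hypothetical layout).
    """
    n_time = len(a[0])

    def pattern(data, t):
        return [ch[t] for ch in data]

    return [[pearson(pattern(a, t1), pattern(b, t2)) for t2 in range(n_time)]
            for t1 in range(n_time)]


# Toy data, 3 sensors x 2 time points.
evolving = [[1.0, 3.0], [2.0, 1.0], [3.0, 2.0]]   # shared pattern changes over time
sustained = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]  # shared pattern stays the same
```

For `evolving`, the diagonal entries are 1 and the off-diagonal entries are -0.5; for `sustained`, every entry is 1.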

Reviewer #2:

This manuscript presents research aimed at investigating the hypothesis that specific words are pre-activated in the brain given a constraining semantic context. The authors test this hypothesis by presenting highly constraining sentences such that the final word in each sentence can be easily predicted. Moreover, they do so such that pairs of sentences are likely to be predicted to finish with the same word. They then examine the similarity of spatial and temporal patterns in MEG preceding the presentation of the final words. In particular they compare the similarity in these patterns between pairs of sentences with the same final predicted word and pairs of sentences with different final words. They find that both the spatial patterns and temporal patterns are MORE similar for sentences where the same final word is predicted than for sentences where different final words are predicted. They take this as evidence that specific lexico-semantic predictions are made by the brain during language comprehension.

This was a very well designed piece of research with interesting and compelling results. The manuscript was well written and the discussion seemed reasonable.

I have a few relatively minor comments and queries:

1) The nice study design included ensuring that the paired sentences didn't actually finish with the same word and that sometimes the sentence with an unexpected word would appear first and sometimes the sentence with the expected word would appear first. The authors argue that this means the results are not simply explainable on the basis that subjects might retain the expected final word in memory when reading the second sentence of a pair. However, it seems to me that, even though the unpredicted word has a much lower cloze, the subject might still retain that unexpected word in memory when reading the second sentence of a pair. It doesn't seem that likely to me, but it's conceivable. I mean when a subject sees the unexpected word 'child', they might be more likely to retrieve that word when they are next presented with a sentence for which 'baby' is the "correct" prediction, but for which 'child' is a reasonable final word. So, much as I like the design, I do think that retrieval of a previously stored word remains possible. One thing that I was unclear on (and sorry if I just missed it) was the actual ordering of the sentence presentation. Did the two members of a pair of sentences always appear consecutively? If so, this would make the idea of retrieval even more likely. If the sentences are all just presented in a random order, then I guess it is unlikely. Again, sorry if I missed that.

A) We apologize for not making this clearer in the previous version of the manuscript. In the revised Introduction, we now clarify that “During the experiment, sentences were presented in a pseudorandom order, with at least 30 other sentences (on average 88 sentences) in between each member of a given pair.” In the Results, we state that “The sentences were constructed in pairs (120 pairs) that strongly predicted the same sentence-final word (SFW), although, during presentation, members of the same pair were separated by at least 30 (on average 88) other sentences.” In the Materials and methods, we state that “the two members of each pair were presented apart from each other, with at least 30 (on average 88) sentences that predicted different words in between.”

B) We now discuss the possible influence of the order of the presentation of the two members of each sentence pair in the revised Discussion: “A second set of alternative interpretations might acknowledge that the increase in spatial similarity detected in the within-pair sentences reflects activity related to the prediction of a specific SFW. However, instead of attributing the effect to the predicted representation itself, they might attribute it to participants’ recognition of a match between the word that they had just predicted and a word that they had actually seen as the SFW earlier in the experiment. This seems unlikely because we found that the spatial similarity effect was just as large when the unexpected SFW of a pair was presented before the expected SFW, as when the expected SFW was presented first (see Figure 2—figure supplement 3). It is, however, conceivable that participants recognized a match between the word that they had just predicted and a word that they had predicted earlier in the experiment (even though this predicted word was never observed). For example, there is some evidence that a predicted SFW can linger in memory across four subsequent sentences, even if it is not actually presented (Rommers and Federmeier, 2018). This seems less likely to have occurred in the present study, however, where each member of a sentence pair was separated by at least 30 (on average 88) other sentences.”

2) A minor query – were there different numbers of words in the sentences? Or always the same? And, relatedly, did the subject always know when the final word was going to appear? It's just that a pet worry of mine is the generalizability of language research done on isolated sentences that are very regular in their makeup. I imagine subjects get into an unusual mindset with linguistic processes overlapping with more general decision making strategies that may confound things. I don't think that's an issue here for two reasons: 1) it wouldn't explain why the data are more similar within sentences than between and 2) subject didn't have to make deliberative decisions at the end of each sentence. But still, it would be nice to get a sense of the variability (or lack of it) in the structure of the sentences.

In the Materials and methods, we have provided more information on the length of the contexts up until and including the SFW-1 (ranging from 4 to 12 words). Thus, in answer to the reviewer’s question, the full sentences varied considerably in length, from 5 to 13 words including the SFW, so participants would not have known in advance when the SFW was going to appear.

We also now make it clear that the difference in the number of words was matched between pairs that constrained for the same word (within-pairs) and pairs that constrained for a different word (between-pairs): t(28678) = -1.26, p = 0.20.

3) Very minor – in subsection “Design and development of stimuli” there seem to be 109 pairs of sentences above 70% cloze and 12 that were lower. That makes 121, not 120.

We thank the reviewer for pointing this out. We have clarified that 11 pairs of sentences had cloze values that were lower than 70%.

4) In subsection “MEG data processing” the authors say "Within this 4000ms epoch, trials contaminated…were…removed…" How is there a trial within this epoch? Is the trial not the entire epoch? Or am I misunderstanding what you mean by a trial?

We apologize for the confusion. The trial refers to the entire epoch. In the Materials and methods, we have changed the sentence to “Trials (i.e. whole epochs) contaminated with muscle or MEG jump artifacts were identified and removed using a semi-automatic routine.”

Reviewer #3:

My two main concerns are as follows. First, that the specific method used for the RSA analysis, which separated the spatial and temporal dimensions of the data and used one dimension (spatial) to narrow down the testing time window of the other (temporal), also narrowed down the scope of the effects uncovered. Second, that the sentences within- and between-pairs were insufficiently matched in terms of the syntactic and lexico-semantic characteristics of the words directly preceding the critical predicted word, so that the observed effects could have been related to those differences rather than to pre-activation. These two issues would need to be addressed before the strength and interpretation of the current set of effects could be fully evaluated.

Major points:

1) My main concern in terms of analysis methods is that the separation of the temporal and spatial components of the RSA analysis unnecessarily limited the kind of effects that were uncovered. For the calculation of both the spatial similarity time series and the cross-temporal spatial similarity matrices, all MEG sensors were included, and hence the effects would be greatest at time points where many sensors simultaneously show similar activity for pairs of sentences. This means that strong, temporally extended, but spatially localised (or insufficiently distributed) effects might be missed. This is especially so because, for determining the significant time windows, vectors were averaged across subjects, which means that localised effects had even less chance of surviving, given that the same effects in different subjects could appear in different sensors (due to differences in head shape, etc.). A further concern for the temporal RSA is that the time window in which the effects were tested (-880 to -485 ms, SFW-aligned) was derived on the basis of the spatial analysis, where spatially continuous differences between within- and between-pair similarity values were found after applying an arbitrary cut-off (r > 0.04).

To avoid these issues, a spatiotemporal RSA could be carried out directly in source space, or first in sensor space (across sensors and time points), with the significant spatiotemporal clusters then source-localised. For example, if beamforming is used to derive single-trial source estimates, then data RDMs can be derived using a modified version of the searchlight approach (e.g. Nili et al., 2014; Su et al., 2012 and 2014). For every trial, at every grid point and every time step (every 1/5/10 ms), a 3D data matrix is extracted, consisting of the activation from n neighbouring grid points and n time samples. Then, for each pair of sentences predicting the same word, these data matrices are correlated. The resulting values are averaged across all within-pairs, producing grid-point-by-time-point spatiotemporal correlation values for the within-pair condition. The same can be repeated for between-pairs. A paired t-test can then be used to compare within- and between-pair values across time and grid space, with significant spatiotemporal clusters of differences determined by cluster permutation. If no major effects have been missed by separating the spatial and temporal dimensions of the data, then a spatiotemporal RSA would further validate the current set of results.

We thank the reviewer for the suggestion.

A) In the Results, we have pointed out that the analysis approach that we took is fairly conservative: “it was limited to the time window that showed a spatial similarity effect, and so it may not have captured more extended temporal similarity effects that were not accompanied by a spatial similarity effect”. As pointed out by the editor, “this methodological concern could explain the absence of some effects in the existing analyses, but not the presence of reliable effects”. Also, “substantial correction for multiple comparisons in the searchlight analysis might reduce sensitivity”. Therefore, we decided not to take this approach in the current study. Rather, “we were interested, a priori, in any functional relationship between these measures, i.e. whether the spatial similarity effect reflected brain activity associated with the prediction of spatially distributed semantic representations, and whether the temporal similarity effect reflected brain activity associated with temporal binding of these spatially distributed representations.”

However, we state that “in order to fully exploit the spatiotemporal pattern of the data, future studies could examine the spatial and temporal patterns simultaneously using a spatiotemporal searchlight approach (Nili et al., 2014; Su et al., 2012; Su et al., 2014).”
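The core step of the spatiotemporal searchlight suggested by Reviewer #3 could be sketched roughly as follows. This is a hypothetical, stdlib-only illustration: it assumes single-trial source estimates stored as data[grid_point][time], a precomputed neighbour list for each grid point, and a symmetric window of ±`half_width` samples clipped at the epoch edges. For one pair of trials it returns a grid-point × time map of correlations between the flattened neighbourhood × time-window patches. Averaging over within- and between-pair combinations, the paired t-test and the cluster permutation step are not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (nan if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else math.nan


def searchlight_similarity(a, b, neighbours, half_width):
    """r[p][t]: correlation between trials a and b within the searchlight at (p, t).

    a, b: data[grid_point][time] (hypothetical single-trial source estimates)
    neighbours[p]: grid-point indices in p's searchlight, including p itself
    half_width: number of time samples on each side of t (clipped at the edges)
    """
    n_points, n_time = len(a), len(a[0])
    r = []
    for p in range(n_points):
        row = []
        for t in range(n_time):
            lo, hi = max(0, t - half_width), min(n_time, t + half_width + 1)
            # Flatten the neighbours x time-window patch of each trial into a vector.
            patch_a = [a[q][s] for q in neighbours[p] for s in range(lo, hi)]
            patch_b = [b[q][s] for q in neighbours[p] for s in range(lo, hi)]
            row.append(pearson(patch_a, patch_b))
        r.append(row)
    return r
```

Correlating flattened patches combines spatial and temporal information in a single value, which is the property the reviewer argues could capture localised but temporally extended effects.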

2) I have several questions about the experimental stimuli. Firstly, were the experimental sentences, both between- and within-pairs, controlled for sentence length (number of words) and syntactic complexity (number of clauses, presence of embedded dependencies)? The issue would arise if, for example, all within-pairs happened to have the same syntactic structure/complexity, while between-pairs had mismatching or different structure/complexity. Then the increased similarity before the SFW for the within-pairs could potentially be attributed to similar demands of grammatical/syntactic processing, while the decreased similarity for between-pairs would be driven by differences in these processing demands. The authors cover this potential caveat in subsection “Unique spatial patterns of neural activity are associated with the prediction of specific words, prior to the appearance of new bottom-up input” of the Discussion, and argue that in this case we would see the within- and between-pair difference arise earlier. However, while such differences could have been building up, they also could have become significant only closer to the end of the sentence. To exclude this option, differences between within- and between-pair sentences should be reported.

Secondly, were the SFW-1 words (the word directly before the SFW) controlled for any of the following characteristics across conditions: syntactic class, frequency, or semantic characteristics such as imageability and concreteness? Again, if the within-pairs matched in terms of SFW-1 characteristics more closely than the between-pairs, effects in the 'prediction' time-window could be driven by similarities in SFW-1 processing rather than by pre-activation of the SFW. Since the critical claim of this paper is that increases in the spatial and temporal correlation of neuronal activity for the averaged within-pairs are driven by pre-activation of the SFW, it is critical to exclude all of the effects described above.

We fully agree that it is very important to rule out other factors that could lead to a greater similarity in brain activity on the SFW-1 in the within-pair sentences than the between-pair sentences. We have made the following major changes to the manuscript to address this:

A) In the Materials and methods, we now state that we measured: (1) the number of words, the number of clauses, and the syntactic complexity of the sentence contexts up until and including SFW-1; (2) various lexical properties of the SFW-1 itself (i.e. visual complexity, word frequency, syntactic class); and (3) the predictability (as operationalized by cloze probability) of the SFW-1. We showed that none of these factors differed systematically between pairs of contexts that predicted the same SFW (i.e. within-pairs) and pairs of contexts that predicted a different SFW (i.e. between-pairs).

In Chinese, it is difficult to measure the orthographic or phonological features of the SFW-1 as a whole. This is because the characters within each word/phrase of the SFW-1 had distinct orthographic and phonological features. Also, as shown in the full set of stimuli (Figure 1A—source data 1), the SFW-1 could either be a content word (verb, noun, adjective, adverb) or a function word (pronoun, classifier, conjunction, particle, prepositional phrase). This makes it difficult to examine the concreteness or imageability of the SFW-1 in all sentences (there is no available Chinese corpus listing all these words). However, given the heterogeneity of the SFW-1, we think that these factors are unlikely to have influenced the observed effect.

B) We carried out a new control analysis that aimed to fully exclude the possibility that the spatial similarity effect was driven by bottom-up processing of the SFW-1 rather than by anticipatory processing of the SFW itself. In this control analysis, we selected a subset of between-pair sentences that contained exactly the same SFW-1, but nonetheless predicted a different SFW. Then we selected sentences that constrained for these same SFWs (within-pairs), but which differed in the SFW-1. We then compared the spatial similarity between these two subsets of sentence pairs. If the increased spatial similarity associated with the within-pairs versus between-pairs was due to lexical processing of the SFW-1, then the spatial similarity should be greater in sentence pairs containing exactly the same SFW-1 (i.e. in the subset of between-pairs) than in sentence pairs that predicted the same SFW (i.e. in the subset of within-pairs). We found no evidence for this. Instead, the spatial similarity remained larger for the within-pairs than the between-pairs (although in this subset analysis, the difference only approached significance due to the limited statistical power).
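The logic of this control comparison can be sketched in Python. This is a simplified illustration, not the authors' pipeline, and it rests on stated assumptions: each sentence is summarized by a single spatial pattern (one value per MEG sensor at one time sample in the pre-SFW window), spatial similarity is taken as the Pearson correlation between two sentences' patterns, and the within-pair subset keeps every pair that predicts the same SFW with different SFW-1 words, rather than only the within-pairs for the SFWs that appear in the selected between-pairs. The function names, SFW-1 labels, and sensor values are hypothetical.

```python
from itertools import combinations

import numpy as np


def control_pairs(sfw, sfw_minus1):
    """Split sentence pairs for the control comparison (simplified sketch).

    between: pairs sharing exactly the same SFW-1 but predicting different SFWs
    within:  pairs predicting the same SFW but with different SFW-1 words
    """
    within, between = [], []
    for i, j in combinations(range(len(sfw)), 2):
        same_sfw = sfw[i] == sfw[j]
        same_prev = sfw_minus1[i] == sfw_minus1[j]
        if same_sfw and not same_prev:
            within.append((i, j))
        elif not same_sfw and same_prev:
            between.append((i, j))
    return within, between


def mean_spatial_similarity(patterns, pairs):
    """Average Pearson r between the sensor patterns of each sentence pair.

    patterns: array of shape (n_sentences, n_sensors); assumed here to hold
    MEG values at a single time sample (an illustrative simplification).
    """
    rs = [np.corrcoef(patterns[i], patterns[j])[0, 1] for i, j in pairs]
    return float(np.mean(rs))


# Toy data: four sentences, four sensors (invented values, not the study's data).
sfw = ["baby", "baby", "rose", "rose"]
sfw_minus1 = ["A", "B", "A", "C"]  # hypothetical SFW-1 labels
patterns = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [1.0, 2.0, 3.0, 5.0],
    [4.0, 3.0, 2.0, 1.0],
    [4.0, 3.0, 1.0, 1.0],
])

within, between = control_pairs(sfw, sfw_minus1)
print("within-pairs:", within, "between-pairs:", between)
print(mean_spatial_similarity(patterns, within),
      mean_spatial_similarity(patterns, between))
```

Under the lexical-processing account, the between-pair subset (identical SFW-1) would be expected to show the higher mean similarity; the comparison in the last line is the quantity that this account and the prediction account disagree about.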

C) In the Discussion, we now explicitly discuss why the spatial similarity effect cannot be explained by the contexts of the sentence pairs or the lexical properties of the SFW-1.

3) This point is related to the conclusions drawn by the authors in the Discussion section about the nature of the pre-activated representations. The authors suggest that the effects observed in the pre-SFW window could be driven by orthographic or phonological features of the predicted words. Have any of the analyses they proposed (subsection “Unique spatial patterns of neural activity are associated with the prediction of specific words, prior to the appearance of new bottom-up input”) been carried out? Since the sentences used in this study were indeed very constraining, SFW pre-activation of the perceptual features of strongly predicted words would be expected under the predictive processing/coding approach.

In the Discussion, we have explained why the current study cannot address the question of whether or not the observed effects reflect the prediction of just the semantic features or also the orthographic or phonological features of the predicted words: “It is also possible that the increased spatial similarity in association with sentence pairs that predicted the same word reflected similarities of predictions generated at a lower phonological and/or orthographic level of representation. On this account, comprehenders not only predicted the semantic features of words, but they also pre-activated their word-forms. The present study cannot directly speak to this hypothesis. This is because, for the most part, there is a one-to-one correspondence between the semantic features and the phonological or orthographic forms of words. However, the methods described here provide one way of addressing this question in future studies. For example, by examining the spatial similarity of sentence pairs that constrain for words that share orthographic features but that differ in their meanings (homonyms), it should be possible to dissociate the prediction of orthographic/phonological representations from the prediction of semantic features associated with a given lexico-semantic item.”

[Editors' note: further revisions were requested prior to acceptance, as described below.]

While the manuscript has been much improved, there are two remaining issues that I think need to be addressed before acceptance, as outlined below:

1) There are too many places in the Introduction and Discussion in which I think the authors aren't thinking critically enough about whether it is only their preferred "generative and predictive" view that could explain the present findings. My view is that many other accounts could also explain their findings. Specifically, any model which: (i) activates a cumulative semantic representation of sentence meaning, and (ii) emphasises processing speed and efficiency such that semantic representations that are strongly implied by the words read so far, but not yet directly expressed in words are activated – can also account for the current findings. There are many such models in the literature, but most notable (to my mind) is the "sentence gestalt" model from St John and McClelland, 1990 that has been recently updated by Rabovsky et al., 2018, and can predict the magnitude of EEG N400 responses in a wide range of sentence processing paradigms. To my knowledge this is not a model which is explicitly "generative and predictive" and yet I think it very likely that RSA analysis of the sentence gestalt representations generated by this model could simulate the results of the present study. While I don't think that the authors need to do the work to explore whether the model *can* simulate their findings, I do think that it is in their interests to offer a more balanced overview of the literature and to more precisely explain what sort of computational model is implied by their findings.

Thank you for bringing up these points.

We agree that the idea that the brain predicts semantic features associated with specific words does not follow specifically from the type of generative framework of language comprehension sketched out in section 5 of Kuperberg and Jaeger, 2016 or by Kuperberg, 2016. In the revised Introduction, we have removed all mention of a generative framework. Rather, we simply state, “Prediction is hypothesized to be a core computational principle of brain function (Clark, 2013; Mumford, 1992). During language processing, probabilistic prediction at multiple levels of representation allows us to rapidly understand what we read or hear by giving processing a head start (see Kuperberg and Jaeger, 2016, for a review).” (Note that we cite Kuperberg and Jaeger here as a comprehensive review of a large literature on prediction at multiple levels of representation — we only discussed the generative framework in the final section of that paper).

In the Discussion, in response to a reviewer, we brought up the generative framework more specifically to explain the earliness of the spatial and temporal similarity effects: the prediction of the SFW was generated at the first point in time at which participants had sufficient information to unambiguously generate this prediction, which was after the onset of the penultimate word. We suggested that, in the sentence “In the crib, there is a sleeping …”, as comprehenders accessed the meaning of the word, <sleeping>, they may have also predicted the semantic features of <baby>.

We argued that “this type of account follows from a generative framework of language comprehension in which, following highly constraining contexts, comprehenders are able to predict entire events or states, along with their associated semantic features…”. Here, we referenced Kuperberg and Jaeger, 2016 (referring to section 5) as well as Kuperberg, 2016 — papers in which we had outlined what this type of framework might look like at Marr’s computational level of analysis. The recent paper by Rabovsky, Hansen and McClelland, 2018, and its predecessor (St John and McClelland, 1990) describe models implemented at Marr’s algorithmic level of analysis. They share similar assumptions to those outlined by Kuperberg and Jaeger, 2016 section 5 and Kuperberg, 2016. We now include both citations at this point.

Regarding the editor’s note that the latter two models “are not explicitly generative and predictive": As discussed by McClelland, 2013, many connectionist models, although implemented at Marr’s algorithmic level of analysis, are inherently generative and probabilistically predictive, with close links to probabilistic Bayesian frameworks. The model by Rabovsky, Hansen and McClelland, 2018, probabilistically infers hidden causes (events) after encountering sequential inputs, and it is therefore both generative and probabilistically predictive. Indeed, it is characterized as such at the beginning of the Materials and methods: “The model environment consists of [sentence, event] pairs probabilistically generated online during training according to constraints embodied in a simple generative model”. The framework outlined in section 5 of Kuperberg and Jaeger is similarly generative and probabilistically predictive, and we believe that the two frameworks share many core assumptions.

There is, however, perhaps one relevant difference between the assumptions of the probabilistic framework outlined by Kuperberg, 2016, and the model implemented by Rabovsky, Hansen and McClelland, 2018: Kuperberg, 2016, is clear that the N400 primarily reflects the (subjective) probability of semantic features associated with an input (word or other stimulus), given the probability distribution over the latent causes (events) inferred just before the semantic features of the incoming word become available from the bottom-up input. Rabovsky, Hansen and McClelland, 2018, however, do not explicitly include a semantic features layer in their model. On the other hand, they do include statements that suggest that what they are indexing is, in fact, changes in activity at the level of semantic features associated with an input word, given the event predicted by the preceding context (e.g. “The N400 corresponds to the amount of unexpected semantic information in the sense of Bayesian surprise”; the model “provides a basis (together with connection weights in the query network) for estimating these probabilities [of semantic features] when probed.”). It is currently unclear to us whether this simply amounts to a difference in modeling approach, or to a true difference in assumptions about architecture.

As regards the current set of findings, however, we find it helpful to understand the effects observed as reflecting commonalities in the predicted semantic features associated with the prediction of specific words, rather than purely reflecting similarities at the level of the entire event (or shift in state to get to this event). In other words, while it may be that the representation of semantic features associated with an individual word is inherently tied to the event being inferred, we still find it helpful to refer to “semantic features” associated with this word descriptively, both in relation to the N400 and in relation to the current findings. To be more specific, in the paired sentences, “In the crib there is a sleeping…” and “In the hospital, there is a newborn…”, we think that the increased within-pair spatial similarity effects observed ultimately reflected the predicted semantic features associated with <baby>, rather than the similarities between the two predicted events as a whole: the <baby sleeping in the crib> event and the <newborn baby in the hospital> event. These two events are distinct except for the presence of the semantic features, <baby>. We have therefore added an additional sentence to make this clear, and here we reference Kuperberg, 2016, who is explicit in discussing the N400 as reflecting the probability of encountering a given set of semantic features, given the agent’s current probabilistic beliefs about the event being communicated.

This section in the Discussion now reads as follows:

“…. This provides evidence that the prediction of the SFW was generated at the first point in time at which participants had sufficient information to unambiguously generate this prediction. For example, in the sentence “In the crib, there is a sleeping …”, as comprehenders accessed the meaning of the word, <sleeping>, they may have also predicted the semantic features of <baby>. This type of account follows from a generative framework of language comprehension in which, following highly constraining contexts, comprehenders are able to predict entire events or states, along with their associated semantic features, prior to the appearance of new bottom-up input (e.g. Kuperberg and Jaeger, 2016, sections 4 and 5; Kuperberg, 2016; St John and McClelland, 1990; Rabovsky, Hansen and McClelland, 2018). Importantly, however, we conceive of the within-pair spatial similarity effect detected here as primarily reflecting similarities at the level of semantic features (e.g. <human>, <small>, <crying>) associated with the predicted word (“baby”), rather than similarities between the entire predicted events (e.g. the <baby sleeping in the crib> event versus the <newborn baby in the hospital> event) (see Kuperberg, 2016). As noted above, we cannot tell from the current findings whether this, in turn, led to the top-down pre-activation of specific phonological or orthographic word-forms.”

Earlier in the Discussion, we made it clear that “It is also possible that the increased spatial similarity in association with sentence pairs that predicted the same word reflected similarities of predictions generated at a lower phonological and/or orthographic level of representation. On this account beliefs about the underlying event and semantic features led to the top-down pre-activation of information at these lower levels of the linguistic hierarchy before new bottom-up information becomes available (see Kuperberg and Jaeger, 2016, sections 3 and 5 for discussion). The present study cannot directly speak to this hypothesis.” Note that here we referred only to Kuperberg and Jaeger, 2016, sections 3 and 5, which, unlike Rabovsky, Hansen and McClelland, 2018, does assume a hierarchy and clear representational distinctions between events and phonological/orthographic word form.

2) I had one other minor question about the method that they used in comparing cloze probabilities between and within item pairs, which could be addressed at the same time. This point is described in more detail in their rebuttal letter than in the manuscript. However, I think that this issue deserves a little more attention in the manuscript given the known importance of cloze probability in predicting the magnitude of EEG/MEG signals during sentence processing. Specifically, in the rebuttal letter the authors report analyses of the difference between cloze probabilities for sentence pairs. However, if my understanding of this analysis is correct, this analysis should be conducted not on the difference between cloze probabilities, but rather on the absolute difference between cloze probabilities for within and between item pairs. I think that otherwise the average difference between cloze values would always be zero. I'd like the authors to report this analysis in the manuscript, including a description of the method used for conducting the analysis.

We apologize for the confusion. In the Materials and methods section of the manuscript, we now clearly describe how the analysis was conducted: “for each possible pair of sentences, we calculated the absolute difference in the cloze probability of the SFW-1 and carried out an independent sample t-test. Any differences in cloze probability were matched between pairs that constrained for the same word (within-pairs: 17.00% cloze difference) and pairs that constrained for a different word (between-pairs: 17.28% cloze difference), t(28678) = -0.136, p = 0.89.”
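The pairwise procedure quoted above, and the editor's point about signed differences, can be illustrated with a small Python sketch using only the standard library. It assumes one SFW-1 cloze value per sentence and a label for the SFW that each sentence constrains for; the function names and toy values are hypothetical. It computes only the pooled-variance (Student) t statistic and its degrees of freedom, n1 + n2 − 2, and leaves the p-value to a t-distribution routine such as scipy.stats.t.sf. If every one of N sentences is paired with every other, there are N(N − 1)/2 pairs; N = 240, for example, would give 28,680 pairs and hence df = 28,678, matching the reported t(28678), although the sentence count is not stated in this section.

```python
from itertools import combinations
from math import fsum, sqrt
from statistics import mean, variance


def abs_cloze_diffs(cloze, sfw):
    """|cloze_i - cloze_j| for every unordered sentence pair, split into
    within-pairs (same predicted SFW) and between-pairs (different SFW)."""
    within, between = [], []
    for i, j in combinations(range(len(cloze)), 2):
        diff = abs(cloze[i] - cloze[j])
        (within if sfw[i] == sfw[j] else between).append(diff)
    return within, between


def student_t(a, b):
    """Independent-samples t statistic with pooled variance; returns (t, df)."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t, df


def mean_signed_diff_all_orderings(cloze):
    """Mean of signed differences over all ordered pairs: each (i, j) term
    cancels its (j, i) partner, so the result is zero for any input."""
    n = len(cloze)
    diffs = [cloze[i] - cloze[j] for i in range(n) for j in range(n) if i != j]
    return fsum(diffs) / len(diffs)


# Toy SFW-1 cloze probabilities (invented values, not the study's stimuli)
cloze = [0.9, 0.7, 0.8, 0.6]
sfw = ["baby", "baby", "rose", "rose"]

within, between = abs_cloze_diffs(cloze, sfw)
t, df = student_t(within, between)
print(f"t({df}) = {t:.3f}")
print(mean_signed_diff_all_orderings(cloze))
```

Each unordered pair contributes a non-negative absolute difference, so the within-pair and between-pair means are directly comparable. With signed differences, the sign of each pair would depend only on the arbitrary order in which its two sentences are listed, and averaging over both orders cancels to zero, which is the editor's concern.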

https://doi.org/10.7554/eLife.39061.019

Article and author information

Author details

  1. Lin Wang

    1. Department of Psychiatry, Harvard Medical School, Boston, United States
    2. Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, United States
    3. Department of Psychology, Tufts University, Medford, United States
    Contribution
    Conceptualization, Data curation, Software, Formal analysis, Funding acquisition, Investigation, Visualization, Methodology, Writing—original draft, Project administration, Writing—review and editing
    For correspondence
    wanglinsisi@gmail.com
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-6911-0660
  2. Gina Kuperberg

    1. Department of Psychiatry, Harvard Medical School, Boston, United States
    2. Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, United States
    3. Department of Psychology, Tufts University, Medford, United States
    Contribution
    Conceptualization, Supervision, Funding acquisition, Visualization, Writing—original draft, Project administration, Writing—review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-6093-7872
  3. Ole Jensen

    School of Psychology, Centre for Human Brain Health, University of Birmingham, Birmingham, United Kingdom
    Contribution
    Conceptualization, Software, Supervision, Funding acquisition, Visualization, Methodology, Writing—original draft, Project administration, Writing—review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-8193-8348

Funding

National Natural Science Foundation of China (31540079)

  • Lin Wang

Ministry of Science and Technology of the People's Republic of China (2012CB825500)

  • Lin Wang

Ministry of Science and Technology of the People's Republic of China (2015CB351701)

  • Lin Wang

National Institute of Child Health and Human Development (R01 HD08252)

  • Gina Kuperberg

James S. McDonnell Foundation (Understanding Human Cognition Collaborative Award: 220020448)

  • Ole Jensen

Wellcome Trust (Investigator Award in Science: 207550)

  • Ole Jensen

Royal Society (Wolfson Research Merit)

  • Ole Jensen

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

This work was funded by the Natural Science Foundation of China (31540079 to LW), the National Institute of Child Health and Human Development (R01 HD08252 to GRK), and a James S McDonnell Foundation Understanding Human Cognition Collaborative Award (220020448), Wellcome Trust Investigator Award in Science (207550) and the Royal Society Wolfson Research Merit Award to OJ. It was supported in part by the Ministry of Science and Technology of China grants (2012CB825500, 2015CB351701). We thank Yang Cao and Yinan Hu for their assistance with data collection.

Ethics

Human subjects: The study was approved by the Institutional Review Board (IRB) of the Institute of Psychology, Chinese Academy of Sciences (H15037). Thirty-four students from the Beijing area were initially recruited by advertisement. All gave informed consent and were paid for their time.

Senior Editor

  1. Joshua I Gold, University of Pennsylvania, United States

Reviewing Editor

  1. Matthew H Davis, University of Cambridge, United Kingdom

Publication history

  1. Received: June 14, 2018
  2. Accepted: December 20, 2018
  3. Accepted Manuscript published: December 21, 2018 (version 1)
  4. Version of Record published: January 7, 2019 (version 2)

Copyright

© 2018, Wang et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
