Executive Resources Shape the Impact of Language Predictability Across the Adult Lifespan

Merle Schuckart; Sandra Martin; Sarah Tune; Lea-Maria Schmitt; Gesa Hartwigsen; Jonas Obleser

doi:10.7554/eLife.108176.1

Introduction

We constantly rely on our ability to swiftly yet accurately process linguistic input. When reading a book, watching television, or navigating a car through busy traffic while following instructions, language prediction is considered a catalyst that enhances the efficiency of linguistic processing [1–5]. However, due to the inherent flexibility and richness of natural language, upcoming words can rarely be predicted from context with complete certainty [6]. Instead, linguistic features are thought to be pre-activated broadly rather than following an all-or-nothing principle, as there is evidence for predictability effects even for moderately- or low-constraint contexts across the entire predictability span [7–10]. This generation of graded predictions is sometimes described as being passive and cost-free [11]. However, it is still under debate whether maintaining such an elaborate process really incurs no cognitive cost.

Graded language predictions necessitate the active generation of hypotheses on upcoming words as well as the integration of prediction errors to inform future predictions [1,5]. Supporting this, recent evidence suggests that language predictions may indeed impose processing demands. Shain et al. (2024) found that reading time increases with decreasing word predictability, with even small drops in predictability of highly expected words leading to significant processing costs. These findings suggest that language predictions are not entirely automatic or effortless. This aligns with numerous neuroimaging studies arguing for a strong interaction between language-specific and domain-general executive brain regions [12–15], particularly in situations that are cognitively demanding [16–20].

In this context, domain-general executive resources refer to higher-level cognitive control processes, such as working memory, inhibitory control, and cognitive flexibility, that are crucial for managing and coordinating behaviour across a wide range of tasks and modalities [21–23]. These processes are supported by the multiple-demand (MD) system, a fronto-parietal network that is recruited in various cognitively challenging situations [22]. However, while numerous studies support the claim that language prediction relies on such domain-general executive resources, a body of research suggests the opposite (e.g., [24–27]). This raises the unresolved question to what extent such resources are taxed by predictive processes [5,28–30].

An interesting test case for the dynamic interplay between language-specific and domain-general systems is cognitive ageing, as advancing age has been shown to be associated with sensory and executive decline [31–33]. In the current study, we thus ask: How does language prediction change when executive resources are limited – both intrinsically due to advanced age, and extrinsically through increased task demands?

The age-related change in cognitive resources is reflected by longer linguistic processing times, especially in situations with high cognitive effort, such as dual-task processing [34,35]. However, previous research presents conflicting results on how cognitive ageing affects the use of linguistic predictions during language comprehension. Behavioural studies suggest a decline in using context to make semantic predictions with age [36] while EEG studies present a more nuanced picture. Some studies indicate heightened neural sensitivity to unexpected information in older adults [37], while others report no significant age-related differences in neural response [38,39].

Furthermore, it is unclear how the use of language predictions across the adult lifespan might be affected by increasing cognitive demands. According to the compensation-related utilisation of neural circuits (CRUNCH) hypothesis [40], older adults might reach the point at which their cognitive capacity is fully exhausted earlier than young adults, leading to a performance decline. If language prediction draws on executive resources, its effect on reading time might thus diminish with increasing cognitive load due to shared cognitive resources. However, this effect might re-emerge once capacity limits are reached, causing tasks to be processed sequentially, which would result in an increased effect of language prediction paralleled by slower performance.

Here, we explored the role of executive control in language prediction in two large cohorts across the adult lifespan. Using a novel dual-task paradigm that couples natural reading with an n-back task that taxes executive resources, we tested the following hypotheses: First, both increased cognitive load and reduced word predictability (i.e., increased surprisal) should be reflected by longer reading time (Fig. 1a and b), alongside a decrease in text comprehension and n-back task performance. Second, we expected that the formation of language predictions should be contingent upon the availability of executive resources. Most importantly, a gradual limitation of these resources due to increased task demands should result in diminished effects of language predictability on reading time (interaction between cognitive load and surprisal; Fig. 1c).

Visualisation of hypotheses.
We expected main effects on reading time of (a) cognitive load and (b) surprisal, as well as (c) an interaction of surprisal and cognitive load. Additionally, (d) we explored how these effects are modulated by age.

Lastly, we explored how these previously described effects would be modulated by age (Fig. 1d). Note that the literature allows for contradictory hypotheses: On the one hand, if language predictions are impaired under limited executive resources, older individuals should rely less on language predictions due to overall decreased executive resources [31,32]. This should be reflected by diminished predictability effects. On the other hand, given the presumed stability of language comprehension across the lifespan [41], older adults might instead rely more heavily on language predictions, thus compensating for impairments in reading comprehension caused by sensory and executive decline [31–33]. In this case, we should see strong surprisal effects independent of age, or even stronger surprisal effects in older than younger adults.

The two large-sample reading-time studies presented here help resolve the ongoing discussion on the role of executive resources in language predictions with three key findings: First, across our large age range, we found a general increase in reading time with both low language predictability and high task demands. Second, higher task demands reduce the influence of language predictability on reading time, indicating that linguistic prediction relies on executive control resources. Third, predictability had a more pronounced effect on reading time in older adults compared to younger ones. These findings highlight the dynamic interaction between language predictability and executive resources.

Results

We report data from 175 participants (M = 44.9 ± 17.9, 18-85 years, 51% female) who were either tested online (N_online = 80) or in the laboratory (N_laboratory = 95) in one session. Moreover, we conducted an internal, pre-registered replication study involving another 96 participants (M = 39.8 ± 14.0, 18-70 years, 51% female) tested online to replicate our main findings (see Fig. S1 for age distributions of all samples). During the experiment, participants engaged in a self-paced reading task. They read 300-word newspaper articles, presented word-by-word in various font colours (Fig. 2a). The task was either performed in isolation (Reading Only) or paired with a competing n-back task on the words’ font colour (1-back and 2-back Dual Task). Participants were instructed to read the texts carefully, as content-related multiple-choice questions were asked after each text (see Methods for details).
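To make the secondary task concrete, the following minimal Python sketch marks which words would count as n-back targets for a given sequence of font colours. The function name `nback_targets` and the example colours are illustrative assumptions, not the experiment code; a word is treated as a target when its colour matches the colour shown n words earlier.

```python
from typing import List, Sequence


def nback_targets(colours: Sequence[str], n: int) -> List[bool]:
    """Return True at position i if colours[i] equals colours[i - n]."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # The first n items cannot be targets: nothing precedes them by n steps.
    return [i >= n and colours[i] == colours[i - n] for i in range(len(colours))]


# Hypothetical colour sequence for six consecutive words.
sequence = ["red", "red", "blue", "red", "blue", "blue"]
one_back = nback_targets(sequence, 1)
two_back = nback_targets(sequence, 2)
```

In the 2-back case the comparison item is separated from the current word by one intervening word, so the previous two colours have to be maintained and updated while reading, which is what makes this condition more demanding.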

Experimental design and quantification of predictability as word surprisal using a large language model (GPT-2).
**(a)** Participants were asked to perform a self-paced reading task (Reading Only) which was complemented in some blocks by a secondary n-back task on the font colour of the words (Reading + 1-back, and Reading + 2-back). The order of the blocks was pseudo-randomized, with Reading Only always being the first condition to be presented, followed by the two dual-task conditions, and another main block for each of the three conditions. Both dual-task paradigms (Reading + 1-back and Reading + 2-back) were first introduced in short single-task training sets. **(b)** We generated one surprisal score for each word in the reading material by using context chunks of 2 words as prompts for next-word-predictions in GPT-2. The resulting probability for the actual next word in the text (here: “mail”, marked in teal) was then transformed into a surprisal score, which reflected how predictable the respective word was given the context. Additionally, based on the distribution of probabilities for all possible continuations, we computed an entropy score, which reflects the uncertainty in predicting the next word. Please note that the example sentence used here has been translated to English for better comprehensibility, while the original text materials were in German.

Increased Cognitive Load and Older Age Reduce Task Performance

Results from a generalised linear mixed model (GLMM) for text comprehension accuracy showed that participants read the texts carefully and answered most comprehension questions correctly, with accuracies of 93% ± 14% (mean ± SD) in the Reading Only condition, and 80% ± 25% in the 1-back and 73% ± 28% in the 2-back Dual Task conditions (Fig. S2b). Increases in cognitive load (OR_1-back = 0.253, z_1-back = -8.928, p_1-back < 0.001, OR_2-back = 0.156, z_2-back = -12.403, p_2-back < 0.001) and in age (OR = 0.986, z = -2.676, p = 0.013) were associated with poorer performance (see Methods for model details; Fig. 3a, Table S1).

Estimated marginal effects of predictors age, cognitive load and surprisal on task performance and reading time.
Main effects of cognitive load and age on accuracy in the comprehension question task (panel a) and on n-back task performance (d-primes; panel b). Please note that we do not show d-primes for the Reading Only task as there was no n-back task in this condition. Reading time increased with increasing age and word surprisal (panel c, left: results from linear mixed model, LMM, right: results from generalised additive model, GAM – for an explanation see section *Modelling Potential Non-Linear Contributions*). In panel d, we show the 2-way interaction of cognitive load and surprisal (left) and cognitive load and age (middle). In both cases, effects were strongest in the Reading Only condition (see barplot insets). Additionally, we show how age modulates the effect of surprisal on reading time (panel d, right). For raw and predicted individual trajectories, please see Fig. S2 and S3 in the Supplementary Material. Estimated marginal effects were adjusted for “Reading Only” as the reference level.

Similarly, d’ values demonstrated good performance in the n-back task, with mean d’ of 3.77 ± 0.8 in the 1-back and 2.12 ± 0.87 in the 2-back Dual Task condition (Fig. S2a). The overall high d’ values observed here can be attributed to the low target ratio in the experiment, resulting in high correct rejection and low false alarm rates. A linear mixed-effects model revealed that n-back performance declined with cognitive load (β = -0.014, t(170.5) = -3.961, p < 0.001) and with age (β = -0.005, t(170.17) = -25.462, p < 0.001; Fig. 3b, Table S1).
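As a reminder of how the sensitivity index is obtained, d’ is the difference between the z-transformed hit rate and the z-transformed false alarm rate. The sketch below is a minimal illustration using only the Python standard library; the log-linear correction (adding 0.5 to each count) and the example counts are assumptions of this sketch and may differ from the procedure described in the Methods.

```python
from statistics import NormalDist


def d_prime(hits: int, misses: int, false_alarms: int, correct_rejections: int) -> float:
    """Sensitivity index d' = z(hit rate) - z(false alarm rate)."""
    # Log-linear correction (an assumption of this sketch): add 0.5 to each
    # count so that rates of exactly 0 or 1 do not yield infinite z-scores.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical block with few targets and many non-targets.
example = d_prime(hits=45, misses=5, false_alarms=5, correct_rejections=245)
```

With many non-targets, a handful of false alarms translates into a very low false alarm rate, which pushes d’ upwards; this is the mechanism behind the high values noted above.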

Reading Time Increases with Age, Surprisal, and Cognitive Load

To operationalise language predictability, we calculated word surprisal scores using a 12-layered generative pre-trained transformer model (GPT-2; [42]; Fig. 2b). Word surprisal quantifies the predictability of the current word given its preceding context [10,43,44]. We chose a context length of two words, as constraining the context has been shown to increase GPT-2’s psychometric predictive power, making its next-word predictions more human-like [45]. In addition to each word’s surprisal, we also computed word entropy, which reflects the uncertainty in predicting the next word. Thus, while word surprisal indicates the predictability of each word, word entropy reflects the uncertainty underlying its prediction.
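The two measures can be illustrated with a toy next-word distribution. The sketch below assumes the natural logarithm and a three-word vocabulary purely for illustration (the logarithm base used in the study is not stated in this section, and a real language model assigns a probability to every vocabulary item); the function names are hypothetical.

```python
import math
from typing import Dict


def surprisal(probabilities: Dict[str, float], actual_word: str) -> float:
    """Surprisal of the word that actually occurred: -log p(word | context)."""
    return -math.log(probabilities[actual_word])


def entropy(probabilities: Dict[str, float]) -> float:
    """Shannon entropy of the whole next-word distribution."""
    return -sum(p * math.log(p) for p in probabilities.values() if p > 0.0)


# Hypothetical next-word probabilities given a two-word context.
next_word = {"mail": 0.5, "news": 0.25, "door": 0.25}
s = surprisal(next_word, "mail")  # depends only on the probability of "mail"
h = entropy(next_word)            # depends on the full distribution
```

Surprisal is computed after the fact from the probability of the word that appeared, whereas entropy is available before the word is seen, which is why the two can dissociate.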

We used linear mixed-effects regression (LMM) to assess the effect of word surprisal on reading time and its interaction with cognitive load and age (see Methods for model details). Our model as a whole explained 65% of the variance (marginal R²) in single-word reading time from surprisal, n-back load, n-back performance and other linguistic and demographic variables, and up to 81% when additionally considering the variability across subjects (conditional R²; see Table S2).
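For reference, the two coefficients of determination are commonly defined for mixed models by partitioning the variance into a fixed-effects component, a random-effects component and a residual. The symbols below are notation introduced here for illustration: σ_f² is the variance of the fixed-effect predictions, σ_α² the summed random-effect variance, and σ_ε² the residual variance.

```latex
R^2_{\text{marginal}}
  = \frac{\sigma_f^2}{\sigma_f^2 + \sigma_\alpha^2 + \sigma_\varepsilon^2},
\qquad
R^2_{\text{conditional}}
  = \frac{\sigma_f^2 + \sigma_\alpha^2}{\sigma_f^2 + \sigma_\alpha^2 + \sigma_\varepsilon^2}
```

Because the conditional value adds the random-effect variance to the numerator, it can never be smaller than the marginal value.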

As hypothesised, we observed significantly longer reading time with advancing age, independent of cognitive load condition (β = 0.009, t(178.46) = 9.199, p < 0.001). For illustration, each additional year of age increased reading time by roughly 1% (see Fig. 3c, top; Table 1; Fig. S3a).

Main results for model for reading time (N = 175).

To account for the potential influence of verbal intelligence and education on the observed age effects, we compared three LMMs: a baseline model without these predictors, a model including a verbal intelligence predictor, and a model including an education predictor (please see Methods section “Control Analysis: Assessing Potential Effects of Verbal Intelligence and Education”).

Adding verbal intelligence as a predictor to our model did not significantly improve the model fit (χ²(1) = 0.769, p = 0.381). This implies that verbal intelligence cannot account meaningfully for the differences in reading time found between younger and older participants.

When including years of education as a predictor, we observed a significant effect on reading time (β = 0.009, t = 9.199, p < 0.001), which led to a modest yet significant improvement in model fit (χ²(1) = 4.209, p = 0.0402, AIC_{model_baseline} = 37580, AIC_{model_education} = 37578). However, the 3-way interaction of age, surprisal and cognitive load we found in our original model remained significant (β_1-back = -0.00007, t_1-back(88.61) = -5.229, p_1-back < 0.001, β_2-back = -0.00004, t_2-back(88.90) = -2.784, p_2-back = 0.007), suggesting that while education has a significant effect on reading time, it cannot account for the age-related effects observed in our models.
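The model comparisons above are likelihood-ratio tests with one additional parameter. As a rough check of how the reported χ² values map onto p-values and AIC differences, the following sketch uses the closed form for one degree of freedom; the function names are hypothetical, and the AIC relation assumes nested models fitted by maximum likelihood.

```python
import math


def lrt_p_value_df1(chi_square: float) -> float:
    """Upper-tail p-value of a chi-square statistic with one degree of freedom."""
    # With df = 1 the statistic is a squared standard normal variable, so
    # P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi_square / 2.0))


def aic_change(chi_square: float, extra_parameters: int = 1) -> float:
    """AIC(extended) - AIC(baseline) for nested models: 2 * k_extra - chi_square."""
    return 2.0 * extra_parameters - chi_square


p_verbal = lrt_p_value_df1(0.769)     # verbal intelligence comparison
p_education = lrt_p_value_df1(4.209)  # education comparison
delta_aic = aic_change(4.209)         # negative values favour the extended model
```

Under these assumptions the sketch reproduces the reported p-values to the stated precision, and the AIC change of about -2 matches the difference between the two AIC values given above.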

Analogous to the effect of age, increased cognitive load was associated with significantly longer reading time (β_1-back = 0.474, t_1-back(176.18) = 34.046, p_1-back < 0.001, β_2-back = 0.792, t_2-back(173.76) = 30.339, p_2-back < 0.001; Table 1 and Fig. S2 and S3), indicating participants read more slowly when faced with a more challenging task. Even after excluding the Reading Only condition from the model and comparing only the two equally attention-demanding dual-task conditions 1-back and 2-back (control analysis), hereby controlling for attentional switching costs, this effect still held true (β = 0.339, t(178.98) = 16.221, p < 0.001; Table S4).

Moreover, in line with our hypothesis, we observed a consistent increase in reading time with higher surprisal (β = 0.002, t(2361.37) = 11.321, p < 0.001; Table 1 and Fig. 3c, bottom). Highly predictable words were read more quickly. Specifically, for a change in surprisal by one standard deviation, reading time increased by about 2.1%. For words and individuals matched in all other regards, this translates to a mean reading-time difference of approximately 118 ms between words with the minimum and maximum surprisal values in our dataset.
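The percentage interpretations given for the model coefficients follow from the usual conversion for log-scale outcomes. The sketch below assumes that reading time was modelled on the natural-log scale; this is consistent with the percentages reported in the text but is an assumption here, and the function name is hypothetical.

```python
import math


def percent_change(beta: float, delta: float = 1.0) -> float:
    """Percent change in reading time for a change of `delta` in a predictor,
    assuming the outcome was modelled as the natural log of reading time."""
    return (math.exp(beta * delta) - 1.0) * 100.0


age_per_year = percent_change(0.009)  # age coefficient reported above
one_back = percent_change(0.474)      # 1-back vs Reading Only
two_back = percent_change(0.792)      # 2-back vs Reading Only
```

Under this assumption, the age coefficient of 0.009 corresponds to an increase of about 0.9% per year, in line with the "roughly 1%" stated above. Rounded coefficients will not reproduce every reported percentage exactly.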

Cognitive Load Reduces the Impact of Surprisal

In line with our hypotheses, cognitive load significantly modulated the effect of surprisal on reading time (β_1-back = -0.001, t_1-back(287959.11) = -6.772, p_1-back < 0.001, β_2-back = -0.001, t_2-back(288294.96) = -7.681, p_2-back < 0.001). While we observed a clear increase in reading time in the Reading Only condition when surprisal was high, this effect was mitigated in both the 1-back and 2-back Dual Task conditions, where cognitive load was increased (Table 1 and Fig. 3d, left).

Age Modulates the Effect of Cognitive Load

While increased task demands were associated with prolonged reading time across participants, we found this effect became less pronounced with advancing age (β_1-back = -0.003, t_1-back(171.99) = -3.606, p_1-back < 0.001; β_2-back = -0.002, t_2-back(170.79) = -1.690, p_2-back = 0.097; Table 1 and Fig. 3d, middle). To illustrate, when comparing the reading time in the 2-back Dual Task and the Reading Only condition, we found an increase of 130.48% in an average young (i.e., 27 years old) and an increase of 111.30% in an average older (i.e., 63 years old) participant.

Age Modulates the Effect of Surprisal

Older age was associated with stronger surprisal effects (β = 0.00004, t(287771.27) = 9.287, p < 0.001; Table 1 and Fig. 3d, right), indicating that highly unpredictable (i.e., surprising) words were associated with a significantly longer reading time, especially in older adults. This suggests that, as individuals age, the effect of word predictability on their reading time becomes increasingly pronounced instead of remaining constant or diminishing.

Age Modulates the Interaction of Surprisal and Cognitive Load

Finally, we tested whether the observed interaction between cognitive load and surprisal changes with advancing age: We found that age indeed modulated the joint influence of cognitive load and surprisal on reading time, as reflected by a significant three-way interaction of surprisal, cognitive load and age (β_{1-back - Reading Only} = -0.00011, t_{1-back - Reading Only}(287807.34) = -12.2661, p_{1-back - Reading Only} < 0.001, β_{2-back - Reading Only} = -0.00008, t_{2-back - Reading Only}(287771.65) = -8.484, p_{2-back - Reading Only} < 0.001; Table 1).

Even after excluding the Reading Only condition from the model and contrasting only the dual-task conditions 1-back and 2-back (control analysis), results still showed a significant three-way interaction of age, cognitive load and surprisal (β_{2-back - 1-back} = 0.00003, t(188203.53) = 3.373, p = 0.001; Table S4). Please note the change in reference level caused by the exclusion of the Reading Only condition, which causes a reversal in the direction of the three-way interaction between age, cognitive load, and surprisal.

To get a more nuanced understanding of age-related differences in the effect of cognitive load and surprisal on reading time, we conducted a simple slopes analysis for our original model (Fig. 4, Fig. S4): Under low cognitive load (condition Reading Only), surprisal significantly influenced reading time in all but the youngest participants. Put simply, when participants read a text without having to perform an additional n-back task, predictable (i.e., low-surprisal) words yielded significantly shorter reading time than unpredictable (i.e., high-surprisal) words (Fig. 4a, left panel, first plot). This effect was most pronounced in the oldest participants (β_{85 - 18 years} = 0.006, p < 0.001).

Results of the simple slopes analysis and exemplary marginal effects plots for three different ages.
In the Johnson-Neyman plot [46] on the left side of panel (a), we show the effect of surprisal on reading time across the whole age range separated by cognitive load condition: *Reading Only* (top; blue), *1-back Dual Task* (middle; yellow) and *2-back Dual Task* (bottom; red). The stronger the surprisal effect for a certain age, the higher the value on the y-axis. Grey areas indicate age ranges for which we did not find an effect of surprisal on reading time in the respective condition, whereas blue areas indicate a significant surprisal effect (see inset on the right for a visualisation of a non-significant effect in a younger participant and a significant effect in an older participant). In panel (b), we show the predicted surprisal effect in each cognitive load condition for an average young (average age - 1SD), middle-aged (average age) and older participant (average age + 1SD). The bar plots illustrate the predicted effects of surprisal on reading time across the three cognitive load conditions for those three average participants.

As task demands increased, we observed a reversal of this pattern, with younger participants exhibiting stronger surprisal effects than older participants (condition 1-back; Fig. 4a, left panel, second plot). Notably, under increased cognitive load, even the youngest participants started showing significant surprisal effects (β_{85 - 18 years} = -0.0009, p = 0.04).

Finally, in the most demanding condition (2-back), the pattern shifted again, with older adults again showing stronger surprisal effects than their younger counterparts (β_{85 - 18 years} = 0.001, p = 0.003; Fig. 4a, left panel, third plot).

Comparing surprisal effects between cognitive load conditions revealed that older adults showed the most pronounced reduction in surprisal effects as cognitive load increased (Fig. 4b), which suggests they were more vulnerable to increased task demands than younger participants (18 year-old: β_{1-back - Reading Only} = -0.003, p < 0.001, β_{2-back - Reading Only} = -0.003, p < 0.001, β_{2-back - 1-back} = 0.0004, p = 0.06; 85 year-old: β_{1-back - Reading Only} = -0.011, p < 0.001, β_{2-back - Reading Only} = -0.008, p < 0.001, β_{2-back - 1-back} = 0.003, p < 0.001).
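The logic of the simple slopes analysis can be written out explicitly. As a simplified sketch, assume that the relevant fixed-effects part of the model is the full factorial of surprisal s, age a and treatment-coded cognitive load c with Reading Only as the reference level; the full model contains further covariates, and the symbols are notation introduced here. The surprisal slope at a given age and load condition is then:

```latex
\left.\frac{\partial\,\widehat{\mathrm{RT}}}{\partial s}\right|_{a,\,c}
  = \beta_{s} + \beta_{s \times a}\, a
  + \beta_{s \times c} + \beta_{s \times a \times c}\, a
```

The last two terms are zero in the Reading Only condition. Within a condition, the difference in surprisal slope between two ages is therefore governed by the sum of the surprisal-by-age and three-way coefficients, which is why the direction of the age difference can change from one load condition to the next. How age is centred or scaled in the fitted model affects the numerical values, so the coefficients reported above should not be plugged into this expression without checking the Methods.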

Modelling Potential Non-Linear Contributions

To account for the possibility that age, surprisal, and their interaction with cognitive load might demonstrate non-linear effects on reading time, we fitted a generalised additive mixed-effects model (GAM) to our data, using the same model structure as for the linear regression model (see Methods for model details). Results from the GAM revealed overall similar effects relative to the LMM. The effective degrees of freedom (EDF) of all continuous predictors were estimated above 1, indicating non-linear relationships with reading time. Similar to the LMM, both our predictors of interest, surprisal and age, demonstrated significant effects on reading time (Fig. 3c, right), with surprisal showing a more non-linear trajectory than age (EDF for surprisal: 4.107, EDF for age: 3.028, both p < 0.001). The smoothing splines for the three-way interaction of age, surprisal, and cognitive load showed significant effects for all levels of cognitive load (Fig. S4 and S5). Similar to the LMM, we found the strongest effect for the Reading Only condition (EDF for Reading Only: 10.248, p < 0.001, EDF for 1-back: 2.017, p = 0.036, EDF for 2-back: 4.89, p = 0.024). The full model results are reported in Supplementary Table S5 and Figure S4.

Interaction of Age, Surprisal, and Cognitive Load Generalises to New Sample

To probe the reliability of our findings, we conducted an exact online replication of the original experiment. When comparing the results of the online participants of the original sample and the online replication sample, we found similar patterns, as illustrated in Figure 5. We again found a 3-way interaction of age, cognitive load and surprisal, which was significant for the 2-back but not for the 1-back contrast (β_1-back = -0.00002, t_1-back(161446.16) = -1.605, p_1-back = 0.1386, β_2-back = -0.00006, t_2-back(161861.30) = -3.871, p_2-back < 0.001; see Table S3, Fig. 5, and Fig. S6), albeit using a smaller sample size of 96 instead of 175 participants and a slightly different age distribution. Notably, the three-way interaction showed similar effect sizes and directions of effects in the replication sample relative to our original online sample.

Results of the internal online replication in comparison with the results of the online sample of the original study.
Shown are estimates ± CI for all main effects of surprisal, age, and cognitive load as well as their 2-way and 3-way interactions. RO: Reading only. For full results please see Table S3. For a comparison of age distributions in the original online and lab sample and the online replication sample, please see Fig. S1. Please note that effects are grouped by their magnitude.

Modelling Cumulative Effects of Surprisal on Reading Time

As noted above (see section Reading Time Increases with Age, Surprisal, and Cognitive Load), reading time increased as a function of word surprisal, with a mean difference of 118 ms between the most and least predictable word in our text material. This corresponds to approximately 22.91% of the average per-word reading time in the Reading Only condition (M = 522.04 ± 275.927 ms), highlighting a substantial effect of surprisal – especially considering that all other predictors were held constant when estimating the effect of surprisal. The substantial effect of surprisal is particularly notable given that the texts were edited for ease of comprehension and contained relatively low surprisal values overall (M = 18.165 ± 7.523), indicating that the observed reading time differences between high- and low-surprisal words likely underestimate the potential effect size in more complex texts with higher surprisal variability.

Notably, in everyday life, we typically encounter sequences of words, ranging from short phrases to texts of hundreds or thousands of words. Consequently, small word-level effects can accumulate and yield substantial processing differences over time (cf. [47]). Thus, to quantify this cumulative effect of surprisal, we predicted reading time for two average participants aged 27 (M - 1SD) and 63 (M + 1SD) years for a short example sentence. Predictions were conducted for the easiest cognitive load condition Reading Only, in which surprisal effects were most pronounced, and for the most challenging condition 2-back Dual Task, where surprisal effects were diminished. The example sentence comprised 14 words and had relatively low surprisal values (M = 15.24 ± 7.65; even slightly lower than in our original text material), implying that the cumulative effects of surprisal shown here are rather conservative estimations.

In the condition with the largest surprisal effects (Reading Only), surprisal led to a total cumulative increase in reading time of 73.6 ms for younger and 648 ms for older participants over the course of the sentence. In the more challenging 2-back Dual Task condition, we observed a total increase of 199 ms in younger and 485 ms in older participants (see Fig. 6b for cumulated effects of surprisal; see Fig. 6a for predicted reading time incorporating all effects for a comparison).
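The accumulation itself is a running sum of per-word effects. The sketch below illustrates the idea under a simplified log-linear assumption, in which the predicted reading time is a baseline multiplied by exp(slope × surprisal); the function name, the baseline, the slope and the surprisal values are hypothetical and do not reproduce the model predictions reported above.

```python
import math
from itertools import accumulate
from typing import List, Sequence


def cumulative_surprisal_cost(
    surprisals: Sequence[float],
    slope: float,
    baseline_ms: float,
) -> List[float]:
    """Running total (ms) of the extra reading time attributed to surprisal.

    Assumes predicted RT = baseline_ms * exp(slope * surprisal), so the extra
    time per word is the prediction minus the zero-surprisal baseline.
    """
    per_word = [baseline_ms * (math.exp(slope * s) - 1.0) for s in surprisals]
    return list(accumulate(per_word))


# Hypothetical inputs: the first two words carry zero surprisal because no
# two-word context is available at the start of the sentence.
word_surprisals = [0.0, 0.0, 12.0, 20.0, 8.0]
totals = cumulative_surprisal_cost(word_surprisals, slope=0.002, baseline_ms=500.0)
```

Because the per-word cost scales with the baseline reading time, the same log-scale slope produces a larger cost in milliseconds for slower readers or slower conditions.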

Cumulative Effect of Surprisal on Reading Time.
To illustrate the cumulative effect of surprisal on reading time over the course of a text, we predicted reading times for an average younger (27 years, M - 1SD) and average older (63 years, M + 1SD) participant in the easy Reading Only condition (blue) and the most challenging condition 2-back (Dual Task; red) and computed the cumulative sum for a short example sentence. Panel a illustrates how reading time gradually increases in total over the course of the sentence, with all predictors being held constant at their average, except for the predictors age, cognitive load and word length. In panel b, we again show cumulative reading times, this time isolating the effect of surprisal. Please note that surprisal values are zero for the first two words, as our GPT-2 model estimates surprisal based on the two preceding words, which are unavailable at the beginning of the sentence. The example sentence used in both panels is the German translation of the opening line of *Anna Karenina*, “Happy families are all alike, every unhappy family is unhappy in its own way” [48].

Discussion

Linguistic predictions are a powerful feature of language comprehension. But do they really come ‘for free’, or how much do they draw on executive resources? With the present study, we asked how language predictions change with increasing cognitive load, and how this interaction is modulated by age. To do so, we paired a self-paced reading task with a secondary n-back task on the font colour of the words.

First, as expected, and validating our overall approach, high cognitive load was associated with an increase in reading time as well as a decrease in task performance across age groups. This is consistent with previous studies using n-back tasks [49,50].

Next, as hypothesised, higher word surprisal slowed down reading, even when controlling for word length and frequency as well as prediction entropy. This effect of surprisal replicates findings from previous studies showing that highly unpredictable words are generally associated with a longer reading time [9,10,51–54].

Finally, we explored the relationship between surprisal and cognitive load across the (cross-sectional) adult lifespan. We hypothesised that increasing cognitive load should gradually impair the building of language predictions, which should surface in a diminished surprisal effect on reading time in conditions with high cognitive load. In line with our hypothesis, we found that the effect of word surprisal on reading time was modulated by cognitive load. Specifically, when cognitive load was high, the effect of surprisal on reading time was significantly diminished.

Interestingly, this interaction between surprisal and cognitive load was modulated by age. While age generally increased the reliance on language prediction, it also increased the susceptibility of this strategy to changes in available executive resources: Older adults showed the strongest relative reduction of the surprisal effect with increasing cognitive load. However, under high load, older adults still showed the strongest surprisal effect in absolute terms (Figure 4b). In a direct replication of the original experiment, we reproduced this finding, further confirming the reliability of our results.

Disentangling Effects of Attention Versus Executive Resources

When investigating the interaction of cognitive load and language predictability on reading time, we found that increased cognitive load diminished the effect of word predictability. We take this as evidence that executive resources are involved in the generation of language predictions.

Predictive processing has not only been suggested to be foundational to language comprehension [5,9,51], but it is also thought to be a core mechanism of the human brain [4,55,56]. Drawing parallels between domains can thus offer valuable insights into the mechanisms of language prediction: For instance, there is evidence from the visual domain showing that attending to the stimulus material is a prerequisite for predictive processing [57,58]. This observation from the visual domain can potentially be extended to our findings, wherein language predictions were diminished if attention had to be divided between the reading material and a challenging non-linguistic secondary task.

However, attention alone cannot fully account for the differences in sensitivity to word predictability shown here. The predictability effect did not only diminish when a secondary task was introduced but also decreased with increasing cognitive load even when attentional switching costs were held constant between conditions (i.e., when comparing the 1-back to 2-back load conditions). In line with previous literature [59–61], our results thus suggest that executive processes beyond attention, such as updating and maintenance of context information, shifting between tasks, and inhibition of irrelevant information [21,23], are integral to language prediction.

Language Prediction as a Compensatory Mechanism in Older Age

Further examining the effect of language predictability across the adult lifespan revealed interesting age differences. Namely, older adults showed stronger language predictability effects than younger participants. This effect held true even when controlling for potential differences in verbal intelligence and education between participants. Our result aligns with previous work on age-related changes in linguistic predictions, indicating heightened sensitivity to unexpected lexical input in older adults [37].

This finding may reflect a commonly reported pattern of greater reliance on intact vocabulary and world knowledge with age in the face of declining executive functions [13,32]. Here, we show that even under increased cognitive load, older adults still rely heavily on their refined system of linguistic predictions driven by their lifelong experience. This allows them to make more fine-tuned predictions but also renders them more vulnerable to unexpected information (i.e., high-surprisal words) than their younger counterparts.

Previous studies have found that a larger vocabulary size is associated with more rapid processing of language and improved language comprehension [62–67]. In line with this, older adults exhibit more advanced language processing abilities compared to children or younger adults due to their accumulated years of exposure to language and their increased vocabulary [68,69]. This accumulated skill is thought to serve as a compensatory mechanism for decline in working memory capacities or reduced executive functioning with age [70,71]. Indeed, speech comprehension appears to remain largely intact in older adults [41,72].

As a caveat, one should not disregard the possibility of age-related differences between younger and older adults in utilising formed predictions. While we assumed that formed predictions are utilised automatically, and that the observed differences in the effect of language predictability on reading time between individuals might thus be attributable to a difference in executive resources, one could also argue that individuals might simply weigh their formed predictions differently.

Akin to the longer exposure to language in older individuals, it is reasonable to assume that older individuals have also had more time to accumulate experience regarding the accuracy of their predictions, and to refine their predictions through prediction errors. Consequently, older adults might rely more heavily on their predictions than younger adults [73,74]. Additionally, due to age-related sensory decline [33], older adults might exhibit a stronger dependence on context-based predictions to process language, as incoming sensory information might be less informative [75–77]. A stronger reliance on language predictions could thus serve as a compensatory mechanism to facilitate language comprehension despite sensory decline in older adults.

How Can Limited Executive Resources Affect Language Prediction?

As shown in Figure 4, the relationship between word predictability and age depends on cognitive load. Under low load, predictability effects on reading time increased with age; older adults showed robust effects, while younger adults showed none. Under intermediate load (1-back), this pattern reversed, with younger adults showing stronger predictability effects than older adults.

This reversal calls for an explanation, and we deem it most likely to reflect differences in how executive resources are deployed. In low-load settings, young adults may process both predictable and unpredictable words efficiently, minimising observable surprisal effects. The absence of a predictability effect in this group should therefore not be taken as evidence against predictive processing. Rather, it may indicate that prediction is less necessary when processing is fast and flexible. Under intermediate load, executive resources are partially taxed, revealing underlying prediction processes in young adults. At higher loads, predictability effects diminish again, suggesting resource constraints impair predictive processing.

In older adults, however, predictability effects decline already at intermediate load, consistent with the CRUNCH model [40], which posits that cognitive capacity limits are reached earlier in aging. At this point, resources are insufficient to maintain predictive processing while also performing the secondary task. Behaviourally, this may result in fluctuating performance or trial-wise switching between the two tasks. As load increases further, older adults continue to show reduced, though still measurable, predictability effects – indicating sustained but strained processing.

Taken together, the data suggest that both age groups experience a reduction in predictive processing when executive resources are limited, but the “crunch point” is reached earlier in older adults.

Limitations and Future Directions

One intriguing question that remains is how n-back performance and language surprisal interrelate. It is plausible that when texts are highly predictable (i.e., when surprisal is very low on average), the cognitive load associated with language processing is reduced. This reduction could free up domain-general executive resources, thereby enhancing n-back performance. Conversely, when surprisal is high, the increased demands of processing less predictable language may compromise working memory updating and lead to poorer n-back performance. As we assessed n-back performance at block-level and maintained equal predictability across texts, analysing trial-level effects of surprisal on n-back task performance was not possible. Future studies could address this limitation by systematically examining the relationship between surprisal and n-back performance at a more granular level.

Another important area for future research involves exploring potential age-related differences in task strategies within dual-task designs: We previously hypothesised that the differing effects of cognitive load on surprisal-driven reading times across age groups may reflect compensatory strategies in older adults. Given declines in executive control and working memory [31–33], older adults may prioritise language processing over multitasking under high load, whereas younger adults might distribute resources more flexibly. Future studies should thus examine how age affects response strategies in dual tasks.

Conclusion

In summary, the present study contributes to resolving the debate about the cognitive cost of predictive language processing. The data offer the following key insights:

First, both low language predictability and high task demands increase reading time. This holds true across a large age range. Second, we find that higher task demands diminish the effects of language predictability on reading time, replicably demonstrating that language prediction draws on resources of executive control. Third, the data reveal age-related differences in the use of linguistic predictions: High predictability had more leverage on reading efficiency in older than in younger adults but was also more sensitive to available executive resources.

Materials & methods

Participants

We recorded data from 178 participants, who were either tested online (N = 83) or in a controlled lab environment (N = 95). We excluded data from three participants from the online sample from further analysis, either due to technical issues (N = 1) or because they reported having been distracted during the task (N = 2). The resulting final sample comprised 175 participants aged 18 to 85 years (M = 44.9 ± 17.9 years) with a balanced gender distribution of 51% female, 47% male, and 2% non-binary identifying participants. All participants were native German speakers with normal or corrected-to-normal vision, and intact colour vision (assessed by a screening test; Ishihara, 1987). Exclusion criteria were a history of psychiatric or neurological disorders, drug abuse, dyslexia, illiteracy or any impairments in language processing. Individuals who had consumed drugs or alcohol immediately prior to the study were not eligible for participation.

All participants from the online sample were recruited through the recruitment platform Prolific, whereas lab participants were recruited via an existing database of the Max-Planck-Institute. Lab participants above the age of 40 performed the Mini-Mental State Examination [78] to screen for cognitive impairments. Middle-aged participants (40-59 years) had a mean score of 29.57 ± 0.84, whereas older participants (60-85 years) scored 28.23 ± 1.62 points.

Ethics Statement

The study was conducted in accordance with the Declaration of Helsinki and was approved by the local ethics committees of the University of Lübeck and Leipzig University, respectively. Participants provided their written informed consent prior to participation, and received financial compensation (12€/h).

Study Design

During the experiment, participants were asked to read short newspaper articles which were adapted from articles from the news archive of a well-known German magazine (“Der Spiegel”). The nine selected articles were edited to be easy to understand, neutral in tone and non-emotional in content to avoid any influences of text content on reading time. Additionally, all texts were limited to a length of 300 words (trials). Participants were instructed to read the texts carefully, as content-related multiple-choice questions were asked after each text. The comprehension questions served both as a measure of reading comprehension and as a motivator to pay close attention to the reading material.

During this self-paced reading task (Reading Only task), each word was presented individually on screen and participants proceeded to the next word by pressing the space bar on their keyboard (Fig. 2a). To ensure that each word was displayed at least briefly, the response window started after the word had been shown for a fixed period of 50 ms. Each word was presented centred on screen in one of four font colours (Hex codes: #D292F3 [lilac], #F989A2 [muted pink], #2AB7EF [cerulean blue] and #88BA3F [leaf green]) against a white background.

In four of the six main blocks, the self-paced reading task was complemented by a secondary n-back task (see Fig. 2a), in which participants were instructed to press a target button on the keyboard (“C” for right-handed, “M” for left-handed participants) whenever the font colour of the current word matched that of the previous (1-back Dual Task) or the penultimate word (2-back Dual Task). Participants were still required to press the space bar to advance to the next word after pressing the target button. Reaction times were recorded for both kinds of responses. Neither the Reading Only task nor the 1-back and 2-back Dual Task blocks were speeded, allowing participants to complete the experiment at their own pace.

Before being introduced to this combination of reading and n-back task (see Fig. 2a), participants could familiarise themselves with the n-back paradigm in short, non-linguistic, single-task blocks comprising coloured rectangles as stimuli. These blocks served both to introduce the n-back task and to quantify participants’ working memory abilities. For all main blocks with an n-back task, a target ratio of 16.667% was used. The low target ratio was chosen to prevent an excessive number of n-back reactions during the dual-task blocks.

Taken together, the experiment comprised three cognitive load conditions: A baseline condition, comprising a reading task without additional n-back task (Reading Only), a reading task with an additional 1-back task (1-back Dual Task), and a reading task with an additional 2-back task (2-back Dual Task). Each condition was presented in two blocks, each comprising 300 trials. For each block, one of nine texts was randomly selected, with no text occurring more than once.

All experiments were implemented using lab.js [79]. Online studies were hosted on OpenLab – a server-side platform designed specifically for lab.js experiments [80] – and data were saved on OSF [81].

Generation of Word Surprisal and Entropy Scores

The predictability of each word was operationalised via word surprisal (Fig. 2b), which reflects the predictability of the current word given its preceding context [10,43,44]. A word’s surprisal score is defined as the negative logarithm of the word’s probability given its context [43]:

\[ \mathrm{surprisal}(w_n) = -\log P(w_n \mid w_1, w_2, \ldots, w_{n-1}) \]

If a word w_n has a high surprisal score, its occurrence given its preceding context w_1, w_2, …, w_{n-1} has a low probability, rendering it highly unpredictable (i.e. surprising).

In addition to each word’s surprisal, we also computed the entropy of the probability distribution for each predicted word given its context (Fig. 2b), which reflects the uncertainty in predicting the next word. The entropy is defined as the negative sum, over all words w in the vocabulary V, of the product of each word’s probability and its log probability [82], or, put simply, as the expected surprisal across all possible continuations in the vocabulary [83]:

\[ H(w_n) = -\sum_{w \in V} P(w \mid w_1, w_2, \ldots, w_{n-1}) \, \log P(w \mid w_1, w_2, \ldots, w_{n-1}) \]

If entropy is low, only one or a few possible words in the vocabulary are assigned high probabilities of being the actual next word, hence indicating low uncertainty about which word will come next. This is usually the case if the previous context is very restricting. Conversely, if a vast amount of words in the vocabulary would be suited as continuations for the given context, and the probability distribution across words is fairly uniform, word entropy – and thus uncertainty about which word will come next – is high. Taken together, word surprisal signifies the predictability of each word whereas word entropy signifies the uncertainty underlying its prediction.
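The distinction between surprisal and entropy can be illustrated with a minimal sketch. It does not reproduce the GPT-2 pipeline used in the study: the two next-word distributions, the toy vocabulary, and the function names are hypothetical, and the natural logarithm is an assumption, as the base of the logarithm is not specified here.

```python
import math


def surprisal(prob):
    """Surprisal of a word: negative log of its conditional probability."""
    return -math.log(prob)  # natural log is an assumption of this sketch


def entropy(distribution):
    """Entropy of a next-word distribution: expected surprisal over the vocabulary."""
    return sum(p * surprisal(p) for p in distribution.values() if p > 0)


# Hypothetical next-word distributions over a toy four-word vocabulary.
constraining = {"Kaffee": 0.94, "Tee": 0.04, "Wasser": 0.01, "Saft": 0.01}
uniform = {"Kaffee": 0.25, "Tee": 0.25, "Wasser": 0.25, "Saft": 0.25}

for label, dist in (("constraining", constraining), ("uniform", uniform)):
    print(label, "entropy:", round(entropy(dist), 3))

# An unlikely continuation in a constraining context is highly surprising.
print("surprisal of 'Saft':", round(surprisal(constraining["Saft"]), 3))
```

In the constraining context, entropy is low because most probability mass falls on a single word, whereas the uniform distribution yields the maximum entropy for this vocabulary size.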

In the current study, we computed one surprisal score as well as one entropy score for each word in the experimental texts (mean surprisal: 18.165 ± 7.523, mean entropy: 4.067 ± 0.932; see further information in the supplementary material). We chose a context length of two words, as constraining the context has been shown to increase GPT-2’s psychometric predictive power, making its next-word predictions more human-like [45].

To generate word entropy and surprisal scores, we used a 12-layered GPT-2 model [42], which was pre-trained on German texts by the MDZ Digital Library team (dbmdz) at the Bavarian State Library, and the corresponding tokenizer, both available from the Hugging Face model hub [84]. Scores were calculated using Python version 3.10.12 [85].

Analysis

Preprocessing

To gauge participants’ response accuracy in the n-back task, we computed the detection-prime (d’) index. This measure quantifies the ability to distinguish between target and non-target stimuli, in our case trials (i.e. words in dual-task and rectangles in single-task blocks) with colour repetitions and trials where the current colour does not match the colour from the nth previous trial. A d’ value of 0 signifies an inability to discriminate between signal and noise stimuli, suggesting that participants indicated they saw a target in either no or all trials. Thus, we excluded all dual-task blocks with d-primes of 0 from further analyses, which affected only five participants. In total, we excluded three main blocks in the online sample and two main blocks in the lab sample.
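As an illustration, d’ can be computed as the difference between the z-transformed hit rate and the z-transformed false-alarm rate. The sketch below follows this standard signal-detection definition; the function name and the clipping of extreme rates (parameter eps) are assumptions of this example, as the handling of hit or false-alarm rates of 0 and 1 is not described here.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections, eps=0.01):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Rates of exactly 0 or 1 are clipped to [eps, 1 - eps] so that the
    z-transform stays finite (an assumption of this sketch).
    """
    hit_rate = hits / (hits + misses)
    fa_rate = false_alarms / (false_alarms + correct_rejections)
    hit_rate = min(max(hit_rate, eps), 1 - eps)
    fa_rate = min(max(fa_rate, eps), 1 - eps)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# A block of 300 trials with a target ratio of 1/6 contains 50 targets.
print(d_prime(hits=40, misses=10, false_alarms=25, correct_rejections=225))

# Responding "target" on no trial yields no discrimination at all.
print(d_prime(hits=0, misses=50, false_alarms=0, correct_rejections=250))
```

With this clipping, responding on none or on all trials gives identical hit and false-alarm rates and therefore a d’ of exactly 0, in line with the exclusion criterion described above.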

After each text block, participants were asked to answer three multiple-choice questions as a measure of reading comprehension. For each question, we showed four response options, with only one of them being correct. To ensure participants performed the tasks as intended and read the words on screen, we excluded all blocks where none of the questions were answered correctly. In total, we excluded one 1-back and seven 2-back Dual Task blocks from the online sample, as well as four 1-back and eleven 2-back Dual Task blocks from the lab sample, from datasets of six and fourteen participants, respectively.

Lastly, we preprocessed the reading time data: First, any trial exhibiting a raw reading time exceeding 5000 ms was considered an extended break and subsequently excluded from further analyses. This cutoff was selected heuristically, based on the observation that participants tested in the lab did not exhibit trial durations exceeding 5000 ms. Therefore, we assume that participants tested online may have been distracted and less focused on the experiment during trials of this length.

To further remove outliers, we followed the procedure recommended by Berger and Kiefer (2021) to ensure comparable exclusion criteria for long and short outliers in typically skewed reading time data: First, reading times were transformed using the Proportion of Maximum Scaling method (POMS; [86]). We POMS-transformed the data on block level to account for potential differences in reading time distributions between blocks. The square root of each value was then taken to ensure a symmetric distribution. Following this, we z-transformed the data and excluded all trials from further analysis where z-scores fell outside a range of -2 to 2 [87,88]. Taken together, we excluded 12968 trials (4.117% of all trials from the main blocks) with an average of 74.103 ± 12.654 excluded trials per participant.
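The exclusion steps described above can be sketched for a single block as follows. This is a minimal illustration rather than the original analysis script: the function name and example values are hypothetical, and the use of the sample standard deviation for the z-transform is an assumption.

```python
import math
from statistics import mean, stdev


def clean_block(reading_times, max_rt=5000.0, z_cutoff=2.0):
    """Return the reading times (ms) of one block that survive outlier removal."""
    # Step 1: drop extended breaks (raw reading times above the cutoff).
    kept = [rt for rt in reading_times if rt <= max_rt]
    lo, hi = min(kept), max(kept)
    if hi == lo:
        return kept  # no variance, nothing to flag
    # Step 2: Proportion of Maximum Scaling maps values onto [0, 1].
    poms = [(rt - lo) / (hi - lo) for rt in kept]
    # Step 3: the square root reduces the right skew of the distribution.
    root = [math.sqrt(v) for v in poms]
    # Step 4: z-transform and keep trials within the cutoff range.
    m, s = mean(root), stdev(root)  # sample SD is an assumption
    return [rt for rt, r in zip(kept, root) if abs((r - m) / s) <= z_cutoff]


# Hypothetical reading times for one short block.
block = [200, 250, 300, 350, 400, 450, 500, 550, 600, 3000, 6000]
print(clean_block(block))
```

In this example, the 6000 ms trial is removed as an extended break and the 3000 ms trial is removed as a z-score outlier, while the raw values of the remaining trials are returned unchanged for subsequent log-transformation.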

To facilitate interpretability of units in the results, we subsequently continued working with the raw reading times, which had been cleaned of outliers at this stage, and log-transformed them for statistical analysis.

Statistical Analysis of n-Back Responses and Comprehension Questions

To ensure the validity of our cognitive load manipulation in the dual-task blocks, we examined whether increased cognitive load induced a decline both in n-back task performance, as indicated by reduced d-primes, and in the accuracy in the comprehension question task, as reflected by a lower number of correct answers. We employed a linear mixed-effects model (LMM) for d-primes and a logistic generalised linear mixed-effects model (GLMM, logit link function) for comprehension question accuracy.

In both models, we included recording location (online or lab), cognitive load and continuously measured age (centred) as well as the interaction of age and cognitive load as fixed effects. In the model for the d-primes, we additionally included measures of comprehension question accuracy (on participant and block level) as well as the block number as fixed effects to control for different response strategies and tiredness effects, respectively. Moreover, we included the mean d-primes from the 1-back and 2-back Single Tasks as a working memory measure.

We assigned simple coding schemes to the factors recording location and cognitive load. Both models included random intercepts for participants and texts. The model for d-primes additionally included by-participant random slopes for cognitive load.

d-prime ∼ mean d-prime from single tasks +
    mean comprehension question accuracy +
    block-level deviation from mean comprehension question accuracy +
    recording location + block number +
    age * cognitive load +
    (1 + cognitive load | ID) + (1 | text number)

Note. Structure of the model for d-primes in the n-back task in dual-task blocks. The variable age was centred and the variable cognitive load encompassed only two levels (1-back and 2-back) as there was no n-back task in the Reading Only condition.

comprehension question accuracy ∼ recording location +
    age * cognitive load +
    (1 | ID) + (1 | text number)

Note. Structure of the model for accuracy in the comprehension question task. The variable age was centred. We used a binomial family distribution with a logit link function for modelling the comprehension question accuracies.

Statistical Analysis of Reading Times

We explored the effects of cognitive load, age and surprisal as well as their 2-way and 3-way interactions on log-transformed reading times using an LMM. The model included an interaction of age, surprisal, and cognitive load motivated by our hypotheses as well as additional fixed effects to control for nuisance effects. The final selection of the fixed and random effects structure was based on the highest R² values.

We included the reading time of the previous word as a fixed effect to control for potential nuisance effects such as post-error slowing following a missed n-back target in the previous trial, or sequential modulation effects if the previous trial was ended prematurely, leading to an extended reading time carried over to the current trial. As we only modelled reading times from trials where we had surprisal scores, the first two trials of each block were not included in the statistical analyses.

As response strategies may differ between individuals, but also within an individual from block to block, we included two different regressors representing these distinct between- vs. within-participant effects on reading time. Between-participant effects were modelled by the individual mean comprehension performance whereas within-participant effects were modelled by the block-level deviation from this mean (cf. [89,90]). We further included block-wise d-primes and participant-wise mean single-task d-primes as a proxy of each participant’s working memory capacity. By incorporating block- and participant-level performance measures, which are designed to be sensitive to task difficulty, we accounted for the potential variation in perceived task load between age groups or samples. For instance, the 2-back task might present a greater challenge for an older individual compared to their younger counterpart, therefore rendering the tasks not entirely comparable between age groups if not appropriately controlled for.
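The separation of between- and within-participant effects amounts to splitting each block-level score into the participant’s mean and the block’s deviation from that mean. A minimal sketch of this decomposition is given below; the function name, participant labels, and scores are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def split_between_within(block_scores):
    """Split block-level scores into a participant mean and a block-level deviation.

    block_scores: list of (participant_id, score) tuples, one per block.
    Returns a list of (participant_id, participant_mean, deviation) tuples.
    """
    by_participant = defaultdict(list)
    for pid, score in block_scores:
        by_participant[pid].append(score)
    means = {pid: mean(scores) for pid, scores in by_participant.items()}
    return [(pid, means[pid], score - means[pid]) for pid, score in block_scores]


# Hypothetical numbers of correctly answered questions per block.
scores = [("p1", 1.0), ("p1", 3.0), ("p2", 2.0)]
for row in split_between_within(scores):
    print(row)
```

The participant mean carries the between-participant effect, while the deviation, which sums to zero within each participant, carries the within-participant effect.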

The remaining fixed effects entailed the recording location (online or lab), word frequency (as estimated using Python’s wordfreq package; [91]), word length, as well as the position of block and trial in the course of the experiment as main effects. Furthermore, we included entropy as a fixed effect to account for the uncertainty in the prediction of the next word. Surprisal and entropy values were weakly correlated (r = 0.29, p < 0.001). To account for the delay in reaction time associated with n-back responses, we included n-back reaction as a binary predictor in our models.

Finally, we included the three-way interaction of age (as a continuous predictor), cognitive load (on 3 levels: Reading Only, 1-back Dual Task and 2-back Dual Task; contrasted using a simple coding scheme with the Reading Only condition as the reference level), and surprisal score (continuous predictor). This entails the implicit inclusion of all two-way interactions of age, cognitive load and surprisal, as well as the main effects of those variables.
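For readers less familiar with simple coding, the sketch below constructs the contrast values for a three-level factor. It follows the common definition of simple coding (dummy coding with the reference level, centred by subtracting 1/k for k levels); the function name and level labels are illustrative and not taken from the analysis scripts.

```python
def simple_coding(levels, reference):
    """Return {level: contrast values} for a simple coding scheme.

    Each contrast compares one non-reference level with the reference level,
    while the intercept corresponds to the grand mean across levels.
    """
    k = len(levels)
    others = [lvl for lvl in levels if lvl != reference]
    return {
        lvl: [(1.0 if lvl == target else 0.0) - 1.0 / k for target in others]
        for lvl in levels
    }


codes = simple_coding(["Reading Only", "1-back", "2-back"], reference="Reading Only")
for level, values in codes.items():
    print(level, [round(v, 3) for v in values])
```

Because every contrast column sums to zero, the main effects of the other predictors are evaluated at the average across cognitive load conditions rather than at the reference level.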

Random effects included random intercepts for participants, texts, words, and word colours, as well as by-participant random slopes for cognitive load. All continuous predictors were centred.

log(RT) ∼ RT of previous word +
    block-level d-prime + mean d-prime from single tasks +
    mean comprehension question performance +
    block-level deviation from mean comprehension question performance +
    recording location + entropy +
    word frequency + word length (without punctuation) +
    n-back reaction + block number + trial number +
    surprisal * age * cognitive load +
    (1 + cognitive load | ID) + (1 | text number) + (1 | word) + (1 | colour)

Note. Model structure. RT = Reading Time, ID = participant.

To gain a more nuanced understanding of the three-way interaction of age, cognitive load and surprisal, we performed a subsequent simple slopes analysis. This analysis allows exploring the interaction of two continuous predictors, in our case quantifying the slope of the surprisal effect in each of the three cognitive load conditions as a function of age. This way, we determined for which age range and cognitive load condition we observed a significant effect of surprisal on reading time.

For all models, including control analyses and the internal replication, p-values were obtained using ANOVAs with type III sums of squares. Degrees of freedom for p-values and standard errors were estimated using Satterthwaite’s approximation for all LMMs, and Wald’s approximation for the GLMM and GAM [92,93]. All effects reported are significant on an alpha level of 0.05 after FDR-correction for multiple comparisons [94].
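To illustrate what an FDR correction does, the sketch below implements the Benjamini-Hochberg step-up procedure for a set of p-values. That this specific procedure was used is an assumption of the example, as the text only refers to FDR correction; the function name and p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value and enforce monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four model terms.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
print([round(p, 4) for p in adjusted])
print([p < 0.05 for p in adjusted])
```

An effect is then considered significant if its adjusted p-value falls below the alpha level of 0.05.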

All analyses were carried out in R version 4.2.2 [95] using the packages gratia, interactions, lmerTest, lme4, mgcv, modelbased, and sjPlot [96–101].

Control Analysis: Dissociating Cognitive Control from Attention

To disentangle attentional and cognitive load effects, we modelled reading times using an additional linear mixed model of the same structure as described before but contrasting only the 2-back Dual Task condition with the less demanding 1-back Dual Task condition. The two Dual Task conditions differ only in cognitive load, not in attentional switching costs, which means any effects of cognitive load can be attributed to the cognitive load manipulation, with attentional demands held constant.

Control Analysis: Assessing Potential Effects of Verbal Intelligence and Education

To ensure potential effects of verbal intelligence or education did not unduly influence our findings, we analysed data from 95 lab participants who reported their formal education in years (M = 18.0 ± 3.343, range = 11–30) and completed a lexical decision task, the Spot-the-Word test [102], in which they were asked to identify the real word in pairs of words and non-words. Each participant’s score on this test (M = 32.021 ± 2.993, range = 21–37) provided a measure of their verbal intelligence.

To assess the potential effect of education and verbal intelligence on reading time, we fitted three additional LMMs: The first model mirrored the structure of the original LMM used to analyse log-transformed reading times (see section Statistical Analysis of Reading Times for the model structure), with one key modification: The predictor for recording location was excluded, as all participants were tested in a single location. The remaining two models followed the same structure, with the inclusion of centred education scores and centred Spot-the-Word test scores as additional predictors to account for education and verbal intelligence, respectively. We then statistically compared the baseline LMM with the two other LMMs using an ANOVA to determine whether verbal intelligence or education significantly improved the model fits.

Control Analysis: Modelling Non-Linear Effects of Age

We also fitted a generalised additive model (GAM) to our data to allow for non-linear relationships of the predictor variables with reading time, as it has been shown that reaction time is oftentimes modulated by predictor variables in a non-linear way [103,104].

Specifically, we employed a GAM to account for a potential non-linear relationship of age and surprisal with reading time. To this end, all continuous predictors were fitted with thin-plate regression splines and the interaction of surprisal, age, and cognitive load was fitted via a tensor product smooth with individual curves for each level of cognitive load. The number of basis dimensions for each smoothing spline was checked via model diagnostics available in mgcv after the first model set-up and appropriately updated to reach a k-index > 1.01 and p > 0.05 to avoid oversmoothing. The random effect structure was set up similarly to the LMM.

It is important to note that the outcomes of GAMs and LMMs can differ: GAMs are particularly adept at identifying localised, non-linear changes in predictor effects on reading time that may be overlooked by LMMs. As a result, the effects obtained from LMMs and those derived from GAMs are based on distinct metrics, which complicates direct comparisons between the two approaches.

Internal Replication

To ensure the reliability of our findings, we conducted an internal replication of the previously described experiment. This internal replication was preregistered on OSF (doi: 10.17605/OSF.IO/SU6VX).

Replication Sample

As outlined in the preregistration, we conducted an online study with a sample of 100 participants. We excluded data from four participants from further analysis, either due to technical issues (N = 2) or because they reported having been distracted during the task (N = 2). The resulting final sample comprised 96 participants aged 18 to 70 years (M = 39.750 ± 13.996 years) with a balanced gender distribution of 51% female, 48% male, and 1% non-binary identifying participants. As in the original experiment, all participants were native German speakers with normal or corrected-to-normal vision and intact colour vision without dyslexia, illiteracy, a history of psychiatric or neurological disorders or drug abuse. Individuals who had consumed drugs or alcohol immediately prior to the study were not eligible for participation.

Ethics Statement

The study was conducted in accordance with the Declaration of Helsinki and was approved by the local ethics committee of the University of Lübeck. All participants were recruited online through Prolific and only tested upon having provided their written informed consent. Participants received a financial compensation of 12€/h.

Replication Analyses

Analogous to the original experiment, we first cleaned the reading time data of trials exceeding a duration of 5000 ms as well as outliers (see section Preprocessing), which affected 4.036% of all trials from the main blocks with an average of 72.656 ± 11.089 excluded trials per participant. The structure of the statistical model for the replication analysis was analogous to the model for the original analysis of reading times (see section Statistical Analysis of Reading Times), except for the predictor of recording location, which was excluded as we only analysed data collected online. To compare results from the original experiment and the replication, we fitted the model once using the data of the online sample from the original experiment and once using the new online replication datasets.

Given that we simplified the analysis approach in the original study after having preregistered the replication, we deviated from the analysis plan described in the preregistration and made the same modifications here, resulting in the use of word surprisal for only one context length instead of four and, consequently, only one LMM instead of several.

Data Availability

All experimental and analysis scripts are publicly available in the project’s GitHub repository: https://github.com/MMarieSchuckart/EXNAT-1. Raw data as well as model outputs, results, and model comparisons will be made available upon publication in the project’s OSF repository: https://osf.io/dtwjk/.

Acknowledgements

We thank Christian Koblitz and Marcel Blumenthal for their help with data acquisition in the lab.

Additional information

Funding

This work was supported by the German Research Foundation (DFG, OB 352/2-2 to JO, and HA 6314/4-2 to GH). GH was supported by Lise Meitner Excellence funding from the Max Planck Society and the European Research Council (ERC-2021-COG 101043747).

Author contributions

M.S.: Conceptualization, data curation, formal analysis, investigation, methodology, project administration, visualisation, writing: original draft, review & editing

S.M.: Conceptualization, data curation, formal analysis, investigation, methodology, project administration, writing: original draft, review & editing

S.T.: Formal analysis, methodology, writing: review & editing

L-M.S.: Methodology, writing: review & editing

G.H.: Conceptualization, funding acquisition, supervision, methodology, writing: review & editing

J.O.: Conceptualization, methodology, funding acquisition, supervision, writing: review & editing

Declaration of AI and AI-assisted technologies in the writing process

During the preparation of this work, the authors used ChatGPT 3.5 to rephrase sentences. After using this tool, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.

Preregistration

The internal replication of the main experiment was preregistered on OSF (doi: 10.17605/OSF.IO/SU6VX).

List of Legends for the Supplementary Materials

Figure S1. Comparison of age distribution between samples.

Figure S2. Task performance and reading time by age and cognitive load condition. Task performance (d-primes) in conditions with an n-back task (a). Accuracy in the comprehension question task (b). Reading time by age and condition (c). Reading time by condition (d). Solid line: M, shaded area: 95% CI, point: mean reading time for one participant in the respective condition.

Figure S3. Individual predicted reading time. Reading time by age (a) and condition (b).

Figure S4. Comparison of factor smooths for different levels of cognitive load from the three-way interaction of age, surprisal, and cognitive load. The difference smooths show slightly stronger effects of high surprisal in younger than in older adults for the 1-back relative to the 2-back condition, and stronger effects of high surprisal in older adults for the Reading only relative to the n-back condition.

Figure S5. Comparison of the results of the LMM and GAM control analyses. Panel c illustrates the interaction between cognitive load and surprisal for a representative younger and older participant, estimated using the LMM (left) and the GAM (right). For a complementary visualisation of the three-way interaction between age, cognitive load, and surprisal, see Fig. S4. For a visualisation of the main effects of age, surprisal and cognitive load on reading time, please see Fig. 3c.

Figure S6. Three-way interaction of age, surprisal, and cognitive load in the replication sample.

Table S1. Results from models for task performance measures (N = 175).

Table S2. Results from models for reading time for full original sample (N = 175).

Table S3. Results from models for reading time for original online sample and online replication sample (N = 80 and N = 96, respectively).

Table S4. Results from models for control analysis (1-back vs. 2-back) of reading time for full original sample (N = 175).

Table S5. Results from GAM for control analysis of reading time for full original sample (N = 175).


