Introduction

Learning new words is essential to language acquisition. When unknown words are learned, top-down influences can facilitate their perception and encoding, depending on the similarity between the new words and pre-existing knowledge. For example, previous studies have shown that artificial words with phonotactic properties similar to those of real words are easier to learn and remember1,2, suggesting that pre-existing knowledge about sound structure (i.e., schemas) facilitates the encoding and consolidation of new words. At the neural level, a synchronized interplay between the ventromedial prefrontal cortex, hippocampus and unimodal associative cortices has been proposed to explain schema instantiation and schema-related mnemonic effects3. Within this framework, the influence of phonology on memory processing can be described as follows: new words whose phonetic structure is congruent with activated schema variables are integrated into pre-existing knowledge networks of long-term memory with less cognitive effort than schema-incongruent words.

According to the complementary systems account by Gaskell and colleagues4,5, learning new words comprises a first stage of rapid initial familiarization, supported by neural activity in the medial temporal lobe, and a second stage of slow offline consolidation in the neocortex. The model suggests that, after initial hippocampal learning, memory representations of the words are gradually integrated during sleep into the long-term lexical and phonological memory structures of the neocortex. Additionally, these consolidation processes might depend on spontaneous and repeated reactivation of the newly formed memory traces during non-rapid eye movement (NREM) sleep6. The active system consolidation hypothesis proposes synchronized hippocampal sharp-wave ripples, thalamocortical sleep spindles and slow oscillations as the activities underlying memory reactivation during sleep. Accordingly, studies in humans7–9 and in rodent models10,11 found that sleep spindles and hippocampal sharp-wave ripples occur grouped during slow oscillations and that these events correlate with memory consolidation6,12–15.

While similarity of the learning material to pre-existing knowledge clearly facilitates encoding, its effect on sleep consolidation is controversial. For example, Havas and co-authors16 tested learning and consolidation of artificial words phonologically derived either from the participants’ native language (L1 words in Spanish) or from a foreign language (L2 words in Hungarian). Here, sleep-associated improvements in recognition memory were observable only for the L2 words, suggesting that reduced similarity to pre-existing knowledge facilitates consolidation during sleep. Similarly, in a study by Payne et al.17, sleep-associated memory benefits occurred only for semantically unrelated word pairs, not for semantically related ones.

On the other hand, Zion and colleagues18 reported that sleep facilitated consolidation in a second-language learning task in participants with a higher degree of meta-linguistic knowledge, indicating a beneficial effect of pre-existing knowledge on new-word consolidation. Similarly, Durrant and colleagues19 reported that only schema-conformant, but not non-conformant, learning material benefitted from sleep. In a study from our own lab, German-speaking participants learning unknown Dutch vocabulary profited more from sleep than native French speakers given the same number of learning trials. The German native speakers might have had more pre-existing knowledge about the Dutch words than the French native speakers, which facilitated learning and consolidation of the new Dutch vocabulary20.

A promising approach to examine sleep-associated memory consolidation is to experimentally bias reactivation by re-presenting olfactory21–23 or auditory24,25 reminder cues during sleep, a technique known as targeted memory reactivation (TMR). TMR has been related to elevated slow-wave and spindle activity following cue presentations as neural markers of memory reactivation26,27. So far, two studies suggest that prior knowledge increases the effectiveness of TMR: Groch and colleagues28 reported that TMR improved memory performance for associations between pseudo-words and familiar objects, but not for associations with new/unknown objects. In that study, subjects also showed better encoding performance before sleep for the familiar compared to the unknown objects. In another study29 on learning sound-object locations, the effectiveness of TMR correlated positively with initial encoding performance prior to sleep. Additionally, both studies28,29 found significant associations between the effectiveness of TMR and sleep spindle activity after cue presentations. These findings suggest that higher similarity between the learning material and pre-existing knowledge facilitates reactivation and consolidation during sleep. However, it is unknown whether these findings generalize to the learning of new artificial words with different levels of phonotactical similarity to prior word-sound knowledge, as a factor of learning difficulty.

To address this issue, we designed artificial words whose phonotactical properties varied in difficulty. Healthy young participants learned to categorize these words as rewarded or unrewarded. Using auditory TMR, we re-presented the artificial words during subsequent sleep to study its effect on categorization performance. As a manipulation check, we expected that artificial words with more familiar phonotactical properties would be easier to learn than less familiar words. Based on previous findings28,29, we hypothesized higher effectiveness of TMR for the easy to learn words than for the difficult words, accompanied by higher oscillatory activity in the slow-wave and spindle range.

Results

Experimental design

To study the impact of difficulty in word learning on TMR, we created a novel learning paradigm. We formed four sets of artificial words (40 words per set) consisting of different sequences of two vowels and two consonants (Fig. 1a). Comparisons between the sets revealed significant differences in phonotactic probability (PP; Fig. 1b; unpaired t-tests: G1 / G2 > G3 / G4, p<0.005). PP quantifies the frequency of a single phoneme (phoneme probabilities) or a sequence of phonemes (e.g., biphone probabilities) within a language and thus serves as a measure of the similarity between the artificial words and pre-existing real-word knowledge. According to these distinct levels of PP, we grouped the four sets into a high-PP (G1 & G2) and a low-PP (G3 & G4) condition.

Experimental design.

a) Artificial words consist of two vowels and two consonants arranged in four different sequences. b) The phonotactic probabilities (PP) for single phonemes (left panel) and biphone probabilities (right panel) averaged over the four sets (40 words per set), and the pairing of the sets into two distinct levels of PP, high- (black) and low-PP (grey). c) Schematic trial structure of the learning task with screen images and durations in milliseconds. d) Feedback matrix with the four answer types (hits, CR = correct rejections, misses and FA = false alarms) according to the response and the reward assignment of the word. Note that subjects could gain or lose money points depending on correct and incorrect responses. e) Experimental procedure with experimental tasks and phases in temporal order. TMR took place in the non-REM sleep stages 2 & 3. Error bars reflect standard errors of the mean (SEM).

During encoding, we trained the subjects to discriminate the artificial words based on reward associations by manual button presses. To this end, the high- and the low-PP conditions each consisted of 40 rewarded and 40 non-rewarded words. After the auditory presentation of the word and the button press response, subjects received feedback at the end of each trial (Fig. 1c). In terms of signal detection theory30, responses were assigned to one of four response types according to the reward association of the word: hits (rewarded, correct); correct rejections (unrewarded, correct); misses (rewarded, incorrect); and false alarms (unrewarded, incorrect) (Fig. 1d). Depending on their responses, subjects received money points. Each word was presented three times in randomized order to train subjects to categorize the artificial words.
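The four response types can be sketched as a small lookup. This is only an illustration of the signal-detection labels described above: the function name `classify_response` and the boolean encoding of reward status and response are assumptions, not the task code used in the study.

```python
def classify_response(is_rewarded: bool, chose_rewarded: bool) -> str:
    """Map a word's reward status and the subject's choice to a response type.

    is_rewarded:    True if the word belongs to the rewarded category.
    chose_rewarded: True if the subject categorized the word as rewarded.
    (Both argument names are hypothetical.)
    """
    if is_rewarded and chose_rewarded:
        return "hit"
    if is_rewarded and not chose_rewarded:
        return "miss"
    if not is_rewarded and chose_rewarded:
        return "false alarm"
    return "correct rejection"
```

Counting these labels per subject and condition yields the hit and false-alarm counts on which signal-detection measures are based.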

After encoding and before sleep, we tested pre-sleep memory performance (Fig. 1e). Subjects slept one night in the sleep laboratory, and during non-REM sleep stages 2 & 3 we conducted auditory TMR of the low-PP words in one group of subjects and of the high-PP words in the other group (between-subject design). The following morning, we tested post-sleep memory performance.

Manipulation check based on encoding analyses

To validate our novel paradigm, we examined the influence of PP on encoding performance. Based on signal detection theory30, we calculated sensitivity values (d’) to measure the subjects’ ability to differentiate and categorize rewarded and unrewarded words of the two PP conditions over the three presentations of the encoding task (Fig. 2a). A repeated-measures ANOVA on sensitivity with PP (high vs. low) and presentations (1 to 3) as within-subjects factors revealed significant main effects of PP (F(1,32) = 5.13, p = 0.03) and presentations (F(2,64) = 95.67, p < 0.001). Additional pairwise comparisons between PP conditions showed significant differences at the second and third, but not at the first, presentation (paired t-tests of the three presentations in order: (1) t(32) = 0.73, p = 0.47; (2) t(32) = 2.35, p = 0.03; (3) t(32) = 2.17, p = 0.04). These results indicate superior learning performance for words with high- compared to low-PP.
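As a minimal sketch of the sensitivity measure, d’ is the difference between the z-transformed hit rate and the z-transformed false-alarm rate. The log-linear correction used below (adding 0.5 to the counts) is an assumption made here to keep the rates away from 0 and 1; this section does not state which correction, if any, was applied in the study.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity d' = z(hit rate) - z(false-alarm rate).

    Assumption: log-linear correction (add 0.5 to each count, 1 to each
    total) so that rates of exactly 0 or 1 do not give infinite z-values.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)
```

With 40 rewarded and 40 unrewarded words per PP condition, a subject who answers at chance (20 of each response type) obtains d’ = 0, and larger values indicate better discrimination.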

Distinct levels of encoding between high- and low-PP words.

a) Learning curves showing encoding performance over presentations of high- (black) and low-PP (gray) words. Note the significantly greater performance for high- in comparison to low-PP at the second and third presentations. b) Grand average time-frequency plots time-locked to word presentations. The gray rectangle in the top panel marks the time and frequency range of interest (0.7 to 1.9s; 8-13Hz). The three panels, from top to bottom, correspond to the three presentations. Solid and dotted lines within the plots represent stimulus onset and averaged offset, respectively. Note the increase in oscillatory desynchronization in the alpha range (8-13Hz) over the three presentations. (Top, right) The topographic map shows power values averaged over the time window and frequency range of interest. c) (Top) Time-frequency representation of t-values (merged over Pz and P3 electrodes) shows a significantly greater change in alpha desynchronization for high- in contrast to low-PP during the third presentation. Below, the topographic map indicates the significant cluster of electrodes for the comparison between PP conditions at the third presentation (0.7 to 1.9s; 8-13Hz). d) The bar chart shows mean changes across subjects in alpha power (merged over Pz and Cz electrodes) at the second and third presentations, obtained by subtracting the first presentation, for high- (black) vs. low-PP (gray). Statistical analyses revealed significantly stronger desynchronization for high- compared to low-PP and a significant decrease in alpha power below 0 for high-PP at the third presentation. Error bars reflect standard errors of the mean (SEM); *p<0.05, **p<0.01.

To examine the oscillatory correlates of encoding performance, we conducted time-frequency analyses time-locked to the auditory word presentations (Fig. 2b) and extracted power values over time, frequency, and EEG electrodes using wavelet transformations (see methods). This analysis revealed a strong desynchronization after word presentation (0.7 to 1.9 s after stimulus onset) in the alpha frequency range (8-13 Hz), pronounced over parietal and occipital electrodes, relative to baseline activity (−1 to −0.1 s relative to stimulus onset; Fig. 2b, right panel). To test whether changes in alpha desynchronization correspond to the increase in encoding performance over repeated presentations, we ran a repeated-measures ANOVA on alpha power values merged over the Pz and Cz electrodes with presentations (1 to 3) as a single within-subjects factor. It revealed a significant increase in alpha desynchronization (F(2,64) = 4.19, p = 0.02).

Given the distinct levels of encoding performance for high- and low-PP, we next compared changes in alpha desynchronization between these conditions. We obtained the changes in power values by subtracting the first presentation from the second and from the third presentation, separately for the high- and low-PP conditions. Cluster-based statistics for the third presentation revealed a significant cluster over the left posterior electrodes with a stronger alpha power decrease for the high- in contrast to the low-PP condition (time window: 0.7 to 1.9s; frequency range 8-13Hz; averaged t-value over the 5 cluster electrodes = −2.69, p = 0.01; Fig. 2c). A repeated-measures ANOVA on alpha power changes (merged over Pz and Cz electrodes) with PP (high vs. low) and presentations (2 to 3) as within-subjects factors revealed a main effect of PP (F(1,32) = 5.42, p = 0.03) and a significant interaction (F(1,32) = 7.38, p = 0.01) (Fig. 2d). Post-hoc pairwise comparisons between PP conditions showed a significant difference at the third presentation (paired t-tests of presentations in order: (2) t(32) = −0.36, p = 0.72; (3) t(32) = −3.55, p = 0.001), and a t-test against 0 revealed significantly decreased activity for high-PP at the third presentation (t(32) = −2.92, p = 0.006). In addition to the behavioral results, these EEG results indicate differences between PP conditions in the desynchronization of alpha oscillations, an assumed neural correlate of encoding depth. To summarize, as a manipulation check based on encoding analyses, we confirmed that the high- and low-PP conditions correspond to distinct levels of learning difficulty.

TMR affects memory consolidation of the easy to learn words

To examine whether TMR during sleep impacts memory consolidation with respect to learning difficulty, we calculated overnight changes by subtracting the pre-sleep from the post-sleep memory performance for the high- and low-PP conditions. T-tests against 0 revealed a significant increase in the TMR/cueing condition of high-PP (high-PP cued: 0.24 ± 0.1, t(10) = 2.44, p = 0.035), while memory performance remained unchanged in all other conditions (low-PP cued: −0.09 ± 0.08, t(10) = −1.16, p = 0.27; high-PP uncued: 0 ± 0.08, t(10) = −0.04, p = 0.97; low-PP uncued: 0.03 ± 0.1, t(10) = 0.31, p = 0.76; Fig. 3a). An additional two-way mixed-design ANOVA on the same values with cueing as a between-subject factor (cued vs. uncued) and learning difficulty as a within-subject factor (high- vs. low-PP) showed no significant effects (cueing: F(1,20) = 0.5, p = 0.49; learning difficulty: F(1,20) = 2.37, p = 0.14; cueing x learning difficulty: F(1,20) = 3.62, p = 0.07). Post-hoc pairwise comparisons revealed a significant difference between the high-PP cued and the low-PP cued condition (t(20) = 2.63, p = 0.02), but no significant difference between the high-PP cued and the remaining conditions (vs. high-PP uncued: t(20) = 1.91, p = 0.07; vs. low-PP uncued: t(10) = 1.55, p = 0.15). In additional control analyses, pre-sleep memory performance and vigilance shortly before the post-sleep memory task did not differ significantly between the two cueing groups (see Table S1).
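The overnight change and the test against 0 can be sketched as follows. The helper names and the example values are hypothetical and are not data from the study. Assuming the reported ± values are standard errors of the mean, the high-PP cued change of 0.24 ± 0.1 gives t ≈ 0.24 / 0.1 = 2.4, in line with the reported t(10) = 2.44 once rounding of the printed values is taken into account.

```python
from math import sqrt
from statistics import mean, stdev


def overnight_changes(pre, post):
    """Post-sleep minus pre-sleep score, one value per subject."""
    return [after - before for before, after in zip(pre, post)]


def one_sample_t(values, popmean=0.0):
    """Return (t, degrees of freedom) for a one-sample t-test against popmean."""
    n = len(values)
    sem = stdev(values) / sqrt(n)  # standard error of the mean
    return (mean(values) - popmean) / sem, n - 1


# Hypothetical d' values for four subjects (illustration only).
pre = [1.0, 1.2, 0.8, 1.5]
post = [1.3, 1.1, 1.2, 1.9]
t, df = one_sample_t(overnight_changes(pre, post))
```

The p-value then follows from the t distribution with `df` degrees of freedom, which the standard library does not provide; a statistics package would be needed for that step.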

TMR affects the memory performance of the easy to learn words.

Bar charts show mean overnight changes of sensitivity (a) and c-criterion (b) values of high- and low-PP and cued (green) vs. uncued (gray) conditions. Note, statistical analyses revealed significant overnight increases only in the high-PP cued condition. Error bars reflect SEM; *p<0.05.

Next, we explored whether the TMR conditions affect biases in decision making. To this end, we analyzed changes between the pre- and post-sleep memory tasks in c-criterion values as a measure of risk avoidance/seeking in decision making30. Here, we also found a significant overnight increase in the high-PP cued condition (high-PP cued: 0.2 ± 0.07, t(10) = 3.04, p = 0.01), indicating an increased bias towards risk avoidance by TMR (Fig. 3b), while all other conditions remained unchanged (high-PP uncued: 0.09 ± 0.06, t(10) = 1.4, p = 0.19; low-PP cued: 0.08 ± 0.05, t(10) = 1.63, p = 0.13; low-PP uncued: 0.03 ± 0.05, t(10) = 0.56, p = 0.59). An ANOVA on c-criterion changes showed no significant effects (cueing: F(1,20) = 1.86, p = 0.19; learning difficulty: F(1,20) = 3.35, p = 0.08; cueing x learning difficulty: F(1,20) = 0.43, p = 0.52). Additional post-hoc pairwise comparisons revealed a significant difference between the high-PP cued and the low-PP uncued condition (t(10) = 2.43, p = 0.04), but no difference between the high-PP cued and the remaining conditions (vs. high-PP uncued: t(20) = 1.28, p = 0.22; vs. low-PP cued: t(20) = 1.57, p = 0.13).
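For orientation, the criterion can be written in its standard signal-detection form, c = −(z(H) + z(F)) / 2, where H is the hit rate and F the false-alarm rate. The sketch below assumes this standard definition and sign convention, which the section does not spell out; the function name is hypothetical.

```python
from statistics import NormalDist


def criterion_c(hit_rate, fa_rate):
    """Response criterion c = -(z(H) + z(F)) / 2.

    Assumed convention: c > 0 means fewer "rewarded" responses overall
    (a conservative bias), c < 0 means more "rewarded" responses.
    Rates must lie strictly between 0 and 1.
    """
    z = NormalDist().inv_cdf
    return -0.5 * (z(hit_rate) + z(fa_rate))
```

Under this convention, an overnight increase in c corresponds to a shift towards more conservative responding.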

Taken together, these results suggest that the effectiveness of TMR depends on the level of difficulty in word learning: auditory cueing during sleep increased memory performance only for the easy to learn words.

Increased spindle power nested during slow wave up-states in TMR of the easy to learn words

After analyzing TMR’s effectiveness on behavior, we investigated the corresponding neural activity using EEG. On visual inspection of the signals, auditory word presentations during NREM sleep evoked broad high-amplitude oscillations, called slow waves (SW; 0.5-3Hz), and sleep spindle activity (9-16Hz) of varying amplitude occurred preferentially nested in the SW up-state phase (see example traces in Fig. 4a).

Increased spindle power during SW up-states in TMR of the easy to learn words.

a) Top and bottom panel, two example EEG traces of auditory cueing during sleep (−2 to 6s relative to stimulus onset). Top rows, in blue, signal filtered in the SW range (0.5-3 Hz) superimposed upon the broadband (0.5-35 Hz) signal in black. Vertical black lines with speaker symbols on top mark onsets of auditory presentations. Black arrows point to spindle activity during SW up-states. Bottom rows, in red, the same signal, but filtered in the spindle range (9–16 Hz). Note the elevated SW following cueing presentations, with varying spindle band activity nested during SW up-states. b) Grand average baseline-corrected curve of increased SW density after TMR in percent. Shaded gray areas mark time windows (0 to 0.5s, 0.5 to 1s and 1.5 to 2s) of significantly increased SW density. c) Grand average time-frequency plots time-locked to the troughs of SW, with averaged signals plotted as black lines. The two panels (left and right) correspond to the high- vs. low-PP cueing conditions. The rectangle in the left panel marks the time (0.3 to 0.6s relative to SW troughs) and frequency range of up-state fast spindle band activity (12-16Hz). The corresponding topographic map at right shows elevated fast spindle power over mid-parietal electrodes. d) Time-frequency representation of t-values time-locked to SW shows significantly greater spindle band power during SW up-states for high- vs. low-PP (merged over Pz, P3 and P4 electrodes). Right, topographic map of t-values shows the corresponding significant cluster of electrodes (0.3 to 0.8s; 11-14Hz), *p<0.05.

To statistically analyze whether TMR increases SW density, we applied a SW detection algorithm. To control for individual differences, we included the 30% of SW with the highest amplitude per subject (see methods) in subsequent analyses. Testing changes in SW density in time windows of 0.5s revealed significant increases after stimulus onset in comparison to baseline (Fig. 4b; baseline period of −1.5 to 0s before stimulus onset; t-tests against 0; 0 to 0.5s: 49.73 ± 14.32%, t(21) = 3.47, p = 0.002; 0.5 to 1s: 83.44 ± 14.97%, t(21) = 5.57, p < 0.001; 1.5 to 2s: 40.33 ± 10.67%, t(21) = 3.78, p = 0.001). Additional comparisons between the TMR conditions of high- and low-PP showed no significant differences in SW density, SW amplitude, number of TMR presentations or other sleep parameters (see Table S2). In sum, these results indicate an increase of SW in response to auditory word stimulation, independent of difficulty in word learning.
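One plausible way to express such a baseline-corrected density change is sketched below. The normalization (events per second in a post-stimulus window relative to events per second in the −1.5 to 0s baseline) is an assumption for illustration; the exact computation used in the study is described in its methods, not in this section. The function name and input format are hypothetical.

```python
def density_change_percent(event_times, window, baseline=(-1.5, 0.0)):
    """Percent change in slow-wave density relative to baseline.

    event_times: detected SW times in seconds relative to stimulus onset,
                 pooled over all cue presentations of one subject.
    window:      (start, end) of the post-stimulus window in seconds.
    """
    def rate(start, end):
        count = sum(start <= t < end for t in event_times)
        return count / (end - start)  # events per second

    base_rate = rate(*baseline)
    if base_rate == 0:
        raise ValueError("no slow waves detected in the baseline window")
    return 100.0 * (rate(*window) - base_rate) / base_rate


# Hypothetical event times: 3 SW in the baseline, 2 SW in 0 to 0.5 s.
events = [-1.2, -0.7, -0.2, 0.1, 0.4]
change = density_change_percent(events, (0.0, 0.5))  # +100 %
```

The per-subject percentages for each 0.5s window could then be tested against 0 across subjects.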

Next, we analyzed whether spindle-band oscillations nested in the SW up-states after auditory TMR, a presumed neural signature of memory reactivation, differ between the high- and low-PP conditions. To extract the spindle power during SW up-states, we conducted time-frequency analyses time-locked to the troughs of the detected SW in the time window from 0 to 6s after auditory presentations. Fig. 4c shows the grand averages per condition; the fast spindle band activity (12-16Hz) nested in SW up-states displayed a topography with maximum power over central-parietal electrodes. The comparison between PP conditions revealed significantly greater spindle-band power, prominent in the 11-14Hz range, during SW up-states for high- in contrast to low-PP (averaged t-value over the 5 cluster electrodes = 2.66, p = 0.01; Fig. 4d). A post-hoc comparison including the individual SW amplitude as a covariate confirmed the significance of this cluster (t(20) = 2.38, p = 0.03).

These results suggest that successful TMR of the easy to learn words is associated with increased spindle activity nested during SW up-states.

Discussion

In this study, we asked whether the learning difficulty of artificial words affects the effectiveness of TMR and its neural correlates. After confirming easier encoding of artificial words with high- compared to low phonotactic familiarity, we found that auditory TMR during sleep improved memory performance for the easy to learn words, whereas cueing of the difficult words had no effect. Correspondingly, at the neural level, we observed that cue presentations of the easy words induced significantly greater spindle-band oscillations nested during SW up-states in comparison to the difficult words. To our knowledge, this is the first study to address the critical role of learning difficulty, experimentally manipulated via phonotactical word properties, in the effectiveness of TMR and its neural signatures.

Related to our finding that TMR was effective for the easy words, previous research on object association learning showed beneficial influences of encoding depth29 and prior knowledge28 on TMR. Language studies without TMR revealed that prior knowledge affected sleep-dependent memory integration of new words in children31 and in adults18,20. To explain these critical influences on consolidation during sleep, the complementary systems approach of word learning (CLS)4 proposes that new word memories temporarily encoded in the hippocampus are transferred during sleep into cortical structures of lexical and phonological memory for long-term embedding, through co-activation of both memory systems. Thus, word memory reactivation and consolidation during sleep depend on hippocampal encoding depth and on cortical prior knowledge of phonotactics. Regarding our results, we suggest that auditory cueing during sleep facilitated systems co-activation/consolidation of the easy to learn words, compared to the difficult to learn words, through both factors (enhanced initial encoding and higher similarity to pre-existing phonotactical knowledge).

Contrary to our results, other studies found better memorization of words with low phonological16 and semantic17 similarity to pre-existing knowledge after a retention period of sleep compared to words with high similarity. To explain this discrepancy, we speculate that a very high degree of similarity led to strong memory formation during wake encoding, which prevented an additional effect of consolidation during sleep. Indeed, Creery et al.29 showed that learned object-location associations benefited from TMR during sleep only when they were not almost perfectly recalled during prior wake. Thus, a moderate encoding depth might benefit most from subsequent sleep consolidation compared to insufficient or almost perfect encoding. Alternatively, the contrary results could be related to characteristics of the learning paradigms (L1- vs. L2-like novel words with associations to objects16; semantically related vs. unrelated word pairs17) that differ from our design.

In addition to effectiveness of TMR on memory, we observed a significant change of the decision bias in the cueing condition of the easy to learn words and no overnight changes in the other conditions. Here, subjects became more biased in their decision making to avoid losses of money points. This observation suggests that TMR not only captures specific memory traces of the words and their reward associations, but also decision-making tendencies. In a previous study, Ai and colleagues32 revealed altered reward-relevant decision preferences by TMR. To what extent concepts of decision-making such as risk avoidance/seeking could be reactivated, transformed, and consolidated by TMR is largely unexplored and might be interesting for future research for example in neuroeconomics.

Corresponding to our behavioral results on the effectiveness of TMR, we found significantly higher spindle-band oscillations nested during SW up-states after cue presentations of the easy in comparison to the difficult words. Accordingly, the active system consolidation hypothesis proposes slow wave-sleep spindle coupling as an underlying mechanism of systems consolidation6,12,33. Previous studies reported associations of SW and spindle activity during sleep with the integration of new memories into pre-existing knowledge networks34,35. Strikingly, recent studies using machine learning for EEG decoding revealed that classification accuracy for distinct memory categories peaked above chance level in synchrony with SW-spindle coupling events26,36. Our results support the assumption that SW-spindle activity is an underlying mechanism of memory reactivation during sleep by providing additional evidence linking behaviorally successful TMR to increased spindle activity during SW up-states.

In validation of our novel paradigm, we confirmed easier encoding and better memorization of artificial words with high- in comparison to low-PP, in line with previous studies1,2. These results suggest that participants used their prior knowledge of frequently occurring word-sound patterns to facilitate learning. Regarding the framework of schema theory3, processing of incoming words with high-PP might co-activate consistent schemata of prior word sound knowledge and thus may lead to facilitated memory integration and stabilization during encoding in comparison to processing of more inconsistent words with low-PP.

Our EEG analyses of encoding revealed desynchronization of alpha oscillations as a function of learning improvements. Additionally, superior encoding of the high-PP words corresponded to greater alpha desynchronization in comparison to the low-PP words. Consequently, we assume that the suppression of alpha oscillations over parietal and occipital electrodes represents a neural correlate of mnemonic functioning. Accordingly, previous studies linked desynchronization of alpha oscillations to memory processing37,38 and memory replay during wake39. A study of intracranial electrophysiology in epilepsy patients40 showed that neocortical alpha/beta desynchronization precedes and predicts fast hippocampal gamma activity during learning. The authors propose that these coupled activities represent hippocampal-cortical information transfer important for memory formation. Our findings suggest cortical alpha desynchronization as a neural marker of wake mnemonic processing that varies with the level of difficulty in word learning.

Our study is limited by its small sample size and its between-subject comparisons. Further, potential influences of phonotactic word properties and of prior encoding depth on TMR cannot be interpreted independently, because the PP conditions already affected encoding depth before sleep. To disentangle these potential factors, future studies could include a higher number of encoding trials for the low-PP words to equalize encoding depth between conditions prior to TMR, as successfully shown in a recent study20. Additionally, a within-subject design with a larger sample size would provide more robust control of interindividual differences in sleep and cognition.

To conclude, our study demonstrates that difficulty in word learning, manipulated via phonotactic properties, impacts the effectiveness of TMR. TMR of the easy to learn words facilitated categorization learning, while TMR of the difficult words had no effect. Auditory cueing of the easy words induced increased spindle activity during the SW up-states (an assumed underlying activity of memory reactivation) compared to the difficult words. Thus, our findings suggest that future research and clinical applications aimed at restoring language capabilities should consider learning difficulty based on phonotactical word properties as a potential factor in successful TMR. Alpha desynchronization during initial learning, as a neural marker of encoding depth, may in turn serve as a predictor of sleep-associated memory reactivation and consolidation.

Methods

Subjects

The study included 39 subjects (29 females) with an age range of 19 to 28 years (M = 22.28 years, SD = 2.04). Participants were recruited from the University of Fribourg community by e-mail or through advertisements on the university campus. Before participation, subjects gave written informed consent as approved by the Ethical Commission of the Department of Psychology of the University of Fribourg. All participants were German speakers, and no subject had a history of neurological or psychiatric illness. The participants were instructed to keep a normal sleep schedule, to get up before 8 a.m. and not to consume alcohol or caffeine on experimental days. For participation, subjects received credit for an undergraduate class and/or monetary compensation.

Data were excluded from subjects who did not reach the minimal learning performance (d-prime during pre-sleep memory test greater than 0.9 in at least one of the two PP conditions) or showed a negative d-prime value in at least one condition (5 and 1 subjects respectively).
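Read literally, the two exclusion rules can be sketched as a single predicate. This is an interpretation of the sentence above, not the authors' code: the function name is hypothetical, and it is assumed that the negative-value rule refers to the same pre-sleep d’ values as the minimal-performance rule, which the text does not state.

```python
def is_excluded(d_high, d_low, min_d=0.9):
    """Apply the two exclusion rules to one subject's pre-sleep d' values.

    d_high, d_low: pre-sleep d' in the high- and low-PP condition.
    """
    # Rule 1: d' must exceed min_d in at least one of the two conditions.
    below_minimum = max(d_high, d_low) <= min_d
    # Rule 2: no condition may show a negative d'.
    has_negative = min(d_high, d_low) < 0
    return below_minimum or has_negative
```

With six subjects excluded, 33 of the 39 subjects remain, which matches the 32 degrees of freedom of the paired t-tests reported for the encoding analyses.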

Pre-Learning

Participants arrived at the sleep laboratory at 19:30h. Electrodes for standard polysomnography (32 EEG electrodes, EMG and ECG electrodes) were mounted with two EEG mastoid electrodes. One electrode under the right eye was attached to record eye movements.

Encoding task

The encoding task started around 21:00h. Subjects learned to discriminate rewarded and unrewarded artificial words by right- and left-hand button presses. In each trial, an initial fixation cross was displayed for 1.7 to 2.3s. Subsequently, concurrent with the onset of a blank screen, the sound of an artificial word was played for 0.7 to 1s. After a waiting period (blank screen, 2s), a question mark appeared to signal the onset of the response time window (maximal duration 4s). Following the key press response, a feedback screen with the money points of the trial and the current task score appeared for 2s. The learning task contained 480 trials in randomized order, corresponding to three presentations of each of the 160 artificial words.
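The trial structure can be sketched as a generated trial list. The task itself was run in E-Prime, so the Python below is only an illustration with hypothetical names. It also assumes that words were shuffled separately within each of the three presentation rounds and that the fixation duration was drawn uniformly from 1.7 to 2.3s; the text only states that the order was randomized and gives the range.

```python
import random


def build_encoding_trials(words, n_presentations=3, seed=None):
    """Build a randomized trial list with the timing described in the text."""
    rng = random.Random(seed)
    trials = []
    for presentation in range(1, n_presentations + 1):
        order = list(words)
        rng.shuffle(order)  # new random word order in every round (assumption)
        for word in order:
            trials.append({
                "word": word,
                "presentation": presentation,
                "fixation_s": rng.uniform(1.7, 2.3),  # jittered fixation cross
                "wait_s": 2.0,          # blank screen before the question mark
                "max_response_s": 4.0,  # response time window
                "feedback_s": 2.0,      # feedback screen
            })
    return trials


# Placeholder labels standing in for the 160 artificial words.
words = [f"word{i:03d}" for i in range(160)]
trials = build_encoding_trials(words, seed=1)  # 160 words x 3 = 480 trials
```

Storing the presentation number with each trial makes it straightforward to compute performance separately for presentations 1 to 3.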

Pre-Sleep Memory task

After finishing the encoding task and following a 10-min break, subjects performed the pre-sleep memory task. The memory task had the same trial structure as the encoding task but without the final feedback screen. Participants again received money points for correct responses. However, the amount received was only shown at the end of the task. The memory task contained 160 trials, one for each of the 160 artificial words.

Auditory targeted memory reactivation during sleep

Following an additional impedance check and re-adjustment of the EEG electrodes, subjects went to bed in a noise-shielded and electrically shielded cabin of the sleep laboratory. All-night sleep periods started with lights off between 23:00 and 24:00h. Based on online monitoring of SWS and N2 sleep, artificial words were presented aurally via loudspeakers (sound pressure level 55 dB) with a randomized inter-stimulus interval of 8 ± 2 s. One group of subjects was exposed to artificial words with low phonotactic probability during sleep, whereas the other group was presented with the high phonotactic probability words (see Table S1 for the number of reactivations). We interrupted word presentation whenever online monitoring showed an arousal or patterns of REM sleep.

Post-Sleep Memory task

After sleep and re-adjustment of the EEG, subjects performed the post-sleep memory task (see the description of the pre-sleep memory task above).

Experimental tasks

All experimental tasks, including sleep reactivation, were run with the E-Prime software (Psychology Software Tools, Sharpsburg, USA). Practice trials presented at the beginning were used to verify that the subject understood the task. Across all tasks, stimuli were presented in randomized order.

Stimuli

To create artificial words, we subdivided the alphabet into two groups of consonants (C1: b, c, d, f, g, h, j, k, l, m; C2: n, p, q, r, s, t, v, w, x, z) and two groups of vowels (V1: a, e, i; V2: o, u, y). Four-letter words were created by selecting letters from the vowel and consonant groups according to four different sequences (G1: C1, V1, V2, C2; G2: C1, V1, C2, V2; G3: V1, C1, C2, V2; G4: V1, C1, V2, C2). Artificial words were converted automatically from text to speech files (wav format) using MATLAB functions (tts.m, audiowrite.m) set to a female computer voice with American English pronunciation. From the resulting pool of 900 stimuli per rule, we selected 40 artificial words per rule category (4 x 40 words) according to the following inclusion criteria: fluent and understandable pronunciation; no meaning; no names; aurally discriminable from the other selected words.
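The pool size of 900 stimuli per rule follows from the group sizes (10 x 3 x 3 x 10 letter combinations). The following minimal Python sketch enumerates the candidate letter strings for each rule. It is only an illustration: the names `RULES`, `candidate_pool` and `pools` are hypothetical, and the text-to-speech conversion and the manual selection of 40 words per rule are not covered.

```python
from itertools import product

# Letter groups as listed in the text.
C1 = list("bcdfghjklm")
C2 = list("npqrstvwxz")
V1 = list("aei")
V2 = list("ouy")

# Letter sequences of the four rules (G1 to G4).
RULES = {
    "G1": (C1, V1, V2, C2),
    "G2": (C1, V1, C2, V2),
    "G3": (V1, C1, C2, V2),
    "G4": (V1, C1, V2, C2),
}


def candidate_pool(rule):
    """Return every four-letter string that the given rule can produce."""
    return ["".join(letters) for letters in product(*RULES[rule])]


# One candidate pool per rule; each contains 10 * 3 * 3 * 10 strings.
pools = {rule: candidate_pool(rule) for rule in RULES}
```

Because the four rules use different consonant-vowel patterns, the four pools do not overlap.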

Phonotactic probability values were computed for each word using an online calculator41 (https://calculator.ku.edu/phonotactic/about) based on an American English lexicon. Comparisons between rule categories revealed that phonotactic probability differed significantly between two of the rules and the other two rules (see Fig. 1b).

Psychomotor vigilance task

To assess vigilance after sleep and before the post-sleep memory task, reaction times were measured with the Psychomotor Vigilance Task (PVT)42. Each trial began with a fixation cross shown for a randomized duration between 2 and 10 s. Participants were instructed to press the spacebar with the index finger of the non-dominant hand as quickly as possible once a millisecond counter started running on the screen. The reaction time was displayed for 1 s after the key press. The PVT lasted 10 min (see Supplementary Table 2 for results).

Behavior analyses

As a measure of discrimination learning and memory performance, we calculated sensitivity values (d’ = z(hits) − z(false alarms)) according to signal detection theory30. Here, we considered correct trials that gained money points as hits and incorrect trials that lost money points as false alarms (see Fig. 1d). In addition, we assessed the response bias (c = −0.5 * (z(false alarms) + z(hits))). Following convention43, rates of 0 and 1 were replaced with 0.5/n and (n − 0.5)/n respectively, where n corresponds to the number of trials (n = 40). Values of sensitivity and response bias were calculated separately for the low- and high-PP conditions.
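The two formulas and the correction for extreme rates can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the analysis code of the study: the function names are hypothetical, and the same n = 40 is assumed to apply to both the hit rate and the false alarm rate.

```python
from statistics import NormalDist


def corrected_rate(count, n):
    """Return count / n, replacing 0 with 0.5 / n and 1 with (n - 0.5) / n."""
    rate = count / n
    if rate == 0:
        return 0.5 / n
    if rate == 1:
        return (n - 0.5) / n
    return rate


def sensitivity_and_bias(n_hits, n_false_alarms, n=40):
    """Compute d' = z(hits) - z(false alarms) and c = -0.5 * (z(FA) + z(hits)).

    Assumption: both rates are based on the same number of trials n.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    z_hit = z(corrected_rate(n_hits, n))
    z_fa = z(corrected_rate(n_false_alarms, n))
    d_prime = z_hit - z_fa
    bias_c = -0.5 * (z_fa + z_hit)
    return d_prime, bias_c
```

Calling `sensitivity_and_bias(40, 0)` exercises the correction, since both rates would otherwise give infinite z values.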

EEG recordings

We recorded EEG using customized caps with 32 Ag/AgCl electrodes at 10-10 locations (EASYCAP, Woerthsee-Etterschlag, Germany) and 32-channel amplifiers (Brain Products, Gilching, Germany). Impedances were kept below 10 kΩ. The EEG was recorded at a sampling rate of 500 Hz using Brain Vision Recorder software (Brain Products, Gilching, Germany). Signals were referenced to electrodes at the mastoids. Ocular activity was measured via one EOG channel mounted ∼2 cm below the right eye. Muscle tone was monitored by EMG recordings made under the chin. The following EEG electrodes were used for subsequent wake and sleep analyses: Fp1, Fp2, F3, F4, C3, C4, P3, P4, O1, O2, F7, F8, T7, T8, P7, P8, Fz, Cz and Pz.

Sleep scoring

Sleep stages NREM1, NREM2, NREM3, wake, and REM sleep were scored manually according to the criteria of the American Academy of Sleep Medicine (AASM) by monitoring signals of frontal, central, and occipital electrodes over 30 s epochs44.

Preprocessing Wake EEG

Wake EEG data were preprocessed using the Fieldtrip toolbox (http://fieldtriptoolbox.org; Donders Institute for Brain, Cognition and Behaviour, Radboud University, Netherlands)45. First, raw signals were re-referenced to the averaged mastoid electrodes and filtered (high-pass 0.5 Hz, low-pass 45 Hz). For each trial, data were segmented into epochs (−2 to 4 s) time-locked to stimulus onset. After demeaning and detrending, noisy trials were identified by visual inspection and discarded from further analyses. We then conducted an independent component analysis (ICA) to identify and reject components reflecting eye blinks and eye movements.

Preprocessing Sleep EEG

EEG signals during sleep were re-referenced to the averaged mastoid electrodes and filtered (high-pass 0.5 Hz, low-pass 35 Hz). To analyze EEG activity during TMR, data were segmented into epochs (−2 to 8 s) around the onset of stimulus presentation. After demeaning and detrending, trials distorted by noise or movement artefacts were identified by visual inspection and removed from further analyses.

Time-Frequency Analysis Wake data

We conducted the time-frequency analyses using Morlet wavelets as implemented in the Fieldtrip toolbox. The number of wavelet cycles was set to 7. We extracted oscillatory power in the frequency range of 1 to 45 Hz with frequency steps of 0.2 Hz and time steps of 10 ms. All power values were baseline corrected and expressed as absolute change by subtracting the corresponding mean of the baseline interval from −1 to −0.1 s before stimulus onset.
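The absolute baseline correction described above amounts to subtracting, for each frequency, the mean power in the baseline window from every time point. A minimal Python sketch follows, assuming the power values are stored as one list per frequency. The function name and data layout are hypothetical and do not reproduce the Fieldtrip implementation.

```python
def baseline_correct(power, times, start=-1.0, end=-0.1):
    """Subtract the mean baseline power from each frequency row.

    power: list of rows, one per frequency, each a list over time points.
    times: time of each column in seconds relative to stimulus onset.
    start, end: limits of the baseline window in seconds (inclusive).
    """
    # Columns whose time stamps fall inside the baseline window.
    idx = [i for i, t in enumerate(times) if start <= t <= end]
    if not idx:
        raise ValueError("no samples fall inside the baseline window")
    corrected = []
    for row in power:
        baseline = sum(row[i] for i in idx) / len(idx)
        corrected.append([value - baseline for value in row])
    return corrected
```

For example, with `times = [-1.0, -0.5, 0.0, 0.5]` the first two columns form the baseline, and each row is shifted by their mean.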

Slow wave-spindle power analyses

To reduce the influence of potential confounders on TMR, such as low prior learning performance and a low number of reactivations, we included only subjects with a pre-sleep memory performance of d’ > 0.75 in both PP conditions who received at least 160 reactivations (each word was presented at least two times during sleep). Thus, we excluded 5 and 6 subjects respectively. The final sample for sleep analyses consisted of n = 22 with 11 subjects in each cueing condition.

To detect slow waves, we first localized all negative and positive peaks of the 0.5 to 3 Hz band-pass filtered signal during the auditory cueing trials in the time window of 0 to 6 s relative to stimulus onset. In addition, the time between the preceding and the following positive peak had to be in the range of 0.3 to 2 s. For the following analyses, the 30% of slow waves with the highest amplitude (from the negative peak to the following positive peak) were included for each subject.
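The detection criteria can be illustrated with a short Python sketch that works on one already band-pass filtered signal. Several details are assumptions made for illustration only: positive peaks are taken as local maxima above zero, the negative peak is taken as the lowest sample between two consecutive positive peaks, and the kept number of waves is rounded up. In the study the 30% criterion was applied per subject, so waves from all trials of a subject would have to be pooled before the final selection. The function name and the returned dictionary keys are hypothetical.

```python
import math


def detect_slow_waves(signal, fs, min_dur=0.3, max_dur=2.0, keep_fraction=0.3):
    """Return the largest slow waves found in a band-pass filtered signal.

    signal: samples of the 0.5 to 3 Hz filtered EEG.
    fs: sampling rate in Hz.
    """
    # Positive peaks: local maxima with a value above zero (assumption).
    pos_peaks = [
        i for i in range(1, len(signal) - 1)
        if signal[i] > 0 and signal[i - 1] < signal[i] >= signal[i + 1]
    ]
    waves = []
    for left, right in zip(pos_peaks, pos_peaks[1:]):
        # Time between the preceding and the following positive peak.
        duration = (right - left) / fs
        if not min_dur <= duration <= max_dur:
            continue
        # Negative peak: lowest sample between the two positive peaks.
        trough = min(range(left, right + 1), key=lambda i: signal[i])
        if signal[trough] >= 0:
            continue
        waves.append({
            "trough": trough,
            "duration": duration,
            # Amplitude from the negative peak to the following positive peak.
            "amplitude": signal[right] - signal[trough],
        })
    # Keep the fraction of waves with the highest amplitude (rounded up).
    waves.sort(key=lambda w: w["amplitude"], reverse=True)
    n_keep = math.ceil(keep_fraction * len(waves))
    return sorted(waves[:n_keep], key=lambda w: w["trough"])
```

The returned trough indices can be converted to seconds by dividing by `fs` and then used as time-locking events for the power and density analyses.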

We then extracted the corresponding spectral power (1–30 Hz) in a time window around the negative slow wave peak (±2 s) using Morlet wavelet analysis (7 cycles) as implemented in the Fieldtrip toolbox, with a time window and step size of 10 ms and frequency steps of 0.2 Hz. The extracted time-frequency power was averaged across slow waves and baseline corrected by subtracting the mean of the baseline interval from −2 to −1.5 s before the negative slow wave peak. Finally, the resulting absolute change values were plotted for the frequency and time range of interest and for the different conditions (see Fig. 4c).

Slow wave density analyses

To analyze the number of slow waves around memory reactivations during sleep, we divided the time course of reactivation trials into equally sized bins of 0.5 s from −1.5 to 3 s relative to stimulus onset. Based on the negative peaks of detected slow waves, we counted the slow waves occurring in each bin across trials. Subsequently, we computed density values for each bin by dividing the counts by the number of trials. Finally, we calculated the relative increase in slow wave density in percent by dividing by the averaged baseline values (bins: −1.5 to 0 s; see Fig. 4b).
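The binning and normalization steps can be sketched as follows in Python. The sketch assumes that the negative peak times of the detected slow waves are available per trial in seconds relative to stimulus onset, and it interprets the relative increase as (density − baseline) / baseline x 100. Both the function name and this exact percent formula are assumptions, not taken from the original analysis code.

```python
def slow_wave_density(trough_times_per_trial, bin_start=-1.5, bin_end=3.0,
                      bin_width=0.5, baseline_end=0.0):
    """Return per-bin slow wave density and its percent change from baseline.

    trough_times_per_trial: one list per trial with the negative peak times
    (in seconds relative to stimulus onset) of the detected slow waves.
    """
    n_trials = len(trough_times_per_trial)
    if n_trials == 0:
        raise ValueError("at least one trial is required")
    n_bins = round((bin_end - bin_start) / bin_width)
    counts = [0] * n_bins
    for trial in trough_times_per_trial:
        for t in trial:
            if bin_start <= t < bin_end:
                index = int((t - bin_start) // bin_width)
                counts[min(index, n_bins - 1)] += 1  # guard against rounding
    # Density: number of slow waves per bin divided by the number of trials.
    density = [count / n_trials for count in counts]
    # Baseline: mean density over the bins before stimulus onset.
    n_baseline = round((baseline_end - bin_start) / bin_width)
    baseline = sum(density[:n_baseline]) / n_baseline
    if baseline == 0:
        raise ValueError("baseline density is zero; percent change undefined")
    percent_change = [100.0 * (d - baseline) / baseline for d in density]
    return density, percent_change
```

With the default arguments the function returns nine bins, of which the first three (−1.5 to 0 s) form the baseline.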

Statistical Analysis

To test for differences between conditions in wake behavioral data, repeated-measures ANOVAs and paired sample t-tests were used. Two-way mixed design ANOVAs and independent sample t-tests were computed for sleep behavior analyses. For all behavioral statistical analyses, the significance level was set at p < 0.05, and all statistical analyses were run in MATLAB 2018a (MathWorks, Natick, USA) and SPSS (IBM Corp., Version 25).

To account for multiple testing, we applied cluster-based permutation tests implemented in the Fieldtrip toolbox45 to test for potential differences in EEG time-frequency data. Dependent sample t-tests and an independent sample t-test (for the between-subject group comparison of sleep data) were conducted. We set the number of permutations to 1000 and the minimal number of channels to two. The cluster alpha level was set at p = 0.05 for two-tailed testing. Plots with significant clusters of electrodes are shown in Fig. 2c and 4d. After confirming the existence of a significant cluster, we conducted an additional repeated-measures ANOVA with averaged values of the identified time and frequency range of interest (see Fig. 2d). To control for SW amplitude size as a potential confounder of spindle activity differences during SW up-states, we extracted and averaged power values of the significant cluster (0.3 to 0.8 s; 11–14 Hz; Pz, P3, P4, O2, P7) per subject and conducted a post-hoc independent sample t-test on residuals corrected for individual SW amplitude size.
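One way to read the final control analysis is as a two-step procedure: regress the cluster power on individual SW amplitude, then compare the residuals between the two cueing groups. The Python sketch below illustrates this reading under explicit assumptions that are not stated in the text: a single least-squares line is fitted across all subjects of both groups, and the group comparison uses Student's t statistic with pooled variance. All function names are hypothetical.

```python
from math import sqrt
from statistics import mean


def residualize(values, covariate):
    """Return residuals of a least-squares line predicting values from covariate."""
    mx, my = mean(covariate), mean(values)
    sxx = sum((x - mx) ** 2 for x in covariate)
    if sxx == 0:
        raise ValueError("covariate has no variance")
    slope = sum((x - mx) * (y - my) for x, y in zip(covariate, values)) / sxx
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(covariate, values)]


def independent_t(a, b):
    """Return Student's t statistic (pooled variance) and degrees of freedom."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    var_a = sum((x - ma) ** 2 for x in a) / (na - 1)
    var_b = sum((x - mb) ** 2 for x in b) / (nb - 1)
    pooled = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    t = (ma - mb) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2
```

In use, the residuals from `residualize` would be split by cueing group and passed to `independent_t`; the p-value would then be obtained from a t distribution with the returned degrees of freedom.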

Author contributions

Design of the study: A.-L.K. and B.R.; data acquisition: A.-L.K.; statistical analyses and interpretation of the data: A.-L.K. and B.R.; writing the manuscript: A.-L.K. and B.R.

Supplementary tables

Gender, age and performance rates

Sleep and reactivation parameters

Acknowledgements

We thank the students at the University of Fribourg who assisted in data collection and Jonas Beck for reading and providing comments to improve the manuscript. This work was supported by the Swiss National Science Foundation Grant 168602 to A.-L.K. and the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (grant agreement MemoSleep No. 677875) to B.R., and the University of Fribourg.

Conflict of interest

The authors declare that they have no conflict of interest.

Data availability statement

Data are available on request.