Targeted memory reactivation in human REM sleep elicits detectable reactivation

  1. Mahmoud EA Abdellahi
  2. Anne CM Koopman
  3. Matthias S Treder
  4. Penelope A Lewis (corresponding author)
  1. School of Psychology, Cardiff University Brain Research Imaging Centre (CUBRIC), United Kingdom
  2. School of Computer Science and Informatics, Cardiff University, United Kingdom

Abstract

It is now well established that memories can reactivate during non-rapid eye movement (non-REM) sleep, but the question of whether equivalent reactivation can be detected in rapid eye movement (REM) sleep is hotly debated. To examine this, we used a technique called targeted memory reactivation (TMR) in which sounds are paired with learned material in wake, and then re-presented in subsequent sleep, in this case REM, to trigger reactivation. We then used machine learning classifiers to identify reactivation of task-related motor imagery from wake in REM sleep. Interestingly, the strength of measured reactivation positively predicted overnight performance improvement. These findings provide the first evidence for memory reactivation in human REM sleep after TMR that is directly related to brain activity during wakeful task performance.

Editor's evaluation

This valuable work in human subjects reports that sounds that were associated with specific memories during waking behaviors can trigger the reactivation of these memory representations during REM sleep. Convincing evidence is provided to support the conclusions. The work expands our understanding of memory processing during sleep.

https://doi.org/10.7554/eLife.84324.sa0

eLife digest

Sleep is crucial for rest and recovery, but it also allows the brain to process things it has learned while awake. This is why a person may go to bed frustrated with learning a tune on the piano but wake up the next morning ready to play it without fumbling. For this to happen, it is thought that memories must be reactivated during sleep – something which can be studied by monitoring brain activity. While it has been shown that memory reactivation occurs in some stages of human sleep, it was unclear whether it occurred in a specific stage known as REM sleep – which is important for learning.

To study memory reactivation during REM sleep, Abdellahi et al. recruited volunteers and monitored their brain activity during an ‘adaptation night’ when certain sounds played as they slept. The following day, memories – such as an image or pressing a certain button – were paired with the sounds, which were replayed during REM sleep the following night to trigger memory reactivation (experimental night). Abdellahi et al. measured how strongly brain activity during each night related to the waking activity when the sound pairing tasks were imagined and compared the adaptation and experimental nights. The experimental night showed clear signs of memory reactivation after the sounds were played during REM sleep, suggesting that the sounds triggered memories of the associated images or buttons.

These findings show that in humans, brain activity patterns that indicate memory reactivation can be identified during REM sleep. The work paves the way for future studies into the characteristics of this memory reactivation and how to trigger it in a way that leads to improvements in memory.

Introduction

The reactivation of memories in non-rapid eye movement (non-REM) sleep is widely supported by evidence from numerous species (Rasch and Born, 2013; Bendor and Wilson, 2012; Wang et al., 2019; Ji and Wilson, 2007), but it is still unclear whether equivalent reactivation occurs in rapid eye movement (REM) sleep. The first evidence for non-REM reactivation came from rodents (Pavlides and Winson, 1989; Wilson and McNaughton, 1994), and such reactivation has subsequently been identified in humans using EEG classifiers (Wang et al., 2019; Belal et al., 2018; Schreiner et al., 2018; Cairney et al., 2018; Abdellahi et al., 2023), fMRI (Shanahan et al., 2018; Rasch et al., 2007), and intracranial recordings (Zhang et al., 2018). Only a few rodent studies have shown evidence for reactivation in REM sleep (Pavlides and Winson, 1989; Louie and Wilson, 2001; Hennevin et al., 1995; Poe et al., 2000), but support for such reactivation comes from human work by Schönauer et al., 2017, who used EEG classifiers to elegantly distinguish between the REM sleep on nights after training on two very different tasks. Nevertheless, the reinstatement of EEG activity in REM that directly relates to EEG activity during a task in wake has yet to be demonstrated in humans. Targeted memory reactivation (TMR), a technique which allows the active triggering of memory reactivation, is linked to both neural (Lewis and Bendor, 2019; Berkers et al., 2018) and behavioural (Hu et al., 2019; Pereira et al., 2022; Oudiette and Paller, 2013) plasticity when applied in non-REM sleep.

TMR provides a temporal window for reactivation to occur as a result of the external cue, and this can be used to evaluate the detectability of such reactivation. In the current study, we used TMR to build on the work of Schönauer et al., 2017, by looking for task-related activity after TMR cues during REM sleep. We thus searched for a direct link between brain activity associated with a memory in wake and cued reactivation in REM sleep. Given that theta activity predominates during REM (Lomas et al., 2015; Nishida et al., 2009), we also wanted to gain a better understanding of the theta response after TMR cues in REM, and how this relates to reactivation.

We chose the serial reaction time task (SRTT), which is known to be sleep sensitive (Spencer et al., 2006; Born and Wilhelm, 2012), to examine these questions. The SRTT is modulated by TMR in non-REM sleep (Cousins et al., 2014; Cousins et al., 2016; Rakowska et al., 2021) but has also been strongly linked to REM (Maquet et al., 2000; Peigneux et al., 2003; Laureys et al., 2001). For instance, brain areas which were activated during the execution of an SRTT have been shown to reactivate during subsequent REM (Maquet et al., 2000; Peigneux et al., 2003). Furthermore, greater connectivity between premotor cortex, posterior parietal cortex, and bilateral pre-supplementary motor areas has been observed during REM after training on this task (Laureys et al., 2001). These studies extend the literature suggesting that memories may reactivate in REM by indicating that the SRTT specifically can be reprocessed in this sleep stage.

In our SRTT, participants were presented with audio-visual cues and responded by pressing four buttons (two with each hand); the visual cues are shown in Figure 1—figure supplement 1. These cues were organised into an implicit 12-item sequence which was practised repeatedly. The cue-sounds were then replayed during subsequent REM sleep to trigger the associated memories of left- and right-hand presses (Figure 1c). We trained participants on two sequences and replayed only one of them in sleep. As a control, we included an adaptation night in which participants slept in the lab and we played the same cue-sounds that would later be played during the experimental night. This provided an important control, as a null finding from this adaptation night would ensure that we are decoding actual memories, not just sounds. We used these data to discriminate between neural reactivation of left- and right-hand button presses using linear classification on EEGs during REM sleep (see Methods for more details).

Figure 1 with 1 supplement
Experimental design.

(a) The experiment consisted of two nights: an adaptation and an experimental night. On both nights, participants were wired up for polysomnography (PSG) and we recorded brain activity throughout the night. On both nights, tones were presented during rapid eye movement (REM) sleep. However, prior to the experimental night, participants completed training on the serial reaction time task (SRTT), and then performed an imagery task in which they were cued with pictures and sounds, but only imagined performing the finger tapping (without movement). PSG was recorded throughout these tasks. After waking up from the experimental night, participants completed the motor imagery and the SRTT again, and finally did the explicit recall task. (b) In the SRTT, images were presented in two different sequences each with a different set of tones. Each image was associated with a unique tone and required a specific button press. In the imagery task, participants heard the tones and saw the images as in the SRTT, but only imagined pressing the buttons. These imagery data were used for classification, as they have cleaner signals than the SRTT data because there are no movement artefacts. (c) The sounds of only one learned sequence (cued sequence) were played during REM sleep to trigger the associated memories of left- and right-hand presses. (d) Participants were asked to mark the order of each sequence on paper as accurately as they could remember during the explicit recall test after sleep.

Results

Elicited response pattern after TMR cues

We looked at the TMR-elicited response in both time-frequency and ERP analyses using a method similar to the one used in Cairney et al., 2018 (see Methods). As shown in Figure 2a, the EEG response showed a rapid increase in theta band followed by an increase in beta band starting about 1 s after TMR onset. REM sleep is dominated by theta activity, which is thought to support the consolidation process (Diekelmann and Born, 2010), and increased theta power has previously been shown to occur after successful cueing during sleep (Schreiner and Rasch, 2015). We therefore analysed the TMR-elicited theta in more detail. Focussing on the first second post-TMR onset, we found that theta was significantly higher here than in the baseline period prior to the cue (–300 ms to –100 ms), for both adaptation (Wilcoxon signed rank test, n=14, p<0.001) and experimental nights (Wilcoxon signed rank test, n=14, p<0.001). The absence of any difference in theta power between experimental and adaptation conditions (Wilcoxon signed rank test, n=14, p=0.68) suggests that this response is related to processing of the sound cue itself, not to memory reactivation. Turning to the ERP analysis, we found a small increase in ERP amplitude immediately after TMR onset, followed by a decrease in amplitude 500 ms after the cue. Comparison of ERPs from experimental and adaptation nights showed no significant difference (n=14, p>0.1). Similar to the time-frequency result, this suggests that the ERPs observed here relate to the processing of the sound cues rather than any associated memory.
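The paired comparisons above (post-cue theta against baseline, within the same 14 participants) can be illustrated with a short sketch of a Wilcoxon signed rank test. This is an illustration only: the exact enumeration below is one standard way to obtain the p-value for small samples and is not taken from the paper, the function name `wilcoxon_exact` is hypothetical, and the input values are made up.

```python
from itertools import product

def wilcoxon_exact(x, y):
    """Exact two-sided Wilcoxon signed rank test for paired samples.

    Zero differences are dropped; tied absolute differences share
    their average rank. Returns (W_plus, p_value).
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0.0] * len(d)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1   # average rank of the tie group
        i = j + 1
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    centre = sum(ranks) / 2
    # Under the null, every sign pattern is equally likely: enumerate all 2^n.
    extreme = total = 0
    for signs in product((0, 1), repeat=len(d)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        total += 1
        if abs(w - centre) >= abs(w_plus - centre) - 1e-12:
            extreme += 1
    return w_plus, extreme / total

# Hypothetical theta power for 14 participants (arbitrary units).
baseline_theta = [0.0] * 14
post_cue_theta = [float(k) for k in range(1, 15)]
w_plus, p_value = wilcoxon_exact(post_cue_theta, baseline_theta)
```

With n=14, the smallest two-sided p-value this test can produce is 2/2^14, which is reached only when every participant changes in the same direction.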

Figure 2 with 5 supplements
Classification of left vs. right hand.

(a) Time-frequency and ERP analyses of the experimental night. Power percentage changes from the baseline period are shown with colours. The solid black line represents the average results of all ERP analyses from all participants. (b) Time-frequency analysis and ERP analysis of the adaptation night. (c) Comparison between classification performance on experimental and adaptation nights reveals a significant cluster in which classification performance is higher for the experimental night than for adaptation (n=14, p=0.004). A z-statistic value is shown at every point, and cluster edges are marked in white after correcting for multiple comparisons with cluster-based permutation (see Methods for details). (d) Classification performance for the experimental night was also significantly higher than chance (AUC: 0.5) as shown by the cluster after correction (n=14, p=0.009).

Detection of memory reactivation after TMR cues

We trained our EEG classification models using sleep data and then tested them on wake data. We chose to do this because training a classification model on wake could cause the model to weigh features which are dominant in wake very highly even if those features were absent from sleep. By training our models on sleep data, we ensured that the features associated with reactivation were used by the models, and the models were thus able to look for these in the stronger, less noisy, signals recorded during wake. We used a searchlight approach to locate the channels of interest (see Methods for details). We then used a linear discriminant analysis (LDA) classifier in a time × time classification procedure (King and Dehaene, 2014). We also repeated the classification process using the adaptation night to be certain that the classification was not caused by a sound-induced effect or EEG noise rather than reactivation of the encoded memory. Finally, we compared the results from the two nights, both to each other and to chance level, using cluster-based permutation (see Methods). Comparison of the experimental night to the adaptation night showed a significantly higher area under the receiver operating characteristic curve (AUC) for the experimental night, described by a cluster (n=14, p=0.004) around 1 s after the onset of the cue at time 0 (Figure 2c). In the adaptation night, no significant classification strength was found compared to chance (AUC=0.5), demonstrating that classification of this control condition did not differ from chance level. By contrast, comparison of the experimental night against chance showed a significantly higher AUC around 1 s after the onset of the cue for the experimental night (n=14, p=0.009; Figure 2d). Time × time classification AUC plots are shown in Figure 2—figure supplement 1. These findings are significant when evaluated against both control and chance level, clearly demonstrating that we can detect memory reactivation after TMR cues in REM sleep.
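The train-on-sleep, test-on-wake procedure can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' pipeline: the two-class LDA and the rank-based AUC are written out in NumPy, the small ridge term and all function names (`fit_lda`, `auc`, `time_by_time_auc`) are hypothetical, and the data are synthetic with an injected left/right pattern.

```python
import numpy as np

def fit_lda(X, y, shrinkage=1e-3):
    """Two-class LDA on X (trials x channels); returns weights and bias."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    centred = np.vstack([X0 - mu0, X1 - mu1])
    cov = centred.T @ centred / (len(X) - 2)      # pooled covariance
    # Small ridge term (an assumption) keeps the covariance invertible.
    cov += shrinkage * np.trace(cov) / len(cov) * np.eye(len(cov))
    w = np.linalg.solve(cov, mu1 - mu0)
    return w, -0.5 * w @ (mu0 + mu1)

def auc(scores, y):
    """AUC from the rank-sum identity (assumes no tied scores)."""
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    n1 = int(np.sum(y == 1))
    n0 = len(y) - n1
    return (ranks[y == 1].sum() - n1 * (n1 + 1) / 2) / (n0 * n1)

def time_by_time_auc(sleep, y_sleep, wake, y_wake):
    """sleep, wake: trials x channels x time points.

    One classifier is trained per sleep time point and applied to every
    wake time point, giving a sleep-time x wake-time matrix of AUCs.
    """
    result = np.zeros((sleep.shape[2], wake.shape[2]))
    for i in range(sleep.shape[2]):
        w, b = fit_lda(sleep[:, :, i], y_sleep)
        for j in range(wake.shape[2]):
            result[i, j] = auc(wake[:, :, j] @ w + b, y_wake)
    return result

# Synthetic data: a shared class pattern at sleep sample 5 and wake sample 3.
rng = np.random.default_rng(0)
pattern = np.array([1.0, -1.0, 0.5, 0.0, 0.0, 0.0])
y_sleep = np.arange(80) % 2            # 0 = left hand, 1 = right hand
y_wake = np.arange(60) % 2
sleep = rng.normal(size=(80, 6, 10))
wake = rng.normal(size=(60, 6, 8))
sleep[:, :, 5] += 2 * np.outer(y_sleep, pattern)
wake[:, :, 3] += 2 * np.outer(y_wake, pattern)
auc_map = time_by_time_auc(sleep, y_sleep, wake, y_wake)
```

In this toy example, `auc_map[5, 3]` is high because the same class pattern is present at that pair of time points. In the study, the equivalent per-participant maps are averaged and then tested with cluster-based permutation.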

As described in the section above, TMR elicited an increase in theta power in both experimental and adaptation nights. Because theta responses after TMR have been implicated in reactivation and consolidation (Schreiner and Rasch, 2017), we wanted to determine whether this increased theta power could be the carrier of reactivation and thus hold the discriminating features of classes. We therefore band-pass filtered our data and re-ran our classification analysis using only theta band. Interestingly, classification did not differ from chance in this theta-only analysis (p>0.3), suggesting that theta activity itself does not hold the information used to detect reactivation. For completeness, we also conducted a positive control in which we filtered out theta band and performed our classification analysis. We found a pattern of significant difference similar to that of the original classification in Figure 2c and d. Specifically, there was a significant difference against both chance level (n=14, p=0.007, corrected with cluster-based permutation) and the adaptation night (n=14, p=0.009, corrected with cluster-based permutation), as shown in Figure 2—figure supplement 2.
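The theta-only and theta-removed controls amount to splitting each trial into an in-band and an out-of-band part before classification. The sketch below shows that split with a simple FFT mask. It is an assumption-laden illustration: the paper does not state the filter design or the theta band edges in this section, so the 4–8 Hz band, the brick-wall mask, and the function name `fft_band_split` are all hypothetical choices.

```python
import numpy as np

def fft_band_split(x, fs, lo, hi, keep=True):
    """Keep (keep=True) or remove (keep=False) the [lo, hi] Hz band.

    Works along the last axis by zeroing FFT bins; this is a brick-wall
    filter used for illustration, not a recommended filter design.
    """
    n = x.shape[-1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    in_band = (freqs >= lo) & (freqs <= hi)
    mask = in_band if keep else ~in_band
    return np.fft.irfft(np.fft.rfft(x, axis=-1) * mask, n=n, axis=-1)

fs = 200                                  # sampling rate reported in Methods
t = np.arange(700) / fs                   # one 3.5 s trial
theta_part = np.sin(2 * np.pi * 6 * t)    # 6 Hz component (inside assumed band)
beta_part = np.sin(2 * np.pi * 20 * t)    # 20 Hz component (outside the band)
trial = theta_part + beta_part

theta_only = fft_band_split(trial, fs, 4.0, 8.0, keep=True)
theta_removed = fft_band_split(trial, fs, 4.0, 8.0, keep=False)
```

The two outputs sum back to the original trial, so running the classifier separately on `theta_only` and `theta_removed` asks which part of the signal carries the class information.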

Correlation of classification performance with behaviour

Because TMR in SWS leads to reliable behavioural benefits in the SRTT (Cousins et al., 2014; Cousins et al., 2016), we were interested to know whether applying this same manipulation in REM sleep led to any measurable advantage. Our examination of overnight change showed that relative sequence improvement (cued vs un-cued) did not differ from chance (Wilcoxon signed rank test, n=14, p=0.43; Figure 3a; see Methods for details). This null result regarding the impact of TMR mirrors the finding of Cousins et al., 2016, where there was no significant effect of stimulation, but there was a correlation with sleep spindles, which are thought to be markers of reactivation. We have a much stronger proxy for reactivation in this study in the form of our classification measure. We therefore set out to determine whether there was a relationship between behavioural improvement and the extent to which we could classify reactivation. This revealed a positive correlation (n=14, r=0.56, p=0.04; Figure 3b).

Figure 3
Correlation of classification with cued sequence improvement.

(a) Relative sequence improvement (cued vs un-cued) did not show significant improvement (Wilcoxon signed rank test, n=14, p=0.43). (b) Classification performance positively correlated with relative sequence improvement (Spearman correlation, n=14, r=0.56, p=0.04).

Finally, we wanted to know whether the extent to which participants learned the sequence during training might predict the extent to which we could identify reactivation during subsequent sleep. We therefore checked for a correlation between classification performance and pre-sleep performance to determine whether the degree of pre-sleep learning predicted the extent of reactivation. This showed no significant correlation (n=14, r=–0.39, p=0.17). Together, these findings suggest that while our TMR cueing did not lead to an overall significant benefit in behavioural performance, the extent to which we can detect reactivation in REM sleep did predict the extent to which sequence memory improves overnight, and this was not dependent on pre-sleep learning.
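The correlations in this section are Spearman rank correlations, which can be computed as the Pearson correlation of the ranked values. The following sketch shows that computation in NumPy. The helper names are hypothetical and the numbers are invented for illustration; they are not the study's data.

```python
import numpy as np

def average_ranks(values):
    """Ranks starting at 1; tied values receive their average rank."""
    values = np.asarray(values, dtype=float)
    ranks = np.empty(len(values))
    ranks[np.argsort(values, kind="mergesort")] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman_r(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

# Invented per-participant values, for illustration only.
classification_auc = [0.48, 0.51, 0.53, 0.55, 0.58, 0.60]
relative_improvement = [-20.0, 5.0, -3.0, 12.0, 30.0, 18.0]
r_value = spearman_r(classification_auc, relative_improvement)
```

Because only ranks enter the calculation, the measure is insensitive to the scale of either variable and to monotonic transformations of them.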

Discussion

In this paper we demonstrated that memory reactivation after TMR cues in human REM sleep can be decoded using EEG classifiers. Such reactivation appears to be most prominent about 1 second after the sound cue onset. Interestingly, although TMR also elicited an increase in theta power, our data suggest that this response is related to the tone, and does not embody the reactivation in and of itself.

Temporal structure of reactivation and TMR-elicited theta activity

Our data suggest that detectable memory reactivation is temporally consistent among trials and participants, as reflected in the timing of the clusters in Figure 2c and d. During wake, classification is more prominent around 0.6 s after the cue, whereas in REM it starts about 0.4 s later. This delayed start could potentially be due to the brain taking more time to process information and reactivate a memory during REM sleep than during wake. In fact, reactivation in SWS has also been shown to be delayed, appearing approximately from 1 s to 4.5 s after cue onset in a task of associating spatial locations with left- and right-hand movements (Wang et al., 2019), around 2 s after cue onset in a picture memory task (Cairney et al., 2018), and up to 10 s after cue onset in rodents (Bendor and Wilson, 2012).

REM sleep is dominated by theta activity, which is linked to attention during wake (Biel et al., 2021) and is more prominent with higher executive control (Magosso et al., 2021). Wakeful theta is also associated with the encoding of new information and memory processing (Vertes, 2005). Furthermore, studies in non-REM have shown that TMR triggers an increase in theta followed by an increase in sigma band (Cairney et al., 2018; Schechtman et al., 2021). Our data are in keeping with this, as we show that TMR elicited a power increase in theta band in REM sleep irrespective of whether the cues are meaningful (experimental night) or not (control night). We also demonstrate that theta does not embody the reactivation we are detecting, since we cannot classify reactivation using only this frequency. We therefore conclude that the theta activity we observed following TMR cues was related to the processing of the sound cues and was not representative of reactivation.

Correlation with behaviour

The performance of our REM classifiers positively predicts overnight improvement on sequence memory. Similar correlations between classification performance and behaviour have been found in non-REM TMR (Wang et al., 2019; Shanahan et al., 2018; Zhang et al., 2018). This could lead to the speculation that more reactivation means more consolidation, and therefore better post-sleep performance. However, TMR in REM sleep led to no overall benefit in performance in either our current dataset (Koopman et al., 2020) or previous studies (Rasch et al., 2007; Cordi et al., 2014). Furthermore, examination of natural (un-cued) sleep showed a correlation between overnight memory improvement and the extent of memory-linked activity identified in non-REM but not REM sleep (Schönauer et al., 2017), potentially suggesting a difference in the function of replay in these two sleep stages. We are at a very early phase in understanding what TMR does in REM sleep; however, we do know that the connection between hippocampus and neocortex is inhibited by the high levels of acetylcholine that are present in REM (Hasselmo, 1999). This means that the reactivation which we observe in the cortex is unlikely to be linked to corresponding hippocampal reactivation, so any consolidation which occurs as a result of this is also unlikely to be linked to the hippocampus. The SRTT is a sequencing task which relies heavily on the hippocampus, and our primary behavioural measure (sequence-specific skill [SSS]) specifically examines the sequencing element of the task. Our own neuroimaging work has shown that TMR in non-REM sleep leads to extensive plasticity in the medial temporal lobe (Cousins et al., 2016). However, if TMR in REM sleep has no impact on the hippocampus then it is quite possible that it elicits cortical reactivation and leads to cortical plasticity but provides no measurable benefit to SSS. Alternatively, because we only measured behavioural improvement right after sleep, it is possible that we missed behavioural improvements that would have emerged several days later, as we know can occur in this task (Rakowska et al., 2021). We should also mention that our relatively low number of participants (n=14) is a limitation to this correlation, and we recommend that future studies confirm the same relationship.

Parallel characteristics of reactivations in REM and non-REM

While there is already a growing body of literature about TMR cued reactivation in non-REM sleep (Rasch and Born, 2013; Bendor and Wilson, 2012; Wang et al., 2019; Cairney et al., 2018; Schreiner et al., 2021), our findings provide initial information about human memory reactivation using TMR in REM. In fact, our data suggest some parallels between reactivation in these two sleep stages. For instance, similar to non-REM (Cairney et al., 2018), TMR-locked reactivation in REM is delayed compared to wake after cue onset. Furthermore, the reactivations we were able to detect appear to occur in roughly the same area of the cortex as is used in performing the task, and the extracted features of reactivation are similar to those extracted during wake activation which is why it is detectable using our machine learning models.

Authenticity of detected memory reactivation

We ensured that the pattern detected by our EEG classifiers is a genuine memory reactivation using two procedures. Firstly, we used an adaptation night, during which participants heard the TMR sounds before these were paired with any memory, to ensure that our classification results from the experimental night were not caused by responses to sounds alone. Secondly, in developing our classifier, we captured the neural fingerprint associated with reactivation of a memory using the EEG pattern occurring after TMR cues in sleep. We then directly related this to the pattern of brain activity during wakeful imagining of the task. This was achieved by training our classification models on sleep EEG and testing them on wakeful imagining. Together, these procedures ensured that our classification stems from memory reinstatement and is related to genuine re-processing of memories during sleep. Our work builds on other studies showing discriminability of cued categories in sleep data without the inclusion of wake (Cairney et al., 2018; Schönauer et al., 2017), as well as on an approach that includes only the features that caused category discrimination in wake (Wang et al., 2019). We have explicitly examined the differential impact of training our classifiers with wake and training them with sleep and shown in Figure 2—figure supplement 3 that the classifiers trained with sleep are more robust in decoding the elicited reactivation. We therefore recommend that future studies adopt a similar approach by allowing the models to train on sleep data.

Conclusion

The question of whether memories reactivate in REM as well as non-REM sleep has been debated for some years. REM reactivation has been suggested by modelling (Hasselmo, 2008) and evidence of learning-dependent activation in human REM sleep has been observed (Maquet et al., 2000; Peigneux et al., 2003). However, null behavioural findings from human REM TMR studies (Rasch and Born, 2013; Rasch et al., 2007) have led to scepticism amongst sleep researchers. Our current findings provide clear evidence that TMR in REM can elicit detectable reactivation. Furthermore, our analysis uncovers several important properties of REM reactivation, showing strong parallels with reactivation in non-REM sleep. More work is needed to explore how such reactivation links to behavioural and neural plasticity, and how this differs across a variety of cognitive tasks.

Methods

Participants

EEG and behavioural data were collected from human participants (n=16; 8 females, 8 males; mean age: 23.6). Two participants were excluded due to technical problems; the remaining datasets were included in the analyses (n=14). All participants had normal or corrected-to-normal vision, normal hearing, and no history of physical, psychological, neurological, or sleep disorders. Responses in pre-screening questionnaires reported no stressful events, a regular sleep-wake rhythm in the month before the study, and no night work or cross-continental travel in the 2 months before the study. All participants reported being unfamiliar with the SRTT and all of them were right-handed. Participants did not consume alcohol in the 24 hr before the study or caffeine in the 12 hr before the study, and did not perform any extreme physical exercise or nap.

Experimental design

The SRTT was adapted from Cousins et al., 2014, and participants performed the SRTT before and after sleep. In the SRTT, sounds cued four different finger presses. During wakeful encoding, participants learned two 12-item sequences, A and B (A: 1 2 1 4 2 3 4 1 3 2 4 3; B: 2 4 3 2 3 1 4 2 3 1 4 1). The location of each image indicated which key on the keyboard to press as quickly and accurately as possible, after which the next image appeared. The location-button associations were: 1 – top left corner = left shift key; 2 – bottom left corner = left Ctrl; 3 – top right corner = up arrow; 4 – bottom right corner = down arrow. Sequences had been matched for learning difficulty; both contained each item three times. The blocks were interleaved so that a block of the same sequence was presented no more than twice in a row, and each block contained three repetitions of a sequence. There were 24 blocks of each sequence (48 blocks in total). Each block was followed by a 15 s pause, which participants could extend if they wished; during the pause, participants were informed of their reaction time and error rate for the last block. We used two sets of pure musical tones: one consisted of low tones within the fourth octave (C/D/E/F) and the other of high tones within the fifth octave (A/B/C#/D). Each sequence was paired with either low tones or high tones. These tone groups were counterbalanced across participants. The 48 blocks of sequences A and B were followed by four blocks that contained random sequences, which were used to help us isolate improvements in reaction time which were specific to sequence learning. Two of these random blocks were paired with the tone group of one sequence that was later replayed in REM sleep, and the other two were paired with the tone group of the other non-reactivated sequence (Abdellahi et al., 2023; Rakowska et al., 2021; Koopman et al., 2020).
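The task structure described above can be written down compactly, which also makes it easy to check the stated properties of the sequences. This sketch is illustrative only: the variable names are hypothetical, and the assignment of items 3 and 4 to the right hand is inferred from the key layout rather than stated in this paragraph.

```python
from collections import Counter

# Sequences and key mapping as given in the text.
SEQ_A = [1, 2, 1, 4, 2, 3, 4, 1, 3, 2, 4, 3]
SEQ_B = [2, 4, 3, 2, 3, 1, 4, 2, 3, 1, 4, 1]
KEY_FOR_ITEM = {1: "left shift", 2: "left Ctrl", 3: "up arrow", 4: "down arrow"}

# Assumption: items on the left of the screen map to the left hand,
# items on the right to the right hand.
HAND_FOR_ITEM = {1: "left", 2: "left", 3: "right", 4: "right"}

BLOCKS_PER_SEQUENCE = 24
REPETITIONS_PER_BLOCK = 3

item_counts_a = Counter(SEQ_A)
item_counts_b = Counter(SEQ_B)

# Sequence trials per sequence during training (random blocks excluded).
trials_per_sequence = BLOCKS_PER_SEQUENCE * REPETITIONS_PER_BLOCK * len(SEQ_A)

# Left/right labels for one pass through sequence A, as used for classification.
hand_labels_a = [HAND_FOR_ITEM[item] for item in SEQ_A]
```

Both sequences contain each of the four items exactly three times, so each pass through a sequence yields six left-hand and six right-hand trials.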

In the centre of the screen, ‘A’ and ‘B’ appeared while participants performed the task, and they knew that there were two sequences. However, they were not asked to explicitly learn the sequences. We counterbalanced which sequence was presented first and which sequence was reactivated across participants. At the beginning of each trial (time 0), a 200 ms tone was played, and at the same time a visual cue appeared in one of the corners of the screen. Participants were instructed to keep individual fingers of their left and right hand on the left and right response keys, respectively. Visual cues were neutral objects or faces (Figure 1—figure supplement 1) used in a previous study (Cousins et al., 2014). Visual cues appeared in the same position for each sequence (1=male face, 2=lamp, 3=female face, 4=water tap). Visual cues stayed on the screen until the correct key was pressed. After completing the SRTT, participants performed the imagery task by seeing and hearing the same cues as in the initial training task, but only imagining pressing the buttons without movement. This task consisted of 30 interleaved blocks (15 of each sequence) and was presented in the same order as during the SRTT. We used these imagery data for classification because, without the movement involved in actual finger presses, they have a higher signal-to-noise ratio. Each trial consisted of a tone and a visual stimulus, the latter being shown for 270 ms and followed by an 880 ms inter-trial interval. There were no random blocks during the imagery task and no performance feedback during the pause between blocks. As a control, participants were asked to sleep in the lab before doing the SRTT training. The control night followed the same criteria as the actual experiment with the difference that the sounds were not yet associated with any task. Participants performed the task again in the morning, but this time they completed the motor imagery task first and then the SRTT. They were then asked to try to remember the locations of the images of the two sequences to test their recall for each of the sequences.

The delivery of TMR cues during REM sleep

Cues were delivered to participants during stable REM sleep which was identified using standard American Academy of Sleep Medicine (AASM) criteria (Richard et al., 2015). Tones were presented in the correct order for our sequence, with a pause of 1500 ms in between tones in a sequence, and a 20 s break at the end of each sequence. If participants showed any signs of arousal, or left REM sleep, cueing was paused immediately. Cueing then continued from where it had left off when they returned to stable REM sleep again.
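The cueing protocol can be sketched as a simple schedule that resumes from a stored position after an interruption. Several details here are assumptions made for illustration: the 1500 ms pause is treated as an onset-to-onset interval, the 20 s break is taken to replace that interval at the end of a sequence, sequence A is used as the cued sequence, and the function name `cue_schedule` is hypothetical.

```python
SEQ_A = [1, 2, 1, 4, 2, 3, 4, 1, 3, 2, 4, 3]

def cue_schedule(sequence, n_cues, start_index=0,
                 tone_interval_s=1.5, sequence_break_s=20.0):
    """Return a list of (onset_s, item) pairs for n_cues tones.

    start_index is the position within the sequence at which cueing
    resumes, e.g. after an arousal interrupted the previous run.
    """
    schedule = []
    onset = 0.0
    index = start_index
    for _ in range(n_cues):
        schedule.append((onset, sequence[index]))
        index += 1
        if index == len(sequence):
            index = 0
            onset += sequence_break_s   # break after a completed sequence
        else:
            onset += tone_interval_s    # interval between tones
    return schedule

first_run = cue_schedule(SEQ_A, n_cues=14)
# Cueing stopped after five tones; resume from the sixth item on return to REM.
resumed_run = cue_schedule(SEQ_A, n_cues=2, start_index=5)
```

Storing only the index of the next item is enough to continue "from where it had left off", because the tone order within a sequence is fixed.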

EEG recording and pre-processing

EEG signals were acquired from 21 electrodes, following the 10–20 system. Thirteen electrodes were placed on standard locations: FZ, CZ, PZ, F3, F4, C5, CP3, C6, CP4, P7, P8, O1, and O2. Other electrodes were: left and right EOG, three EMG electrodes on the chin, and the ground electrode was placed on the forehead. Electrodes were referenced to the mean of the left and right mastoid electrodes. The impedance was <5 kΩ for each scalp electrode, and <10 kΩ for each face electrode. Recordings were made with an Embla N7000 amplifier and RemLogic 1.1 polysomnography (PSG) software (Natus Medical Incorporated). PSG recordings were scored by trained sleep scorers and only the segments scored as REM sleep were kept for further analyses.

Data were collected at a 200 Hz sampling rate. EEG signals were band-pass filtered in the frequency range from 0.1 to 50 Hz. Subsequently, trials were cleaned based on statistical measures consisting of variance and mean. Trials were segmented from –0.5 s to 3 s relative to the onset of the cue. EEG traces falling two standard deviations higher than the mean were considered outliers and trials were rejected if more than 25% of the channels were outliers. If trials were bad in less than 25% of the channels, they were interpolated using triangulation of neighbouring channels. Thus, 9.8% and 10.5% of trials were considered outliers and removed from the experimental night data and the adaptation night, respectively.
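The rejection rule can be illustrated with a short sketch. It is a simplified reading of the description, not the authors' code: only per-trial variance is used as the statistic (the text mentions both variance and mean), the threshold is computed per channel across trials, the interpolation step is omitted, and the function name `flag_trials` and the synthetic data are hypothetical.

```python
import numpy as np

def flag_trials(epochs, n_sd=2.0, max_bad_fraction=0.25):
    """Flag outlier trials in epochs (trials x channels x time points).

    Returns (reject, bad): reject marks trials to discard, and bad marks
    the individual channel-trials whose variance exceeds the threshold.
    """
    stat = epochs.var(axis=2)                          # trials x channels
    threshold = stat.mean(axis=0) + n_sd * stat.std(axis=0)
    bad = stat > threshold
    reject = bad.mean(axis=1) > max_bad_fraction       # >25% of channels bad
    return reject, bad

# Synthetic example: 50 trials, 13 scalp channels, 700 samples (3.5 s at 200 Hz).
rng = np.random.default_rng(1)
epochs = rng.normal(size=(50, 13, 700))
epochs[7] *= 10          # trial 7 is noisy on every channel
epochs[20, 3] *= 10      # trial 20 is noisy on a single channel

reject, bad = flag_trials(epochs)
# Trials kept but with some bad channels would be candidates for interpolation.
to_interpolate = bad.any(axis=1) & ~reject
```

Here trial 7 is rejected outright, whereas trial 20 is kept and marked for channel interpolation, mirroring the two cases described in the text.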

Data were subsequently analysed with independent component analysis (ICA) to remove eye movement artefacts, which can occur during REM. Components identified by the ICA that were significantly correlated with the signals from the eye electrodes (corrected for multiple comparisons) were removed. All channels for each participant were manually inspected and artefacts were removed.

Time-frequency representation and ERP analysis

Our time-frequency analysis used a method similar to that of Cairney et al., 2018. We convolved the signals with a Hanning taper of 5 cycles, using 0.5 Hz frequency steps and 5 ms time steps. Power values are shown in the range of 4–30 Hz (Figure 2a and b). We also used a similar baseline of –300 ms to –100 ms pre-stimulus. The reported values represent the percentage of power change from baseline. The plots show the grand average across all participants and all channels. The same process was applied to both experimental and adaptation nights. For the ERP analysis, we adopted a similar approach to that in Cairney et al., 2018. We used a baseline period of –200 ms to 0 ms and report the grand average across all participants and all channels. The small amplitude values result from the smoothing that occurs when many trials are averaged and the grand average is taken across participants and channels: small shifts between values reduce the averaged amplitude, as shown in Figure 2a and b.
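The baseline correction can be written as a percentage change, 100 × (P(t) − B) / B, where B is the mean power in the –300 ms to –100 ms window. The sketch below applies this to a single power time course at one frequency. The function name and argument layout are assumptions for illustration, not the FieldTrip routine used in the analysis.

```python
def percent_change_from_baseline(power, times, baseline=(-0.3, -0.1)):
    """Express power as percentage change from the mean baseline power.

    power    : power values at one frequency, one per time point
    times    : matching time stamps in seconds relative to cue onset
    baseline : (start, end) of the baseline window in seconds, inclusive
    """
    start, end = baseline
    base_values = [p for p, t in zip(power, times) if start <= t <= end]
    base = sum(base_values) / len(base_values)
    return [100.0 * (p - base) / base for p in power]
```

With a baseline power of 2, for example, a post-cue value of 3 is reported as +50% and a value of 1 as –50%.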

Time × time classification with time domain features

We adopted a time × time classification approach (temporal generalisation) on the time domain features. We first smoothed the EEG signals using a 100 ms window such that every time point was replaced with the average of the 50 ms on either side of it. We used a searchlight analysis to locate channels of interest. This used data from 12 participants who had performed the same SRTT task as those in the current study, but who had not undergone REM TMR and were therefore not included in our current dataset. Our classification used the smoothed amplitudes as features and LDA classifiers with fivefold cross-validation. In our searchlight analysis, the classifier was trained for every channel and time points were used as features. Thus, a classification outcome was obtained for every channel as shown in Figure 2—figure supplement 4. In the time × time classification, EEG signals from sleep and wake were organised as 3D tensors with the dimensions trials × channels × time points. We then trained an LDA classifier at every time point from sleep (trials × channels), with the values from all the different channels as our features. The classifier from each time point in sleep was applied to all time points from wake to obtain one row of classification results in the time × time classification plot (e.g., Figure 2—figure supplement 5). The process was then repeated for all time points after a sound cue in sleep (trial length was 1.5 s in sleep and 1.1 s in wake). Classifiers were first trained on sleep data from a participant and tested on wake data from that same participant, then the AUC values from different participants were averaged. In sleep, only sounds from the cued sequence were used. Two of the sounds (1=male face, 2=lamp) were associated with the left hand and thus their trials were aggregated to form the left-hand class. The other two sounds (3=female face, 4=water tap) were associated with the right hand so their trials were aggregated to form the right-hand class. When testing the classifier on wake data, we were interested in the differential activation pattern for left- and right-handed motor imagery. We therefore included motor imagery data from both cued and un-cued sequences and from both pre- and post-sleep imagery sessions. There was an issue with the recorded data of pre-sleep imagery for one participant; thus, for that participant only the post-sleep imagery was used.
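The structure of the time × time procedure can be sketched as below. This is an illustration under explicit assumptions, not the MVPA-Light implementation: the classifier is a simple class-mean-difference linear rule standing in for LDA (it ignores the covariance between channels), the smoothing window is truncated at the trial edges, and all function names are hypothetical. At 200 Hz, 50 ms on either side corresponds to `half_width=10` samples.

```python
def moving_average(signal, half_width):
    """Replace each sample with the mean of the samples within half_width of it."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

def auc(scores, labels):
    """Area under the ROC curve from pairwise comparisons (labels are 1 or 0)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def fit_mean_difference(X, y):
    """Stand-in linear classifier (not LDA): weights are the class-mean difference."""
    n_features = len(X[0])
    def class_mean(c):
        rows = [x for x, label in zip(X, y) if label == c]
        return [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]
    m1, m0 = class_mean(1), class_mean(0)
    w = [a - b for a, b in zip(m1, m0)]
    bias = -sum(wj * (a + b) / 2 for wj, a, b in zip(w, m1, m0))
    return w, bias

def temporal_generalisation(sleep, sleep_labels, wake, wake_labels):
    """Train at each sleep time point, test at every wake time point.

    sleep and wake are indexed as [trial][channel][time].
    Returns an AUC matrix indexed as [sleep_time][wake_time].
    """
    n_sleep_t, n_wake_t = len(sleep[0][0]), len(wake[0][0])
    result = []
    for ts in range(n_sleep_t):
        # Features at this sleep time point: one value per channel.
        X_train = [[channel[ts] for channel in trial] for trial in sleep]
        w, bias = fit_mean_difference(X_train, sleep_labels)
        row = []
        for tw in range(n_wake_t):
            scores = [sum(wj * channel[tw] for wj, channel in zip(w, trial)) + bias
                      for trial in wake]
            row.append(auc(scores, wake_labels))
        result.append(row)
    return result
```

Each row of the returned matrix corresponds to one classifier trained at a single sleep time point, matching one row of the time × time plot. Per-trial weights and cross-validation are left out to keep the example short.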

We delivered TMR cues as long as participants were in stable REM. This has the advantage of allowing us to train our classification models with many trials, but we also needed to make sure that our TMR cues were processed by the brain. We knew the pattern of brain response elicited after TMR in both experimental and adaptation nights, as shown in Figure 2a and b, and used this to assess the success of TMR in eliciting a response. We thus incorporated the fidelity of the TMR response to this pattern in our model. We did this by measuring the average theta power (4–8 Hz) in the first second after TMR. We averaged these values across channels and time, giving one value for each sleep trial. We then robustly normalised the values by subtracting the median from each value and dividing by the inter-quartile range. Next, we rescaled the values to the range [0, 1] to act as weights for trials when performing classification. Values higher than the 90th percentile were set to 1 and those lower than the 10th percentile were set to 0. After normalisation, flooring, and capping, the weights reflect the fidelity of each TMR cue. We then supplied those weights to the classifiers. This process was done for both the experimental and the adaptation nights, and provides a new way of informing the classifier of the fidelity of each TMR cue. We also report the number of trials for each condition in Supplementary file 1. We used the AUC as the performance measure in our classification. Analyses were done in Matlab using FieldTrip (Oostenveld et al., 2011), MVPA-Light (Treder, 2020), Lively Vectors (LV) (Abdellahi, 2022), and customised scripts (Matlab, 2018). The clustering window used for cluster-based permutation was the whole length of trials (i.e., the whole time × time classification duration), which gave a stringent test. In other words, we did not limit the clustering window to a specific time window.
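One reading of the weighting steps is sketched below: robust normalisation, clipping at the 10th and 90th percentiles, then rescaling so the clipped range maps onto [0, 1]. The order of clipping and rescaling, the linear-interpolation percentile, and the function names are assumptions for illustration; the input is taken to be one mean theta power value per sleep trial.

```python
from statistics import median

def percentile(values, q):
    """Linearly interpolated percentile, with q between 0 and 100."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def trial_weights(theta_power):
    """Turn per-trial theta power into classifier weights in [0, 1]."""
    # Robust normalisation: subtract the median, divide by the IQR.
    med = median(theta_power)
    iqr = percentile(theta_power, 75) - percentile(theta_power, 25)
    normalised = [(v - med) / iqr for v in theta_power]
    # Floor at the 10th percentile (weight 0), cap at the 90th (weight 1).
    lo, hi = percentile(normalised, 10), percentile(normalised, 90)
    return [min(1.0, max(0.0, (v - lo) / (hi - lo))) for v in normalised]
```

Trials with the weakest theta response therefore contribute nothing to training, while trials with the strongest response contribute fully.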

Behavioural analysis

The most robust measure of performance on the SRTT comes from the best performance blocks in each session (Cousins et al., 2014). We therefore took the average of the best four blocks before and after sleep to calculate performance. The SRTT measure which consolidates most strongly with sleep (Spencer et al., 2006) and with TMR (Cousins et al., 2014; Cousins et al., 2016) is SSS. This was calculated by subtracting reaction times on sequenced blocks from reaction times on the associated random blocks: SSS = (mean of random blocks – mean of best four blocks). Behavioural improvement across sleep was then calculated as: (SSS post-sleep – SSS pre-sleep)/SSS pre-sleep. To determine whether this measure differed as a result of TMR, we calculated the relative difference in overnight improvement for cued and un-cued sequences as follows: relative sequence improvement (cued vs un-cued) = (improvement across sleep of the cued sequence – improvement across sleep of the un-cued sequence)/improvement across sleep of the un-cued sequence. When we correlated the relative cued vs. un-cued sequence improvement with classification performance, we calculated the latter as the mean of classification AUC values from each participant inside the cluster shown in Figure 2c.
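The three behavioural quantities can be computed as in the following sketch. The function names are hypothetical, reaction times are assumed to be block means in milliseconds, and "best" blocks are assumed to be those with the fastest (lowest) mean reaction time.

```python
from statistics import mean

def best_blocks_mean(block_rts, n_best=4):
    """Mean reaction time of the n_best fastest blocks (assumes best = fastest)."""
    return mean(sorted(block_rts)[:n_best])

def sss(random_block_rts, sequence_block_rts):
    """SSS = mean of random blocks - mean of best four sequence blocks."""
    return mean(random_block_rts) - best_blocks_mean(sequence_block_rts)

def overnight_improvement(sss_pre, sss_post):
    """(SSS post-sleep - SSS pre-sleep) / SSS pre-sleep."""
    return (sss_post - sss_pre) / sss_pre

def relative_sequence_improvement(improvement_cued, improvement_uncued):
    """(cued improvement - un-cued improvement) / un-cued improvement."""
    return (improvement_cued - improvement_uncued) / improvement_uncued
```

Note that the last two ratios are undefined when their denominator is zero, so a participant with no pre-sleep SSS or no un-cued improvement would need separate handling.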

Correcting for multiple comparisons

Multiple comparisons correction was done using the MVPA-Light toolbox in Matlab (Treder, 2020) and customised scripts. In cluster-based permutation, we had a 2D time × time classification from each participant and each condition (experimental and adaptation). Each 2D classification outcome from the adaptation night was subtracted from its corresponding classification outcome from the experimental night. Afterwards, each point of that 2D difference was tested across participants to see whether it differed significantly from 0. This gave a 2D map of z-statistic values. Then, the z-statistic values of the significant contiguous points were added together to get the observed cluster statistic. Afterwards, permutations were done in which 2D classification outcomes were randomly shuffled between conditions, and the process of taking the difference, computing z-values to identify significant points, and adding the z-values of the significant points was repeated. This constituted one shuffling iteration. We repeated this 10,000 times, constructed a distribution of those permutation results, and tested whether the observed statistic from the actual data was significant. Cluster-based permutation with a sample-specific threshold of 0.05 was used. The permutation test threshold for clusters was 0.05.
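The permutation scheme can be sketched as follows. This is a simplified illustration rather than the MVPA-Light routine: the point-wise statistic is a mean divided by its standard error (the test actually used may differ), only positive clusters are considered, neighbours are defined by 4-connectivity, shuffling between conditions is implemented as swapping the two nights within a participant (which flips the sign of that participant's difference map), and the function names are hypothetical.

```python
import random
from statistics import mean, stdev

def z_map(diffs):
    """Point-wise mean / standard error across participants.

    diffs[p][i][j] is participant p's difference map. Points with zero
    variance are given z = 0 in this sketch.
    """
    n = len(diffs)
    rows, cols = len(diffs[0]), len(diffs[0][0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            values = [d[i][j] for d in diffs]
            sd = stdev(values)
            out[i][j] = mean(values) / (sd / n ** 0.5) if sd > 0 else 0.0
    return out

def max_cluster_stat(z, z_crit=1.96):
    """Largest summed z over contiguous points exceeding z_crit."""
    rows, cols = len(z), len(z[0])
    seen = [[False] * cols for _ in range(rows)]
    best = 0.0
    for i in range(rows):
        for j in range(cols):
            if seen[i][j] or z[i][j] <= z_crit:
                continue
            total, stack = 0.0, [(i, j)]
            seen[i][j] = True
            while stack:  # flood fill over 4-connected neighbours
                a, b = stack.pop()
                total += z[a][b]
                for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    x, y = a + da, b + db
                    if (0 <= x < rows and 0 <= y < cols
                            and not seen[x][y] and z[x][y] > z_crit):
                        seen[x][y] = True
                        stack.append((x, y))
            best = max(best, total)
    return best

def cluster_permutation_p(experimental, adaptation, n_perm=10000, seed=0):
    """Return (observed cluster statistic, permutation p-value)."""
    rng = random.Random(seed)
    diffs = [[[e - a for e, a in zip(e_row, a_row)]
              for e_row, a_row in zip(e_map, a_map)]
             for e_map, a_map in zip(experimental, adaptation)]
    observed = max_cluster_stat(z_map(diffs))
    exceed = 0
    for _ in range(n_perm):
        # Swapping nights within a participant negates that difference map.
        shuffled = [d if rng.random() < 0.5 else [[-v for v in row] for row in d]
                    for d in diffs]
        if max_cluster_stat(z_map(shuffled)) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

Taking the largest cluster in each permutation is what controls the family-wise error across the whole time × time map, since the observed cluster is compared against the largest cluster expected by chance anywhere in the map.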

Data availability

All relevant data generated or analysed are available along with Matlab scripts. Data are available at the Open Science Framework (OSF): https://osf.io/wmyae/; https://osf.io/fq7v5/.

The following data sets were generated
    1. Abdellahi MEA
    (2022) Open Science Framework
    ID wmyae. REM sleep TMR in human REM sleep elicits detectable reactivation with TMR.
    1. Abdellahi MEA
    (2022) Open Science Framework
    ID fq7v5. TMR in human REM sleep elicits detectable reactivation part2.

References

  1. Software
    1. Matlab
    (2018) Matlab 2018A
    The Mathworks Inc.
  2. Book
    1. Richard B
    2. Rita B
    3. Charlene EG
    4. Harding SM
    5. Lloyd RM
    6. Marcus CL
    (2015)
    The AASM Manual for the Scoring of Sleep and Associated Events.
    Am Acad Sleep Med.

Decision letter

  1. Laura L Colgin
    Senior and Reviewing Editor; University of Texas at Austin, United States
  2. Kenneth A Norman
    Reviewer; Princeton University, United States

Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "Targeted memory reactivation in human REM sleep elicits detectable reactivation" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, and the evaluation has been overseen by Laura Colgin as the Senior Editor. The following individual involved in the review of your submission has agreed to reveal their identity: Kenneth A Norman (Reviewer #3).

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

1) Reviewers agreed that there is a lack of clarity in the Methods as currently written which make it difficult to evaluate the results (as one example, what features go into the classifier?). The Methods will need to be revised to provide clarification before a decision can be made about whether the claims in this manuscript are rigorously supported by the results. Please see individual reviews below for details.

2) Some claims are not well supported by the presented results. Additional analyses are suggested that may strengthen the evidence for some of the claims. Claims with insufficient support (e.g., claims related to theta, claims related to dilation/compression, etc.) should be removed from the paper unless additional analyses/results can be included to provide sufficient support. Please see individual reviews below for details.

3) The justification for excluding trials (e.g., discarding low variance trials and discarding trials to match the number of clear trials across participants) is unsatisfactory. Please see individual reviews below for details.

Reviewer #1 (Recommendations for the authors):

My specific suggestions are as follows:

I suggest including a section in the methods explaining how TMR sounds were presented during REM sleep.

Please describe the methods for the searchlight analysis used to select channels of interest and a region of interest.

It would be helpful to have more details about how the LDA classifier was trained on the sleep data. It is my understanding that each sleep trial began with a TMR cue (a sequence of tones), and that both the cued and uncued sequences were associated with both left and right-handed button presses (each sequence contains 1,2,3, and 4). It is unclear to me how the classifier is then trained to discriminate between left vs. right hand.

It would also be helpful to have more details about testing on the imagery data, including more details about how the imagery data were prepared and preprocessed. Was the classifier tested on the imagery data from both pre- and post-sleep, from both cued and uncued sequence trials?

Regarding the procedure of rejecting trials with low variance ('Because TMR will not be effective with all trials, we also rejected trials with a low variance that do not differ from their mean across time since they are unlikely to contain a response.') – my reading of this is that "a response" refers to reactivation. If that is the case, then trials from the adaptation night would not contain "a response". Or perhaps a response more broadly refers to any event-related response to the sound cue. To alleviate any concerns related to this, it might be helpful to include some more descriptive information about how many trials are included in various conditions/analyses.

Reviewer #3 (Recommendations for the authors):

One recommendation is for the authors to clarify the methods. In particular:

The description of the "searchlight" procedure that was used for feature selection lacked detail. I couldn't figure out what the authors did from what was written in the paper. What features were included in each searchlight? Did the searchlights encompass multiple electrodes? What frequencies were included for each electrode? or did the authors use raw power? Also, I don't understand the statement on line 354 that "this was done on different participants who performed the same task to avoid circularity". Does this mean that feature selection was done separately for each participant, using a nested cross-validation procedure? or by "different participants" do you mean "participants who were not included in the main sample for this study"? Much more detail is needed here.

Is the main classification analysis done within participants (i.e., train on one participant's sleep data, test on the same participant's wake data) or across participants? I am assuming the former, but I tried looking this up and I could not find it (apologies if I missed it). Relatedly, for the statistics, is AUC first computed within each subject and then averaged across subjects, or is it computed differently?

Line 375 mentions a PCA step in the time dilation analysis, but it is unclear what the features are that are being fed into the PCA (are the feature vectors of dimensionality n_frequencies x n_electrodes x n_timepoints, or are they something else, and is the PCA looking at variance across trials, or across time points within a trial, or something else).

More information on the cluster-based permutation (i.e., what was permuted?) would be useful.

My second recommendation is for the authors to revisit the issues mentioned in the public review regarding theta-mediation, time dilation/compression, and sequence-specificity of the classification-behavior relationship. If the authors can not strengthen their claims, they should step back from making these claims.

Regarding the analyses looking at theta: The authors' median split approach is not the most statistically powerful way to address this question (see, e.g., papers by Gary McClelland about median splits). Using a more continuous regression approach might yield better results.

Regarding the analyses looking at dilation/compression: Here, the authors' main task is to show that their results necessitate a dilation/compression explanation, as opposed to simply being due to autocorrelation in feature patterns over time.

Regarding the analyses looking at the "sequence-specificity" of the classifier-behavior relationship: To assess sequence-specificity, the authors could run a bootstrap where, for each resampling of subjects, they compute the difference between the fisher z of the cued sequence and the fisher z of the uncued sequence.

I was intrigued by the authors' use of training on sleep data and testing on wake data. The justification for this (that features that are reactivated during sleep may be a subset of features that are reactivated during wake) is a sensible hypothesis but conjectural. If this hypothesis is correct, it implies that results should be worse if classifiers are trained on wake and tested during sleep. I don't think it's necessary to do this, but if the authors tried the reverse approach it would be useful to see the results (if wake-trained classifiers are indeed worse than sleep-trained classifiers, this would provide some converging support for the authors' claim that reactivated features during sleep are a subset of features activated during wake). It might also be useful to see the results if the authors train on sleep and test on sleep.

For the analyses looking at whether theta power is used for classification, the authors perform a "negative control" where they include theta-band oscillations (but filter out other bands) and show that reactivation goes away. They may also want to include a "positive control" where they filter out theta-band oscillations and show that reactivation persists.

The fact that objects and faces were associated with the motor responses (crossed with the left and right responses) suggests to me that the authors may have considered looking at object/face classification in addition to the left/right classification. If the authors did in fact perform other analyses (that yielded null results) I think it would add to the value of the paper to report them here.

The authors match the number of "clean trials" across participants, using the smallest number that was obtained across all participants (366) – this seems like it is unnecessarily discarding useful data. If the AUC is computed within each subject and then averaged across subjects, then each subject will "count the same" regardless of how many trials are included.

Lines 300-302: why did you include these four blocks that contained random sequences?

How do the authors reconcile the overall lack of benefit from REM TMR with the finding that, across participants, the degree of reactivation evoked by REM TMR cues correlates with behavioural change? The explanation on lines 244-246 ("This absence of behavioural improvement could be due to individual differences in the way REM processes the task, potentially relating to an interaction between the extent or style of learning and REM processes") was not helpful to me in reconciling these two points – the authors might want to further unpack their argument here to make it clearer.

Phrase: "wake-like memory reactivation" – I am not sure what this phrase means.

https://doi.org/10.7554/eLife.84324.sa1

Author response

Essential revisions:

1) Reviewers agreed that there is a lack of clarity in the Methods as currently written which make it difficult to evaluate the results (as one example, what features go into the classifier?). The Methods will need to be revised to provide clarification before a decision can be made about whether the claims in this manuscript are rigorously supported by the results. Please see individual reviews below for details.

We have rewritten the methods so that it is clearer and have considered all the points that reviewers raised and conducted further analyses as requested.

2) Some claims are not well supported by the presented results. Additional analyses are suggested that may strengthen the evidence for some of the claims. Claims with insufficient support (e.g., claims related to theta, claims related to dilation/compression, etc.) should be removed from the paper unless additional analyses/results can be included to provide sufficient support. Please see individual reviews below for details.

We thank the reviewers for this comment. We have removed the dilation/compression analysis from the manuscript, and altered the theta analysis.

3) The justification for excluding trials (e.g., discarding low variance trials and discarding trials to match the number of clear trials across participants) is unsatisfactory. Please see individual reviews below for details.

We have now altered the analysis to avoid exclusion of trials in this way. The new analysis showed the same effect which is reassuring. We have updated the manuscript accordingly.

Reviewer #1 (Recommendations for the authors):

My specific suggestions are as follows:

I suggest including a section in the methods explaining how TMR sounds were presented during REM sleep.

Thanks for this comment. We have now included it as follows:

“The delivery of TMR cues during REM sleep

Cues were delivered to participants during stable REM sleep which was identified using standard American Academy of Sleep Medicine (AASM) criteria (Richard et al., 2015). Tones were presented in the correct order for our sequence, with a pause of 1500ms in between tones in a sequence, and a 20-second break at the end of each sequence. If participants showed any signs of arousal, or left REM sleep, cueing was paused immediately. Cueing then continued from where it had left off when they returned to stable REM sleep again.”

Please describe the methods for the searchlight analysis used to select channels of interest and a region of interest.

We have now updated the methods section and included more details about this in the section on time x time classification with time domain features:

“We used a searchlight analysis to locate channels of interest. This used data from 12 participants who had performed the same SRTT task as those in the current study, but who had not undergone REM TMR and were therefore not included in our current dataset. Our classification used the smoothed amplitudes as features and LDA classifiers with five-fold cross-validation. In our searchlight analysis, the classifier was trained for every channel and time points were used as features. Thus, a classification outcome was obtained for every channel as shown in (Figure 2—figure supplement 4).”

It would be helpful to have more details about how the LDA classifier was trained on the sleep data. It is my understanding that each sleep trial began with a TMR cue (a sequence of tones), and that both the cued and uncued sequences were associated with both left and right-handed button presses (each sequence contains 1,2,3, and 4). It is unclear to me how the classifier is then trained to discriminate between left vs. right hand.

Thanks for raising this point. This is very important, and we apologise that we were not clear enough in describing this before. Each sequence was associated with 4 tones (one for each finger), and the cued sequence used four completely different tones from the un-cued sequence (counterbalanced across participants). Each sleep trial began with just one TMR tone – e.g. the sound associated with just one finger used in the cued sequence. Our LDA classifiers were trained at every time point in the sleep data after each TMR tone onset, and a classifier model was built. Classes 1 and 2 were aggregated to represent the left-hand class, similarly, classes 3 and 4 were aggregated to represent the right-hand class. This model was then tested on each time point in wake after the presentation of each sound. We have edited this part in the methods to make the point clearer.

“In sleep, only sounds from the cued sequence were used. Two of the sounds (1 = male face, 2 = lamp) were associated with left hand and thus their trials were aggregated to form the left-hand class.”

It would also be helpful to have more details about testing on the imagery data, including more details about how the imagery data were prepared and preprocessed. Was the classifier tested on the imagery data from both pre- and post-sleep, from both cued and uncued sequence trials?

We were interested in the activation pattern of right- and left-hand motor imagery, which should not depend on the tone. The classifier was therefore tested on imagery data from the cued and un-cued sequences, and we included all data from pre- and post-sleep motor imagery. Data were collected at 100 Hz sampling rate and were band-pass filtered in the range 0.1–50 Hz, then segmented into trials of 1100 ms. We have now included more details about this in the text:

“When testing the classifier on wake data, we were interested in the differential activation pattern for left- and right-handed motor imagery. We therefore included motor imagery data from both cued and un-cued sequences and from both pre-and post-sleep imagery sessions. There was an issue with the recorded data of pre-sleep imagery for one participant, thus, for that participant only the post-sleep imagery was used.”

Reviewer #3 (Recommendations for the authors):

One recommendation is for the authors to clarify the methods. In particular:

The description of the "searchlight" procedure that was used for feature selection lacked detail. I couldn't figure out what the authors did from what was written in the paper. What features were included in each searchlight? Did the searchlights encompass multiple electrodes? What frequencies were included for each electrode? or did the authors use raw power? Also, I don't understand the statement on line 354 that "this was done on different participants who performed the same task to avoid circularity". Does this mean that feature selection was done separately for each participant, using a nested cross-validation procedure? or by "different participants" do you mean "participants who were not included in the main sample for this study"? Much more detail is needed here.

Thanks for this comment and sorry that we were not clear. We have now updated the methods section and included more details about this in the section on time x time classification with time domain features:

“We used a searchlight analysis to locate channels of interest. This used data from 12 participants who had performed the same SRTT task as those in the current study, but who had not undergone REM TMR and were therefore not included in our current dataset. Our classification used the smoothed amplitudes as features and LDA classifiers with five-fold cross-validation. In our searchlight analysis, the classifier was trained for every channel and time points were used as features. Thus, a classification outcome was obtained for every channel as shown in (Figure 2—figure supplement 4).”

Is the main classification analysis done within participants (i.e., train on one participant's sleep data, test on the same participant's wake data) or across participants? I am assuming the former, but I tried looking this up and I could not find it (apologies if I missed it). Relatedly, for the statistics, is AUC first computed within each subject and then averaged across subjects, or is it computed differently?

Thanks for this point. Yes, the classification was done by training on one participant and testing on the same participant; we have now added this to the text. The AUC was calculated for each participant and then averaged across participants, and we have now included this in the text as well.

“Classifiers were first trained on sleep data from a participant and tested on wake data from that same participant, then the AUC values from different participants were averaged.”

Line 375 mentions a PCA step in the time dilation analysis, but it is unclear what the features are that are being fed into the PCA (are the feature vectors of dimensionality n_frequencies x n_electrodes x n_timepoints, or are they something else, and is the PCA looking at variance across trials, or across time points within a trial, or something else).

In response to your comments and those of other reviewers we have now removed the analysis of temporal compression and dilation. However, we have tried to answer this query:

“raw data was smoothed and used as time domain features. The data was then organized as trials x channels x timepoints then we segmented each trial in time based on the compression factor we are using. For instance, if we test if sleep is 2x faster than wake we look at the trial lengths in wake which was 1.1 sec. and we take half of this value which is 0.55 sec. we then take different window in time from sleep data such that each sleep trial will have multiple smaller segments each of 0.55 sec., we then add those segments as new trials and label them with the respective trial label. Afterward, we resize those segments temporally to match the length of wake trial. We now reshape our data from trials x channels x timepoints to trials x channels_timepoints so we aggregate channels and timepoints into one dimension. We then feed this to PCA to reduce the dimensionality of channels_timepoints into principal components. We then feed the resultant features to a LDA classifier for classification. This whole process is repeated for every scaling factor and it is done within participant in the same fashion the main classification was done and the error bars were the standard errors and we tested the results from the experimental night to those of that adaptation night.”

More information on the cluster-based permutation (i.e., what was permuted?) would be useful.

We have now added the details of cluster-based permutation:

“In cluster-based permutation, we had a 2d time x time classification from each participant and each condition (experimental and adaptation). Each 2d classification outcome in the adaptation was subtracted from its corresponding classification outcome in the experimental night. Afterwards, each point of that 2d difference was tested among participants to see whether it significantly differs from 0 or not. This gave a 2d z-statistic values at each point. Then the z-statistic values of the significant contiguous points were added together to get the observed cluster statistic. Afterwards, permutations were done where 2d classification outcomes were randomly shuffled between conditions and the process of taking the difference, computing z-values to see significant points, and adding the z-values of the significant points is repeated. This was one shuffling iteration, we repeated this 10,000 times and constructed a distribution of those permutation results and tested whether our observed statistic of the actual data was significant.”

My second recommendation is for the authors to revisit the issues mentioned in the public review regarding theta-mediation, time dilation/compression, and sequence-specificity of the classification-behavior relationship. If the authors can not strengthen their claims, they should step back from making these claims.

Regarding the analyses looking at theta: The authors' median split approach is not the most statistically powerful way to address this question (see, e.g., papers by Gary McClelland about median splits). Using a more continuous regression approach might yield better results.

Thanks for the comment. In response to your comments and those of other reviewers we have now altered this analysis and no longer split the trials into high and low theta.

Regarding the analyses looking at dilation/compression: Here, the authors' main task is to show that their results necessitate a dilation/compression explanation, as opposed to simply being due to autocorrelation in feature patterns over time.

Thank you. We have removed the analysis of temporal compression and dilation from the manuscript, so our answer is no longer relevant to the review process. However, we provide some details about this analysis. Raw data was smoothed and used as time domain features. The data was then organized as trials x channels x timepoints then we segmented each trial in time based on the compression factor we are using. For instance, if we test if sleep is 2x faster than wake we look at the trial lengths in wake which was 1.1 sec. and we take half of this value which is 0.55 sec. we then take a different window in time from sleep data such that each sleep trial will have multiple smaller segments each of 0.55 sec., we then add those segments as new trials and label them with the respective trial label. Afterwards, we resize those segments temporally to match the length of wake trials. We now reshape our data from trials x channels x timepoints to trials x channels_timepoints so we aggregate channels and timepoints into one dimension. We then feed this to PCA to reduce the dimensionality of channels_timepoints into principal components. We then feed the resultant features to a LDA classifier for classification. This whole process is repeated for every scaling factor and it is done within participant in the same fashion the main classification was done and the error bars were the standard errors. We compared the results from the experimental night to those of the adaptation night.

Regarding the analyses looking at the "sequence-specificity" of the classifier-behavior relationship: To assess sequence-specificity, the authors could run a bootstrap where, for each resampling of subjects, they compute the difference between the fisher z of the cued sequence and the fisher z of the uncued sequence.

Thank you for the suggestion. We have considered the comments regarding sequence specificity and have now examined sequence-specific improvement by subtracting the improvement on the un-cued sequence from the improvement on the cued sequence and then normalising the result by the improvement on the un-cued sequence. The resulting value, which we term ‘cued sequence improvement’, correlated significantly with classification performance (n = 14, r = 0.56, p = 0.04).

We have updated the text as follows:

“We therefore set out to determine whether there was a relationship between the extent to which we could classify reactivation and overnight improvement on the cued sequence. This revealed a positive correlation (n = 14, r = 0.56, p = 0.04), Figure 3b.”
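The ‘cued sequence improvement’ measure and its correlation with classification performance can be sketched as follows. This is a minimal illustration with made-up numbers; the function names are hypothetical, and dividing by the raw (signed) un-cued improvement is an assumption, since the exact form of the normalisation is not spelled out.

```python
import numpy as np

def cued_sequence_improvement(cued, uncued):
    """(cued improvement - un-cued improvement) / un-cued improvement, per participant."""
    cued = np.asarray(cued, dtype=float)
    uncued = np.asarray(uncued, dtype=float)
    # Assumption: normalise by the signed un-cued improvement.
    return (cued - uncued) / uncued

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])

# Illustrative values only, not data from the study.
cued = [30.0, 20.0, 10.0]
uncued = [20.0, 20.0, 20.0]
csi = cued_sequence_improvement(cued, uncued)

classification = [0.60, 0.55, 0.50]  # hypothetical per-participant AUC values
r = pearson_r(csi, classification)
```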

I was intrigued by the authors' use of training on sleep data and testing on wake data. The justification for this (that features that are reactivated during sleep may be a subset of features that are reactivated during wake) is a sensible hypothesis but conjectural. If this hypothesis is correct, it implies that results should be worse if classifiers are trained on wake and tested during sleep. I don't think it's necessary to do this, but if the authors tried the reverse approach it would be useful to see the results (if wake-trained classifiers are indeed worse than sleep-trained classifiers, this would provide some converging support for the authors' claim that reactivated features during sleep are a subset of features activated during wake). It might also be useful to see the results if the authors train on sleep and test on sleep.

We like the suggestion of looking at the reverse approach, and therefore conducted this extra analysis. It showed that classification performance is indeed reduced when wake data, rather than sleep data, are used to train the classifiers. Figure 2-figure supplement 3 shows the classification results for (a) training the classifiers on wake and testing on sleep and (b) training the classifiers on sleep and testing on wake. The difference in AUC values is clear. To make sure that this is consistent among participants and not due to averaging the classification performance, we compared each classification outcome to the corresponding classification for the adaptation night, which gave two 2D z-score maps. In (c) we show the difference between them, with the z-scores of the wake-trained approach subtracted from those of the sleep-trained approach. The difference is positive and exceeds 2 at around 1 s, which was the time of the cluster that gave a significant effect in the original analysis. This shows that training classifiers on sleep data enabled them to adapt to sleep data.

For the analyses looking at whether theta power is used for classification, the authors perform a "negative control" where they include theta-band oscillations (but filter out other bands) and show that reactivation goes away. They may also want to include a "positive control" where they filter out theta-band oscillations and show that reactivation persists.

Thanks for the suggestion. We have conducted this analysis and now include the results of the positive control, in which we filter out theta-band (4–8 Hz) oscillations. This shows that reactivation evidence was significant against both chance and the adaptation night when only non-theta frequencies are included:

“For completeness, we also conducted a positive control in which we filtered-out theta band and performed our classification analysis. This showed a similar significant pattern as we had observed using the full spectrum without filtering, e.g. as in Figure 2c, d (n=14, p=0.007, corrected with cluster-based permutation vs. chance level) and (n=14, p=0.009, corrected with cluster-based permutation vs. the adaptation night, as shown in Figure 2—figure supplement 2).”

The z-statistic values and the classification AUC are now included as Figure 2—figure supplement 2.
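To make the idea of the positive control concrete, the sketch below removes the 4–8 Hz band from a signal before any classification would be run. The filter actually used is not described here, so this example swaps in a simple FFT mask (a brick-wall band-stop), which is cruder than the FIR or IIR filters normally applied to EEG. The function name `remove_band` and the 200 Hz sampling rate are assumptions made for the illustration.

```python
import numpy as np

def remove_band(signal, fs, low=4.0, high=8.0):
    """Zero all frequency components between low and high (Hz) along the last axis."""
    n = signal.shape[-1]
    spectrum = np.fft.rfft(signal, axis=-1)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    # Brick-wall mask: a stand-in for a proper band-stop filter.
    spectrum[..., (freqs >= low) & (freqs <= high)] = 0.0
    return np.fft.irfft(spectrum, n=n, axis=-1)

# Toy signal: a 6 Hz (theta) component plus a 20 Hz component, 2 s at 200 Hz.
fs = 200
t = np.arange(400) / fs
mixed = np.sin(2 * np.pi * 6 * t) + np.sin(2 * np.pi * 20 * t)
filtered = remove_band(mixed, fs)
```

In this toy case only the 20 Hz component survives, which is the property the positive control relies on: any decoding that remains cannot be driven by theta-band activity.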

The fact that objects and faces were associated with the motor responses (crossed with the left and right responses) suggests to me that the authors may have considered looking at object/face classification in addition to the left/right classification. If the authors did in fact perform other analyses (that yielded null results) I think it would add to the value of the paper to report them here.

Thanks for this comment. We had an a priori hypothesis about classification of left and right hands, as we previously found higher classification rates when classifying right versus left hand in this task (Abdellahi et al., 2023). In the future we may try to classify objects and faces using different features, or perform a multi-class classification of the four classes using different features and test the reactivation of individual items, but that is outside the scope of this paper.

The authors match the number of "clean trials" across participants, using the smallest number that was obtained across all participants (366) – this seems like it is unnecessarily discarding useful data. If the AUC is computed within each subject and then averaged across subjects, then each subject will "count the same" regardless of how many trials are included.

Thanks for this. We have now changed this part of our analysis and no longer fix the number of trials at 366. We now also report the number of trials in supplementary file 1.
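The reviewer's point, that each participant counts equally when the AUC is computed within participant and then averaged, can be illustrated with the short sketch below. It uses a rank-based AUC written in NumPy; the function names and the toy scores are hypothetical and the numbers are not from the study.

```python
import numpy as np

def auc(scores, labels):
    """Rank-based AUC: probability that a random positive trial scores
    higher than a random negative trial (ties count as one half)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (len(pos) * len(neg)))

def group_auc(per_subject):
    """Mean of within-participant AUCs; trial counts may differ between participants."""
    return float(np.mean([auc(s, l) for s, l in per_subject]))

# Two toy participants with different numbers of clean trials.
subject_a = ([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
subject_b = ([0.6, 0.4, 0.7, 0.3, 0.5, 0.2], [1, 0, 0, 1, 1, 0])
mean_auc = group_auc([subject_a, subject_b])
```

Because the average is taken over participants rather than over pooled trials, a participant with more clean trials gets a more precise AUC estimate but no extra weight in the group mean.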

Lines 300-302: why did you include these four blocks that contained random sequences?

The random sequence was used to help us calculate the Sequence Specific Skill (SSS), the SRTT measure which consolidates most strongly with sleep (Spencer et al., 2006) and with TMR (Cousins et al., 2014, 2016). Specifically, this is a measure of how each fixed sequence improved in relation to the associated random sequence, showing how much of the improvement was sequence specific (Cousins et al., 2014, 2016; Rakowska et al., 2021). This is achieved by subtracting the reaction times of the fixed sequence from those of the random sequence. Thus, if the reaction times of the fixed sequence improve a lot across sleep and those of the random sequence do not, the difference between them grows, which shows up as a higher SSS (Koopman et al., 2020; Rakowska et al., 2021).
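A minimal worked example of the SSS calculation is sketched below. The reaction times are invented for illustration, the function name is hypothetical, and averaging reaction times within a session before subtracting is an assumption; the exact blocks that enter the calculation are not specified in this response.

```python
import numpy as np

def sequence_specific_skill(random_rt, fixed_rt):
    """SSS = mean reaction time on random blocks minus mean reaction time on fixed-sequence blocks."""
    return float(np.mean(random_rt) - np.mean(fixed_rt))

# Illustrative reaction times in ms (not data from the study).
pre_random, pre_fixed = [500.0, 500.0], [450.0, 450.0]
post_random, post_fixed = [495.0, 495.0], [400.0, 400.0]

sss_pre = sequence_specific_skill(pre_random, pre_fixed)     # before sleep
sss_post = sequence_specific_skill(post_random, post_fixed)  # after sleep
sss_improvement = sss_post - sss_pre
```

Here the fixed sequence speeds up by 50 ms overnight while the random sequence speeds up by only 5 ms, so the SSS rises by 45 ms, which is the sequence-specific part of the improvement.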

How do the authors reconcile the overall lack of benefit from REM TMR with the finding that, across participants, the degree of reactivation evoked by REM TMR cues correlates with behavioural change? The explanation on lines 244-246 ("This absence of behavioural improvement could be due to individual differences in the way REM processes the task, potentially relating to an interaction between the extent or style of learning and REM processes") was not helpful to me in reconciling these two points – the authors might want to further unpack their argument here to make it clearer.

Thank you, we now discuss this as:

“We are at a very early phase in understanding what TMR does in REM sleep, however we do know that the connection between hippocampus and neocortex is inhibited by the high levels of Acetylcholine that are present in REM (Hasselmo, 1999). This means that the reactivation which we observe in the cortex is unlikely to be linked to corresponding hippocampal reactivation, so any consolidation which occurs as a result of this is also unlikely to be linked to the hippocampus. The SRTT is a sequencing task which relies heavily on the hippocampus, and our primary behavioural measure (Sequence Specific Skill) specifically examines the sequencing element of the task. Our own neuroimaging work has shown that TMR in non-REM sleep leads to extensive plasticity in the medial temporal lobe (Cousins et al., 2016). However, if TMR in REM sleep has no impact on the hippocampus then it is quite possible that it elicits cortical reactivation and leads to cortical plasticity but provides no measurable benefit to Sequence Specific Skill. Alternatively, because we only measured behavioural improvement right after sleep it is possible that we may have missed behavioural improvements that would have emerged several days later, as we know can occur in this task (Rakowska et al., 2021).”

Phrase: "wake-like memory reactivation" – I am not sure what this phrase means.

We have now changed this phrase:

Abstract:

“These findings provide the first evidence for memory reactivation in human REM sleep after TMR that is directly related to brain activity during wakeful task performance.”

Introduction:

“Nevertheless, the reinstatement of EEG activity in REM that directly relates to EEG activity during a task in wake has yet to be demonstrated in humans.”

Discussion:

“In this paper we demonstrated that memory reactivation after TMR cues in human REM sleep can be decoded using EEG classifiers. Such reactivation appears to be most prominent about one second after the sound cue onset.”

References

Abdellahi, M. E. A., Koopman, A. C. M., Treder, M. S., and Lewis, P. A. (2023). Targeting targeted memory reactivation: characteristics of cued reactivation in sleep. NeuroImage, 266, 2021.12.09.471945. https://www.sciencedirect.com/science/article/pii/S1053811922009417?via%3Dihub

Cairney, S. A., Guttesen, A. á. V., El Marj, N., and Staresina, B. P. (2018). Memory Consolidation Is Linked to Spindle-Mediated Information Processing during Sleep. Current Biology, 28(6), 948-954.e4. https://doi.org/10.1016/j.cub.2018.01.087

Cousins, J. N., El-Deredy, W., Parkes, L. M., Hennies, N., and Lewis, P. A. (2014). Cued memory reactivation during slow-wave sleep promotes explicit knowledge of a motor sequence. Journal of Neuroscience, 34(48), 15870–15876. https://doi.org/10.1523/JNEUROSCI.1011-14.2014

Cousins, J. N., El-Deredy, W., Parkes, L. M., Hennies, N., and Lewis, P. A. (2016). Cued Reactivation of Motor Learning during Sleep Leads to Overnight Changes in Functional Brain Activity and Connectivity. PLoS Biology, 14(5), e1002451. https://doi.org/10.1371/journal.pbio.1002451

Diekelmann, S., and Born, J. (2010). The memory function of sleep. Nature Reviews Neuroscience. https://doi.org/10.1038/nrn2762

Hasselmo, M. E. (1999). Neuromodulation: Acetylcholine and memory consolidation. Trends in Cognitive Sciences. https://doi.org/10.1016/S1364-6613(99)01365-0

Koopman, A. C. M., Abdellahi, M. E. A., Belal, S., Rakowska, M., Metcalf, A., Śledziowska, M., Hunter, T., and Lewis, P. (2020). Targeted memory reactivation of a serial reaction time task in SWS, but not REM, preferentially benefits the non-dominant hand. BioRxiv, 2020.11.17.381913. https://doi.org/10.1101/2020.11.17.381913

Pereira, S. I. R., Beijamini, F., Vincenzi, R. A., and Louzada, F. M. (2015). Re-examining sleep’s effect on motor skills: How to access performance on the finger tapping task? Sleep Science. https://doi.org/10.1016/j.slsci.2015.01.001

Rakowska, M., Abdellahi, M. E. A., Bagrowska, P., Navarrete, M., and Lewis, P. A. (2021). Long term effects of cueing procedural memory reactivation during NREM sleep. NeuroImage. https://doi.org/10.1016/j.neuroimage.2021.118573

Berry, R. B., Brooks, R., Gamaldo, C. E., Harding, S. M., Lloyd, R. M., Marcus, C. L., and Vaughn, B. V. (2015). The AASM Manual for the Scoring of Sleep and Associated Events. American Academy of Sleep Medicine.

Schönauer, M., Alizadeh, S., Jamalabadi, H., Abraham, A., Pawlizki, A., and Gais, S. (2017). Decoding material-specific memory reprocessing during sleep in humans. Nature Communications. https://doi.org/10.1038/ncomms15404

Schreiner, T., and Rasch, B. (2015). Boosting vocabulary learning by verbal cueing during sleep. Cerebral Cortex. https://doi.org/10.1093/cercor/bhu139

Spencer, R. M. C., Sunm, M., and Ivry, R. B. (2006). Sleep-Dependent Consolidation of Contextual Learning. Current Biology. https://doi.org/10.1016/j.cub.2006.03.094

https://doi.org/10.7554/eLife.84324.sa2

Article and author information

Author details

  1. Mahmoud EA Abdellahi

    School of Psychology, Cardiff University Brain Research Imaging Centre (CUBRIC), Cardiff, United Kingdom
    Contribution
    Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing – original draft, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-7765-9028
  2. Anne CM Koopman

    School of Psychology, Cardiff University Brain Research Imaging Centre (CUBRIC), Cardiff, United Kingdom
    Contribution
    Conceptualization, Data curation, Investigation, Writing – review and editing
    Competing interests
    No competing interests declared
  3. Matthias S Treder

    School of Computer Science and Informatics, Cardiff University, Cardiff, United Kingdom
    Contribution
    Investigation, Methodology, Writing – review and editing
    Competing interests
    No competing interests declared
  4. Penelope A Lewis

    School of Psychology, Cardiff University Brain Research Imaging Centre (CUBRIC), Cardiff, United Kingdom
    Contribution
    Conceptualization, Supervision, Validation, Investigation, Methodology, Project administration, Writing – review and editing
    For correspondence
    lewisp8@cardiff.ac.uk
    Competing interests
    No competing interests declared
    ORCID: 0000-0003-1793-3520

Funding

European Research Council (681607)

  • Penelope A Lewis

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We would like to thank Martyna Rakowska and Lorena Santamaria and all members of our group NaPs for the useful comments. This work was funded by the ERC grant SolutionSleep, 681607, to PAL.

Ethics

This study was approved by the School of Psychology, Cardiff University Research Ethics Committee, and all participants gave written informed consent. Participant information is anonymised. Reference: EC.16.11.08.4772RA2. Risk Assessment: 1479917576_1583.

Senior and Reviewing Editor

  1. Laura L Colgin, University of Texas at Austin, United States

Reviewer

  1. Kenneth A Norman, Princeton University, United States

Version history

  1. Preprint posted: December 2, 2021
  2. Received: October 19, 2022
  3. Accepted: June 22, 2023
  4. Accepted Manuscript published: June 23, 2023 (version 1)
  5. Version of Record published: August 14, 2023 (version 2)

Copyright

© 2023, Abdellahi et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Cite this article

Mahmoud EA Abdellahi, Anne CM Koopman, Matthias S Treder, Penelope A Lewis (2023) Targeted memory reactivation in human REM sleep elicits detectable reactivation. eLife 12:e84324. https://doi.org/10.7554/eLife.84324
