1 Main

Over the past three decades, accumulating evidence supports the role of sleep neural oscillations and their cross-frequency coupling in the spatiotemporal coordination across brain regions, supporting sleep-dependent memory consolidation13. During nREM sleep, oscillatory activities including cortical slow oscillations (SO; 0.16-4 Hz, Figure 1A), thalamocortical sleep spindles (SP; 8-16 Hz), and hippocampal sharp-wave ripples (SWR; 80-300 Hz), along with their long-distance coordination, are widely believed to be closely associated with the process of transferring temporary encoded memory traces to cortical networks for consolidation46.

Measurement of phase-amplitude coupling (PAC) in slow oscillation and spindle events

Notes. (A) The origin of neural oscillations during sleep. SO slow oscillation; SP sleep spindle; SWR sharp wave ripple. In each subgraph, the vertical line indicates the typical amplitude of that sleep wave, while the horizontal line indicates the typical frequency and duration. Note that SPs also propagate along the cortex, and the figure only displays the origin. (B) Electrophysiology representation diagram of the SO-SP coupling. SO and SP amplitudes are normalized. The phase of SOs when SPs are at their maximum amplitude are recorded as the coupling phase. The occurrence of SPs and SO-SP coupling are not necessarily continuous as shown in the diagram. (C) Coupling preferred phase and strength diagram. (Left) The phase and strength used in the circular plot are simulated data from existing dataset for visualization purposes only. At the group level, the mean circular direction shows the preferred SO phase, while the mean vector length shows the strength of the precise coupling. (Right) The phase of SO peaks is noted as 0°, while the phase of SO troughs is noted as ±180°.

Given that non-invasive recordings (e.g., EEG) cannot accurately detect SWRs in deep brain structures, electrophysiological studies in humans primarily focus on the role of SOs and SPs in memory consolidation. Consistent evidence indicates that, following intensive learning, SP density and its cooccurrence with SOs significantly increase compared to base-line nights and control groups711. Correspondingly, measures of SPs and their coupling with SOs predict over-sleep retention performance of newly acquired memories1216.

The role of sleep oscillations in memory consolidation is supported by consistent neurobiological foundations. SOs are generated and dominant in the prefrontal cortex during slow-wave sleep (SWS) and propagate anteriorly to posteriorly as traveling waves1720. The corticothalamic input of SOs during their depolarization up-state phase organizes the synchronous occurrence of SPs in the thalamic reticular nucleus. Subsequently, SPs propagate back widely to cortical areas through synchronized thalamocortical projections, resulting in EEG-measured SPs2126. The peak discharge of SPs in the cortex is associated with increased dendritic Ca2+ synchronization20,27,28. This precise association enhances synaptic plasticity, leading to long-term changes in synaptic connections between cortical neurons, a mechanism for memory consolidation2932.

Despite extensive research on the functions of these oscillations during sleep and their causal sequences, it is unclear how the spatiotemporal coordination mechanisms facilitate specific temporal sequences of the peak discharge and the associated memory consolidation. Neuronal activity exhibits various forms of cross-frequency coupling, mainly including phase-phase coupling (PPC) and phase-amplitude coupling (PAC). The coupling is considered crucial in regulating the integration of information across multiple spatial and temporal scales33,34. Studies related to memory often focus on the unidirectional PAC regarding the strong amplitude modulation of fast oscillations (FO, SPs in our case) driven by the phase modulation of SOs. The phase of SOs influences rhythmic spikes of FOs near the up-state peak or down-state trough of SOs3537. In memory networks, PAC is considered a crucial mechanism supporting neural communication and plasticity36.

The active system consolidation theory supports the subdivision of memory consolidation processes into three coordinated steps, which occur in the precise hierarchical temporal structure of SO-FO coupling involving SWRs, SPs, and SOs3840: 1) Hippocampal SWRs driven by cortical SOs facilitate the repeated replay of newly encoded memories, 2) The maximum amplitude of SWRs precisely lock with the trough phase of SPs during the depolarized up-state of SOs (SP-SWR coupling), 3) The maximum amplitude of SPs couple with the up-state of SOs (SO-SP coupling). These phase-locked coupling dynamics are considered integral to consistent communication between the hippocampus and neocortex, constituting a key mechanism for the reactivation and redistribution of temporary memory traces across cortical areas41,42.

Pharmacological and stimulation research has provided evidence for causal relationships between coupling and memory consolidation. For example, the increasing level of inhibitory neurotransmission driven by GABAergic drug zolpidem improves the phase precision and strength of SO-SP coupling, and, in turn, contribute to enhanced memory retention performance5558. Studies using calcium imaging after SP stimulation explained the significance of the precise coupling phase for synaptic plasticity. SP spike discharges extracted through electrical stimulation during SO up-states, efficiently modify excitatory neocortical synapses27. Recently, Niethard and colleagues20 observed that only SP spikes occurring around upstate peaks of SOs were accompanied by amplified calcium activity patterns to optimize synaptic plasticity. This spatial-temporal mechanism represents a critical perspective in the study of SO-SP coupling in the memory consolidation.

When quantifying cross-frequency coupling, the spike-timing-dependent transfer theory41,59 signified the importance of coupling phases and inconsistency in FO amplitudes. Coupling phase has been widely shown to reflect dynamic functional configurations, indicating remote communication and precise timing changes in synaptic activity among neural populations sharing cognitive functions60,61. Comparing to the depolarization state (positive half-wave) of cortical SOs which drives FOs including SPs, its inhibitory hyperpolarization state exhibits relative silence of neurons24,62. The preferred phase represents the phase of SOs at the maximum amplitude of FOs (Figure 1B, C). Another common method is the peri-event time histogram (PETH), representing the proportion of FOs centered (i.e., maximum trough) at different SO phase bins51,63.

Phase alone does not reveal the inconsistent strength of FO amplitude between coupled and non-coupled SO phases within each time window. Therefore, the analysis of phase-amplitude distribution constitutes the so-called coupling strength, also defined as coupling inconsistency. In memory research, two commonly used quantification methods for coupling strength include mean vector length35 (MVL) and modulation index64 (MI). Both determine the strength of PAC by measuring the uneven distribution of FO amplitude in different SO phase directions. Past simulation studies have proved the effectiveness of these two methods in extracting coupling strength under different conditions and the existence of noise6567.

Compared to the coupling phase and strength, which reflect the contrast on the phase-amplitude level, recent research also focuses on the overall occurrence of coupling events across the night. This involves assessing the frequency of co-occurrence, including the coupling percentage45,47,48,51 (i.e., co-occurrence rate, % coupling events / oscillation events), coupling counts57, and the coupling density68. It is reasonable to categorize all measures related to co-occurrence under a category defined as ”coupling frequency”, in contrast to the phase and strength (see Figure 2).

Hierarchical diagram of coupling measures

Notes. PETH peri-event time histogram; MVL mean vector length; MI modulation index. SPcSO Percentage of SPs coupled with SOs in all SP events. In contrast to the term “frequency” used throughout the text in reference to the neural oscillation, the coupling frequency in the diagram indicates the frequency of occurrence of SO-SP coupling events.

In SO-SP coupling studies, although most of them supported the importance of coupling phase in predicting memory retention, this association varies in magnitude across memory types, sleep stages, and age4351,53,54. In contrast, the significance of coupling strength and percentage on the memory consolidation is inconsistent across studies5254,82.

Furthermore, previous meta-analyses of SP function consistently indicate that SP amplitude (and power) is the most effective predictor of memory consolidation and cognitive abilities14,69. The measure of neural oscillation amplitude is reported based on changes compared to mean amplitude, and the absolute amplitude difference between peak and trough reflects the intensity of synchronized neural activity near that specific scalp region70. Given the enhancement of synaptic plasticity through increased calcium synchronization during SP peaks, SP amplitude may play a crucial role in predicting memory consolidation. Therefore, we also include the mean peak-to-trough amplitude and power of SPs in the meta-analysis measures (Figure 1B).

Electrophysiology evidence indicates that the association between memory consolidation and SO-SP coupling is influenced by a variety of behavioral and physiological factors under different conditions. Among the moderators that have received significant attention in recent research are memory types, development and aging, pharmacological manipulations, disorders, and sleep stages. The early dual-process hypothesis proposed a dissociation between hippocampus-dependent and non-hippocampus-dependent memories reinforced during SWS and REM periods, respectively59: The standard active consolidation theory supported the role of hippocampal-cortical nesting during nREM sleep for the consolidation of hippocampus-dependent memories, including verbal, visual, and spatial declarative memories that are episodic-related59,71,72. In contrast, some research has linked the processing of non-hippocampus-dependent memory, such as emotional and procedural memories, with REM sleep7376.

This outdated theory may have led early SO-SP coupling studies to focus primarily on the experimental design of declarative memory inferences. Recent updates to this foundational theory propose that the hippocampus plays a crucial role in the consolidation of non-hippocampus-dependent memories during nREM sleep through the reactivation of spatiotemporal contextual features7781. This has sparked discussions and research on whether hippocampal involvement in SO-SP-SWR coupling constitutes a general physiological principle in all types of memory, or at least in memories with spatiotemporal contexts. Particularly in the association between SO-SP coupling and non-hippocampal-dependent memory consolidation, recent studies have obtained conflicting results11,13,14,44,53,68,8284.

In addition, early behavioral experiments have demonstrated the sensitivity of sleep-dependent memory consolidation to age-related changes in the brain, from development to aging85,86. The most significant changes of sleep neural oscillations during development include the maturation and dominance of frontal-originated global SOs and frontal detected SPs8789. This change is associated with the improvement of memory consolidation from childhood to adolescence. In older adults, the time spent in SWS sharply decreases, followed by a reduction in the number and amplitude of SOs90,91. Their prefrontal SP events decrease by over 40% compared to young adults and are correlated with their weakened memory performance92,93.

Taking a step further, Helfrich et al50. and Hahn et al47. have respectively identified the mediating roles of aging and development in the frontal SO-SP coupling, including changes in the precision of coupling phase, shifts in coupling topography, as well as improvements and impairments in memory retention performance. Without exception, all age-related studies on sleep-dependent memory consolidation emphasize the involvement of the prefrontal cortex and posterior hippocampus, which contribute to episodic memory processing and consolidation. Differences in their gray matter volumes, and changes in structural integrity during development and aging, are related to the representation of neural oscillations92,94,95.

Besides these moderators, overlooked in most studies is the region-specificity of different memory types and the frequency of SPs96. The topography and frequency distribution coupling are neglected or subject to misleading interpretations when correlating other measures. SP frequency has traditionally been defined between 12-16 Hz97. However, recent research has found that SPs of different frequencies dominate at different phases of the SO cycle and in distinct cortical areas98,99. Therefore, studies started to analyze the SO-SP coupling by splitting SPs into fast and slow subtypes11,44,100. Since some studies found that fast SPs predominate in the centro-parietal region, while slow SPs are more common in the frontal region, a significant amount of studies only extracted specific types of SPs from limited electrodes. Some studies even averaged all electrodes to estimate coupling (see more details in Section 3.2 and 3.4). This narrowed measure falls into the pitfall of considering all SPs as the regional oscillation, and overlooking the out-of-phase distribution of SOs in different cortical areas101.

Acknowledging the existence of regional SO and SP events, recent evidence supports that global SO and SPs, as traveling waves, constitute the majority of their oscillation cycles17,102,103. Traveling waves are considered to play a crucial role in the transmission of information, including memory, between specific brain areas104. Once projected onto the cortex, global SPs exhibit rotational characteristics, looping from the temporal lobe to the parietal and frontal lobe sequentially (TPF) and then travel back to the temporal lobe103,105. This results in phase gradients in the burst of SP spikes across different cortical areas, which represents the direction of asymmetric cortical propagation and might explain the phase shifts in coupling106,107. It is worth noting that in other types of oscillations, the traveling waves have been found to also follow descending frequency gradients108.

Dynamic spatiotemporal features of global SPs are believed to create necessary conditions for the synaptic plasticity102. Recently, relevant clues have emerged in research on SO-SP coupling. Hahn et al47. and Helfrich et al50. found only the phase and strength of coupling detected in frontal electrodes had sufficient predictive power for the success of memory retention. Denis et al.45 reported a phase gradient from anterior to posterior area in SO-fast SP coupling by channels. Although all this evidence indicates the necessity for investigation of the spatiotemporal specificity of SO-SP coupling, current between-study differences and the ambiguity in defining SP types (see Section 3.4) introduce uncertainty to the measure of coupling topography. To our knowledge, there is no previous study that has systematically analyzed the dynamic spatiotemporal distribution of SO-SP coupling and their association with the frequency of oscillations and memory consolidation.

In summary, current research on the SO-SP coupling and memory consolidation faces multiple methodological challenges. Disadvantages of polysomnographic (PSG) studies include long recording times, limitations on subject recruitment, and variations in signal processing approaches. Researchers have the flexibility to choose analytical methods and selectively report results that favor their hypotheses110,111, thereby increasing the risk of false positives and lack of reproducibility. Considering methodological limitations, varied analytical approaches, disparate quality of reports, and conflicting evidence, there is a clear urgency of a meta-analysis to consolidate the true effect sizes reported in studies, as well as propose standardized and targeted research and analysis methods.

To incorporate a broader range of studies without increasing between-study heterogeneity and publication bias, we requested unreported or selectively omitted effect sizes, as well as individual-level coupling data that were pre-processed but not included in correlation analysis. This approach helped us include more studies adopting a similar experimental design but using incomparable analysis methods, which is common in memory research. In addition, these data provide us with sufficient statistical power to compare spatio-temporal shifts in coupling phase and its association with memory consolidation.

The purpose of the current meta-analysis and review is to aggregate studies of SO-SP coupling to understand inconsistent results across studies. To be more specific, we used Bayesian hierarchical models to examine whether the cross-frequency coupling between SOs and SPs is associated with memory consolidation, and determine which specific measure(s) of SO-SP coupling most accurately predict memory retention performance. In addition, we conducted moderator analyses to understand if physiological and behavioral factors mediate the strength of the relationship between SO-SP coupling and memory consolidation. We propose five questions for moderators: 1) Is SO-SP coupling a general physiological mechanism for memory consolidation or targeted at specific type(s) of memory?; 2) How do development and aging affect the association between coupling and memory consolidation?; 3) Is the coupling detected under specific (functional) region(s) more indicative of the memory retention?; 4) In which frequency range(s) of SPs is coupling more likely to be linked to memory consolidation?; 5) Does the stage and bout of sleep affect the measure of coupling-memory association? Besides the main study, we also analyzed the preferred distribution of coupling phases under different cortical areas and SP frequencies. In the rest of the discussion, we focus on summarizing conflicts and misinterpretations in current research, proposing strategies for research standardization, as well as discussing the implications of our results for future research focus and clinical applications.

2 Results

2.1 Study characteristics

In the final dataset of 23 studies included in the data analysis, the participants had an average age of 24.8 years, ranging from 7.3 to 78.0. The average female representation among participants was 49.5%. The average sample size for each study was 31.7 participants, ranging from 10 to 151. Each study contributed 4 effect sizes for each coupling measure on average. The included studies consist of 17 overnight and 6 nap studies, in which a total of 19 tasks for declarative memory and 6 tasks for procedural memory were measured. In addition, we assessed the risk of bias for each study before conducting data analysis (see Figure 3).

Risk of bias assessment summary plot adapted from ROBINS-I109

Notes. The most significant heterogeneity is revealed in the measurement of outcome, while the overall assessment indicated a moderate risk of bias across studies after requesting unreported results and data transformation. Based on the complexity of the type of measures involved in phase amplitude coupling analysis, we believe that this degree of risk of bias is acceptable. Specific evaluations for each study were reported in the Supplemental Material 3.

2.2 Coupling phase

We first assessed the association between SO-SP coupling preferred phase and memory retention following sleep. 23 studies (k = 90) were included in the Bayesian hierarchical model. We transformed the circular linear correlation to standardized coefficients (see Section 5.5.2). Forest and regression plots for overall and moderation models are reported in Figure 4. In addition, the results of hypothesis tests for the overall and moderator models are reported in Table 1 using the Bayes factor and posterior probability. Consistent with the funnel plot (Figure S11A), neither Egger regression (p = 0.59) nor rank correlation test (p = 0.52) found the existence of publication bias.

Forest and regression plots for the association between SO-SP coupling phase and memory retention.

Notes. (A) Overall model forest plot at study-level. Dashed lines indicate the 95% credible interval (CrI) of the pooled effect size. The black point and error bar for each study shows the adjusted estimation of effect size and 95% CrI combining data and prior information. The gray dots under each distribution show raw effect sizes of each study. Effect size-level plots can be found in Figure S5.1. (B) Meta regression plot with age as moderator. Blue lines represent 200 overplotted spaghetti fit lines to visualize predictions. (C) Moderator-level forest plot. Each box represents a type of moderator. Mixed Effect sizes with mixed conditions from different factor levels listed above. Weight Stacked weight of each moderation model in the paired model performance comparison between the moderator and overall (intercept-only) model. The stacked weight of the overall model in each pair of comparisons can be calculated as 1 - weight of the moderation model.

Result of directional hypothesis tests for each pair of factor levels (conditions) in overall and each moderation model of the coupling phase-memory association

2.2.1 Overall model

The analysis of a random effect model on the overall phase-memory association revealed that, without considering moderating factors, very strong and consistent evidence supports a small-sized association between the preferred coupling phase and memory consolidation across studies, Zrpooled = 0.07 [0.01, 0.13], BF10 = 58.35, probability = 0.98. This implies that the likelihood of H1 is over 58 times greater than the H0, covering approximately 98% of posterior samples.

Multilevel model analysis indicates a heterogeneity similar to typical correlational meta-analyses112 (Mg = 0.13), including a between-study heterogeneity of g = 0.07 [0.00, 0.16], as well as a within-study heterogeneity of g = 0.04 [0.00, 0.11]. Focal analysis showed no difference from the original overall model, with a effect size Zrpooled = 0.07 [-0.06, 0.19]. Sensitivity analysis regarding prior robustness also revealed similar results (see Supplemental Material 9).

2.2.2 Moderator and sensitivity models

Sufficient evidence shows that moderators including memory tasks, age, spindle types, and PSG channels provide additional predictive power with strong favor for each hypothesis (all contain BF10≤0.1 or ≥10). Figure 4C indicates that most mixed conditions have a considerably higher uncertainty compared to other normal conditions.

Memory Task

Contrary to the assumption, there is no evidence to support a difference of phase-memory association between and motor memory (k = 36) and verbal memory retention (k = 35, Zr = Δ0.01, BF10 = 1.30, probability = 0.57), as well as between motor and emotional memory retention (k = 12, Zr = Δ0.01, BF10 = 1.09, probability = 0.52). However, strong evidence supports that spatial tasks have a considerably lower phase-memory association compared to motor tasks (k = 7, Zr = Δ0.19, BF10 = 0.07, probability = 0.06). The task model has a weight of 0.29, which indicates that the moderate effect of memory task is relatively weak and the task model performed poor than the overall model.

Participant Age

As shown in Figure 4B, extremely strong evidence supports that with the increase of age, the slope of correlation exhibits a strongly decreasing trend (Zr β = Δ -0.006 [-0.010, -0.001], BF10 = 160.94, probability = 0.99). Each ten-year increase in age is associated with a decrease in effect size of 0.06, which is consistent with our hypothesis that precise SO-SP coupling becomes less predictive of memory retention with the increase of age. Accounting for the factor of age significantly increased the predictive power compared to the intercept-only model, given a stacked weight of 1.00.

Spindle Frequency

Moderation model with SP frequency range highlights a stronger association between memory retention and SO-fast SP coupling phase (k = 43) rather than slow SPs (k = 28, Zr = Δ − 0.07, BF10 = 11.39, probability = 0.92), consistent with the hypothesis based on studies comparing the role of different SP types on the memory retention43,98. The SP model performed as well as the overall model, weight = 0.48.

PSG Channel

The highest pooled correlation was observed in the frontal electrode clusters (k = 29), compared to the central (k = 36, Zr = Δ− 0.08, BF10 = 13.58, probability = 0.93) and posterior channels (k = 22, Zr = Δ− 0.07, BF10 = 6.13, probability = 0.86), which represents a moderate-to-strong moderation effect. The frontal area has the largest phase-memory association, Zr = 0.12 [0.03, 0.21]. Effect sizes in posterior areas does not differ from central areas (Zr = Δ− 0.01, BF10 = 0.86, probability = 0.46).

Sleep Stage and Bout

The sleep stage appeared to be a weak predictor with a weight of 0.00, in which the nREM2 stage (k = 18) could not predict a higher phase-memory association than SWS (k = 30, Zr = Δ− 0.03, BF10 = 1.74, probability = 0.63). The sleep bout also revealed no difference in phase-memory association between overnight sleep (k = 73) and nap condition (k = 17, Zr = Δ− 0.03, BF10 = 2.23, probability = 0.69), with no predictive role (weight = 0.00) compared to the overall model.

2.2.3 Spatio-temporal analysis of coupling phase

By aggregating individual-level data across studies, we found that phase of SO-fast SP coupling has a significant quadratic association with memory retention in frontal regions (see Figure 5A), r = 0.26, Zr = 0.20, p < 0.01. However, this association is not significant in either central (r = 0.11, Zr = 0.05, p = 0.22) or posterior regions (r = 0.08, Zr = 0.01, p = 0.47).

Preferred slow oscillation-fast spindle coupling phase and its association with the memory retention

Notes. (A) Quadratic regression of the phase-memory association under different regions of PSG channels aggregated from studies included in the meta-analysis. 0° peak of SO upstate; ±π° trough of SO downstate; r circular-linear correlation coefficient; Zr standardized circular-linear correlation coefficient. Bars represent the mean memory retention scores per π/4 radian (45°). Dashed vertical line represents the mean preferred phase across studies. Colored quadratic fit line represents the direction of the relationship. The direction of their relationship gradually flips as the PSG channel moves from the front to the posterior area. Only under the frontal and central channels, fit lines display a quadratic relationship, with a peak of memory score near the up-state peak of SOs. In posterior channels, in contrast, the relationship is convex although it is not significant. Non-significant quadratic regression between SO-slow SP coupling and memory are reported in Supplemental Material 13. (B) Posterior distributions of mean preferred phases from the Bayesian circular mixed-effect model. The circular posterior distribution has been reported in the top-right corner, and the area between two black lines has been projected in a linear scale in the main graph. The vertical line reflects the up-state peak of slow oscillation. Dots and error bars reflect the mean and 95% credible intervals of phases detected from each channel cluster. Phase values have been reported as degrees. (C) Circular plot of the preferred coupling phase. Arranged from top to bottom in the order of frontal, central, and posterior. The direction of each colored dot represents the preferred coupling phase of each subject recorded from PSG channels in each cluster. The direction of the mean resultant vector indicates the mean preferred coupling phase across subjects, the width indicates the 95% credible interval of the mean coupling phase, while the length from 0 (center) to 1 (circumference) indicates the consistency of coupling phase across subjects.

Another important finding is a considerable shift of the preferred coupling phase from frontal to posterior regions, similar to Kurz et al51. After taking into account the repeated measurement, the frontal area has a preferred phase around 7.68° [0.18, 15.20], which is the closest to the peak of SO (0°). The coupling consistently happened earlier when moving towards further dorsally, reflected by the phase in central (−15.40° [-20.99, -10.16]) and posterior area (−23.46° [-31.07, -15.67]).

Extremely strong evidence supports a considerable difference of the preferred coupling phase between frontal and central areas, Δ− 23.08°, BF10 = + ∞, probability = 1.00; between frontal and posterior areas, Δ− 31.14°, BF10 = + ∞, probability = 1.00; and between central and posterior areas, Δ− 8.06°, BF10 = 34.4, probability = 0.97. Differences can be observed from the shift of posterior distribution in Figure 5B and location of data clusters in Figure 5C. Moreover, the consistency of coupling phase across subjects also decreases from frontal (z = 0.69, p < 0.05, Rayleigh test) to posterior areas (z = 0.59, p < 0.05, Rayleigh test). The differences of coupling phase can explain part of the discrepancy of the timing of coupling occurrence among previous studies43,47,63,113

In summary, in frontal regions we found a latest coupled but most precise and strong SO-fast SP coupling around the peak of SOs. We also found the strongest phase-memory association in frontal regions following a typical quadratic relationship.

2.3 Spindle amplitude

We next assessed the association between the amplitude/power of SP and memory retention during the learning night. 18 original studies (k = 78) were included in the Bayesian hierarchical model. The mean amplitude of fast SPs (in µv) is 33.99 [26.44, 41.55], while the mean amplitude of slow SPs is 39.41 [33.61, 45.22]. Forest and regression plots for overall and moderation models are reported in Figure 6. In addition, the results of hypothesis tests for the overall and moderator models are reported in Table 2. The publication bias of the amplitude-memory association studies is unclear but potentially provides evidence of asymmetry. Egger’s regression test (p = 0.02), rank correlation test (p = 0.11), and funnel plot (Figure S11C) did not demonstrate significant publication bias.

Forest and regression plots for the association between SP amplitude and memory retention.

Notes. (A) Overall model forest plot at study-level. The solid vertical line represents the mean Pearson correlation coefficient under the null hypothesis. Dashed lines indicate the 95% credible interval (CrI) of the pooled effect size. The black point and error bar for each study shows the adjusted estimation of effect size and 95% CrI combining data and prior information. The gray dots under each distribution show raw effect sizes of each study. Effect size-level plots can be found in Figure S5.2. (B) Meta regression plot with age as moderator. Blue lines represent 200 overplotted spaghetti fit lines to visualize predictions. (C) Moderator-level forest plot. Each box represents a type of moderator. Mixed Effect sizes with mixed conditions from different factor levels listed above. Weight Stacked weight of each moderation model in the paired model performance comparison between the moderator and overall (intercept-only) model. The stacked weight of the overall model in each pair of comparisons can be calculated as 1 - weight of the moderation model.

Result of directional hypothesis tests for each pair of factor levels (conditions) in overall and each moderation model of the SP amplitude-memory association

2.3.1 Overall model

We observed that without accounting for moderation effects, there was moderate evidence supporting a positive association between SP peak-to-trough amplitude and memory consolidation, rpooled = 0.07 [-0.04, 0.18], BF10 = 8.28, probability = 0.89. The likelihood of our hypothesis being true is about 8 times greater than the null, covering approximately 89% of posterior samples. However, the pooled posterior distribution spans a wide range due to a large standard error, 95% credible intervals of most studies and the pooled effect size included 0. Three-level analysis has found a relatively high between-study heterogeneity of g = 0.20 [0.12, 0.32], and a normal level of within-study heterogeneity of g = 0.04 [0.00, 0.12]. The focal model indicated a slightly larger effect size compared to the overall model, with a pooled effect size rpooled = 0.12 [-0.03, 0.29]. Prior sensitivity analysis did not reveal different patterns of the association (see Supplemental Material 9).

2.3.2 Moderator and sensitivity models

Although evidence shows that memory type, spindle type, and PSG channel location modulates the magnitude of correlation between SP amplitude and memory (BF10≤ 0.1 or≥10), it is worth noting that all moderation models did not provide enough additional information to increase the performance of model prediction relative to the overall model (all weight ≤ 0.31).

Memory Task

The result of memory task moderation model shows that overnight retention of declarative tasks, including verbal tasks (k = 34, r = Δ 0.21, BF10 = 22.19, probability = 0.96); emotional tasks (k = 12, r = Δ 0.04, BF10 = 1.50, probability = 0.60); and spatial tasks (k = 14, r = Δ 0.19, BF10 = 9.30, probability = 0.90), have a higher association with the SP amplitude compared to the motor memory retention (k = 36). It is worth noting that two types of hippocampus-dependent memory, including verbal and spatial memory, both have larger estimates compared to the non-hippocampus-dependent memory. However, only the comparison between verbal and motor tasks provided strong favor for our hypothesis. Strong evidence supports a positive association between the SP amplitude and verbal declarative memory retention, r = 0.15 [0.01, 0.29].

Spindle Frequency

The model results using the SP type as the moderator indicate a stronger positive relationship between fast SP amplitude (k = 36) and memory retention compared to slow SP amplitude (k = 20, r = Δ -0.09, BF10 = 12.21, probability = 0.92). A higher fast SP amplitude tends to be associated with a higher memory retention ability, r = 0.1 [-0.04, 0.23].

PSG Channel

Strong evidence supports differences of pooled correlation between the frontal (k = 29) and posterior region (k = 16, r = Δ -0.13, BF10 = 24.67, probability = 0.96), as well as between the central (k = 30) and posterior region (r = Δ -0.11, BF10 = 13.65, probability = 0.93), which is consistent with our assumptions. Frontal regions have the largest positive amplitude-memory association, r = 0.11 [-0.02, 0.25]. In contrast, we did not observe a notable difference in amplitude-memory association between frontal and central regions (r = Δ -0.02, BF10 = 1.88, probability = 0.65).

Other Moderators

There is a moderate evidence supporting a difference in the amplitude-memory association between the N2 and SWS stage (r = Δ− 0.12, BF10 = 5.37, probability = 0.84), but no reliable difference between overnight and nap conditions (r = Δ -0.07, BF10 = 2.54, probability = 0.72). Also, the effect of age on the prediction of amplitude-memory association is highly limited (rβ = Δ -0.002 (−0.007, 0.003), BF10 = 3.70, probability = 0.79).

2.4 Coupling strength

Next, we measured the association between coupling strength and memory retention. 21 original studies (k = 86) were included in the Bayesian hierarchical model. The mean SO-fast SP coupling strength measured by mean vector length is 0.33 [0.27, 0.39], while the mean SO-slow SP coupling strength is 0.23 [0.19, 0.27]. Forest and regression plots for overall and moderation models are reported in Figure 7. In addition, the results of hypothesis tests for the overall and moderation models are reported in Table 3. Neither Egger regression (p = 0.53) nor rank correlation test (p = 0.67) showed evidence of potential publication biases.

Forest and regression plots for the association between coupling strength and memory retention.

Notes. (A) Overall model forest plot at study-level. The solid vertical line represents the mean Pearson correlation coefficient under the null hypothesis. Dashed lines indicate the 95% credible interval (CrI) of the pooled effect size. The black point and error bar for each study shows the adjusted estimation of effect size and 95% CrI combining data and prior information. The gray dots under each distribution show raw effect sizes of each study. Effect size-level plots can be found in Figure S5.3. (B) Meta regression plot with age as moderator. Blue lines represent 200 overplotted spaghetti fit lines to visualize predictions. (C) Moderator-level forest plot. Each box represents a type of moderator. Mixed Effect sizes with mixed conditions from different factor levels listed above. Weight Stacked weight of each moderation model in the paired model performance comparison between the moderator and overall (intercept-only) model. The stacked weight of the overall model in each pair of comparisons can be calculated as 1 - weight of the moderation model.

Result of directional hypothesis tests for each pair of factor levels (conditions) in overall and each moderation model of the coupling strength-memory association

2.4.1 Overall Model

Consistent with results of the overall phase-memory association, extremely strong evidence supports a positive coupling strength-memory association in the intercept-only model (rpooled = 0.08 [0.02, 0.15], BF10 = 111.04, probability = 0.99), which strongly favors our hypothesis. The likelihood of data under our hypothesis is over 110 times greater than the negative direction, covering above 99% of posterior samples.

Similar to typical correlational meta-analyses, the overall model consists of a between-study heterogeneity of g = 0.08 [0.01, 0.17], and a within-study heterogeneity of g = 0.05 [0.00, 0.14]. The result of focal analysis showed no difference from the overall model but has a wider credible interval due to a smaller number of effect sizes, rpooled = 0.09 [-0.02, 0.20]. Sensitivity analysis regarding prior robustness revealed consistent results with the overall model (see Supplemental Material 9).

2.4.2 Moderator and sensitivity model

Surprisingly, there is no enough evidence supporting a difference between each pair of factor levels for almost all moderators (all 0.1 < BF10 < 10, except emotional versus motor tasks), and there was no single moderator model that had better a performance than the intercept-only model (all weight < 0.5). This result represents a relatively consistent strength-memory association, regardless of the impact of moderators.

Memory Task

With the exception of the emotional task condition (k = 12) which predicts a larger strength-memory association than the motor task condition (k = 32, r = Δ -0.20, BF10 = 10.62, probability = 0.91), the type of memory task only had a weak impact on the association (all other 0.33 < BF10 < 3). In addition, given the emotional task condition has a relatively small number of effect sizes, the evidence of group differences also need to be interpreted with caution.

Age Model

The strongest predictive power among moderation models has been found with participant’s age, and performs similar in the trade-off between complexity and power with the intercept-only model (weight = 0.44). As the age increases, the slope of the correlation exhibits a decreasing trend that is nearly in strong favor of our hypothesis (rβ = Δ 0.003 (−0.007, 0.001), BF10 = 9.97, probability = 0.91). The strength-memory association becomes weaker with the increase of age.

Other Moderators

There was no strong evidence of a difference of strength-memory association between the fast (k = 41) and slow SPs (k = 26, r = Δ -0.02, BF10 = 1.91, probability = 0.66). Also, there is no evidence for a group difference among any pair of conditions of PSG channels, sleep stages, or sleep bout type (all 0.33 < BF10 < 3). However, consistent with the results for the coupling phase and SP amplitude, the largest positive strength-memory association moderated by PSG channels was still found in frontal channels (r = 0.10 [0.00, 0.19]).

2.5 Coupling percentage

The last part of our analysis focused on the association between the percentage of SPs coupled with SOs, and memory consolidation. 11 studies (k = 43) were included in the Bayesian hierarchical model. The mean SO-fast SP coupling percentage (%) is 21.17 [15.95, 26.39], while the SO-slow SP coupling percentage (%) is 24.07 [17.59, 30.56]. Forest and regression plots for overall and moderator models are reported in Figure 8. In addition, the results of hypothesis tests for the overall and moderator models are reported in Table 4. Consistent with the funnel plot (Figure S11E), both Egger regression (p = 0.78) and rank correlation test (p = 0.87) showed no publication bias.

Forest and regression plots for the association between coupling percentage and memory retention.

Notes. (A) Overall model forest plot at study-level. The solid vertical line represents the mean Pearson correlation coefficient under the null hypothesis. Dashed lines indicate the 95% credible interval (CrI) of the pooled effect size. The black point and error bar for each study shows the adjusted estimation of effect size and 95% CrI combining data and prior information. The gray dots under each distribution show raw effect sizes of each study. Effect size-level plots can be found in Figure S5.4. (B) Meta regression plot with age as moderator. Blue lines represent 200 overplotted spaghetti fit lines to visualize predictions. (C) Moderator-level forest plot. Each box represents a type of moderator. Mixed Effect sizes with mixed conditions from different factor levels listed above. Weight Stacked weight of each moderation model in the paired model performance comparison between the moderator and overall (intercept-only) model. The stacked weight of the overall model in each pair of comparisons can be calculated as 1 - weight of the moderation model.

Result of directional hypothesis tests for each pair of factor levels (conditions) in overall and each moderation model of the coupling percentage-memory association

2.5.1 Overall Model

Compared to all other coupling measures, we observed a weak association between coupling percentage and memory consolidation (rpooled = -0.03 [-0.15, 0.07], BF10 = 0.38, probability = 0.28). The overall model consists of a moderate between-study heterogeneity of g = 0.11 [0.01, 0.25], and a within-study heterogeneity of g = 0.05 [0.00, 0.14]. 95% credible intervals of all studies and the pooled effect size include 0. Due to the limited number of effect sizes, we did not perform a focal analysis. Prior sensitivity analysis revealed consistent results with the overall model (see Supplemental Material 9).

2.5.2 Moderator and Sensitivity Models

Few moderators had an influential impact on the percentage-memory association (All 0.1 < BF10 < 10, except verbal versus motor tasks, all weight < 0.40), which is not surprising given the weak association in the overall model. For memory types, we observed a moderate trend of declarative memory retention towards a negative association with coupling percentage, in contrast to the positive trend of percentage-motor memory association. Therefore, the association measured by motor memory tasks (k = 13) is considerably higher than verbal tasks, k = 13, r = Δ -0.25, BF10 = 0.05, probability = 0.05. All other moderators did not trend toward a specific direction and all 95% credible intervals include 0.

3 Discussion

As the first meta-analysis focusing on the coupling between slow oscillation and spindle events, our results, combining 297 effect sizes, provide reliable evidence for the involvement of thalamocortical SO-SP coupling in memory consolidation. In particular, the precision and strength of coupling, represented by the preferred phase and coupling strength, showed significant effect sizes in their correlation with memory retention performance. Moderators including age, memory type, cortical area, and SP frequency mediated the magnitude of these associations. The conclusive results are summarized in Table 5.

Descriptive table of the meta-analysis result of measures of SO-SP coupling characteristics

3.1 Precision and strength of SO-SP coupling as strong predictors of memory consolidation

In previous meta-analyses focusing on spindle events, the amplitude and power of SPs have been considered the most predictive measures for memory consolidation14 and cognitive abilities69. However, our findings indicate that, the SP amplitude-memory association is subject to high variabilities. Only the association between fast SP amplitude and hippocampal dependent memory consolidation is supported by strong evidence. In contrast, the precision and strength of coupling between the fast SP peak amplitude and the up-state peak of SOs, is more crucial indicators for predicting memory retention performance. Our result confirmed the importance of cross-frequency coupling in the hierarchical temporal nesting within the hippocampus-thalamus-cortex information transmission loop, proposed in the active system consolidation theory3,6,38,114, and further emphasized the role of coupling compared to isolated oscillation events in memory consolidation47,50,57,63,115.

Moreover, we found that the phase and strength of coupling between fast SPs and SOs, retains strong predictive ability for memory retention performance in most sub-groups of the moderator analysis. This result confirms the robustness of the connection between coupled SO-fast SP phase and memory consolidation as a general physiological mechanism. Similar results were not observed with slow SPs. Since we did not find publication bias in both analyses regarding coupling phase and strength, it reduces the likelihood of overestimating true effect sizes. Given we have identified predictive powers for memory retention in both coupling phase and strength, and Weiner and colleagues54 reported a synergistic interaction between phase and strength, future studies are necessary to further investigate the relationship between these phase-amplitude level measures.

3.2 The mediation role of cortical area and oscillation frequency

Most of the studies included in the meta-analysis supported the significance of the association between the coupling phase and memory consolidation. However, conflicting conclusions have been reported regarding the direction of the phase. In terms of the preferred coupling phase we find two predominant views, with a subset of studies reporting that coupling events are concentrated before the up-state peak (−90-0°) of SOs44,51,57,100,116,117. Other authors found that the preferred phase is when fast SPs precisely coupled with the up-state peak (0°)47,48,50,53,68,136. In either case, however, most of the studies agreed that coupling occurring closer to the up-state peak of SOs could predict better memory retention performance.

In addition to differences in signal processing approaches and age leading to discrepancies in phases reported, we believe that another reason is different PSG electrodes and SP frequencies filtered to detect coupling events (see Table 6). Research found that fast SPs occur more frequently in centro-parietal regions, while slow SPs are predominately in frontal regions98,118120. This evidence led some research to apply this conclusion to coupling studies, and detecting fast SPs only from centro-parietal electrodes or slow SPs only from frontal electrodes43,57,100,117. However, we suspect that this approach restricts researchers from considering the origin and spread of oscillations. The prefrontal cortex has been proposed to be responsible for the generation of posterior-propagating global SOs6,17,23,123, while global SPs, the majority of SP events, propagate in an rotating direction following gradients102,103,105. The significance of this dynamic interaction of oscillations for spatiotemporal coordination remains poorly understood.

Interpretation of the standardized circular-linear correlation coefficient

Although SO-SP coupling can be widely detected across the frontoparietal cortex122, their interaction in the frontal lobe is proposed to be crucial for the transmission of memory information from thalamus to neocortex5,23,42,47,124. Our spatial-temporal analysis provides extremely strong evidence that the post-peak SO-fast SP coupling phase recorded from frontal regions occurs closer to the up-state peak of SOs, and considerably later than the pre-peak phase in centro-parietal regions, which supports phase shifts across electrodes found in previous studies44,45,121. Moderator models of all four coupling measures also consistently indicate that coupling detected from frontal regions has the largest association with the memory consolidation. In addition, a significant quadratic phase-memory relationship is only observed in frontal regions. To the best of our knowledge, our study is the first to provide consistent evidence supporting that successful memory consolidation is mediated by spatiotemporal specificity, emphasizing the precise coupling of frontal fast SPs targeting the up-state peak of SOs.

3.3 Further moderation through memory types, aging, and sleep conditions

The predictive ability of coupling for memory consolidation has been studied across various types of memory, including emotional, spatial, verbal, and motor tasks. Surprisingly, our results indicate that the coupling phase has a widespread predictive role for both declarative and procedural memory. In contrast, evidence only supports a correlation between coupling strength and hippocampus-dependent memory. In addition, Hahn et al.47,82 provided a region-specific view, claiming significant associations of SO-SP coupling with declarative and procedural memory exists in the frontal lobe and motor cortex, respectively. Therefore, it is worthwhile for future research to study the both shared mechanisms and region-specificity of coupling across different memory types, including those traditionally considered non-hippocampus-dependent. However, we must exercise caution that the number of effect sizes extracted for emotional and spatial tasks is limited, making it susceptible to the influence of any single study.

One disadvantage in our analysis of age as a moderator is the limited number of studies on children and older adults. The linear meta-regression as a function of age did not capture the moderating pattern of development but only reflected aging through the decreasing trend of the coupling-memory association. For older adults, the meta-regression showed that all four measures approached null hypothesis levels of non-significance, indicating a diminishing effect in predicting memory consolidation through coupling phase or strength with aging. These results support previous studies on the memory consolidation impairment in older adults50,63. While our results did not directly reflect the influence of development on the coupling-memory association, evidence from past research suggested that the strength of SO-SP coupling in frontal lobes increases with developmental age, indicating the development of long-term memory networks47. We suggest future studies to further compare SO-SP coupling across different age groups, contrasting the regional distribution of coupling changes during development and aging.

However, evidence did not support the moderation role of stage conditions, including sleep stages and bouts, in the coupling-memory association, which is somewhat surprising. SOs are predominant in SWS125, while sleep SPs are the primary oscillation feature in N2119. Our included studies showed that the co-occurrence rate of SO-SP coupling is higher during SWS compared to N2. One possible explanation, as suggested by our meta-analysis results, is that the precision and strength of coupling is more predictive of memory consolidation than the coupling percentage. Thus considering the contribution of different stages to memory consolidation through co-occurrence rates may not be meaningful. Regarding sleep bouts (naps versus overnight sleep), no difference were found in any coupling measures between overnight sleep and naps, consistent with the results of a meta-analysis focusing on spindle-memory association14,126. It implies the potential benefits of napping in memory consolidation and the clearance of hippocampal traces for storing new knowledge during the daytime.

3.4 Challenges of current statistical approaches in measuring EEG-behavior associations

One of the crucial factors limiting further interpretation of our results is the variation in statistical methods across studies, including differences in the definition of behavioral and physiological measures, the event detection and analysis methods employed, and limitations in estimating nonlinear relationships. Firstly, we agree with McConnell and colleagues127 that there has been confusion in previous studies, where the term ”slow spindle” is inconsistently used to refer to either frontal fast SPs (∼ 10-14 Hz) or low-frequency slow SPs (∼ 7-11 Hz). Our result reveals that frontal fast SPs (∼ 10-14 Hz) occurring near the up-state peak of frontal-predominant global SOs with lower frequency than the centro-parietal fast SPs (∼ 14-17 Hz), which becomes more predominant with development87,88,127,129. In contrast, slow SP occurs before the trough of the SO downstate. Our findings support a detailed categorization of SP types proposed in previous studies127130, involving separate measures between pre-peak early-fast SPs, post-peak (or peak) late-fast SPs (see Figure 5B), and pre-trough slow SPs (see Figure S13.1B). Alternatively, an effective approach could also involve extracting signals from electrodes across different cortical areas for each type of SP for comparison. In addition, the current definition of moderators is also quite vague and conflicted. Some studies conducted analyses across mixed conditions, including using a global frequency range (e.g.; 10-16 Hz) for SP detection, as well as reporting only one shared effect size for all conditions.

Moreover, we observed significant between-study differences in the measures for memory retention performance. Specifically, multiple retention measures already exist when solely considering verbal tasks for measuring declarative memory, including 1) the value difference in the number of correctly recalled words (Post-sleep – Pre-sleep number of correct words)54,57,100; 2) the value difference in the percentage of correctly recalled words (Post-sleep – Pre-sleep % of correct words)47,58,116; and 3) the ratio difference in the percentage of correctly recalled words (Post-sleep / Pre-sleep % of correct words)45,53,117. Similar situations exist in measures for other memory retention tasks. When we tested the sleep-memory association using these three formulas separately, we found that in some extreme cases, different measures could even alter the direction of the association. Thus, we strongly suggest the consistency in memory retention measures in future studies131.

In addition, misinterpretation of phase-memory associations in some studies poses a threat to the validity of results. Most studies use circular-linear correlation to measure this association, but we observed a prevalence of exaggeration when explaining the effect size. Circular-linear correlations lack directionality132, making it challenging to precisely estimate the improvement or decay of memory consolidation at specific SO phases. It is likely to be influenced by fluctuations in memory scores in any segment of SO phases, so associating it with hypotheses targetting specific phases might lead to incorrect conclusions. An effective solution is to visualize the regression and superimpose quadratic fit lines to assess whether it follows a typical quadratic relationship around the global maximum or minimum. In addition, through simulation studies (see Supplemental Material 12), we observed that circular-linear correlation coefficients exhibit weak robustness and severe deviation from normal distribution for small to medium sample sizes (n < 100), commonly encountered limitations in PSG studies.

To suppress the bias introduced by non-linear relationships, we believe there are two available solutions: 1) Hahn et al.,47 calculated the absolute distance of the preferred phase of each participant from the upstate peak (0°), while Kruz52 and Weiner54 applied this method to transform the phase-memory association into a linear relationship for subsequent testing. Our results provided a solid support for using SO up-state peaks as the center of transformation. 2) We developed a method to standardize the circular-linear correlation coefficient by transforming the sampling distribution of correlation to be normally distributed and centered at 0 under null, which takes into account of the sample size to eliminate the overestimation of effect sizes, and becomes comparable with Pearson’s correlation to enhance the comparability across studies (see Section 5.5.2).

In reporting results, despite the tendency to introduce multiple comparison issues due to the feature of multiple time points and electrodes during PSG recording133, less than half of the included studies conducted corrections for multiple comparisons, such as cluster analysis. Furthermore, we found that the majority of studies tend to selectively report results only for significant electrodes, memory types, or coupling measures. These practices increase the likelihood of false positives and overes-timation of the effect size134. While only 2 out of the 23 studies included in our analysis disclosed all processed data and analysis code in their publication, we appreciate the responses of almost all authors who provided valid data or clarifications upon email requests that helped us mitigate heterogeneity and publication bias in subsequent analyses.

3.5 Limitations and future research directions

The between-study discrepancy in measuring cross-frequency coupling presents a potential challenge to the comprehensive inclusion of studies. Due to methodological disparities and the limited prevalence of PETH studies, our current analysis only included studies using the preferred phase to measure the precision of coupling. Additionally, due to the expectation of high heterogeneity, we excluded gray literature without peer-review and memory measures with low comparability to other studies, such as the targeted memory reactivation135. While the inclusion of a restricted number of published studies may introduce a potential publication bias, we effectively addressed this concern through data requests, focal and sensitivity analysis, and meta-regression, finally constraining the overall risk of bias of most studies to a small to moderate level.

One of the advantages of our study is attributable to a substantial number of effect sizes and sample sizes. However, only a few studies reached a sample size of 30-50. In addition, our analysis focused on studies with healthy populations. Future research should consider adopting more homogeneous analysis methods and openly sharing processed sleep and memory data, thereby aggregating sufficiently large and representative sample sizes in larger-scale studies. Moreover, our findings of no differences in the coupling-memory association between naps and overnight sleep suggest that naps paradigms may be efficient and thereby allow for a larger number of subjects.

Besides our suggestion for the classification of memory types and SP frequency, our results offer insights in studying causal sequences through more precise detection and stimulation of neural oscillations, and offer insights for enhancing memory consolidation under aging or pathological conditions. We believe tracing dynamic distributions of SO-SP coupling and considering memory type and age-related regional changes would be intriguing, and might be crucial for the understanding of the selective mediation of the spike-timing-dependent synaptic plasticity. High-density PSG has been instrumental in enhancing spatial sampling density for localizing SOs, subcortical areas, and even analyzing intra- and inter-regional PAC17,137143. Moreover, integrating techniques with higher spatial resolutions, such as MRI and iEEG, will enhance the interpretative capacity of PSG results.

Finally, efforts must be made in future sleep research to-wards open science. To the best of our knowledge, as the first meta-analysis conducted on cross-frequency couplings, we have made our analysis code and data publicly available. Our posterior parameters predicted for each coupling measure can serve as priors in future meta-analyses, allowing for the integration of new data to update the model. We also encourage future original studies to share individual-level data and code. These efforts will contribute to enhancing reproducibility and comprehensive inclusion of results in future meta-analyses.

4 Conclusion

This meta-analysis revealed the crucial role of the precise and strong SO-fast SP cross-frequency coupling in promoting memory consolidation with data demonstrating sufficient statistical power for the first time. By aggregating effect sizes of associations between four commonly used coupling measures and memory retention performance, we observed a very strong evidence that the precision and strength of coupling can both predicts enhanced memory retention scores across almost all conditions as a general physiological mechanism, while fast SP amplitudes detected in the frontal lobe also showed associations with memory consolidation. Although all these effect sizes of associations are relatively small, it provides important information regarding complex memory consolidation mechanisms, and our moderator analysis uncovered the crucial mediating roles of age, cortical region, spindle frequency, and memory type in the coupling-memory associations.

This evidence provides insights for future research on how the contribution of coupling to memory consolidation is mediated by behavioral and physiological factors, as well as meaningful information for the simultaneous implementation of MRI and electrical stimulation in the SO-SP coupling analysis. We believe that SO-SP coupling can offer more precise predictions for memory consolidation within the context of spatiotemporal specificity. This meta-analysis emphasizes the need for standardization and replication studies and points to those measures to focus on for future research in the field.

5 Method

5.1 Literature search

The retrieval and screening of relevant studies were conducted in accordance with the 2020 PRISMA statement144. A comprehensive literature search was performed in three databases based on the retrieval qualities evaluation145,146: PubMed, Web of Science, and PsycINFO, covering the period up to July 1, 2023. Only studies which have undergone peer review and have been published were included in the meta-analysis. Boolean operators were utilized to combine the following search terms: (sleep OR nap) AND (slow oscillat* OR slow wave OR sleep oscillat* OR slow-wave OR SO) AND (spindle OR sigma OR SP) AND (coupl* OR pair* OR lock* OR coordinat* OR interact* OR synchro*) AND (motor learning OR memory OR cogniti*). After filtering the papers based on inclusion criteria, a citation search was conducted and all papers identified from article citations underwent manual screening.

5.2 Inclusion criteria

We applied the following criteria147 to identify articles eligible for the meta-analysis as of July 1, 2023. Studies were included if the study 1) had memory encoding task before and recall measures after a sleep interval; 2) was published in English; 3) measured at least one of the standard SO-SP coupling measures (coupling phase, strength, percentage, and/or SP amplitude) by phase-amplitude coupling (PAC) or comparable methods; 4) assessed the correlation between memory retention and SO-SP coupling during N2 and/or SWS stage(s); 5) was published as an original research paper in a peer-reviewed journal. Studies were excluded if they 1) only measured SO-SP coupling influenced by the impact of medication or interventions (e.g. brain stimulation method) with no control condition; 2) only included non-human subjects, patient groups, or clinical groups (no control healthy group); 3) was a group under the wake condition. If a study included both groups that meet the inclusion and exclusion criteria and those that do not, only data from the group(s) that meet the criteria (e.g., control condition, control group) were included in the following analysis.

5.3 Study selection

For the study selection, data retrieved from the three databases listed above were imported into EndNote 21 for automatic and manual duplication removal. The screen process was reported in Figure 9. After preliminary screening for titles, abstracts and article types, 105 original research papers were screened by independent full-text review. Reviewers discussed all papers with discrepancies regarding their inclusion and ultimately achieved consistency. Finally, we included 23 eligible studies into the meta-analysis, comprising a total of 297 effect sizes from four types of coupling measurements with 730 samples.

PRISMA flow diagram of literature search, screening, and inclusion for systematic review and meta-analysis

5.4 Data extraction

In the studies included in the meta-analysis, relevant effect sizes and specified study characteristics were extracted by two reviewers independently in accordance with PRISMA guide-lines144 to minimize the bias introduced by subjective judgments. The consistency of data extraction between reviewers is 99.6%, and all discrepancies were resolved through discussion. Four specific measures of SO-SP coupling (Figure 1) were examined for their association with memory retention performance: 1) preferred coupling phase; 2) SP peak-to-trough amplitude; 3) coupling strength; and 4) coupling percentage. After extracting effect sizes, phase-radian alignment was corrected (Figure 1C). Extracted study characteristics included: 1) sample size, age, gender distribution; 2) pre-specified moderators (see Section 5.6 for more details); 3) publication details. A summary of main information for each study is out-lined in Table 6. The detailed data for each effect size can be found in Table S2.1-S2.5 and the publicly available repository: https://osf.io/9mh5d/. The overall quality of studies is assessed in accordance with the Robins-Ijuni2016risk, 109 and NIH criteria148, with adjustments (detailed in Table S3.1) made to accommodate the specific attributes of the study.

Due to relatively large heterogeneity and discrepancy in methodology and results reported between studies, as well as the non-parametric nature of the direction in circular-linear correlations, we improved comparability and credibility across studies by requesting both unreported effect sizes and processed individual-level memory and physiological data to reduce the publication bias. If a study measured relevant sleep and memory features but either: 1) did not report any extractable or convertible effect sizes; 2) reported imprecise p-values due to insignificance; or 3) reported only one effect size for cross-group data; 4) reported effect sizes that were not comparable to other studies, coupled with the absence of data in supplementary material, we requested missing data or processed individual data from corresponding authors via email. 21 out of 23 authors responded positively and provided the requested data. Studies without a response after two months or declined our requests were excluded or partially excluded from the metaanalysis for the part where insufficient information was provided for calculating effect sizes. For studies reporting correlations only through scatterplots, linear correlations were estimated using software from the ShinyDigitise package149,150 in R, while circular-linear correlations were first assessed by extracting estimated individual data from the plots by the online software WebPlotDigitizer151, followed by the circular linear correlation analysis in R. Additionally, key missing study characteristic data was requested via email at the same time to broaden the scope of moderator analysis.

5.5 Effect size calculations

To standardize effect sizes for comparability across studies, we chose to standardize the bounded circular-linear correlation coefficient r132,152 to unbounded Zr to examine the association between coupling phase (in radians) and memory consolidation. We used the Pearson product-moment correlation coefficient r153 (hereinafter referred to as Pearson’s r) and transformed fisher’s z to report the linear relationships between SP amplitude (in µV), coupling strength (mean vector length or modulation index), coupling percentage (%), with memory consolidation separately. All transformable effect size measures including t-statistic, p-value, β statistic, and η2, were extracted and converted to Pearson’s r. In cases where effect measures were reported as non-parametric correlation including Spearman’s rho (ρ, i.e.; rs) and no author response could reanalyze for parametric correlation, we used it as an imperfect estimation of Pearson’s r154 and excluded from sensitivity analyses.

5.5.1 Pearson product-moment correlation coefficient

The constrained range of values between -1 and 1 measured by Pearson’s r restricts the selection of an ideal unbounded prior distribution for Bayesian hierarchical models155 (discussed in Section 5.7). The deviation of its sampling distribution from the assumptions of normal, especially when the effect size is large, might provide inaccurate estimation for the sampling variance156. To address this limitation, we used the metafor package157 in R to transform Pearson’s r into normalized and unbounded Fisher’s z by formula156:

When reporting and interpreting the results, we reversed the transformation from Fisher’s z back to Pearson’s r to present each estimated effect size and credible interval (CrI) by formula:

5.5.2 Standardized circular-linear correlation coefficient

Meanwhile, the circular-linear correlation coefficient r (here-inafter referred to as circlin r) is a type of pearson correlation coefficient (PCC) distributed from 0 to 1 without direction, achieved by transforming the phase into sin θ and cos θ to create linear parameters132,152, to assess the relationship between a random unit vector and another linear random variable:

Since there is no existing method to standardize the circlin r, we developed a approximation approach to transform the bounded and non-linear distribution to unbounded standardized normal distribution 𝒩 (0, 1). The verification and performance can be found in Supplemental Material 12.

The population distribution of circular linear correlation ρ with n samples can be approximate to a chi-square distribution with 2 degree of freedom158, . We derived that the circular-linear correlation coefficient has the following mean and variance in an approximate form and verified in R:

We can observe that different than pearson’s r, which always has a population mean H0 : ρ = 0 under the null hypothesis, the circlin r has a population mean between 0 and 1, which tends to approach 1 and display left-skewness as the sample size n decreases, while approach 0 and exhibit right-skewness as the n increases (also see Figure S12.1). The sampling distribution of correlated circlin r also shows the same property of skewness when approaching lower and upper bounds (Figure S12.2-12.3). Therefore, it is clearly not appropriate to use raw coefficients in the meta-analysis.

To transform its sampling distribution to be unbounded, normally distributed and centered at 0 for our meta-analysis, we first scaled the circlin r as a form of n r2 to approximate the distribution158. Here the sample size has been weighted thus eliminate the bias introduced by large r under small sample sizes. We can find the quantile of circlin r in the upper tail.

Finally, by transforming non-normal chi-square deviates to standardized normal distribution and scale the distribution by the sample size, circlin r can be transformed into a standardized normal scale Zr:

The sampling distribution of transformed circlin Zr follows a approximate normal distribution when n > 15. When n < 15 it would be slightly right skewed. Its population distribution can be approximate to Pearson’s r and it is bounded between -1 (null) and 1 (alternative). The interpretation the strength of Zr is comparable with Pearson’s r (see Table 7), and now Zr can also be transformed to Fisher’s z and include in the metaanalysis. Monte Carlo simulation results and the code used for the transformation were also reported in Supplemental Material 12.

Summary of memory tasks

Interpretation of Bayes Factor (BF10) for the strength of evidence

Main characteristics for each study included in the meta analysis

5.6 Methodological characteristics and moderators

In previous studies, researchers have applied different experimental designs to investigate coupling and summarized divergent conclusions regarding each coupling parameter. To develop a more systematic understanding about the heterogeneity of approaches, six moderators were selected in the moderator analysis. Table 1 provides a complete list of moderators in each study, and categories that define moderators are listed below, the range represents the number of effect size included in the analysis across four types of coupling measures:

  1. Sleep Stage: Sleep stages were grouped by N2 (nREM2; k = 11-18) and SWS (slow-wave sleep; k = 9-30), while uncategorized sleep stages were encoded as “mixed” (k = 23-49).

  2. Sleep Phase: Nighttime sleep lasting typically more than 6 hours was encoded as “overnight” (k = 32-73), while short naps lasting around 1-2 hours during the early morning or afternoon were classified in the “nap” group (k = 11-27).

  3. Age: Due to the limited number of studies on children, adolescents and older adults, age was encoded as a continuous variable based on the mean age of each study for the meta-regression.

  4. PSG Channel: Effect size has been categorized into three clusters based on the cortical area under PSG electrodes, including frontal (F3, Fz, and F4; k = 15-36), central (C3, Cz, and C4; k = 13-29), and posterior (P3, Pz, P4, O1, and O2; k = 8-22), and mixed (k = 3-7). Other electrodes within the same area but exhibiting excessive deviation from the midline or containing insufficient information have been excluded (E.g. C1, PO8). Sleep parameters were computed by averaging SP amplitude, coupling phase, strength, and percentage across electrodes within each cluster.

  5. Spindle Type: To include a wider range of comparable studies, we specified frequency boundaries for fast SPs (12-16 Hz; k = 22-43) and slow SPs (8-12 Hz; k = 16-28). Effect sizes that transversed frequency boundaries or reported together are classified as “mixed” (k = 16-28).

  6. Memory Type: We classified comparable memory tasks into four domains presented in Table 2. Verbal memory can also be interpreted as neutral non-spatial memory, in contrast to the other two types of declarative memory.

5.7 Statistical analysis

We chose to fit overall and subgroup (moderator) models using Bayesian hierarchical random-effects and mixed-effects models, respectively, due to the common occurrence of multiple effect sizes for the same set of participants reported within the same study, violating the assumption of independence of effect sizes, as well as the limited and unequal group size and potential high heterogeneity. In comparison to frequentist models, Bayesian models incorporate prior probabilities for each parameter by considering likelihood information (i.e., the effect sizes we extracted) to establish a model for predicting the posterior probability distribution of parameters. In the posterior distribution, confidence intervals (CIs) reported by frequentist methods are replaced by credible intervals (CrI), which directly indicate the probability statements about the true parameter values, thereby providing a more precise prediction of true probabilities. Additionally, Bayesian methods could more effectively build models that account for multiple sources of heterogeneity and allow the analysis of the impact of moderator variables representing different measures and participant groups on the pooled effect size159. All statistical analyses and tests for the Bayesian models were conducted in R using the brms package160 along with supporting packages bayesplot161, metaviz162, and customized codes for visualization.

For each overall and subgroup model (Table 4), we applied the Markov chain Monte Carlo (MCMC) method with the no-U-turn Hamiltonian Monte Carlo (HMC) sampler to set up four chains, with each chain undergoing 12,000 iterations (including 2,000 warm-up), which is the minimum requirement for calculating accurate Bayes Factors (BF)163. The target average acceptance probability was set to 0.99 with the maximum tree depth to 15 to ensure robustness in the posterior distribution. Convergence of the MCMC was checked following the suggestion of WAMBS-Checklist164 through graphical posterior predictive checks, trace plots, and the Gelman and Rubin diagnostic. Ideally, the posterior distribution should overlap with the distribution of the test data generated, the trace plot distribution should resemble a uniformly undulating wave with high overlap between chains, and the Potential Scale Reduction Factor should be less than 1.1. Autocorrelation plots were used to detect autocorrelation between iterations. Any non-convergence at the aforementioned stages led to reconfiguration of chains and iterations for analysis. Estimation of each intercept, moderator and heterogeneity in the posterior distribution were extracted, and the estimation of mean, distribution and 95% credible interval of each effect size and the pooled effect size were reported in forest plots.

5.7.1 Overall model

We fitted random-effects models that predict the relationships between coupling phase, SP amplitude, coupling strength, and coupling percentage with memory consolidation, considering only heterogeneity and sampling errors. The three-level Bayesian model superimposed random effects for sampling error (first level), between-study heterogeneity (second level), and within-study heterogeneity (third level):

By introducing priors and likelihood information to model intercept and heterogeneity parameters, our random-effects model can be represented using the following formula:

where τ2 represents between-study heterogeneity, represents within-study heterogeneity, and represents sampling error. Since the distribution of Fisher’s z, standardized circling Zr, and heterogeneity165 all follow certain probability density functions, and non-informative priors could not provide reasonable estimates, we selected weak informative priors in the model to tradeoff between allowing collected real data to influence posterior distributions and excluding extreme outliers. For the intercept of all models, we chose a standardized normal distribution 𝒩 (0, 1) as the prior distribution.

The prior for heterogeneity used the Half-Cauchy distribution, known for its heavy-tailed property, improving it to be more tolerant of high heterogeneity166. Also, half-cauchy trun-cated at 0, which is consistent with the fact that heterogeneity cannot be less than 0. For between-study heterogeneity τ2, we obtained a mean between-study heterogeneity of µ = 0.13 from a dataset of heterogeneity reported in 498 correlational metaanalyses112 using Fisher’s z or Pearson’s r as the measure of effect size, which represent a replication of the methodology by McKinney et al.167 regarding prior selection. Additionally, due to differences in measurement methods and memory tasks across studies, we held a prior belief in some degree of between-study heterogeneity. For reasons discussed above, we set the prior using half-Cauchy distribution HC(0, 0.5) by extending the scale parameter from 0.13 to 0.5 to account for more extreme heterogeneity. Lastly, the prior for within-study heterogeneity is also set as to be HC(0, 0.5) to balance the uncertainty arising from different measurement approaches or sample groups.

5.7.2 Moderator models

For moderator analysis, we introduced continuous and categorical variables into the overall model as fixed effects, forming a following mixed-effects model:

in which X denotes random variables for moderators, k is the random variable of moderators, and n represents the number of moderators specified in the model. In the subgroup model, we chose to use the weak informative prior 𝒩 (0, 1) for each moderator as fixed effects to account for the uncertainty in the magnitude of the aggregate effects caused by moderators. In the meta-regression model accounting for the effect of age, we Chose 𝒩 (0, 0.1) to model the change over years. Except for the exploratory analysis, all statistical models used in data analysis are summarized in Table 6.

To assess directional hypotheses proposed for each moderator model, we conducted non-linear hypothesis testing within the Bayesian framework, obtaining Bayesian Factors (BF10, defined as Evidence Ratio (ER) in the package brms) for different levels within each model, which performed tests of evidence for the ratio of marginal likelihoods between two levels. In addition, we conducted hypothesis testing on the overall model compared to null. Table 7 presents the strength of evidence represented by different Bayesian Factors, which is analogous to the one-tailed t-test in the interpretation. In directional hypotheses, BF10 = 3 means the hypothesis is 3 times more likely than alternative. Additionally, we used the posterior probability, also defined as credibility score168, to report the percentage of posterior samples consistent with the direction of the hypothesis. A posterior probability of 1 indicates that all posterior sample draws align with the hypothesis.

The predictive power of models were compared in pairs by computing model weights via leave-one-out (LOO) stacking of posterior predictive distributions169, including the comparison between the overall model and a single moderator model, as well as a single moderator model and a moderator model with an additional moderator and interaction term. Models with additional predictor(s) that had higher stacked weights indicate that it enhanced the overall predictive power of the model. This approach could help identify moderators that influence the correlation between SO-SP coupling and memory retention. Model weights have also been computed via Pseudo-BMA method as the controlled analysis.

5.7.3 Sensitivity analysis

Finally, in sensitivity testing, we conducted separate assessments for 1) publication bias; 2) focal models excluding studies that introduced significant heterogeneity, and 3) the impact of priors. Regarding publication bias, although efforts were made to minimize potential controllable biases through the data request (especially for effect size computed but not reported by studies), effect size transformation, and multi-level models, differences in data analysis approaches and experimental conditions could not be fully addressed at the meta-analysis level. Therefore, we additionally used a frequentist approach by using Restricted Maximum Likelihood (REML) using the metafor package157 to fit three-level models to extract data for generating funnel plots and quantifying publication bias. Beside conducting Egger’s regression test and rank correlation test, we chose the counter enhanced funnel plot170 and superimposed the Egger’s regression line. We also performed the time-lag bias test to quantify the impact of heterogeneity and effects of the year of publication.

In the focal analysis, we excluded studies focused only on non-declarative memory tasks, those with a high risk of bias (see Supplemental Material 3) or significant methodological differences, those only tested post-sleep memory retention, or with a Pareto k diagnostic value larger than 0.7. We also removed studies that adopted modulation index as their measures in the focal model of coupling strength. In addition, the method of prior sensitivity test, as well as the posterior predictive check, were reported in the Supplemental Material 4.

5.7.4 Phase spatio-temporal analysis

The non-negative and non-linear distribution of circular linear correlation makes it impossible to determine the direction of correlation. This is one of the reasons why we have requested individual-level processed data from most of authors of included studies, in addition to requesting the effect size. By standardizing memory scores of each individual in each study using z-scores, we overlaid the data from most of comparable studies to fit and visualize the nonlinear relationship between coupling phase and memory consolidation using the best fit second-degree quadratic line for combination of each SP type and channel location. It could effectively improve the interpretation of nonlinear relationships and circular direction of the SO-SP coupling phase across studies.

For the comparison of coupling phase across PSG channel clusters, we used the Bayesian circular mixed-effects model171 with 12,000 iterations to account for repeated measurement and differences of sample sizes between channels. PSG channel location has been set as a fixed e ffect, w hile s ubject h as been taken into account as a random effect. 95% credible interval and the posterior distribution of each circular direction was reported in the circular plot to compare the timing of occurrence of SO-SP coupling across different cortical areas. Similar to the main analysis, the Bayes factor and posterior probability has been used to evaluate evidence for the circular mean difference. In addition, the inconsistency of coupling phase has been assessed by the Rayleigh test.

Additional information

Acknowledgements

We would like to express our sincere gratitude to all authors of included studies who have generously shared demographic, physiological, behavioral data, or effect sizes, or clarified experimental methods with us through email communication. Work of R.M.C.S was funded in part by NIH R01 AG040133.

Data availability

All effect size level and study level data used for analysis are publicly released in the Open Science Framework repository: https://osf.io/9mh5d/, and shared in the supplemental dataset.

Code availability

All code for data analysis, custom functions, and Bayesian models are publicly released in the Open Science Framework repository: https://osf.io/9mh5d/

Declaration of competing interest

The authors have declared that no competing interests exist.