Peer review process
Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, public reviews, and a provisional response from the authors.
Read more about eLife’s peer review process.
Editors
- Reviewing Editor: Xiaoqing Hu, University of Hong Kong, Hong Kong, China
- Senior Editor: Huan Luo, Peking University, Beijing, China
Reviewer #1 (Public review):
Summary:
This manuscript examines whether retrieval practice protects memory-based inference from acute stress and proposes rapid neural reactivation of a bridging memory element as the underlying mechanism. Using a two-day associative inference paradigm combined with EEG decoding, the authors report that stress impairs inference accuracy and speed, while retrieval practice eliminates these deficits and restores neural signatures associated with bridge-element reactivation. The study addresses an important and timely question by integrating research on retrieval-based learning, stress effects on memory, and the neural dynamics of inference. While the work provides promising multi-level evidence linking behavioral and neural findings, limitations in experimental design, causal interpretation, and decoding specificity weaken the mechanistic claims and indicate that further work is needed to disentangle strengthened associative memory from inference-specific protection effects.
Strengths:
(1) Strong theoretical integration
The study integrates three influential frameworks: memory integration through associative inference, stress-induced retrieval impairment, and the testing effect. The authors present a clear theoretical narrative linking these domains and derive testable hypotheses that retrieval practice protects inference by strengthening neural reactivation of a bridge element. The conceptual framing is well-grounded in prior literature and addresses an important gap regarding neural dynamics during inference.
(2) Multi-level evidence
The study provides converging behavioral and neural evidence. The authors demonstrate that stress reduces inference accuracy and speed, while retrieval practice eliminates these deficits. EEG decoding further suggests that bridge element reactivation predicts successful inference. The combination of behavioral performance and neural decoding strengthens the overall argument.
(3) Transparent experimental implementation
The procedures are described in substantial detail, including stimulus construction, stress manipulation, and decoding pipelines. Data and code availability are also strengths, facilitating reproducibility.
Weaknesses:
(1) Insufficient evidence that retrieval practice specifically protects inference rather than strengthening associative memories
A central claim of the manuscript is that retrieval practice specifically protects inference ability rather than simply strengthening underlying associative memories. However, the current data do not convincingly distinguish between these possibilities. Although the authors limited analyses to trials in which AB and BC pairs were correctly retrieved in the subsequent memory test, this procedure does not fully rule out the possibility that improved inference performance reflects stronger base associative memories rather than enhanced integrative processes.
Importantly, the direct memory retrieval test used a two-alternative forced-choice (2AFC) format, which inherently allows a substantial proportion of correct responses to arise from guessing. Consequently, trials classified as "successfully retrieved" may still include weak associative memory traces, making it difficult to conclude that failures in inference reflect deficits in integration rather than incomplete associative learning.
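To illustrate how much guessing can inflate 2AFC accuracy, the following minimal Python sketch assumes a simple high-threshold model: each trial is either truly remembered or a pure guess that is correct with probability 1/2. The function name and the 80% accuracy in the usage note are illustrative assumptions, not values from the manuscript.

```python
def guessing_breakdown(observed_correct: float, n_alternatives: int = 2) -> dict:
    """High-threshold correction for guessing in an n-alternative forced-choice test.

    Assumes each trial is either remembered (probability p_true) or a pure
    guess that is correct with probability 1 / n_alternatives.
    """
    g = 1.0 / n_alternatives
    if not g <= observed_correct <= 1.0:
        raise ValueError("observed accuracy must lie between chance and 1")
    # observed = p_true + (1 - p_true) * g  =>  p_true = (observed - g) / (1 - g)
    p_true = (observed_correct - g) / (1.0 - g)
    # Proportion of all trials that are correct only by luck
    lucky = (1.0 - p_true) * g
    return {
        "p_true": p_true,
        # Share of trials scored "correct" that are lucky guesses
        "guess_share_of_correct": lucky / observed_correct,
    }


print(guessing_breakdown(0.80))
```

Under this model, an observed 2AFC accuracy of 80% implies that a quarter of the trials scored as correct are lucky guesses, so conditioning inference analyses on "correct" direct-retrieval trials does not guarantee intact premise memories.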
The authors further argue that retrieval practice does not improve inference in the absence of stress, suggesting independence between inference and associative memory strength. However, this null effect does not sufficiently rule out mediation through strengthened premise memory. A factorial design and/or mediation analysis would be necessary to determine whether inference resilience emerges independently of premise memory strength.
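One way to quantify the mediation route is a product-of-coefficients indirect effect (a × b) with a percentile bootstrap. The sketch below is hypothetical and is not the authors' analysis. It assumes participant-level vectors for a retrieval-practice indicator (`x`), a premise-memory score such as AB/BC accuracy (`m`), and inference accuracy (`y`). Note that when retrieval practice varies only between experiments, `x` is also confounded with cohort.

```python
import random
from typing import List, Optional, Sequence, Tuple


def _center(values: Sequence[float]) -> List[float]:
    """Subtract the mean so the regressions need no intercept term."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def indirect_effect(x: Sequence[float], m: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Return a*b, where a is the slope of m on x and b is the slope of y on m
    controlling for x. Returns None for a degenerate sample."""
    xc, mc, yc = _center(x), _center(m), _center(y)
    sxx, smm, sxm = _dot(xc, xc), _dot(mc, mc), _dot(xc, mc)
    sxy, smy = _dot(xc, yc), _dot(mc, yc)
    det = sxx * smm - sxm ** 2
    # No variance in x, or m (near-)collinear with x; the fixed tolerance
    # assumes variables on roughly unit scale.
    if sxx == 0 or abs(det) < 1e-12:
        return None
    a = sxm / sxx                         # path a: m ~ x
    b = (sxx * smy - sxm * sxy) / det     # path b: y ~ x + m, via Cramer's rule
    return a * b


def bootstrap_ci(x, m, y, n_boot: int = 2000, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for a*b, resampling participants."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        est = indirect_effect([x[i] for i in idx], [m[i] for i in idx], [y[i] for i in idx])
        if est is not None:  # skip degenerate resamples
            estimates.append(est)
    if not estimates:
        raise ValueError("all bootstrap resamples were degenerate")
    estimates.sort()
    k = len(estimates)
    return estimates[int(alpha / 2 * k)], estimates[min(k - 1, int((1 - alpha / 2) * k))]
```

A bootstrap interval for a × b that excludes zero would suggest that at least part of the retrieval-practice benefit runs through stronger premise memory rather than through inference per se.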
(2) Apparent below-chance inference performance raises interpretational concerns
One surprising aspect of the results is that inference performance across experiments and groups appears to fall below the theoretical chance level (0.33) in Figure 4A. This is particularly unexpected because analyses were restricted to trials in which participants correctly retrieved both AB and BC associations.
If performance is indeed below chance, this raises concerns regarding whether participants fully understood the task instructions or whether other methodological factors influenced performance. Additionally, below-chance performance complicates the interpretation of subsequent behavioral and neural analyses. It is possible that this reflects my misunderstanding of the figure; therefore, clarification from the authors regarding how inference accuracy is calculated and presented would be helpful.
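As a quick check on the below-chance concern, an exact one-sided binomial test can ask whether an individual participant's inference accuracy is reliably below the 1/3 chance level. This minimal sketch assumes three response options per inference trial (implied by the 0.33 chance level) and independent trials; the function names are hypothetical and trial counts would come from the authors' data.

```python
from math import comb


def binomial_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def below_chance_p(n_correct: int, n_trials: int, chance: float = 1 / 3) -> float:
    """One-sided exact p-value for the alternative 'accuracy < chance'."""
    return binomial_cdf(n_correct, n_trials, chance)
```

A participant with a very small p-value would be systematically choosing an incorrect option, which is more consistent with misunderstood instructions or a response bias than with random guessing.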
(3) Between-experiment implementation of retrieval practice weakens causal inference
The retrieval practice manipulation was implemented as a separate experiment rather than as part of a factorial design. Experiment 2 was conducted after results from Experiment 1 were known, and the authors acknowledge this post hoc decision. This design introduces several potential confounds, including cohort differences between experiments, possible differences in participant motivation or task familiarity, and reduced ability to rigorously test interaction effects.
Although the authors combined data across experiments to test interactions between stress and retrieval practice, such post hoc aggregation cannot fully substitute for a factorial design. A within-experiment 2 × 2 design (Stress × Retrieval Practice) would provide substantially stronger causal evidence and reduce confounding influences.
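For concreteness, a within-experiment 2 × 2 design could randomly assign participants to the four Stress × Retrieval Practice cells in balanced numbers and estimate the interaction as a difference of simple stress effects. The sketch below is illustrative only: the cell labels, seed, and the cell means used in the test are hypothetical.

```python
import random
from typing import Dict, Hashable, Iterable, Tuple

Cell = Tuple[str, str]  # (stress condition, retrieval-practice condition)
CELLS = [(s, r) for s in ("control", "stress") for r in ("no_rp", "rp")]


def assign_balanced(participant_ids: Iterable[Hashable], seed: int = 0) -> Dict[Hashable, Cell]:
    """Shuffle participants, then deal them round-robin into the four cells."""
    ids = list(participant_ids)
    random.Random(seed).shuffle(ids)
    return {pid: CELLS[i % len(CELLS)] for i, pid in enumerate(ids)}


def interaction_contrast(cell_means: Dict[Cell, float]) -> Dict[str, float]:
    """Stress effect at each retrieval-practice level and their difference."""
    stress_no_rp = cell_means[("stress", "no_rp")] - cell_means[("control", "no_rp")]
    stress_rp = cell_means[("stress", "rp")] - cell_means[("control", "rp")]
    return {
        "stress_effect_no_rp": stress_no_rp,
        "stress_effect_rp": stress_rp,
        # Positive when retrieval practice makes the stress effect less negative
        "interaction": stress_rp - stress_no_rp,
    }
```

Because all four cells are sampled from the same cohort and session protocol, the interaction term is not confounded with experiment, which is the key limitation of aggregating Experiments 1 and 2.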
(4) Lack of an appropriate comparison condition for retrieval practice limits the interpretation of the mechanism
Although acknowledged briefly in the discussion, the absence of an appropriate comparison condition for retrieval practice represents a critical limitation. Without a matched re-exposure or restudy control condition, it remains unclear whether observed benefits are attributable specifically to retrieval practice or to additional exposure to AB and BC associations.
Furthermore, it is unclear whether retrieval practice operates at the trial level or the participant level. Retrieval practice could enhance memory representations for specific practiced items, making those trials more resistant to stress, or it could induce a more global change in cognitive strategy or stress resilience across participants. One way to address this issue would be to analyze inference performance separately for trials that were successfully retrieved during the retrieval practice phase versus those that were not.
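The suggested trial-level split can be sketched as follows. This is a minimal illustration assuming hypothetical per-trial records that note whether both premise pairs of a triad were answered correctly during retrieval practice and whether the corresponding inference was correct; it is not the authors' pipeline.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class InferenceTrial:
    # Hypothetical fields: both premise pairs (AB and BC) correct during
    # the retrieval-practice phase, and AC inference correct later.
    practice_correct: bool
    inference_correct: bool


def accuracy_by_practice_outcome(
    trials: List[InferenceTrial],
) -> Dict[bool, Tuple[Optional[float], int]]:
    """Inference accuracy and trial count, split by retrieval-practice success."""
    groups: Dict[bool, List[bool]] = {True: [], False: []}
    for t in trials:
        groups[t.practice_correct].append(t.inference_correct)
    # Accuracy is None when a subset has no trials
    return {k: ((sum(v) / len(v)) if v else None, len(v)) for k, v in groups.items()}
```

Computed separately for the stress and control groups, a trial-level account predicts that stress costs concentrate in the practice-failure subset, whereas a participant-level account predicts similar protection in both subsets. A mixed-effects logistic model with participant as a random effect would be a natural way to test this formally.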
(5) Interpretation of EEG decoding as bridge-element reactivation may be overstated
The neural decoding results form the mechanistic foundation of the manuscript; however, the interpretation that decoding reflects reactivation of specific bridging memories may be overstated. The classifier distinguishes between face and building categories, and because the bridging element belongs to one of these categories, successful decoding may reflect category-level semantic activation rather than reinstatement of item-specific episodic representations.
Alternative explanations include category-level retrieval, strategic task differences, or even attentional biases. Because only two categories were used, the decoding analysis lacks the specificity necessary to distinguish between category-level and item-level reactivation. As such, conclusions regarding the reinstatement of specific bridging memories should be tempered or supported with additional analyses.
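One additional analysis that could separate item-level from category-level reinstatement is a pattern-similarity index: correlate each inference-trial EEG pattern with a template for the specific bridge item, then subtract its mean correlation with templates of other items from the same category. The sketch below assumes such item templates exist (for example, averaged patterns from an encoding or localizer phase); all names are hypothetical. A purely category-level signal should yield an index near zero.

```python
import math
from typing import Dict, Sequence


def pearson(u: Sequence[float], v: Sequence[float]) -> float:
    """Pearson correlation between two equal-length patterns."""
    if len(u) != len(v) or len(u) < 2:
        raise ValueError("patterns must have equal length >= 2")
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    du = [a - mu for a in u]
    dv = [b - mv for b in v]
    den = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    if den == 0:
        raise ValueError("a constant pattern has undefined correlation")
    return sum(a * b for a, b in zip(du, dv)) / den


def item_specificity(trial_pattern: Sequence[float],
                     templates: Dict[str, Sequence[float]],
                     category_of: Dict[str, str],
                     bridge_id: str) -> float:
    """Similarity to the bridge item's own template minus the mean similarity
    to other templates from the same category (e.g., other faces)."""
    target = pearson(trial_pattern, templates[bridge_id])
    bridge_cat = category_of[bridge_id]
    lures = [pearson(trial_pattern, templates[item]) for item in templates
             if item != bridge_id and category_of[item] == bridge_cat]
    if not lures:
        raise ValueError("need at least one other item from the bridge category")
    return target - sum(lures) / len(lures)
```

Significance could then be assessed against a permutation null, for instance by shuffling item labels within category.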
Reviewer #2 (Public review):
Summary:
Guo et al. investigate the neural and behavioral mechanisms of stress-induced impairments in memory-based inference. Across two well-powered experiments (N=136), the authors demonstrate that acute stress disrupts the rapid neural reactivation of "bridge" elements necessary for novel inferences. Crucially, they identify retrieval practice as a robust behavioral buffer that restores both inferential performance and the underlying neural signatures of memory reactivation.
Strengths:
(1) The use of two independent experiments provides high confidence in the behavioral findings.
(2) Utilizing time-resolved EEG decoding allows the authors to pinpoint the "online" moment of inferential failure, a significant advancement over the lower temporal resolution of fMRI.
Weaknesses:
(1) The authors correctly timed the inference task to begin approximately 20 minutes after the onset of the stressor. While this window aligns with the expected peak of the glucocorticoid (HPA) response, it also represents a period where the rapid adrenergic (SAM) response, confirmed by heart rate elevation, is still highly influential. As the authors acknowledge, because they did not collect saliva samples due to safety protocols, they cannot definitively separate the influence of peak cortisol from the tail-end of the adrenergic surge on the observed memory impairments.
(2) Figures 4 and 6: without asterisks or other significance markers, it is difficult to tell which group differences are significant.
Appraisal and Impact:
This study provides high-quality evidence that acute stress impairs the rapid neural reactivation of "bridge" elements necessary for novel memory-based inferences. By leveraging the high temporal resolution of EEG decoding, the authors identify the specific neural "chokepoint" where inferential failure occurs. The research is strengthened by two independent experiments and the identification of retrieval practice as a powerful buffer that not only preserves but also enhances neural reactivation under pressure. The findings have significant implications for both cognitive neuroscience and applied learning science.
Reviewer #3 (Public review):
Summary:
In this study, Guo and colleagues investigated the effects of stress and retrieval practice on memory inference. In the first experiment, they found that memory inference was significantly worse after induced stress. By contrast, when participants received retrieval practice in the second experiment, inference did not differ significantly between the stress and control conditions. They recorded EEG during the inference phase and applied multivariate decoding analysis to examine evidence of neural reactivation. Complementing the behavioural findings of the first experiment, they were able to decode the stimulus category of the inference item with greater fidelity in the no-stress condition. Surprisingly, with retrieval practice they found the opposite pattern, with stronger evidence of reactivation in the stress condition than in the control condition.
Strengths:
(1) The authors have carefully designed two studies investigating the effects of stress and memory retrieval on memory inference.
(2) The use of multivariate decoding on the inference phase data sheds new light on how stress and retrieval may impact the neural signatures of inference processing.
Weaknesses:
(1) There are some key gaps in the reporting of the data. In particular, the manuscript does not report how many trials were included in the inference phase or how many pairs were retrieved in the direct memory task. This is important because the main conclusions are based on inference performance conditional on successful direct retrieval. Because direct retrieval performance differs significantly between the experiments, floor/ceiling effects (in the behaviour) and differences in statistical power (in the EEG results) could confound the comparisons between experiments. Without these data, it is difficult to draw conclusions.
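The requested counts are straightforward to tabulate. A minimal sketch, assuming per-trial records with hypothetical keys `ab_correct`, `bc_correct`, and `inference_correct`, and a hypothetical minimum number of eligible trials per participant: reporting `n_eligible` alongside the conditional accuracy would show how much each participant contributes and whether some cells are sparse.

```python
from typing import Dict, List, Mapping


def conditional_inference_summary(
    trials_by_participant: Mapping[str, List[dict]], min_eligible: int = 10
) -> Dict[str, dict]:
    """Per participant: total trials, trials with both AB and BC correct,
    inference accuracy on those eligible trials, and a sparsity flag."""
    summary = {}
    for pid, trials in trials_by_participant.items():
        eligible = [t for t in trials if t["ab_correct"] and t["bc_correct"]]
        n = len(eligible)
        acc = sum(t["inference_correct"] for t in eligible) / n if n else None
        summary[pid] = {
            "n_trials": len(trials),
            "n_eligible": n,
            "conditional_accuracy": acc,
            # min_eligible is an illustrative threshold, not from the study
            "meets_minimum": n >= min_eligible,
        }
    return summary
```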
(2) Some relatively strong conclusions are drawn without data to support them. An important example is the title, which suggests a mechanistic role of memory reactivation in these effects; however, the data instead show a relationship between successful inference and evidence of reactivation. Additionally, one-tailed t-tests have been used in follow-up tests, and, as I understand it, no corrections for multiple comparisons have been applied to the post-hoc tests, so these findings should be interpreted with caution.
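As an illustration of one standard correction, the Holm step-down procedure controls the family-wise error rate across a set of post-hoc tests and always rejects at least as many hypotheses as plain Bonferroni. The sketch below accepts any list of p-values; the values in the test are illustrative, not from the manuscript. For symmetric tests such as t-tests, a one-tailed p-value in the predicted direction is half the corresponding two-tailed value.

```python
from typing import List, Sequence


def holm_adjust(p_values: Sequence[float]) -> List[float]:
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p-value (0-based rank) is multiplied by m - rank
        adj = min(1.0, (m - rank) * p_values[i])
        # Enforce monotonicity: adjusted p-values never decrease with rank
        running_max = max(running_max, adj)
        adjusted[i] = running_max
    return adjusted
```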
(3) In places, the structure is unclear, which makes the narrative difficult to follow and often requires the reader to go back and forth between sections to understand the study and analyses. I have made some recommendations for how to improve this.