Retrieval practice prevents stress-induced inference impairment by restoring rapid memory reactivation

Jinpeng Guo; Ruixin Chen; Qi Zhao; Xiaojun Sun; Wei Liu

doi:10.7554/eLife.110350.1

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, public reviews, and a provisional response from the authors.

Reviewing Editor
Xiaoqing Hu
University of Hong Kong, Hong Kong, China
Senior Editor
Huan Luo
Peking University, Beijing, China

Reviewer #1 (Public review):

Summary:

This manuscript examines whether retrieval practice protects memory-based inference from acute stress and proposes rapid neural reactivation of a bridging memory element as the underlying mechanism. Using a two-day associative inference paradigm combined with EEG decoding, the authors report that stress impairs inference accuracy and speed, while retrieval practice eliminates these deficits and restores neural signatures associated with bridge-element reactivation. The study addresses an important and timely question by integrating research on retrieval-based learning, stress effects on memory, and neural dynamics of inference. While the work provides promising multi-level evidence linking behavioral and neural findings, limitations in experimental design, causal interpretation, and decoding specificity weaken the strength of the mechanistic claims and suggest that further work is needed to disentangle strengthened associative memory from inference-specific protection effects.

Strengths:

(1) Strong theoretical integration
The study integrates three influential frameworks: memory integration through associative inference, stress-induced retrieval impairment, and the testing effect. The authors present a clear theoretical narrative linking these domains and derive testable hypotheses that retrieval practice protects inference by strengthening neural reactivation of a bridge element. The conceptual framing is well-grounded in prior literature and addresses an important gap regarding neural dynamics during inference.

(2) Multi-level evidence
The study provides converging behavioral and neural evidence. The authors demonstrate that stress reduces inference accuracy and speed, while retrieval practice eliminates these deficits. EEG decoding further suggests that bridge element reactivation predicts successful inference. The combination of behavioral performance and neural decoding strengthens the overall argument.

(3) Transparent experimental implementation
The procedures are described in substantial detail, including stimulus construction, stress manipulation, and decoding pipelines. Data and code availability are also strengths, facilitating reproducibility.

Weaknesses:

(1) Insufficient evidence that retrieval practice specifically protects inference rather than strengthening associative memories

A central claim of the manuscript is that retrieval practice specifically protects inference ability rather than simply strengthening underlying associative memories. However, the current data do not convincingly distinguish between these possibilities. Although the authors limited analyses to trials in which AB and BC pairs were correctly retrieved in the subsequent memory test, this procedure does not fully rule out the possibility that improved inference performance reflects stronger base associative memories rather than enhanced integrative processes.

Importantly, the direct memory retrieval test used a two-alternative forced-choice (2AFC) format, which inherently allows a substantial proportion of correct responses to arise from guessing. Consequently, trials classified as "successfully retrieved" may still include weak associative memory traces, making it difficult to conclude that failures in inference reflect deficits in integration rather than incomplete associative learning.

The authors further argue that retrieval practice does not improve inference in the absence of stress, suggesting independence between inference and associative memory strength. However, this null effect does not sufficiently rule out mediation through strengthened premise memory. A factorial design and/or mediation analysis would be necessary to determine whether inference resilience emerges independently of premise memory strength.
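To make the guessing concern concrete, here is a minimal sketch. It assumes a simple high-threshold model in which a premise pair is either known (always answered correctly) or unknown (guessed at 0.5 on the 2AFC test), with AB and BC treated as independent. The model, the function names, and the example value are illustrative assumptions and are not taken from the manuscript.

```python
def p_correct_2afc(p_known: float) -> float:
    """Probability of a correct 2AFC answer: known items are always
    correct, unknown items are guessed with probability 0.5."""
    return p_known + (1.0 - p_known) * 0.5


def share_with_a_guess(p_known: float) -> float:
    """Among trials with BOTH premises (AB and BC) answered correctly,
    the share in which at least one answer was a lucky guess."""
    both_correct = p_correct_2afc(p_known) ** 2  # AB and BC both answered correctly
    both_known = p_known ** 2                    # AB and BC both truly known
    return 1.0 - both_known / both_correct


# Hypothetical example: 60% of premise pairs are truly known.
example_share = share_with_a_guess(0.6)
```

Under these assumptions, if 60% of premise pairs are truly known, roughly 44% of the trials classified as having both premises "successfully retrieved" would include at least one guessed premise.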

(2) Apparent below-chance inference performance raises interpretational concerns

One surprising aspect of the results is that inference performance across experiments and groups appears to fall below the theoretical chance level (0.33) in Figure 4A. This is particularly unexpected because analyses were restricted to trials in which participants correctly retrieved both AB and BC associations.

If performance is indeed below chance, this raises concerns regarding whether participants fully understood the task instructions or whether other methodological factors influenced performance. Additionally, below-chance performance complicates the interpretation of subsequent behavioral and neural analyses. It is possible that this reflects my misunderstanding of the figure; therefore, clarification from the authors regarding how inference accuracy is calculated and presented would be helpful.

(3) Between-experiment implementation of retrieval practice weakens causal inference

The retrieval practice manipulation was implemented as a separate experiment rather than as part of a factorial design. Experiment 2 was conducted after results from Experiment 1 were known, and the authors acknowledge this post hoc decision. This design introduces several potential confounds, including cohort differences between experiments, possible differences in participant motivation or task familiarity, and reduced ability to rigorously test interaction effects.

Although the authors combined data across experiments to test interactions between stress and retrieval practice, such post hoc aggregation cannot fully substitute for a factorial design. A within-experiment 2 × 2 design (Stress × Retrieval Practice) would provide substantially stronger causal evidence and reduce confounding influences.

(4) Lack of an appropriate comparison condition for retrieval practice limits the interpretation of the mechanism

Although acknowledged briefly in the discussion, the absence of an appropriate comparison condition for retrieval practice represents a critical limitation. Without a matched re-exposure or restudy control condition, it remains unclear whether observed benefits are attributable specifically to retrieval practice or to additional exposure to AB and BC associations.

Furthermore, it is unclear whether retrieval practice operates at the trial level or the participant level. Retrieval practice could enhance memory representations for specific practiced items, making those trials more resistant to stress, or it could induce a more global change in cognitive strategy or stress resilience across participants. One way to address this issue would be to analyze inference performance separately for trials that were successfully retrieved during the retrieval practice phase versus those that were not.

(5) Interpretation of EEG decoding as bridge-element reactivation may be overstated

The neural decoding results form the mechanistic foundation of the manuscript; however, the interpretation that decoding reflects reactivation of specific bridging memories may be overstated. The classifier distinguishes between face and building categories, and because the bridging element belongs to one of these categories, successful decoding may reflect category-level semantic activation rather than reinstatement of item-specific episodic representations.

Alternative explanations include category-level retrieval, strategic task differences, or even attentional biases. Because only two categories were used, the decoding analysis lacks the specificity necessary to distinguish between category-level and item-level reactivation. As such, conclusions regarding the reinstatement of specific bridging memories should be tempered or supported with additional analyses.

https://doi.org/10.7554/eLife.110350.1.sa3

Reviewer #2 (Public review):

Summary:

Guo et al. investigate the neural and behavioral mechanisms of stress-induced impairments in memory-based inference. Across two well-powered experiments (N=136), the authors demonstrate that acute stress disrupts the rapid neural reactivation of "bridge" elements necessary for novel inferences. Crucially, they identify retrieval practice as a robust behavioral buffer that restores both inferential performance and the underlying neural signatures of memory reactivation.

Strengths:

(1) The use of two independent experiments provides high confidence in the behavioral findings.

(2) Utilizing time-resolved EEG decoding allows the authors to pinpoint the "online" moment of inferential failure, a significant advancement over the lower temporal resolution of fMRI.

Weaknesses:

(1) The authors correctly timed the inference task to begin approximately 20 minutes after the onset of the stressor. While this window aligns with the expected peak of the glucocorticoid (HPA) response, it also represents a period where the rapid adrenergic (SAM) response, confirmed by heart rate elevation, is still highly influential. As the authors acknowledge, because they did not collect saliva samples due to safety protocols, they cannot definitively separate the influence of peak cortisol from the tail-end of the adrenergic surge on the observed memory impairments.

(2) Figures 4 and 6: Without asterisks, it is really difficult to compare the significant group differences.

Appraisal and Impact:

This study provides high-quality evidence that acute stress impairs the rapid neural reactivation of "bridge" elements necessary for novel memory-based inferences. By leveraging the high temporal resolution of EEG decoding, the authors identify the specific neural "chokepoint" where inferential failure occurs. The research is strengthened by two independent experiments and the identification of retrieval practice as a powerful buffer that not only preserves but also enhances neural reactivation under pressure. The findings have significant implications for both cognitive neuroscience and applied learning science.

https://doi.org/10.7554/eLife.110350.1.sa2

Reviewer #3 (Public review):

Summary:

In this study, Guo and colleagues investigated the effects of stress and retrieval practice on memory inference. In the first experiment, they found that memory inference was significantly worse after induced stress. Conversely, when participants received retrieval practice in the second experiment, they found no significant differences between these conditions. They monitored EEG during the inference phase and applied multivariate decoding analysis to examine evidence of neural reactivation. Complementing the behavioural findings of the first experiment, they found that they were able to decode the stimulus category of the inference item with more fidelity in the no stress condition. Surprisingly, they found the opposite direction when participants had retrieval practice, with stronger evidence of reactivation in the stress condition than in the control condition.

Strengths:

(1) The authors have carefully designed two studies investigating the effects of stress and memory retrieval on memory inference.

(2) The use of multivariate decoding on the inference phase data sheds new light on how stress and retrieval may impact the neural signatures of inference processing.

Weaknesses:

(1) There are some key gaps in the reporting of the data. In particular, data is missing on how many trials were included in the inference phase and how many were retrieved in the direct memory task. This is important to know as the main conclusions are based on inference trials proportional to the direct retrieval trials. Considering that the direct retrieval performance differs significantly between the experiments, there could be issues with floor/ceiling effects (in the behaviour) and statistical power (in the EEG results) that confound the comparisons between experiments. Without the data, it is difficult to draw conclusions.

(2) There are some relatively strong conclusions drawn without the data to support them. An important example is the title suggesting a mechanistic role of memory reactivation for these effects; however, the data instead suggest a relationship between successful inference and evidence of reactivation. Additionally, one-tailed t-tests have been used in follow-up tests, and, as I understand it, no multiple comparisons corrections have been applied to the post-hoc tests, suggesting that these findings should be interpreted with caution.

(3) In places, the structure is unclear, making the narrative difficult to follow, often making it necessary for the reader to go back and forth between the sections to understand the study and analyses. I have made some recommendations for how to improve this.

https://doi.org/10.7554/eLife.110350.1.sa1

Author response:

Public reviews:

Reviewer #1 (Public review):

(1) We agree that the current design does not allow us to cleanly dissociate whether the beneficial effect of retrieval practice on AC inference under stress reflects a selective enhancement of inferential processing or, instead, stronger memory for the underlying AB and BC premise pairs that supports later inference. We plan to revise the manuscript to remove wording that could be read as claiming that retrieval practice specifically protects inference independently of associative-memory strengthening.

Our intended interpretation is more modest. As shown in Section 3.2.3, retrieval practice improved direct premise-memory performance, consistent with the well-established testing effect. In the present paradigm, successful AC inference necessarily depends on access to the AB and BC premise associations. Accordingly, strengthened premise memory is not an alternative explanation that can be excluded by our data, but rather a plausible mechanism through which retrieval practice may promote more resilient inference performance under stress.

Because AC inference in our paradigm necessarily depends on retrieving and linking the AB and BC premise pairs, strengthened premise memory is not merely a competing explanation that can be separated from inference performance in the current dataset. Rather, it is a plausible mechanism through which retrieval practice may support inference, especially under stress. We therefore will revise the manuscript to avoid implying that retrieval practice protects inferential processing independently of associative-memory strengthening, and instead interpret the effect more conservatively as reflecting enhanced premise representations and/or more effective reactivation of bridge information during inference.

We also agree that the post-inference direct memory test, which used a 2AFC format, provides only a coarse measure of premise-memory strength and allows some proportion of correct responses to arise from guessing. Therefore, restricting analyses to trials in which AB and BC were later answered correctly does not fully guarantee that those trials were supported by strong associative memories. We will acknowledge this limitation explicitly in the manuscript and will temper our interpretation of these “successfully retrieved” premise trials accordingly. More stringent measures, such as cued recall, confidence-based memory judgments, or other continuous indices of premise-memory strength, would be better suited to this question in future work.

Finally, we agree that the absence of a retrieval-practice benefit in the non-stress condition does not by itself rule out mediation through strengthened premise memory. Because the retrieval-practice manipulation was introduced in a follow-up study after completion of Study 1, the present dataset was not designed as a single fully crossed factorial experiment. In response to the reviewer’s suggestion, we will add an exploratory mediation analysis testing whether premise-memory performance statistically accounts for the relationship between retrieval practice and inference performance. We will report this analysis cautiously, given that premise memory was assessed using a post-inference 2AFC measure, and we will note in the manuscript that a future fully crossed design with more sensitive premise-memory measures will be needed for a stronger test.
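As a rough illustration of what such an exploratory mediation analysis estimates, the sketch below uses the standard product-of-coefficients approach with ordinary least squares. It is not the authors' planned pipeline: the variable roles (X = retrieval practice group, M = premise-memory score, Y = inference score), the function names, and the toy numbers are all assumptions made for illustration, and a real analysis would add bootstrap confidence intervals for the indirect effect.

```python
def mean(values):
    return sum(values) / len(values)


def slope(x, y):
    """OLS slope of y regressed on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def residuals(x, y):
    """Residuals of y after regressing it on x (with an intercept)."""
    b = slope(x, y)
    a = mean(y) - b * mean(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def mediation(x, m, y):
    """Product-of-coefficients mediation: X -> M -> Y."""
    a = slope(x, m)            # path a: effect of X on the mediator M
    m_res = residuals(x, m)    # part of M not explained by X
    b = slope(m_res, y)        # path b: effect of M on Y controlling for X
    total = slope(x, y)        # total effect of X on Y
    indirect = a * b           # effect transmitted through M
    return {"a": a, "b": b, "indirect": indirect,
            "direct": total - indirect, "total": total}


# Toy data (hypothetical): X is a 0/1 group code, Y = 1*X + 3*M exactly.
x_toy = [0, 0, 0, 0, 1, 1, 1, 1]
m_toy = [-0.5, 0.5, -0.5, 0.5, 1.5, 2.5, 1.5, 2.5]
y_toy = [xi + 3 * mi for xi, mi in zip(x_toy, m_toy)]
toy_result = mediation(x_toy, m_toy, y_toy)
```

In the toy data the total effect of about 7 splits into an indirect effect of about 6 (through M) and a direct effect of about 1, which is the kind of decomposition the mediation analysis would report.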

(2) We apologize that the presentation of Figure 4A was not sufficiently clear and may have created the impression of below-chance inference performance. The values shown in Figure 4A do not represent raw 3-alternative forced-choice (3AFC) A-C inference accuracy, for which the theoretical chance level would be 0.33. Instead, Figure 4A plots a normalized inference index, calculated as inference performance relative to direct retrieval performance, to account for individual differences in the availability of the directly learned premise pairs. Therefore, the raw 3AFC chance level is not the appropriate reference for interpreting this measure. To avoid this confusion, we will clarify in the revised manuscript and figure legend that Figure 4A shows a normalized inference index rather than raw inference accuracy.

(3) We agree that implementing retrieval practice in a separate experiment, rather than within a single 2 × 2 factorial design, limits the strength of the causal inference regarding retrieval practice and reduces our ability to formally test the retrieval practice × stress interaction within one unified design.

In response, we will revise the manuscript to more explicitly acknowledge this limitation and to temper our interpretation throughout. Specifically, we will avoid overstating retrieval practice as definitively preventing the effects of stress, and will instead describe the findings more cautiously as evidence that retrieval practice was associated with attenuation of stress-related inference impairments across experiments. We will also add a limitation statement in the Discussion noting that the current design cannot fully rule out cohort-related confounds and that a fully crossed factorial design will be necessary in future work to provide a more rigorous test of the interaction between retrieval practice and stress.

At the same time, we will clarify that the two experiments were conducted under closely matched conditions: participants were recruited using the same protocol from the same campus population, demographic characteristics were matched, and both experiments were run in the same laboratory using the same EEG system, task procedures, and experimenter team. We agree, however, that these procedural consistencies reduce but do not eliminate the concern about between-experiment confounds.

(4) We agree that the absence of a matched re-exposure/restudy control condition limits the mechanistic interpretation of the retrieval-practice effect. In the revised manuscript, we will make this limitation more explicit in the Discussion and temper our conclusions accordingly. Specifically, we will clarify that the present design shows that a post-encoding retrieval-practice intervention buffered the impact of acute stress on later inference, but it does not allow us to determine whether this benefit is specific to retrieval practice per se, rather than to additional exposure to the AB and BC associations.

We also agree that it is important to distinguish whether the effect operates at the level of specific practiced items or reflects a more global participant-level effect. In the current study, however, the retrieval-practice phase in Experiment 2 was implemented as a brief timed free-recall procedure rather than a trial-by-trial cued retrieval task, and the available records do not allow us to reliably link retrieval-practice success for individual associations to specific later AC inference trials. Therefore, we cannot directly compare later inference performance for successfully versus unsuccessfully retrieved items on a trial-by-trial basis.

To address this issue as far as possible with the current dataset, we instead plan to conduct an additional item-level robustness analysis using mixed-effects models that account for variability across ABC associations. Specifically, we will test whether the critical stress-by-retrieval-practice effect remains after modeling triad-level variability, and whether there is evidence that this effect differs substantially across triads. This analysis does not provide a direct test of whether successfully retrieved items benefit more than unsuccessfully retrieved items, but it does help assess whether the observed effect is broadly distributed across associations or driven by only a small subset of items.

(5) We agree that our current decoding approach does not justify a strong claim of item-specific reinstatement of a unique bridge memory. The classifier was trained to discriminate stimulus categories (faces vs. buildings) in the independent localizer and then applied during the inference phase. Therefore, the present analysis is better interpreted as indexing reactivation of bridge-related category information, rather than reinstatement of an item-specific episodic representation.

Importantly, however, we believe this signal remains theoretically informative for the inferential process examined here. In our design, the bridge element B belonged to one of the trained categories, and the classifier was applied during the cue period when no face or building stimulus was physically present. Thus, successful decoding in this time window suggests that task-relevant bridge-related information was re-expressed online during inference, rather than reflecting concurrent perceptual processing. At the same time, we agree that, because only two categories were used, the decoding analysis cannot fully dissociate bridge-related category reactivation from broader category-level retrieval, strategic task differences, or attentional contributions.

To address this concern, we plan to revise the manuscript in three ways. First, we will soften the interpretation throughout the Results and Discussion to avoid claims of item-specific bridge-memory reinstatement. Second, we will refer to the decoding effect more conservatively as bridge-related or category-level mnemonic reactivation during inference. Third, we will add an explicit limitation stating that the current design does not allow us to distinguish item-specific episodic reinstatement from category-level reactivation, and that future work using more fine-grained representational analyses and/or a larger stimulus set will be needed to resolve this issue more directly.
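The train-on-localizer, test-on-inference logic described in point (5) can be sketched in a few lines. This is only a toy illustration: it substitutes a nearest-centroid classifier for whatever classifier the authors actually used, and the data layout, function names, and numbers are hypothetical. It also makes the specificity limitation visible, since the model only ever learns two category labels and therefore cannot separate item-level from category-level reactivation.

```python
def centroid(trials):
    """Mean channel pattern across trials (each trial is a list of channel values)."""
    n = len(trials)
    return [sum(trial[ch] for trial in trials) / n for ch in range(len(trials[0]))]


def train(localizer_faces, localizer_buildings):
    """Fit one centroid per category from localizer trials only."""
    return {"face": centroid(localizer_faces),
            "building": centroid(localizer_buildings)}


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(model, pattern):
    """Return the category whose centroid is closest to the pattern."""
    return min(model, key=lambda label: squared_distance(model[label], pattern))


def decode_over_time(model, inference_trials, bridge_categories):
    """inference_trials[i][t] is the channel pattern of trial i at time point t.
    Returns, per time point, the share of trials decoded as the bridge category."""
    n_times = len(inference_trials[0])
    accuracy = []
    for t in range(n_times):
        hits = sum(predict(model, trial[t]) == category
                   for trial, category in zip(inference_trials, bridge_categories))
        accuracy.append(hits / len(inference_trials))
    return accuracy


# Toy data (hypothetical): two channels, two localizer trials per category.
toy_model = train([[1.0, 0.0], [1.2, 0.1]], [[0.0, 1.0], [0.1, 1.2]])
toy_trials = [
    [[0.0, 0.9], [1.0, 0.1]],  # trial with a face bridge, two time points
    [[0.1, 1.0], [0.0, 1.1]],  # trial with a building bridge, two time points
]
toy_accuracy = decode_over_time(toy_model, toy_trials, ["face", "building"])
```

Because no face or building is on screen during the cue period, above-chance values in such a time course are read as reactivated category information rather than perception.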

Reviewer #2 (Public review):

(1) We agree with this important point. The inference task was scheduled to begin approximately 20 minutes after stress onset based on prior human stress literature, with the intention of probing a time window commonly associated with glucocorticoid effects. However, as the reviewer notes, this period may also still reflect residual adrenergic/SAM influences. Because salivary cortisol was not collected due to the COVID-19-related safety protocol, we cannot disentangle the relative contributions of glucocorticoid and adrenergic responses to the observed stress-related effects on inference and neural reactivation. We will revise the manuscript to make this limitation more explicit in the Discussion and to avoid attributing the effects to a specific physiological component of the stress response.

(2) In the revised manuscript, we will add asterisks (or equivalent significance annotations) to Figures 4 and 6 to improve clarity and readability.

Reviewer #3 (Public review):

(1) We thank the reviewer for highlighting this important reporting issue. We agree that the number of trials contributing to the behavioral and EEG analyses should be reported more explicitly, particularly because inference performance was analyzed in relation to direct retrieval performance and because direct retrieval differed across experiments.

In the revised manuscript, we will report, for each group and experiment, the number of trials presented in the AC inference phase, the number of trials retained for the behavioral analyses, and the number of successfully retrieved direct-memory trials in the AB and BC tasks. These values will be summarized in the revised Results section and in Supplementary Tables.

To directly address the reviewer’s concern, we will also compare trial counts across groups/experiments and evaluate whether differences in direct retrieval performance could account for the inference and EEG effects. To further address the concern about potentially unequal trial numbers, we plan to repeat the analyses on trial-count-matched subsets to see whether the results remain qualitatively unchanged.
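One simple way to build trial-count-matched subsets is random subsampling down to the smallest group, as sketched here. The group names, the seed handling, and the function name are assumptions for illustration and do not describe the authors' actual procedure.

```python
import random


def match_trial_counts(trials_by_group, seed=0):
    """Randomly subsample every group down to the size of the smallest group.

    trials_by_group maps a group label to a list of trial identifiers.
    A fixed seed keeps the subsample reproducible.
    """
    rng = random.Random(seed)
    n_min = min(len(trials) for trials in trials_by_group.values())
    return {group: rng.sample(trials, n_min)
            for group, trials in trials_by_group.items()}


# Hypothetical trial identifiers for two groups with unequal counts.
toy_groups = {"stress": list(range(30)), "control": list(range(100, 142))}
toy_matched = match_trial_counts(toy_groups, seed=1)
```

In practice the subsampling would usually be repeated over many seeds and the statistic of interest averaged, so that the result does not depend on a single random draw.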

(2) We thank the reviewer for this important comment. We agree that our original title and some parts of the manuscript used language that was stronger than warranted by the data. Our results show that rapid reactivation of the bridge element is associated with successful inference and is modulated by stress and retrieval practice, but they do not by themselves establish a causal mechanistic role for reactivation. We therefore plan to revise the title and soften the relevant wording throughout the manuscript to better reflect the correlational nature of this evidence.

Specifically, we plan to change the title from “Retrieval practice prevents stress-induced inference impairment by restoring rapid memory reactivation” to, for example, “Retrieval practice prevents stress-induced inference impairment and preserves rapid bridge-item memory reactivation.” We will also revise the Abstract, Results, and Discussion to replace stronger mechanistic wording such as “prevents,” “restoring,” and “essential neural mechanism” with more cautious phrasing such as “buffers” or “attenuates,” “preserves” or “is associated with,” and “neural correlate” or “candidate process,” as appropriate. This revision will lead us to temper the overall interpretation of the EEG findings: rather than claiming that reactivation is the mechanism by which retrieval practice prevents stress-related inference deficits, we will conclude that rapid bridge-item reactivation is a neural correlate of successful inference that is sensitive to stress and enhanced by retrieval practice.

We also appreciate the reviewer’s concern regarding the use of one-tailed follow-up tests and the absence of multiple-comparison correction. With respect to the one-tailed t-tests, these follow-up comparisons were conducted because the relevant hypotheses were directional a priori. Based on prior work and our theoretical framework, we specifically predicted that acute stress would impair inference-related performance and neural reactivation, and that retrieval practice would mitigate these effects. The follow-up tests were therefore not exploratory post-hoc comparisons, but planned tests used to decompose the significant omnibus effects in the predicted direction. For this reason, we considered one-tailed testing appropriate for these comparisons.

Similarly, we did not apply an additional multiple-comparison correction to these planned follow-up tests because they were limited in number, theory-driven, and conducted to evaluate specific directional predictions rather than to search broadly across many possible contrasts. Importantly, our interpretation does not depend on any isolated post-hoc comparison, but on the consistency of the results across behavioral inference measures, neural decoding of bridge-item reactivation, and theta-band analyses. We will revise the manuscript to make this rationale clearer and to ensure that the follow-up results are interpreted in the context of the full pattern of evidence.
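For readers weighing the two positions on follow-up testing, the sketch below shows how much a set of p-values can shift when one-tailed values are converted to two-tailed ones and a Holm-Bonferroni correction is applied. The p-values are invented for illustration and are not taken from the manuscript. The doubling rule assumes a symmetric test statistic (such as t) with the effect in the predicted direction.

```python
def two_tailed_from_one_tailed(p_one_tailed):
    """Two-tailed p-value for a symmetric statistic whose observed effect
    lies in the predicted direction."""
    return min(1.0, 2.0 * p_one_tailed)


def holm_adjust(p_values):
    """Holm-Bonferroni step-down adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Adjusted values must not decrease as the raw p-values increase.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical one-tailed p-values from three planned follow-up tests.
one_tailed = [0.01, 0.04, 0.03]
two_tailed = [two_tailed_from_one_tailed(p) for p in one_tailed]
holm_on_one_tailed = holm_adjust(one_tailed)
```

With these invented values, two of the three tests fall below 0.05 one-tailed and uncorrected only, which is why the choice of tails and correction matters for tests near the threshold.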

(3) We agree that, in the previous version, parts of the manuscript were not structured clearly enough, which may have made it difficult for readers to follow the logic of the study and the sequence of analyses without moving back and forth across sections. In the revised manuscript, we will reorganize the presentation to improve the overall narrative flow and readability. Specifically, we plan to clarify the study logic and analysis sequence, strengthen transitions between sections, and revise the relevant text in line with Reviewer #3’s detailed suggestions.

https://doi.org/10.7554/eLife.110350.1.sa0
