Retrieval practice facilitates memory updating by enhancing and differentiating medial prefrontal cortex representations

  1. Zhifang Ye
  2. Liang Shi
  3. Anqi Li
  4. Chuansheng Chen
  5. Gui Xue (corresponding author)
  1. State Key Laboratory of Cognitive Neuroscience and Learning & IDG/McGovern Institute of Brain Research, Beijing Normal University, China
  2. Department of Psychological Science, University of California, Irvine, United States
Research Article
Cite this article as: eLife 2020;9:e57023 doi: 10.7554/eLife.57023

Abstract

Updating old memories with new, more current information is critical for human survival, yet the neural mechanisms for memory updating in general and the effect of retrieval practice in particular are poorly understood. Using a three-day A-B/A-C memory updating paradigm, we found that compared to restudy, retrieval practice could strengthen new A-C memories and reduce old A-B memory intrusion, but did not suppress A-B memories. Neural activation pattern analysis revealed that compared to restudy, retrieval practice led to stronger target representation in the medial prefrontal cortex (MPFC) during the final test. Critically, it was only under the retrieval practice condition that the MPFC showed strong and comparable competitor evidence for both correct and incorrect trials during final test, and that the MPFC target representation during updating was predictive of subsequent memory. These results suggest that retrieval practice is able to facilitate memory updating by strongly engaging MPFC mechanisms in memory integration, differentiation and consolidation.

Introduction

Being able to remember and retain the most current information in a dynamically changing world requires the capacity to update one’s memory in a goal-directed manner. Updating occurs when some information is downgraded as outdated or irrelevant, and newer information is promoted as its replacement. Examples range from something as simple as trying to remember one’s new home address and phone number, to replacing maladaptive memories with adaptive ones in a therapeutic setting. Successful updating of memory involves strengthening the more current memory trace, weakening the older information, and/or differentiating old and new memories, in order to ensure that the old memory is less interfering.

Ample behavioral evidence suggests that although memory updating can be fostered by repeatedly studying the new replacement information, memory is more successfully updated by the act of retrieving the new knowledge via self-tests, a process called retrieval practice (Roediger and Butler, 2011). Compared to simple restudy of the same material again, retrieval of learnt information leads not only to better retention of relevant memory when no obvious interference is involved (Karpicke and Roediger, 2008; Pyc and Rawson, 2010), but also to better inhibition of competing, outdated memories (Anderson et al., 1994), reduced proactive interference (Szpunar et al., 2008), enhanced memory integration (Hupbach et al., 2007), and greater susceptibility to subsequent modification (Chan et al., 2009). These findings indicate that retrieval practice modifies the state of memory according to mnemonic goals. Despite this ubiquitous behavioral effect, the neural basis for memory updating and the mechanism that underlies the retrieval-practice benefit remain unclear.

A large body of work on memory reconsolidation (Dudai, 2006) suggests that consolidated memories, when retrieved and reactivated, enter into a transient, labile state, rendering them vulnerable to modification (Lee et al., 2017). Electrical shock (Kroes et al., 2014) and pharmacological treatments (Kindt et al., 2009; Nader et al., 2000) that block protein synthesis can cause long-lasting impairment of existing memories when they are reactivated. Reactivated memories can also be disrupted by behavioral methods, including retrieval-extinction (Schiller et al., 2010; Xue et al., 2012), counterconditioning (Goltseker et al., 2017), and interference approaches (James et al., 2015), which have been found to reduce the expression of fear or drug memories and amygdala responses associated with fear (Agren et al., 2012). Using representational similarity analysis, recent studies further show that the reactivated memories could be selectively strengthened (Jonker et al., 2018; Lu et al., 2015; Xue et al., 2010), weakened (Wimber et al., 2015), integrated (Schlichting et al., 2015), and/or differentiated (Hulbert and Norman, 2015), depending on the mnemonic goals and the characteristics of the reactivated memory (Tambini and Davachi, 2019).

Emerging studies suggest that the medial prefrontal cortex (MPFC) probably plays an important role in the rapid formation of cortical memories, especially during retrieval practice. Specifically, it has been hypothesized that retrieval practice will reactivate related memory traces, and that the MPFC is able to develop integrated neocortical representations of these memory traces rapidly, in a way that resembles the characteristics of rapid system consolidation (Antony et al., 2017). Consistently, the MPFC has been found to be involved in the integration and updating of reactivated memory traces (Gilboa and Marlatte, 2017; Preston and Eichenbaum, 2013). For example, the MPFC is critically involved in encoding novel but related information into existing knowledge (Sommer, 2016), in representing overlapping memories (Tompary and Davachi, 2017), and in inferring relationships between distinct events that share common features (Zeithamova et al., 2012). In these studies, the reactivation of related memories has been found to be important for the MPFC-mediated processes. Nevertheless, it is unclear how the MPFC would be involved when competing memories were reactivated.

Several studies suggest that the lateral prefrontal cortex (LPFC) may play a role in regulating reactivated memories to support memory changes. First, the LPFC could bias the competition and reduce the intrusion of unwanted memory in later memory retrieval (Kuhl et al., 2012). Intentional suppression of memory retrieval reduces hippocampal activity via control mechanisms mediated by the LPFC (Hulbert et al., 2016). These studies suggest an important role of LPFC in control of memory. Second, studies examining retrieval-induced forgetting (Anderson et al., 1994; Norman et al., 2007) have found that retrieval practice could suppress (Wimber et al., 2015) or differentiate (Hulbert and Norman, 2015) the neural representation of competitive memories in the sensory cortices or hippocampus, and that these processes are mediated by the LPFC. Third, reactivation during wakefulness has been found to destabilize memories and has been associated with LPFC activation (Diekelmann et al., 2011), supporting its role in modifying reactivated memory representations.

The present study compared the neural reactivation during retrieval practice with that during restudy, and examined how the reactivated memory representations interact with the lateral PFC to achieve subsequent memory updating. In the Restudy condition, we simply asked subjects to study new replacement memories repeatedly. In the retrieval practice (RetPrac) condition, we asked subjects to retrieve the new replacement memory repeatedly. Feedback was provided as previous studies have shown that it could boost the behavioral performance (Butler and Roediger, 2008; Pashler et al., 2005). We hypothesized that during updating, retrieval practice would elicit greater reactivation of the outdated competitor (i.e., B) than would restudy exposures, reflecting the greater tendency of retrieval to trigger competition that requires correction. Despite these added difficulties, we predicted that during the final memory test, retrieval practice would lead to a better long-term effect: stronger reactivation of the replacement memory (i.e., the target, C) as well as weaker reactivation of the outdated competitor.

Results

We used a multi-day design to examine both the short-term and long-term effects of memory updating under these two conditions. On Day 1, we extensively trained 19 subjects on associations between words (A) and pictures (B). On Day 2, we introduced new replacement A-C associations (B and C were of different visual categories) and asked subjects to replace the old associations with the new ones before entering the scanner. In the scanner, subjects encountered the new A-C associations three times, either via retrieval practice (RetPrac) or extra study exposures (Restudy). For RetPrac, the cue word was paired with a black rectangle, and subjects were asked to recall the picture associated with the word. When the black rectangle turned red after 2 s, they were asked to judge the category of picture C by pressing one of the four buttons corresponding to Face, Object, Scene, and Don’t know. Then the correct picture C was shown on the screen for 1 s as feedback (Figure 1A, upper panel). For Restudy, the procedure was identical except that the cue word was paired with the correct picture C at the beginning of the trial and subjects were asked to memorize the association and then make the category judgment. After these updating trials on Day 2, we waited for 24 hr to probe how successful retrieval practice and restudy were at accomplishing long-lasting memory updating. On Day 3, we scanned subjects again using a cued recall test. Subjects were asked to recall the visual details of the picture associated with the cue word presented on the screen, and then to respond by pressing one of the four buttons corresponding to Face, Object, Scene, and Don’t know. They were then asked to perform a perceptual orientation judgment task for 8 s (Figure 1A, lower panel) before the next trial started.

Figure 1. Experimental design and behavioral results.

(A) Experimental design. On Day 1, subjects were over-trained with 144 pairs of word (A) – picture (B) associations. On Day 2, subjects were introduced to 144 new A-C associations (B and C were always from different visual categories) and asked to replace the old associations with the new ones before entering the scanner. In the scanner, each new A-C association was studied three times under one of the two updating conditions: Retrieval Practice (RetPrac) vs Restudy. Each trial started with a recall phase showing the cue word A paired either with a black rectangle (RetPrac) or with the associated picture C (Restudy). Two seconds after the start of the recall phase, a red rectangle lasting for 1 s was shown and subjects needed to judge the category of picture C within this response window. Then the correct picture C was shown on the screen for 1 s as feedback. On Day 3, subjects performed the A-C memory test while being scanned. The recall phase lasted for 3 s followed by a 1 s response window. Then subjects were asked to perform a perceptual orientation judgment task for 8 s. (B) Proportions of target responses (correctly choosing A-C), competitor responses (wrongly choosing A-B), and 'other' responses for each of the three study repetitions during updating. (C) The proportions of recalled targets, competitors, and 'other' categories during the final memory test one day after updating practice.
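As a reading aid, the trial timings given in this caption can be laid out as a small data structure. This is a minimal sketch, not the authors' presentation code: the `Phase` class and the constant names are hypothetical, and any inter-trial interval or jitter, which the text does not report, is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    """One phase of a trial (hypothetical helper type)."""
    name: str
    seconds: float


# Durations come from the Figure 1A description. Inter-trial intervals
# and jitter are not stated in the text, so they are omitted here.
DAY2_UPDATING_TRIAL = (
    Phase("cue word + black rectangle (RetPrac) or picture C (Restudy)", 2.0),
    Phase("red rectangle: category response window", 1.0),
    Phase("feedback: correct picture C", 1.0),
)

DAY3_TEST_TRIAL = (
    Phase("cued recall of the associated picture", 3.0),
    Phase("category response window", 1.0),
    Phase("perceptual orientation judgment task", 8.0),
)


def trial_duration(phases):
    """Sum the phase durations of one trial, in seconds."""
    return sum(phase.seconds for phase in phases)


print(trial_duration(DAY2_UPDATING_TRIAL))  # 4.0
print(trial_duration(DAY3_TEST_TRIAL))      # 12.0
```

Under these assumptions, an updating trial occupies 4 s of stated events and a final-test trial 12 s.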

Retrieval practice benefited long-lasting memory updating

Consistent with our hypothesis, we found that retrieval practice was a superior method for long-lasting memory updating. During the final memory test administered on Day 3, subjects recalled more updated targets (t(18) = 5.37, p<0.001, Cohen’s d = 1.30, Figure 1C) with shorter response time (t(18) = −4.13, p<0.001, Cohen’s d = 0.35), and showed fewer memory intrusions (i.e., accidental recall of the outdated competitors) (t(18) = −4.47, p<0.001, Cohen’s d = 1.04) for pairs updated under the RetPrac condition compared to those updated under the Restudy condition. These findings indicate a significant self-testing effect in memory updating. Notably, the added difficulty of retrieval practice in the short term may have benefited the long-term test. During acquisition of the replacement memory in the Day 2 updating phase, retrieval practice performance was initially near chance level, but it improved dramatically across the three practice repetitions, as indicated by increased ability to recall the target response category (F(2, 36)=137.11, p<0.001), and by decreased production of responses in the competitor category (F(2, 36)=9.82, p<0.001) or 'other' responses (F(2, 36)=42.69, p<0.001) (Figure 1B). By contrast, behavioral performance for the restudy trials was near perfect across the three repetitions, reflecting the ease of processing material that is simply presented for restudy when no retrieval demand is involved (accuracy = 99.0%, and no difference across repetitions, F(2, 36)=0.42, p=0.66).
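The within-subject comparisons reported above (t(18) with 19 subjects) have the form of paired t-tests. The sketch below shows that computation on per-subject scores. It is an illustration only: the function name and the toy numbers are hypothetical, and the effect size computed here is d_z (mean difference divided by the standard deviation of the differences), which is an assumption because the text does not say which variant of Cohen's d was reported.

```python
import math
from statistics import mean, stdev


def paired_t_and_dz(cond_a, cond_b):
    """Return (t, df, d_z) for paired per-subject scores.

    d_z is one of several Cohen's d variants; the article does not
    state which one it used, so this choice is an assumption.
    """
    if len(cond_a) != len(cond_b):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    sd = stdev(diffs)                      # sample SD (n - 1 denominator)
    t = mean(diffs) / (sd / math.sqrt(n))  # paired t statistic
    d_z = mean(diffs) / sd                 # standardized mean difference
    return t, n - 1, d_z


# Hypothetical toy data: proportion of targets recalled by five subjects.
retprac = [0.62, 0.55, 0.70, 0.58, 0.66]
restudy = [0.50, 0.52, 0.61, 0.49, 0.60]

t, df, d_z = paired_t_and_dz(retprac, restudy)
print(f"t({df}) = {t:.2f}, d_z = {d_z:.2f}")
```

With 19 subjects the same function would return 18 degrees of freedom, matching the t(18) values in the text.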

Retrieval practice facilitated memory updating via differentiation

The above analysis suggests that retrieval practice could enhance the accessibility of A-C memories and reduce that of A-B memories in the A-C memory test. Several mechanisms could account for this result: (1) strengthening of A-C memory (Karpicke and Roediger, 2008; Roediger and Butler, 2011); (2) weakening of A-B memory (Anderson et al., 1994; Levy and Anderson, 2002); and (3) differentiation of A-C and A-B memories (Hulbert and Norman, 2015; Storm et al., 2008). Previous studies suggest that reactivating the old memory and detecting and remembering the difference would help to resolve the proactive interference (van den Honert et al., 2016; Wahlheim and Jacoby, 2013). Thus, our results could also be explained by (4) integration and differentiation, which is a more specific version of the differentiation mechanism. To distinguish among these hypotheses, we performed two additional behavioral experiments (Exp. 2 and 3) to examine how retrieval practice affected A-B memory performance.

The inhibition hypothesis would predict greater C memory and weaker B memory intrusion in the A-C memory test, but worse B memory and greater C intrusion in the A-B memory test under the RetPrac condition than under the Restudy condition. By contrast, both the differentiation hypothesis and the integration and differentiation hypothesis would predict weaker B memory intrusion in the A-C memory test, and weaker or comparable C memory intrusion (given that C memory was strengthened) in the A-B memory test.

In the new experiment (n = 46, Exp. 2), the procedure was nearly identical to the main experiment except that during the Day 3 test, we asked subjects to do both A-B and A-C memory tests, without the perceptual orientation judgment task between trials. To examine the effect of test order, half of the subjects did the A-B memory test first and the other half did the A-C memory test first. For both A-C and A-B memory tests and each response type (target, competitor, 'other'), test order (A-C first, A-B first) by update method (RetPrac, Restudy) two-way ANOVA revealed no significant main effect of test order (p-values>0.43, except for a trend towards a significant effect in 'other' responses during the A-B test, p=0.07, FDR corrected), nor of the test order by update method interaction (p-values >0.54, FDR corrected) (Supplementary file 1). Given our focus on the effect of updating method and the lack of updating method by test order interaction, we thus combined the data from both test order groups in the following analyses to increase the statistical power. Replicating the main effect, we found greater A-C memory under the RetPrac condition than under the Restudy condition during the A-C memory test, as indicated by the significantly higher correct recall of targets (t(45) = 3.93, p<0.001, Cohen’s d = 0.57) and fewer competitor intrusions (t(45) = −2.75, p=0.008, Cohen’s d = 0.36) under the RetPrac than under the Restudy condition (Figure 2A). However, we found that A-B memory was very strong during A-B recall (55.9% correct) and showed no effect of updating method on either B memory (t(45) = 1.43, p=0.19) or C intrusion (t(45) = 0.80, p=0.43).

Figure 2. Results of follow-up behavioral Experiments 2 and 3.

(A) Behavioral results from Exp. 2. The proportions of recalled targets, competitors, and 'other' categories during the final A-C and A-B memory tests one day after updating practice. Note, the targets and competitors were referred to as A-C and A-B memory in the A-C memory test, but as A-B and A-C memory in A-B memory test. (B) The proportions of A-C memory intrusions during A-B recall, as determined by whether the correct A-C memory was recalled during the A-C test. (C) Joint analysis for both A-B and A-C memories revealed memory differentiation. (D) Behavioral results from Exp. 3. The proportions of correct item recall during the final A-C and A-B memory tests one day after updating practice.

This pattern was consistent with the differentiation hypothesis. That is, although C memory was strengthened by retrieval practice, no greater C intrusion was found in the A-B memory test. On the other hand, although B memory was comparable between the two conditions, there were fewer B intrusions in the A-C memory test. We noticed that the number of C memory intrusions was overall low, perhaps because of the weak A-C memory. We therefore did a further analysis to focus on strong A-C memory trials (correct trials in the A-C memory test), which would produce more intrusions. Consistently, we found that the correct trials in the A-C memory test (reflecting strong A-C memory) showed more C memory intrusions overall during the A-B test than did the incorrect trials (t(45) = 3.38, p=0.001, Cohen’s d = 0.71). Interestingly, the intrusion rate was numerically smaller in the RetPrac condition than in the Restudy condition (t(45) = −1.59, p=0.12), which is consistent with the differentiation hypothesis (Figure 2B).

To further test the differentiation hypothesis, we did a joint analysis to examine the subjects’ answers in both A-B and A-C memory tests given the same word cues. The differentiation hypothesis would predict more correct trials in both tests (i.e., C response in the A-C memory test and B response in the A-B memory test) under the RetPrac condition, i.e., that subjects maintained stronger and nonoverlapping representations of both A-B and A-C memories. Meanwhile, we would predict fewer trials in which subjects responded with old B memory in both tests (due to differentiation), and also fewer (due to differentiation) or comparable (due to strengthening of C and differentiation) trials in which subjects responded with new C memory in both memory tests. Our data supported all three predictions. We found that subjects made more correct responses in both tests under the RetPrac condition than under the Restudy condition (24.0% vs 19.7%, t(45) = 3.81, p<0.001, Cohen’s d = 0.48), but showed less B memory (21.0% vs 23.2%, t(45) = −2.61, p=0.012, Cohen’s d = 0.25), and comparable C memory (11.7% vs 11.5%, t(45) = 0.18, p=0.86) in both memory tests (Figure 2C).
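The joint analysis described above can be sketched as a simple tabulation over cue words. This is a hypothetical illustration, not the authors' analysis code: it assumes each response has already been coded relative to its own pair as 'B', 'C', or 'other', and the function names and toy trials are invented for the example.

```python
from collections import Counter

PATTERNS = ("both_correct", "B_in_both", "C_in_both", "other_pattern")


def joint_category(ac_response, ab_response):
    """Label one cue word by its responses in the A-C and A-B tests."""
    if ac_response == "C" and ab_response == "B":
        return "both_correct"   # correct in both tests
    if ac_response == "B" and ab_response == "B":
        return "B_in_both"      # old B memory given in both tests
    if ac_response == "C" and ab_response == "C":
        return "C_in_both"      # new C memory given in both tests
    return "other_pattern"


def joint_proportions(trials):
    """trials: list of (A-C test response, A-B test response) per cue word."""
    counts = Counter(joint_category(ac, ab) for ac, ab in trials)
    return {pattern: counts[pattern] / len(trials) for pattern in PATTERNS}


# Hypothetical responses for five cue words from one subject and condition.
toy_trials = [("C", "B"), ("C", "B"), ("B", "B"), ("C", "C"), ("other", "B")]
print(joint_proportions(toy_trials))
```

Computing these proportions separately for RetPrac and Restudy pairs gives the per-subject values that the paired comparisons in the text operate on.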

To further test whether RetPrac was able to modify the details of memory representation, in a third experiment (n = 28, Exp. 3), we asked the subjects to write down the name of the associated picture (or any associated details if they could not recall the exact name) for each cue word, instead of choosing one of the three categories by pressing a button. Only answers with the correct picture name or specific details were counted as correct item recall. Our findings again replicated the retrieval-practice effect on A-C memory, as indicated by the significantly higher correct item recall (t(27) = 8.06, p<0.001, Cohen’s d = 0.50) under the RetPrac condition than under the Restudy condition, but comparable performance on the A-B memory test (t(27) = 0.81, p=0.43) (Figure 2D). These results are consistent with the hypothesis that RetPrac helped to achieve better memory updating by differentiation and do not favor the idea that updating is accomplished by inhibition.

The lack of suppression was not due to the weak proactive interference

Could the lack of suppression be due to the weak A-B intrusion? Subjects were well trained in the A-B association. However, we observed only slightly more competitor intrusions than unrelated errors (Exp. 1: 0.232 vs 0.209, t(18) = 1.93, p=0.069; Exp. 2: 0.233 vs 0.207, t(45) = 2.72, p=0.009; Exp. 3: 0.221 vs 0.200, t(27) = 2.33, p=0.027). This could reflect the fact that subjects were told explicitly that the associations had been changed and new associations should be explored and learnt, and is consistent with the differentiation account.

In additional analyses, we examined how the number of Day 2 intrusions was related to Day 3 performance. We found that pairs with more competitor intrusions during memory updating had worse A-C memory, more A-B intrusions, and comparable 'other' responses during the final A-C memory test. This pattern was consistent in both Exp. 1 and Exp. 2 (Figure 2—figure supplement 1A,B, left panel). Similarly, pairs with more 'other' responses during memory updating had worse A-C memory, more 'other' responses, and comparable A-B intrusions during the final A-C memory test (Figure 2—figure supplement 1A,B, right panel). These results suggested that the strength of proactive interference affected the new A-C learning. Importantly, there were many more A-B intrusions than 'other' responses during the final test, even when the number of responses during updating was matched, suggesting strong A-B interference (Figure 2—figure supplement 1A,B). Finally, we found that the number of competitor and 'other' responses during updating had no effect on A-B memory during the A-B memory test (Figure 2—figure supplement 1C), suggesting that the consolidated A-B memory was difficult to weaken. All of these results suggested that the lack of A-B memory suppression resulted from its strong representation.

The effect of retrieval practice on target and competitor representations

To understand the neural basis of the retrieval-practice advantage, we used fMRI and MVPA to track the reactivation of the outdated and replacement memories during the two types of updating, and to link those patterns to memory performance on the final test. First, we trained neural classifiers to differentiate three categories of materials, i.e., faces, scenes, and objects, on the basis of independent functional localizer data. We then used them to examine the degree of memory reactivation during Day 2 updating and Day 3 final test (Kuhl et al., 2012; Figure 3—figure supplement 1A). We focused our analysis on the medial prefrontal cortex (MPFC), ventral temporal cortex (VTC), angular gyrus (AG), and hippocampus (HPC), which overlap with the core recollection network (Rugg and Vilberg, 2013) and have consistently shown a neural reinstatement effect during memory retrieval (Kuhl and Chun, 2014; Wimber et al., 2015; Xiao et al., 2017). Confirming the relevance of these regions to our task, we found that the classifier output could predict subjects’ categorical judgments during the final test with significantly above-chance accuracy in the MPFC, VTC, and AG (ranging from 39.6% to 42.6%, all p-values <0.001, all survived FDR correction), but not in the HPC (35.5%, p=0.35) (Figure 3—figure supplement 1B). The following analysis thus focused on the first three regions (Figure 3A).
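The comparison of classifier output with subjects' categorical judgments can be illustrated as follows. This is a minimal sketch under stated assumptions: the classifier is taken to emit one evidence value per category on each trial, the predicted category is the one with the highest evidence, and 'Don't know' trials are assumed to be excluded beforehand (the text does not say how they were handled). All names and numbers are hypothetical.

```python
CATEGORIES = ("face", "object", "scene")
CHANCE = 1 / len(CATEGORIES)  # three-way classification, so chance is 1/3


def predicted_category(evidence):
    """Return the category with the highest classifier evidence."""
    return max(CATEGORIES, key=lambda category: evidence[category])


def decoding_accuracy(trial_evidence, responses):
    """Proportion of trials where the prediction matches the response."""
    if len(trial_evidence) != len(responses):
        raise ValueError("one response is needed per trial")
    hits = sum(
        predicted_category(evidence) == response
        for evidence, response in zip(trial_evidence, responses)
    )
    return hits / len(responses)


# Hypothetical classifier outputs and button responses for four trials.
toy_evidence = [
    {"face": 0.60, "object": 0.30, "scene": 0.10},
    {"face": 0.20, "object": 0.50, "scene": 0.30},
    {"face": 0.40, "object": 0.35, "scene": 0.25},
    {"face": 0.10, "object": 0.20, "scene": 0.70},
]
toy_responses = ["face", "object", "scene", "scene"]

accuracy = decoding_accuracy(toy_evidence, toy_responses)
print(f"accuracy = {accuracy:.2f}, chance = {CHANCE:.3f}")
```

Against this chance level of about 33.3%, the reported accuracies of 39.6% to 42.6% are modest in absolute terms but reliably above chance across subjects.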

Figure 3. Neural reactivation during the final memory test.

(A) Depiction of the anatomical ROIs used in the main analysis. All ROIs consisted of regions from both hemispheres. (B) The reactivation of target (picture C) and competitor (picture B) during the final test as a function of updating method and memory outcome, based on classifier outputs (after subtracting 'other' evidence). Error bars indicate within-subject standard errors.
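The baseline subtraction mentioned in this caption can be sketched in a few lines. Because B and C always come from different visual categories, exactly one of the three categories is left over on each trial, and its evidence can serve as the 'other' baseline. The function name and the example values are hypothetical; this is an illustration of the idea rather than the authors' code.

```python
CATEGORIES = frozenset({"face", "object", "scene"})


def relative_evidence(outputs, target, competitor):
    """Return (target, competitor) evidence after subtracting 'other'.

    outputs: classifier evidence per category for one trial.
    target: category of picture C; competitor: category of picture B.
    """
    if target == competitor:
        raise ValueError("B and C must come from different categories")
    (other,) = CATEGORIES - {target, competitor}  # the one remaining category
    baseline = outputs[other]
    return outputs[target] - baseline, outputs[competitor] - baseline


# Hypothetical trial: picture C is a face, picture B is a scene.
toy_outputs = {"face": 0.5, "object": 0.3, "scene": 0.2}
target_ev, competitor_ev = relative_evidence(toy_outputs, "face", "scene")
print(round(target_ev, 3), round(competitor_ev, 3))
```

A positive value indicates more evidence for that memory than for the unrelated category, and a negative value indicates less.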

Next, we used these classifiers to examine how retrieval practice could shape the target and competitor representations in the brain during the final memory test. To further examine how the representations in these regions were differentially modulated by retrieval practice and behavioral performance, we separately examined the correct trials (i.e., when targets were chosen) and the incorrect trials (i.e., when competitors were chosen). Three-way repeated-measures ANOVA was conducted, with evidence (Target vs Competitor), outcome (Correct vs Incorrect) and updating method (RetPrac vs Restudy) as within-subject factors. Both the inhibition and differentiation hypothesis would predict stronger target representation and weaker competitor representation for RetPrac than for Restudy, whereas the integration and differentiation hypothesis would predict strong reactivation of both target and competitor representations for RetPrac, despite the superior behavioral performance.

In the MPFC, three-way ANOVA revealed a significant evidence-by-outcome interaction (F(1,18) = 23.25, p=0.0001, survived FDR correction, Figure 3B), suggesting that the activation in the MPFC tracked behavioral performance. We also found a significant method-by-evidence interaction (F(1,18) = 6.50, p=0.02, survived FDR correction). No three-way interaction or method-by-outcome interaction was found (p-values >0.08, Supplementary file 2a). We then did two separate two-way ANOVAs for target and competitor evidence reactivation. For target reactivation, there were significant main effects of updating method (RetPrac vs Restudy) (F(1,18) = 7.01, p=0.02, Supplementary file 2b) and outcome (chosen targets vs chosen competitors) (F(1,18) = 15.10, p=0.001, survived FDR correction), suggesting that correct responses were associated with stronger target evidence reactivation, and that retrieval practice was able to boost the target reactivation for both correct and incorrect trials. For competitor evidence, however, we found that there was stronger competitor evidence reactivation for incorrect trials than for correct trials under the Restudy condition (t(18) = 3.46, p=0.003, Cohen’s d = 0.86, survived FDR correction), whereas no such difference was found for the RetPrac condition (t(18) = 0.35, p=0.73), although the outcome-by-condition interaction did not reach significance (F(1,18) = 2.53, p=0.13, Supplementary file 2b). The latter result indicated that even when correct responses were made, there was still strong and comparable competitor reactivation under the RetPrac condition, suggesting that RetPrac integrated and differentiated competitor and target evidence in the MPFC.

In the AG, three-way ANOVA also revealed a significant evidence-by-outcome interaction (F(1,18) = 18.46, p=0.0004, survived FDR correction, Figure 3B), suggesting that the representation in the AG tracked behavioral performance. We also found a significant method-by-evidence interaction (F(1,18) = 7.45, p=0.01, survived FDR correction). No other main effect or interaction was found (p-values >0.36, Supplementary file 2a). Once again, we performed two separate two-way ANOVAs for target and competitor reactivation. For target reactivation, there was a significant main effect of response type (F(1,18) = 7.23, p=0.02, survived FDR correction, Supplementary file 2b), with stronger target evidence for correct responses than for incorrect responses. For competitor reactivation, there was a significant main effect of response type (F(1,18) = 6.04, p=0.02, Supplementary file 2b), with stronger competitor evidence for incorrect responses than for correct responses. Together, these results suggested that the AG representation mainly tracked the behavioral performance.

In the VTC, we also found a significant evidence-by-outcome interaction (F(1,18) = 26.51, p<0.0001, survived FDR correction, Figure 3B), again suggesting that the representation in the VTC tracked behavioral performance. Interestingly, we found a significant main effect of method (F(1,18) = 5.58, p=0.03), which did not interact with other factors (p-values >0.20, Supplementary file 2a), suggesting that retrieval practice significantly suppressed competitor evidence (F(1,18) = 4.68, p=0.04) and marginally reduced the target evidence (F(1,18) = 3.26, p=0.09) in the VTC.

Retrieval practice and restudy were associated with distinct subsequent memory effects

The improved memory that arises from retrieval practice may be supported by neural mechanisms that are distinct from those involved in restudy, and the former mechanisms could produce more resilient traces than the latter. To test this possibility, we performed an analysis to determine whether the two updating methods were associated with different subsequent memory effects. In particular, we examined the pattern of target reactivation during updating and subsequent memory.

This analysis revealed distinct patterns for the Restudy and RetPrac conditions: correctly recalled items (i.e., target) showed stronger target activation than did incorrectly recalled ones (i.e., competitor) in the VTC (t(18) = 4.29, p<0.001, Cohen’s d = 0.79, survived FDR correction) during restudy (all other ROIs, p-values >0.13), whereas retrieval practiced items that were later correctly recalled were associated with stronger target activation than incorrectly recalled ones in the MPFC (t(18) = 2.66, p=0.016, Cohen’s d = 0.60, survived FDR correction, Figure 4A) (all other ROIs, p-values >0.12). This finding suggests that successful memory updating may involve different representations under the RetPrac and Restudy conditions. The greater association of MPFC representation with enduring retention is consistent with its putative involvement in the consolidation process (Antony et al., 2017; Tompary and Davachi, 2017), and suggests that retrieval practice improves updating by driving consolidation more successfully than does restudy.
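The subsequent memory analysis amounts to sorting Day 2 updating trials by their Day 3 outcome and comparing mean target evidence between the two groups. The sketch below shows that sorting step for a single subject and ROI. It is a hypothetical illustration: the function name, the outcome labels, and the toy values are assumptions, and trials ending in an 'other' response are simply left out, in line with the target vs competitor contrast described in the text.

```python
from statistics import mean


def subsequent_memory_means(trials):
    """Mean updating-phase target evidence, split by final-test outcome.

    trials: list of (target_evidence_during_updating, final_outcome),
    where final_outcome is 'target', 'competitor', or 'other'.
    Returns (mean for later-correct items, mean for later-intruded items).
    """
    later_correct = [ev for ev, outcome in trials if outcome == "target"]
    later_intruded = [ev for ev, outcome in trials if outcome == "competitor"]
    return mean(later_correct), mean(later_intruded)


# Hypothetical trials from one subject in one ROI.
toy_trials = [
    (0.3, "target"),
    (0.1, "competitor"),
    (0.5, "target"),
    (-0.1, "competitor"),
    (0.2, "other"),  # ignored by the contrast
]

correct_mean, intruded_mean = subsequent_memory_means(toy_trials)
print(round(correct_mean, 3), round(intruded_mean, 3))
```

Repeating this for every subject yields two values per subject, which can then be compared across subjects with a paired t-test.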

Figure 4. Memory reactivation during updating and its change in the final test.

(A) Target memory representation (after subtracting 'other' evidence) during updating as a function of subsequent memory performance (correctly recalled targets vs incorrectly recalled competitors) during the final memory test. (B) Classifier evidence of competitor and other categories during the A-C updating phase under the Restudy and RetPrac conditions. Restudy was associated with weaker competitor reactivation. (C) Model fitting of the nonmonotonic plasticity hypothesis under the RetPrac condition. Only VTC showed the hypothesized pattern in which modest competitor reactivation (normalized into [0, 1] range) weakened, and strong competitor reactivation enhanced later competitor memory reactivation. Error bars indicate within-subject standard errors.

Retrieval practice was associated with greater competitor reactivation during updating

In addition to the greater engagement of MPFC, retrieval practice advantage on Day 3 might also derive in part from the need to overcome retrieval competition during the updating on Day 2. To test this hypothesis, we looked at mnemonic representations for competitors during the updating process, although the competitors were not presented under either the RetPrac or the Restudy conditions. As predicted, we found significant competitor reactivation (compared with 'other' evidence) under the RetPrac condition in the MPFC (t(18) = 3.26, p=0.004, Cohen’s d = 0.98), VTC (t(18) = 3.20, p=0.005, Cohen’s d = 1.03), and AG (t(18) = 4.31, p<0.001, Cohen’s d = 1.34), all survived FDR correction (Figure 4B). By contrast, there was no evidence of reactivation of the outdated competitors in any of the three ROIs in the Restudy condition (all p-values >0.13). Direct comparisons revealed significantly stronger competitor reactivation under the RetPrac condition than under the Restudy condition in the MPFC (t(18) = 4.32, p<0.001, Cohen’s d = 1.36), VTC (t(18) = 10.87, p<0.001, Cohen’s d = 3.46), and AG (t(18) = 6.98, p<0.001, Cohen’s d = 2.21), again all survived FDR correction. A similar pattern was found when only correct (target) trials were included (all p-values <0.017 for the RetPrac condition; all p-values >0.07 for the Restudy condition; all p-values <0.005 when directly comparing the RetPrac and Restudy conditions; Figure 4—figure supplement 1).
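Several results above are described as surviving FDR correction. The text does not name the procedure, so the sketch below assumes the widely used Benjamini-Hochberg step-up method; the function name is hypothetical, and the family of tests over which the authors corrected is also not stated.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return, for each p-value, whether it is rejected at FDR level q.

    Assumes the Benjamini-Hochberg step-up procedure; the article does
    not specify which FDR method was used.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    # Find the largest rank k such that p_(k) <= (k / m) * q.
    max_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * q:
            max_rank = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for rank, index in enumerate(order, start=1):
        rejected[index] = rank <= max_rank
    return rejected


# Reported RetPrac competitor reactivation p-values for MPFC, VTC, and AG.
# The AG value was reported as p<0.001; 0.001 is used as an upper bound.
reported = [0.004, 0.005, 0.001]
print(benjamini_hochberg(reported))
```

Treating just these three tests as one family, all three fall below their step-up thresholds, which agrees with the statement that they survived correction.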

The targets were only briefly presented as feedback under the RetPrac condition, whereas they were shown throughout the whole trial under the Restudy condition, so the differences in trial structure might bias the classifier performance. In particular, the reduction in competitor evidence under the Restudy condition might be due simply to the strong target evidence accumulated over a longer interval, rather than to a lack of competitor reactivation. This possibility predicts lower evidence not only for the competitor, but also for the 'other' (third) category under the Restudy condition. Contrary to this baseline shift hypothesis, we found no significant difference in the evidence for the 'other' category between the RetPrac and the Restudy conditions in the MPFC (t(18) = −0.42, p=0.68). We did, however, find evidence for a baseline shift in the VTC (t(18) = 9.25, p<0.001) and AG (t(18) = 3.60, p=0.002) (Figure 4B), suggesting that reduced competitor activation in those regions during restudy may be due in part to differences in trial structure. Nevertheless, there was a significant interaction between evidence type (Competitor vs Other) and updating method (RetPrac vs Restudy) in all three regions (all p-values <0.01), indicating that retrieval practice, as a method of updating, elicited significantly greater competition from distracting representations. This additional competition posed extra difficulties that needed to be overcome, difficulties that did not arise during restudy. This finding corresponds well with the behavioral evidence of an increased incidence of competitor intrusions during retrieval practice relative to restudy.

Across the three repetitions, we found that the target evidence increased with the number of repetitions in the VTC (t(18) = 3.44, p=0.003), but not in the MPFC (t(18) = 1.75, p=0.10) or AG (t(18) = 1.78, p=0.09) (Figure 4—figure supplement 2). Strikingly, although subjects made fewer competitor responses across repetitions, we did not find a significant reduction in competitor evidence across repetitions (all p-values >0.73). This result fits very well with the integration and differentiation hypothesis.

Reactivation-dependent memory updating in VTC during retrieval practice

The analyses so far revealed that compared to restudy, retrieval practice led to greater competitor reactivation during updating, but to reduced competitor reactivation during the final test in the VTC, but not in the MPFC. These findings are consistent with the possibility that retrieval practice may drive memory suppression in the VTC but not MPFC, triggered by their reactivation during the updating process. According to the nonmonotonic plasticity hypothesis, there is a nonlinear relationship between the strength of the memory reactivation and its later change, such that moderate reactivation has a weakening effect whereas strong reactivation has a strengthening effect (Kim et al., 2014; Newman and Norman, 2010; Ritvo et al., 2019). If this is the case, we would predict a U-shaped relationship between competitor reactivation strength during updating and during the final test.

To test this hypothesis, we used the P-CIT Bayesian curve-fitting algorithm to estimate the shape of the curve between competitor evidence during updating and the final test (Detre et al., 2013). The model hypothesizes a U-shaped curve, and the posterior probability of the theory consistency [P(Theory consistent)] indicates how well the fitted curve aligns with this hypothesis (the greater, the better, chance level = 0.5). We fitted this model in all three ROIs separately. The results suggest that VTC showed a pattern that was highly consistent with the model under the RetPrac condition [P(Theory consistent)=0.836, p=0.012, Figure 4C]. No such pattern was found in the MPFC [P(Theory consistent)=0.526] and the model fitting failed for the AG. All model fitting in these three regions failed under the Restudy condition (Figure 4—figure supplement 3). This may be the result of weak competitor reactivation during restudy, which did not cover the full range of the nonmonotonic plasticity curve.

Together, the results suggest that competitor reactivation was associated with subsequent suppression, which could be attributed to nonmonotonic synaptic plasticity. However, we found this effect only in the VTC and not in the MPFC or AG, suggesting a region-specific effect of memory suppression.

The LPFC contributed to MPFC memory updating under the RetPrac condition

To identify processes that contributed to goal-directed modulation of reactivated memories, we compared brain activity during updating between the RetPrac and Restudy conditions. We found that updating by retrieval practice engaged the left lateral prefrontal cortex (LPFC), dorsal anterior cingulate gyrus (dACC), bilateral anterior insular cortex (AI), and caudate nucleus (Figure 5A, Supplementary file 3a) more than did updating by restudy, whereas the hippocampus and other regions showed significantly weaker activation (Figure 5—figure supplement 1, Supplementary file 3b). These findings are consistent with prior work showing the engagement of cognitive control during retrieval (Wimber et al., 2015). To further probe the function of these regions in overcoming intrusions from outdated competitors, we examined how these prefrontal and striatal activations varied with updating performance during retrieval practice. We distinguished between those retrieval-practiced trials that subjects recalled incorrectly (Incorrect, IC), those they got correct the first time a given pair was shown (First Correct, FC), and those they got correct the second or third time a given pair was shown (Later Correct, LC). The rationale is that, compared to LC trials, IC and FC trials should have a greater need for competition resolution. In addition, the FC trials may engage stronger reward-based learning than IC and LC trials because of the former’s greater positive prediction error. Both mechanisms could contribute to representational change and memory differentiation.

Figure 5 (with 2 supplements)
LPFC activity and memory updating under the RetPrac condition.

(A) Brain regions that showed greater activation during memory updating under the RetPrac than under the Restudy condition. The color bar indicates one minus the P-value (corrected). (B) Activity in the LPFC was sensitive to pairs’ updating performance. Trials with failed recall (incorrect, IC) and trials recalled successfully for the first time (first correct, FC) showed greater activation than trials recalled successfully the second or third time (later correct, LC), indicating that the LPFC was involved in inhibiting competing memories. (C) MPFC memory updating (target representation minus competitor representation) as a function of LPFC activation during updating (divided by quartiles). Error bars indicate within-subject standard errors.

Consistently, we found that the left LPFC activation for LC trials was significantly lower than that for FC trials (t(18) = −7.17, p<0.001) and IC trials (t(18) = −4.76, p<0.001) (Figure 5B), suggesting that the left LPFC activation was mainly driven by the extent of memory competition. A similar pattern was also found in the dACC and AI (Figure 5—figure supplement 2, Supplementary file 3c). By contrast, the caudate activation for the FC trials was significantly greater than that for the other two types of trials (all p-values <0.004, Supplementary file 3c), consistent with its role in prediction-error-based processing.

To link the LPFC and caudate activation to representational change during updating, we examined whether the caudate and LPFC activation in the current repetition was related to competitor suppression in the subsequent repetition. Owing to the caudate’s role in prediction error, we focused on the first correct trial (FC). This revealed that strong caudate activation was associated with greater competitor evidence reduction in the next repetition in the VTC (χ²(1)=5.86, p=0.015), but not in the MPFC or AG (all p-values >0.15), suggesting that the VTC evidence could be temporarily weakened by reinforcement learning. No such effect was found for the LPFC when focusing on the incorrect and first correct trials (p-values >0.41), suggesting that the LPFC did not temporarily suppress the competitor evidence.

We further examined whether the LPFC and caudate activity during retrieval practice was associated with long-term memory updating on Day 3. This analysis revealed that trials with greater LPFC activity during retrieval practice ultimately showed superior memory updating (i.e., target – competitor evidence) during the final test on Day 3 in the MPFC (χ²(1)=4.62, p=0.032), but no effect was found in the caudate. Together, these results suggest that the LPFC is involved in resolving retrieval competition between targets and competitors, and ultimately contributes to successful long-term memory updating in the MPFC. By contrast, the caudate may suppress short-term representation through reward-based supervised learning.

Discussion

Memory updating serves an adaptive role in ensuring that the most relevant information is accessible in memory. Behavioral studies have long emphasized the role of retrieval in memory updating, yet the neural mechanisms behind this process are poorly understood. We found that, compared to simple restudy, retrieval practice was associated with better memory updating without suppressing the old memories. Furthermore, by tracking the neural evidence of old and new memories during both updating and the final memory test, we demonstrated that superior memory updating under retrieval practice could be achieved by multiple mechanisms. These results provide important insights into the neural mechanisms of memory updating.

When updating memory with replacement information, one needs to enhance the new target memory, inhibit the outdated competing traces, and/or differentiate the old and new memory traces. These different mechanisms could be examined by testing A-B memory. We found that retrieval practice had no effect on A-B memory, although it significantly enhanced new memory and reduced old memory intrusions in the A-C memory test. Further supporting the differentiation hypothesis, retrieval practice increased the number of trials in which subjects appropriately chose the targets in different test conditions (an indication of differentiation), and reduced the number of trials in which the same responses were made in both test conditions (an indication of a lack of differentiation). At least two factors might contribute to the lack of suppression of the old memory trace. First, some studies have shown that retrieval-induced forgetting was more pronounced after a short delay (minutes to hours) than after a long delay (days) (Abel and Bäuml, 2014; Liu and Ranganath, 2019; Murayama et al., 2014). Second, the old memory was extensively trained and consolidated, which made it harder to inhibit. In any case, our results suggest that reduced intrusions could be achieved not by significantly suppressing the old memories, but by strengthening the new memory traces and differentiating the old and new memory traces.

The current study revealed several neural mechanisms that could account for the advantages of retrieval practice in memory updating. First, we found that retrieval practice could shift the neural substrates from VTC to MPFC, which is involved in fast system consolidation (Antony et al., 2017). Existing studies have shown that during retrieval, item-specific reactivation is generally not found in the VTC (Favila et al., 2018; Xiao et al., 2017). As a result, with repeated retrieval practice, the brain may rely less on the sensory information for mnemonic decisions. These features are consistent with the behavioral findings that retrieval practice does not improve the quality of sensory memory (Sutterer and Awh, 2016), and may promote gist-based false memory (McDermott, 2006). Our study suggests that the MPFC may be responsible for this gist-based memory given its role in schema-based learning (Gilboa and Marlatte, 2017; Preston and Eichenbaum, 2013).

Second, feedback was provided during retrieval practice in the current study. Although existing studies have found a significant effect of retrieval practice when no feedback was provided (Karpicke and Roediger, 2008), feedback has been consistently shown to improve memory performance (Butler and Roediger, 2008; Pashler et al., 2005). The current study also found greater caudate activation under the RetPrac condition, in particular for the first correct trial, which is consistent with its role in processing positive prediction error (O'Doherty et al., 2004). Recent studies suggest that prediction error plays an important role in memory updating (Kim et al., 2014) and reconsolidation (Lee et al., 2017), and that the caudate is involved in modifying and re-encoding the retrieved memory representation (Scimeca and Badre, 2012). Extending these observations, we found that the caudate’s activation during first correct response was associated with reduced competitor evidence in the visual cortex, lending support to the idea that the supervised-learning mechanism could lead to representational changes (Ritvo et al., 2019).

Third, retrieval practice is an effortful process as compared to simple restudy. Consistently, neuroimaging studies have found greater neural activity during retrieval than during restudy (Wing et al., 2013). In addition, retrieval practice could also potentiate subsequent learning, which is associated with greater frontoparietal activity (Nelson et al., 2013). Our results are highly consistent with these observations, revealing stronger activation in the LPFC, dACC and insula. We further found that LPFC activation was greater when there was greater competitor intrusion, which is consistent with its role in controlled memory retrieval among competitors (Badre et al., 2005). Previous studies have further implicated the LPFC in reducing the intrusion of competitors in memory retrieval (Kuhl et al., 2012), and in reducing competing memories through cortical pattern suppression (Wimber et al., 2015). The current study did not find a strong association between LPFC activation and competitor suppression during updating, possibly because the to-be-suppressed competing memories in the current study were well trained and consolidated by overnight sleep. We did, however, find that LPFC activation was associated with long-term memory updating in the MPFC, suggesting that the LPFC might help to resolve the interference between old and new memories.

Critically, the current study identified a region-specific relationship between competitor reactivation during updating and later memory changes. Consistent with previous studies (Kuhl et al., 2012; Wimber et al., 2015), we found significant VTC competitor reactivation during updating and competitor suppression during the final test. One difference is that the current study also found a trend of target suppression in this region, whereas previous studies found target enhancement during retrieval practice. Furthermore, we found a U-shaped relationship between competitor reactivation in the VTC during updating and the final test, which is consistent with the nonmonotonic plasticity mechanism (Ritvo et al., 2019).

A different pattern was found in the MPFC, where the target evidence was strengthened. More importantly, it was integrated but differentiated from the competitor evidence, as indicated by the comparable competitor reactivation for correct and incorrect responses. Furthermore, we found that the MPFC’s memory updating was not predicted by the nonmonotonic plasticity principle, but was rather associated with LPFC activity driven by the competitor reactivation. The MPFC has been implicated in memory integration and updating (Preston and Eichenbaum, 2013; Zeithamova et al., 2012). Our results replicated and extended these observations by showing that memory integration could occur even when competing memories were simultaneously reactivated. This fits very well with the hypothesis that MPFC is able to develop rapidly integrated neocortical representations of reactivated memory traces during retrieval practice (Antony et al., 2017).

These results suggest that during retrieval practice, the co-activation of old and new memories might provide a unique opportunity to modify these representations and to facilitate memory updating. The MPFC could form integrated representations of co-activated and competing memories, while the LPFC control mechanism might contribute to memory updating by selectively strengthening the reactivated target memory and differentiating the old and new memory representations in the MPFC. The differentiation could be achieved by adding contextual representations into the memory trace, thus forming more unique representations of old and new memories, and/or by linking old and new memory representations to different aspects of cue representations. These processes could be further enhanced by feedback-driven supervised learning during retrieval practice, as well as by non-supervised Hebbian learning that involves nonmonotonic plasticity (Ritvo et al., 2019). Future studies should further examine how the LPFC and MPFC could contribute to the representational change of the reactivated old and new memories.

In the current study, we also found significant memory reactivation in the angular gyrus during both updating and the final test. Unlike memory reactivation in the VTC or MPFC, memory reactivation in the angular gyrus closely tracked behavioral performance. Consistently, other studies have shown that the angular gyrus exhibits abstract yet item-specific mnemonic representations (Kuhl and Chun, 2014; Xiao et al., 2017), which are modulated by mnemonic goals (Favila et al., 2018). Through its connection with the more anterior region, i.e., the lateral intraparietal sulcus (latIPS), the representations in the angular gyrus can serve as a mnemonic buffer that helps to make mnemonic decisions (Sestieri et al., 2017; Wagner et al., 2005). Our results support the idea that the AG functions as a multimodal convergence zone that combines memory signals from multiple brain regions and forms a memory representation that is closely related to subjective experience and memory decisions.

Successful memory retrieval is often associated with greater hippocampal activity. Interestingly, we observed weaker hippocampal activation under the RetPrac condition than under the Restudy condition. Previous studies suggested that the hippocampus might be inhibited when subjects were required to suppress thoughts and memories (Benoit and Anderson, 2012; Hulbert et al., 2016). It is thus tempting to speculate that the hippocampus might be inhibited as a result of strong competition from the old memory. The deactivation itself, however, might not be sufficient to support the inhibition hypothesis. For example, although the MPFC also showed weaker activation under the RetPrac condition, MVPA analysis suggested that the MPFC represented task-related information that was related to subsequent memory performance (Figure 4A), suggesting that it played an important role in memory updating. Several major factors might account for the chance-level decoding of memory information in the hippocampus during retrieval. On the one hand, the classifier accuracy during training was lower in the hippocampus than in other regions. This might be due to the sparse nature of hippocampal representation (Quiroga et al., 2008), the low signal-to-noise ratio in this region, and/or the weak categorical representation. Consistently, previous studies also found hippocampal representation when using representational similarity analysis to probe item-level representations (e.g., Jonker et al., 2018; Tompary and Davachi, 2017; Wimber et al., 2015; Xiao et al., 2017). On the other hand, we trained the classifiers during perception and applied them to memory retrieval. Previous studies have shown that memory representation could be transformed from encoding to retrieval (Chen et al., 2017; Xiao et al., 2017; Xue, 2018), and that this transformation could have further reduced the classifier performance (Albers et al., 2013). 
Future studies should use an optimized design and item-level analysis to further elucidate the role of the hippocampus in memory updating.

Memory is a dynamic process that depends on memory reactivation. On the one hand, reactivation of a target memory will strengthen the activated memory (Tambini and Davachi, 2019; Xue, 2018). On the other hand, reactivation of unwanted memories (e.g., competitors) can facilitate the suppression of those unwanted memories (Lee et al., 2017). Beyond the mechanisms of strengthening and weakening memory representations to achieve memory updating, the current study adds to the growing literature reporting that reactivation (via retrieval practice) can also integrate and differentiate co-activated memories (Chan et al., 2009; Hulbert and Norman, 2015; Schlichting et al., 2015; Zeithamova et al., 2012) and hence can increase the flexibility of memory updating in different contexts. A better understanding of these diverse mechanisms can be leveraged to develop more effective behavioral interventions to modify maladaptive memories in some psychiatric conditions.

Materials and methods

Subjects

Nineteen healthy college students (seven males; mean age = 21.5 years, range = 18–25 years) participated in the fMRI study, and two additional samples of 46 (18 males; mean age = 22.6 years, range = 18–29 years) and 28 college students (five males; mean age = 21.0 years, range = 18–24 years) participated in the two behavioral experiments (Exps 2 and 3), respectively. The sample size of the fMRI study was comparable with that in several previous studies using a similar paradigm (Kuhl et al., 2012; Wimber et al., 2015). All subjects had normal or corrected-to-normal vision, and no history of psychiatric or neurological diseases. Three additional subjects were recruited into the fMRI study but excluded from the final analysis due to either scanner malfunction or chance-level memory performance. Written consent was obtained from each subject after a full explanation of the study procedure. The study was approved by the Institutional Review Boards at Beijing Normal University and the Center for MRI Research at Peking University.

Materials

Stimuli consisted of 144 Chinese words (cues) and 288 pictures (associates). All words were two-character Chinese verbs. Pictures were color photographs from three categories, including famous faces (e.g., Jet Li), common objects (e.g., toothbrush), and famous scenes (e.g., the Forbidden City). Each word was associated with two pictures from different categories (e.g., A-B, word-face associations; A-C, word-object associations). Half of the words were assigned to the RetPrac condition and the other half to the Restudy condition. The assignment was counterbalanced between subjects.

Experimental procedure

The whole experiment lasted for three consecutive days. On Day 1, subjects were trained on the 144 word-picture associations (A-B learning). On Day 2, subjects were asked to update the old memory with the new A-C associations, half under the RetPrac condition and the other half under the Restudy condition. On Day 3, subjects performed a cued recall task to test A-C associations.

A-B learning

Before learning, subjects were asked to view all 288 pictures and their corresponding labels to make sure that each picture was correctly identified. Subjects were then instructed to learn 144 word-picture associations. Each A-B association was presented for 4 s after a 0.5 s fixation, and subjects were asked to memorize the associations for a later test. After the initial learning, subjects went through an overtraining phase. For each trial, the cue word and a black rectangle were presented for 2 s. Subjects were instructed to recall the details of the picture associated with the cue word presented on the screen. The rectangle then turned red for 1 s and subjects needed to indicate their responses by pressing the button corresponding to the picture's category (Face, Object, Scene, and Don't Know). The correct B picture was presented for 1 s as feedback and subjects were instructed to use this feedback to further strengthen their memory. All A-B associations were tested and those correctly recalled were removed from further testing. The training ended when each of the associations was correctly recalled once. On average, each association was tested 2.3 times (SD = 0.26). On Day 2, subjects were again instructed to recall A-B associations. The procedure was identical to A-B overtraining on Day 1. On average, each association was tested 2.03 times (SD = 0.24). The purpose of the A-B overtraining was to ensure that subjects had strong A-B memory.

A-C updating

Fifteen minutes after the A-B over-training on Day 2, subjects were then introduced to the new A-C associations outside the scanner and were asked to replace the A-B associations with the new A-C associations. The procedure was identical to A-B learning except that the pictures associated with word cues were changed to pictures from a different category. Subjects were asked to study and remember the new A-C associations, and all old A-B associations would be irrelevant to any future tasks. The specific instructions were as follows: ‘Now you will study some new picture-word associates. The words are all from the previously studied associations, but the paired pictures are all new. Your task is to remember these new associations. The old associations are irrelevant to any future task, so you do not need to remember them’. Subjects studied the new A-C memory once outside the scanner, and five minutes after initial A-C updating, they were put into the scanner to finish additional A-C updating. During scanning, half of the A-C pairs were assigned to the RetPrac condition and the other half to the Restudy condition. Under the RetPrac condition, a word cue and a black rectangle were presented for 2 s and subjects were asked to recall the picture associated with the presented (cue) word as vividly as possible. A red rectangle was then shown for 1 s and subjects were asked to judge the category of picture C by pressing a button corresponding to the visual category of the picture within this 1 s response window. The button-category correspondence was shuffled across subjects and only presented on the screen during the response stage (i.e., when the red rectangle was shown). This was to prevent subjects from planning motor response during the recall stage. After the response window, the correct picture was presented on the screen for 1 s. The next trial started after a jittered fixation (ranging from 0.5 s to 6.5 s, mean = 2 s). 
Under the Restudy condition, the procedure was similar to that for the RetPrac condition except that the correct picture was presented on the screen during the entire trial. Subjects were still required to judge the category of picture C when the red rectangle was shown. The RetPrac and Restudy trials were pseudo-randomly intermixed within a run. Each trial was repeated three times, with an inter-repetition-interval (IRI) ranging from 10 to 36 (mean = 21.93). The IRI was matched between the two conditions. Each run of A-C updating contained 12 unique RetPrac pairs and 12 unique Restudy pairs and lasted 7.2 min. Subjects finished 6 runs of the learning task.
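The trial timings above can be checked against the stated run length with a short calculation. This is a sketch under two assumptions that the text does not state explicitly: Restudy trials occupy the same 4 s as RetPrac trials, and a run contains no additional lead-in or lead-out time. The constant and function names are hypothetical.

```python
# Trial components in seconds, taken from the procedure described above.
CUE_S = 2.0            # cue word + black rectangle
RESPONSE_S = 1.0       # red rectangle, category judgment
FEEDBACK_S = 1.0       # correct picture shown
MEAN_JITTER_S = 2.0    # mean of the jittered fixation (0.5 s to 6.5 s)

PAIRS_PER_RUN = 12 + 12   # unique RetPrac pairs + unique Restudy pairs
REPETITIONS = 3
RUNS = 6


def expected_run_minutes():
    """Expected run duration, assuming every trial lasts cue + response + feedback + mean jitter."""
    trial_s = CUE_S + RESPONSE_S + FEEDBACK_S + MEAN_JITTER_S
    return PAIRS_PER_RUN * REPETITIONS * trial_s / 60.0


def total_unique_pairs():
    """Number of unique A-C pairs covered across all runs."""
    return PAIRS_PER_RUN * RUNS
```

Under these assumptions, 72 trials of 6 s on average give 7.2 min per run, and six runs of 24 pairs cover all 144 word cues.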

A-C final test

On Day 3, subjects were instructed to perform the A-C final test while being scanned. A slow event-related design (12 s for each trial) was used to obtain a better estimation of the activation pattern for each item. After a 1 s fixation, a cue word and a black rectangle were presented on the screen for 4 s, and subjects were instructed to recall the picture C associated with the cue word as vividly as possible. The rectangle then turned red for 1 s and subjects were required to press a button to indicate the category of the retrieved picture. Similar to the RetPrac condition on Day 2, we also introduced the response label to prevent motor planning during the recall stage. Subjects were then asked to perform a Gabor orientation judgment task for 6 s. During this task, a Gabor pointing 45 degrees either to the left or the right was presented and subjects were asked to judge the orientation of the Gabor as fast as possible. A self-paced procedure was used to make the task engaging. The A-C final test consisted of 3 runs of 9.6 min, each containing 48 associations.

Follow-up behavior experiments (Exps 2 and 3)

Exp. 2 was designed to examine how new A-C memory practices would affect the old A-B memory. The procedure for Exp. 2 was nearly identical to that used for the main fMRI experiment, except that both A-C and A-B memories were tested and there was no perceptual orientation judgment task between retrieval trials. To examine the effect of test order, half of the subjects did the A-B memory test first and the other half did the A-C memory test first.

Exp. 3 was conducted to further examine whether RetPrac could effectively modify the details of memory representations. In this experiment, the procedure was identical to that used in Exp. 2, except that during the Day 3 test, we asked the subjects to write down the name of the associated picture (or any associated details if they could not recall the exact name) for each cue word, instead of choosing one of the three categories by pressing a button. They were first asked to write down the new C memory for each cue word A. After the A-C memory test, they were then asked to write the old B memory for each cue word. Only items for which subjects recalled the correct picture name or provided specific details were coded as correct items.

Functional localizer

After the A-C final test, subjects were instructed to complete four runs of a functional localizer task, which was used to train the pattern classifier (see the multi-voxel pattern analysis section below for details). A mini block design was used in the task. Each run consisted of nine mini blocks of pictures from one of the three categories (three mini blocks per category). Within each mini block, six new word-picture associations were presented sequentially for 24 s, and subjects were asked to memorize these new associations. This procedure was used to match the perceptual and cognitive structure of the main task. The words used in this task were different from those used in the A-B or A-C pairs. The order of the mini blocks was counterbalanced across runs and subjects. After each mini block, there was a 12 s Gabor orientation judgment task using the self-paced procedure as described above.

MRI acquisition

MRI data were collected on a 3.0T Siemens Prisma scanner (Siemens, Erlangen, Germany) with a 64-channel head coil at the Center for MRI Research at Peking University. A high-resolution simultaneous multi-slice EPI sequence was used for functional scanning (TR/TE/θ = 2000 ms/30 ms/81°; FOV = 220 × 220 mm; matrix = 116 × 116; slice thickness = 1.9 mm; GRAPPA factor = 2, multi-band acceleration factor = 2; phase partial Fourier = 7/8). Seventy-two contiguous axial slices parallel to the AC-PC line were obtained to cover the whole cerebrum and cerebellum. A high-resolution structural image using a 3D T1-weighted MPRAGE sequence was acquired to cover the whole brain (TR/TE/θ = 2530/2.98 ms/7°; FOV = 256 × 256 mm; matrix = 256 × 256; slice thickness = 1 mm; GRAPPA factor = 2).

Image preprocessing

Image preprocessing and statistical analysis were performed using FEAT (FMRI Expert Analysis Tool) version 6.00, part of the FSL (FMRIB software library, version 5.0.9, www.fmrib.ox.ac.uk/fsl, RRID:SCR_002823) (Smith et al., 2004). The first 10 volumes before the task were automatically discarded by the scanner to allow for T1 equilibrium. The remaining images were then realigned to correct for head movements. Volumes with frame-wise displacement (FD) greater than 0.9 mm were discarded from further analysis. Data were spatially smoothed using a 5 mm FWHM Gaussian kernel and filtered in the temporal domain using a nonlinear high-pass filter with a 100 s cutoff. The EPI images were first registered to the MPRAGE structural image using affine transformation from FLIRT. Registration from structural image to the standard space was carried out using Advanced Normalization Tools nonlinear registration SyN (Avants et al., 2011). The transformation parameters from the two steps were combined into a single transform matrix in order to avoid multiple interpolations during EPI to standard space transformation.
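The volume exclusion step can be sketched as follows. The text gives the 0.9 mm threshold but not the exact definition of frame-wise displacement, so this sketch assumes a commonly used formulation: the sum of absolute frame-to-frame changes in the six realignment parameters, with rotations converted to millimetres on a sphere of 50 mm radius. The column order of `motion`, the radius, and the function names are assumptions of this illustration, not details taken from the study.

```python
HEAD_RADIUS_MM = 50.0    # assumption: rotations converted to arc length on a 50 mm sphere
FD_THRESHOLD_MM = 0.9    # threshold stated in the text


def framewise_displacement(motion):
    """Frame-wise displacement per volume.

    motion: list of [tx, ty, tz, rx, ry, rz] per volume, with translations in mm
    and rotations in radians (hypothetical column order for this sketch).
    The first volume has no predecessor and is assigned an FD of 0.
    """
    fd = [0.0]
    for prev, cur in zip(motion, motion[1:]):
        trans = sum(abs(c - p) for c, p in zip(cur[:3], prev[:3]))
        rot = sum(abs(c - p) * HEAD_RADIUS_MM for c, p in zip(cur[3:], prev[3:]))
        fd.append(trans + rot)
    return fd


def volumes_to_keep(motion, threshold=FD_THRESHOLD_MM):
    """Indices of volumes whose FD does not exceed the threshold."""
    fd = framewise_displacement(motion)
    return [i for i, d in enumerate(fd) if d <= threshold]
```

For example, a volume that follows a 0.02 rad rotation about one axis has an FD of about 1 mm under these assumptions and would be discarded.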

For MVPA, fMRI data were preprocessed in the same way as for the univariate analysis except for spatial normalization. All preprocessed EPI volumes were registered to the first volume of the first A-C learning run using Advanced Normalization Tools (ANTs, RRID:SCR_004757) (Avants et al., 2011) and all MVPA were conducted in subjects’ native EPI space.

Definition of ROIs

We focused our MVPA on the ventral temporal cortex (VTC), medial prefrontal cortex (MPFC), angular gyrus (AG), and hippocampus (HPC), which overlap with the core recollection network (Rugg and Vilberg, 2013) and have consistently shown neural reinstatement effect during memory retrieval (Kuhl and Chun, 2014; Wimber et al., 2015; Xiao et al., 2017). The VTC and HPC were defined using the Harvard-Oxford probabilistic atlas (threshold at 25% probability). The VTC consisted of temporal fusiform, parahippocampus, and inferior temporal gyrus. The MPFC was defined on the basis of Automated Anatomical Labeling v2 and contained medial superior frontal gyrus, anterior cingulate gyrus, medial orbital superior frontal gyrus, and rectus gyrus (Rolls et al., 2015; Figure 3A). The AG was defined on the basis of the Schaefer2018 atlas (400-parcels) (Schaefer et al., 2018) and contained all parietal nodes within the default mode network (DMN, Network 15–17). All ROIs contained brain regions from both hemispheres.

Univariate analysis

The general linear model within the FILM module of FSL was used to model the data. Two separate models were specified for the encoding phase. The first GLM aimed to compare the neural activation under the RetPrac and Restudy conditions during A-C updating. In this model, each repetition for each condition was separately modeled. The ‘don’t know’ trials and the ‘no response’ trials from both conditions and all repetitions were separately modeled as two regressors of no interest.

The second model aimed to further examine the neural activations associated with updating performance in the RetPrac condition. Three trial types were defined on the basis of the updating history: trials in which a given item was correctly recalled the first time the pair was shown (first correct, FC), trials in which a given item was correctly recalled the second or third time the pair was shown (later correct, LC), and incorrect trials (IC). Items that were correctly recalled the first time but were followed by incorrect responses in later repetitions, which were very rare, were separately modeled as two regressors (correct, incorrect) of no interest. In addition, each of the three repetitions in the Restudy condition and the ‘don’t know’ and ‘no response’ trials were separately modeled.

All regressors were convolved with a double gamma hemodynamic response function (HRF). Their temporal derivatives were also included. Each run was modeled separately in the first-level analysis. Using a fixed-effects model, cross-run averages for a set of contrast images were created for each subject. Each contrast image for all subjects was entered into group analysis using the non-parametric permutation method for inference on the statistic map. This was conducted by the Randomise program in FSL with 10,000 permutations. The significance of the derived statistical map was determined by the threshold-free cluster enhancement (TFCE) algorithm with p<0.05 (whole brain FWE corrected) (Smith et al., 2004).

To estimate the neural activation for each item across the three repetitions, we modeled each item’s three repetitions as a separate regressor in one GLM. The parameters estimated from this procedure served as an averaged activation measure across the three repetitions of a given association. The analysis was conducted in native space and the statistical maps were transformed to MNI space. The activations in the ROIs were then used for the linear mixed-effect model.

Multi-voxel pattern analysis

Neural reactivation was quantified by the classifier output from a pattern classifier trained on separate functional localizer data. The trained classifier was used on both updating and testing data to assess the level of reactivation of categorical information. All MVPA were conducted on each subject's native anatomical space. The data were normalized with the following steps based on a previous study (Kuhl et al., 2012). The data were first z-scored within a scan, and then across voxels within each volume. After all relevant volumes had been selected, data were z-scored again across all updating/final test/localizer volumes. Because each trial in the updating/final test/localizer phase corresponded to multiple fMRI volumes, the data were averaged across volumes within a trial before pattern analysis. For the testing stage, a weighted average was performed across 3–6 TRs after cue word onset (corresponding to 4–12 s after cue word onset, weights = [0.35, 0.35, 0.15, 0.15]). For updating and localizer trials, data were averaged across 3–4 TRs after cue word/picture onset. The choice of time window was based on a previous study (Kuhl et al., 2012) with consideration of hemodynamic lag.
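
The normalization and trial-averaging steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: data are taken to be a (volumes × voxels) array per scan, and TRs are assumed to be counted from 1 at cue onset, so that TRs 3–6 cover 4–12 s with a 2 s TR. All function names are hypothetical.

```python
import numpy as np

def zscore(x, axis):
    """Z-score along an axis, leaving constant slices at zero."""
    x = np.asarray(x, dtype=float)
    sd = x.std(axis=axis, keepdims=True)
    sd = np.where(sd == 0, 1.0, sd)
    return (x - x.mean(axis=axis, keepdims=True)) / sd

def normalize_run(run_data):
    """Step 1: z-score each voxel over time within a scan.
    Step 2: z-score across voxels within each volume."""
    return zscore(zscore(run_data, axis=0), axis=1)

def normalize_selected(volumes):
    """Step 3: z-score each voxel again across all selected volumes."""
    return zscore(volumes, axis=0)

def trial_pattern(data, onset_volume, tr_numbers=(3, 4, 5, 6),
                  weights=(0.35, 0.35, 0.15, 0.15)):
    """Weighted average over TRs 3-6 (1-based, TR 1 = onset volume)."""
    idx = onset_volume + (np.asarray(tr_numbers) - 1)
    return np.average(data[idx], axis=0, weights=weights)

# Synthetic checks: random data for the z-scoring, a ramp for the averaging.
rng = np.random.default_rng(0)
normed = normalize_run(rng.normal(size=(20, 50)))
ramp = np.tile(np.arange(10.0)[:, None], (1, 2))  # value = volume index
ramp_pattern = trial_pattern(ramp, onset_volume=0)
```

For updating and localizer trials, the same helper could be called with `tr_numbers=(3, 4)` and equal weights, which corresponds to the plain average described in the text.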

Classification analysis was performed using L2-norm regularization logistic regression with liblinear solver from scikit-learn package (RRID:SCR_002577) in Python. Three binary classifiers (one vs the rest) were trained on functional localizer data within each pre-defined ROI (Fan et al., 2008). Individual classifier’s output probabilities were averaged together on the basis of category to form the final output probability. The penalty parameter C was set to 0.01 following a previous study (Kuhl et al., 2012). Each picture category had 72 samples (trials) for classifier training. To examine the performance of the classifier, we applied the leave-one-run-out cross-validation procedure. The average cross-validation accuracy of the classifiers was significantly above the chance level (ranging from 51.0% to 83.9%, all p-values <0.001; Figure 3—figure supplement 1A).
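
As a rough sketch of this classification setup, the following uses scikit-learn with synthetic data in place of the localizer patterns. The one-vs-rest scheme is made explicit with `OneVsRestClassifier`; the data dimensions (4 runs, 18 trials per category per run, 100 voxels) and all variable names are illustrative assumptions, and the authors' exact code may differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.multiclass import OneVsRestClassifier

def make_classifier(C=0.01):
    # LogisticRegression uses an L2 penalty by default; liblinear is the solver
    # named in the text. One binary classifier is fitted per category.
    return OneVsRestClassifier(LogisticRegression(solver="liblinear", C=C))

# Synthetic stand-in for localizer data: 4 runs x 3 categories x 18 trials,
# i.e. 72 trials per category in total.
rng = np.random.default_rng(0)
n_runs, n_per_cat_run, n_voxels = 4, 18, 100
prototypes = rng.normal(size=(3, n_voxels))
labels = np.tile(np.repeat(np.arange(3), n_per_cat_run), n_runs)
runs = np.repeat(np.arange(n_runs), 3 * n_per_cat_run)
patterns = prototypes[labels] + rng.normal(size=(labels.size, n_voxels))

# Leave-one-run-out cross-validation: each run serves once as the test set.
cv_accuracy = cross_val_score(make_classifier(), patterns, labels,
                              groups=runs, cv=LeaveOneGroupOut())

# Final classifier trained on all runs; probabilities act as category evidence.
clf = make_classifier().fit(patterns, labels)
evidence = clf.predict_proba(patterns[:5])
```

Here `predict_proba` rescales the three binary outputs so that they sum to one per trial, which is one possible way of combining them into a per-category probability.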

Then, the classifier trained on all functional localizer runs was applied to both updating and testing data. Classifier output probability for each category (face, object, or scene) was assigned to target, competitor, or other, based on the categories of pictures B and C. Specifically, each cue word A was associated with two pictures, and the goal was to memorize A-C associations and to inhibit A-B associations; the classifier output corresponding to the categories of the C and B images was therefore assigned as the target and competitor outputs, respectively, whereas the classifier output corresponding to the remaining category was assigned to the 'other' output. The classifier evidence was normalized within each trial by subtracting the 'other' evidence from the target and competitor evidence.
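
The assignment and normalization described above can be written compactly. This sketch assumes that pictures B and C of a given cue always come from different categories, so that exactly one category remains as 'other'; the function name and the example probabilities are hypothetical.

```python
CATEGORIES = ("face", "object", "scene")

def label_evidence(probs, target_cat, competitor_cat):
    """Map per-category classifier output to target/competitor evidence.

    probs: dict mapping each category name to the classifier probability.
    target_cat: category of picture C; competitor_cat: category of picture B.
    Both values are baseline-corrected by subtracting the 'other' evidence.
    """
    if target_cat == competitor_cat:
        raise ValueError("target and competitor must differ (assumption)")
    other_cat = next(c for c in CATEGORIES
                     if c not in (target_cat, competitor_cat))
    other = probs[other_cat]
    return {
        "target": probs[target_cat] - other,
        "competitor": probs[competitor_cat] - other,
    }

# Hypothetical trial: C is a scene, B is a face, so 'object' is the baseline.
example = label_evidence({"face": 0.5, "object": 0.3, "scene": 0.2},
                         target_cat="scene", competitor_cat="face")
```

In this made-up trial the competitor evidence is positive (0.5 minus 0.3) and the target evidence is negative (0.2 minus 0.3).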

Nonmonotonic plasticity curve fitting

A Bayesian curve-fitting procedure was used to fit the nonlinear relationship between competitor reactivation strength during the updating phase and the relative competitor reactivation strength change during the final test. This analysis was performed using the p-cit-toolbox with its default parameter settings (Detre et al., 2013). In brief, the algorithm approximates the most probable plasticity curve from the given data (competitor reactivation during updating and the final test). First, it generates the curve by sampling possible curves (linear curves with three segments) randomly. Then, an importance weight is given by how well the curve fits the actual data. Finally, the mean curve is generated by determining the weighted average of all the sampled curves. In addition, the fitted curves were divided into two groups: theory-consistent (a U-shaped curve) or theory-inconsistent. The P(theory-consistent) is computed as the fraction of posterior probability of the theory-consistent curve samples. This value indicates how well the data support the nonmonotonic curve. Trials under the RetPrac and the Restudy conditions were modeled separately for each ROI. A permutation test (1000 permutations) was used to determine the significance of the posterior probability of theory consistency [P(theory consistent)].

Statistical analysis

All repeated measures ANOVAs were conducted in the afex package using Type III sums of squares in R 3.3.3 (RRID:SCR_001905). The error bars in the figures denote within-subject errors that account for heterogeneity of variance. FDR correction (α = 0.05) was performed to correct for post-hoc multiple comparisons. We reported the uncorrected p-values in the main text and indicated whether they were significant with FDR correction. Cohen’s d was calculated as a measure of effect size for main comparisons.
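
To illustrate how an FDR decision at α = 0.05 can be attached to uncorrected p-values, here is a minimal sketch. The text does not name the FDR procedure, so the Benjamini-Hochberg step-up rule is assumed; the function name and the p-values are hypothetical.

```python
import numpy as np

def fdr_bh(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (assumed variant).

    Returns a boolean array marking which hypotheses are rejected.
    """
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()  # largest rank meeting its threshold
        reject[order[:k + 1]] = True
    return reject

# Hypothetical uncorrected p-values from four post-hoc comparisons.
example_reject = fdr_bh([0.001, 0.02, 0.04, 0.30])
```

With these made-up values, the third comparison is below 0.05 uncorrected but does not survive correction, which is the kind of case the reporting convention above is meant to flag.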

Mixed-effects modeling

Mixed-effects modeling is a powerful statistical tool that offers many advantages over conventional t tests, regression, and ANOVA in sophisticated fMRI designs. The linear mixed-effects model used in this study was implemented with lme4 in R (RRID:SCR_015654), fitted using restricted maximum likelihood. To determine the effect of the predictor of interest, we used the likelihood ratio test to compare models with (full model) and without (null model) the predictor of interest. For the caudate/LPFC activation and competitor suppression model, FC or FC+IC trials’ activation for a given ROI was used as the predictor, and the difference in competitor evidence between the current repetition and subsequent repetition during updating was used as the dependent variable. For the caudate/LPFC activation and long-term memory updating model, a given ROI’s activation for all three repetitions was averaged as the predictor, and the target minus competitor evidence during the final test was used as the dependent variable. For all models reported in the main text, the random intercept was included as a random effect.
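
The likelihood ratio test comparing the full and null models reduces to a simple computation once both log-likelihoods are available. The sketch below assumes the two models differ by a single fixed-effect predictor (one degree of freedom) and that the log-likelihoods come from maximum-likelihood fits. The models themselves were fitted in R with lme4, so this Python fragment only illustrates the test statistic; the function name and the numbers are hypothetical.

```python
import math

def likelihood_ratio_test_df1(loglik_null, loglik_full):
    """Likelihood ratio test for nested models differing by one parameter.

    The statistic 2 * (logL_full - logL_null) is compared against a
    chi-square distribution with 1 degree of freedom, whose survival
    function equals erfc(sqrt(x / 2)).
    """
    stat = max(2.0 * (loglik_full - loglik_null), 0.0)
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical log-likelihoods chosen so that the statistic is about 3.84.
example_stat, example_p = likelihood_ratio_test_df1(-100.0, -98.079270589653)
```

A statistic of about 3.84 corresponds to p ≈ 0.05 with one degree of freedom, so larger improvements in log-likelihood indicate a significant contribution of the predictor.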

References

  41. When does feedback facilitate learning of words?
    H Pashler, NJ Cepeda, JT Wixted, D Rohrer (2005)
    Journal of Experimental Psychology: Learning, Memory, and Cognition 31:3–8.
    https://doi.org/10.1037/0278-7393.31.1.3

Decision letter

  1. Thorsten Kahnt
    Reviewing Editor; Northwestern University, United States
  2. Laura L Colgin
    Senior Editor; University of Texas at Austin, United States

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

This elegant human study examined the effects of retrieval practice on memory performance and neural responses. The results from a set of experiments show that retrieval practice strengthens new memories and reduces intrusions of old memories without suppressing the old memories. Interestingly, this was related to enhanced representations in medial prefrontal cortex, further supporting the idea that this region is important for memory integration and consolidation.

Decision letter after peer review:

[Editors’ note: the authors submitted for reconsideration following the decision after peer review. What follows is the decision letter after the first round of review.]

Thank you for submitting your work entitled "Retrieval practice facilitates reactivation-dependent memory updating" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by a Reviewing Editor and a Senior Editor. The reviewers have opted to remain anonymous.

Our decision has been reached after consultation between the reviewers. Based on these discussions and the individual reviews below, we regret to inform you that your work will not be considered further for publication in eLife.

As is evident in the individual critiques copied below, the reviewers agreed that your manuscript provides nice and robust evidence for memory updating by means of strengthening the target memories. However, the reviewers felt that given the lack of strong evidence for either competitor suppression or integration, this alone is not novel enough for eLife. In addition, the reviewers agreed that results from the 3-way classifier are difficult to interpret and that two separate 2-way classifiers or an RSA approach would be required to reveal what is reactivated. Finally, there were concerns that the results of experiment 2 were inconclusive.

In light of these concerns, we decided that we cannot move forward with this paper at eLife. However, if you are willing and able to address these issues in full (including new behavioral experiments), you would be able to resubmit a substantially revised version of the manuscript to eLife in the future (if you choose to do this, please refer to this manuscript number and rejection decision in your future submission). However, we understand that you may prefer to submit the manuscript, in its current form, to a more specialized journal.

Reviewer #1:

The authors test the hypothesis that active remembering is an effective means of memory updating when there is proactive interference from outdated information. Participants learn novel associations (A-B) on Day 1, then are asked to replace these memories with overlapping associations (A-C) on Day 2, either by active retrieval practice (RetPrac) or Restudy practice of the new A-C memories. On Day 3 all groups undergo a final memory test. Behaviorally, participants in the RetPrac condition recall the target more often, and experience fewer competitor intrusions, on the Day 3 final test. Multivariate analyses of brain activity patterns suggest that competitors are co-activated on Day 2 in a number of brain regions, but subsequently suppressed and become less accessible on Day 3, again specifically in the RetPrac items. Neural evidence for target and competitor reactivation thus largely tracks the behavioral effects observed.

The manuscript addresses an interesting and timely topic, and the Introduction and Discussion are well written and accessible. The most central behavioral and MVPA findings seem very robust. These findings are largely to be expected based on the existing literature (as appropriately cited in the manuscript), but I do not know of a similar set of coherent results that demonstrates these aspects of memory updating, and therefore I think the manuscript should in principle be considered for publication in eLife. I have a few major concerns, however, regarding the methods, results and conclusions, as outlined in the following.

1) A fundamental technical problem in my view is the use of a 3-class classifier. When such a classifier provides evidence for a target, it is automatically confounded by "anti-evidence" for the competitor and the other, neutral category. The same goes for the competitors: if a 3-class classifier provides strong evidence for competitors being reactivated, this could be due to strong competitor evidence or weak target (and neutral) evidence. The problem is exaggerated by using a normalization where the "neutral" evidence is subtracted from target and competitor evidence. It would be much preferable if the authors used 2-class classifiers with the neutral category as baseline for all analyses trying to separate target and competitor reactivation.

2) During retrieval practice, were subjects significantly more likely to experience an intrusion than an unrelated (other) error? It seems from Figure 1C that they were not, speaking against any strong A-B proactive interference effect, and more for a general learning of the target response over time. This result seems crucial for the central claims of the paper. Similarly, the authors should report in both experiments how the number of Day 2 intrusions relates to Day 3 performance, and whether this relationship is stronger for intrusions than unrelated errors.

3) The authors make strong claims about competitor suppression throughout the manuscript where the actual evidence appears relatively weak. In the behavioural Experiment 2, if there was a suppression effect, how do the authors explain that they did not find reduced A-B memory for RetPrac than Restudy on Day 3? Is this due to the delay of the final test? Relatedly, the negative correlation between A-B and A-C memory is interpreted as evidence for suppression but can as easily be seen as an associative interference effect. Finally, in Experiment 2, the authors test A-C associations before A-B associations, making it likely that output interference on the A-B items will overshadow potentially more subtle effects of suppression in this experiment.

4) While the most central results seem sound, some of the analyses reported later in the Results section appear less well motivated and somewhat arbitrary in their approaches and selective in the reporting. To avoid the impression of p-hacking, the authors should streamline these sections, and use a more consistent and well-motivated rationale for all of the analyses. To give a few concrete examples,

a) The analysis reported in paragraph two of subsection “The LPFC contributed to memory updating under the RetPrac condition” and in Figure S8 is difficult to follow in terms of rationale. The results are also a bit confusing: none of the ROIs shows a pattern where strong competitor reactivation is related to strong LPFC activation, which is surprising given existing literature. Did the authors average competitor evidence across the 3 repetitions? In my mind, the straightforward prediction here is that strong competitor reactivation on early repetitions, and weak on late repetitions, should be related to effective LPFC-mediated suppression. Therefore, for these analyses it makes more sense to use the slope of competitor reactivation across repetitions, not the average evidence for reactivation.

b) The analysis relating caudate activity to competition resolution seems arbitrary for readers not reading the supplements. It is unclear why a different metric is being used here (compared to LPFC) to relate univariate and multivariate effects.

c) The analysis splitting trials into incorrect (IC), first correct (FC) and second/third correct (LC) is not well motivated and difficult to follow.

5) In some instances, interpretations are quite a large step removed from the actual results. For example, why does a difference in LPFC activity between IC/FC and LC (paragraph two subsection “The LPFC contributed to memory updating under the RetPrac condition”) indicate a role in competitor suppression? Such a pattern is more likely driven by target-related processes.

6) The tertile analyses relating competitor reactivation on Day 2 to competitor reactivation (see Figure 5) on Day 3 are not convincing statistically. The interaction with region as a factor seems irrelevant. In IFG and AG the conclusions are based on null results, and in VTC there is a strong positive relationship speaking against reactivation-dependent updating. The only thing left therefore is the U-shaped effect, and this appears like a posthoc observation.

7) For the Day 2 MVPA analyses, the authors never show evidence for target reactivation (or rather, representation given the visual exposure), this result should be included.

Reviewer #2:

In this timely and creative study, the authors investigate in a within-subjects fMRI design with 19 subjects the neural mechanisms and behavioral effects of retrieval practice vs. re-study of A-B, A-C memory updating. In a second within-subjects behavioral study with 28 subjects, the authors probe their postulation that retrieval practice vs. re-study leads to prioritization of C and suppression of B as related to A in memory updating. There are a few drawbacks to the theoretical framing, analytic technique, and conclusions drawn from the data that should be addressed.

1) Theoretical framing:

The authors adopt a research paradigm that is akin to inference/integration/generalization memory work. I was surprised to see this conceptualization of the paradigm downplayed in the Introduction and Discussion, particularly because A-B pairs were overtrained on Day 1. I think that the manuscript would benefit from more explicit characterization of why the adopted paradigm speaks to a suppression account of a B vs. an integration account of a B in memory updating, and how their data adjudicate between these two accounts (e.g., MPFC is often shown as a schema/generalization area).

2) Analytic technique:

a) I was not convinced that the retrieval practice vs. re-study Day 2 design and analyses provided clear support for the idea that B is suppressed and C is prioritized in retrieval practice vs. re-study during memory updating. In fact, the authors do not find evidence of this claim in their follow-up behavioral study designed to address the issue; they have to split data to show that in some subjects you see this pattern but in others you don't. Because the evidence stemming from this follow-up is not clear, the mechanism of retrieval practice in memory updating (suppression vs. integration) is not clear.

b) The critical results of the study are in subsection “Retrieval practice enhanced target reactivation and competitor suppression” of the manuscript. I was confused as to why the authors have one classifier analysis for final test performance (Day 3) that doesn't account for retrieval practice/re-study (Day 2), and then another classifier analysis for retrieval practice/re-study (Day 2) that doesn't account for final test performance (Day 3). Why was behavioral performance but not the Day 2 manipulation used in the first analysis, and why was the Day 2 manipulation but not behavioral performance used in the second analysis? I think an analysis that uses both these assays would get at the authors' question most directly.

c) How many trials were used in the fMRI analyses per condition?

d) Greater motivation for the ROI selection parameters for MVPA should be given.

e) Is it possible that shifts in decision criteria would be observed for those trials with retrieval practice on Day 2 vs. those trials with re-study on Day 2? Can the authors rule out a decision criteria account of behavioral findings on Day 3 final test?

3) Conclusions drawn from the data:

a) As noted, I did not think that the critical finding about retrieval practice during memory updating was supported by the data; the authors' own follow-up study did not provide strong behavioral support for this claim. Thus, it is hard to reconcile the lack of a clear mechanism with the fMRI side of the manuscript.

b) The lack of neuroimaging effects for AG and Hipp was striking, particularly given recent work finding reactivation effects in parietal cortex (Jonker, Ranganath, 2019, PNAS; Lee, Kuhl, 2019, Cerebral Cortex). These papers should be cited in text and discussion should be given as to the discrepancies.

Reviewer #3:

In this paper, the authors tested the hypothesis that retrieval facilitates memory updating through stronger suppression of competing memory. Using a word-picture association paradigm with fMRI, the authors found that brain regions, including mPFC, showed greater reactivation of new memory at the final test, but greater reactivation of old memory during practice in the test condition, compared with the restudy condition. In addition, LPFC showed stronger activity during retrieval practice than during restudy. Overall, this is a very interesting study addressing how memory retrieval interacts with proactive memory interference. The paper is easy to read and the design and method are clearly described. However, some of our enthusiasm was dampened by significant questions and concerns regarding the novelty and central arguments in the paper. These areas of significant concern are detailed below:

1) This is a very interesting and fruitful set of results, but the framing does not seem to fit with the work that is reported. The logic of the paper is that neural changes in response to repeated retrievals reflect memory updating and suppression of old memory, but a priori this did not seem like an obvious prediction. It is possible that any differences (both in behavioral performance and neural activity) between the test and the restudy condition simply reflect superior learning during testing. In other words, without a no interference or weaker interference condition, it is hard to conclude that brain activity during retrieval practice supports memory updating/suppression.

2) From the Introduction, it is unclear how this study is different from prior studies examining neural mechanisms underlying retrieval induced forgetting using a similar paradigm, e.g. Wimber et al., 2015.

3) The authors capture many previous findings about activity in mPFC and LPFC, however, the hypothesis they form following this literature is too vague to be clearly falsifiable or to adjudicate between potentially contradictory findings. For instance, rather than reflecting suppression, stronger reactivation of competing memory during retrieval practice has also been associated with retrieval induced facilitation (e.g. Jonker et al., 2018). Moreover, the authors examine brain activity in 4 ROIs but did not present a rationale for including AG, IFG, VTC.

4) It is laudable the authors report a follow-up behavioral experiment examining the relationship between memory for new memory vs. old memory. However, the negative correlation could be driven by output interference, especially given that subjects recalled A-C pairs first. It is likely that better A-C recall produced stronger output interference to A-B pairs.

5) It is unclear in the analyses of retrieval practice reactivation, whether the authors included all test trials or only correct test trials. If all trials were included, all the results reported in these sections would not be surprising because subjects wrongly recalled a large portion of old targets during retrieval practice. This also explains why restudy trials showed larger reactivation of new memory because new information was always directly presented.

6) A number of ad hoc hypotheses are given for results that are inconsistent with the prediction. For example, the authors claim that the null results for correct or incorrect trials are due to a small number of trials. However, there should be at least 70 correct trials in the test condition and 50 correct trials in the restudy condition. Moreover, decreased update in the restudy condition is explained by the repetition suppression effect; null results of correlations between competitor reactivation and behavioral performance are explained by the claim that behavioral measure is not sensitive; chance level classification performance in the hippocampus is thought to reflect "technical limitations". These ad hoc explanations of conflicting results, which lack justification, suggest strong confirmation bias.

[Editors’ note: further revisions were suggested prior to acceptance, as described below.]

Thank you for submitting your article "Retrieval practice facilitates memory updating by enhancing and differentiating medial prefrontal cortex representations" for consideration by eLife. Your article has been reviewed by two peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Laura Colgin as the Senior Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

We would like to draw your attention to changes in our revision policy that we have made in response to COVID-19 (https://elifesciences.org/articles/57162). Specifically, we are asking editors to accept without delay manuscripts, like yours, that they judge can stand as eLife papers without additional data, even if they feel that they would make the manuscript stronger. Thus the revisions requested below only address clarity and presentation.

Summary:

All reviewers agreed that the authors have done a very thorough job addressing the initial comments, and that, as a result, the paper is much improved. Reviewers also identified a few remaining issues that should be addressed with new analysis and re-writing.

Revisions:

1) The authors analyzed the behavioral data with a "test order (A-C first, A-B first) by memory test type (Recall A-C, Recall A-B) by update method (RetPrac, Restudy) by response type (Target, Competitor, Other) four-way mixed design ANOVA". Either the choice or the description of the analysis is incorrect. Given that the performance of A-C test and A-B test, absolute ratios of Target, Competitor and Others are not comparable, response type and memory test type should not be used as independent variables (factors). Rather, separate ANOVAs should be conducted, with appropriate multiple comparison correction, to examine target ratio and competitor ratio in each test.

2) For the above reason, the authors cannot rule out a potential order confound merely by showing that there was no significant effect of test order, or interaction with test order, in this four-way ANOVA. Rather, the authors need to directly compare A-B performance when A-B was tested first vs. when A-B was tested after A-C. However, even if the correct analysis were done, a non-significant difference between orders could not rule out an order confound either. A stronger test would be to examine A-B performance only in subjects who started with the A-B test, and vice versa for the A-C test.

https://doi.org/10.7554/eLife.57023.sa1

Author response

[Editors’ note: the authors resubmitted a revised version of the paper for consideration. What follows is the authors’ response to the first round of review.]

Reviewer #1:

[…]

1) A fundamental technical problem in my view is the use of a 3-class classifier. When such a classifier provides evidence for a target, it is automatically confounded by "anti-evidence" for the competitor and the other, neutral category. The same goes for the competitors: if a 3-class classifier provides strong evidence for competitors being reactivated, this could be due to strong competitor evidence or weak target (and neutral) evidence. The problem is exaggerated by using a normalization where the "neutral" evidence is subtracted from target and competitor evidence. It would be much preferable if the authors used 2-class classifiers with the neutral category as baseline for all analyses trying to separate target and competitor reactivation.

We thank this reviewer for the comment. In this study, instead of just using a target (T) and a competitor (C), the third neutral (other, O) condition was included exactly to deal with the anti-evidence issue. As a result, we could simultaneously detect strong target and competitor evidence.

As correctly pointed out by this reviewer, the “liblinear” solver performs multi-class classification in a “one-vs-rest” fashion (i.e., T vs. O+C, C vs. T+O and O vs. T+C), which is common practice in the machine learning field (Fan et al., 2008). Other studies have used three binary classifiers (i.e., T vs. C, T vs. O, and O vs. C) and combined the results from the classifiers. For example, the evidence for T would be obtained by averaging the evidence from classifier T vs. C and T vs. O. The reviewer here suggested that we not use the evidence from the classifier T vs. C, since this would introduce the anti-evidence issue.

We very much appreciate this reviewer’s point. Nevertheless, it should be noted that with the classifier, we aimed to assess the relative strength of evidence for each class rather than its absolute value. In addition, both 3-class and binary classifier approaches could maximize the number of training samples for each classifier, which potentially leads to more stable results.

In light of the reviewer’s comment, we additionally implemented the one-against-one classifiers as a comparison. Specifically, we implemented 3 binary classifiers, each using data from two categories. The target and competitor evidence was obtained from the T vs. O and the C vs. O classifier, respectively, without using the results from the T vs. C classifier. The Other evidence was the average from the two classifiers. The results from this analysis were overall very similar to our results in the manuscript (see Author response image 1).
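The way evidence is combined across the binary classifiers can be sketched as follows. This is a minimal illustration and not the authors' code: the function name, the dictionary layout, and the convention that each binary classifier returns one probability per class are assumptions, as is reading "Other" evidence as the probability each classifier assigns to the O class.

```python
def combine_binary_evidence(t_vs_o, c_vs_o):
    """Combine two binary classifier outputs into T, C and O evidence.

    Each argument is a dict of class probabilities from one hypothetical
    binary classifier, e.g. {"T": 0.75, "O": 0.25}. The T vs. C classifier
    is deliberately left out, as described in the text.
    """
    target = t_vs_o["T"]          # target evidence: T vs. O classifier only
    competitor = c_vs_o["C"]      # competitor evidence: C vs. O classifier only
    # "Other" evidence: average of the O outputs of the two classifiers
    other = (t_vs_o["O"] + c_vs_o["O"]) / 2
    return {"T": target, "C": competitor, "O": other}


# Illustrative values only, not data from the study
evidence = combine_binary_evidence({"T": 0.75, "O": 0.25}, {"C": 0.5, "O": 0.5})
```

Because target and competitor evidence come from separate classifiers in this scheme, both can be high on the same trial.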

Since the one-against-the-rest method is a common practice in the machine learning field and the three-classifier approach might be robust and consistent with the literature, we decided to use the results from three-class classification. In the Materials and methods section of the revised manuscript, we have added some rationale for the use of this approach.

Author response image 1

2) During retrieval practice, were subjects significantly more likely to experience an intrusion than an unrelated (other) error? It seems from Figure 1C that they were not, speaking against any strong A-B proactive interference effect, and more for a general learning of the target response over time. This result seems crucial for the central claims of the paper. Similarly, the authors should report in both experiments how the number of Day 2 intrusions relates to Day 3 performance, and whether this relationship is stronger for intrusions than unrelated errors.

This is a very good point. In all of our three experiments, we did observe slightly more competitor intrusions than unrelated errors (Experiment 1: 0.232 vs. 0.209, t(18) = 1.93, p = .069; Experiment 2: 0.233 vs. 0.207, t(45) = 2.72, p = .009; Experiment 3: 0.221 vs. 0.200, t(27) = 2.33, p = .027). We think this could reflect the fact that subjects were told explicitly that the associations had been changed, and new associations should be explored and learnt. That was why we also found many other responses. Still, there were stronger neural competitor reactivations than other evidence during memory updating. Finally, the competitor response was much stronger during the final test on Day 3. All these pieces of evidence suggest strong A-B proactive interference.

Following this reviewer’s suggestion, we did additional analyses to examine how the number of Day 2 intrusions was related to Day 3 performance. As shown in Figure 2—figure supplement 1, we found that pairs with more competitor intrusions during memory updating had worse A-C memory, stronger A-B intrusion, and comparable other response during the final A-C memory test. This pattern was consistent in both Experiment 1 and Experiment 2. In contrast, pairs with more other responses during memory updating had worse A-C memory, stronger other response, but comparable A-B intrusion, during the final A-C memory test. These results suggested that the strength of proactive interference affected the new A-C learning. Importantly, although the number of competitor and other responses had a similar effect on A-C memory, they had a much stronger effect on the accessibility of themselves during the final test. That is, there were many more A-B intrusions than other responses during the final test, even when the number of responses during updating was matched, suggesting strong A-B interference.

Interestingly, we found that the number of competitor and other responses during updating had no effect during the A-B memory test, suggesting that the consolidated A-B memory was not weakened.

Together, these results suggest that there was very strong A-B proactive interference both during updating and the final test. It had a strong effect on the learning of new memory, but not on the strength of A-B memory itself, which is consistent with the differentiation hypothesis. We have added these results in the manuscript, which reads:

“Could the lack of suppression be due to the weak A-B intrusion? Subjects were well trained in the A-B association. […] All these results suggested that the lack of A-B memory suppression was due to its strong representations.”

3) The authors make strong claims about competitor suppression throughout the manuscript where the actual evidence appears relatively weak. In the behavioural Experiment 2, if there was a suppression effect, how do the authors explain that they did not find reduced A-B memory for RetPrac than Restudy on Day 3? Is this due to the delay of the final test? Relatedly, the negative correlation between A-B and A-C memory is interpreted as evidence for suppression but can as easily be seen as an associative interference effect. Finally, in Experiment 2, the authors test A-C associations before A-B associations, making it likely that output interference on the A-B items will overshadow potentially more subtle effects of suppression in this experiment.

All three reviewers raised this important issue as to whether our data reflected competitor suppression. In this revision, we did an additional behavioral experiment to examine the A-B memory in Day 3. Two groups of subjects (23 in each group) were included, with half tested A-B memory first and the other half tested A-C memory first. Our results replicated the RetPrac effect on A-C memory, but again revealed no effect on A-B memory. This was the case for both groups, suggesting that the order of the tests did not have an effect.

The results from the new experiment, together with those from the original Experiment 2, support the differentiation mechanism. That is, although A-C memory was strengthened by retrieval practice, there were not more C intrusions in the A-B memory test. On the other hand, although A-B memory was comparable between the two conditions, there were fewer B intrusions in the A-C memory test. Both sets of results suggest that A-B and A-C memories were differentiated by retrieval practice. We noticed that the number of C memory intrusions was overall low, perhaps due to the weak A-C memory. We thus did a further analysis focusing on strong A-C memory trials (correct trials in the A-C memory test), which would produce stronger intrusion. Consistently, we found that the correct trials in the A-C memory test (reflecting strong A-C memory) showed overall more C memory intrusions during the A-B test than did the incorrect trials (t(45) = 3.38, p = .001). More interestingly, the intrusion rate was numerically smaller in the RetPrac than the Restudy condition (t(45) = -1.59, p = .12), which is consistent with the differentiation hypothesis.

To further test the differentiation hypothesis, we did a joint analysis to examine the subjects’ answers in both A-B and A-C memory tests given the same word cues. The differentiation hypothesis would predict more correct trials in both tests (i.e., C response in the A-C memory test and B response in the A-B memory test) under the RetPrac condition, i.e., subjects maintained stronger and nonoverlapping representations of both A-B and A-C memories. Meanwhile, we would predict fewer trials on which subjects responded with old B memory in both tests (due to differentiation), and also fewer (due to differentiation) or comparable (due to strengthening of C and differentiation) trials on which subjects responded with new C memory in both memory tests. Our data supported all three predictions. We found that subjects made more correct responses in both tests under the RetPrac than the Restudy condition (24.0% vs. 19.7%, t(45) = 3.81, p <.001), but showed less B memory (21.0% vs. 23.2%, t(45) = -2.61, p = .012) and comparable C memory (11.7% vs. 11.5%, t(45) = 0.18, p = .86) in both memory tests.
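The joint analysis described above amounts to sorting each cue by the pair of answers given in the two tests and computing the proportion in each cell. The following is a minimal sketch under assumed conventions: the response codes ("B", "C", "other"), the function name, and the example trials are hypothetical and are not taken from the study's data or code.

```python
from collections import Counter


def joint_response_rates(trials):
    """Proportion of cues in each joint-response category.

    trials: list of (ac_response, ab_response) tuples, one per cue word,
    where each response is coded "B", "C" or "other" (assumed coding).
    """
    def label(ac, ab):
        if ac == "C" and ab == "B":
            return "both_correct"   # C in the A-C test and B in the A-B test
        if ac == "B" and ab == "B":
            return "B_in_both"      # old B memory given in both tests
        if ac == "C" and ab == "C":
            return "C_in_both"      # new C memory given in both tests
        return "remaining"

    counts = Counter(label(ac, ab) for ac, ab in trials)
    categories = ("both_correct", "B_in_both", "C_in_both", "remaining")
    return {name: counts[name] / len(trials) for name in categories}


# Made-up responses for eight cues, for illustration only
example = [("C", "B"), ("C", "B"), ("B", "B"), ("C", "C"),
           ("other", "B"), ("C", "other"), ("B", "B"), ("C", "B")]
rates = joint_response_rates(example)
```

Computing these rates separately for RetPrac and Restudy pairs within each subject would give the per-condition values that the reported paired comparisons operate on.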

We also conducted new analyses on the fMRI reactivation data, by simultaneously examining the factors of updating method and behavioral performance. These results suggest memory integration and differentiation in the MPFC. In particular, retrieval practice was able to boost the target reactivation for both correct and incorrect trials. For competitor evidence, however, we found strong and comparable competitor reactivation under the RetPrac condition for both correct and incorrect responses, suggesting that RetPrac integrated and differentiated competitor and target evidence in the MPFC.

These new behavioral results, together with the neural representational data, suggest that reactivation during retrieval practice could enhance target representation, and meanwhile facilitate memory differentiation and reduce intrusion, which is very consistent with the recent hypothesis that emphasizes the role of reactivation in differentiating neural representations (Ritvo, Turk-Browne and Norman, 2019).

The reactivation-dependent memory differentiation is also quite consistent with previous behavioral observations (Keresztes and Racsmány, 2013; Storm, Bjork and Bjork, 2008). The lack of suppression might be due to several reasons. First, the old memory might be too strong to be suppressed. Second, there was a long delay between retrieval practice and the final memory test, as previous studies have found that strong retrieval induces suppression after a short delay, but a much weaker effect after a 24-hour delay (MacLeod and Macrae, 2001).

In this revision, we have thoroughly rewritten the Introduction, Results and Discussion.

4) While the most central results seem sound, some of the analyses reported later in the Results section appear less well motivated and somewhat arbitrary in their approaches and selective in the reporting. To avoid the impression of p-hacking, the authors should streamline these sections, and use a more consistent and well-motivated rationale for all of the analyses. To give a few concrete examples,

a) The analysis reported in paragraph two of subsection “The LPFC contributed to memory updating under the RetPrac condition” and in Figure S8 is difficult to follow in terms of rationale. The results are also a bit confusing: none of the ROIs shows a pattern where strong competitor reactivation is related to strong LPFC activation, which is surprising given existing literature. Did the authors average competitor evidence across the 3 repetitions? In my mind, the straightforward prediction here is that strong competitor reactivation on early repetitions, and weak on late repetitions, should be related to effective LPFC-mediated suppression. Therefore, for these analyses it makes more sense to use the slope of competitor reactivation across repetitions, not the average evidence for reactivation.

b) The analysis relating caudate activity to competition resolution seems arbitrary for readers not reading the supplements. It is unclear why a different metric is being used here (compared to LPFC) to relate univariate and multivariate effects.

c) The analysis splitting trial into incorrect (IC), first correct (FC) and second/third correct (LC) is not well motivated and difficult to follow.

We apologize for not making this clear in our original submission. The second/third correct (LC) indicates that items were correctly answered two or three times, indicating a relatively more fluent response. Our prediction was that the incorrect and first correct trials would involve more conflict resolution than would the LC trials. Furthermore, the first correct trials would involve more reward response than would incorrect and later correct trials. This way, we could dissociate the regions associated with reinforcement learning and reward processing (such as the caudate) and the regions involved in conflict processing (such as the prefrontal cortex). As suggested by Ritvo et al. (2019), both mechanisms could lead to representational changes and memory differentiation. We have clarified the motivation of this analysis in the revised manuscript, which reads:

“To further probe the function of these regions in overcoming intrusions from outdated competitors, we examined how these prefrontal and striatal activations varied with updating performance during retrieval practice. […] Both mechanisms could contribute to representational change and memory differentiation.”
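The IC/FC/LC split of retrieval practice trials can be illustrated with a short sketch. This is a hedged reading of the description in this response rather than the authors' procedure: the function name is hypothetical, and labelling an error that follows an earlier correct answer as IC is an assumption, since the text does not say how such trials were handled.

```python
def label_repetitions(correct_flags):
    """Label each repetition of one item as IC, FC or LC.

    correct_flags: list of booleans, one per retrieval practice repetition.
    IC = incorrect, FC = first correct answer, LC = any later correct answer.
    """
    labels = []
    seen_correct = False
    for is_correct in correct_flags:
        if not is_correct:
            labels.append("IC")   # assumption: errors are IC at any position
        elif not seen_correct:
            labels.append("FC")
            seen_correct = True
        else:
            labels.append("LC")
    return labels


labels_a = label_repetitions([False, True, True])
labels_b = label_repetitions([True, False, True])
```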

For comments 4a and b, we agree with this reviewer that it would make more sense to link the caudate and LPFC activation with the subsequent change of competitor evidence. In particular, we examined the caudate and LPFC activation in the current trial and the evidence change from the current repetition to the next repetition. Because the caudate responds to reward, we focused on the first correct trial (FC) in the caudate, which revealed that strong caudate activation was associated with greater competitor evidence reduction in the next repetition in the VTC (χ2(1) = 5.86, p = .015) but not in MPFC and AG (all ps >.15), suggesting reinforcement learning. However, no such effect was found for LPFC during the incorrect and first correct trials, suggesting the LPFC did not temporally suppress the competitor evidence.
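The pairing of current-trial activation with the change in competitor evidence from the current repetition to the next can be sketched as below. The function name, the list-based layout, and the numbers are assumptions for illustration; the statistical model behind the reported χ2 tests is not reproduced here.

```python
def activation_vs_next_change(activation, competitor_evidence):
    """Pair activation on repetition r with the evidence change from r to r+1.

    activation: per-repetition activation estimates for one item
    competitor_evidence: per-repetition competitor evidence for the same item
    Returns a list of (activation_r, evidence_{r+1} - evidence_r) tuples;
    the last repetition has no following repetition and is dropped.
    """
    if len(activation) != len(competitor_evidence):
        raise ValueError("inputs must cover the same repetitions")
    return [
        (activation[r], competitor_evidence[r + 1] - competitor_evidence[r])
        for r in range(len(activation) - 1)
    ]


# Made-up values for one item across three repetitions
pairs = activation_vs_next_change([1.0, 0.5, 0.25], [0.5, 0.25, 0.25])
```

A negative second element indicates that competitor evidence dropped on the following repetition.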

We further examined whether the LPFC and caudate activations during updating were associated with long-term memory updating on Day 3. This analysis revealed that trials with greater LPFC activity during updating under the RetPrac condition ultimately showed superior memory updating (i.e., target – competitor evidence) during the final test on Day 3, in the MPFC (χ2(1) = 4.62, p = .032), but no effect was found in the caudate.

Together, these results suggest different roles of LPFC and caudate in modifying the memory representations during retrieval practice. We have updated the results, and added more discussions in this revision.

5) In some instances, interpretations are quite a large step removed from the actual results. For example, why does a difference in LPFC activity between IC/FC and LC (paragraph two subsection “The LPFC contributed to memory updating under the RetPrac condition”) indicate a role in competitor suppression? Such a pattern is more likely driven by target-related processes.

As explained above, IC/FC trials were slower and required more conflict resolution, whereas LC trials were more fluent. We would expect similar responses for FC and LC trials in regions associated with target processing. We have clarified the logic of our interpretation in the revised manuscript, as stated above.

6) The tertile analyses relating competitor reactivation on Day 2 to competitor reactivation (see Figure 5) on Day 3 are not convincing statistically. The interaction with region as a factor seems irrelevant. In IFG and AG the conclusions are based on null results, and in VTC there is a strong positive relationship speaking against reactivation-dependent updating. The only thing left therefore is the U-shaped effect, and this appears like a posthoc observation.

We agree with this reviewer that the tertile analysis might not provide very strong evidence. In this revision, we used a more established method (Bayesian curve-fitting implemented by p-cit-toolbox; Detre et al., 2013) to formally test the relationship between reactivation during updating and memory evidence in the final test. We found a pattern that was consistent with the nonmonotonic plasticity hypothesis in the visual cortex, but not in other regions. We have reported the results from this new analysis.

7) For the Day 2 MVPA analyses, the authors never show evidence for target reactivation (or rather, representation given the visual exposure), this result should be included.

Thanks for this suggestion. In the revised submission, we reported the target evidence reactivation and related it to subsequent memory performance, separately for restudy and RetPrac conditions. We found that the target evidence during updating in the MPFC and VTC was separately associated with subsequent memory performance in the RetPrac and Restudy conditions, respectively, suggesting that RetPrac might involve the MPFC mechanisms for quick system consolidation. We did not directly compare the target evidence between the two conditions due to the differences in their task structure.

Reviewer #2:

In this timely and creative study, the authors investigate in a within-subjects fMRI design with 19 subjects the neural mechanisms and behavioral effects of retrieval practice vs. re-study of A-B, A-C memory updating. In a second within-subjects behavioral study with 28 subjects, the authors probe their postulation that retrieval practice vs. re-study leads to prioritization of C and suppression of B as related to A in memory updating. There are a few drawbacks to the theoretical framing, analytic technique, and conclusions drawn from the data that should be addressed.

1) Theoretical framing:

The authors adopt a research paradigm that is akin to inference/integration/generalization memory work. I was surprised to see this conceptualization of the paradigm downplayed in the Introduction and Discussion, particularly because A-B pairs were overtrained on Day 1. I think that the manuscript would benefit from more explicit characterization of why the adopted paradigm speaks to a suppression account of a B vs. an integration account of a B in memory updating, and how their data adjudicate between these two accounts (e.g., MPFC is often shown as a schema/generalization area).

We thank the reviewer for this very thoughtful comment. We completely agree with this reviewer that our paradigm is akin to the broad area of memory changes, including integration, inhibition, and/or differentiation. The Introduction has now been significantly reframed. In particular, we have now introduced alternative hypotheses in the Introduction and provided results to adjudicate between these accounts. As stated in our response to the first reviewer’s comment, our new behavioral results and fMRI data provide support to the integration and differentiation hypothesis.

2) Analytic technique:

a) I was not convinced that the retrieval practice vs. re-study Day 2 design and analyses provided clear support for the idea that B is suppressed and C is prioritized in retrieval practice vs. re-study during memory updating. In fact, the authors do not find evidence of this claim in their follow-up behavioral study designed to address the issue; they have to split data to show that in some subjects you see this pattern but in others you don't. Because the evidence stemming from this follow-up is not clear, the mechanism of retrieval practice in memory updating (suppression vs. integration) is not clear.

Thanks for this question. As stated in our response to the first reviewer’s comments, we have done an additional experiment (new Experiment 2) to examine the suppression vs. integration vs. differentiation hypotheses. We found the A-B memory was very strong and was not suppressed by retrieval practice, despite the consistent retrieval practice effect on A-C memory. These new results, together with the data from Experiment 3 (Experiment 2 in original submission), clearly support the differentiation hypothesis. That is, retrieval practice might help to form non-overlapping representations that also reduce competition. Additional joint analysis on how subjects maintained both A-B and A-C memories further supported the differentiation hypothesis, as we found that compared to Restudy, RetPrac had more trials where subjects correctly remembered both A-B and A-C memories, but fewer trials where subjects answered the old B memory on both tests. Finally, the new analysis of fMRI data suggested that the MPFC showed strong and comparable competitor reactivation for both correct and incorrect responses under the RetPrac condition, further supporting the integration and differentiation account.

b) The critical results of the study are subsection “Retrieval practice enhanced target reactivation and competitor suppression” of the manuscript. I was confused as to why the authors have one classifier analysis for final test performance (Day 3) that doesn't account for retrieval practice/re-study (Day 2), and then another classifier analysis for retrieval practice/re-study (Day 2) that doesn't account for final test performance (Day 3). Why was behavioral performance but not the Day 2 manipulation used in the first analysis, and why was the Day 2 manipulation but not behavioral performance used in the second analysis? I think an analysis that uses both these assays would get at the authors' question most directly.

Following this reviewer’s insightful suggestion, we have conducted new three-way ANOVA to simultaneously examine the effects of behavioral performance and updating method on the target and competitor evidence. The new analysis revealed interesting results that supported the retrieval-induced integration and differentiation account in the MPFC. Meanwhile, we also found that RetPrac reduced the competitor evidence in VTC. Finally, the AG tracked the behavioral performance and RetPrac did not have an additional effect on AG representation. These results thus showed clearly dissociated effects of RetPrac in different brain regions. We have reported the results from the new analysis in the revised manuscript.

c) How many trials were used in the fMRI analyses per condition?

There were 72 trials in each condition.

d) Greater motivation for the ROI selection parameters for MVPA should be given.

We have added a more detailed description of the motivation for the ROI selection, which reads:

“We focused our MVPA on the ventral temporal cortex (VTC), medial prefrontal cortex (MPFC), angular gyrus (AG), and hippocampus (HPC), which overlap with the core recollection network (Rugg and Vilberg, 2013) and have consistently shown neural reinstatement effect during memory retrieval (Kuhl and Chun, 2014; Wimber et al., 2015; Xiao et al., 2017).”

e) Is it possible that shifts in decision criteria would be observed for those trials with retrieval practice on Day 2 vs. those trials with re-study on Day 2? Can the authors rule out a decision criteria account of behavioral findings on Day 3 final test?

Thanks for this comment. In the final test on Day 3, all trials from different conditions were mixed together, thus we believe a shift of decision criteria would be unlikely. We found that the response of the other category was matched between the two conditions.

3) Conclusions drawn from the data:

a) As noted, I did not think that the critical finding about retrieval practice during memory updating was supported by the data; the authors' own follow-up study did not provide strong behavioral support for this claim. Thus, it is hard to reconcile the lack of a clear mechanism with the fMRI side of the manuscript.

As stated above, with an additional behavioral experiment and a new analysis of the fMRI data, our results support the memory differentiation account. Memory differentiation helps to reduce the intrusion in the A-C memory test.

b) The lack of neuroimaging effects for AG and Hipp was striking, particularly given recent work finding reactivation effects in parietal cortex (Jonker, Ranganath, 2019, PNAS; Lee, Kuhl, 2019, Cerebral Cortex). These papers should be cited in text and discussion should be given as to the discrepancies.

Thanks for this comment and the references. In our new analysis that simultaneously examined the effects of updating method and performance, we found that AG was tracking the behavioral performance and updating method had no additional effect. This pattern was quite consistent with previous studies showing that AG reactivation was aligned with behavioral performance. For the hippocampus, we did not find strong representation of category information in this region, which was also consistent with existing literature. We have added more discussion on these findings, by referring to the literature suggested by this reviewer, which reads:

“Several major reasons might account for the chance-level decoding of memory information in the hippocampus during retrieval. […] Future studies should use an optimized design and item-level analysis to further elucidate the hippocampus’s role in memory updating”

Reviewer #3:

[…]

1) This is a very interesting and fruitful set of results, but the framing does not seem to fit with the work that is reported. The logic of the paper is that neural changes in response to repeated retrievals reflect memory updating and suppression of old memory, but a priori this did not seem like an obvious prediction. It is possible that any differences (both in behavioral performance and neural activity) between the test and the restudy condition simply reflect superior learning during testing. In other words, without a no interference or weaker interference condition, it is hard to conclude that brain activity during retrieval practice supports memory updating/suppression.

We thank this reviewer for raising this important question. The testing effect is complicated, which could reflect the strong A-C learning, A-B inhibition, and/or memory differentiation. We completely agree with this reviewer that testing would lead to superior learning of new memory, which was found in our behavioral and fMRI data. In fact, we believe that if no interference or only weak interference is introduced, one major benefit of retrieval practice would be the superior learning of new memories. This effect has been consistently shown in the literature.

Building upon this literature, the current study aimed to examine whether and how retrieval practice could facilitate memory updating when there was strong interference. In particular, we examined whether there were additional mechanisms that allow retrieval practice to facilitate memory updating, such as competitor suppression and/or memory integration and differentiation. Our results showed that retrieval practice could facilitate target representation in the MPFC, and reduce competitor representation in the VTC. Furthermore, we found that retrieval practice could integrate and differentiate target and competitor evidence in the MPFC. These results suggest that retrieval practice engages multiple, region-specific mechanisms to facilitate memory updating. In this revision, we have clearly emphasized these points throughout the manuscript.

2) From the Introduction, it is unclear how this study is different from prior studies examining neural mechanisms underlying retrieval induced forgetting using a similar paradigm, e.g. Wimber et al., 2015.

As stated above, we have now significantly reframed the Introduction to emphasize the multiple mechanisms of retrieval practice on memory updating.

3) The authors capture many previous findings about activity in mPFC and LPFC, however, the hypothesis they form following this literature is too vague to be clearly falsifiable or to adjudicate between potentially contradictory findings. For instance, rather than reflecting suppression, stronger reactivation of competing memory during retrieval practice has also been associated with retrieval-induced facilitation (e.g. Jonker et al., 2018). Moreover, the authors examine brain activity in 4 ROIs but did not present a rationale for including AG, IFG, VTC.

As correctly pointed out by this reviewer, existing literature found that the reactivated memory could be integrated, strengthened or differentiated, depending on mnemonic goals and the characteristics of the reactivated memory (Tambini and Davachi, 2019). We have made this idea clearer in the Introduction, and added more discussion on these different results. We have also added more rationales for including AG and VTC, and made clearer the prediction regarding the roles of MPFC and LPFC in retrieval practice and memory updating.

4) It is laudable the authors report a follow-up behavioral experiment examining the relationship between memory for new memory vs. old memory. However, the negative correlation could be driven by output interference, especially given that subjects recalled A-C pairs first. It is likely that better A-C recall produced stronger output interference to A-B pairs.

This is a very good point. As stated in our response to the other two reviewers’ comments, we have done an additional behavioral experiment to examine the effect of retrieval practice on A-B memory, balancing the order of the memory tests (recall A-B or A-C first). Our results support the differentiation hypothesis.

5) It is unclear in the analyses of retrieval practice reactivation, whether the authors included all test trials or only correct test trials. If all trials were included, all the results reported in these sections would not be surprising because subjects wrongly recalled a large portion old targets during retrieval practice. This also explains why restudy trials showed larger reactivation of new memory because new information was always directly presented.

Thanks for this comment. In the analysis of competitor reactivation, we included all trials. The reviewer was correct that subjects recalled a large number of old targets, which would have contributed to the reactivation of competitor evidence. The rationale for including all trials was to show that subjects made many mistakes and reactivated the old target memory under the retrieval practice condition. Under the restudy condition, by contrast, the competitor evidence was not strongly reactivated because the new information was directly presented.

It should be noted that the strong competitor reactivation was not simply a result of behavioral intrusion of old memory. For example, we found the competitor evidence was stronger as compared to other evidence (i.e., baseline), although subjects made comparable competitor and other responses.

Following this reviewer’s comment, we did an additional analysis to include only correct trials, which still revealed stronger competitor reactivation under the RetPrac condition than under the restudy condition (all ps <.017). We have included the new result in the supplementary materials (Figure 4—figure supplement 1).

6) A number of ad hoc hypotheses are given for results that are inconsistent with the prediction. For example, the authors claim that the null results for correct or incorrect trials are due to a small number of trials. However, there should be at least 70 correct trials in the test condition and 50 correct trials in the restudy condition. Moreover, decreased update in the restudy condition is explained by the repetition suppression effect; null results of correlations between competitor reactivation and behavioral performance are explained by the claim that behavioral measure is not sensitive; chance level classification performance in the hippocampus is thought to reflect "technical limitations". These ad hoc explanations of conflicting results, which lack justification, suggest strong confirmation bias.

Thanks for these comments.

“For example, the authors claim that the null results for correct or incorrect trials are due to a small number of trials. However, there should be at least 70 correct trials in the test condition and 50 correct trials in the restudy condition.” We have done a new analysis to simultaneously examine the effects of updating method and behavioral performance on target and competitor evidence. We have updated the results in this revision.

“Moreover, decreased update in the restudy condition is explained by the repetition suppression effect”: We have removed the results from this analysis due to differences in task structure between RetPrac and Restudy.

“null results of correlations between competitor reactivation and behavioral performance are explained by the claim that behavioral measure is not sensitive”: This result has been removed from this revision.

“chance level classification performance in the hippocampus is thought to reflect "technical limitations"”: During the localizer task, the hippocampus showed above-chance classification, but its accuracy was lower than that of other regions. Furthermore, the hippocampal classifier could not predict subjects’ responses during the final test. This could be due to several reasons. We have added more discussion on this, which reads:

“Several major reasons might account for the chance-level decoding of memory information in the hippocampus during retrieval. […] Future studies should use an optimized design and item-level analysis to further elucidate the hippocampus’s role in memory updating.”

[Editors’ note: what follows is the authors’ response to the second round of review.]

Revisions:

1) The authors analyzed the behavioral data with a "test order (A-C first, A-B first) by memory test type (Recall A-C, Recall A-B) by update method (RetPrac, Restudy) by response type (Target, Competitor, Other) four-way mixed design ANOVA". Either the choice or the description of the analysis is incorrect. Given that the performance of A-C test and A-B test, absolute ratios of Target, Competitor and Others are not comparable, response type and memory test type should not be used as independent variables (factors). Rather, separate ANOVAs should be conducted, with appropriate multiple comparison correction, to examine target ratio and competitor ratio in each test.

2) For the above reason, the authors cannot rule out a potential order confound just by showing that there was no significant effect of test order, or interaction with test order, in this four-way ANOVA. Rather, the authors need to directly compare A-B performance when A-B was tested first vs. when A-B was tested after A-C. However, even if the correct analysis were done, a non-significant difference between orders could not rule out an order confound either. A stronger test would be to examine A-B performance only in subjects who started with the A-B test, and vice versa for the A-C test.

We thank the reviewers for pointing out this important issue. Following the reviewers’ suggestion, we ran six two-way ANOVAs, each with test order (A-C first, A-B first) and update method (RetPrac, Restudy) as factors, to examine the test order effect separately for each memory test (A-C, A-B) and each response type (Target, Competitor, Other). The results revealed no significant main effect of test order (ps >.43, except for a trend in Other responses during the A-B test, p = .07, FDR corrected). Importantly, there was no update method by test order interaction in any of the models (all ps >.54, FDR corrected). These results indicate that test order had no effect, or at most a negligible one, in our data.
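For readers unfamiliar with FDR correction across a family of tests such as the six ANOVAs above, the following is a minimal sketch of the Benjamini-Hochberg procedure. It is an illustration under stated assumptions: the response does not say which FDR method was used, the function name `benjamini_hochberg` is hypothetical, and the p-values are invented for the example rather than taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: the Benjamini-Hochberg step-up procedure; the source text
    only says "FDR corrected" and does not name the method.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never decrease as the raw p-values increase.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical uncorrected p-values for six tests (not the study's data).
raw_p = [0.01, 0.04, 0.03, 0.50, 0.20, 0.80]
adj_p = benjamini_hochberg(raw_p)
print([round(p, 3) for p in adj_p])
```

Each raw p-value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values monotonic, which is why two different raw values can share one adjusted value.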

We agree with the reviewers that a stronger test of the difference between A-B and A-C memory performance should only use half the data, since we would predict that test order could affect the A-B and A-C memory performance. More specifically, A-B memory would be worse when A-C memory was tested first than when A-B memory was tested first, and vice versa for A-C memory. In this case, even though there was no statistically significant test order effect, it might still confound the data.

Nevertheless, the current study was interested in the effect of updating method. For several reasons, we think it is better to use all of the data in our analysis. First, we did not expect strong test order by updating method interactions, and our data confirmed this: none of the p values was even close to significance. Second, we found very consistent results in both subject groups (A-C test first and A-B test first). For example, both groups showed an effect of updating method on A-C memory during the A-C test (Author response image 2), and it would be tedious to report these results separately when there was not even a trend toward an interaction. Third, including more subjects yields more reliable results and enables us to detect more subtle effects in our additional joint analysis. We have clarified this issue in this revision, which reads:

“For both A-C memory and A-B test and each response type (Target, Competitor, Other), test order (A-C first, A-B first) by update method (RetPrac, Restudy) two-way ANOVA revealed neither significant main effect of test order (ps >.43, except a trend of significant effect in Other responses during A-B test, p = .07, FDR corrected), nor test order by update method interaction (ps >.54, FDR corrected) (Supplementary file 1). Given our focus on the effect of updating method and the lack of updating method by test order interaction, we thus combined data from both groups in the following analyses to increase the statistical power.”
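The test order by update method interaction in a design like this can be illustrated with a short sketch. It assumes that update method is a within-subject factor and test order a between-subject factor (inferred from the description of two subject groups, not stated explicitly here). Under that assumption, the interaction F in a 2 x 2 mixed ANOVA equals the square of a pooled-variance independent-samples t statistic computed on each subject's RetPrac minus Restudy difference score. The function name and all data values below are hypothetical.

```python
from statistics import mean, variance


def interaction_f(diffs_group1, diffs_group2):
    """F statistic for the group-by-condition interaction in a 2 x 2 mixed design.

    Each input holds one within-subject difference score per subject
    (e.g. RetPrac minus Restudy). F equals t squared, where t is the
    pooled-variance independent-samples t statistic on those scores.
    """
    n1, n2 = len(diffs_group1), len(diffs_group2)
    # Pooled variance of the difference scores across the two groups.
    pooled = ((n1 - 1) * variance(diffs_group1)
              + (n2 - 1) * variance(diffs_group2)) / (n1 + n2 - 2)
    standard_error = (pooled * (1 / n1 + 1 / n2)) ** 0.5
    t_value = (mean(diffs_group1) - mean(diffs_group2)) / standard_error
    return t_value ** 2


# Hypothetical difference scores in target ratio (not the study's data).
ac_first = [0.10, 0.20, 0.30, 0.20]  # subjects tested on A-C first
ab_first = [0.20, 0.30, 0.40, 0.30]  # subjects tested on A-B first
f_value = interaction_f(ac_first, ab_first)
print(round(f_value, 3))  # degrees of freedom: (1, n1 + n2 - 2)
```

A small F here means the size of the updating-method effect is similar in the two test-order groups, which is the condition the passage above relies on when pooling the groups.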

Author response image 2
https://doi.org/10.7554/eLife.57023.sa2

Article and author information

Author details

  1. Zhifang Ye

    State Key Laboratory of Cognitive Neuroscience and Learning & IDG/McGovern Institute of Brain Research, Beijing Normal University, Beijing, China
    Contribution
    Conceptualization, Resources, Data curation, Software, Formal analysis, Investigation, Visualization, Methodology, Writing - original draft, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0003-0489-2619
  2. Liang Shi

    State Key Laboratory of Cognitive Neuroscience and Learning & IDG/McGovern Institute of Brain Research, Beijing Normal University, Beijing, China
    Contribution
    Investigation
    Competing interests
    No competing interests declared
  3. Anqi Li

    State Key Laboratory of Cognitive Neuroscience and Learning & IDG/McGovern Institute of Brain Research, Beijing Normal University, Beijing, China
    Contribution
    Investigation
    Competing interests
    No competing interests declared
  4. Chuansheng Chen

    Department of Psychological Science, University of California, Irvine, Irvine, United States
    Contribution
    Writing - review and editing
    Competing interests
    No competing interests declared
  5. Gui Xue

    State Key Laboratory of Cognitive Neuroscience and Learning & IDG/McGovern Institute of Brain Research, Beijing Normal University, Beijing, China
    Contribution
    Conceptualization, Resources, Supervision, Funding acquisition, Methodology, Writing - original draft, Writing - review and editing
    For correspondence
    gxue@bnu.edu.cn
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-7891-8151

Funding

National Science Foundation of China (31730038)

  • Gui Xue

The NSFC and the Israel Science Foundation (ISF) joint project (31861143040)

  • Gui Xue

National Science Foundation of China (61621136008)

  • Gui Xue

German Research Foundation (TRR-169)

  • Gui Xue

Guangdong Pearl River Talents Plan Innovative and Entrepreneurial Team grant (2016ZT06S220)

  • Gui Xue

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Michael Anderson for constructive comments on the manuscript. This work was sponsored by the National Science Foundation of China (31730038), the NSFC and the Israel Science Foundation (ISF) joint project (31861143040), the NSFC and the German Research Foundation (DFG) joint project NSFC 61621136008/DFG TRR-169, and the Guangdong Pearl River Talents Plan Innovative and Entrepreneurial Team grant #2016ZT06S220.

Ethics

Human subjects: Written consent was obtained from each subject after a full explanation of the study procedure. The study was approved by the Institutional Review Boards at Beijing Normal University and the Center for MRI Research at Peking University (#20150401).

Senior Editor

  1. Laura L Colgin, University of Texas at Austin, United States

Reviewing Editor

  1. Thorsten Kahnt, Northwestern University, United States

Publication history

  1. Received: March 18, 2020
  2. Accepted: May 18, 2020
  3. Accepted Manuscript published: May 18, 2020 (version 1)
  4. Version of Record published: June 4, 2020 (version 2)

Copyright

© 2020, Ye et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • Page views: 1,236
  • Downloads: 242
  • Citations: 1

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

