Formation of Task Representations and Replay in Mouse Medial Prefrontal Cortex

  1. Bernstein Center Freiburg, University of Freiburg, Freiburg im Breisgau, Germany
  2. Institute for Physiology 1, Medical Faculty, University of Freiburg, Freiburg im Breisgau, Germany
  3. BrainLinks-BrainTools, University of Freiburg, Freiburg im Breisgau, Germany
  4. Faculty of Biology, University of Freiburg, Freiburg im Breisgau, Germany
  5. Systemic Neurophysiology, Center for Integrative Physiology and Molecular Medicine, Medical Faculty, Saarland University, Saarbrücken, Germany

Peer review process

Revised: This Reviewed Preprint has been revised by the authors in response to the previous round of peer review; the eLife assessment and the public reviews have been updated where necessary by the editors and peer reviewers.

Editors

  • Reviewing Editor
    Adrien Peyrache
    McGill University, Montreal, Canada
  • Senior Editor
    Michael Frank
    Brown University, Providence, United States of America

Reviewer #1 (Public review):

Summary:

The authors use longitudinal in vivo 1-photon calcium recordings in mouse prefrontal cortex throughout the learning of an odor-guided spatial memory task, with the goal of examining the development of task-related prefrontal representations over the course of learning in different task stages and during sleep sessions. They report replication of their previous results (Muysers et al., 2025) that task representations in prefrontal cortex arise de novo after learning, comprising goal-selective cells that fire selectively for left or right goals during the spatial working memory component of the task, and generalized task-phase-selective cells that fire equivalently in the same place irrespective of goal, together comprising task-informative cells. The number of task-informative cells increases over learning, and the covariance structure changes, resulting in increased sequential activation in the learned condition, but with limited functional relevance to task representation. Finally, the authors report that, similar to hippocampal trajectory replay, prefrontal sequences are replayed at reward locations.

Strengths:

The major strength of the study is the use of longitudinal recordings, allowing identification of task-related activity in the prefrontal cortex that emerges de novo after learning, and identification of sub-second sequences at reward wells.

Comments on revisions:

The authors have added additional analyses and clarifications that increase the strength of evidence, especially quantification of functional classes of cells using longitudinal calcium recordings in prefrontal cortex during learning of an odor cue guided task, and quantification of prefrontal sequences.

There are a few remaining issues:

(1) The manuscript quantifies changes over learning in prefrontal goal-selective cells (equated to "splitter" place cells in hippocampus) and task-phase selective cells (similar to non-splitter place cells that are not goal modulated). A subset of these task cells remain stable throughout learning, and are equated to schema representations in the study. In the memory literature, schemas are generally described as relational networks of abstract and generalized information, that enable adapting to novel context and inference by enabling retrieval of related information from previous contexts. The task-phase selective cells that stay stable throughout learning clearly will have a role in organizing task representations, but to this reviewer, denoting them as forming a schema is an unwarranted interpretation. By this definition, hippocampal non-splitter place cells that emerge early in learning and are stable over days would also form a schema. Therefore, schema notation cannot just be based on stability, it requires further evidence of abstraction such as cross-condition generalization.

(2) The quantification of prefrontal replay sequences during reward is useful, but it is still unconvincing that the distinction between existence of sequences in the odor sampling phase and reward phase is not trivially expected based on prior literature. This is an odor-guided task, not a spatial exploration task with no cues, and it is very well established (as noted in citations in the previous review) that during odor sampling, animals will sniff in an exploratory stage, resulting in strong beta and respiratory rhythms in prefrontal cortex. Not having LFP recordings in this task does not preclude considering prior literature that clearly shows that odor sampling results in a unique internal network state, when animals are retrieving the odor-associated goal, vastly different from a reward sampling phase. The authors argue that this is not trivial since they see some sequences during sampling, although they also argue the opposite in response to a question from Reviewer 2 about shuffling controls for sequences, that 'not' seeing these sequences in the sampling phase is an internal control. The bigger issue here is equating these sequences during sampling to replay/preplay or reactivation sequences similar to the reward phase, since the prefrontal network dynamics are engaged in odor-driven retrieval of associated goals during sampling, as has been shown in previous studies.

Reviewer #2 (Public review):

Summary:

The first part of the manuscript quantifies the proportion of goal-arm specific and task-phase specific cells during the learning and learned conditions and, similar to the previously published Muysers et al. (2025) paper, finds that the task-phase coding cells (Muysers et al. call them path-equivalent cells) increase in the learned condition. However, compared to the Muysers et al. (2025) paper, this work quantifies the proportion of cells that change coding type across learning and learned conditions. The second part of the paper reports firing sequences using a sequence similarity clustering-based method that the group developed previously and applied to hippocampal data in the past.

Strengths:

Identifying sequences by a clustering method in which sequence patterns of individual events are compared is an interesting idea.

Weaknesses:

Further controls are needed to validate the results.

Comments on revisions:

Further changes are needed to improve the description of the methods and the discussion needs to be extended to contrast the results with previously published results of the group. Some control figures would also be needed to quantitatively demonstrate, across the entire dataset, that sequence detection did not identify random events as sequences, even if the detection method was designed to exclude such sequences. For example, showing that sequences are not detected in randomised data with the current method would better convince readers of the method's validity.

Although differences in the classification scheme relative to the Muysers et al. (2025) paper have been explained, the similarity (perhaps equivalence of results) is not sufficiently acknowledged - e.g., at the beginning of the discussion.

Although the control of spurious sequences may have been built into the method, this is not sufficiently explained in the method. It is also not clear what kind of randomization was performed. Importantly, I do not see a quantification that shows that the detected sequences are significantly better than the sequence quality measure on randomized events. Or that randomized data do not lead to sequence clusters. Also, it is still not clear how the number of clusters was established. I understand that the previously published paper may have covered these questions; these should be explained here as well. Also, the sequence similarity description is still confusing in the method; please correct this sentence "Only the l neurons active in both sequences of a pair were taken into account. "

Reviewer #3 (Public review):

In the study the authors performed longitudinal 1P calcium imaging of mouse mPFC across 8 weeks during learning of an olfactory-guided task, including habituation, training, and sleep periods. The authors' goal was to determine how the mPFC representation of the task changed and what aspects of activity emerged between the learning and the learned conditions of the task. The task had 3 arms. Odor was sampled at the end of the middle arm (named the "Sample" period). The animal then needed to run to one of the two other arms (R or L) based on the odor. The whole period until they reached the end of one of the choice arms was the "Outward" period. The time at the reward end was the "Reward" period. They noted several changes from the learning condition to the learned condition:

(1) They classified cells in a few ways. First each cell was classified as SI (spatially informative) if it had significantly more spatial information than shuffled activity, and ~50% of cells ended up being SI cells. Then among the SI cells they classified a cell as a TC (task cell) if it had statistically similar activity maps for R versus L arms, and a GC (goal arm cell) otherwise. Note that there are 4 kinds of these cells: outer arm TCs and GCs and middle arm TCs and GCs (with middle arm GCs essentially being like "splitter cells" since they are not similarly active in the middle arm for R versus L trials). There was an increase in TCs from the learning to the learned condition sessions. They also note the sources of these TCs (some came from GCs, others from non-SI cells).

(2) They analyze activity sequences across cells. They extracted 500 ms duration bursts (defined as periods of activity > 0.5 standard deviations over what I assume is the mean, which is a permissive threshold encompassing a significant fraction of the activity in non-sleep, non-habituation periods). They first noted that the resulting "Burst rates were significantly larger during behavioral epochs than during sleep and during periods of habituation to the arena" and "Moreover, burst rates during correct trials were significantly lower than during error trials". For the sequence analysis they only considered bursts consisting of at least 5 active cells. A cell's activation time within the burst was set to the center of mass of its calcium activity. Then they took all the sequences from all learned and learning sessions together and hierarchically clustered them based on the Spearman's rank correlation between the order of activity in each pair of sequences (among the cells active in both). The iterative hierarchical clustering process produces groups (clusters) of sequences such that there are multiple repeats of sequences within a cluster. Different sequences are expressed across all the longitudinally recorded sessions. They noted "large differences of sequence activation between learning and learned condition, both in the spatial patterns (example animal in Fig. 4D) and the distribution of the sequences (Fig. 4D,E). Rastermap plots (Fig. 4D) also reveal little similarity of sequence expression between task and habituation or sleep condition." They also note the difference in the sequences between learning and learned condition was larger than the difference between correct and error trials within each condition. They conclude that during task learning new representations are established, as measured by the burst sequence content. They do additional analyses of the sequence clusters by assessing the spatial informativeness (SI) of each sequence cluster. Over learning they find an increase in clusters that are spatially informative (clusters that tend to occur in specific locations). Finally, they analyzed the SI clusters in a similar manner as SI cells and classified them as task phase selective sequences (TSs) and goal arm selective sequences (GSs) and did some further analysis. However, they themselves conclude that the frequency of TSs and GSs is limited because most sequence clusters were non-SI. In the discussion they say "In addition to GSs and TSs, we found that most of the recurring sequences are not related to behavior (not SI)".

(3) As an alternative to analyzing individual cells and sequences of individual cells, they then look for trajectory replay using Bayesian population decoding of location during bursts. They analyze TS bursts, GS bursts, and non-SI bursts. They say "we found correlations of decoded position with time bin (within a 500 ms burst) strongly exceeding chance level only during outward and reward phase, for both GSs and TSs (Fig 5H)." Fig5H shows distributions indicating statistically significant bias in the forward direction (using correlations of decoded location versus time bin across 10 bins of 50 ms each within each 500ms burst). They find that the Outward trajectories appear to reflect the actual trajectory during running itself, so are likely not replay. But the sequences at the Reward are replay as they do not reflect the current location. Furthermore, replay at the Reward is in the forward direction (unlike the reverse replay at Reward seen in the hippocampus) and this replay is only seen in the learned and not the learning condition. At the same time, they find that replay is not seen during odor Sampling, from which they conclude there is no evidence of replay used for planning. Instead they say the replay at the Reward could possibly be for evaluation during the Reward phase, though this would only be for the learned condition. They conclude "Together with our finding of strong changes in sequence expression after learning (Fig 4E) these findings suggest that a representation of task develops during learning".
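The position-versus-time-bin correlation summarized in (3) can be illustrated with a minimal Python sketch. It assumes one decoded position per 50 ms bin (10 bins per 500 ms burst) and estimates chance level by shuffling the bin order. The function names, the example positions, the number of shuffles, and the choice of a Pearson correlation are illustrative assumptions, not the authors' implementation.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # flat decoded trace carries no directional information
    return cov / (vx * vy) ** 0.5


def replay_score(decoded_pos, n_shuffles=1000, seed=0):
    """Return (r, p_forward): correlation of decoded position with time bin,
    and the fraction of bin-order shuffles reaching at least that correlation."""
    bins = list(range(len(decoded_pos)))
    r = pearson(bins, decoded_pos)
    rng = random.Random(seed)
    shuffled = list(decoded_pos)
    exceed = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if pearson(bins, shuffled) >= r:
            exceed += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return r, (exceed + 1) / (n_shuffles + 1)


# Hypothetical decoded positions (cm) for one burst, advancing along the arm.
forward_burst = [5, 12, 20, 31, 38, 47, 55, 61, 70, 82]
r_forward, p_forward = replay_score(forward_burst)
r_reverse, _ = replay_score(forward_burst[::-1])
```

A positive correlation indicates a forward-ordered trajectory and a negative one a reverse-ordered trajectory; comparing the distribution of such correlations against the shuffle distribution is one way to express "exceeding chance level".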

This study provides valuable new information about the evolution of mPFC activity during the learning of an odor-based 2AFC T-maze-like task. They show convincing evidence of changes in single cell tuning, population sequences, and replay events. They also find novel forward replay at the Reward, and find that this is present only after the animal learned the task. In the discussion the authors note "the present study, to our knowledge, identified for the first time fast recurring neural sequence activity from 1-p calcium data, based on correlation analysis". Overall, they find a substantial amount of change among the features they analyzed and according to their methods, though they note a small amount of activity was preserved through learning.

One comment is that the threshold for extracting burst events (0.5 standard deviations, presumably above the mean) seems lower than what one usually sees as a threshold for population burst detection, and the authors show (in Supplementary Fig 1) that this means bursts cover ~20-40% of the data. However, it is potentially a strength of this work that their results are found by using this more permissive threshold.

Author response:

The following is the authors’ response to the original reviews.

Reviewer #1 (Public review):

The study mainly replicates the authors' previously reported results about generalized and trajectory-specific coding of task structure by prefrontal neurons, and stable and changing representations over learning (Muysers et al., 2024, PMID: 38459033; Muysers et al., 2025, PMID: 40057953), although there are useful results about changes in goal-selective and task-phase selective cells over learning. There are basic shortcomings in the scientific premise of two new points in this manuscript, that of the contribution of pre-existing spatial representations, and the role of replay sequences in the prefrontal cortex, both of which cannot be adequately tested in this experimental design.

We agree with the reviewer that we have not made sufficiently clear which aspects of our paper add to previous publications. We have now better explained methodological differences.

Also, we agree that our very general statements on pre-existing spatial representations in the introduction and abstract of the previous manuscript were not properly followed up in the Results section. In the revision, the respective statements are clarified, and we also added an analysis of a further control condition (see response to A), which shows that a subset of task cells in particular maintains their firing fields from an early habituation period. This argues that, while the population representation of the task largely develops during learning, there exists a scaffold of a small but significant number of cells that could be interpreted as a schema.

We also further clarified our view on replay sequences in the prefrontal cortex (see response to B). Particularly, we are grateful to the reviewer for the suggestion to also include other reactivation analysis which led to new results presented in new Figure 3.

[A] The study denotes neurons that show precise spatial firing equivalently irrespective of goal, as generalized task representations, and uses this as a means to testing whether pre-existing spatial representations can contribute to task coding and learning. …. [I]n order to establish generalization for abstract task rules or cognitive flexibility, as motivated in the manuscript, there is a need to show that these neurons "generalize" not just to firing in the same position during learning of a given task… For an adequate test of pre-existing spatial structure, either a comparison task, as in the examples above, is needed, or at least a control task in which animals can run similar trajectories without the task contingencies. An unambiguous conclusion about pre-existing spatial structure is not possible without these controls.

We thank the reviewer for this suggestion. We may, however, note that the previous manuscript did not make strong claims about pre-existing structures in the Results or Discussion. Also, schemas were only taken up as a discussion point. We nevertheless agree with the reviewer that assessment of the spatial prestructure requires further analysis. To address their point, we analyzed neuronal activity during the habituation phase before the start of task training, when the animals freely explored the same maze without any task contingency (animals explored mostly in the arms of the maze). We compared the place fields of neurons during this habituation period with their task-related activity. Consistent with the small overlap of firing rate maps between learning and learned phase, this analysis also revealed a small number of cells with significant correlations (up to 20% for task cells; a significant fraction according to a binomial test). The results are shown as a new Figure supplement to Figure 2.
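As an illustration of how a binomial test can judge whether such a fraction exceeds chance, the following Python sketch computes the one-sided tail probability of observing at least k significant cells out of n when each cell has a false-positive probability alpha. The cell counts and the per-cell alpha of 0.05 are hypothetical values chosen for the example; the response does not state the numbers used.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical example: 4 of 20 task cells (20%) show a significant
# correlation between habituation and task rate maps.
n_cells = 20         # assumed number of tracked task cells
n_significant = 4    # assumed number of cells passing the per-cell test
alpha = 0.05         # assumed per-cell false-positive rate

p_value = binomial_tail(n_significant, n_cells, alpha)
```

With these assumed numbers the tail probability falls below 0.05, so a 20% fraction would be unlikely to arise from false positives alone; with fewer cells or a smaller fraction the same test can fail to reach significance.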

[B] The scientific premise for the test of replay sequences is motivated using hippocampal activity in internally guided spatial working memory rule tasks [...] and applied here to prefrontal activity in a sensory-cue guided spatial memory task [...]. There are several issues with the conclusion in the manuscript that prefrontal replay sequences are involved in evaluating behavioral outcomes rather than planning future outcomes.

We agree with the reviewer that preplay in Hippocampus and mPFC are distinct. We further emphasized this distinctiveness in the respective paragraph in the discussion (see response to B1).

[B. 1] First, odor sampling in odor-guided memory tasks is an active sensory processing state that leads to beta and other oscillations in olfactory regions, hippocampus, prefrontal cortex, and many other downstream networks [...]. This is an active sensory state, not conducive to internal replay sequences, unlike references used in this manuscript to motivate this analysis, which are hippocampal spatial memory studies with internally guided rather than sensory-cue guided decisions, where internal replay is seen during immobility at reward wells. These two states cannot be compared with the expectation of finding similar replay sequences, so it is trivially expected that internal replay sequences will not be seen during odor sampling.

We agree with the reviewer that the sampling phase cannot be compared with the “preplay” state in the hippocampus. We have rewritten the results and discussion sections of the manuscript to clarify this. We disagree, however, that the absence of replay sequences in the mPFC 1P calcium data is trivial, since we actually do see many sequences during sampling (Fig 4E, Fig 4 suppl 2 A). These sequences are just not related to task activity and may, for example, reflect activity related to sensing, but they do not contain information about goal arm.

[B. 2] Second, sequence replay is not the only signature of reactivation. Many studies have quantified prefrontal replay using template matching and reactivation strength metrics that do not involve sequences [...]. Third, previous studies have explicitly shown that prefrontal activity can be decoded during odor sampling to predict future spatial choices - this uses sensory-driven ensemble activity in prefrontal cortex and not replay, as odor sampling leads to sensory driven processing and recall rather than a reactivation state [..].

We thank the reviewer for the suggestion to also perform reactivation analysis (Peyrache et al., 2009, 2010). The results are summarized in the new Figure 3 and show that reactivation is indeed stronger during the sampling phase and is goal arm specific, arguing that sequence analysis extracts information (partly) complementary to rate covariance based analysis.

We hope to have convinced the reviewer that, together, the complementary results of reactivation and sequence analysis, as well as the ability to follow these measures over an extended period of time, give unique insights far beyond the previous publications of these data sets. A consistent analysis of population representation, however, required some reanalyses of previous findings, since we could only focus on a limited number of animals and cells for which tracking was possible over such a long period of time.

Reviewer #2 (Public review):

Further controls are needed to validate the results.

We thank the reviewer for their generally supportive statements. The revised manuscript contains a number of controls in several new figure supplements.

Reviewer #3 (Public review):

[They] conclude that the frequency of TSs and GSs is limited (I believe because most sequence clusters were non-SI - the authors can verify this and write it in the text?). In the discussion, they say, "In addition to GSs and TSs, we found that most of the recurring sequences are not related to behavior".

The reviewer is correct: most clusters were not SI (Fig 5 A). We have added this information to the manuscript.

[...] They conclude "Together with our finding of strong changes in sequence expression after learning (Figure 3E) these findings suggest that a representation of task develops during learning, however, it does not reflect previous network structure." I am not sure what is meant here by the second part of this sentence (after "however ..."). Is it the idea that the replay represents network structure, and the lack of Reward replay in the learning condition means that the network structure must have been changed to get to the learned condition? Please clarify.

The reviewer is correct in their assertion. We rewrote the sentence to clarify: “Together with our finding of strong changes in sequence expression after learning (now Fig 4E) these findings suggest that a representation of task develops during learning, however, it does not reflect sequence structure during learning and habituation”.

(1) There are some statements that are not clear, such as at the end of the introduction, where the authors write, "Both findings suggest that the mPFC task code is locally established during learning." What is the reasoning behind the "locally established" statement? Couldn't the learning be happening in other areas and be inherited by the mPFC? Or are the authors assuming that newly appearing sequences within a 500-ms burst period must be due to local plasticity?

We agree that the wording “local” can be misleading; we rephrased the corresponding sentences.

(2) The threshold for extracting burst events (0.5 standard deviations, presumably above the mean, but the authors should verify this) seems lower than what one usually sees as a threshold for population burst detection. What fraction of all data is covered by 500 ms periods around each such burst? However, it is potentially a strength of this work that their results are found by using this more permissive threshold.

Since we work with a slow calcium signal, we cannot use as strict thresholds as usually employed using electrophysiology. In addition, our sequence detection approach adds a further level of strictness such that we only consider bursts with recurring sequence structure. In response to this reviewer’s question, we have added quantification of the fraction of all data covered by 500 ms periods in Figure Supplement 1, panels D and E. Indeed we include a large fraction (20 to 40%) (except sleep and habituation), which is consistent with our interpretation that during the outward phase sequences mainly reflect task field firing.
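The coverage figure discussed here can be made concrete with a small Python sketch. It assumes a single population activity trace, a threshold at the mean plus 0.5 standard deviations, and a 500 ms window starting at each upward threshold crossing. The function name, the windowing rule, and the toy trace are assumptions for illustration; the exact burst definition is given in the authors' Methods, not here.

```python
def burst_coverage(trace, frame_rate_hz, n_sd=0.5, window_s=0.5):
    """Fraction of samples covered by fixed-length windows that start at
    each upward crossing of mean + n_sd * SD."""
    n = len(trace)
    mean = sum(trace) / n
    sd = (sum((x - mean) ** 2 for x in trace) / n) ** 0.5
    threshold = mean + n_sd * sd
    window = max(1, round(window_s * frame_rate_hz))  # window length in frames

    covered = [False] * n
    prev_above = False
    for i, x in enumerate(trace):
        above = x > threshold
        if above and not prev_above:  # upward crossing marks a burst onset
            for j in range(i, min(n, i + window)):
                covered[j] = True
        prev_above = above
    return sum(covered) / n


# Toy trace: 2 s at 10 Hz with a single transient at sample 10.
toy_trace = [0.0] * 10 + [10.0] + [0.0] * 9
coverage = burst_coverage(toy_trace, frame_rate_hz=10)
```

Under this rule a sustained supra-threshold period yields only one window, so the sketch gives a lower bound on coverage for slow calcium transients; lowering `n_sd` increases the number of onsets and therefore the covered fraction.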

Reviewer #1 (Recommendations for the authors):

It is possible that 1-photon recordings do not have the temporal resolution and information about oscillatory activity to enable these kinds of analyses. Therefore, an unambiguous conclusion about the existence and role of prefrontal reactivation is not possible in this experimental and analytical design.

We indeed cannot extract information encoded in LFP oscillations from the calcium signal; we now mention the relation between LFP oscillations and olfaction-guided behaviors in the discussion (including the suggested references). However, our finding that sequence and covariance-based analyses yield partly complementary results argues that the approach does allow conclusions about the existence and role of prefrontal reactivation.

Reviewer #2 (Recommendations for the authors):

The results of the Muysers et al. (2025) paper need to be discussed in detail and explain why the cell categorization is different, three groups of spatial cells vs two groups here. Also, explain in what aspect the major findings in this work go beyond what was shown in Figure 4 in that paper.

The main goal of this paper was to explore sequence/replay like activity, which is not at all captured in the Muysers et al. 2025 paper. Because of this focus on sequences, we excluded the inward runs (from reward to sampling point) for better interpretability and thus ended up with only two types of cells. Muysers et al. included backward runs and could thereby also assess whether the place field remains in the outward and inward runs. We added this clarification in the Results section.

Regarding the reviewer’s question regarding figure 4: Our task cells would largely overlap with the “path-equivalent cells” from Muysers et al. 2025 (albeit not taking into account inward runs). In this sense their finding that the share of path-equivalent cells increases with learning is consistent with our report of increasing fraction of task cells in Figure 2 C. Our Figure 2 adds that some task cells develop from previous goal cells with fields at the same location (generalizing). Moreover, we use spatial information as a criterion to identify TC and GCs, showing that a large fraction of cells actually is and remains spatially unselective. In Muysers et al. 2025 a statistical criterion was not applied on spatial selectivity but peak height, with fewer neurons failing this test. Moreover, we were analyzing only those cells trackable over the whole period. Despite all these methodological differences, the result of increasing the number of task/path-equivalent cells over learning was consistent. The main reason for recategorization of the cells in the present manuscript was to be able to meaningfully link them to sequence activity (Fig. 5E, F).

It is not clear from the description how the cell type transitions were quantified. Was the last learning day compared to the first learned day? Given that, particularly during learning, there are changes across days in the spatial representations according to Figure 2 of Muysers et al. (2025), this is the meaningful way to make the comparisons. Nevertheless, it is also not clear whether the daily variations within learning and learned conditions differ from the transition day, so without comparing these three conditions, it is hard to make a firm conclusion from examining only changes in the transition days.

The analysis of cell type transitions was performed by pooling all learning sessions and comparing them with all learned sessions, without taking into account the chronological order of sessions within each category. This approach allowed us to identify broad changes associated with learning state. Figure supplement 1.C shows the session intervals per animal. We argue that the large interval between learning and learned session justifies this analysis approach.

Identifying sequences by a clustering method in which sequence patterns of individual events are compared is an interesting idea. Nevertheless, there is a danger, as with any clustering method, that data without clustering tendency could be artificially subdivided into clusters.

In Figure 4.C, we show three example sequence cluster templates (colored) obtained via hierarchical clustering, along with representative member sequences (black) sorted by cluster membership. In response to this reviewer’s comment, we now included a complete clustering result for one animal, including all sequence clusters and their member sequences. It is provided in Figure 4 supplement 1. This comprehensive visualization serves as an additional control, demonstrating that the clustering approach identifies consistent sequence patterns across the dataset.

Furthermore, it is possible that some cells at the edge of the cluster boundary may show a more similar sequence tendency to events detected at the overlapping border region of another cluster. Was this controlled for? It would be essential to show that events clustered together all show higher similarity to each other than to events in any other clusters.

By default, the clusters are rejected if in the adjacency matrix of the graph constructed by significant motif similarity, the number of within cluster edges is smaller than the number of without cluster edges. In subsequent cluster merges the separation is increased since only those clusters are merged that show significant similarity. As a visual control, we monitor plots as shown in Figure 4 supplement 1. Sequence templates (color dot clouds) are supposed to show no serial correlation when ordered according to any one template other than its own. We have added more clarification to the Methods including a new Figure 6 illustrating the Method.
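The rejection rule described above can be sketched in Python as a simple edge count on the similarity graph. The sketch assumes that "without cluster edges" means edges connecting a cluster member to a non-member, and it represents the graph as a list of undirected edges between sequence indices. Both the interpretation and the function name are assumptions, not the published implementation (Chenani et al., 2019).

```python
def keep_cluster(edges, cluster):
    """Keep a cluster unless it has fewer within-cluster edges than edges
    leaving the cluster.

    edges:   iterable of (u, v) pairs marking significantly similar sequences
    cluster: set of sequence indices assigned to the candidate cluster
    """
    within = sum(1 for u, v in edges if u in cluster and v in cluster)
    leaving = sum(1 for u, v in edges if (u in cluster) != (v in cluster))
    return within >= leaving


# Toy similarity graph: sequences 0, 1, 2 are mutually similar,
# and sequence 3 is similar only to sequence 2.
toy_edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
tight_cluster_kept = keep_cluster(toy_edges, {0, 1, 2})
loose_cluster_kept = keep_cluster(toy_edges, {2, 3})
```

In the toy graph the tightly connected cluster is kept (three internal edges against one outgoing edge), while the loosely connected one is rejected (one internal edge against two outgoing edges).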

From the description, it was not clear how the sequence similarity was established between pairs of individual events. The only way I can see it is that the sequence (orders at which cells fire) is established with one event, and the rank order correlation is calculated with this order for the other event. However, in this case, distance A-B is not the same as distance B-A. Not sure how this is handled with the clustering procedure. Secondly, how the number of clusters is established in the hierarchical clustering procedure needs to be explained. Furthermore, from the method description, it is not clear how GS and TS sequences are identified. Can an event be classified as both a TS and GS event at the same time?

The reviewer is correct in their assertion that we compute all pairwise rank-order correlations (which are then subject to a statistical test detailed in the original method publication, Chenani et al., 2019). By the nature of the rank-order correlation, the coefficients A-B and B-A are symmetric. This is now more carefully explained in the Methods.
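The symmetry can be checked with a short Python sketch of a pairwise rank-order similarity. It assumes each sequence is a mapping from neuron identity to activation time within the burst, restricts the comparison to neurons active in both sequences, and uses the tie-free Spearman formula. The data structure, the `min_common` parameter, and the example values are hypothetical.

```python
def ranks(values):
    """Rank positions (0 = earliest) of a list of distinct values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result


def sequence_similarity(seq_a, seq_b, min_common=2):
    """Spearman rank correlation of activation order over shared neurons.

    seq_a, seq_b: dict mapping neuron id -> activation time (assumed distinct)
    Returns None if fewer than min_common neurons are active in both.
    """
    common = sorted(set(seq_a) & set(seq_b))
    n = len(common)
    if n < max(2, min_common):
        return None
    rank_a = ranks([seq_a[c] for c in common])
    rank_b = ranks([seq_b[c] for c in common])
    d2 = sum((x - y) ** 2 for x, y in zip(rank_a, rank_b))
    # The squared rank differences do not depend on argument order,
    # so similarity(A, B) equals similarity(B, A).
    return 1 - 6 * d2 / (n * (n**2 - 1))


# Hypothetical sequences: neurons 1-4 are shared and fire in opposite order.
burst_a = {1: 0.10, 2: 0.20, 3: 0.30, 4: 0.40, 9: 0.05}
burst_b = {1: 0.40, 2: 0.30, 3: 0.20, 4: 0.10, 7: 0.00}
rho_ab = sequence_similarity(burst_a, burst_b)
rho_ba = sequence_similarity(burst_b, burst_a)
```

Because both sequences are ranked over the same shared neuron set, no sequence serves as the reference order, which is why the distance matrix passed to the clustering is symmetric.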

Several control analyses are needed to show that the sequences detected reflect not random patterns but those that repeat at a higher than random chance. This requires, at the first step, to establish to what degree sequences are consistent within a cluster and to what degree individual events show a sequential firing tendency. And at the next stage, these need to be compared with randomised events in which spike timing of cells is jittered or spike identity is randomised, and show that these events result in poorer sequence tendency and less consistent clusters.

The controls requested by the reviewer are already implemented in our Method (see original publication of the Method in Chenani et al., 2019). This is now made clearer in the Methods section.

Firing rate and place-related firing of cells alone could generate sequences even if cells otherwise fire independently from each other. In a similar manner, it was shown before that reactivation of waking cell assemblies could be seen in sleep, in which case firing rate differences across cells belonging to the same assembly could also generate sequential patterns without temporal coordination. Appropriate shuffling procedures need to be performed to exclude such scenarios.

We are aware that the sequential firing in our data (particularly during the outward phase, when the animal is performing the task) most likely results from correlations between the rate maps and the animal's trajectory. During reward, this is less likely. An intrinsic control is that we do not see these sequences during sampling. Given the nature of the calcium signal, a direct connection to firing rate is not possible. However, we argue that our center-of-mass approach to the calcium trace effectively normalizes for firing-rate effects. Shuffling dF/F amplitudes (as a proxy for firing rates) would thus have no effect on the center-of-mass sequences. We nevertheless consider this to be an important methodological difference between sequence analysis with spikes and with calcium signals, and we have added a related comment to the Methods.
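The amplitude invariance of the center of mass can be illustrated with a minimal sketch. It assumes non-negative dF/F traces sampled at common time points and models an amplitude change as a per-cell scale factor; the traces, scale factors, and function names are hypothetical.

```python
def center_of_mass(times, trace):
    """Amplitude-weighted mean time of a non-negative trace."""
    total = sum(trace)
    if total <= 0:
        raise ValueError("trace must have positive total amplitude")
    return sum(t * f for t, f in zip(times, trace)) / total


def activation_order(times, traces):
    """Cells sorted by the center of mass of their traces."""
    com = {cell: center_of_mass(times, trace) for cell, trace in traces.items()}
    return sorted(com, key=com.get)


times = [0, 1, 2, 3, 4]
traces = {
    "a": [0, 1, 2, 1, 0],
    "b": [2, 1, 0, 0, 0],
    "c": [0, 0, 0, 1, 3],
}

# Rescale each cell's amplitude by a different (hypothetical) factor.
factors = {"a": 10, "b": 0.5, "c": 3}
scaled = {cell: [factors[cell] * f for f in trace] for cell, trace in traces.items()}

print(activation_order(times, traces))
print(activation_order(times, scaled))
```

The scale factor cancels between the numerator and the denominator of the center of mass, so the resulting cell order depends only on the temporal shape of each trace, not on its amplitude.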

The past literature describing mPFC reactivation, replay, and sequences needs to be described, and its findings need to be appropriately acknowledged and compared with those of this work (starting with this study from 2007, PMID: 18006749). In the current reading, a novice reader of this field might conclude that this is the first work that identified replay and sequences in the mPFC.

We apologize that the manuscript gives this impression. This was not our intention; in fact, we have given strong emphasis to the Kaefer et al. paper in the Discussion. We have now added early references on PFC replay based on electrophysiological recordings to the Discussion section.

The analysis of Figure 4H is not sufficient to show that only forward sequences occur. If 50% are forward and 50% are reverse, the median is zero. Some of the presented histograms look like Gaussian distributions with SD=1, which would show that those events were not real sequences. It should be tested whether the distributions are significantly different from the expected Gaussian.

We agree with the reviewer that we did not explicitly test for the significance of individual replays, but only tested for a rightward shift of the median. We have now added these significance tests and p-values (Figure 5) and could indeed show that significant backward replays never exceed the fraction expected by chance, whereas forward replay significantly exceeds chance levels only in the cases where the median had a significant rightward shift (except for non-SI clusters). We thank the reviewer for this suggestion, which we think makes the analysis stronger.
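One way to ask whether the fraction of individually significant replays exceeds chance is a one-sided binomial test against the per-event significance level. The sketch below illustrates that idea only; it is an assumption and not necessarily the test used for Figure 5, and the event counts are invented for illustration.

```python
from math import comb


def binomial_upper_tail(n_events, n_significant, alpha):
    """P(X >= n_significant) for X ~ Binomial(n_events, alpha).

    alpha is the per-event significance level, i.e. the fraction of
    events expected to pass the test by chance alone.
    """
    return sum(
        comb(n_events, k) * alpha**k * (1 - alpha) ** (n_events - k)
        for k in range(n_significant, n_events + 1)
    )


# Hypothetical counts: 100 candidate events tested at alpha = 0.05.
n_events, alpha = 100, 0.05
p_forward = binomial_upper_tail(n_events, 14, alpha)   # 14 significant forward events
p_backward = binomial_upper_tail(n_events, 4, alpha)   # 4 significant backward events

print(p_forward, p_backward)
```

Testing forward and backward events separately addresses the reviewer's concern that a median of zero cannot distinguish an even mixture of forward and reverse replay from the absence of replay.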

Overall, the clarity of the text could be improved, further examples of reactivated sequences should be shown, and the methods should be illustrated in the figures. In its current version, I fear that even readers in this field would give up on reading the text, given its insufficient level of clarity.

We have included more examples of reactivated sequences (Figure 5 supplement 2) and made extensive additions to the Methods. In particular, we followed the reviewer's request for an illustration of the method (new Figure 6).

Reviewer #3 (Recommendations for the authors):

My main comment here is for the authors to increase the clarity of the manuscript.[...] For instance, it was difficult to follow what was being done to determine TSs and GSs.

We have made extensive additions to the Methods section including a new Figure 6 depicting the workflow of the sequence analysis in a schematic manner.
