Peer review process
Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.
Editors
- Reviewing Editor: Adrien Peyrache, McGill University, Montreal, Canada
- Senior Editor: Michael Frank, Brown University, Providence, United States of America
Reviewer #1 (Public review):
Summary:
The authors use longitudinal in vivo 1-photon calcium recordings in mouse prefrontal cortex throughout the learning of an odor-guided spatial memory task, with the goal of examining the development of task-related prefrontal representations over the course of learning in different task stages and during sleep sessions. They report replication of their previous results (Muysers et al., 2025) that task representations in prefrontal cortex arise de novo after learning. These representations comprise goal-selective cells, which fire selectively for left or right goals during the spatial working memory component of the task, and generalized task-phase-selective cells, which fire equivalently in the same place irrespective of goal; together these make up the task-informative cells. The number of task-informative cells increases over learning, and the covariance structure changes, resulting in increased sequential activation in the learned condition, but with limited functional relevance to task representation. Finally, the authors report that, similar to hippocampal trajectory replay, prefrontal sequences are replayed at reward locations.
Strengths:
The major strength of the study is the use of longitudinal recordings, allowing identification of task-related activity in the prefrontal cortex that emerges de novo after learning, and identification of sub-second sequences at reward wells.
Weaknesses:
(1) The study mainly replicates the authors' previously reported results about generalized and trajectory-specific coding of task structure by prefrontal neurons, and stable and changing representations over learning (Muysers et al., 2024, PMID: 38459033; Muysers et al., 2025, PMID: 40057953), although there are useful results about changes in goal-selective and task-phase selective cells over learning. There are basic shortcomings in the scientific premise of two new points in this manuscript, namely the contribution of pre-existing spatial representations and the role of replay sequences in the prefrontal cortex, neither of which can be adequately tested in this experimental design.
(2) The study denotes neurons that show precise spatial firing equivalently irrespective of goal as generalized task representations, and uses this as a means of testing whether pre-existing spatial representations can contribute to task coding and learning. A previous study using this data has already shown that these neurons preferentially emerge during task learning (Muysers et al., 2025, PMID: 40057953). Furthermore, in order to establish generalization for abstract task rules or cognitive flexibility, as motivated in the manuscript, there is a need to show that these neurons "generalize" not just to firing in the same position during learning of a given task, but that they can generalize across similar tasks, e.g., different mazes with similar rules, different rules with similar mazes, new odor-space associations, etc. For an adequate test of pre-existing spatial structure, either a comparison task, as in the examples above, is needed, or at least a control task in which animals can run similar trajectories without the task contingencies. An unambiguous conclusion about pre-existing spatial structure is not possible without these controls.
(3) The scientific premise for the test of replay sequences is motivated using hippocampal activity in internally guided spatial working memory rule tasks (Fernandez-Ruiz et al., 2019, PMID: 31197012; Kay et al., PMID: 32004462; Tang et al., 2021, PMID: 33683201), and applied here to prefrontal activity in a sensory-cue guided spatial memory task (Muysers et al., 2024, PMID: 38459033; Symanski et al., 2022, PMID: 36480255; Taxidis et al., 2020, PMID: 32949502). There are several issues with the conclusion in the manuscript that prefrontal replay sequences are involved in evaluating behavioral outcomes rather than planning future outcomes.
(4) First, odor sampling in odor-guided memory tasks is an active sensory processing state that leads to beta and other oscillations in olfactory regions, hippocampus, prefrontal cortex, and many other downstream networks, as documented in a vast literature of studies (Martin et al., 2007, PMID: 17699692; Kay, 2014, PMID: 24767485; Martin et al., 2014; Ramirez-Gordillo, 2022, PMID: 36127136; Symanski et al., 2022, PMID: 36480255). This is an active sensory state, not conducive to internal replay sequences, unlike references used in this manuscript to motivate this analysis, which are hippocampal spatial memory studies with internally guided rather than sensory-cue guided decisions, where internal replay is seen during immobility at reward wells. These two states cannot be compared with the expectation of finding similar replay sequences, so it is trivially expected that internal replay sequences will not be seen during odor sampling.
(5) Second, sequence replay is not the only signature of reactivation. Many studies have quantified prefrontal replay using template matching and reactivation strength metrics that do not involve sequences (Peyrache et al., 2009, PMID: 19483687; Sun et al., 2024, PMID: 38872470). Third, previous studies have explicitly shown that prefrontal activity can be decoded during odor sampling to predict future spatial choices - this uses sensory-driven ensemble activity in prefrontal cortex and not replay, as odor sampling leads to sensory driven processing and recall rather than a reactivation state (Symanski et al., 2022, PMID: 36480255). It is possible that 1-photon recordings do not have the temporal resolution and information about oscillatory activity to enable these kinds of analyses. Therefore, an unambiguous conclusion about the existence and role of prefrontal reactivation is not possible in this experimental and analytical design.
Reviewer #2 (Public review):
Summary:
The first part of the manuscript quantifies the proportion of goal-arm-specific and task-phase-specific cells during the learning and learned conditions and, similar to the previously published Muysers et al. (2025) paper, finds that the task-phase coding cells (Muysers et al. call them path equivalent cells) increase in the learned condition. In contrast to the Muysers et al. (2025) paper, however, this work quantifies the proportion of cells that change coding type across learning and learned conditions. The second part of the paper reports firing sequences using a sequence similarity clustering-based method that the group developed previously and applied to hippocampal data in the past.
Strengths:
Identifying sequences by a clustering method in which sequence patterns of individual events are compared is an interesting idea.
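As a rough illustration of what comparing the sequence patterns of individual events can mean in practice, the sketch below computes a Spearman rank correlation between the activation order of two bursts, restricted to the cells active in both. It is not the authors' implementation: the function names, the dictionary layout (cell ID mapped to a within-burst activation time), and the `min_shared` cutoff are all assumptions made for this example, and timing ties are assumed not to occur.

```python
import numpy as np


def ordinal_ranks(values):
    """Return ordinal ranks (0 = earliest). Assumes no exact ties in timing."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values)
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values))
    return ranks


def sequence_similarity(times_a, times_b, min_shared=3):
    """Spearman rank correlation of activation order between two bursts.

    times_a, times_b: dict mapping cell ID -> activation time within the burst
    (hypothetical layout). Only cells present in both bursts are compared.
    min_shared is an illustrative cutoff, not a value taken from the manuscript.
    """
    shared = sorted(set(times_a) & set(times_b))
    if len(shared) < min_shared:
        return float("nan")  # too few shared cells to define an order
    ranks_a = ordinal_ranks([times_a[c] for c in shared])
    ranks_b = ordinal_ranks([times_b[c] for c in shared])
    # Spearman correlation equals Pearson correlation computed on the ranks
    return float(np.corrcoef(ranks_a, ranks_b)[0, 1])


burst_1 = {1: 0.10, 2: 0.20, 3: 0.30, 4: 0.40}
burst_2 = {2: 0.05, 3: 0.20, 4: 0.30, 5: 0.10}   # same order among shared cells
burst_3 = {2: 0.40, 3: 0.25, 4: 0.10}            # reversed order
same_order = sequence_similarity(burst_1, burst_2)
reversed_order = sequence_similarity(burst_1, burst_3)
```

A matrix of such pairwise similarities could then be passed to a hierarchical clustering routine; how clusters are cut and validated is exactly where additional controls would matter.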
Weaknesses:
Further controls are needed to validate the results.
Reviewer #3 (Public review):
In the study, the authors performed longitudinal 1P calcium imaging of mouse mPFC across 8 weeks during learning of an olfactory-guided task, including habituation, training, and sleep periods. The task had 3 arms. Odor was sampled at the end of the middle arm (named the "Sample" period). The animal then needed to run to one of the two other arms (R or L) based on the odor. The whole period until they reached the end of one of the choice arms was the "Outward" period. The time at the reward end was the "Reward" period. They noted several changes from the learning condition to the learned condition (there are some questions for the authors interspersed):
(1) They classified cells in a few ways. First, each cell was classified as SI (spatially informative) if it had significantly more spatial information than shuffled activity, and ~50% of cells ended up being SI cells. Then, among the SI cells, they classified a cell as a TC (task cell) if it had statistically similar activity maps for R versus L arms, and a GC (goal arm cell) otherwise. Note that there are 4 kinds of these cells: outer arm TCs and GCs, and middle arm TCs and GCs (with middle arm GCs essentially being like "splitter cells" since they are not similarly active in the middle arm for R versus L trials). There was an increase in TCs from the learning to the learned condition sessions.
(2) They analyze activity sequences across cells. They extracted 500 ms duration bursts (defined as periods of activity > 0.5 standard deviations over what I assume is the mean - if so, the authors can add "over the mean" to the burst definition in the methods). They first noted that the resulting "Burst rates were significantly larger during behavioral epochs than during sleep and during periods of habituation to the arena", and "Moreover, burst rates during correct trials were significantly lower than during error trials". For the sequence analysis, they only considered bursts consisting of at least 5 active cells. A cell's activity within the burst was set to the center of mass of calcium activity. Then they took all the sequences from all learned and learning sessions together and hierarchically clustered them based on Spearman's rank correlation between the order of activity in each pair of sequences (among the cells active in both). The iterative hierarchical clustering process produces groups (clusters) of sequences such that there are multiple repeats of sequences within a cluster. Different sequences are expressed across all the longitudinally recorded sessions. They noted "large differences of sequence activation between learning and learned condition, both in the spatial patterns (example animal in Figure 3D) and the distribution of the sequences (Figures 3D, E). Rastermap plots (Figure 3D) also reveal little similarity of sequence expression between task and habituation or sleep condition." They also note that the difference in the sequences between learning and learned conditions was larger than the difference between correct and error trials within each condition. They conclude that during task learning, new representations are established, as measured by the burst sequence content. They do additional analyses of the sequence clusters by assessing the spatial informativeness (SI) of each sequence cluster. 
Over learning, they find an increase in clusters that are spatially informative (clusters that tend to occur in specific locations). Finally, they analyzed the SI clusters in a similar manner to SI cells and classified them as task phase selective sequences (TSs) and goal arm selective sequences (GSs), and did some further analysis. However, they themselves conclude that the frequency of TSs and GSs is limited (I believe because most sequence clusters were non-SI - the authors can verify this and write it in the text?). In the discussion, they say, "In addition to GSs and TSs, we found that most of the recurring sequences are not related to behavior".
(3) As an alternative to analyzing individual cells and sequences of individual cells, they then look for trajectory replay using Bayesian population decoding of location during bursts. They analyze TS bursts, GS bursts, and non-SI bursts. They say "we found correlations of decoded position with time bin (within a 500 ms burst) strongly exceeding chance level only during outward and reward phase, for both GSs and TSs (Fig 4H)." Figure 4H shows distributions indicating statistically significant bias in the forward direction (using correlations of decoded location versus time bin across 10 bins of 50 ms each within each 500-ms burst). They find that the Outward trajectories appear to reflect the actual trajectory during running itself, so they are likely not replay. But the sequences at the Reward are replay as they do not reflect the current location. Furthermore, replay at the Reward is in the forward direction (unlike the reverse replay at Reward seen in the hippocampus), and this replay is only seen in the learned and not the learning condition. At the same time, they find that replay is not seen during odor Sampling, from which they conclude there is no evidence of replay used for planning. Instead, they say the replay at the Reward could possibly be for evaluation during the Reward phase, though this would only be for the learned condition. They conclude "Together with our finding of strong changes in sequence expression after learning (Figure 3E) these findings suggest that a representation of task develops during learning, however, it does not reflect previous network structure." I am not sure what is meant here by the second part of this sentence (after "however ..."). Is it the idea that the replay represents network structure, and the lack of Reward replay in the learning condition means that the network structure must have been changed to get to the learned condition? Please clarify.
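To make the replay measure summarized in (3) concrete, the sketch below correlates a decoded position with the time-bin index inside one burst (10 bins of 50 ms) and estimates a chance level by shuffling the bin order. The choice of Pearson correlation, the shuffle procedure, and all function and parameter names are assumptions for illustration; the manuscript may define the correlation and its chance level differently.

```python
import numpy as np


def replay_score(decoded_pos):
    """Correlation between decoded position and time-bin index within a burst.

    Positive values indicate forward progression, negative values reverse.
    Returns 0.0 when the decoded position does not change across bins.
    """
    decoded_pos = np.asarray(decoded_pos, dtype=float)
    if np.std(decoded_pos) == 0:
        return 0.0
    bins = np.arange(len(decoded_pos))
    return float(np.corrcoef(bins, decoded_pos)[0, 1])


def forward_replay_p_value(decoded_pos, n_shuffles=1000, seed=0):
    """One-sided shuffle test: is the forward correlation above chance?

    Chance is estimated by permuting the time-bin order (an assumed null model).
    """
    rng = np.random.default_rng(seed)
    observed = replay_score(decoded_pos)
    null = np.array(
        [replay_score(rng.permutation(decoded_pos)) for _ in range(n_shuffles)]
    )
    # +1 correction keeps the p-value strictly positive
    p_value = (np.sum(null >= observed) + 1) / (n_shuffles + 1)
    return observed, float(p_value)


# Hypothetical decoded positions (cm) for the 10 bins of one 500 ms burst
forward_burst = [5, 12, 18, 27, 33, 41, 48, 55, 63, 70]
score, p = forward_replay_p_value(forward_burst)
```

Under this kind of test, a trajectory that merely tracks the animal's current running position would also score highly, which is why the comparison between decoded and actual location matters for separating Outward activity from Reward replay.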
This study provides valuable new information about the evolution of mPFC activity during the learning of an odor-based 2AFC T-maze-like task. They show convincing evidence of changes in single-cell tuning, population sequences, and replay events. They also find novel forward replay at the Reward, and find that this is present only after the animal has learned the task. In the discussion, the authors note "To our knowledge, this study identified for the first time fast recurring neural sequence activity from 1-p calcium data, based on correlation analysis."
(1) There are some statements that are not clear, such as at the end of the introduction, where the authors write, "Both findings suggest that the mPFC task code is locally established during learning." What is the reasoning behind the "locally established" statement? Couldn't the learning be happening in other areas and be inherited by the mPFC? Or are the authors assuming that newly appearing sequences within a 500-ms burst period must be due to local plasticity? I have also pointed out a question about the statement "however, it does not reflect previous network structure" in (3) above.
(2) The threshold for extracting burst events (0.5 standard deviations, presumably above the mean, but the authors should verify this) seems lower than what one usually sees as a threshold for population burst detection. What fraction of all data is covered by 500 ms periods around each such burst? However, it is potentially a strength of this work that their results are found by using this more permissive threshold.
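The coverage question in (2) could be answered with a calculation along the following lines. This is a minimal sketch under stated assumptions: the threshold is taken as the mean plus 0.5 standard deviations of a summed population trace, a burst is counted at each upward threshold crossing, and the 500 ms window is centered on that crossing. The function name, its parameters, and the toy trace are hypothetical and not taken from the manuscript.

```python
import numpy as np


def burst_coverage(pop_activity, fs, n_sd=0.5, window_s=0.5):
    """Count threshold-crossing bursts and the fraction of samples they cover.

    pop_activity: 1-D population activity trace (assumed input format).
    fs: sampling rate in Hz.
    n_sd: threshold in standard deviations above the mean (assumed definition).
    window_s: window length in seconds placed around each burst onset.
    """
    x = np.asarray(pop_activity, dtype=float)
    above = x > x.mean() + n_sd * x.std()
    # Onsets are samples where the trace crosses the threshold upward
    previous = np.concatenate(([False], above[:-1]))
    onsets = np.flatnonzero(above & ~previous)
    half = int(round(window_s * fs / 2))
    covered = np.zeros(len(x), dtype=bool)
    for i in onsets:
        covered[max(0, i - half): i + half] = True  # overlapping windows merge
    return len(onsets), float(covered.mean())


# Toy trace: 1000 samples at 20 Hz with two brief population events
trace = np.zeros(1000)
trace[100:103] = 10.0
trace[500:503] = 10.0
n_bursts, fraction = burst_coverage(trace, fs=20)
```

Reporting this fraction alongside the same number for a stricter threshold (for example `n_sd=2`) would show how much of the recording the permissive criterion admits.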