Learned response dynamics reflect stimulus timing and encode temporal expectation violations in superficial layers of mouse V1

  1. Center for Systems Neuroscience, Department of Biology, Boston University, Boston, United States
  2. Neurophotonics Center, Boston University, Boston, United States
  3. Graduate Program in Neuroscience, Boston University, Boston, United States

Peer review process

Revised: This Reviewed Preprint has been revised by the authors in response to the previous round of peer review; the eLife assessment and the public reviews have been updated where necessary by the editors and peer reviewers.


Editors

  • Reviewing Editor
    Tatyana Sharpee
    Salk Institute for Biological Studies, La Jolla, United States of America
  • Senior Editor
    Tirin Moore
    Stanford University, Howard Hughes Medical Institute, Stanford, United States of America

Reviewer #1 (Public review):

Summary:

Knudstrup et al. use two-photon calcium imaging to measure neural responses in the mouse primary visual cortex (V1) in response to image sequences. The authors presented mice with many repetitions of the same four-image sequence (ABCD) for four days. Then on the fifth day, they presented unexpected stimulus orderings where one stimulus was either omitted (ABBD) or substituted (ACBD). After analyzing trial-averaged responses of neurons pooled across multiple mice, they observed that stimulus omission (ABBD) caused a small, but significant, strengthening of neural responses but observed no significant change in the response to stimulus substitution (ACBD). Next, they performed population analyses of this dataset. They showed that there were changes in the correlation structure of activity and that many features about sequence ordering could be reliably decoded. This second set of analyses is interesting and exhibited larger effect sizes than the first results about predictive coding. However, concerns about the design of the experiment temper my enthusiasm.

The most recent version of this manuscript makes a few helpful changes (entirely in supplemental figures--the main text figures are unchanged). It does not resolve any of the larger weaknesses of the experimental design, or even perform single-neuron tracking in the one case where it was possible (between similar FOVs shown in Supplemental Figure 1).

Strengths:

(1) The topic of predictive coding in the visual cortex is exciting, and this task builds on previous important work by the senior author (Gavornik and Bear 2014) where unexpectedly shuffling sequence order caused changes in LFPs recorded from visual cortex.

(2) Deconvolved calcium responses were used appropriately here to look at the timing of the neural responses.

(3) Neural decoding results showing that the context of the stimuli could be reliably decoded from trial-averaged responses were interesting. But I have concerns about how the data was formatted for performing these analyses.

Weaknesses:

(1) All analyses were performed on trial-averaged neural responses that were pooled across mice (except for Supplementary Figure 6, see below). Owing to differences between subjects in behavior, experimental preparation quality, and biological variability, it seems important to perform most analyses on individual datasets to assess how behavioral training might differently affect each animal.

In the most recent draft, a single-mouse analysis was added for Figure 4C (Supplementary Figure 6). This effect of "representational drift" was not statistically quantified in either the single-mouse results or in the main text figure panel. Moreover, the apparent correlational drift could be accounted for by a reduction in SNR as a consequence of photobleaching.

(2) The correlation analyses presented in Figure 3 (labeled the second Figure 2 in the text) should be conducted on a single-animal basis. Studying population codes constructed by pooling across mice, particularly when there is no behavioral readout to assess whether learning has had similar effects on all animals, appears inappropriate to me. If the results in Figure 3 hold up on single animals, I think that is definitely an interesting result.

In the most recent draft, this analysis was still not performed on single mice. I was referring to the "decorrelation of responses" analysis in Figure 3, not the "representational drift" analysis in Figure 4. See my comments on Supplementary Figure 6 above.

(3) On Day 0 and Day 5, the reordered stimuli are presented in trial blocks where each image sequence is shown 100 times. Why wasn't the trial ordering randomized as was done in previous studies (e.g. Gavornik and Bear 2014)? Given this lack of reordering, did neurons show reduced predictive responses because the unexpected sequence was shown so many times in quick succession? This might change the results seen in Figure 2, as well as the decoder results where there is a neural encoding of sequence order (Figure 4). It would be interesting if the Figure 4 decoder stopped working when the higher-order block structure of the task was disrupted.

In the rebuttal letter for the most recent draft, the authors refer to recent work in press (Hosmane et al. 2024) suggesting that because sleep may be important for plastic changes between sessions, they do not expect much change to be apparent within a session. However, they admit that this current study is too underpowered to know for sure, and they do not cite or mention this as-yet-unpublished work in the manuscript itself.

As a control, I would be interested to at least know how much variance in neural responses is observed between intermediate "training" sessions with identical stimuli, e.g. between Day 1 and Day 4, but this is not possible as imaging was not performed on these days.

Despite being referred to as "similar," I do not think early and late responses are clearly shown, aside from the histograms in Figure 5B and Figure 6A comparing "early traces" to "all traces" (which include the early traces). Showing variance in single-cell responses would be a helpful addition to Supplementary Figure 3 and Supplementary Figure 4.

(4) A primary advantage of using two-photon calcium imaging over other techniques like extracellular electrophysiology is that the same neurons can be tracked over many days. This is a standard approach that can be accomplished with many software packages, including Suite2P (Pachitariu et al. 2017), which the authors already used for the rest of their data preprocessing. The authors of this paper did not appear to do this. Instead, it appears that different neurons were imaged on Day 0 (baseline) and Day 5 (test). This is a significant weakness of the current dataset.

In the most recent draft, this concern has not been mitigated. Despite Supplementary Figure 1 showing similar FOVs, mostly different neurons were still extracted. In all other sessions, it is not reported how far apart the other recorded FOVs were from each other.

The rebuttal comment that the PE statistic is computed on an individual cell within-session basis is reasonable. Moreover, the bootstrapped version of the PE analysis in Supplementary Figure 8 is an improvement over the main analysis in the paper. As a control, it would have been helpful to compute the stability of the PE ratio statistics between training days (e.g. between day 1 and day 4). How much change would have been observed when none is expected? Unfortunately, imaging was not performed on these training days, so this analysis cannot readily be performed. Moreover, the PE statistic requires averaging across cells and trials and is therefore very likely to wash out many interesting effects. Even if it is the population response that is changing, why would it be the arithmetic mean that changes in particular vs. some other projection of the population activity? The experimental and analysis design of the paper here remains weak in my mind.

Reviewer #2 (Public review):

Knudstrup and colleagues investigate response to short and rapid sequences of stimuli in layer 2/3 of mouse visual cortex. To quote the authors themselves: "the work continues the recent tradition of providing ambiguous support for the idea that cortical dynamics are best described by predictive coding models". Unfortunately, the ambiguity here is largely a result of the choice of experimental design and analysis, and the data provide only incomplete support for the authors' conclusions.

The authors have addressed some of the concerns of the first revision. However, many still remain.

(1) From the first review: "There appears to be some confusion regarding the conceptual framing of predictive coding. Assuming the mouse learns to expect the sequence ABCD, then ABBD does not probe just for negative prediction errors, and ACBD not just positive prediction errors. With ABBD, there is a combination of a negative prediction error for the missing C in the 3rd position, and a positive prediction error for B in 3rd. Likewise, with ACBD, there is negative prediction error for the missing B at 2nd and missing C at 3rd, and a positive prediction error for the C in 2nd and B in 3rd. Thus, the authors' experimental design does not have the power to isolate either negative or positive prediction errors. Moreover, looking at the raw data in Figure 2C, this does not look like an "omission" response to C, more like a stronger response to a longer B. The pitch of the paper as investigating prediction error responses is probably not warranted - we see no way to align the authors' results with this interpretation."

The authors acknowledge in their response that this is a problem, but do not appear to discuss this in the manuscript. This should be fixed.

(2) From the first review: "Recording from the same neurons over the course of this paradigm is well within the technical standards of the field, and there is no reason not to do this. Given that the authors chose to record from different neurons, it is difficult to distinguish representational drift from drift in the population of neurons recorded. "

The authors respond by pointing out that what they mean by "drift" is within day changes. This has been clarified. However, the analyses in Figures 3 and 5 still are done across days. Figure 3: "Experience modifies activity in PCA space ..." and figure 5: "Stimulus responses shift with training". Both rely on comparisons of population activity across days. This concern remains unchanged here. It would probably be best to remove any analysis done across days - or use data where the same neurons were tracked. Performing chronic two-photon imaging experiments without tracking the same neurons is simply bad practice (assuming one intends to do any analysis across recording sessions).

(3) From the first revision: "The block paradigm to test for prediction errors appears ill chosen. Why not interleave oddball stimuli randomly in a sequence of normal stimuli? The concern is related to the question of how many repetitions it takes to learn a sequence. Can the mice not learn ACBD over 100x repetitions? The authors should definitely look at early vs. late responses in the oddball block. Also the first few presentations after block transition might be potentially interesting. The authors' analysis in the paper already strongly suggests that the mice learn rather rapidly. The authors conclude: "we expected ABCD would be more-or-less indistinguishable from ABBD and ACBD since A occurs first in each sequence and always preceded by a long (800 ms) gray period. This was not the case. Most often, the decoder correctly identified which sequence stimulus A came from." This would suggest that whatever learning/drift could happen within one block did indeed happen and responses to different sequences are harder to interpret."

Again, the authors acknowledge the problem and state that "there is no indication that this is a learned effect". However, they provide no evidence for this and perform no analysis to mitigate the concern.

(4) Some of the minor comments also appear unaddressed and uncommented. E.g. the response amplitudes are still shown in "a.u." instead of dF/F or z-score or spikes.

Reviewer #3 (Public review):

Summary:

This work provides insights into predictive coding models of visual cortex processing. These models predict that visual cortex neurons will show elevated responses when there are unexpected changes to learned sequential stimulus patterns. This model is currently controversial, with recent publications providing conflicting evidence. In this work, the authors test two types of unexpected pattern variations in layer 2/3 of the mouse visual cortex. They show that pattern omission evokes elevated responses, in favor of a predictive coding model, but find no evidence for prediction errors with substituted patterns, which conflicts with both prior results in L4, and with the expectations of a predictive coding model. They also report that with sequence training, responses sparsify and decorrelate, but surprisingly find no changes in the ability of an ideal observer to decode stimulus identity or timing.

These results are an important contribution to the understanding of how temporal sequences and expectations are encoded in the primary visual cortex.

Comments on revisions:

In this revision, the authors address several of the concerns in the original manuscript. However, the primary issue, raised by all three reviewers, was the block design of the experiments. This design makes disentangling the effects of any rapid (within-block) plasticity from any longer-term (across-days) plasticity, which is nominally the subject of the paper, extremely difficult.

Although it may be the case that re-running the experiments with an interleaved design is beyond the scope of this paper, unfortunately, the revised manuscript still does not adequately discuss this potential confound. The authors note that stimulus A in ABCD, ABBD, and ACBD could be distinguished on day 0, indicating that within-block changes do occur. In both the original and revised manuscript this finding is discussed in terms of representational drift, but the authors fail to discuss how such within-block plasticity may impact their primary findings of prediction error effects.

This remains a significant concern with the revised manuscript.

Many of the other issues in the original manuscript have been addressed, and in these areas the revised manuscript is both clearer and more accurately reflects the presented data. The additional analyses and controls shown in the supplemental figures aid in the interpretation of the findings.

Author response:

The following is the authors’ response to the previous reviews.

Reviewer #1:

(1) All analyses were performed on trial-averaged neural responses that were pooled across mice. Owing to differences between subjects in behavior, experimental preparation quality, and biological variability, it seems important to perform at least some analyses on individual datasets to assess how behavioral training might differently affect each animal.

In order to image at a relatively fast rate (30Hz) appropriate to the experimental conditions, we restricted our imaging to a relatively small field of view (412x412um with 512x512 pixels). This entails a smaller number of ROIs per animal, which can lead to an unbalanced distribution of cells responsive to different stimuli for individual fields-of-view. We used the common approach of pooling across animals (Homann et al., 2021; Kim et al., 2019) to overcome limitations imposed by sampling a smaller number of cells per animal. In response to this comment, we included supplemental analyses (Supp. Fig. 6) showing that representational drift (an analysis that was not performed on trial-averaged data) looks substantially the same (albeit noisier) for individual animals as at the population level. Additional analyses (PE ratio, etc.) were difficult since the distribution of cells selective for individual stimuli is unbalanced between individual animals and few mice have multiple cells representing all of the different stimuli.

(2) The correlation analyses presented in Figure 3 (labeled the second Figure 2 in the text) should be conducted on a single-animal basis. Studying population codes constructed by pooling across mice, particularly when there is no behavioral readout to assess whether learning has had similar effects on all animals, appears inappropriate to me. If the results in Figure 3 hold up on single animals, I think that is definitely an interesting result.

We repeated the correlation analysis on mice individually and included the results in the supplement (Supp. Fig. 6). The overall result generally mirrors the result found by pooling across animals.

(3) On Day 0 and Day 5, the reordered stimuli are presented in trial blocks where each image sequence is shown 100 times. Why wasn't the trial ordering randomized as was done in previous studies (e.g. Gavornik and Bear 2014)? Given this lack of reordering, did neurons show reduced predictive responses because the unexpected sequence was shown so many times in quick succession? This might change the results seen in Figure 2, as well as the decoder results where there is a neural encoding of sequence order (Figure 4). It would be interesting if the Figure 4 decoder stopped working when the higher-order block structure of the task was disrupted.

Our work builds primarily on previous studies (Gavornik & Bear, 2014; Price et al., 2023) that demonstrated clear changes in neural responses over days while employing a similar block structure. Notably, Price et al. found that trial number (within a block) was not a significant factor in the generation of prediction-error responses which strongly suggests short-term plasticity does not play a significant role in shaping responses within the block structure. This finding is consistent with our previous LFP recordings which have not revealed any significant plasticity occurring within a training session, a conclusion bolstered by a collaborative work currently in press (Hosmane et al. 2024, Sleep) revealing the requirement for sleep in sequence plasticity expression.

It is possible that layer 2/3 adapts to sequences more rapidly than layer 4/5. While manual inspection does not reveal an obvious difference between early and late blocks in this dataset, the n for this subset is too small to draw firm conclusions. It is our view that the block structure provides the strongest comparison to previous work, but we agree it would be interesting to randomize or fully interleave sequences in future studies to determine what effect, if any, short-term changes might have.

(4) A primary advantage of using two-photon calcium imaging over other techniques like extracellular electrophysiology is that the same neurons can be tracked over many days. This is a standard approach that can be accomplished by using many software packages-including Suite2P (Pachitariu et al. 2017), which is what the authors already used for the rest of their data preprocessing. The authors of this paper did not appear to do this. Instead, it appears that different neurons were imaged on Day 0 (baseline) and Day 5 (test). This is a significant weakness of the current dataset.

The hypothesis being tested was whether expectation violations, as described in Keller & Mrsic-Flogel 2018, exist under a multi-day sequence learning paradigm. For this, tracking cells across days is not necessary as our PE metric compared responses of individual neurons to multiple stimuli within a single session. Given the speed/FOV tradeoff discussed above, we wanted to consider all cells irrespective of whether they were visible/active or trackable across days, especially since we would expect cells that learn to signal prediction errors to be inactive on day 0 and not selected by our segmentation algorithm. Though we did not compare the responses of single cells before/after training, we did analyze cells from the same field of view on days 0 and 5 (see Supp. Fig. 1) and not distinct populations.

Reviewer #2:

(1) There appears to be some confusion regarding the conceptual framing of predictive coding.

Assuming the mouse learns to expect the sequence ABCD, then ABBD does not probe just for negative prediction errors, and ACBD is not just for positive prediction errors. With ABBD, there is a combination of a negative prediction error for the missing C in the 3rd position, and a positive prediction error for B in the 3rd. Likewise, with ACBD, there is a negative prediction error for the missing B at 2nd and missing C at 3rd, and a positive prediction error for the C in 2nd and B in 3rd. Thus, the authors' experimental design does not have the power to isolate either negative or positive prediction errors. Moreover, looking at the raw data in Figure 2C, this does not look like an "omission" response to C, but more like a stronger response to a longer B. The pitch of the paper as investigating prediction error responses is probably not warranted - we see no way to align the authors' results with this interpretation.

The reviewer has identified a real problem with the framing of “positive” and “negative” prediction errors in the context of sensory stimuli, where substitution simultaneously introduces an unexpected “positive” violation and a “negative” omission. Simply put, even if there are separate mechanisms to represent positive and negative errors, there may be no way to isolate the positive response experimentally since an unexpected input always replaces the unseen expected input. For example, had a cell fired solely to ACBD (and not during either ABCD or ABBD), then whether it was signaling the unexpected occurrence of C or the unexpected absence of B would be inherently ambiguous. In either case, such a cell would have been labeled as C-responsive, and its activity would have been elevated compared with ABCD and would have been included in our substitution-type analysis of prediction errors. We accept that there is some ambiguity regarding the description in this particular case, but overall, this cell’s activity pattern would have informed the PE analysis, for which the result was essentially null for the substitution-type violation ACBD.

Omission, in which the sensory input does not change, may experimentally isolate the negative response, though this is only true if there is a temporal expectation of when the change should have occurred. If A is predicting B in an ordinal sense but there is no expectation of when B will occur with respect to A, changing the duration of A would not be expected to produce an error signal, since at any point in time B might still be coming and the expectation is not broken until something other than B occurs. With respect specifically to ABBD in our experiments, it is correct that the learned error responses take the form of stronger, sustained responses to B during the time C was expected. This is still in contrast to day 0, on which activation decays after a transient response to ABBD. The data show that responses during an omitted element are altered with training and take the form of elevated responses to ABBD on day 5. As we say in our discussion, this is somewhat ambiguous evidence of prediction errors since it emerges only with training and is generally consistent with the hypothesis being tested, though it takes a different form than we expected.

(2) Related to the interpretation of the findings, just because something can be described as a prediction error does not mean it is computed in (or even is relevant to) the visual cortex. To the best of our knowledge, it is still unclear where in the visual stream the responses described here are computed. It is possible that this type of computation happens before the signals reach the visual cortex, similar to mechanisms predicting moving stimuli already in the retina (https://pubmed.ncbi.nlm.nih.gov/10192333/). This would also be consistent with the authors' finding (in previous work) that single-cell recordings in V1 exhibit weaker sequence violation responses than the author's earlier work using LFP recordings.

Our work was aimed at testing the specific hypothesis that PE responses, at the very least, exist in L2/3—a hypothesis that is well-supported under different experimental paradigms (often multisensory mismatch). Our aim was to test this idea under a sequence learning paradigm and connect it with previously found PE responses in L4. We don’t claim that it is the only place in which prediction errors may be computed or useful, especially since (as you mentioned), there is evidence for such responses in layer 4. But it is fundamentally important to predictive processing that we determine whether PE responses can be found in layer 2/3 under this passive sequence learning paradigm, whether or not they reflect upstream processes, feedback from higher areas, or entirely local computations. Our aim was to establish some baseline evidence for or against predictive processing accounts of L2/3 activity during passive exposure to visual sequences.

(3) Recording from the same neurons over the course of this paradigm is well within the technical standards of the field, and there is no reason not to do this. Given that the authors chose to record from different neurons, it is difficult to distinguish representational drift from drift in the population of neurons recorded.

Our discussion of drift refers to changes occurring within a population of neurons over the course of a single imaging session. We have added clarifying language to the manuscript to make this clear. Changes to the population-level encoding of stimuli over days are treated separately and with different analytical tools. Regarding tracking single cells across days, please see the response to Reviewer #1, comment 4.

(4) The block paradigm to test for prediction errors appears ill-chosen. Why not interleave oddball stimuli randomly in a sequence of normal stimuli? The concern is related to the question of how many repetitions it takes to learn a sequence. Can the mice not learn ACBD over 100x repetitions? The authors should definitely look at early vs. late responses in the oddball block. Also, the first few presentations after the block transition might be potentially interesting. The authors' analysis in the paper already strongly suggests that the mice learn rather rapidly. The authors conclude: "we expected ABCD would be more-or-less indistinguishable from ABBD and ACBD since A occurs first in each sequence and always preceded by a long (800 ms) gray period. This was not the case. Most often, the decoder correctly identified which sequence stimulus A came from." This would suggest that whatever learning/drift could happen within one block did indeed happen and responses to different sequences are harder to interpret.

This work builds on previous studies that used a block structure to drive plasticity across days. We previously tested whether there are intra-block effects and found no indication of changes occurring within a block or within a session (please see the response to Reviewer #1, comment 3 for further discussion). Observed drift does complicate comparison between blocks. There is no indication in our data that this is a learned effect, though future experiments could test this directly.

(5) Throughout the manuscript, many of the claims are not statistically tested, and where they are the tests do not appear to be hierarchical (https://pubmed.ncbi.nlm.nih.gov/24671065/), even though the data are likely nested.

We have modified language throughout the manuscript to be more precise about our claims. We used pooled data between mice and common parametric statistics in line with published literature. The referenced paper offers a broad critique of this approach, arguing that it increases the possibility of type 1 errors, though it is not clear to us that our experimental design carries this risk, particularly since most of our results were negative. To address the specific concern, however, we performed a non-parametric hierarchical bootstrap analysis (https://pmc.ncbi.nlm.nih.gov/articles/PMC7906290/) that re-confirmed the statistical significance of our positive results; see Supplementary Figure 8.
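For concreteness, the sketch below illustrates the general form of such a hierarchical bootstrap: mice are resampled with replacement first, then cells within each resampled mouse, and the grand mean of a per-cell statistic (here an illustrative PE ratio) is recomputed on each iteration. The data layout and statistic are assumptions for illustration, not the exact code behind Supplementary Figure 8.

```python
import numpy as np

def hierarchical_bootstrap(values_by_mouse, n_boot=10000, seed=0):
    """Bootstrap the grand mean of a per-cell statistic with mice as the
    upper level and cells as the lower level. `values_by_mouse` is a list
    of 1-D arrays, one array of per-cell values per mouse. Returns the
    bootstrap distribution of the grand mean."""
    rng = np.random.default_rng(seed)
    n_mice = len(values_by_mouse)
    boot_means = np.empty(n_boot)
    for b in range(n_boot):
        mouse_idx = rng.integers(0, n_mice, size=n_mice)            # level 1: resample mice
        cell_means = []
        for m in mouse_idx:
            cells = values_by_mouse[m]
            resampled = rng.choice(cells, size=len(cells), replace=True)  # level 2: resample cells
            cell_means.append(resampled.mean())
        boot_means[b] = np.mean(cell_means)
    return boot_means

# Example with hypothetical per-cell PE ratios from three mice.
rng = np.random.default_rng(1)
pe_by_mouse = [rng.normal(1.1, 0.3, 40), rng.normal(1.0, 0.3, 25), rng.normal(1.2, 0.3, 60)]
dist = hierarchical_bootstrap(pe_by_mouse, n_boot=2000)
lo, hi = np.percentile(dist, [2.5, 97.5])
print(f"95% CI of grand-mean PE ratio: {lo:.2f} to {hi:.2f}")
```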

(6) The manuscript would greatly benefit from thorough proofreading (not just in regard to figure references).

We apologize for the errors in the manuscript. We caught the issue and passed on a corrected draft, but apparently the uncorrected draft was sent for review. The re-written manuscript addresses all identified issues.

(7) With a sequence of stimuli that are 250ms in length each, the use of GCaMP6s appears like a very poor choice.

We started our experiments using GCaMP6f but ultimately switched to GCaMP6s due to its improved sensitivity, brightness, and accuracy in spike detection (Huang et al., 2021). When combined with deconvolution (Pachitariu et al., 2018; Pnevmatikakis et al., 2016), we found GCaMP6s provides the most complete and accurate view of spiking within 40ms time bins. The inherent limitations of calcium imaging are more likely to be addressed using electrophysiology rather than a faster sensor in future studies.
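To illustrate the timing resolution referred to above, here is a minimal sketch of aligning deconvolved activity sampled at ~30 Hz to stimulus onsets and summing it into 40 ms bins; the variable names and the way onsets are stored are assumptions for the example, not our actual pipeline.

```python
import numpy as np

fs = 30.0           # imaging frame rate (Hz)
bin_width = 0.040   # analysis bin width (s)
stim_dur = 0.250    # each sequence element lasts 250 ms

def binned_responses(spikes, onsets, fs=fs, bin_width=bin_width, window=stim_dur):
    """Sum deconvolved activity into fixed-width bins aligned to each stimulus onset.
    spikes: (n_cells, n_frames) array; onsets: stimulus onset times in seconds.
    Returns a (trials x cells x time bins) array."""
    n_bins = int(round(window / bin_width))
    edges = np.arange(n_bins + 1) * bin_width
    frame_times = np.arange(spikes.shape[1]) / fs
    out = np.zeros((len(onsets), spikes.shape[0], n_bins))
    for i, t0 in enumerate(onsets):
        for b in range(n_bins):
            sel = (frame_times >= t0 + edges[b]) & (frame_times < t0 + edges[b + 1])
            out[i, :, b] = spikes[:, sel].sum(axis=1)
    return out

# Example with synthetic data: 5 cells, 60 s of activity, one onset per second.
spikes = np.random.poisson(0.05, size=(5, int(60 * fs)))
onsets = np.arange(0, 59, 1.0)
resp = binned_responses(spikes, onsets)
print(resp.shape)  # (59, 5, 6)
```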

(8) The data shown are unnecessarily selective. E.g. it would probably be interesting to see how the average population response evolves with days. The relevant question for most prediction error interpretations would be whether there are subpopulations of neurons that selectively respond to any of the oddballs. E.g. while the authors state they "did" not identify a separate population of omission-responsive neurons, they provide no evidence for this. However, it is unclear whether the block structure of the experiments allows the authors to analyze this.

We concluded that there is no clear dedicated subpopulation of omission-responding cells by inspecting cells with large PE responses (i.e., ABBD; see Supplementary Figure 3). Out of the 107 B-responsive cells on day 5, only one appeared to fire exclusively during the omitted stimulus. Average traces for all B-responsive cells are included in the supplement and we have updated the manuscript accordingly. Similarly, a single C-responsive cell was found with an apparently unique substitution error profile (ABCD and ACBD; Supplementary Figure 4).

Our primary concern was to make sure that days 0 and 5 had the highest quality fields-of-view. In work leading up to this study, there were concerns that imaging on all intermediate days resulted in a degradation of quality due to photobleaching. We agree that an analysis of intermediate days would be interesting, but it was excluded due to these concerns.

Reviewer #3:

(1) Experimental design using a block structure. The use of a block structure on test days (0 and 5) in which sequences were presented in 100 repetition blocks leads to several potential confounds. First, there is the potential for plasticity within blocks, which could alter the responses and induce learned expectations. The ability of the authors to clearly distinguish blocks 1 and 2 on Day 0 with a decoder suggests this change over time may be meaningful.

Repeating the experiments with fully interleaved sequences on test days would alleviate this concern. With the existing data, the authors should compare responses from the first trials in a block to the last trials in a block.

This block design likely also accounts for the ability of a decoder to readily distinguish stimulus A in ABCD from A in ABBD. As all ABCD sequences were run in a contiguous block separate from ABBD, the recent history of experience is different for A stimuli in ABCD versus ABBD. Running fully interleaved sequences would also address this point, and would also potentially mitigate the impact of drift over blocks (discussed below).

As described in other responses, the block structure was chosen to align more closely with previous studies. We take the overall point though, and future studies will employ the suggested randomized or interleaved structure in addition to block structures to investigate the effects of short-term plasticity.

(2) The computation of prediction error differs significantly for omission as opposed to substitutions, in meaningful ways the authors do not address. For omission errors, PE compares the responses of B1 and B2 within ABBD blocks. These responses are measured from the same trial, within tens of milliseconds of each other. In contrast, substitution PE is computed by comparing C in ABCD to C in ACBD. As noted above, the block structure means that these C responses were recorded in different blocks, when the state of the brain could be different. This may account for the authors' detection of prediction error for omission but not substitution. To address this, the authors should calculate PE for omission using B responses from ABCD.

We performed the suggested analysis (i.e., ABBD vs ABCD) prior to submission but omitted it from the draft for brevity; the effect was the same as with the within-sequence comparison (B2 vs B1 in ABBD). We have added the results of standardizing with ABCD as Supplementary Figure 3.
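As a concrete illustration of the two standardizations being compared, the sketch below computes an omission PE ratio for each B-responsive cell either against the first B within ABBD or against B in ABCD. The exact PE definition, data shapes, and synthetic numbers are simplified assumptions, not the precise formula used in the paper.

```python
import numpy as np

def pe_ratio(resp_violation, resp_expected, eps=1e-9):
    """Per-cell ratio of mean response during the violated element to the
    mean response during the expected (reference) element."""
    return resp_violation.mean(axis=0) / (resp_expected.mean(axis=0) + eps)

# resp arrays: trials x cells of deconvolved activity summed over the 250 ms
# element window (hypothetical shapes for the example).
rng = np.random.default_rng(1)
abbd_b1 = rng.gamma(2.0, 1.0, size=(100, 107))   # first B in ABBD
abbd_b2 = rng.gamma(2.4, 1.0, size=(100, 107))   # second (unexpected) B in ABBD
abcd_b  = rng.gamma(2.0, 1.0, size=(100, 107))   # B in the trained sequence ABCD

pe_within  = pe_ratio(abbd_b2, abbd_b1)   # standardized within ABBD
pe_vs_abcd = pe_ratio(abbd_b2, abcd_b)    # standardized against ABCD (as in Supp. Fig. 3)
print(np.median(pe_within), np.median(pe_vs_abcd))
```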

(3) The behavior of responses to B and C within the trained sequence ABCD differs considerably, yet is not addressed. Responses to B in ABCD potentiate from d0-> d5, yet responses to C in the same sequence go down. This suggests there may be some difference in either the representation of B vs C or position 2 vs 3 in the sequence that may also be contributing to the appearance of prediction errors in ABBD but not ACBD. The authors do not appear to consider this point, which could potentially impact their results. Presenting different stimuli for A,B,C,D across mice would help (in the current paper B is 75 deg and C is 165 deg in all cases). Additionally, other omissions or substitutions at different sequence positions should be tested (eg ABCC or ABDC).

We appreciate the suggestion. Ideally, we could test many different variants, but practical concerns regarding the duration of the imaging sessions prevented us from testing other interesting variations (such as ABCC) in the current study. We are uncertain as to how we should interpret the overall depressed response to element C seen on day 5, but since the effect is shared in both ABCD and ACBD, we don’t think it affected our PE calculations.

(4) The authors' interpretation of their PCA results is flawed. The authors write "Experience simplifies activity in principal component space". This is untrue based on their data. The variance explained by the first set of PCs does not change with training, indicating that the data is not residing in a lower dimensional ("simpler") space. Instead, the authors show that the first 5 PCs better align with their a priori expectations of the stimulus structure, but that does not mean these PCs necessarily represent more information about the stimulus (and the fact that the authors fail to see an improvement in decoding performance argues against this case). Addressing such a question would be highly interesting, but is lacking in the current manuscript. Without such analysis, referring to the PCs after training as "highly discretized" and "untangled" are largely meaningless descriptions that lack analytical support.

We meant the terms “simpler”, “highly-discretized”, and “untangled” as qualitative descriptions of changes in covariance structure that occurred despite the maintenance of overall dimensionality. As the reviewer notes, the obvious changes in PC space appear to have had practically no effect on decodability or dimensionality, and we found this surprising and worth describing.

(5) The authors report that activity sparsifies, yet provide only the fraction of stimulus-selective cells. Given that cell detection was automated in a manner that takes into account neural activity (using Suite2p), it is difficult to interpret these results as presented. If the authors wish to claim sparsification, they need to provide evidence that the total number of ROIs drawn on each day (the denominator for sparseness in their calculation) is unbiased. Including more (or less) ROIs can dramatically change the calculated sparseness.

The authors mention sparsification as contributing to coding efficiency but do not test this. Training a decoder on variously sized subsets of their data on days 0 and 5 would test whether redundant information is being eliminated in the network over training.

First, we provide evidence for sparseness using a visual responsiveness metric in addition to stimulus-selectivity. Second, it is true that Suite2p’s segmentation is informed by activity and therefore may possibly omit cells with very minimal activity. However, we detected a comparable number of cells on day 5 (n = 1500) and day 0 (n = 1368). We report that roughly half as many cells are stimulus-selective on day 5 as on day 0. For that to have been a result of biased ROI segmentation, we would have needed to detect closer to 2600 cells on day 5 rather than 1500. Therefore, we consider any bias in the segmentation to have had little effect on the main findings.
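The arithmetic behind that statement is made explicit below; the factor of two comes from the reported halving of the selective fraction and is an approximation for illustration.

```python
# Cell counts quoted in the response above.
total_d0 = 1368   # ROIs detected on day 0
total_d5 = 1500   # ROIs detected on day 5

# The fraction of stimulus-selective cells roughly halved from day 0 to day 5.
# If the true number of selective cells had not changed, halving the fraction
# purely through a larger denominator (extra ROIs diluting the count) would
# require roughly twice as many ROIs on day 5:
required_total_d5 = 2 * total_d0
print(required_total_d5, "ROIs would be needed; only", total_d5, "were detected")
```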

(6) The authors claim their results show representational drift, but this isn't supported in the data. Rather they show that there is some information in the structure of activity that allows a decoder to learn block ID. But this does not show whether the actual stimulus representations change, and could instead reflect an unrelated artifact that changes over time (responsivity, alertness, bleaching, etc). To actually assess representational drift, the authors should directly compare representations across blocks (one could train a decoder on block 1 and test on blocks 2-5). In the absence of this or other tests of representational drift over blocks, the authors should remove the statement that "These findings suggest that there is a measurable amount of representational drift".

“To actually assess representational drift, the authors should directly compare representations across blocks (one could train a decoder on block 1 and test on blocks 2-5)”: This is the exact analysis that was performed. Additionally, our analysis of pairwise correlations directly measures representational drift.

“But this does not show whether the actual stimulus representations change, and could instead reflect an unrelated artifact that changes over time (responsivity, alertness, bleaching, etc)”: We have repeated the decoder analysis using normalized population vectors (Supplementary Figure 5) which we believe directly addresses whether the observed drift is due to photobleaching or alertness that would affect the overall magnitudes of response vectors.

Our analysis of block decoding reflects decoders trained on individual stimulus elements, and we show the average over all such decodings (we have clarified this in the text). For example, we trained a decoder on ABCD presentations from block 1 and tested only against ABCD from other blocks, which we believe is the test suggested by the reviewer. Furthermore, we do show that representational similarity for all stimulus elements decreases gradually and more-or-less monotonically as the time between presentations increases. We believe this is a fairly straightforward test of representational drift, as has been reported and used elsewhere (Deitch et al., 2021).
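For clarity, the following is a minimal sketch of the cross-block test described above: a decoder of stimulus element identity is trained on block 1 and evaluated on later blocks, and drift is summarized as the correlation of mean population vectors between blocks. The classifier choice, synthetic data, and layout are assumptions for illustration, not the exact pipeline used in the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_cells, trials_per_block, n_blocks = 200, 100, 5

# One response template per sequence element (A, B, C, D), plus a small random
# shift per block to mimic drift. All of this is synthetic illustration data.
base = rng.normal(0, 1, size=(4, n_cells))
blocks_X, blocks_y = [], []
for b in range(n_blocks):
    drift = rng.normal(0, 0.15 * b, size=(4, n_cells)) if b else 0.0
    y = np.repeat(np.arange(4), trials_per_block // 4)
    X = (base + drift)[y] + rng.normal(0, 1.0, size=(len(y), n_cells))
    blocks_X.append(X)
    blocks_y.append(y)

# Train on block 1, test on blocks 2-5: accuracy should fall as drift accumulates.
clf = LogisticRegression(max_iter=1000).fit(blocks_X[0], blocks_y[0])
for b in range(1, n_blocks):
    print(f"block 1 -> block {b + 1}: accuracy {clf.score(blocks_X[b], blocks_y[b]):.2f}")

# Pairwise correlation of mean population vectors, the drift measure described above.
mean_vecs = np.stack([X.mean(axis=0) for X in blocks_X])
for b in range(1, n_blocks):
    r = np.corrcoef(mean_vecs[0], mean_vecs[b])[0, 1]
    print(f"block 1 vs block {b + 1}: r = {r:.2f}")
```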

(7) The authors allude to "temporal echoes" in a subheading. This term is never defined, or substantiated with analysis, and should be removed.

We hoped the term ‘temporal echo’ would be understood in the context of rebounding activity during gray periods as supported by analysis in figure 6a. We have eliminated the wording in the updated manuscript.
