Learned response dynamics reflect stimulus timing and encode temporal expectation violations in superficial layers of mouse V1

  1. Center for Systems Neuroscience, Department of Biology, Boston University, Boston, MA 02215
  2. Neurophotonics Center, Boston University, Boston, MA, 02215
  3. Graduate Program in Neuroscience, Boston University, Boston, MA 02215

Peer review process

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.



  • Reviewing Editor
    Tatyana Sharpee
    Salk Institute for Biological Studies, La Jolla, United States of America
  • Senior Editor
    Tirin Moore
    Howard Hughes Medical Institute, Stanford University, Stanford, United States of America

Reviewer #1 (Public Review):


Knudstrup et al. use two-photon calcium imaging to measure neural responses in the mouse primary visual cortex (V1) in response to image sequences. The authors presented mice with many repetitions of the same four-image sequence (ABCD) for four days. Then on the fifth day, they presented unexpected stimulus orderings where one stimulus was either omitted (ABBD) or substituted (ACBD). After analyzing trial-averaged responses of neurons pooled across multiple mice, they observed that stimulus omission (ABBD) caused a small, but significant, strengthening of neural responses but observed no significant change in the response to stimulus substitution (ACBD). Next, they performed population analyses of this dataset. They showed that there were changes in the correlation structure of activity and that many features of sequence ordering could be reliably decoded. This second set of analyses is interesting and exhibits larger effect sizes than the first set of results on predictive coding. However, concerns about the design of the experiment temper my enthusiasm.

Strengths:

(1) The topic of predictive coding in the visual cortex is exciting, and this task builds on previous important work by the senior author (Gavornik and Bear 2014) where unexpectedly shuffling sequence order caused changes in LFPs recorded from the visual cortex.

(2) Deconvolved calcium responses were used appropriately here to look at the timing of the neural responses.

(3) Neural decoding results showing that the context of the stimuli could be reliably decoded from trial-averaged responses were interesting. However, I have concerns about how the data were formatted for these analyses.

Weaknesses:

(1) All analyses were performed on trial-averaged neural responses that were pooled across mice. Owing to differences between subjects in behavior, experimental preparation quality, and biological variability, it seems important to perform at least some analyses on individual animals to assess how behavioral training might affect each animal differently.

(2) The correlation analyses presented in Figure 3 (labeled the second Figure 2 in the text) should be conducted on a single-animal basis. Studying population codes constructed by pooling across mice, particularly when there is no behavioral readout to assess whether learning has had similar effects on all animals, appears inappropriate to me. If the results in Figure 3 hold up on single animals, I think that is definitely an interesting result.

(3) On Day 0 and Day 5, the reordered stimuli are presented in trial blocks where each image sequence is shown 100 times. Why wasn't the trial ordering randomized as was done in previous studies (e.g. Gavornik and Bear 2014)? Given this lack of randomization, did neurons show reduced predictive responses because the unexpected sequence was shown so many times in quick succession? This might change the results seen in Figure 2, as well as the decoder results where there is a neural encoding of sequence order (Figure 4). It would be interesting if the Figure 4 decoder stopped working when the higher-order block structure of the task was disrupted.

(4) A primary advantage of using two-photon calcium imaging over other techniques like extracellular electrophysiology is that the same neurons can be tracked over many days. This is a standard approach supported by many software packages, including Suite2P (Pachitariu et al. 2017), which the authors already used for the rest of their data preprocessing. The authors of this paper did not appear to do this. Instead, it appears that different neurons were imaged on Day 0 (baseline) and Day 5 (test). This is a significant weakness of the current dataset.

Reviewer #2 (Public Review):

Knudstrup et al set out to probe prediction errors in the mouse visual cortex. They use a variant of an oddball paradigm and test how repeated passive exposure to a specific sequence of visual stimuli affects oddball responses in layer 2/3 neurons. Unfortunately, there are problems with the experimental design that make it difficult to interpret the results in light of the question the authors want to address. The conceptual framing, the choice of block design structure, and the failure to track the same cells over days are just some of the reasons this work is difficult to interpret. Specific comments are as follows:

(1) There appears to be some confusion regarding the conceptual framing of predictive coding. Assuming the mouse learns to expect the sequence ABCD, then ABBD does not probe just for negative prediction errors, and ACBD is not just for positive prediction errors. With ABBD, there is a combination of a negative prediction error for the missing C in the 3rd position, and a positive prediction error for B in the 3rd position. Likewise, with ACBD, there is a negative prediction error for the missing B in the 2nd position and the missing C in the 3rd, and a positive prediction error for the C in the 2nd position and the B in the 3rd. Thus, the authors' experimental design does not have the power to isolate either negative or positive prediction errors. Moreover, looking at the raw data in Figure 2C, this does not look like an "omission" response to C, but more like a stronger response to a longer B. The pitch of the paper as investigating prediction error responses is probably not warranted; we see no way to align the authors' results with this interpretation.

(2) Related to the interpretation of the findings, just because something can be described as a prediction error does not mean it is computed in (or even is relevant to) the visual cortex. To the best of our knowledge, it is still unclear where in the visual stream the responses described here are computed. It is possible that this type of computation happens before the signals reach the visual cortex, similar to mechanisms predicting moving stimuli already in the retina (https://pubmed.ncbi.nlm.nih.gov/10192333/). This would also be consistent with the authors' finding (in previous work) that single-cell recordings in V1 exhibit weaker sequence-violation responses than the LFP recordings in the authors' earlier work.

(3) Recording from the same neurons over the course of this paradigm is well within the technical standards of the field, and there is no reason not to do this. Given that the authors chose to record from different neurons, it is difficult to distinguish representational drift from drift in the population of neurons recorded.

(4) The block paradigm to test for prediction errors appears ill-chosen. Why not interleave oddball stimuli randomly in a sequence of normal stimuli? The concern is related to the question of how many repetitions it takes to learn a sequence. Can the mice not learn ACBD over 100 repetitions? The authors should definitely look at early vs. late responses in the oddball block. Also, the first few presentations after the block transition might be potentially interesting. The authors' analysis in the paper already strongly suggests that the mice learn rather rapidly. The authors conclude: "we expected ABCD would be more-or-less indistinguishable from ABBD and ACBD since A occurs first in each sequence and always preceded by a long (800 ms) gray period. This was not the case. Most often, the decoder correctly identified which sequence stimulus A came from." This would suggest that whatever learning/drift could happen within one block did indeed happen, and that responses to different sequences are harder to interpret.

(5) Throughout the manuscript, many of the claims are not statistically tested, and where they are the tests do not appear to be hierarchical (https://pubmed.ncbi.nlm.nih.gov/24671065/), even though the data are likely nested.
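One simple way to respect the nesting, short of a full hierarchical model, is to aggregate to one value per animal before testing, so that the animal rather than the trial is the unit of analysis. A minimal sketch (the data layout and function name are illustrative assumptions, not the authors' pipeline):

```python
import math

def paired_t_per_animal(cond_a, cond_b):
    """Pseudoreplication-aware paired t-test: average trials within each
    animal first, then test across animals (the true experimental unit).

    cond_a, cond_b: dicts mapping animal id -> list of trial responses.
    Returns (t statistic, degrees of freedom).
    """
    animals = sorted(cond_a)
    # One difference score per animal, not one per trial
    diffs = [sum(cond_a[m]) / len(cond_a[m]) - sum(cond_b[m]) / len(cond_b[m])
             for m in animals]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # unbiased variance
    t = mean / math.sqrt(var / n)
    return t, n - 1
```

Pooling all trials across animals instead would inflate the sample size and overstate significance when responses within an animal are correlated.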

(6) The manuscript would greatly benefit from thorough proofreading (not just in regard to figure references).

(7) With a sequence of stimuli that are each 250 ms long, the use of GCaMP6s appears to be a very poor choice; its slow decay kinetics (on the order of a second) cannot resolve responses to individual stimuli presented at this rate.

(8) The data shown are unnecessarily selective. E.g., it would probably be interesting to see how the average population response evolves across days. The relevant question for most prediction error interpretations would be whether there are subpopulations of neurons that selectively respond to any of the oddballs. E.g., while the authors state they did not identify a separate population of omission-responsive neurons, they provide no evidence for this. However, it is unclear whether the block structure of the experiments allows the authors to analyze this.

Reviewer #3 (Public Review):


This work provides insights into predictive coding models of visual cortex processing. These models predict that visual cortex neurons will show elevated responses when there are unexpected changes to learned sequential stimulus patterns. This model is currently controversial, with recent publications providing conflicting evidence. In this work, the authors test two types of unexpected pattern variations in layer 2/3 of the mouse visual cortex. They show that pattern omission evokes elevated responses, in favor of a predictive coding model, but find no evidence for prediction errors with substituted patterns, which conflicts with both prior results in L4, and with the expectations of a predictive coding model. They also report that with sequence training, responses sparsify and decorrelate, but surprisingly find no changes in the ability of an ideal observer to decode stimulus identity or timing.

These results are an important contribution to the understanding of how temporal sequences and expectations are encoded in the primary visual cortex. However, there are several methodological concerns with the study, and some of the authors' interpretations and conclusions are unsupported by data.

Major concerns:

(1) Experimental design using a block structure. The use of a block structure on test days (0 and 5) in which sequences were presented in 100 repetition blocks leads to several potential confounds. First, there is the potential for plasticity within blocks, which could alter the responses and induce learned expectations. The ability of the authors to clearly distinguish blocks 1 and 2 on Day 0 with a decoder suggests this change over time may be meaningful.

Repeating the experiments with fully interleaved sequences on test days would alleviate this concern. With the existing data, the authors should compare responses from the first trials in a block to the last trials in a block.

This block design likely also accounts for the ability of a decoder to readily distinguish stimulus A in ABCD from A in ABBD. As all ABCD sequences were run in a contiguous block separate from ABBD, the recent history of experience is different for A stimuli in ABCD versus ABBD. Running fully interleaved sequences would also address this point, and would also potentially mitigate the impact of drift over blocks (discussed below).

(2) The computation of prediction error differs significantly for omission as opposed to substitutions, in meaningful ways the authors do not address. For omission errors, PE compares the responses of B1 and B2 within ABBD blocks. These responses are measured from the same trial, within tens of milliseconds of each other. In contrast, substitution PE is computed by comparing C in ABCD to C in ACBD. As noted above, the block structure means that these C responses were recorded in different blocks, when the state of the brain could be different. This may account for the authors' detection of prediction error for omission but not substitution. To address this, the authors should calculate PE for omission using B responses from ABCD.
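The suggested control amounts to a simple normalized contrast between oddball and control responses. A minimal sketch (the index form and names are illustrative assumptions, not the authors' actual computation):

```python
def prediction_error_index(oddball_resp, control_resp):
    """Normalized prediction-error index: positive when the oddball
    response exceeds the matched control, zero when they are equal.
    Inputs are mean (e.g. deconvolved) responses in arbitrary units."""
    return (oddball_resp - control_resp) / (oddball_resp + control_resp)

# Omission PE as suggested by the reviewer: second B in ABBD vs. B in ABCD.
# The caveat is that the two responses then come from different blocks,
# so brain state must be comparable across blocks.
```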

(3) The behavior of responses to B and C within the trained sequence ABCD differs considerably, yet is not addressed. Responses to B in ABCD potentiate from day 0 to day 5, yet responses to C in the same sequence go down. This suggests there may be some difference in either the representation of B vs C or position 2 vs 3 in the sequence that may also be contributing to the appearance of prediction errors in ABBD but not ACBD. The authors do not appear to consider this point, which could potentially impact their results. Presenting different stimuli for A, B, C, D across mice would help (in the current paper B is 75 deg and C is 165 deg in all cases). Additionally, other omissions or substitutions at different sequence positions should be tested (e.g., ABCC or ABDC).

(4) The authors' interpretation of their PCA results is flawed. The authors write "Experience simplifies activity in principal component space". This is untrue based on their data. The variance explained by the first set of PCs does not change with training, indicating that the data is not residing in a lower dimensional ("simpler") space. Instead, the authors show that the first 5 PCs better align with their a priori expectations of the stimulus structure, but that does not mean these PCs necessarily represent more information about the stimulus (and the fact that the authors fail to see an improvement in decoding performance argues against this case). Addressing such a question would be highly interesting, but is lacking in the current manuscript. Without such analysis, describing the PCs after training as "highly discretized" and "untangled" is largely meaningless, as these terms lack analytical support.
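The dimensionality claim at issue can be made concrete by comparing the variance captured by the first k principal components on each day. A minimal sketch, assuming a samples-by-neurons response matrix (the layout and function name are illustrative):

```python
import numpy as np

def variance_explained(responses, k):
    """Fraction of total variance captured by the first k principal
    components. responses: (n_samples, n_neurons) array, e.g. time
    bins x cells of trial-averaged activity."""
    X = responses - responses.mean(axis=0)  # center each neuron
    s = np.linalg.svd(X, compute_uv=False)  # singular values, descending
    var = s ** 2                            # variance per component
    return var[:k].sum() / var.sum()
```

If training truly "simplifies" activity, variance_explained(day5_data, 5) should exceed variance_explained(day0_data, 5); the reviewer's point is that the reported data show no such change.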

(5) The authors report that activity sparsifies, yet provide only the fraction of stimulus-selective cells. Given that cell detection was automated in a manner that takes into account neural activity (using Suite2p), it is difficult to interpret these results as presented. If the authors wish to claim sparsification, they need to provide evidence that the total number of ROIs drawn on each day (the denominator in their sparseness calculation) is unbiased. Including more (or fewer) ROIs can dramatically change the calculated sparseness.

The authors mention sparsification as contributing to coding efficiency but do not test this. Training a decoder on variously sized subsets of their data on days 0 and 5 would test whether redundant information is being eliminated in the network over training.

(6) The authors claim their results show representational drift, but this isn't supported by the data. Rather, they show that there is some information in the structure of activity that allows a decoder to learn block ID. But this does not show whether the actual stimulus representations change, and could instead reflect an unrelated artifact that changes over time (responsivity, alertness, bleaching, etc.). To actually assess representational drift, the authors should directly compare representations across blocks (one could train a decoder on block 1 and test on blocks 2-5). In the absence of this or other tests of representational drift over blocks, the authors should remove the statement that "These findings suggest that there is a measurable amount of representational drift".
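The suggested train-on-block-1, test-on-later-blocks analysis could be sketched with a simple nearest-centroid decoder (the decoder choice and data layout are assumptions for illustration, not the paper's method):

```python
import numpy as np

def fit_centroids(X, y):
    """Nearest-centroid decoder: one mean population vector per stimulus.
    X: (n_trials, n_neurons) responses; y: stimulus label per trial."""
    return {lab: X[y == lab].mean(axis=0) for lab in np.unique(y)}

def decode_accuracy(centroids, X, y):
    """Classify each trial by its closest training centroid."""
    labs = list(centroids)
    C = np.stack([centroids[lab] for lab in labs])
    pred = [labs[np.argmin(np.linalg.norm(C - x, axis=1))] for x in X]
    return np.mean([p == t for p, t in zip(pred, y)])

# Drift test: fit centroids on block 1, evaluate on blocks 2..5.
# Accuracy falling with block distance (beyond a within-block control)
# would indicate representational drift rather than a generic artifact.
```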

(7) The authors allude to "temporal echoes" in a subheading. This term is never defined, or substantiated with analysis, and should be removed.
