Experimental design.

(A) Awake head-fixed mice viewed sequences of oriented gratings while undergoing two-photon calcium imaging in superficial V1, the location of which was determined using widefield retinotopic mapping. To establish a functional baseline, mice were shown the sequences ABCD, ABBD, and ACBD on day 0 (pre-training). After seeing only ABCD during training (days 1-4), mice were again shown all three sequences on the test day (day 5). Sequence elements were held on the screen for 250 ms each and presented without gaps between elements. Sequence presentations were separated by 800 ms of gray screen. (B) To increase the temporal specificity of calcium signals, fluorescence extracted from ROIs marking individual neurons was deconvolved prior to analysis. (C) Heatmap showing the averaged responses of 1368 cells on day 0 to sequence ABCD. To remove artificial temporal structure, this map shows the average activity of odd trials, with cells sorted by the time of peak activity in even trials.
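The split-half sorting described in panel C can be sketched as follows. This is a minimal illustration, not the authors' code: the array layout, the function name `cross_validated_sort`, and the synthetic data are assumptions, and "even" here means even 0-based trial indices.

```python
import numpy as np

def cross_validated_sort(trials):
    """Sort cells by peak time in one half of the trials, display the other half.

    trials: array of shape (n_trials, n_cells, n_timebins) holding
    deconvolved activity (layout is an assumption for this sketch).
    Returns the odd-trial average with rows ordered by even-trial peak time.
    """
    even_mean = trials[0::2].mean(axis=0)  # trials 0, 2, 4, ... (0-based)
    odd_mean = trials[1::2].mean(axis=0)   # trials 1, 3, 5, ...
    # Order cells by the time bin of their peak in the even-trial average.
    order = np.argsort(even_mean.argmax(axis=1), kind="stable")
    return odd_mean[order], order

# Synthetic demo (hypothetical sizes and peak times, not the study's data).
rng = np.random.default_rng(0)
n_trials, n_cells, n_bins = 40, 6, 30
peaks = np.array([25, 5, 15, 10, 20, 0])
trials = rng.normal(0.0, 0.05, size=(n_trials, n_cells, n_bins))
for cell, peak in enumerate(peaks):
    trials[:, cell, peak] += 1.0  # each cell responds in one time bin
sorted_map, order = cross_validated_sort(trials)
```

Because the sort order and the displayed averages come from disjoint trials, a diagonal band in `sorted_map` cannot arise from sorting noise alone.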

Stimulus selectivity.

Stimulus selectivity for neurons recorded on day 0 (n=1368) and day 5 (n=1500). A cell was considered stimulus-selective for an element if the average activity evoked by that stimulus was more than two standard deviations higher than that evoked by any other stimulus.
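One possible reading of this selectivity criterion is sketched below. The caption does not say which standard deviation is used, so the sketch assumes the trial-to-trial standard deviation of each competing stimulus; the function name `selective_stimulus` and the example values are hypothetical.

```python
import numpy as np

def selective_stimulus(responses, n_sd=2.0):
    """Return the index of the stimulus a cell is selective for, or None.

    responses: array of shape (n_stimuli, n_trials) with time-averaged
    activity of one cell on each trial of each stimulus.
    ASSUMPTION: the standard deviation is taken across trials of each
    competing stimulus; the source does not specify this.
    """
    means = responses.mean(axis=1)
    sds = responses.std(axis=1, ddof=1)
    best = int(means.argmax())
    others = np.delete(np.arange(means.size), best)
    # The preferred response must exceed every other mean by n_sd SDs.
    if np.all(means[best] > means[others] + n_sd * sds[others]):
        return best
    return None

# Hypothetical cells: one clearly selective, one not.
selective_cell = np.array([
    [0.10, 0.20, 0.10, 0.20],
    [1.00, 1.10, 0.90, 1.00],
    [0.10, 0.10, 0.20, 0.20],
    [0.15, 0.15, 0.15, 0.15],
])
mixed_cell = np.array([
    [1.00, 1.20, 0.80, 1.00],
    [0.80, 1.00, 0.90, 0.90],
    [0.10, 0.10, 0.20, 0.20],
    [0.10, 0.20, 0.10, 0.20],
])
result_selective = selective_stimulus(selective_cell)
result_mixed = selective_stimulus(mixed_cell)
```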

Stimulus omissions drive prediction errors.

(A) Diagram of putative prediction error (PE) responses to omissions (middle) and substitutions (right) where the omitted/substituted element is expected to drive elevated responses on day 5 compared with day 0 (the x indicates that this elevated response is not present in our data). (B) PE ratios were computed by dividing trial- and time-averaged activity during the deviant image by activity during a corresponding standard image (traces in panel B were drawn manually for illustration purposes). (C) Average trace of B-responsive cells to ABCD with bootstrapped 95% confidence intervals on days 0 (gray) and 5 (red). (D) Average trace of B-responsive cells to ABBD (left) and distributions of omission-type PE ratios. Note that there is no change in visual stimulus at the B1B2 transition. On day 0, mean PE=1.10 (n=137). On day 5, mean PE=1.44 (n=107). Distributions were significantly different (p << 0.05; n=244; KS-test). (E) Average trace of C-responsive cells to ACBD (left) and ABCD (middle). (right) Distributions of substitution-type PE ratios for C-responsive cells. On day 0, mean PE=0.88 (n=84). On day 5, mean PE=0.84 (n=39). Day 0 and day 5 distributions were not significantly different (p > 0.05; n=123; KS-test).
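The PE ratio in panel B and the comparison of day 0 and day 5 distributions can be illustrated with a short sketch. The function names, array shapes, and example values are assumptions. Only the two-sample KS statistic is computed here in plain NumPy; a routine such as `scipy.stats.ks_2samp` would also return a p-value.

```python
import numpy as np

def pe_ratio(deviant, standard):
    """Trial- and time-averaged deviant activity divided by standard activity.

    deviant, standard: arrays of shape (n_trials, n_timebins) for one cell,
    covering the deviant and the corresponding standard image window.
    """
    return deviant.mean() / standard.mean()

def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic (maximum ECDF distance)."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    grid = np.concatenate([x, y])  # ECDFs only change at sample points
    cdf_x = np.searchsorted(x, grid, side="right") / x.size
    cdf_y = np.searchsorted(y, grid, side="right") / y.size
    return float(np.abs(cdf_x - cdf_y).max())

# Hypothetical values, chosen only to make the arithmetic easy to follow.
deviant = np.full((10, 5), 3.0)
standard = np.full((10, 5), 2.0)
ratio = pe_ratio(deviant, standard)

d_same = ks_statistic([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
d_disjoint = ks_statistic([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

A ratio above 1 means the deviant image drove more activity than the standard image in that cell; the KS statistic then compares the per-cell ratio distributions across days.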

Principal Component and Sparsity Analysis.

(A) Trial-averaged population responses to each sequence before and after training, with cells sorted by peak latency. (B) Prior to training, activity is driven along principal components jointly in complex combinations. After training, the most significant principal components align neatly with individual stimuli. In both datasets, the first five components explain ∼80% of the variance. (C) To test whether changes in principal component space reflected the decorrelation of responses, we computed Pearson correlation coefficients between the responses to all four images for each sequence presentation individually. Empirical PDFs (top panels) and CDFs (bottom panels) of these coefficients are shown. After training, activity became significantly less correlated (p < 0.05; n=6000; KS-test) for ABCD and ACBD. Delta (Δ) on bottom panels indicates the area between the CDF curves.
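The correlation analysis in panel C can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the array layout (presentations × elements × cells), the function names, the evaluation grid for Δ, and the synthetic data are all hypothetical.

```python
import numpy as np

def pairwise_element_correlations(pop):
    """Pearson r between every pair of elements within each presentation.

    pop: array of shape (n_presentations, n_elements, n_cells) holding one
    population vector per sequence element (layout is an assumption).
    Returns an array of shape (n_presentations, n_pairs).
    """
    iu = np.triu_indices(pop.shape[1], k=1)  # unique element pairs
    return np.array([np.corrcoef(p)[iu] for p in pop])

def area_between_cdfs(a, b, grid=None):
    """Area between two empirical CDFs, evaluated on a fixed grid over [-1, 1]."""
    if grid is None:
        grid = np.linspace(-1.0, 1.0, 201)
    cdf_a = np.searchsorted(np.sort(a), grid, side="right") / a.size
    cdf_b = np.searchsorted(np.sort(b), grid, side="right") / b.size
    diff = np.abs(cdf_a - cdf_b)
    # Trapezoidal integration of the absolute CDF difference.
    return float(np.sum((diff[1:] + diff[:-1]) / 2.0 * np.diff(grid)))

# Synthetic demo: "before" responses share a common component, "after" do not.
rng = np.random.default_rng(0)
n_pres, n_elem, n_cells = 200, 4, 50
shared = rng.normal(size=(n_pres, 1, n_cells))
pop_before = shared + 0.5 * rng.normal(size=(n_pres, n_elem, n_cells))
pop_after = rng.normal(size=(n_pres, n_elem, n_cells))
r_before = pairwise_element_correlations(pop_before)
r_after = pairwise_element_correlations(pop_after)
delta = area_between_cdfs(r_before.ravel(), r_after.ravel())
```

With four elements there are six unique pairs per presentation, so the pooled sample grows as six times the number of presentations.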

Stimulus decoding and representational drift.

(A) Average confusion matrices (100 iterations) for decoders trained on responses to all images. Average decoder accuracy was 78% and 76% for days 0 and 5, respectively. Both are well above chance (6.7% ± 0.4%). The ability to differentiate correctly between the same image in different contexts, such as ABCD vs ABBD, prompted us to consider whether responses slowly drifted over time since sequences were presented in blocks of 100. (B) A decoder trained on individual elements (e.g., ABCD) accurately classifies which block responses came from, with errors decreasing along with distance between blocks (e.g., block 2 responses are more often classified as belonging to block 3 than blocks 4 or 5). Confusion matrices represent averages over decoders trained on stimuli individually (see Methods). Decoder accuracy was 68% on day 0 and 56% on day 5. (C) We estimated drift by computing Pearson’s correlation coefficients between all pairs of population vectors driven by a particular sequence element and grouped these values based on how far apart the pairs were in time/trial. Responses become less correlated as the distance between trials increases during both stimulus-evoked and gray periods. The largest change in overall temporal correlation was seen between gray periods on days 0 and 5.
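The drift estimate in panel C, correlation between population vectors grouped by trial separation, can be sketched as below. The function name `correlation_by_lag`, the array layout, and the random-walk drift model used to generate data are assumptions made for illustration only.

```python
import numpy as np

def correlation_by_lag(pop, max_lag):
    """Mean Pearson r between population vectors as a function of trial lag.

    pop: array of shape (n_trials, n_cells), one population vector per
    presentation of a given sequence element (layout is an assumption).
    Returns an array of length max_lag; entry k-1 is the mean r at lag k.
    """
    r = np.corrcoef(pop)  # (n_trials, n_trials) correlation matrix
    # The k-th off-diagonal holds all trial pairs separated by k trials.
    return np.array([np.diagonal(r, offset=k).mean()
                     for k in range(1, max_lag + 1)])

# Synthetic demo: a fixed tuning pattern plus a slow random-walk drift.
rng = np.random.default_rng(0)
n_trials, n_cells = 60, 200
base = rng.normal(size=n_cells)
drift = np.cumsum(rng.normal(0.0, 0.3, size=(n_trials, n_cells)), axis=0)
noise = rng.normal(0.0, 0.1, size=(n_trials, n_cells))
pop = base + drift + noise
by_lag = correlation_by_lag(pop, max_lag=20)
```

Under drift, `by_lag` falls off with increasing lag; for a stationary population it would stay roughly flat.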

Stimulus responses shift with training.

(A) Heatmaps displaying trial-averaged responses to all sequences sorted by peak response time during ABCD. Histograms showing the binned location of peak activity for each cell shift towards stimulus-locked phasic responses after training. The green rectangle highlights the location of increased activity on day 5 relative to baseline. (B) Combining histograms across all stimulus conditions shows how the temporal pattern of response latencies changes over days. Note that the first two time bins (66 ms) after onsets are omitted in histograms to compensate for the transmission delay from retina to L2/3 cells. Prior to training, responses slowly build up after onset. After training, responses are robust at onset and undergo quick depression prior to ramping up for the next element. Histograms based only on early trials (first 100) show that this pattern does not change significantly over the course of the imaging session. (C) To test whether changes to temporal patterning might reflect a change in the ability to discern durations, we trained decoders on responses from different time bins following sequence onset. Decoder accuracy did not change significantly as a function of training or sequence.

Dynamics during intersequence gray periods.

(A) Heatmaps displaying trial-averaged responses during gray periods following sequence presentations. Activity is shown sorted as before by peak response time during ABCD presentations (top panels) and resorted based on peak response during the gray periods only (middle panels). Histograms (bottom) reflect locations of peak activity for each cell and show that there is an increase in active population size about 300 ms after gray onset and a second uptick preceding the next sequence presentation at 800 ms. (B) Average confusion matrices (100 iterations) for decoders trained on responses at different delays from gray onset for day 0 (left), day 5 (middle), and the difference between them (right). (C) Overall decoder accuracy as a function of time since gray onset did not change significantly with training.
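A time-bin decoder of the kind summarized in panels B and C can be sketched as follows. The captions do not specify the classifier, so this sketch swaps in a simple nearest-centroid decoder with a random train/test split; the function name `time_bin_confusion`, the array layout, and the synthetic data are assumptions.

```python
import numpy as np

def time_bin_confusion(trials, rng, train_frac=0.5):
    """Confusion matrix for decoding the time bin from population activity.

    trials: array of shape (n_trials, n_cells, n_timebins).
    ASSUMPTION: a nearest-centroid classifier stands in for the decoder,
    which the captions do not specify.
    Returns a row-normalized (true bin x predicted bin) matrix.
    """
    n_trials, n_cells, n_bins = trials.shape
    perm = rng.permutation(n_trials)
    n_train = int(train_frac * n_trials)
    train, test = trials[perm[:n_train]], trials[perm[n_train:]]
    centroids = train.mean(axis=0).T   # (n_bins, n_cells)
    x = test.transpose(0, 2, 1)        # (n_test, n_bins, n_cells)
    # Squared distance from every test vector to every bin centroid.
    dist = ((x[:, :, None, :] - centroids[None, None, :, :]) ** 2).sum(axis=-1)
    pred = dist.argmin(axis=-1)        # (n_test, n_bins)
    conf = np.zeros((n_bins, n_bins))
    for true_bin in range(n_bins):
        conf[true_bin] = np.bincount(pred[:, true_bin], minlength=n_bins)
    return conf / conf.sum(axis=1, keepdims=True)

# Synthetic demo: each time bin evokes a distinct population pattern.
rng = np.random.default_rng(0)
n_trials, n_cells, n_bins = 40, 30, 8
patterns = rng.normal(size=(n_cells, n_bins))
trials = patterns + 0.5 * rng.normal(size=(n_trials, n_cells, n_bins))
conf = time_bin_confusion(trials, rng)
accuracy = float(np.trace(conf) / n_bins)
```

Averaging `conf` over repeated random splits would mirror the "100 iterations" reported in the caption, and subtracting the day 0 matrix from the day 5 matrix gives the difference panel.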