Mice rapidly learn to discriminate stimulus direction in a head-fixed paradigm.

a, A water droplet is paired with air puffs in one direction (CS+) but not the other (CS-). Licking in anticipation of water is assessed in the response window just after CS+ or CS- and prior to water delivery for the CS+ (grey bar). b, Experimental timeline. 2-3 weeks after virus injection, naive tuft responses to stimuli are recorded (pre). The CS+ is then paired with water for 8-9 days (blue). On the last day, stimuli are presented without reward (post). In a separate group of mice, the same stimuli are presented over 9 days in the absence of reward (unrewarded group). c, Lick rasters for three different sessions in one example mouse. On session 9, the CS+ but not the CS- reliably elicits licks. d, Mean baseline-subtracted whisking amplitude aligned to the CS+ (red) and CS- (navy) across sessions 1, 2, and 9 of an example mouse. e, Learning curve demonstrates rapid learning. Mean probability of at least one lick in the response window across sessions. f, Behavioral performance of each mouse in the rewarded group (M1 – M7).
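The learning curve in e reduces each trial to a binary outcome: whether at least one lick fell inside the response window. A minimal sketch of that computation follows, assuming lick times are stored per trial relative to stimulus onset. The window bounds (0.5 to 1.5 s) and the function name are hypothetical, since the legend does not state them.

```python
def lick_probability(trials, window=(0.5, 1.5)):
    """Fraction of trials with at least one lick inside the response window.

    trials: list of trials, each a list of lick times (s) relative to stimulus onset.
    window: (start, end) in seconds; hypothetical values, not taken from the legend.
    """
    if not trials:
        raise ValueError("need at least one trial")
    start, end = window
    # A trial counts once, no matter how many licks fall in the window.
    hits = sum(any(start <= t < end for t in licks) for licks in trials)
    return hits / len(trials)


# Toy session (made-up data): 4 CS+ trials and 4 CS- trials.
cs_plus = [[0.7, 1.1], [], [1.4], [0.2, 0.9]]
cs_minus = [[], [2.0], [], [0.1]]
print(lick_probability(cs_plus))   # 0.75
print(lick_probability(cs_minus))  # 0.0
```

Computing this value per session for CS+ and CS- separately would give the two curves of a learning plot like the one described in e.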

Overall tuft response to stimuli is unbiased and relatively stable across conditioning.

a, Dendritic activity was recorded in layer 1 (i) in the C1/C2 barrel columns (ii). (i) Two-photon image ∼60 µm deep relative to pia. Dashed yellow lines denote C1 and C2 boundaries from intrinsic imaging. Reconstruction from ref. 50. (ii) Tangential section through layer 4 showing barrels stained with streptavidin-Alexa 647 and GCaMP6f-expressing apical trunks. Red circles indicate locations of two-photon lesions made to mark the imaging region for post-hoc analysis. b, Overlay of five segmented, pseudo-colored tufts from the imaging field in a(i). c, Time courses of calcium responses of the example tufts in b to three air puffs (dashes). d, Response amplitudes for CS+ (red) and CS- (blue), computed for each segmented tuft in the first 1.5 s post-stimulus (grey points), do not differ within or across sessions. Colored lines indicate medians. e, Same as in d, showing data for all conditioning sessions.

Reinforcement learning, but not stimulus exposure, enhances tuft selectivity for CS+ and CS- stimuli.

a, Across the indicated sessions, individual tufts (circles) exhibit larger biases to CS+ or CS- (pooled across all conditioned mice). b, Repeated exposure to stimuli does not bias individual tufts to CS+ or CS-. c, Conditioning reshapes the distribution of tuft selectivity indices from Normal on the pre-conditioning session to uniform on the post-conditioning session. d, The distribution of tuft selectivity indices remains Normal throughout all repeated exposure sessions. e, Selectivity (median SI magnitude of tufts for each session) increases with behavioral performance in 6 animals. f, Neural discriminability (mean ± s.e.m.) of tufts, pooled across all animals on each session, increases with conditioning and decreases with repeated exposure.

High-speed volumetric imaging of apical tufts confirms the emergence of enhanced selectivity after learning.

a, Top and side views of four example tufts segmented from volumetric SCAPE imaging. b, Time courses of calcium activity from the example tufts in a during five presentations of air puff stimuli (dashes). c, Performance across all conditioning sessions of two mice that were imaged with SCAPE. d, Across the indicated sessions, individual SCAPE-imaged tufts (circles) exhibit larger biases to CS+ or CS-. e, Conditioning reshapes the selectivity distribution from Normal to uniform.

Longitudinal tracking reveals that reward enhances the selectivity of both initially unresponsive and responsive tufts.

a, Three example tufts that were longitudinally tracked across learning. Top row: An initially unresponsive tuft develops a robust response to the CS+ but not the CS- after learning. Middle row: A responsive but unselective tuft loses its robust CS+ response and becomes selective for the CS-. Bottom row: A CS- selective neuron becomes unresponsive to both stimuli. b, Tufts that were unresponsive during the first session were longitudinally tracked to the last session. Plotted is the mean proportion of selective and unselective neurons across all animals in the conditioned (black bars) and repeated exposure (grey bars) groups. c,d, Same analysis as b for initially selective (c) and unselective (d) tufts. Two-sample t-test was used for comparisons between conditioned and repeated exposure groups. Paired t-test was used for comparisons within a group. * p < 0.05. e, Total tuft counts from first to last session within the 3 response categories for either conditioned (left) or repeated exposure (right) groups. f, SI of responsive tufts on the last session that were initially unresponsive during the first session. Conditioned tufts have enhanced selectivity compared to repeated exposure. g, Tufts that were selective on the last session are more selective if conditioned (black) rather than undergoing repeated exposure (grey). h, Tufts that responded on both pre and post sessions tend to have higher selectivity if conditioned rather than undergoing repeated exposure. i, SI of responsive tufts on the first session that later became unresponsive during the last session.

Whisking is only weakly correlated with tuft activity and cannot account for changes in selectivity during learning.

a, Whisking amplitude aligned to calcium activity of three example tufts in one session. Green shading indicates periods of whisking. Red and navy ticks indicate CS+ or CS- delivery, respectively. b, Mean whisking response of five mice to CS+ (red) and CS- (navy) does not change across sessions during learning (mean ± s.e.m.). c, Mean standard deviation of whisking decreases for both CS+ and CS- across learning, but CS+ and CS- do not differ. d, Event-triggered averages of 322 tufts on the post-conditioning day (grey traces: individual tufts; black inset: population average) are responsive to stimuli but relatively unmodulated by whisking. e, R² values for linear models predicting calcium from stimuli (y axis) are consistently greater than those predicting calcium from whisking (x axis). Each circle represents a tuft (n = 322 tufts). f, Magnitude of tuft selectivity does not correlate with mean whisking amplitude during CS+ (left) and CS- trials (right) on that session.
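The comparison in e amounts to fitting two linear models per tuft and plotting one R² against the other. A minimal sketch of that idea follows, assuming a single predictor per model so that R² equals the squared Pearson correlation. The legend does not specify the model design (for example, whether CS+ and CS- enter as separate regressors), so the function name, the one-predictor simplification, and the toy traces are all illustrative assumptions.

```python
def r_squared(x, y):
    """R^2 of an ordinary least-squares fit y ~ a + b*x (one predictor)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # constant predictor or response: nothing to explain
    # For a single predictor, R^2 is the squared Pearson correlation.
    return sxy ** 2 / (sxx * syy)


# Made-up traces for one tuft, one value per time bin.
stimulus = [0, 1, 0, 1, 0, 1, 0, 1]                   # 1 = air puff present
whisking = [0.2, 0.1, 0.4, 0.3, 0.1, 0.2, 0.3, 0.4]   # whisking amplitude
calcium = [0.0, 1.0, 0.1, 0.9, 0.0, 1.1, 0.1, 0.8]    # dF/F

r2_stim = r_squared(stimulus, calcium)
r2_whisk = r_squared(whisking, calcium)
print(r2_stim > r2_whisk)  # True for this toy tuft
```

Each tuft would contribute one (R² from whisking, R² from stimuli) point; points above the diagonal correspond to tufts better explained by the stimuli.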

Behavioral responses do not account for enhancement of stimulus selectivity during learning.

a, Mean stimulus responses of four tufts during hit (red), CR (cyan), and FA (black) trials. Top row: Example tufts whose responses are not behaviorally modulated (CR is similar to FA). Bottom row: Example tufts with behaviorally modulated responses (CR and FA differ). b, Selectivity index (SI) distribution changes from early (left) to late (right) learning sessions even when tufts with behaviorally modulated responses (CR≠FA) are excluded. c, Median SI magnitude of tufts in each of six animals (from panel b) increases from early to late learning sessions.

Apical tufts in barrel cortex of mice performing the task exclusively with their whiskers undergo long-lasting changes in selectivity.

a, SI histograms of mice performing the task exclusively with their whiskers exhibit increased selectivity across pre-conditioning, last-rewarded, and post-conditioning sessions. b, Relative to pre-conditioning, mice using their whiskers and other sensory cues to perform the task have increased selectivity during the last rewarded session, but not the post-conditioning session. c, The probability of anticipatory licks in response to the CS+ extinguishes across post-conditioning blocks (of 20 trials each). d, Tuft selectivity remains uniformly distributed during post-conditioning trial blocks 1-2 (top), while licking is extinguishing, and blocks 3-4 (bottom), in which licking is extinguished.

CS+ trials evoke a second, long-latency peak during early learning, but not late learning.

a, Left: Population average of stimulus-responsive tufts aligned to CS+ (red) or CS- (blue) trials from an example mouse. Right: Normalized ΔF/F of individual tufts during CS+ trials. b, Same as in a, combining data across four mice whose imaging regions were mapped with intrinsic imaging.

Selectivity was enhanced in individual animals that received rewards.

Median SI magnitude for each animal across three sessions for conditioned (left) and repeated exposure groups (right). * p < 0.05, *** p < 10⁻³.

Segmented tufts from two-photon and SCAPE microscopy.

12 example tufts extracted from either two-photon (a) or SCAPE microscopy (b). Tufts segmented from SCAPE microscopy are shown as maximum intensity projections from the top and side. Scale bars: 100 μm.

Calcium event rate of tufts that were either unresponsive or responsive to air puff stimuli.

The number of calcium events per minute was quantified for all tufts during each conditioning session. Data from each group was pooled across all sessions.

Licking cannot account for changes in selectivity during learning.

a, Inter-trial interval (ITI) lick-bout-triggered averages of 232 tufts on the 5th conditioning day, when ITI licks were still common (grey traces: individual tufts; black inset: population average), exhibit little or no lick-related calcium influx. b, R² values for linear models predicting calcium from stimuli (y axis) are consistently greater than those predicting calcium from licking (x axis). Each circle represents one of 442 tufts on last-rewarded sessions. c, Coefficients from a multivariate regression analysis with calcium as the response variable and the CS+, CS-, whisking, and licking as the predictors. CS+ and CS- coefficients are therefore disentangled from correlations with whisking and licking. Conditioning biases individual tufts (circles) to have larger CS+ or CS- coefficients. n = 304, 324, and 322 tufts for First rewarded, Last rewarded, and Post conditioning, respectively. d, Similar analysis to c but for the repeated exposure group, with calcium as the response variable and the CS+, CS-, and whisking as the predictors. n = 223, 208, and 218 tufts for Day 2, Final - 1, and Final session, respectively.
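The regression in c can be illustrated with an ordinary least-squares fit that includes all four predictors at once, so that each stimulus coefficient reflects what remains after whisking and licking are accounted for. The sketch below is one way to do this under stated assumptions: it solves the normal equations directly, the helper name `ols` is hypothetical, and the toy data are synthetic. The legend does not describe the actual fitting procedure, time binning, or any regularization.

```python
def ols(predictors, response):
    """Least-squares coefficients for response ~ intercept + predictors.

    predictors: dict mapping name -> list of values (one per time bin).
    Returns a dict name -> coefficient, including "intercept".
    """
    names = ["intercept"] + list(predictors)
    n = len(response)
    cols = [[1.0] * n] + [[float(v) for v in predictors[k]] for k in predictors]
    p = len(cols)
    # Normal equations (X^T X) beta = X^T y, stored as an augmented matrix.
    a = [[sum(ci * cj for ci, cj in zip(cols[i], cols[j])) for j in range(p)]
         + [sum(ci * yi for ci, yi in zip(cols[i], response))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(a[r][k]))
        if abs(a[pivot][k]) < 1e-12:
            raise ValueError("predictors are collinear")
        a[k], a[pivot] = a[pivot], a[k]
        scale = a[k][k]
        a[k] = [v / scale for v in a[k]]
        for r in range(p):
            if r != k:
                f = a[r][k]
                a[r] = [vr - f * vk for vr, vk in zip(a[r], a[k])]
    return {name: a[i][p] for i, name in enumerate(names)}


# Synthetic tuft (made-up data): driven strongly by CS+, weakly by CS-,
# slightly by whisking, and not at all by licking.
cs_plus = [1, 0, 0, 1, 0, 0, 1, 0]
cs_minus = [0, 1, 0, 0, 1, 0, 0, 1]
whisk = [0.5, 0.1, 0.3, 0.2, 0.4, 0.0, 0.1, 0.6]
lick = [1, 0, 0, 1, 0, 1, 0, 0]
calcium = [0.1 + 1.0 * p + 0.2 * m + 0.05 * w
           for p, m, w in zip(cs_plus, cs_minus, whisk)]

coef = ols({"cs_plus": cs_plus, "cs_minus": cs_minus,
            "whisk": whisk, "lick": lick}, calcium)
print(coef["cs_plus"] > coef["cs_minus"])  # True: this toy tuft is CS+ biased
```

Although licking co-occurs with the CS+ on some bins in this toy example, the fit attributes the response to the CS+ regressor, which is the sense in which stimulus coefficients are separated from behavioral correlates. For the repeated exposure group in d, the same call would simply omit the `lick` predictor.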