Learning enhances behaviorally relevant representations in apical dendrites

Sam E. Benezra; Kripa B. Patel; Citlali Pérez Campos; Elizabeth M.C. Hillman; Randy M. Bruno

doi:10.7554/eLife.98349.1

eLife assessment

This important study uses calcium imaging to show an increase in the selectivity of the sensory-evoked response in the apical dendritic tuft of layer 5 barrel cortex neurons as mice learn a whisker-dependent discrimination task. The evidence supporting the conclusions is compelling, and this work will be of great interest to neuroscientists working on reward-based learning and sensory processing.

https://doi.org/10.7554/eLife.98349.1.sa2

Significance of findings

important: Findings that have theoretical or practical implications beyond a single subfield

Strength of evidence

compelling: Evidence that features methods, data and analyses more rigorous than the current state-of-the-art

During the peer-review process the editor and reviewers write an eLife assessment that summarises the significance of the findings reported in the article (on a scale ranging from landmark to useful) and the strength of the evidence (on a scale ranging from exceptional to inadequate).

Abstract

Summary

Learning alters cortical representations and improves perception. Apical tuft dendrites in Layer 1, which are unique in their connectivity and biophysical properties, may be a key site of learning-induced plasticity. We used both two-photon and SCAPE microscopy to longitudinally track tuft-wide calcium spikes in apical dendrites of Layer 5 pyramidal neurons as mice learned a tactile behavior. Mice were trained to discriminate two orthogonal directions of whisker stimulation. Reinforcement learning, but not repeated stimulus exposure, enhanced tuft selectivity for both directions equally, even though only one was associated with reward. Selective tufts emerged from initially unresponsive or low-selectivity populations. Animal movement and choice did not account for changes in stimulus selectivity. Enhanced selectivity persisted even after rewards were removed and animals ceased performing the task. We conclude that learning produces long-lasting realignment of apical dendrite tuft responses to behaviorally relevant dimensions of a task.

Introduction

Learning and memory depend on the ability of biological networks to alter their activity based on past experience. For example, as animals learn the behavioral relevance of stimuli in a sensory discrimination task, neural representations of those stimuli are enhanced^1–7, potentially improving the salience of information relayed to downstream areas. Studies in primary somatosensory (S1)⁸ and visual cortex² have revealed that top-down signals from distant cortical regions can modify sensory representations during learning, although the cellular and circuit mechanisms underlying this plasticity remain unclear.

Cortical layer 1, composed mainly of apical tuft dendrites of layer 5 (L5) and layer 2/3 pyramidal neurons, may be a key site driving the enhancement of sensory representations during learning. Apical tufts are anatomically well positioned for learning, receiving top-down signals from numerous cortical and thalamic areas^9–11. While L5 distal tufts are electrically remote and far from the soma, they are in close proximity to the highly electrogenic calcium spike initiation zone at the main bifurcation of the apical dendrite, and form a separate biophysical and processing compartment from the proximal dendrites^12–16. Top-down signals arriving at the tuft can trigger tuft-wide dendritic calcium spikes in L5 neurons¹⁷, which can modulate synaptic plasticity across the entire dendritic tree¹⁸ and potently drive somatic burst firing^{15, 19–23}. Consistent with this observation, L5 apical dendrite activity is highly correlated with somatic activity^{24, 25}. Therefore, by strongly influencing somatic activity, L5 apical dendritic calcium spikes can play an important role in modulating cortical output. Several neuromodulators can augment the excitability of the apical tuft and increase the likelihood of eliciting calcium spikes^{26, 27}, which could be a substrate for control of plasticity by behavioral state. Consistent with these ideas, we recently demonstrated that during behavioral training with positive reinforcements, apical tufts in sensory cortex acquire associations that extend beyond their normal sensory modality²⁸. In mouse models of dementia and Alzheimer’s disease^{29, 30}, tuft dendrites exhibit degeneration, which may contribute to the cognitive and memory deficits.

L5 pyramidal neurons are the major source of output from cortex, targeting numerous subcortical structures that affect behavior. The activity of apical dendrites is known to correlate with stimulus intensity, and manipulating L5 apical dendrites and their inputs impacts performance of sensory tasks^{17, 31–33}. Apical dendritic calcium spikes of pyramidal cells could be a crucial cellular mechanism in learning-related plasticity and behavioral modification^{18, 34, 35}. However, sensory representations of apical tufts, as well as possible changes across learning, have received little attention.

To address this question, we used two-photon microscopy and a new high-speed volumetric imaging technique called Swept Confocally-Aligned Planar Excitation (SCAPE)^{36, 37} to longitudinally track the activity of GCaMP6f-expressing L5 apical tufts in barrel cortex during a sensory discrimination task. We found that apical tufts underwent extensive dynamic changes in selectivity for task-relevant stimuli as performance improved, even though only one of the stimuli was rewarded. These changes in responses persisted even after animals disengaged from the task, demonstrating that learning induced long-lasting changes in tuft sensory representations. Animals that were exposed to the same stimulation protocol without any reinforcement did not develop enhanced representations. Our results show for the first time that reinforcement learning expands apical tuft sensory representations along behaviorally relevant dimensions.

Results

Direction discrimination behavior

We devised an awake head-fixed mouse conditioning paradigm that enables controlled investigation of reinforcement effects across learning (Fig.1A,B). In addition to discriminating tactile objects, rodents are known to sense wind direction using their whiskers^{38, 39} and can be trained to discriminate different directions of whisker deflections^{40, 41}. With this in mind, we directed brief (100-ms) air puffs at the whiskers in either of two directions: rostrocaudal (backward) or ventrodorsal (upward). One of the directions was paired with a water reward delivered 500 ms after the air puff and thus constituted a conditioned stimulus (CS+). No reward was given for the other direction (CS-).

Mice rapidly learn to discriminate stimulus direction in a head-fixed paradigm.
a, A water droplet is paired with air puffs in one direction (CS+) but not the other (CS-). Licking in anticipation of water is assessed in the response window just after CS+ or CS- and prior to water delivery for the CS+ (grey bar). b, Experimental timeline. 2-3 weeks after virus injection, naive tuft responses to stimuli are recorded (pre). The CS+ is then paired with water for 8-9 days (blue). On the last day, stimuli are presented without reward (post). In a separate group of mice, the same stimuli are presented over 9 days in the absence of reward (unrewarded group). c, Lick rasters for three different sessions in one example mouse. On session 9, the CS+ but not the CS- reliably elicits licks. d, Mean baseline-subtracted whisking amplitude aligned to the CS+ (red) and CS- (navy) across sessions 1, 2, and 9 of an example mouse. e, Learning curve demonstrates rapid learning. Mean probability of at least one lick in the response window across sessions. f, Behavioral performance of each mouse in the rewarded group (M1 – M7).

Licking and whisking were monitored throughout the session (Fig.1C,D). Stimuli elicited a brief passive whisker deflection followed by active whisking over the subsequent ∼1.5 seconds (analyzed below, Fig.6). Any anticipatory licks prior to reward delivery were counted as a response. Typically, on the first session, mice exhibited few anticipatory licks to either stimulus (Fig.1C, top, grey shading). By session 2 or 3, mice had learned an association between whisker deflection and reward, but could not discriminate the CS+ and CS- (middle). Within a week (by sessions 7-9), every mouse we tested learned to reliably lick to the CS+ while withholding licks to the CS-, performing substantially above chance (Fig.1C, bottom; Fig.1E,F). Thus, mice rapidly learned to discriminate the direction of whisker stimuli in our behavioral task.
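
As an illustration of the behavioral measure described above, the following minimal sketch computes the probability of at least one anticipatory lick per trial. The function name, the example timestamps, and the exact start of the response window are hypothetical; only the 500 ms reward delay comes from the text.

```python
def lick_probability(stim_times, lick_times, window=(0.0, 0.5)):
    """Fraction of trials with at least one lick inside the response window.

    stim_times : stimulus onset times in seconds, one per trial
    lick_times : timestamps of all detected licks in seconds
    window     : (start, end) relative to stimulus onset; the 0.5 s end
                 matches the reward delay, the 0.0 s start is an assumption
    """
    if not stim_times:
        raise ValueError("need at least one trial")
    start, end = window
    responded = 0
    for t in stim_times:
        # A trial counts as a response if any lick falls in the window.
        if any(t + start <= lick < t + end for lick in lick_times):
            responded += 1
    return responded / len(stim_times)


# Hypothetical session: three CS+ trials, anticipatory licks on two of them.
cs_plus_onsets = [10.0, 25.0, 40.0]
licks = [10.2, 10.4, 26.1, 40.3]
p_lick = lick_probability(cs_plus_onsets, licks)
```

Running the same function separately on CS+ and CS- onsets would give the two values needed to judge discrimination in a session.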

Overall stimulus-evoked activity is unbiased and stable across conditioning

To investigate the effects of reinforcement learning on apical tuft activity, we imaged apical tufts (433 x 433 μm field of view) across conditioning days as well as on an unrewarded pre-conditioning day to measure naïve stimulus responses and an unrewarded post-conditioning day to detect any long-lasting changes in responses (Fig.1B). We virally delivered the gene for Cre-dependent GCaMP6f⁴² in the barrel cortex of Rbp4-Cre mice, which labels a heterogeneous population of pyramidal neurons comprising approximately 50% of layer 5^{28, 43, 44}. By targeting our injections to layer 5B, we predominantly labeled thick-tufted pyramidal neurons (see Methods). Using intrinsic signal imaging, we mapped the location of the C2, D2, and gamma whisker barrel columns and identified an overlapping region in layer 1 with sufficient GCaMP6f expression (Fig.2A). The air puff nozzles were aimed toward the whiskers corresponding to this region. Dendritic activity was longitudinally recorded from the same field-of-view (horizontal location and depth) in layer 1 across all sessions (Supplementary Movie 1).

Overall tuft response to stimuli is unbiased and relatively stable across conditioning.
a, Dendritic activity was recorded in layer 1 (i) in the C1/C2 barrel columns (ii). (i) Two-photon image ∼60 µm deep relative to pia. Dashed yellow lines denote C1 and C2 boundaries from intrinsic imaging. Reconstruction from⁵⁰. (ii) Tangential section through layer 4 showing barrels stained with streptavidin-Alexa 647 and GCaMP6f-expressing apical trunks. Red circles indicate location of 2-photon lesions to mark the imaging region for post-hoc analysis. b, Overlay of five segmented pseudo-colored tufts from imaging field in a(i). c, Time courses of calcium responses of example tufts in b to three air puffs (dashes). d, Amplitudes of CS+ (red) and CS- (blue) responses, computed for each segmented tuft in the first 1.5 s post-stimulus (grey points), do not differ within or across sessions. Colored lines indicate median. e, Same as in d, showing data for all conditioning sessions.

To extract calcium signals from individual cells, we segmented tufts using CaImAn, a sparse non-negative matrix factorization method that clusters pixels according to their temporal correlation⁴⁵ (see Methods), and analyzed regions of interest exhibiting apical tuft structure (Fig.2B; 65 ± 15 tufts per mouse; mean ± SD). Individual segmented tufts were substantial in their spatial extent (>100 µm), reflecting tuft-wide voltage-gated calcium spikes rather than branch-specific N-methyl-D-aspartate (NMDA) receptor-mediated spikes. All calcium analyses hereafter refer to tuft-wide calcium spikes. Average responses to an event include failures. In many tufts, the CS+ and CS- reliably evoked an influx of calcium that robustly activated the tuft (e.g. Fig.2C). Successful calcium events across tufts averaged 28% ΔF/F, consistent with previous studies of layer 5 apical dendrites^{17, 31}. Interestingly, during intermediate but not early learning, the average population response to the CS+ exhibited a two-peak structure (Supp Fig.1, session 4) similar to tuft reward-related signals we observed previously in barrel cortex²⁸. By the last-rewarded and post sessions, the second CS+ peak was no longer visible, which could be an endpoint of mice learning that the conditioned stimulus predicts the upcoming reward.
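
For readers unfamiliar with the ΔF/F unit used above, here is a minimal sketch of how a raw fluorescence trace can be normalized. The baseline definition (mean of the dimmest 20% of samples) and the toy trace are assumptions for illustration; the Methods may define the baseline F0 differently.

```python
def delta_f_over_f(trace, baseline_fraction=0.2):
    """Convert raw fluorescence to dF/F using a low-percentile baseline.

    baseline_fraction is a hypothetical choice: F0 is the mean of the
    dimmest 20% of samples. The paper's Methods may define F0 differently.
    """
    if not trace:
        raise ValueError("empty trace")
    n_base = max(1, int(len(trace) * baseline_fraction))
    f0 = sum(sorted(trace)[:n_base]) / n_base
    if f0 <= 0:
        raise ValueError("baseline fluorescence must be positive")
    return [(f - f0) / f0 for f in trace]


# Hypothetical trace: flat baseline of 100 with one transient peaking at 128.
raw = [100.0] * 8 + [128.0, 114.0]
dff = delta_f_over_f(raw)
peak_percent = 100 * max(dff)  # roughly 28 for this toy trace
```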

Reward can alter somatic receptive fields in the auditory, visual, and somatosensory cortex of both rodents and non-human primates such that rewarded stimulus representations become more robust after learning^{4, 5, 28, 46}, although cortical sensory responses can remain unchanged during learning⁴⁷. We investigated whether calcium responses to the CS+ increased in the tuft population as animals learned its association with reward (Fig.2). Average responses of tufts to the CS+ and CS- were similar during the pre-conditioning session (Fig.2D; p = 0.20, signed rank test, n = 440 pre tufts and 418 post tufts), indicating that there was no inherent bias in the population toward a particular stimulus in naïve animals. Surprisingly, even after learning, responses to the CS+ and CS- were similar on the last- and post-conditioning sessions (p = 0.62, 0.64, respectively, signed rank test), revealing that no bias develops for the CS+ among dendritic tufts. Only a minority of tufts exhibited statistically significant average responses to air puff stimuli (CS+ responsive: 26 ± 8%; CS- responsive: 25 ± 8%; mean ± SD across all sessions). When we excluded responses that were not statistically significant (see Methods), we again found no difference between the average response amplitudes to the CS+ and CS- on the pre, last-rewarded, and post sessions (p = 0.65, 0.31, and 0.69, respectively, rank sum test; data not shown). Similarly, the probability of transients in response to CS+ versus CS- (see Methods) did not differ during pre-conditioning (p = 0.66) or post-conditioning sessions (p = 0.44). Therefore, reinforcement learning in our paradigm does not bias tuft representations toward the rewarded stimulus.

While a bias for the CS+ did not develop after learning, we wondered whether overall tuft responses to both conditioned stimuli increased as animals learned the task. Linear regression analysis revealed that conditioning session number was a poor predictor of both CS+ and CS- amplitudes (All tufts R², CS+: 0.0064, CS-: 0.0035, Fig.2E; Significantly responding tufts R², CS+: 0.014, CS-: 0.014, data not shown). We did find a small but significant decrease in amplitude from pre to last for CS+ (p < 0.01) and CS- (p < 10^-7), but this was not permanent: amplitudes did not significantly differ between the pre and post sessions (Fig.2D; p = 0.53, 0.33, CS+ and CS- respectively, Wilcoxon rank sum test). Taken together, these findings demonstrate that reinforcement learning does not robustly bias the magnitudes of tuft calcium responses to either stimulus at the population level.

Development of tuft selectivity with task learning

While learning produced no bias in overall tuft activity, learning might enhance selectivity for conditioned stimuli. Barrel cortex neurons are tuned to the angle of whisker deflection^48–50, indicating that the sets of synaptic connections activated by the CS+ and CS- may be overlapping but should not be identical. Therefore, the possibility exists that responses to the CS+ and CS- can change independently of each other. To examine this, we compared the amplitude of the average response to CS+ and CS- trials for all segmented tufts on the pre, last-rewarded, and post sessions (Fig.3A; n = 7 mice; 465 pre, 442 last-rewarded, and 430 post tufts). In agreement with our previous analysis, we found no significant bias toward CS+ or CS- during any of the three sessions (Pre: p = 0.20; last-rewarded: p = 0.43; Post: p = 0.64, sign-rank test). Under naïve conditions during the pre session, most tufts that responded to air puff stimuli did not strongly prefer the CS+ or CS- (Fig.3A, left). Surprisingly, on the last-rewarded session and the unrewarded post-conditioning session, we observed a prominent shift in the response distribution, where many tufts exhibited more selective responses to one stimulus or the other (Fig.3A, middle and right).

Reinforcement learning, but not stimulus exposure, enhances tuft selectivity for CS+ and CS- stimuli.
a, Across the indicated sessions, individual tufts (circles) exhibit larger biases to CS+ or CS- (pooled across all conditioned mice). b, Repeated exposure to stimuli does not bias individual tufts to CS+ or CS-. c, Conditioning reshapes distribution of selectivity indices for tufts from Normal on pre-conditioning session to uniform on post-conditioning session. d, Distribution of tuft selectivity indices remains Normal throughout all repeated exposure sessions. e, Selectivity (median SI magnitude of tufts for each session) increases with behavioral performance of 6 animals. f, Neural discriminability (mean ± sem) of tufts, pooled across all animals on each session, increases with conditioning and decreases with repeated exposure.

Plasticity can occur after repeated exposure to stimuli even in the absence of reinforcements^51–55. To test whether enhanced selectivity depended on reinforcement, we imaged a separate group of similarly water-restricted mice that were repeatedly exposed to the same stimuli for the same number of days but without any reward. Repeated exposure mice exhibited a stable distribution of response selectivity over time (Fig.3B; a separate cohort of 7 mice; 317, 313, and 321 tufts on Day 1, Day 8, and Day 9, respectively). These results suggest that reinforcement learning, and not simply repeated stimulus exposure, drives apical tufts to become more selective for either the CS+ or CS-.

To directly quantify the response selectivity of tufts, we computed a selectivity index (SI; see Methods) ranging from −1 (exclusively CS- responsive) to 1 (exclusively CS+ responsive) for each tuft. Initially in both the conditioned and repeated exposure mice, the SI distribution was centered around zero, indicating that most tufts in naïve animals did not strongly prefer either stimulus (Fig.3C,D, left panels). Consistent with our other analyses (Fig.2D), the mean SI remained close to zero for each of the three sessions (−0.049, −0.001, and 0.003 for pre-conditioning, last rewarded, and post-conditioning days, respectively), confirming that learning produced no overall bias toward one particular stimulus among the population. During learning, the SI distribution of conditioned but not repeated exposure mice shifted markedly, whereby a much greater proportion of tufts were highly selective for either the CS+ or CS- (Fig.3C,D, middle and right panels, |SI| pre versus last-rewarded: p < 10^-6, |SI| pre versus post: p < 10^-5; Wilcoxon rank sum test). These effects can even be observed within individual mice, with learning significantly increasing tuft selectivity in individual conditioned mice, but not repeated exposure mice (Supp.Fig.2). The degree of enhancement in tuft selectivity was positively correlated with conditioned animals’ ability to discriminate stimuli across sessions (Fig.3E; Pearson’s R = 0.60, p < 10^-5).
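
The exact SI formula is given in the Methods, which are not reproduced here. As a minimal sketch, a contrast-style index that satisfies the stated range (−1 for exclusively CS- responsive, 1 for exclusively CS+ responsive) could look as follows. The formula, the clipping of negative responses, and the example amplitudes are assumptions, not the authors' definition.

```python
def selectivity_index(resp_plus, resp_minus):
    """Contrast-style index: +1 = CS+ only, -1 = CS- only, 0 = no preference.

    Assumed form (not taken from the paper's Methods):
        SI = (R+ - R-) / (R+ + R-)
    Negative mean responses are clipped to zero so SI stays within [-1, 1].
    """
    r_plus = max(resp_plus, 0.0)
    r_minus = max(resp_minus, 0.0)
    total = r_plus + r_minus
    if total == 0:
        return 0.0  # unresponsive tuft: no preference defined, report 0
    return (r_plus - r_minus) / total


# Hypothetical mean dF/F amplitudes (CS+, CS-) for three tufts.
tufts = [(0.30, 0.10), (0.00, 0.25), (0.20, 0.20)]
si_values = [selectivity_index(p, m) for p, m in tufts]
abs_si = [abs(si) for si in si_values]  # selectivity magnitude |SI|
```

Under this form, a population can keep a mean SI near zero while its |SI| values grow, which is the pattern the text describes.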

Whereas selectivity magnitude (|SI|) only considers the amplitude of tuft responses to CS+ and CS-, their discriminability also depends on their variability. For example, a large difference in CS+ and CS- responses would not be discriminable if the variability of those responses were very high; a small difference might be discriminable if the variability were low. We therefore additionally calculated a d-prime metric of neural discriminability that normalizes differences in response magnitudes to each stimulus by their variability (see Methods). In conditioned animals, neural discriminability of CS+ and CS- responses of tufts increased significantly across learning (Fig.3F, blue; first-rewarded versus last-rewarded: p < 10^-3, pre versus post: p < 10^-4; Wilcoxon rank sum test). By contrast, neural discriminability of tuft responses in the repeated exposure mice decreased slightly with progressive exposure to the stimuli (Fig.3F, gray; Day 1 versus Final: p < 0.01). Taken together, these results show that enhanced stimulus representations can emerge in apical tufts, but require reinforcement.
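
The point about variability can be made concrete with a small sketch. It uses a common form of d-prime (difference of means divided by the pooled standard deviation); whether the Methods use exactly this form is an assumption, and the trial amplitudes below are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance


def d_prime(trials_plus, trials_minus):
    """Absolute difference of mean responses scaled by pooled trial-to-trial SD.

    Assumed form (a common definition, not confirmed against the Methods):
        d' = |mean(R+) - mean(R-)| / sqrt((var(R+) + var(R-)) / 2)
    """
    pooled_sd = sqrt((variance(trials_plus) + variance(trials_minus)) / 2)
    if pooled_sd == 0:
        raise ValueError("responses have no variability; d' is undefined")
    return abs(mean(trials_plus) - mean(trials_minus)) / pooled_sd


# Two hypothetical tufts with the same mean difference (0.1 dF/F)
# but tenfold different trial-to-trial variability.
d_low_var = d_prime([0.30, 0.32, 0.28, 0.30], [0.20, 0.22, 0.18, 0.20])
d_high_var = d_prime([0.10, 0.50, 0.30, 0.30], [0.00, 0.40, 0.20, 0.20])
```

Both toy tufts would have the same |SI|-style amplitude difference, yet the low-variability tuft is far more discriminable.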

The above analyses rely on the accurate measurement of calcium spikes from individual tufts. While two-photon microscopy acquires images with high resolution and speed, the imaging field is restricted to a single focal plane. This method can only measure calcium signals from a thin cross-section of the three-dimensionally complex apical structures. Indeed, many of the spatial components extracted from our two-photon data consisted of dendritic branches that cross the imaging plane at different locations (Supp.Fig.3A), which makes it difficult to determine whether the segmentation software accurately extracted signals from one tuft or erroneously merged multiple tufts. For the same reasons, a single apical tuft could be falsely classified as two different tufts. Such errors could mislead our interpretation of selectivity in the population, especially given that a single apical tuft can exhibit non-homogeneous branch-specific events^{15, 56, 57}.

To confirm that our interpretation was not due to segmentation errors, we repeated the conditioning experiment using a new, high-speed volumetric imaging approach called SCAPE^{36, 37}, which allowed us to monitor calcium across entire apical tufts (Supplementary Movie 2). These three-dimensional datasets (300 × 1050 × 234 μm field of view) encompassed large portions of the apical tree which included branches converging on their bifurcation points in layer 2, enabling us to identify whole apical trees unambiguously (Fig.4A,B; Supp.Fig.3B). CaImAn effectively demixed overlapping trees in these three-dimensional volumes. Using SCAPE microscopy, we imaged tuft activity of additional mice conditioned with the same behavioral paradigm (Fig.4C). Comparison of tuft responses to the CS+ and CS- on the pre, last-rewarded, and post sessions (Fig.4D; 241 pre, 215 last-rewarded, 150 post tufts in 2 mice) revealed again that task learning induced significant increases in tuft selectivity (Fig.4E; pre versus last-rewarded: p < 10^-5, pre versus post: p < 10^-4, Wilcoxon rank sum test of |SI|). On average, the SI magnitudes were similar between tufts imaged using 2-photon microscopy and SCAPE (mean ± s.e.m. |SI| for 2-photon versus SCAPE; pre: 0.41±0.01 versus 0.40±0.02; last-rewarded: 0.54±0.02 versus 0.54±0.02; post: 0.51±0.02 versus 0.53±0.03). These data demonstrate that the effects in our two-photon dataset are not caused by errors in segmentation, but rather reflect changes at the level of individual dendritic tufts. Our results, based on two different imaging approaches, clearly demonstrate that reinforcement increases stimulus selectivity at the level of the entire apical tuft.

High-speed volumetric imaging of apical tufts confirms the emergence of enhanced selectivity after learning.
a, Top and side view of four example tufts segmented from volumetric SCAPE imaging. b, Time courses of calcium activity from example tufts in a during five presentations of air puff stimuli (dashes). c, Performance across all conditioning sessions of two mice that were imaged with SCAPE. d, Across the indicated sessions, individual SCAPE-imaged tufts (circles) exhibit larger biases to CS+ or CS-. e, Conditioning reshapes selectivity distribution from Normal to uniform.

Selective tufts emerge from both initially unresponsive and responsive populations

The striking effect of reinforcement learning on tuft response selectivity could develop in several ways. For example, initially unresponsive tufts could develop a robust response to either stimulus after learning (e.g., Fig.5A, top). Conceivably, tufts that were initially unselective in naïve animals could also maintain their response to one stimulus while losing their response to the other (e.g., Fig.5A, middle). Either or both scenarios could lead to the increase in neurons that are selective for stimulus direction. To investigate which changes in individual tufts underlie population-wide improvements in stimulus selectivity, we longitudinally tracked the same set of tufts across all sessions and compared their selectivity in pre- and post-conditioning sessions for both conditioned and repeated exposure mice.

Longitudinal tracking reveals that reward enhances the selectivity of both initially unresponsive and responsive tufts.
a, Three example tufts that were longitudinally tracked across learning. Top row: An initially unresponsive tuft develops a robust response to the CS+ but not the CS- after learning. Middle row: A responsive but unselective tuft loses its robust CS+ response and becomes selective for the CS-. Bottom row: A CS- selective neuron becomes unresponsive to both stimuli. b, Tufts that were unresponsive during the first session were longitudinally tracked to the last session. Plotted is the mean proportion of selective and unselective neurons across all animals in the conditioned (black bars) and repeated exposure (grey bars) groups. c,d, Same analysis as b for initially selective (c) and unselective (d) tufts. Two-sample t-test was used for comparisons between conditioned and repeated exposure groups. Paired t-test was used for comparisons within a group. * p < 0.05. e, Total tuft counts from first to last session within the 3 response categories for either conditioned (left) or repeated exposure (right) groups. f, SI of responsive tufts on the last session that were initially unresponsive during the first session. Conditioned tufts have enhanced selectivity compared to repeated exposure. g, Tufts that were selective on the last session are more selective if conditioned (black) rather than undergoing repeated exposure (grey). h, Tufts that responded on both pre and post sessions tend to have higher selectivity if conditioned rather than undergoing repeated exposure. i, SI of responsive tufts on the first session that later became unresponsive during the last session.

First, we categorized tufts that were unresponsive to either stimulus on the first imaging session, which accounted for the large majority of tufts (conditioned: 458/603; repeated exposure: 334/457), and compared their response to the CS+ and CS- on the last session to determine if they became selective (Fig.5B, see Methods). Stimulus-unresponsive tufts, while on average less active than responsive ones (median calcium events per minute: 2.65 versus 3.66 for stimulus-unresponsive and responsive tufts, respectively; p < 10^-40, Wilcoxon rank sum test; Supplementary Fig.4), were not silent, with many undergoing tuft-wide calcium influx several times per minute. Silent tufts that are never active during the session may not have been detected in our imaging, but we were able to detect tufts that discharged as few as 3 voltage-gated calcium spikes over a 30-minute behavioral session. Interestingly, in both the conditioned and repeated exposure mice, approximately 40% of initially unresponsive tufts developed a response to at least one stimulus by the last session, becoming either selective or unselective (Fig.5B). However, in conditioned animals, the proportion of initially unresponsive tufts that became selective was significantly larger than in repeated exposure mice (p = 0.04, 2-sample t-test comparing mice). Furthermore, while the proportion of selective and unselective tufts in this category was similar for conditioned animals, unselective tufts were more common in repeated exposure mice (p = 0.03, paired t-test).

Next, we analyzed tufts that were initially responsive and either selective (Fig.5C; conditioned: 56/603, repeated exposure: 43/457) or unselective (Fig.5D; conditioned: 89/603, repeated exposure: 80/457). In these smaller categories, we found no significant differences in the outcome of selectivity between the two groups of animals. Together, these results indicate that, while both stimulus exposure and reinforcement can alter tuft tuning, the presence of reward increases the likelihood that initially unresponsive tufts develop selectivity for either the CS+ or CS- (summarized in Fig.5E).

While a greater proportion of tufts from the conditioned animals were selective during the final session (20.2% versus 10.3% of tufts from conditioned and repeated exposure mice, respectively), we wondered whether conditioning also impacted the degree of selectivity. Note that some tufts had very small yet statistically different CS+ and CS- response amplitudes and were thus classified as selective despite a small SI. First, we compared the SI of initially unresponsive tufts on the final imaging session (Fig. 5F). Supporting our results in Fig. 5B, the SI distribution was shifted toward the tails in conditioned, but not repeated exposure mice, indicating that reward enhances selectivity for either the CS+ or CS- in this subset (|SI| conditioned versus repeated exposure: p < 10^-5, Wilcoxon rank sum test, n = 199 and 110 tufts, respectively).

Next, we compared the |SI| of all tufts that were categorized as selective during the last imaging session in conditioned and repeated exposure mice (Fig. 5G). Interestingly, we found that even among selective tufts, the |SI| distribution in conditioned mice was significantly greater than in repeated exposure mice (p = 0.006, Wilcoxon rank sum test, n = 122 and 47 tufts, respectively), indicating that while selective tufts are present after both conditioning and repeated stimulus exposure, the magnitude of selectivity is stronger after conditioning.

We then quantified the change in |SI| of all tufts that were responsive in both the first and last sessions by computing the difference between the two sessions (Fig. 5H). Tufts in conditioned mice exhibited a greater increase in |SI| across sessions compared to repeated exposure mice (p = 0.01, Wilcoxon rank sum test, n = 48 and 42 tufts, respectively), demonstrating that the magnitude of selectivity in initially responsive tufts increases after reinforcement learning.

Finally, we found that the degree of selectivity of tufts that eventually became unresponsive on the last session was overall similar between the two groups (Fig.5I, |SI| conditioned versus repeated exposure: p = 0.06, Wilcoxon rank sum test, n = 97 and 81 tufts, respectively). However, tufts that became unresponsive were more likely to be initially highly selective in the conditioned group than in the repeated exposure group (19 tufts with initial |SI| > 0.75 / 97 tufts ending as unresponsive in the conditioned group versus 3/81 in the repeated exposure group; p = 0.0013, Z approximation to binomial). Therefore, learning can involve a loss of responsivity in a small subset of well-tuned tufts.
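
The proportion comparison above (19/97 versus 3/81) can be checked with a short worked example. This sketch assumes a pooled, two-sided two-proportion z-test, which is one standard reading of "Z approximation to binomial"; the authors' exact procedure may differ.

```python
from math import erfc, sqrt


def two_proportion_z(k1, n1, k2, n2):
    """Pooled two-proportion z-test (normal approximation to the binomial).

    Returns the z statistic and the two-sided p-value.
    """
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of a standard normal variable.
    p_two_sided = erfc(abs(z) / sqrt(2))
    return z, p_two_sided


# Counts from the text: highly selective tufts (initial |SI| > 0.75)
# among tufts that ended unresponsive, conditioned vs repeated exposure.
z_stat, p_value = two_proportion_z(19, 97, 3, 81)
```

Under these assumptions the p-value comes out on the order of 10^-3, in line with the reported value.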

In summary, our longitudinal analyses revealed that reinforcement learning biases initially unresponsive tufts toward becoming selective and enhances the selectivity of tufts that are initially responsive.

Neither movement nor behavioral choice accounts for enhanced selectivity

Several plausible factors could underlie the changes in selectivity we observed across learning. For instance, movements like whisking are correlated with layer 5 somatic action potentials^58–60 and might have impacted calcium activity in the apical tuft. To investigate whether whisking could account for the changes in tuft selectivity, we imaged the whiskers with a high-speed camera and computed whisking amplitude (see Methods) while mice underwent conditioning and two-photon imaging (Fig.6A). First, we considered whether animals changed their whisker movements in response to conditioned stimuli over the course of learning. We computed the peak of the mean stimulus-aligned whisking amplitude for the CS+ and CS- (Fig.1D, left; Fig.6B) for each session in five mice. Although conditioning alters licking behavior (Fig.1C,E), the magnitudes of whisker movements following both stimuli were stable across sessions (Fig.6B; CS+: p = 0.44; CS-: p = 0.45; linear regression). We also computed the standard deviation (SD) of stimulus-evoked whisker amplitude across trials for all sessions (Fig. 6C). While the whisking amplitude became slightly more reliable (decreased SD) across sessions (p < 10^-4), the change in reliability across sessions was similar for CS+ and CS- (p = 0.53). Therefore, whisking is similar on both trial types throughout learning.

Whisking is only weakly correlated with tuft activity and cannot account for changes in selectivity during learning.
a, Whisking amplitude aligned to calcium activity of three example tufts in one session. Green shading indicates periods of whisking. Red and navy ticks indicate CS+ or CS- delivery, respectively. b, Mean whisking response of five mice to CS+ (red) and CS- (navy) does not change across sessions during learning (mean ± s.e.m.). c, Mean standard deviation of whisking decreases for both CS+ and CS- across learning, but CS+ and CS- do not differ. d, Event-triggered averages of 322 tufts on the post-conditioning day (grey traces - individual tufts, black inset - population average) are responsive to stimuli but relatively unmodulated by whisking. e, R² values for linear models predicting calcium from stimuli (y axis) are consistently greater than those predicting calcium from whisking (x axis). Each circle represents a tuft. (n = 322 tufts) f, Magnitude of tuft selectivity does not correlate with mean whisking amplitude during CS+ (left) and CS- trials (right) on that session.

We next examined whether whisking was correlated with tuft calcium activity by comparing stimulus-triggered averages and intertrial interval (ITI) whisk-triggered averages of all tufts during post-conditioning. Whisking amplitude was similar between spontaneous ITI whisking bouts and evoked whisking responses to stimuli (n = 115 and 617 events, respectively; p = 0.53, Wilcoxon rank sum test). In contrast to air puff stimuli, ITI whisking bouts were not associated with a robust calcium response (Fig.6D).

To quantify the relationship of whisking and sensory stimuli to tuft calcium spikes, we performed a linear regression analysis (see Methods) on 322 tufts using calcium influx as the response variable and either stimulus or whisking amplitude as a single predictor variable (Fig.6E). Air puff stimuli more reliably predicted calcium influx than whisking amplitude for each of virtually all tufts (p < 10^-12, sign rank test). These results are consistent with other studies that found either only weak or no correlation between whisking and L5 tuft calcium spikes in S1^{28, 31, 32}.
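The single-predictor comparison described above can be sketched as follows. This is an illustrative example only: the traces are hypothetical toy values rather than data from the study, and the variable names and per-time-bin layout are assumptions, not the analysis pipeline described in Methods.

```python
def r_squared(x, y):
    """R^2 of an ordinary least-squares fit of y on a single predictor x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # For one predictor plus an intercept, R^2 equals the squared correlation.
    return (sxy ** 2) / (sxx * syy)

# Hypothetical toy traces (not data from the study), one value per time bin.
stimulus = [0, 1, 0, 0, 1, 0, 1, 0]                    # 1 when an air puff occurred
whisking = [0.2, 0.5, 0.6, 0.1, 0.4, 0.7, 0.3, 0.2]    # whisking amplitude
calcium = [0.1, 0.9, 0.2, 0.0, 1.1, 0.1, 0.8, 0.1]     # tuft calcium signal

r2_stim = r_squared(stimulus, calcium)
r2_whisk = r_squared(whisking, calcium)
print(f"R^2 stimulus = {r2_stim:.2f}, R^2 whisking = {r2_whisk:.2f}")
```

Computing one such pair of R² values per tuft yields the kind of paired comparison that a sign rank test across tufts can then evaluate.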

Furthermore, we found no relationship between the whisking response and the median SI magnitude on a given session (Fig.6F, whisking to CS+ p = 0.22, CS- p = 0.78). Therefore, changes in whisker movement cannot account for the changes in selectivity during learning that we observed.

Finally, the possibility remains that other task-related signals relaying information about reward expectation and behavioral choice could impact apical tuft activity and drive increases in selectivity. To test this, we compared tuft responses to the CS- in false alarm trials (FA; mouse incorrectly licked for reward) and correct rejection trials (CR; mouse correctly withheld licks) to determine if their activity was modulated by behavioral choice. Notice that these two trial types have the same sensory input but involve different choices. (The corresponding analysis for CS+ trials is not technically possible for lack of sufficient Miss trials after the first conditioning day, an issue also observed in ref ¹.) Tufts were classified as behaviorally modulated if the FA response was significantly different from the CR response, and were not behaviorally modulated if CR and FA responses were statistically indistinguishable (e.g. Fig.7A). Behaviorally modulated tufts accounted for only ∼10% of the total tuft population in both early and late learning (50/395 in early; 35/406 in late learning).

Behavioral responses do not account for enhancement of stimulus selectivity during learning.
a, Mean stimulus responses of four tufts during hit (red), CR (cyan), and FA (black) trials. Top row: Example tufts whose responses are not behaviorally modulated (CR is similar to FA). Bottom row: Example tufts with behaviorally modulated responses (CR and FA differ). b, Selectivity index (SI) distribution changes from early (left) to late learning sessions (right) even when tufts with behaviorally modulated responses (CR≠FA) are excluded. c, Median SI magnitude of tufts in each of six animals (from panel b) increases from early to late learning sessions.

To test whether these behaviorally modulated tufts contributed to increased selectivity during learning, we excluded them and compared selectivity of the remaining behaviorally-insensitive tufts. We found that selectivity increased significantly from early to late learning (Fig.7B,C; median |SI| of 345 tufts early versus 371 tufts late learning: 0.38 versus 0.47, p = 0.02, Wilcoxon rank sum test), similar to our previous analysis of the entire population. Licking, like whisking, was a relatively poor predictor of tuft calcium influx (Supp.Fig.5A,B). Because some behaviorally modulated tufts may not have been statistically detectable, we used multivariate linear regression to disentangle stimulus responses from licking and whisking, which may have been confounded with choice. Median coefficients for licking and whisking were on average 3.3 times smaller than median stimulus coefficients for the first rewarded, last rewarded, and post sessions (all p < 10^-6, Wilcoxon rank sum test). Even after we factored out possible effects of movements, CS+ and CS- coefficients were enhanced by learning but not repeated exposure (Supp.Fig.5C,D), consistent with our other analyses. Together, these results demonstrate that enhanced selectivity during learning cannot be explained by non-sensory signals related to the animals’ behavior.
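The logic of the multivariate regression can be illustrated with a small sketch. It assumes an ordinary least-squares model with an intercept and three predictors (stimulus, licking, whisking) solved through the normal equations; the data are hypothetical and noise-free, and the helper names are illustrative rather than taken from the analysis code.

```python
import random

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ols(predictors, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via normal equations."""
    rows = [[1.0] + list(p) for p in predictors]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical, noise-free toy data (not from the study): calcium depends
# strongly on the stimulus and only weakly on licking and whisking.
rng = random.Random(0)
true_coefs = [0.05, 1.0, 0.3, 0.2]  # intercept, stimulus, licking, whisking
predictors = [(rng.randint(0, 1), rng.randint(0, 1), rng.random()) for _ in range(200)]
calcium = [true_coefs[0] + sum(b * x for b, x in zip(true_coefs[1:], p)) for p in predictors]

intercept, b_stim, b_lick, b_whisk = ols(predictors, calcium)
print(f"stimulus {b_stim:.2f}, licking {b_lick:.2f}, whisking {b_whisk:.2f}")
```

Fitting all predictors jointly assigns each one the variance it explains beyond the others, which is why the stimulus coefficients can be compared across sessions after movement-related contributions are factored out.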

Enhanced selectivity in barrel cortex is long-lasting when mice exclusively use whiskers

Mice could conceivably exploit other sensory cues to learn and perform the task, such as auditory cues from the air nozzles or non-whisker tactile cues from air current eddies contacting the fur or skin. To determine which mice exclusively used their whiskers to distinguish the CS+ and CS-, we trimmed all whiskers after the post-conditioning session and assessed performance in five mice. Performance in each of the five mice decreased after whisker trimming, indicating that each used some whisker information. Three mice performed the task exclusively with their whiskers, falling to chance levels after the whisker trim (“whiskers only”). Two other mice still performed the task above chance after the whisker trim, indicating that they were not exclusively using their whiskers and exploited information from multiple sensory streams (“whiskers + other senses”).

We examined whether these two different behavioral strategies impacted tuft selectivity. Both the “whiskers only” and “whiskers + other senses” groups exhibited enhanced tuft selectivity in the last-rewarded session relative to pre-conditioning. This effect was more pronounced in the “whiskers only” mice (Fig.8A,B, left and middle; whiskers only: median |SI| of 180 pre tufts versus 169 last-rewarded tufts: 0.36 versus 0.59, p < 10^-3; “whiskers + other senses”: median |SI| of 144 pre tufts versus 155 last-rewarded tufts: 0.39 versus 0.50, p = 0.01). Surprisingly, enhanced selectivity persisted during the post-conditioning session for the “whiskers only” group but not the “whiskers + other senses” group (Fig.8A,B right panels; whiskers only: median |SI| of pre versus 167 tufts post: 0.36 versus 0.58; p < 10^-3; whiskers + other senses: median |SI| of 155 pre versus post tufts: 0.39 versus 0.42; p = 0.45). Therefore, tuft selectivity in barrel cortex is enhanced regardless of behavioral strategy, but outlasts conditioning only when mice rely solely on their whiskers to perform the task.

Apical tufts in barrel cortex of mice performing the task exclusively with their whiskers undergo long-lasting changes in selectivity.
a, SI histograms of mice performing the task exclusively with their whiskers exhibit increased selectivity across pre-conditioning, last-rewarded, and post-conditioning sessions. b, Relative to pre-conditioning, mice using their whiskers and other sensory cues to perform the task have increased selectivity during the last rewarded session, but not the post-conditioning session. c, The probability of anticipatory licks in response to the CS+ extinguishes across post-conditioning blocks (of 20 trials each). d, Tuft selectivity remains uniformly distributed during post-conditioning trial blocks 1-2 (top) while licking is extinguishing, and blocks 3-4 (bottom) in which licking is extinguished.

We further examined this persistence of enhanced tuft selectivity as experienced mice stopped performing the task. While the entire post-conditioning session was unrewarded, mice initially expected rewards and licked for many CS+ trials in the first half of the session. By the second half of the session, the probability of a lick occurring during the CS+ extinguished, approaching zero (Fig.8C). We compared the selectivity of tufts during the first and second halves of the post-conditioning sessions of mice that exclusively used their whiskers and found no difference in the two distributions (Fig.8D, p = 0.94, Wilcoxon rank sum test of |SI|), demonstrating that selectivity of the population remained stable throughout the session. Taken together, these results demonstrate that enhanced stimulus selectivity of apical tuft dendrites after reinforcement learning is long lasting, persisting even after mice cease performing the task and expecting reward.

Discussion

Our study is the first to investigate how learning a discrimination task alters apical tuft activity. Using both novel volumetric whole-tuft imaging and conventional planar microscopy, we discovered that L5 apical tufts acquire enhanced representations of multiple stimuli during learning. Rather than simply retuning tufts toward the rewarded stimulus, learning enhanced selectivity for both stimuli, suggesting that tufts are aligning themselves to the behaviorally relevant stimulus dimensions. These enhanced sensory representations persist even after mice cease performing the task. In contrast, representations are slightly degraded by mere repeated exposure to stimuli outside of a task. Consistent with previous studies^{28, 31}, we found that movement in and of itself has little direct impact on tuft spikes, indicating that increased selectivity of apicals reflects alterations in sensory coding as animals learn. This sensitization of tufts to behaviorally relevant sensory dimensions may be a general feature of all sensory cortical areas.

Tuft spikes enhance plasticity of synaptic inputs that occur over behavioral (seconds-long) timescales^{18, 34}. These new behaviorally relevant tuft representations may therefore prime subsequent plasticity of synapses across the entire pyramidal neuron. Additionally, tuft events potently modulate somatic burst firing and enhance how somata respond to their basal inputs^{15, 61}. As learning and plasticity increase apical selectivity for a behaviorally relevant axis, tuft events will unavoidably amplify somatic burst output along the same axis. This could enable action potential output of L5 cells in primary sensory cortex to directly drive behavioral responses via projections to movement related areas, such as the corticostriatal, corticopontine, and corticotrigeminal pathways. Thus, tuft spikes have the potential to modify somatic output, both in the present and in the future.

Enhanced Representation of Behaviorally Relevant Stimuli

Enhancing the representation of relevant stimulus dimensions rather than a singularly important stimulus, such as a rewarded event, has multiple benefits for behavior. In our paradigm, both the CS+ and CS- are predictive of whether or not a reward will occur in the future. Explicitly encoding both stimuli could allow sensory cortical areas to directly elicit actions. In the context of this task, CS+ preferring tufts in barrel cortex may trigger anticipatory licking while CS- preferring tufts could suppress licking. L5 cells in sensory cortex via their output to striatum, pons, brain stem, and spinal cord would thereby be able to directly and rapidly drive action without further cortical processing, such as by frontal areas including motor cortex^{32, 62}. Such rapid sensory-motor transformations by primary sensory areas may be critical for natural time-constrained behavior.

Furthermore, learning produced a representation in which the degree of selectivity for the two stimuli was continuous and uniformly distributed. Exclusively CS+ or CS- selective apicals never dominated the population. Continuous degrees of selectivity across the population, rather than discrete representations, may allow the system to be more robust to the variability caused by active movements that alter sensory input. A continuous distribution may also facilitate future adjustments of neural representations as subjects continue to learn a task or encounter new tasks. The uniformity we observed may reflect that neurons are high-dimensional, being sensitive to mixtures of variables^{60, 63–65}, only one of which might be altered here by learning. The uniform distribution of selectivity corresponds to a full range of pessimism to optimism concerning stimulus predictions of upcoming rewards. Recent work shows that behavioral performance benefits from reinforcement learning that incorporates the distribution of reward probabilities rather than just the average expected reward value⁶⁶. L5 corticostriatal synapses could theoretically afford a plastic substrate for acquiring the necessary distribution of reward probabilities.

Surprisingly, past studies in which mice were trained to associate one or more stimuli with a reward typically show that cortical representations are stronger for the rewarded stimulus^{1, 3, 5}. In contrast to these studies of layer 2/3 somatic activity, our experiments revealed that the overall tuft calcium response to the CS+ and CS- at the population level did not change significantly after animals learned the task (Fig.2). Instead, representations for both stimuli were enhanced by individual tufts developing selectivity for either the CS+ or the CS- (Fig.3). This divergence in phenomena may result from several important differences between our work and the aforementioned studies.

First, enhanced selectivity for both rewarded and unrewarded stimuli could be a phenomenon that is unique to the apical dendritic tufts. In addition to local inputs, the apical tufts of pyramidal cells in S1 receive long-range top-down input from several sources, including motor cortex^{31, 67}, secondary somatosensory cortex¹¹, and secondary thalamus^{9, 10, 68}. Frontal areas, such as prefrontal cortex, indeed have enhanced representations of the CS+ and CS- after learning⁴⁷. In contrast, input to the somata is dominated by the local cortical area and primary thalamus^{69, 70}. While somato-dendritic coupling can be strong in L5 neurons²⁵, it is asymmetric; at least 40% of somatic transients attenuate in a distance-dependent manner along the apical trunk and distal tufts²⁴. The non-overlapping anatomical inputs and asymmetric coupling together could produce different learning-related effects on apical tuft and somatic stimulus representations.

Second, learning-related changes may manifest differently in layer 2/3, the usual focus of previous studies^{1, 3}, and layer 5 pyramidal cells, the tufts of which we studied. With the exception of a small population of corticostriatal cells, most excitatory cells in layer 2/3 project to other cortical areas to affect further cortical processing^{71, 72}. In contrast, many L5 cells project to subcortical structures including the thalamus, superior colliculus, and brainstem, which may directly trigger behavioral responses^{73–75}. In discrimination paradigms, both stimuli are relevant to behavior. In our task, the CS+ prompted licking to obtain a reward, and the CS- suppressed licking that would have no benefit. Thus, an enhanced representation of both stimuli in layer 5 would be advantageous for animals to perform the task efficiently. Recently, it was shown that apical dendrite activation of subcortical-targeting pyramidal tract L5 cells, but not intratelencephalic L5 cells that are more like L2/3 cells in their connectivity, determines the detection of tactile stimuli³². The Rbp4-Cre mouse line we used in this study labels a heterogeneous population of layer 5 pyramidal cells, comprising both pyramidal tract and intratelencephalic neurons. In the future, it would be interesting to examine whether learning has different effects on the sensory representations of these two populations. Moreover, direct comparisons of the layers would be particularly informative.

Candidate Plasticity Mechanisms

Enhanced selectivity could be due to changes in local synaptic connectivity, long-range inputs, or both. Learning may strengthen and weaken synapses onto barrel cortex neurons from ascending thalamocortical input or from neighboring cells. Such local plasticity could enhance CS+ or CS- responsiveness. Alternatively or additionally, other cortical regions encoding task context could via long-range inputs reconfigure barrel cortex to respond more strongly to these stimuli. The present results do not completely distinguish between these two scenarios because long-range inputs may still encode the context while the mouse is in the behavioral apparatus. However, we found that enhanced representations persist after mice are no longer engaged in the task and receiving rewards. This result suggests that enhanced representations may be a product of local plasticity in sensory cortex that alters receptive fields.

Even in the absence of reward, repeated exposure to stimuli can drive plasticity in sensory cortex and alter response tuning. For instance, repeated exposure to oriented gratings can alter the orientation tuning of cells in primary visual cortex^{51–53}, and overstimulation of whiskers induces plasticity at dendritic spines and alters whisker representations in somatosensory cortex^{54, 76}. Our results demonstrate that at the population level enhanced representations developed only when stimuli were behaviorally relevant. Our longitudinal analysis revealed that while the response dynamics of some tufts changed after repeated stimulus presentations, overall selectivity of the population did not increase when rewards were omitted (Figs.3&5). This raises the question: What are the mechanisms that drive enhanced selectivity under rewarded conditions? In one possible scenario, reward delivery causes the release of neuromodulators that augment the activity of apical tufts. Cortical layer 1 is innervated by cholinergic afferents from the nucleus basalis⁷⁷ and adrenergic afferents from the locus coeruleus⁷⁸, the main sources of acetylcholine and norepinephrine, respectively. Salient events such as reward and arousal lead to the release of these neuromodulators in cortex^{79, 80}, which could increase the excitability of apical dendrites by recruiting disinhibitory circuits or directly influencing dendritic currents^{26, 27, 79, 81}. In this model, the release of reward-driven neuromodulators promotes plasticity and an enhanced representation of temporally aligned sensory inputs. This phenomenon was demonstrated in auditory cortex, where tones paired with stimulation of the nucleus basalis shifted the tuning of neurons toward the frequency of the paired stimulus⁸².

Why are representations of the CS- equally enhanced when there is no associated reward? One explanation is that, as mice learn that the CS- indicates absence of reward, the CS- effectively signals punishment and acquires negative value. Acetylcholine is released in response to aversive stimuli, and can activate disinhibitory microcircuits that reduce inhibition onto pyramidal cells and may be essential for learning^{83, 84}. Thus, it is possible that both the CS+ and CS- representations are enhanced by neuromodulatory mechanisms tied to reward and punishment, respectively. An open question is whether the outcome is due to reinforcement learning or the behavioral state brought on by the reinforcers rather than their valence. Sensory cortical plasticity may not be tied to reinforcer valence. Our paradigm creates an environment where mice benefit from being attentive and engaged in order to maximize reward while minimizing effort. Previous work has shown that active engagement in a visual discrimination task was associated with significantly higher selectivity in layer 2/3 cells in visual cortex¹. Task engagement may lead to a sustained increase in neuromodulator release throughout the conditioning session, priming the apical dendrites for plasticity and the development of selective responses for task-relevant stimuli as they learn.

What determines whether a particular tuft eventually becomes selective for the CS+ or CS-? Our longitudinal analysis revealed that many tufts that were initially unresponsive to either stimulus developed a highly selective response to either the CS+ or the CS- (Fig.5). In these tufts, stimulus preference after learning might be seeded by initially weak, directionally selective inputs onto the neuron that already exist prior to conditioning and that are potentiated by the learning process. We also found tufts that initially exhibited robust responses to both stimuli and either lost or significantly reduced their response to one stimulus after learning. The reduction of an apical response to a particular stimulus could be driven by local disynaptic inhibition between L5 pyramidal cells mediated by the apical-targeting Martinotti cells^{85–87}. Through this mechanism, L5 neurons that are selective for a particular stimulus could inhibit responses to that stimulus in neighboring L5 apical tufts. Experiments that assess the tuning of excitatory and inhibitory inputs onto apical dendrites as a function of learning could test such mechanisms.

In addition to demonstrating increased tuft selectivity with learning, we replicated a surprising phenomenon previously reported in an instrumental behavior, in which a population of apical tufts exhibits activity around the time of reward²⁸. This reward-related activity was observed only during CS+ trials and was most prominent during intermediate conditioning sessions, when most animals were still performing at chance levels, and disappeared completely by the final conditioning session (Supp.Fig.1). Other than this transient effect, unconditioned stimuli did not appear to elicit calcium responses, consistent with our previous findings²⁸. The disappearance of this reward-related peak might be attributable to the reward becoming predictable in later stages of learning. In previous classical conditioning experiments, dopaminergic cells exhibit responses to rewards early in learning due to the novelty of an unexpected stimulus. These responses are lost after extended training, as animals learn the association between the CS and reward^{88, 89}. While dopaminergic terminals are sparse in primary sensory areas, they are not entirely absent, nor are dopaminergic receptors. Furthermore, the excitability of the apical tuft is sensitive to noradrenaline²⁶. Interestingly, noradrenergic neurons in the locus coeruleus exhibit a similar phenomenon to dopaminergic neurons, where responses shift from temporal alignment with the reward to a predictive conditioned stimulus after learning⁹⁰. Such mechanisms could explain why reward-related activity is restricted to early-to-intermediate learning in our paradigm.

Global versus local dendritic spikes

Apical dendrites exhibit not only global spikes that elicit calcium influx across the entire tuft, which we exclusively analyzed here, but also local events known as NMDA spikes, which typically engage short (<30-μm) segments of individual dendritic branches^{15, 31, 57}. These local, NMDA receptor-dependent events can promote prolonged plasticity within individual dendritic branches in the absence of backpropagating action potentials, a feature that is unique to the apical dendrites¹⁶. In motor cortex, branch-specific NMDA spikes are crucial for establishing the long-lasting plasticity necessary for learning⁵⁶, and depolarization provided by multiple local NMDA spikes is thought to be essential for the generation of a global calcium spike triggered by distal synaptic inputs¹⁵. We focused this study on global tuft-wide calcium events, rather than local events. Local events are more difficult to unambiguously identify in planar imaging⁹¹, and their existence in vivo is still an open question for L5 apicals in barrel cortex^{31, 57}. Nonetheless, they may play important roles in plasticity processes that eventually lead to the emergence of global tuft spike selectivity for stimuli. Volumetric microscopy studies, the feasibility of which we showed here, are needed to further investigate the existence of local events in such behaviors as well as examine possible relationships between local and global tuft events during reinforcement learning. However, it would be essential to verify that seemingly spatially overlapping local and global events derive from the same dendritic tree, which requires greater resolution than was practical for the present study.

To analyze activity of individual tufts, we segmented these structures based on spatiotemporal covariance⁴⁵. This method does not rule out errors in which one tuft is split erroneously into two trees, or in which two highly correlated tufts are merged. With this in mind, we used volumetric SCAPE microscopy, which allowed us to visualize the apicals in three dimensions and unambiguously screen for such artifacts. The results from SCAPE are quantitatively similar to those from two-photon microscopy, and confirm that our observation of enhanced selectivity with learning is not an artifact of planar imaging.

Stability of learned tuft representations

In contrast to previous studies of discrimination learning^{1–3}, we included an unrewarded post-conditioning session to examine whether learning-related effects persisted through extinction. Our results show that post-conditioning selectivity of the apical population remains significantly higher than pre-conditioning, even after animals stop licking in response to the CS+ (Fig.8). Interestingly, the effects of learning are much more pronounced in animals that relied exclusively on their whiskers to perform the task. In animals that apparently used other sensory modalities, we observed a modest increase from the pre to last-rewarded session, which seemed to be largely absent by the post-conditioning session. Considering that these animals were additionally exploiting other sensory areas to perform, selectivity may have been more widely distributed and thus diluted in barrel cortex, diminishing the effect and its stability. How long selectivity persists in the neuronal population after conditioning and which factors influence stability are interesting open questions for future study.

Conclusion

In summary, we have shown for the first time that reinforcement learning enhances representations along behaviorally relevant dimensions in apical tufts. Our results suggest that dendritic calcium spikes are an important cellular mechanism underlying the changes in sensory encoding that occur with learning, and provide an avenue for further investigation of cellular and circuit mechanisms underlying plasticity induced by perceptual experience and reinforcement. This cellular compartment may be key to understanding pathology in some cognitive, memory, and learning disorders.

Supporting information

Supplementary Movie 1. Example two-photon microscopy movie during behavioral session. Playback speed is in real time. “CS+” and “CS-” denote times of stimulus onset. 433 x 433 μm field of view.

Supplementary Movie 2. Example SCAPE microscopy movie during behavioral session. Top, maximum intensity projection (MIP) across the dorsal-ventral dimension showing horizontal extent of dendritic activity. Bottom, MIP across the medial-lateral dimension showing vertical extent of dendritic activity. Playback speed is in real time. “CS+” and “CS-” denote times of stimulus onset. 300 × 1050 × 234 μm field of view.

Acknowledgements

We thank Venkatakaushik Voleti for help with the design, construction, and alignment of the SCAPE microscope; Dan Kato, Georgia Pierce, and Jung Park for help with pilot experiments; Eftychios Pnevmatikakis and Johannes Friedrich for advice on dendrite segmentation; and Larry Abbott, Stefano Fusi, Ashok Litwin-Kumar, Chris Rodgers, Georgia Pierce, Gordon Petty, and Dan Kato for comments on the manuscript. Funding was provided by a Wellcome Trust Discovery Award, an Academy of Medical Sciences Professorship, NIH/NINDS R01 NS069679, and NIH/NINDS R01 NS094659 (RMB); a Kavli Institute for Brain Science Postdoctoral Fellowship (SEB); NIH/NINDS/NIMH/BRAIN U01 NS094296, UF1 NS108213, U19 NS104649, and RF1 MH114276 (EMCH).

Author contributions

SEB and RMB conceived of the behavioral and two-photon imaging experiments. EMCH and RMB conceived of the SCAPE imaging experiments. SEB built the behavioral apparatus; EMCH, KBP, and CC designed, built, and maintained the SCAPE microscope; and RMB built and maintained the two-photon and intrinsic signal microscopes. SEB performed the experiments and analyzed the data with input from RMB and EMCH. SEB and RMB wrote the manuscript.

Data availability

Due to the large volume of data (∼80TB), data are maintained by the authors and available upon request.

Methods

All experiments complied with the NIH Guide for the Care and Use of Laboratory Animals and were approved by the Institutional Animal Care and Use Committee of Columbia University. Sixteen C57BL/6 mice ranging in age from 77 to 316 days old (mean of 123 days) were used in these experiments. Six were male, and 10 female. Our results were observed in both male and female individuals, and no sex difference was detected.

Surgery

Animals were administered dexamethasone (1 mg/kg) via intramuscular injection 1-4 hours prior to surgery to reduce edema. Anesthesia was induced with 3% isoflurane in oxygen and maintained at 1%. Mice were head-fixed in a stereotax, and a subcutaneous injection of bupivacaine (0.5%, 0.1 mL) was administered under the scalp. Buprenorphine (0.05 mg/kg) was injected subcutaneously on the back. The scalp was cut, and the skull was covered with a thin layer of Vetbond. A circular craniotomy (3-mm diameter) centered at 1.5 mm posterior and 3.5 mm lateral to bregma was made using a dental drill. The dura was kept moist using artificial cerebrospinal fluid.

For both two-photon and SCAPE microscopy, Rbp4-Cre_KL100 mice were injected with 100 nL of virus (initial titer ∼2×10¹³ cfu/mL, diluted 1:4 in artificial cerebrospinal fluid) encoding GCaMP6f in a Cre recombinase-specific manner (AAV1-CAG-flex-GCaMP6f, UPenn Vector Core). The virus was injected in layer 5B of the barrel cortex (1.0 mm deep to the pia) using a pulled pipette (20-30 μm ID) fastened on a Nanoject III, which was mounted on a manipulator angled at ∼30° from vertical. The depth was chosen to maximize labeling of thick-tufted pyramidal neurons. In pilot experiments, we found that placing injections 1.0 mm deep resulted primarily in thick-tufted labeling whereas at more superficial depths (e.g., 0.8 mm deep) we obtained mainly thin-tufted neurons, consistent with ref ⁹². The dura was then removed, and a thin cover glass was implanted and sealed using superglue. A custom metal head plate was implanted on the skull using dental cement. Twenty-four hours after surgery, carprofen (5 mg/kg) was administered subcutaneously. Imaging and behavioral training commenced 3 weeks after surgery.

Behavior

Animals in both rewarded ‘conditioning’ and unrewarded ‘repeated exposure’ groups were water restricted for 2 days prior to starting imaging and habituated to head fixation for ∼10 minutes on each of these 2 days. They were subsequently given ∼1 mL of water per day for 9 days either by pairing water rewards with a specific stimulus (conditioning group), or in their cage following the imaging session (repeated exposure group). Mice were head restrained in a custom-made behavioral apparatus by positioning the body in a 3D-printed chamber and fastening the head plate to metal posts flanking the chamber. Air puff stimuli (10 psi measured before a control solenoid, 100 ms) were delivered from two nozzles (cut P200 pipette tips) positioned toward the distal tips of the whiskers, in either the rostrocaudal or ventrodorsal direction. Nozzles were oriented to prevent air jets from stimulating other parts of the face. One of these directions (CS+) was paired with a water reward (10 μL), delivered through a lick port 0.5 seconds after the stimulus onset. The particular direction (rostrocaudal vs ventrodorsal) used as the CS+ was randomized and counterbalanced across mice. Approximately 180 stimuli were presented over the course of a 30-minute imaging session (8-12-s intertrial interval). The probability of CS+ or CS- delivery was 50%. In preliminary experiments, we found that an auditory mask helped prevent mice from exploiting auditory cues to discriminate the two stimuli: a third air nozzle was positioned close to the mouse and was active throughout the session.
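The stimulus schedule described above can be sketched as follows, which also shows why a 30-minute session yields roughly 180 trials (1800 s divided by a mean interval of about 10 s). This is a minimal illustration, not the OpenMaze control code: it assumes that intertrial intervals are drawn uniformly between 8 and 12 s and measured onset to onset, and the function name and parameters are hypothetical.

```python
import random

def make_trial_schedule(session_s=1800.0, iti_range_s=(8.0, 12.0), p_cs_plus=0.5, seed=0):
    """Return a list of (onset_time_s, label) pairs for one session.

    Assumptions: intertrial intervals are drawn uniformly from iti_range_s and
    measured onset to onset; each trial is independently CS+ with p_cs_plus.
    """
    rng = random.Random(seed)
    schedule = []
    t = rng.uniform(*iti_range_s)
    while t < session_s:
        label = "CS+" if rng.random() < p_cs_plus else "CS-"
        schedule.append((t, label))
        t += rng.uniform(*iti_range_s)
    return schedule

schedule = make_trial_schedule()
n_plus = sum(1 for _, label in schedule if label == "CS+")
print(len(schedule), "trials,", n_plus, "CS+")
```

Drawing the trial type independently on every trial matches the stated 50% probability but does not force equal counts of CS+ and CS- within a session.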

During the first session (pre-conditioning), stimuli were delivered in the absence of reward to assess neural and behavioral responses in naïve animals. In the following 7-9 days, the CS+ was paired with reward. Licks for rewards were detected with a capacitance-based touch sensor (Sparkfun). A trial response was registered when one or more licks occurred within a 0.5-second response window following the stimulus and before reward delivery. To determine whether behavioral performance was above chance, we computed 95% confidence intervals using the ‘binofit’ function in MATLAB. During the final session (post-conditioning), stimuli were delivered in the absence of reward. Animals in the unrewarded group received the same two stimuli across 9 days without reward pairing. Behavioral experiments were performed with the Arduino-based OpenMaze open-source behavioral system, whose designs are fully described at www.openmaze.org. Whisking was monitored at 125 fps with a camera (Sony PS3eye) and automatically tracked using published software ⁹³.
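As an illustration of the above-chance criterion, the sketch below computes an exact (Clopper-Pearson) 95% binomial confidence interval in Python, which is our understanding of the interval type that MATLAB's ‘binofit’ reports. It is not the authors' code. The trial counts, the chance level of 0.5, and the function names are hypothetical and serve only as an example.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(k, n, alpha=0.05, tol=1e-10):
    """Exact two-sided (1 - alpha) confidence interval for k successes in n trials."""

    def solve(cdf_at, target):
        # The binomial CDF decreases monotonically in p, so bisect on [0, 1].
        lo, hi = 0.0, 1.0
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if cdf_at(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound solves P(X >= k | p) = alpha / 2.
    lower = 0.0 if k == 0 else solve(lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper bound solves P(X <= k | p) = alpha / 2.
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


# Hypothetical session: 70 correct responses out of 90 scored trials.
low, high = clopper_pearson(70, 90)
above_chance = low > 0.5  # the 0.5 chance level is an assumption for this example
```

Under this reading, performance counts as above chance when the lower confidence bound exceeds the assumed chance level.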

Intrinsic signal optical imaging and two-photon imaging

Intrinsic signal optical imaging and two-photon imaging were performed on a Sutter movable objective microscope. The locations of whisker barrels in S1 were identified using intrinsic signal optical imaging. Single whiskers in isoflurane-anesthetized mice were stimulated at 5 Hz using a piezoelectric bimorph while recording the reflectance of 700-nm long-pass incandescent light with a Rolera CCD camera (QImaging) through a low-magnification objective (Zeiss 5X/0.16NA). Movies were collected using software custom-written in Labview (National Instruments). Regions of reflectance change were referenced to an image acquired under green illumination.

Two-photon imaging was conducted on the same microscope under the control of the ScanImage software package (V. Iyer, Janelia Farms). All calcium imaging data were collected by two-photon microscopy except for those in Figure 4. Scanning during awake conditions was performed at 30 fps using a Chameleon Ultra II laser (Coherent) tuned to 920 nm, precompensated for group velocity dispersion and focused through a 20x/1.0NA water immersion lens (Zeiss). Emitted light was collected with an HQ535/50 filter (Chroma) and GaAsP photomultiplier tubes (Hamamatsu Photonics). Apical tuft dendrites in Layer 1 were imaged at depths of 40-80 μm from the pial surface (433 x 433 μm field of view, 512 x 512 pixels).

SCAPE imaging

High-speed volumetric imaging was performed using a custom SCAPE microscope as previously described, including for dendritic tufts^{36, 37, 94}. Briefly, the cortex was illuminated with an oblique light sheet through an Olympus XLUMPLFLN 20XW 1.0 NA water immersion objective with a 2-mm working distance. Fluorescence excited by this sheet (extending in the y-z′ direction) was collected by the same objective lens. A galvanometer mirror in the system was positioned both to scan the oblique light sheet from side to side across the sample (in the x direction) and to de-scan the returning fluorescence light. This optical path results in an intermediate, de-scanned oblique image plane that is stationary yet always co-aligned with the plane in the sample that is being illuminated by the scanning light sheet. Image rotation optics and a fast sCMOS camera (Andor Zyla 4.2+) were then focused to capture these y-z′ images (750 x 200 pixels) at >1000 frames per second as the sheet was repeatedly scanned across the cortex in the x direction. All other system parts, including the objective and sample stage, were stationary during high-speed 3D image acquisition. Data were reformed into a 3D volume by stacking successive y-z′ planes according to the scanning mirror’s x position and de-skewing to correct for the oblique sheet angle. This rotation of the image volume is responsible for its rectangular appearance despite the camera’s square frames. The resulting volumes were large enough to encompass many GCaMP6f-labeled tufts in barrel cortex.

In this study, the stationary objective lens in SCAPE was configured on a manual rotation mount and set to 20°-30° away from the standard upright configuration, so the optical axis was perpendicular to the cranial window to achieve optimal performance without tilting the head of the animal. A 488-nm laser (Coherent OBIS) was used for excitation (<10 mW at the sample) with a 500-nm long-pass filter in the emission path. To achieve optimal spatiotemporal resolution and volume rate, the sample was imaged with an x-direction scanning step of 3 μm over a 300 × 1050 × 234 μm field of view (x-y-z, 3.0 × 1.40 × 1.17 μm per voxel, 100 x 750 x 200 voxels) at 10 volumes per second (VPS). Our imaging involves no special practical considerations or limitations of field of view or resolution, beyond the usual imaging goal of maximizing FOV while maintaining sufficient resolution to discern structures of interest (dendrites).
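The acquisition parameters above are mutually consistent, as the short Python check below illustrates. It simply divides the stated field of view by the stated voxel counts and divides the camera frame rate by the number of x steps. The value of 1000 frames per second is taken as the lower bound of the stated ">1000", so the resulting volume rate is approximate.

```python
# Field of view (micrometres) and voxel counts along x, y, z, as stated in the text.
fov_um = (300.0, 1050.0, 234.0)
voxels = (100, 750, 200)

# Voxel size along each axis: field of view divided by number of voxels.
voxel_size_um = tuple(f / n for f, n in zip(fov_um, voxels))

# Each camera frame is one y-z' plane, so one volume needs one frame per x step.
camera_fps = 1000.0  # lower bound of the stated ">1000 frames per second"
volumes_per_second = camera_fps / voxels[0]

print(voxel_size_um, volumes_per_second)
```

The voxel sizes come out at 3.0, 1.4, and 1.17 μm, and the volume rate at about 10 volumes per second, matching the values reported in the text.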

Analysis

Two-photon movies were motion corrected using the NoRMCorre package ⁹⁵ in MATLAB. Spatial and temporal components for individual tufts imaged by two-photon and SCAPE were segmented using CaImAn v1.8.3, which employs large-scale sparse non-negative matrix factorization ^{45, 96}. CaImAn inherently corrects for background signal. All further analyses used custom-written routines implemented in MATLAB. Spatial components with tuft structural characteristics were identified and analyzed, while neuropil components were discarded.

To quantify a tuft’s response to stimuli, the mean stimulus-aligned ΔF/F was computed across all CS+ or CS- trials and baseline-corrected using the mean ΔF/F in the second before the trial. Probability of transients was obtained by taking each trial’s ΔF/F in the first 1.5 seconds following either the CS+ or CS- and fitting these data with a univariate mixture of two Normal distributions: (1 − p)N(µ₁, σ₁) + pN(µ₂, σ₂). The smaller Normal reflects the distribution of failures, and the larger Normal the distribution of transient amplitudes following the stimulus. The parameter p captures the probability of transients.
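The mixture fit can be sketched as follows. The text does not state which fitting algorithm was used, so the expectation-maximization (EM) procedure below is an assumption, as are the initialization, the variance floor, and the synthetic per-trial amplitudes used for the demonstration. It is a minimal Python illustration rather than the authors' MATLAB routine.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_normal_mixture(x, n_iter=200, min_sigma=1e-3):
    """EM fit of (1 - p) N(mu1, s1) + p N(mu2, s2); returns p, (mu1, s1), (mu2, s2)."""
    xs = sorted(x)
    n = len(xs)
    half = n // 2
    # Initialize the two means from the lower and upper halves of the data.
    mu1 = sum(xs[:half]) / half
    mu2 = sum(xs[half:]) / (n - half)
    s1 = s2 = max((xs[-1] - xs[0]) / 4.0, min_sigma)
    p = 0.5
    for _ in range(n_iter):
        # E-step: posterior probability that each trial belongs to the larger Normal.
        resp = []
        for v in x:
            a = (1.0 - p) * normal_pdf(v, mu1, s1)
            b = p * normal_pdf(v, mu2, s2)
            resp.append(b / (a + b) if a + b > 0.0 else 0.5)
        # M-step: update weights, means, and standard deviations.
        w2 = max(sum(resp), 1e-12)
        w1 = max(n - sum(resp), 1e-12)
        mu1 = sum((1.0 - r) * v for r, v in zip(resp, x)) / w1
        mu2 = sum(r * v for r, v in zip(resp, x)) / w2
        var1 = sum((1.0 - r) * (v - mu1) ** 2 for r, v in zip(resp, x)) / w1
        var2 = sum(r * (v - mu2) ** 2 for r, v in zip(resp, x)) / w2
        s1 = max(math.sqrt(var1), min_sigma)
        s2 = max(math.sqrt(var2), min_sigma)
        p = w2 / n
    if mu1 > mu2:
        # Keep the convention that the second component is the larger Normal.
        mu1, s1, mu2, s2, p = mu2, s2, mu1, s1, 1.0 - p
    return p, (mu1, s1), (mu2, s2)


# Synthetic example: 200 failure trials near 0 and 100 transient trials near 1 (dF/F).
rng = random.Random(0)
failures = [rng.gauss(0.0, 0.05) for _ in range(200)]
transients = [rng.gauss(1.0, 0.2) for _ in range(100)]
p_hat, small, large = fit_two_normal_mixture(failures + transients)
```

In this synthetic case one third of the trials carry a transient, so the fitted p should land near 0.33.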

From these data, a selectivity index (SI) was defined as (F_CS+ − F_CS-) / (F_CS+ + F_CS-), in which F_CS+ and F_CS- are the mean stimulus-aligned amplitudes (ΔF/F) to the CS+ and CS- within the first 1.5 seconds, respectively. This yielded values that range from −1 (exclusively CS- responsive) to 1 (exclusively CS+ responsive). Neural discriminability was defined as d’ = |F_CS+ − F_CS-| / √((σ²_CS+ + σ²_CS-)/2), where σ²_CS+ is the variance of the response amplitudes across CS+ trials and σ²_CS- is the variance of the response amplitudes across CS- trials.
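Both measures translate directly into code. The Python sketch below follows the two formulas as written; the use of the sample variance (n − 1 denominator) and the function names are assumptions, and the SI stays within −1 to 1 only when both mean amplitudes are non-negative and at least one is positive.

```python
import math
from statistics import mean, variance


def selectivity_index(f_cs_plus, f_cs_minus):
    """SI = (F_CS+ - F_CS-) / (F_CS+ + F_CS-), from mean response amplitudes."""
    return (f_cs_plus - f_cs_minus) / (f_cs_plus + f_cs_minus)


def d_prime(cs_plus_trials, cs_minus_trials):
    """d' = |mean difference| / sqrt of the average of the two trial variances."""
    diff = abs(mean(cs_plus_trials) - mean(cs_minus_trials))
    # Sample variance is an assumption; the text does not specify the estimator.
    pooled_sd = math.sqrt((variance(cs_plus_trials) + variance(cs_minus_trials)) / 2.0)
    return diff / pooled_sd


# Hypothetical per-trial amplitudes (dF/F) for one tuft.
cs_plus = [0.30, 0.25, 0.35, 0.28, 0.32]
cs_minus = [0.10, 0.12, 0.08, 0.11, 0.09]
si = selectivity_index(mean(cs_plus), mean(cs_minus))
dp = d_prime(cs_plus, cs_minus)
```

A tuft that responds three times more strongly to the CS+ than to the CS-, as in this made-up example, has an SI of 0.5.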

For longitudinal analysis, tufts were categorized as stimulus responsive if they met two criteria: 1) Across all trials, the mean ΔF/F 1.5 seconds before and 1.5 seconds after the stimulus were significantly different according to the Wilcoxon rank sum test, for either the CS+ or CS-, and 2) the average response amplitude for that stimulus was greater than 0.04 ΔF/F. Tufts with a significant response to only one stimulus were categorized as highly selective and their |SI| was set to 1. To classify tufts as behaviorally modulated, the mean ΔF/F of the first 1.5 seconds after the stimulus was computed for false alarm and correct rejection trials and compared with a rank sum test. Only sessions with at least 12 false alarm trials were used for this analysis. If the two distributions were significantly different, the tuft was classified as behaviorally modulated.
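The responsiveness criteria can be sketched as below. This Python illustration makes several assumptions that the text leaves open: a significance threshold of 0.05, a normal approximation to the rank sum test without tie or continuity correction, and a response amplitude measured as the post-stimulus mean minus the pre-stimulus mean. All function names are hypothetical.

```python
import math
from statistics import mean


def rank_sum_p(a, b):
    """Two-sided Wilcoxon rank sum p-value (normal approximation, average ranks for ties)."""
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    n = len(pooled)
    rank_sum_a = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(a), len(b)
    mu = n1 * (n1 + n2 + 1) / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (rank_sum_a - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2.0))


def classify_tuft(pre_plus, post_plus, pre_minus, post_minus, alpha=0.05, min_amp=0.04):
    """Apply both criteria per stimulus; inputs are per-trial mean dF/F values."""

    def responsive(pre, post):
        amp = mean(post) - mean(pre)
        return rank_sum_p(pre, post) < alpha and amp > min_amp, amp

    resp_plus, amp_plus = responsive(pre_plus, post_plus)
    resp_minus, amp_minus = responsive(pre_minus, post_minus)
    if resp_plus and resp_minus:
        si = (amp_plus - amp_minus) / (amp_plus + amp_minus)
    elif resp_plus:
        si = 1.0   # responds to the CS+ only: |SI| set to 1
    elif resp_minus:
        si = -1.0  # responds to the CS- only: |SI| set to 1
    else:
        si = None  # not stimulus responsive
    return {"responsive": resp_plus or resp_minus, "si": si}
```

For example, a tuft with a clear post-stimulus increase on CS+ trials and no change on CS- trials would be returned as responsive with an SI of 1.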

Custom MATLAB software was used to compute the median whisker angle, and whisking amplitude was computed as described previously ⁹⁷. The median angle was bandpass filtered from 4 to 30 Hz and passed through a Hilbert transform to calculate phase. We defined the upper and lower envelopes of the unfiltered median whisking angle as the points in the whisk cycle where phase equaled 0 (most protracted) or π (most retracted), respectively. Whisking amplitude was defined as the difference between these two envelopes. Periods of whisking were defined as times where whisking amplitude exceeded 20% of maximum for at least 250 msec. Periods of time where amplitude exceeded this threshold for less than 250 msec were considered ambiguous and excluded from analysis of whisking versus quiescence. The whisking-triggered average for each tuft was computed by aligning the calcium signal to the start times of whisking periods during inter-trial intervals (2-8 seconds after stimulus delivery).
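The thresholding step that separates whisking from ambiguous periods can be illustrated with the Python sketch below. It assumes the whisking amplitude has already been computed at the 125-fps camera rate, and it labels sub-threshold frames as quiescent, which is our reading of the text. The function name and labels are hypothetical.

```python
import math


def label_whisking(amplitude, fps=125.0, frac=0.2, min_dur_s=0.25):
    """Label each frame as 'whisking', 'ambiguous', or 'quiescent'."""
    threshold = frac * max(amplitude)
    # Smallest number of frames whose duration is at least min_dur_s.
    min_frames = math.ceil(min_dur_s * fps)
    n = len(amplitude)
    labels = ["quiescent"] * n
    i = 0
    while i < n:
        if amplitude[i] > threshold:
            j = i
            while j < n and amplitude[j] > threshold:
                j += 1
            # Supra-threshold runs shorter than the minimum duration are ambiguous.
            tag = "whisking" if (j - i) >= min_frames else "ambiguous"
            for k in range(i, j):
                labels[k] = tag
            i = j
        else:
            i += 1
    return labels


# Synthetic amplitude trace: one 40-frame bout (320 ms) and one 10-frame bout (80 ms).
trace = [0.0] * 50 + [10.0] * 40 + [0.0] * 50 + [10.0] * 10 + [0.0] * 50
labels = label_whisking(trace)
```

At 125 fps, a run must span at least 32 frames to reach 250 msec, so only the first bout in the synthetic trace is labeled as whisking.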

For the linear regression analysis, we excerpted the calcium timeseries 2 seconds before and 6 seconds after each stimulus onset. The whisking amplitude signal was frame aligned to the calcium signal according to the lag of the calcium-whisking cross-correlation peak for each tuft. Whisking amplitude was then normalized to the max, yielding values that ranged from 0 to 1. The stimulus predictor variable was a binary vector with an 800-msec ‘on’ period (24 frames) centered at the stimulus time. The timing of the stimulus variable was then aligned to the calcium signal according to the latency of the peak of the mean ΔF/F within the first 1.5 seconds after the stimulus. The lick predictor variable was a binary vector with ‘on’ periods denoting lick bouts. Lick bouts were defined as periods of time where the mouse elicited at least 2 licks, with a maximum gap of 200 msec, and therefore had variable lengths.
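Two of the predictors can be built as sketched below, assuming the 30-fps two-photon frame rate and an 8-second excerpt of 240 frames with the stimulus at frame 60. The Python function names, the shift argument, and the lick times are hypothetical. The regression itself and the cross-correlation alignment are not shown.

```python
def stimulus_predictor(n_frames, stim_frame, fps=30.0, on_s=0.8, shift_frames=0):
    """Binary vector with an 'on' period centered at the (optionally shifted) stimulus frame."""
    on_frames = int(round(on_s * fps))  # 24 frames at 30 fps
    start = stim_frame + shift_frames - on_frames // 2
    vec = [0] * n_frames
    for k in range(max(start, 0), min(start + on_frames, n_frames)):
        vec[k] = 1
    return vec


def lick_bouts(lick_times, max_gap=0.2, min_licks=2):
    """Group lick times (seconds) into bouts of at least min_licks with gaps <= max_gap."""
    bouts = []
    current = []
    for t in sorted(lick_times):
        if current and t - current[-1] > max_gap:
            if len(current) >= min_licks:
                bouts.append((current[0], current[-1]))
            current = []
        current.append(t)
    if len(current) >= min_licks:
        bouts.append((current[0], current[-1]))
    return bouts


# 2 s before and 6 s after the stimulus at 30 fps gives 240 frames, stimulus at frame 60.
stim_vec = stimulus_predictor(n_frames=240, stim_frame=60)

# Hypothetical lick times in seconds; the isolated lick at 3.0 s is not a bout.
bouts = lick_bouts([1.0, 1.1, 1.25, 3.0, 5.0, 5.15])
```

Because bouts are delimited by their first and last licks, their durations vary from bout to bout, as described in the text.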

All statistical tests were two-sided. T-tests were used for Normally distributed data. Otherwise non-parametric tests were applied.

CS+ trials evoke a second, long-latency peak during early learning, but not late learning.
a, Left: Population average of stimulus-responsive tufts aligned to CS+ (red) or CS- (blue) trials from an example mouse. Right: Normalized ΔF/F of individual tufts during CS+ trials. b, Same as in a, combining data across four mice whose imaging regions were mapped with intrinsic imaging.

Selectivity was enhanced in individual animals that received rewards.
Median SI magnitude for each animal across three sessions for conditioned (left) and repeated exposure groups (right). * p < 0.05, *** p < 10^-3

Segmented tufts from two-photon and SCAPE microscopy.
12 example tufts extracted from either two-photon **(a)** or SCAPE microscopy **(b)**. Tufts segmented from SCAPE microscopy are shown as maximum intensity projections from the top and side. Scale bars: 100 μm.

Calcium event rate of tufts that were either unresponsive or responsive to air puff stimuli.
The number of calcium events per minute was quantified for all tufts during each conditioning session. Data from each group was pooled across all sessions.

Licking cannot account for changes in selectivity during learning.
a, ITI lick-bout-triggered averages of 232 tufts on the 5th conditioning day, when ITI licks were still common (grey traces - individual tufts, black inset - population average), exhibit little or no lick-related calcium influx. b, R² values for linear models predicting calcium from stimuli (y axis) are consistently greater than those predicting calcium from licking (x axis). Each circle represents one tuft out of 442 tufts on last-rewarded sessions. c, Coefficients from a multivariate regression analysis with calcium as the response variable and the CS+, CS-, whisking, and licking as the predictors. CS+ and CS- coefficients are therefore disentangled from correlations with whisking and licking. Conditioning biases individual tufts (circles) to have larger CS+ or CS- coefficients. n = 304, 324, and 322 tufts for First rewarded, Last rewarded, and Post conditioning, respectively. d, Similar analysis to c but for repeated exposure group, with calcium as the response variable and the CS+, CS-, and whisking as the predictors. n = 223, 208, and 218 tufts for Day 2, Final - 1, and Final session, respectively.

References

1. Poort J., et al. (2015) Learning Enhances Sensory and Multiple Non-sensory Representations in Primary Visual Cortex. Neuron 86:1478–1490. https://doi.org/10.1016/j.neuron.2015.05.037
2. Liu D., et al. (2020) Orbitofrontal control of visual cortex gain promotes visual associative learning. Nat Commun 11:2784. https://doi.org/10.1038/s41467-020-16609-7
3. Henschke J. U., et al. (2020) Reward Association Enhances Stimulus-Specific Representations in Primary Visual Cortex. Curr Biol 30:1866–1880. https://doi.org/10.1016/j.cub.2020.03.018
4. Beitel R. E., Schreiner C. E., Cheung S. W., Wang X., Merzenich M. M. (2003) Reward-dependent plasticity in the primary auditory cortex of adult monkeys trained to discriminate temporally modulated signals. Proc Natl Acad Sci U S A 100:11070–11075. https://doi.org/10.1073/pnas.1334187100
5. Goltstein P. M., Coffey E. B., Roelfsema P. R., Pennartz C. M. (2013) In vivo two-photon Ca2+ imaging reveals selective reward effects on stimulus-specific assemblies in mouse visual cortex. J Neurosci 33:11540–11555. https://doi.org/10.1523/JNEUROSCI.1341-12.2013
6. David S. V., Fritz J. B., Shamma S. A. (2012) Task reward structure shapes rapid receptive field plasticity in auditory cortex. Proc Natl Acad Sci U S A 109:2144–2149. https://doi.org/10.1073/pnas.1117717109
7. Fritz J., Shamma S., Elhilali M., Klein D. (2003) Rapid task-related plasticity of spectrotemporal receptive fields in primary auditory cortex. Nat Neurosci 6:1216–1223. https://doi.org/10.1038/nn1141
8. Banerjee A., et al. (2020) Value-guided remapping of sensory cortex by lateral orbitofrontal cortex. Nature 585:245–250. https://doi.org/10.1038/s41586-020-2704-z
9. Zhang W., Bruno R. M. (2019) High-order thalamic inputs to primary somatosensory cortex are stronger and longer lasting than cortical inputs. eLife 8. https://doi.org/10.7554/eLife.44158
10. Rubio-Garrido P., Perez-de-Manzo F., Porrero C., Galazo M. J., Clasca F. (2009) Thalamic input to distal apical dendrites in neocortical layer 1 is massive and highly convergent. Cereb Cortex 19:2380–2395. https://doi.org/10.1093/cercor/bhn259
11. Cauller L. J., Clancy B., Connors B. W. (1998) Backward cortical projections to primary somatosensory cortex in rats extend long horizontal axons in layer I. J Comp Neurol 390:297–310.
12. Amitai Y., Friedman A., Connors B. W., Gutnick M. J. (1993) Regenerative activity in apical dendrites of pyramidal cells in neocortex. Cereb Cortex 3:26–38.
13. Yuste R., Gutnick M. J., Saar D., Delaney K. R., Tank D. W. (1994) Ca2+ accumulations in dendrites of neocortical pyramidal neurons: an apical band and evidence for two functional compartments. Neuron 13:23–43.
14. Schiller J., Schiller Y., Stuart G., Sakmann B. (1997) Calcium action potentials restricted to distal apical dendrites of rat neocortical pyramidal neurons. J Physiol 505:605–616.
15. Larkum M. E., Nevian T., Sandler M., Polsky A., Schiller J. (2009) Synaptic integration in tuft dendrites of layer 5 pyramidal neurons: a new unifying principle. Science 325:756–760. https://doi.org/10.1126/science.1171958
16. Sandler M., Shulman Y., Schiller J. (2016) A Novel Form of Local Plasticity in Tuft Dendrites of Neocortical Somatosensory Layer 5 Pyramidal Neurons. Neuron 90:1028–1042. https://doi.org/10.1016/j.neuron.2016.04.032
17. Manita S., et al. (2015) A Top-Down Cortical Circuit for Accurate Sensory Perception. Neuron 86:1304–1316. https://doi.org/10.1016/j.neuron.2015.05.006
18. Roelfsema P. R., Holtmaat A. (2018) Control of synaptic plasticity in deep cortical networks. Nat Rev Neurosci 19:166–180. https://doi.org/10.1038/nrn.2018.6
19. Larkum M. E., Zhu J. J. (2002) Signaling of layer 1 and whisker-evoked Ca2+ and Na+ action potentials in distal and terminal dendrites of rat neocortical pyramidal neurons in vitro and in vivo. J Neurosci 22:6991–7005.
20. Larkum M. E., Senn W., Luscher H. R. (2004) Top-down dendritic input increases the gain of layer 5 pyramidal neurons. Cereb Cortex 14:1059–1070. https://doi.org/10.1093/cercor/bhh065
21. Schwindt P., Crill W. (1999) Mechanisms underlying burst and regular spiking evoked by dendritic depolarization in layer 5 cortical pyramidal neurons. J Neurophysiol 81:1341–1354. https://doi.org/10.1152/jn.1999.81.3.1341
22. Larkum M. E., Zhu J. J., Sakmann B. (2001) Dendritic mechanisms underlying the coupling of the dendritic with the axonal action potential initiation zone of adult rat layer 5 pyramidal neurons. J Physiol 533:447–466.
23. Manita S., Miyakawa H., Kitamura K., Murayama M. (2017) Dendritic Spikes in Sensory Perception. Front Cell Neurosci 11. https://doi.org/10.3389/fncel.2017.00029
24. Francioni V., Padamsey Z., Rochefort N. L. (2019) High and asymmetric somato-dendritic coupling of V1 layer 5 neurons independent of visual stimulation and locomotion. eLife 8. https://doi.org/10.7554/eLife.49145
25. Beaulieu-Laroche L., Toloza E. H. S., Brown N. J., Harnett M. T. (2019) Widespread and Highly Correlated Somato-dendritic Activity in Cortical Layer 5 Neurons. Neuron 103:235–241. https://doi.org/10.1016/j.neuron.2019.05.014
26. Labarrera C., et al. (2018) Adrenergic Modulation Regulates the Dendritic Excitability of Layer 5 Pyramidal Neurons In Vivo. Cell Rep 23:1034–1044. https://doi.org/10.1016/j.celrep.2018.03.103
27. Brombas A., Fletcher L. N., Williams S. R. (2014) Activity-dependent modulation of layer 1 inhibitory neocortical circuits by acetylcholine. J Neurosci 34:1932–1941. https://doi.org/10.1523/JNEUROSCI.4470-13.2014
28. Lacefield C. O., Pnevmatikakis E. A., Paninski L., Bruno R. M. (2019) Reinforcement Learning Recruits Somata and Apical Dendrites across Layers of Primary Sensory Cortex. Cell Rep 26:2000–2008. https://doi.org/10.1016/j.celrep.2019.01.093
29. Luebke J. I., et al. (2010) Dendritic vulnerability in neurodegenerative disease: insights from analyses of cortical pyramidal neurons in transgenic mouse models. Brain Struct Funct 214:181–199. https://doi.org/10.1007/s00429-010-0244-2
30. Tsai J., Grutzendler J., Duff K., Gan W. B. (2004) Fibrillar amyloid deposition leads to local synaptic abnormalities and breakage of neuronal branches. Nat Neurosci 7:1181–1183. https://doi.org/10.1038/nn1335
31. Xu N. L., et al. (2012) Nonlinear dendritic integration of sensory and motor input during an active sensing task. Nature 492:247–251. https://doi.org/10.1038/nature11601
32. Takahashi N., et al. (2020) Active dendritic currents gate descending cortical outputs in perception. Nat Neurosci. https://doi.org/10.1038/s41593-020-0677-8
33. Takahashi N., Oertner T. G., Hegemann P., Larkum M. E. (2016) Active cortical dendrites modulate perception. Science 354:1587–1590. https://doi.org/10.1126/science.aah6066
34. Bittner K. C., Milstein A. D., Grienberger C., Romani S., Magee J. C. (2017) Behavioral time scale synaptic plasticity underlies CA1 place fields. Science 357:1033–1036. https://doi.org/10.1126/science.aan3846
35. Doron G., et al. (2020) Perirhinal input to neocortical layer 1 controls learning. Science 370. https://doi.org/10.1126/science.aaz3136
36. Bouchard M. B., et al. (2015) Swept confocally-aligned planar excitation (SCAPE) microscopy for high speed volumetric imaging of behaving organisms. Nat Photonics 9:113–119. https://doi.org/10.1038/nphoton.2014.323
37. Hillman E. M., et al. (2018) High-speed 3D imaging of cellular activity in the brain using axially-extended beams and light sheets. Curr Opin Neurobiol 50:190–200.
38. Yu Y. S., Graff M. M., Bresee C. S., Man Y. B., Hartmann M. J. (2016) Whiskers aid anemotaxis in rats. Sci Adv 2:e1600716. https://doi.org/10.1126/sciadv.1600716
39. Yu Y. S., Graff M. M., Hartmann M. J. (2016) Mechanical responses of rat vibrissae to airflow. J Exp Biol 219:937–948. https://doi.org/10.1242/jeb.126896
40. Nakamura S., Narumi T., Tsutsui K., Iijima T. (2009) Difference in the functional significance between the lemniscal and paralemniscal pathways in the perception of direction of single-whisker stimulation examined by muscimol microinjection. Neurosci Res 64:323–329. https://doi.org/10.1016/j.neures.2009.04.005
41. Bernhard S. M., et al. (2020) An automated homecage system for multiwhisker detection and discrimination learning in mice. PLoS One 15:e0232916. https://doi.org/10.1371/journal.pone.0232916
42. Chen T. W., et al. (2013) Ultrasensitive fluorescent proteins for imaging neuronal activity. Nature 499:295–300. https://doi.org/10.1038/nature12354
43. Kozorovitskiy Y., Saunders A., Johnson C. A., Lowell B. B., Sabatini B. L. (2012) Recurrent network activity drives striatal synaptogenesis. Nature 485:646–650. https://doi.org/10.1038/nature11052
44. Glickfeld L. L., Andermann M. L., Bonin V., Reid R. C. (2013) Cortico-cortical projections in mouse visual cortex are functionally target specific. Nat Neurosci 16:219–226. https://doi.org/10.1038/nn.3300
45. Giovannucci A., et al. (2019) CaImAn an open source tool for scalable calcium imaging data analysis. eLife 8. https://doi.org/10.7554/eLife.38173
46. Pakan J. M. P., Currie S. P., Fischer L., Rochefort N. L. (2018) The Impact of Visual Cues, Reward, and Motor Feedback on the Representation of Behaviorally Relevant Spatial Locations in Primary Visual Cortex. Cell Rep 24:2521–2528. https://doi.org/10.1016/j.celrep.2018.08.010
47. Wang P. Y., et al. (2020) Transient and Persistent Representations of Odor Value in Prefrontal Cortex. Neuron 108:209–224. https://doi.org/10.1016/j.neuron.2020.07.033
48. Bruno R. M., Khatri V., Land P. W., Simons D. J. (2003) Thalamocortical angular tuning domains within individual barrels of rat somatosensory cortex. J Neurosci 23:9565–9574.
49. Bruno R. M., Sakmann B. (2006) Cortex is driven by weak but synchronously active thalamocortical synapses. Science 312:1622–1627. https://doi.org/10.1126/science.1124593
50. Ramirez A., et al. (2014) Spatiotemporal receptive fields of barrel cortex revealed by reverse correlation of synaptic input. Nat Neurosci 17:866–875. https://doi.org/10.1038/nn.3720
51. Yao H., Dan Y. (2001) Stimulus timing-dependent plasticity in cortical processing of orientation. Neuron 32:315–323. https://doi.org/10.1016/s0896-6273(01)00460-3
52. Dragoi V., Sharma J., Miller E. K., Sur M. (2002) Dynamics of neuronal sensitivity in visual cortex and local feature discrimination. Nat Neurosci 5:883–891. https://doi.org/10.1038/nn900
53. Dragoi V., Sharma J., Sur M. (2000) Adaptation-induced plasticity of orientation tuning in adult visual cortex. Neuron 28:287–298. https://doi.org/10.1016/s0896-6273(00)00103-3
54. Zhang Y., Cudmore R. H., Lin D. T., Linden D. J., Huganir R. L. (2015) Visualization of NMDA receptor-dependent AMPA receptor synaptic plasticity in vivo. Nat Neurosci 18:402–407. https://doi.org/10.1038/nn.3936
55. Chu M. W., Li W. L., Komiyama T. (2016) Balancing the Robustness and Efficiency of Odor Representations during Learning. Neuron 92:174–186. https://doi.org/10.1016/j.neuron.2016.09.004
56. Cichon J., Gan W. B. (2015) Branch-specific dendritic Ca(2+) spikes cause persistent synaptic plasticity. Nature 520:180–185. https://doi.org/10.1038/nature14251
57. Palmer L. M., et al. (2014) NMDA spikes enhance action potential generation during sensory input. Nat Neurosci 17:383–390. https://doi.org/10.1038/nn.3646
58. Derdikman D., et al. (2006) Layer-specific touch-dependent facilitation and depression in the somatosensory cortex during active whisking. J Neurosci 26:9538–9547. https://doi.org/10.1523/JNEUROSCI.0918-06.2006
59. de Kock C. P., Sakmann B. (2009) Spiking in primary somatosensory cortex during natural whisking in awake head-restrained rats is cell-type specific. Proc Natl Acad Sci U S A 106:16446–16450. https://doi.org/10.1073/pnas.0904143106
60. Rodgers C. C., et al. (2021) Sensorimotor strategies and neuronal representations for shape discrimination. Neuron 109:2308–2325. https://doi.org/10.1016/j.neuron.2021.05.019
61. Larkum M. (2013) A cellular mechanism for cortical associations: an organizing principle for the cerebral cortex. Trends Neurosci 36:141–151. https://doi.org/10.1016/j.tins.2012.11.006
62. Park J. M., et al. (2020) Deep and superficial layers of the primary somatosensory cortex are critical for whisker-based texture discrimination in mice. bioRxiv. https://doi.org/10.1101/2020.08.12.245381
63. Rigotti M., et al. (2013) The importance of mixed selectivity in complex cognitive tasks. Nature 497:585–590. https://doi.org/10.1038/nature12160
64. Stringer C., Pachitariu M., Steinmetz N., Carandini M., Harris K. D. (2019) High-dimensional geometry of population responses in visual cortex. Nature 571:361–365. https://doi.org/10.1038/s41586-019-1346-5
65. Kim J., Erskine A., Cheung J. A., Hires S. A. (2020) Behavioral and Neural Bases of Tactile Shape Discrimination Learning in Head-Fixed Mice. Neuron 108:953–967. https://doi.org/10.1016/j.neuron.2020.09.012
66. Dabney W., et al. (2020) A distributional code for value in dopamine-based reinforcement learning. Nature 577:671–675. https://doi.org/10.1038/s41586-019-1924-6
67. Petreanu L., et al. (2012) Activity in motor-sensory projections reveals distributed coding in somatosensation. Nature 489:299–303. https://doi.org/10.1038/nature11321
68. Wimmer V. C., Bruno R. M., de Kock C. P., Kuner T., Sakmann B. (2010) Dimensions of a Projection Column and Architecture of VPM and POm Axons in Rat Vibrissal Cortex. Cereb Cortex 20:2265–2276. https://doi.org/10.1093/cercor/bhq068
69. Feldmeyer D. (2012) Excitatory neuronal connectivity in the barrel cortex. Front Neuroanat 6. https://doi.org/10.3389/fnana.2012.00024
70. Constantinople C. M., Bruno R. M. (2013) Deep cortical layers are activated directly by thalamus. Science 340:1591–1594. https://doi.org/10.1126/science.1236425
71. Petersen C. C., Crochet S. (2013) Synaptic computation and sensory processing in neocortical layer 2/3. Neuron 78:28–48. https://doi.org/10.1016/j.neuron.2013.03.020
72. Yamashita T., et al. (2018) Diverse Long-Range Axonal Projections of Excitatory Layer 2/3 Neurons in Mouse Barrel Cortex. Front Neuroanat 12:33. https://doi.org/10.3389/fnana.2018.00033
73. Llinas R., Ribary U., Contreras D., Pedroarena C. (1998) The neuronal basis for consciousness. Philos Trans R Soc Lond B Biol Sci 353:1841–1849. https://doi.org/10.1098/rstb.1998.0336
74. Krauzlis R. J., Lovejoy L. P., Zenon A. (2013) Superior colliculus and visual spatial attention. Annu Rev Neurosci 36:165–182. https://doi.org/10.1146/annurev-neuro-062012-170249
75. Parvizi J., Damasio A. (2001) Consciousness and the brainstem. Cognition 79:135–160. https://doi.org/10.1016/s0010-0277(00)00127-x
76. Feldman D. E., Brecht M. (2005) Map plasticity in somatosensory cortex. Science 310:810–815. https://doi.org/10.1126/science.1115807
77. Mechawar N., Cozzari C., Descarries L. (2000) Cholinergic innervation in adult rat cerebral cortex: a quantitative immunocytochemical description. J Comp Neurol 428:305–318.
78. Freedman R., Foote S. L., Bloom F. E. (1975) Histochemical characterization of a neocortical projection of the nucleus locus coeruleus in the squirrel monkey. J Comp Neurol 164:209–231. https://doi.org/10.1002/cne.901640205
79. Chubykin A. A., Roach E. B., Bear M. F., Shuler M. G. (2013) A cholinergic mechanism for reward timing within primary visual cortex. Neuron 77:723–735. https://doi.org/10.1016/j.neuron.2012.12.039
80. Thiele A., Bellgrove M. A. (2018) Neuromodulation of Attention. Neuron 97:769–785. https://doi.org/10.1016/j.neuron.2018.01.008
81. Hangya B., Ranade S. P., Lorenc M., Kepecs A. (2015) Central Cholinergic Neurons Are Rapidly Recruited by Reinforcement Feedback. Cell 162:1155–1168. https://doi.org/10.1016/j.cell.2015.07.057
82. Froemke R. C., Merzenich M. M., Schreiner C. E. (2007) A synaptic memory trace for cortical receptive field plasticity. Nature 450:425–429. https://doi.org/10.1038/nature06289
83. Letzkus J. J., et al. (2011) A disinhibitory microcircuit for associative fear learning in the auditory cortex. Nature 480:331–335. https://doi.org/10.1038/nature10674
84. Gasselin C., Hohl B., Vernet A., Crochet S., Petersen C. C. H. (2021) Cell-type-specific nicotinic input disinhibits mouse barrel cortex during active sensing. Neuron. https://doi.org/10.1016/j.neuron.2020.12.018
85. Berger T. K., Silberberg G., Perin R., Markram H. (2010) Brief bursts self-inhibit and correlate the pyramidal network. PLoS Biol 8. https://doi.org/10.1371/journal.pbio.1000473
86. Kapfer C., Glickfeld L. L., Atallah B. V., Scanziani M. (2007) Supralinear increase of recurrent inhibition during sparse activity in the somatosensory cortex. Nat Neurosci 10:743–753. https://doi.org/10.1038/nn1909
87. Naka A., Adesnik H. (2016) Inhibitory Circuits in Cortical Layer 5. Front Neural Circuits 10. https://doi.org/10.3389/fncir.2016.00035
88. Ljungberg T., Apicella P., Schultz W. (1992) Responses of monkey dopamine neurons during learning of behavioral reactions. J Neurophysiol 67:145–163. https://doi.org/10.1152/jn.1992.67.1.145
89. Pan W. X., Schmidt R., Wickens J. R., Hyland B. I. (2005) Dopamine cells respond to predicted events during classical conditioning: evidence for eligibility traces in the reward-learning network. J Neurosci 25:6235–6242. https://doi.org/10.1523/JNEUROSCI.1478-05.2005
90. Bouret S., Sara S. J. (2004) Reward expectation, orientation of attention and locus coeruleus-medial frontal cortex interplay during learning. Eur J Neurosci 20:791–802. https://doi.org/10.1111/j.1460-9568.2004.03526.x
91. Sheffield M. E., Dombeck D. A. (2015) Calcium transient prevalence across the dendritic arbour predicts place field properties. Nature 517:200–204. https://doi.org/10.1038/nature13871
92. Oberlaender M., et al. (2012) Cell type-specific three-dimensional structure of thalamocortical circuits in a column of rat vibrissal cortex. Cereb Cortex 22:2375–2391. https://doi.org/10.1093/cercor/bhr317
93. Clack N. G., et al. (2012) Automated tracking of whiskers in videos of head fixed rodents. PLoS Comput Biol 8:e1002591. https://doi.org/10.1371/journal.pcbi.1002591
94. Voleti V., et al. (2019) Real-time volumetric microscopy of in vivo dynamics and large-scale samples with SCAPE 2.0. Nat Methods 16:1054–1062. https://doi.org/10.1038/s41592-019-0579-4
95. Pnevmatikakis E. A., Giovannucci A. (2017) NoRMCorre: An online algorithm for piecewise rigid motion correction of calcium imaging data. J Neurosci Methods 291:83–94. https://doi.org/10.1016/j.jneumeth.2017.07.031
96. Pnevmatikakis E. A., et al. (2016) Simultaneous Denoising, Deconvolution, and Demixing of Calcium Imaging Data. Neuron 89:285–299. https://doi.org/10.1016/j.neuron.2015.11.037
97. Petty G. H., Kinnischtzke A. K., Hong Y. K., Bruno R. M. (2020) Effects of arousal and movement on secondary somatosensory and visual thalamus. bioRxiv. https://doi.org/10.1101/2020.03.04.977348

Article and author information

Author information

Sam E. Benezra
Department of Neuroscience, Columbia University, New York, NY 10027, Kavli Institute for Brain Science, Columbia University, New York, NY 10027, Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027
Kripa B. Patel
Departments of Biomedical Engineering and Radiology, Columbia University, New York, NY 10027, Kavli Institute for Brain Science, Columbia University, New York, NY 10027, Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027
Citlali Pérez Campos
Departments of Biomedical Engineering and Radiology, Columbia University, New York, NY 10027, Kavli Institute for Brain Science, Columbia University, New York, NY 10027, Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027
Elizabeth M.C. Hillman
Department of Neuroscience, Columbia University, New York, NY 10027, Departments of Biomedical Engineering and Radiology, Columbia University, New York, NY 10027, Kavli Institute for Brain Science, Columbia University, New York, NY 10027, Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027
ORCID iD: 0000-0001-5511-1451
Randy M. Bruno
Department of Neuroscience, Columbia University, New York, NY 10027, Kavli Institute for Brain Science, Columbia University, New York, NY 10027, Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027, Department of Physiology, Anatomy & Genetics, University of Oxford, Oxford, OX1 3PT
ORCID iD: 0000-0002-5122-4632
- Correspondence: randy.bruno@dpag.ox.ac.uk (R.M.B.)
- Lead contact

Version history

Preprint posted: March 30, 2024
Sent for peer review: April 8, 2024
Reviewed Preprint version 1: June 13, 2024
Reviewed Preprint version 2: December 3, 2024
Version of Record published: December 27, 2024
Version of Record updated: January 17, 2025

Cite all versions

You can cite all versions using the DOI https://doi.org/10.7554/eLife.98349. This DOI represents all versions, and will always resolve to the latest one.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.

Reviewing Editor
Brice Bathellier
Centre National de la Recherche Scientifique, Paris, France
Senior Editor
John Huguenard
Stanford University School of Medicine, Stanford, United States of America

Reviewer #1 (Public Review):

Which neurophysiological changes support the learning of new sensorimotor transformations is a key question in neuroscience. Many studies have attempted to answer this question at the neuronal population level, with varying degrees of success, but few, if any, have studied changes in the activity of the apical dendrites of layer 5 cortical neurons. Neurons in layer 5 of the sensory cortex appear to play a key role in sensorimotor transformations: they show important decision- and reward-related signals and are the main source of cortical and subcortical projections from the cortex. In particular, pyramidal tract (PT) neurons project directly to subcortical regions related to motor activity, such as the striatum and brainstem, and could initiate rapid motor action in response to given sensory inputs. Additionally, layer 5 cortical neurons have large apical dendrites that extend to layer 1, where different neuromodulatory and long-range inputs converge, providing motor and contextual information that could be used to modulate the output of layer 5 neurons and/or to establish the synaptic plasticity required for learning a new association.

In this study, the authors aimed to test whether the learning of a new sensorimotor transformation could be supported by a change in the evoked response of the apical dendrites of layer 5 neurons in the mouse whisker primary somatosensory cortex. To do this, they performed longitudinal functional calcium imaging of the apical dendrites of layer 5 neurons while mice learned to discriminate between two multi-whisker stimuli. The authors used a simple conditioning task in which one whisker stimulus (upward or backward air puff, CS+) is associated with a reward after a short delay, while the other whisker stimulus (CS-) is not. They found that task learning (measured by the probability of anticipatory licking just after the CS+) was not associated with a significant change in the average population response evoked by the CS+ or the CS-, nor with a change in the average population selectivity. However, when considering individual dendritic tufts, they found interesting changes in selectivity, with approximately equal numbers of dendrites becoming more selective for the CS+ and for the CS-.

One of the major challenges when assessing changes in neural representation during the learning of such Go/NoGo tasks is that the movements and rewards themselves may elicit strong neural responses that act as a confounding factor: inexperienced mice do not lick in response to the CS+, whereas trained mice do. In this study, the authors addressed this issue in three ways. First, they carefully monitored the orofacial movements of mice and showed that task learning is not associated with changes in evoked whisker movements. Second, they showed that whisking or licking evokes very little activity in the dendritic tufts compared to whisker stimuli (CS+ and CS-). Finally, the authors added a post-conditioning session after the last conditioning session, during which the CS+ and the CS- were presented but no reward was delivered. During this post-conditioning session, the mice gradually stopped licking in response to the CS+. A better design might have been to perform the pre-conditioning and post-conditioning sessions in non-water-restricted, unmotivated mice to completely exclude any lick response, but the fact that the change in selectivity persists after the mice stopped licking in the last blocks of the post-conditioning session (in mice relying only on their whiskers to perform the task) is convincing.

The clever task design and careful data analysis provide compelling evidence that learning this whisker discrimination task does not result in a massive change in sensory representation in the apical dendritic tufts of layer 5 neurons in the primary somatosensory cortex on average. Nevertheless, individual dendritic tufts do increase their selectivity for one or the other sensory stimulus, likely enhancing the ability of S1 neurons to accurately discriminate the two stimuli and trigger the appropriate motor response (to lick or not to lick).

One limitation of the present study is the lack of evidence that the primary somatosensory cortex is necessary for learning and executing the task. As the authors have strongly emphasized in their previous publications, the primary somatosensory cortex may not be necessary for the learning and execution of simple whisker detection tasks, especially when the stimulus is very salient. Although this new task requires discrimination between two whisker stimuli, the simplicity and salience of the stimuli used could make this task cortex-independent, especially given that some mice seem not to rely entirely on their whiskers to execute the task.

Nevertheless, this is an important result that shows for the first time changes in the selectivity to sensory stimuli at the level of individual apical dendritic tufts in correlation with the learning of a discrimination task. This study sheds new light on the cortical cellular substrates of reward-based learning and opens interesting perspectives for future research in this area. In future studies, it will be important to determine whether the change in selectivity of dendritic calcium spikes is causally involved in the learning of the task or whether it simply correlates with learning, as a consequence of changes in synaptic inputs caused by reward. The dendritic calcium spikes may be involved in establishing the synaptic plasticity required for learning and may shape the output of layer 5 pyramidal neurons to trigger the appropriate motor response. It would also be important to study changes in selectivity in the apical dendrites of identified projection neurons.

https://doi.org/10.7554/eLife.98349.1.sa1

Reviewer #2 (Public Review):

Summary:

The authors did not find an increased representation of the CS+ throughout reinforcement learning in the tuft dendrites of Rbp4-positive neurons from layer 5B of the barrel cortex, unlike what was previously reported for somata from layer 2/3 of the visual cortex.

Instead, the authors observed an increased selectivity to both stimuli (CS+ and CS-) during reinforcement learning. This feature:

(1) was not present in repeated exposures (without reinforcement),
(2) was not explained by the animal's behaviour (choice, licking, and whisking), and
(3) was long-lasting, being present even when the mice disengaged from the task.

Importantly, increased selectivity was correlated with learning (% correct choices), and neural discriminability between stimuli increased with learning.

In conclusion, the authors show that tuft dendrites from layer 5B of the barrel cortex increase their representation of both the rewarded (CS+) and unrewarded (CS-) stimuli applied to the whiskers during reinforcement learning.

Strengths:

The results presented are very consistent throughout the entire study, and therefore very convincing:

(1) The results observed are very similar using two different imaging techniques (two-photon planar imaging and SCAPE volumetric imaging; Figures 3 and 4, respectively).

(2) The results are similar using "different groups" of tuft dendrites for the analysis (e.g. initially unresponsive and responsive pre- and post-learning; Figure 5).

(3) The results are similar when the analysis is restricted to a specific set of trials (with the same sensory input, but different choices; Figure 7).

(4) Additionally, the selectivity of tuft dendrites from layer 5B of the barrel cortex was higher in the mice that exclusively used their whiskers to respond to the stimuli (CS+ and CS-).

The results presented are controlled against a group of mice that received the same stimulus presentations, except for the reinforcement (reward).

Additionally, behavioural outputs such as choice, whisking, and licking could not account for the results observed.

Although there are no causal experiments, the correlation between selectivity and learning (percentage of correct choices), as well as the increase in neural discriminability with learning but not with repeated exposure, are very convincing.

Weaknesses:

The biggest weakness is the absence of causality experiments. Although specifically inhibiting the tuft dendritic activity of layer 5 pyramidal neurons in layer 1 is very challenging, tuft dendritic activity in layer 1 could be silenced through optogenetic experiments as in Abs et al. 2018. By manipulating NDNF-positive neurons, the authors could specifically modify tuft dendritic activity in the barrel cortex during CS presentations and test whether silencing tuft dendritic activity in layer 1 would lead to a lack of selectivity and an impairment of reinforcement learning. Additionally, this experiment would test whether the selectivity observed during reinforcement learning is due to changes in the local network, namely changes in local synaptic connectivity, or solely due to changes in the long-range inputs.

https://doi.org/10.7554/eLife.98349.1.sa0
