
Specialized coding patterns among dorsomedial prefrontal neuronal ensembles predict conditioned reward seeking

  1. Roger I Grant
  2. Elizabeth M Doncheck
  3. Kelsey M Vollmer
  4. Kion T Winston
  5. Elizaveta V Romanova
  6. Preston N Siegler
  7. Heather Holman
  8. Christopher W Bowen
  9. James M Otis (corresponding author)
  1. Department of Neuroscience, Medical University of South Carolina, United States
  2. Hollings Cancer Center, Medical University of South Carolina, United States
Research Article
Cite this article as: eLife 2021;10:e65764 doi: 10.7554/eLife.65764

Abstract

Non-overlapping cell populations within dorsomedial prefrontal cortex (dmPFC), defined by gene expression or projection target, control dissociable aspects of reward seeking through unique activity patterns. However, even within these defined cell populations, considerable cell-to-cell variability is found, suggesting that greater resolution is needed to understand information processing in dmPFC. Here, we use two-photon calcium imaging in awake, behaving mice to monitor the activity of dmPFC excitatory neurons throughout Pavlovian reward conditioning. We characterize five unique neuronal ensembles that each encodes specialized information related to a sucrose reward, reward-predictive cues, and behavioral responses to those cues. The ensembles differentially emerge across daily training sessions – and stabilize after learning – in a manner that improves the predictive validity of dmPFC activity dynamics for deciphering variables related to behavioral conditioning. Our results characterize the complex dmPFC neuronal ensemble dynamics that stably predict reward availability and initiation of conditioned reward seeking following cue-reward learning.

Introduction

The dorsomedial prefrontal cortex (dmPFC) has garnered considerable interest due to its dysregulation in diseases associated with disordered reward processing (Chen et al., 2018; Courchesne et al., 2011; Dienel and Lewis, 2019; Holmes et al., 2018; Koob and Volkow, 2010; Ye et al., 2012). These abnormalities include aberrant cell morphology and regional mass (Courchesne et al., 2011), abnormal activity patterns (Dienel and Lewis, 2019), and reduced behavioral performance on tasks that involve dmPFC activity (Goldstein and Volkow, 2011). Despite this knowledge, how unique cell types in dmPFC encode complex reward-related information to guide behavioral output is unclear, limiting our understanding of how reward processing occurs in healthy individuals as compared to those with neuropsychiatric diseases.

Neuronal activity in dmPFC neurons is observed to be heterogeneous in a variety of behavioral tasks (Kim et al., 2016; Kobayashi et al., 2006; Matsumoto et al., 2003; Powell and Redish, 2014), including those that involve reward-seeking behaviors (Sun et al., 2011; Moorman and Aston-Jones, 2015; Otis et al., 2017; Otis et al., 2019; Sparta et al., 2014). Recent studies have aimed to resolve this heterogeneity through cell-type specific recording strategies, such as in vivo calcium imaging in genetically or projection-defined neurons (Otis et al., 2017; Otis et al., 2019; Siciliano et al., 2019; Ye et al., 2016). Although some variability can be explained by identified neuronal subpopulations, a vast majority of response diversity remains unexplained (Otis et al., 2017). For example, we recently demonstrated that many dmPFC excitatory neurons that project to the nucleus accumbens (NAc) show diverse responses to reward-predictive cues, with about two-thirds of the responding cells being excitatory responders and the other third being inhibitory responders. Similarly, dmPFC neurons that project to the paraventricular thalamus (PVT) also show diverse responses to reward-predictive cues, with about two-thirds of responding cells being inhibitory responders and the other one-third being excitatory responders (Otis et al., 2017; Otis et al., 2019). Finally, responses can be further subdivided by their temporal relation to the cue, as well as the reward (i.e., anticipation vs. consumption). Overall, although subpopulations of dmPFC output neurons could be labeled as ‘generally excited’ (e.g., dmPFC→NAc) or ‘generally inhibited’ (e.g., dmPFC→PVT), such assignment ignores much of the variability that is likely critical for behavioral control. Thus, a more thorough and unbiased means of defining the heterogeneous activity patterns among dmPFC output neurons is needed to understand how these neurons are engaged during reward-related behavioral tasks.

Here, we use in vivo two-photon calcium imaging to measure and longitudinally track the activity dynamics of single dmPFC excitatory output neurons throughout a Pavlovian conditioned licking task. We observe five unique neuronal ensembles after task acquisition that encode specialized information related to the sucrose reward, reward-predictive cues, and behavioral responses to those cues. These five ensembles differentially emerge across days during learning in a manner that improves the predictive validity of dmPFC population dynamics for deciphering reward delivery, cue presentation, and behavior. These adaptations were specific to a reward-predictive cue, but not another neutral cue, suggesting that the identified neuronal activity patterns are likely related to associative learning. Finally, we find that single neurons within dmPFC neuronal ensembles display both trial-to-trial (within sessions) and day-to-day (between sessions) stability after learning. Overall, our data reveal that heterogeneous excitatory neuronal ensembles in dmPFC evolve specialized coding patterns across cue-reward learning that are stably maintained after learning. Our results highlight the importance of ensemble-specific recording and manipulation strategies for understanding the function of dmPFC activity for reward processing.

Results

Unique excitatory neuronal ensembles in dmPFC during reward seeking

We employed a Pavlovian conditioned licking task wherein head-restrained mice were trained to associate one tone conditioned stimulus (CS+), but not another (CS-), with the delivery of a liquid sucrose reward (Figure 1A–B). Mice readily acquired this task across sessions (one session per day; see Figure 1—figure supplement 1), showing conditioned licking behavior between the CS+ offset and reward delivery (trace interval), but not between the CS- offset and the equivalent no-reward epoch, during sessions on later days (deemed ‘late in learning’; Figure 1C–D). A two-way ANOVA revealed a significant cue by session interaction for conditioned licking behavior (Δ lick rate; F(1,47) = 165.1; p-value < 0.001), and post hoc tests confirmed that mice licked significantly more during CS+ trials in late in learning sessions than during CS- trials in both session types (p-values < 0.001) or CS+ trials in early in learning sessions (p-value < 0.001). Thus, conditioned licking behavior for the CS+, but not the CS-, developed across days of training, revealing that the cue-reward association had been acquired by sessions identified as ‘late in learning’.

Figure 1 (with 6 supplements)
Distinct excitatory neuronal ensembles revealed in dorsomedial prefrontal cortex (dmPFC) during head-fixed Pavlovian conditioning.

(A) Illustration of head fixation for reward conditioning experiments with concurrent two-photon imaging. (B) Schematic for reward conditioning experiments, in which the CS+ and CS- are presented in a random order 50 times each. The CS+ denotes the availability of a liquid sucrose reward following a 1 s trace interval (TI). Anticipatory licks are seen during the trace interval in well-trained mice. (C) Example behavior data during early in learning (left) vs. late in learning (right) behavioral sessions. (D) Cue discrimination scores (auROC; CS+ vs. CS-) and change in lick rate for each cue during early and late in learning behavioral sessions. Data presented as mean ± SEM. (E–F) Surgery schematic (E) allowing for in vivo imaging of GCaMP6s-expressing neurons (F). (G–H) Heat maps displaying the responses of all dmPFC neurons (G) and responses separated by cluster (H) aligned to the cues. (I–J) Line plots displaying the mean activity traces of all cells (I) and mean activity of all cells separated by cluster (J).

We monitored the activity dynamics of putative dmPFC excitatory output neurons during both ‘early in learning’ and subsequent ‘late in learning’ behavioral sessions using two-photon calcium imaging (via AAVdj-CaMK2α-GCaMP6s) in vivo (Figure 1E–F; Figure 1—figure supplement 2; Figure 1—video 1). The number of sessions early and late in learning was unique to each animal based on speed of acquisition in the task (see Figure 1—figure supplement 1) as well as the number of visualizable imaging planes (see Figure 1F). A single imaging plane (field of view [FOV]) was selected during each day of training, resulting in 28 FOVs recorded early in learning (n = 10 mice, 28 FOVs, 2092 neurons) and 21 FOVs late in learning (n = 7 mice, 21 FOVs, 1511 neurons; three mice did not reach late in learning due to headcap issues). Some, but not all, of these FOVs were visualized during both early and late in learning behavioral sessions (to allow cell tracking in Figure 4). Like previous reports (Otis et al., 2017; Otis et al., 2019), we found that neurons display excitatory and inhibitory responses to each cue and/or reward during late in learning behavioral sessions (Figure 1G and I). To better classify these post-learning activity patterns, we used a spectral clustering algorithm (Namboodiri et al., 2019) to isolate unique responses between recorded neurons (based on optimal clustering performance as compared to agglomerative and K-means clustering; see Figure 1—figure supplements 3 and 4). The analysis revealed the existence of five clusters or ‘neuronal ensembles’ that comprise most (but not all) of the response variability (Figure 1H and J) and are spatially intermixed within dmPFC (Figure 1—figure supplement 5). 
Each neuronal ensemble displayed a unique activity pattern during the CS+ and/or CS- trials, with qualitative analyses revealing the following dynamics: Cluster 1: excitatory responses during CS+, CS-, and reward delivery (n = 192 neurons from 5/7 mice; 17/21 FOVs), Cluster 2: excitatory responses during CS+ trials (n = 346 neurons from 7/7 mice; 19/21 FOVs), Cluster 3: excitatory responses during CS+ and CS- trials (n = 291 neurons from 7/7 mice; 20/21 FOVs), Cluster 4: excitatory responses during reward delivery (n = 320 neurons from 5/7 mice; 17/21 FOVs), and Cluster 5: inhibitory responses during CS+ trials (n = 362 neurons; 7/7 mice; 21/21 FOVs). Overall, we find the existence of five unique ensembles among dmPFC excitatory output neurons, with each of these ensembles displaying unique activity patterns after learning in a Pavlovian conditioned licking task.

Select excitatory neuronal ensembles in dmPFC predict behavioral performance during conditioned reward seeking

Considering that some neuronal ensembles were absent within some FOVs during a behavioral session, we determined whether the relative proportion of each neuronal ensemble in each FOV predicts behavioral performance. To this end, we quantified the proportion of neurons within each ensemble for all late in learning behavioral sessions (n = 21 sessions, 21 unique FOVs), and compared those values to cue discrimination licking scores (normalized auROC, CS+ vs. CS- lick rate). Overall, we find that the proportion of neurons within Cluster 5 (which showed inhibitory responses during CS+ trials) predicts cue discrimination licking scores (Figure 2A). Pearson-R correlation values, found in the inset of each subpanel (Figure 2A), reveal a significant, positive relationship between cue discrimination licking scores and the percentage of neurons in Cluster 5 (p-value = 0.012). However, there was no significant correlation found for Clusters 1–4 (p-values > 0.2). These data are particularly interesting considering previous findings showing that dmPFC→PVT neurons, which are also primarily inhibited during CS+ trials, are critical for the acquisition and expression of cue-induced reward seeking (Otis et al., 2017; Otis et al., 2019). These findings suggest that mice with greater numbers of neurons displaying inhibitory CS+ responses after learning, as in Cluster 5, may have improved behavioral performance in the appetitive learning task.
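
The two quantities compared in this analysis can be illustrated with a minimal Python sketch. It is not the analysis code used in this study: the function names (`auroc`, `pearson_r`) and all numerical values are hypothetical, and the auROC is left unnormalized because the normalization is not specified in this section.

```python
from itertools import product
from math import sqrt


def auroc(pos, neg):
    """Area under the ROC curve: P(value from pos > value from neg); ties count 0.5."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical lick rates (licks/s) on individual trials of one session.
cs_plus_licks = [4.0, 5.0, 6.0, 5.0]
cs_minus_licks = [1.0, 0.0, 2.0, 1.0]
session_score = auroc(cs_plus_licks, cs_minus_licks)  # every CS+ trial exceeds every CS- trial

# Hypothetical per-session values: fraction of neurons assigned to one
# ensemble, and that session's cue discrimination score.
ensemble_fraction = [0.10, 0.15, 0.22, 0.30, 0.35]
discrimination = [0.62, 0.70, 0.74, 0.85, 0.90]
r = pearson_r(ensemble_fraction, discrimination)
print(f"auROC = {session_score:.2f}, Pearson r = {r:.2f}")
```

An auROC of 0.5 corresponds to no discrimination between cues, and values approaching 1 indicate more licking on CS+ than on CS- trials. A significance test for the correlation would still be needed on top of the coefficient itself.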

Behavioral performance is predicted by the relative percentage of dorsomedial prefrontal cortex (dmPFC) excitatory neurons within select ensembles.

(A) Correlation plots separated by cluster displaying the relationship between cue discrimination behavioral scores after learning (auROCs: CS+ vs. CS- licking) and the percentage of detected neurons in a cluster during that behavioral session. The relative percentage of neurons in Cluster 5 positively predicted behavioral performance (**p-value = 0.012). (B) Correlation plots separated by cluster displaying the relationship between CS- licking error rate after learning (auROC: CS- vs. baseline licking) and the percentage of neurons in a cluster during that behavioral session. The percentage of neurons in Cluster 3 positively predicted CS- licking error rate (**p-value = 0.009).

We also investigated whether the number of neurons within a particular neuronal ensemble predicted the probability that mice would increase licking during the CS- after learning (normalized auROC, CS- vs. baseline lick rate), which could be considered the initiation of a licking ‘error’. Overall, we find that the relative proportion of neurons within Cluster 3 (which showed equivalent excitatory CS+ and CS- responses) predicts an increase in licking during CS- trials (Figure 2B). Pearson-R correlation values, found in each subpanel (Figure 2B), reveal a significant, positive relationship between the initiation of CS- licking ‘errors’ and the percentage of neurons in Cluster 3 (p-value = 0.009). However, there was no significant correlation found for other clusters (p-values > 0.2). These data suggest that mice with greater numbers of neurons that display equivalent, excitatory responses to both cues, as in Cluster 3, may be more likely to initiate reward seeking when rewards are not available.

Excitatory neuronal ensembles in dmPFC display specialized coding during reward seeking

We find that dmPFC excitatory neuronal ensembles display unique activity patterns after mice develop Pavlovian conditioned behavioral responses, and that the relative proportion of neurons in select ensembles (specifically, Clusters 3 and 5) can predict behavioral task performance. Despite these findings, whether dmPFC activity patterns can be used to reliably infer environmental or behavioral events during the task is unknown. To address this, we trained a decoder to predict cue, reward, and licking events based on the activity dynamics of all neurons within each FOV (early in learning, n = 28 FOVs, 2092 neurons; late in learning, n = 21 FOVs, 1511 neurons). Overall, we find that dmPFC population dynamics within each FOV can be used to decode the CS+, the CS-, cue identity (CS+ vs. CS-; cue discrimination), reward delivery, and licking rate during late in learning sessions, whereas during early in learning sessions these activity patterns can be used to decode only the CS+, the CS-, and cue identity, but not reward delivery or licking rate (Figure 3A). ANOVAs revealed main effects of shuffling. CS+: F(1,92) = 37.29, p-value < 0.001; post hoc p-values < 0.05. CS-: F(1,92) = 22.88, p-value < 0.001; post hoc p-values < 0.05. CS+ vs. CS-: F(1,92) = 35.94, p-value < 0.001; post hoc p-values < 0.01. Reward: F(1,92) = 27.41, p-value < 0.001; early in learning post hoc p-value = 0.087, late in learning post hoc p-value < 0.001. Licking: F(1,92) = 23.94, p-value < 0.001; early in learning post hoc p-value = 0.199, late in learning post hoc p-value < 0.001. A heat map illustrating normalized decoding early and late in learning, and the change in that decoding across learning, reveals improved CS+ detection (post hoc p-value = 0.020), reward detection (post hoc p-value = 0.020), and lick rate prediction across learning (post hoc p-value = 0.009; other p-values > 0.37; Figure 3B). 
Overall, the activity dynamics of dmPFC excitatory output neurons can be used for cue detection, cue discrimination, reward detection, and prediction of licking after learning. However, activity in these neurons cannot be used to accurately infer reward delivery or licking during sessions early in learning, suggesting that the coding of these variables may be learning dependent.
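
The logic of comparing decoding accuracy against a shuffled control can be sketched in Python as follows. The decoder used in this study is not described in this section, so the sketch substitutes a simple nearest-centroid classifier with leave-one-out cross-validation. All function names and the synthetic 'trials' are hypothetical.

```python
import random


def centroid(rows):
    """Mean population vector across a list of trials."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two population vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def loo_accuracy(trials, labels):
    """Leave-one-out accuracy of a nearest-centroid decoder."""
    correct = 0
    for i, (x, y) in enumerate(zip(trials, labels)):
        train = [(t, l) for j, (t, l) in enumerate(zip(trials, labels)) if j != i]
        classes = sorted({l for _, l in train})
        cents = {c: centroid([t for t, l in train if l == c]) for c in classes}
        pred = min(classes, key=lambda c: sq_dist(x, cents[c]))
        correct += pred == y
    return correct / len(trials)


def shuffled_accuracy(trials, labels, n_shuffles=50, seed=0):
    """Mean decoding accuracy after randomly permuting the trial labels."""
    rng = random.Random(seed)
    accs = []
    for _ in range(n_shuffles):
        perm = labels[:]
        rng.shuffle(perm)
        accs.append(loo_accuracy(trials, perm))
    return sum(accs) / len(accs)


# Synthetic data: 10 'event' and 10 'baseline' trials from 3 hypothetical neurons.
event = [[1.0 + 0.01 * i, 0.5, -0.8 + 0.01 * i] for i in range(10)]
baseline = [[0.01 * i, 0.5, -0.01 * i] for i in range(10)]
trials = event + baseline
labels = [1] * 10 + [0] * 10

observed = loo_accuracy(trials, labels)
chance = shuffled_accuracy(trials, labels)
print(f"observed = {observed:.2f}, shuffled control = {chance:.2f}")
```

Decoding is considered informative only when the observed accuracy exceeds the accuracy obtained with shuffled labels, which estimates what the same decoder achieves when the link between activity and task events is broken.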

Activity of dorsomedial prefrontal cortex (dmPFC) excitatory neuronal ensembles can decode specialized information related to reward seeking.

(A) Cumulative distribution frequency (CDF) plots illustrating the population decoding accuracy for variables related to conditioned reward seeking (CS+, CS-, CS+ vs. CS-, reward, and licking). Dotted lines refer to shuffled control data for early and late in learning. (B) Heat maps depicting population decoding accuracy early in learning (first column), late in learning (second column), and the change across learning (third column). Data have been normalized to CS+ vs. CS- late in learning to provide comparison of decoding strength across variables. (C) CDF plots illustrating the decoding accuracy of specific ensembles for variables related to conditioned reward seeking. Dotted lines refer to shuffled control data for all ensembles. (D) Heat maps depicting the contribution of each ensemble to decoding, with each column corresponding to a different ensemble. Data have been normalized to the maximum decoding strength by an ensemble for each variable to allow comparison of ensemble decoding strength across each variable. *p-value < 0.05; **p-value < 0.01; ***p-value < 0.001 for post hoc comparisons.

The population dynamics of dmPFC excitatory output neurons can be used to predict environmental and behavioral factors related to conditioned reward seeking, but how unique dmPFC neuronal ensembles contribute to this information coding is unknown. Thus, we next trained a decoder to predict information related to the Pavlovian conditioned licking task based on the activity dynamics of neurons within each ensemble. Overall, we find superior decoding of the CS+, CS-, CS+ vs. CS- (cue discrimination), reward, and conditioned licking in select neuronal ensembles (Figure 3C and D). CS+: The timing of CS+ presentation could be decoded based on the activity of all cell clusters, although it was best predicted based on the activity of neurons within Cluster 1. CS-: The CS- was significantly predicted by Clusters 1, 3, 4, and 5, and was also best predicted based on the activity of Cluster 1. CS+ vs. CS-: Activity in Cluster 2, but not other clusters, could be used to significantly discriminate between the CS+ and CS-. Reward: Activity within Clusters 1, 4, and 5 could be used to detect the reward, with activity in Cluster 4 being the best predictor. Licking: Activity in Cluster 5, but not other clusters, could be used to decode conditioned licking. ANOVAs revealed main effects of shuffling. CS+: F(5,162) = 14.10, p-value < 0.001; post hoc p-values < 0.05 for all clusters. CS-: F(5,162) = 12.72, p-value < 0.001; post hoc p-values < 0.05 for Clusters 1, 3–5; post hoc p-value = 0.84 for Cluster 2. CS+ vs. CS-: F(5,162) = 3.46, p-value = 0.005; post hoc p-value = 0.004 for Cluster 2; post hoc p-values > 0.10 for other clusters. Reward: F(5,162) = 8.03, p-value < 0.001; post hoc p-values < 0.007 for Clusters 1, 4, and 5; post hoc p-values > 0.24 for Clusters 2 and 3. Licking: F(5,162) = 2.91, p-value = 0.015; post hoc p-value = 0.005 for Cluster 5; post hoc p-values > 0.16 for all other clusters. 
Overall, these data reveal that dmPFC excitatory neuronal ensembles predict select environmental and behavioral factors related to conditioned reward seeking after learning.

Excitatory neuronal ensembles in dmPFC differentially develop during Pavlovian reward conditioning and are stable after learning

Two-photon microscopy enables visual tracking of single, virally labeled neurons across days (Namboodiri et al., 2019; Otis et al., 2017). Thus, we were able to track a subset of the above dmPFC excitatory output neurons from early to late in learning behavioral sessions (n = 5 mice, 9 FOVs, 416 neurons) to evaluate neuronal response evolution across learning (Figure 4A). Overall, we found that neurons in Cluster 1, which show excitatory responses to both cues and to the reward late in learning, also show robust responses to the same stimuli early in learning (Figure 4B). In contrast, neurons in Clusters 2–5 did not show obvious responses before learning during CS+ or CS- trials (Figure 4B), suggesting that their activity patterns evolved across conditioning and may therefore be reflective of learning.

Figure 4 (with 2 supplements)
Ensemble-specific activity dynamics differentially evolve across learning.

(A) Example field of view (FOV) showing the same neurons (arrowheads) from early (top) and late (bottom) in learning sessions, tracked across days. (B) Mean activity traces during CS+ and CS- trials for each ensemble early (top row) and late (bottom row) in learning. (C) Mean responses of all tracked neurons during CS+ and CS- trials early and late in learning. (D) Responses separated by cluster reveal adaptations in CS+ and/or CS- encoding for Clusters 2–5, but no significant changes for Cluster 1. ***p-value < 0.001; **p-values = 0.01. (E) Correlation plot displaying mean responses (baseline vs. cue/reward period) of all tracked neurons during CS+ and CS- trials early and late in learning. (F) Correlation plots separated by cluster displaying the mean response of each neuron early and late in learning. Pearson-R values are displayed in the top left corner for all cells and for each ensemble (E–F). *p-value < 0.05.

To confirm that neurons in Clusters 2–5, but not Cluster 1, displayed CS+ and CS- response adaptations across learning, we compared responses from early in learning to late in learning behavioral sessions (Figure 4C and D). Two-way ANOVAs for Clusters 2–5 revealed an effect of behavioral session (F-values > 21.4; p-values < 0.001), whereas a two-way ANOVA for Cluster 1 did not reveal an effect of behavioral session (F-value = 0.034; p-value = 0.854). Post hoc comparisons confirmed no change in activity across sessions for Cluster 1 for CS+ or CS- trials (post hoc p-values > 0.20). In contrast, neurons in Clusters 2, 4, and 5 showed significant response adaptations for CS+ trials (post hoc p-values < 0.001), and Clusters 3 and 4 both showed a significant increase in response amplitudes across sessions for CS- trials (post hoc p-values < 0.02; all other p-values > 0.05 as shown in Figure 4D). Thus, neurons in Clusters 2–5 showed CS+ and/or CS- trial response adaptations from early in learning to late in learning behavioral sessions, whereas neurons in Cluster 1 did not. Interestingly, responses during CS+ and CS- trials late in learning were highly predicted by responses early in learning for all neurons in combination (Figure 4E) and for individual cell clusters except Cluster 2 (Figure 4F). Pearson-R correlation values can be found in the inset of each subpanel (Figure 4E and F) and reveal positive correlations during CS+ and/or CS- trials that are significant for Clusters 1, 3, 4, and 5 (*denotes p-value < 0.05). Thus, although responses in Clusters 2, 4, and 5 adapted from early to late in learning behavioral sessions, responses in these clusters (and Cluster 1) early in learning could be used to predict their subsequent responses late in learning. Next, we confirmed that these correlated activity patterns were not simply due to similarities in responses between cells within tracked FOVs. 
To do so, we shuffled the tracking IDs for each neuron within each FOV and repeated the Pearson-R correlation analysis. Shuffling abolished the correlated response patterns between early and late in learning behavioral sessions (Figure 4—figure supplement 1), confirming that correlated activity patterns from the unshuffled datasets represent similarities in activity across learning for individual neurons – rather than correlated activity patterns within tracked FOVs in general. In further support of this idea, cross-correlation analysis reveals little lag between neurons in the same cluster, as compared to neurons across clusters, both during early and late in learning behavioral sessions (Figure 4—figure supplement 2).
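
The tracking-ID shuffle control can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the code used in this study: the function names, the number of shuffles, and the synthetic responses are hypothetical. Late-session responses are permuted only among neurons of the same FOV, so FOV-level similarity is preserved while the identity of individual neurons is broken.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def shuffled_r(early, late, fov_ids, n_shuffles=200, seed=0):
    """Mean Pearson r after permuting late-session responses within each FOV."""
    rng = random.Random(seed)
    by_fov = {}
    for idx, fov in enumerate(fov_ids):
        by_fov.setdefault(fov, []).append(idx)
    total = 0.0
    for _ in range(n_shuffles):
        permuted = late[:]
        for members in by_fov.values():
            values = [late[i] for i in members]
            rng.shuffle(values)  # reassign responses among neurons of this FOV only
            for i, v in zip(members, values):
                permuted[i] = v
        total += pearson_r(early, permuted)
    return total / n_shuffles


# Synthetic responses for 30 hypothetical tracked neurons in 2 FOVs:
# late responses are a scaled copy of early responses plus a small offset.
early = [0.1 * (i % 15) - 0.7 for i in range(30)]
late = [0.8 * e + 0.05 * ((i * 7) % 5 - 2) for i, e in enumerate(early)]
fov_ids = [0] * 15 + [1] * 15

observed_r = pearson_r(early, late)
control_r = shuffled_r(early, late, fov_ids)
print(f"tracked r = {observed_r:.2f}, shuffled-ID r = {control_r:.2f}")
```

If the early-to-late correlation reflected only shared response properties of cells within a FOV, it would survive this within-FOV permutation. A correlation that collapses toward zero after shuffling instead points to stability at the level of individual neurons.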

We next evaluated changes in neuronal activity within sessions, rather than between sessions, by examining trial-by-trial cue response evolution for each cluster. To do so, we used data from tracked neurons (Figure 4) such that we could examine cluster-specific adaptations for both early in learning and late in learning datasets. Mice showed consistent behavioral responses across trials both early in learning and late in learning (Figure 5A and D; n = 5 mice). Furthermore, the activity of dmPFC excitatory neurons (n = 9 FOVs, 416 tracked neurons) was consistent across trials during these behavioral sessions (Figure 5B and E), leading to a strong correlation in activity from trials at the beginning of the session (CS- and CS+ trials 1–10) vs. the end of the session (CS- and CS+ trials 41–50) for all neurons (Figure 5C and F). Pearson-R correlation values can be found in the inset of each subpanel (Figure 5C and F) and reveal positive correlations during CS+ and/or CS- trials (*denotes p-value < 0.001). Overall, these data reveal within-session response stability (behavioral and neuronal) during early in learning and late in learning datasets. Whether the observed response stability within sessions is also maintained between sessions after learning, however, remains unclear.

Stable conditioned licking and ensemble-specific cue responses within early and late in learning behavioral sessions.

(A) Licking heat maps from sessions early in learning averaged across mice reveal stable licking across CS+ but not CS- trials, specifically during the 3 s reward epoch. (B) Activity heat maps from sessions early in learning averaged across neurons reveal consistent activity patterns across CS+ and CS- trials for each cluster. (C) Mean cue responses of all neurons in the first 10 trials during early in learning sessions were correlated with mean cue responses during the last 10 trials during early in learning sessions. (D) Licking heat maps from sessions late in learning averaged across mice reveal stable licking across CS+ but not CS- trials, specifically during CS+ delivery and subsequent 3 s reward epoch. (E) Activity heat maps from sessions late in learning averaged across neurons reveal consistent activity patterns across CS+ and CS- trials for each cluster. (F) Mean cue responses of all neurons in the first 10 trials during late in learning sessions were correlated with mean cue responses during the last 10 trials during late in learning sessions. *p-values < 0.001.

We next tracked the activity dynamics of dmPFC excitatory neurons after learning to determine if the defined neuronal ensembles remain stable or adapt across days (n = 3 mice, 4 FOVs, 142 neurons). Mice showed equivalent behavioral responses during these two late in learning behavioral sessions (Figure 6A), as a repeated-measures t-test revealed no change in cue discrimination scores across days (t(3) = 0.217, p-value = 0.835). Furthermore, neuronal responses during both CS+ and CS- trials were highly correlated across these two behavioral sessions (Figure 6B and C; CS+: Pearson-R = 0.75, p-value < 0.001; CS-: Pearson-R = 0.30, p-value < 0.001), unless tracking IDs were shuffled for each FOV (Figure 6—figure supplement 1). Heat maps for each cluster reveal these highly correlated response patterns during both CS+ trials (Figure 6D) and CS- trials (Figure 6E). Overall, these data suggest that dmPFC excitatory neuronal ensembles display day-to-day response stability after cue-reward learning.
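
For reference, the comparison of cue discrimination scores across two days can be sketched as a paired-samples t statistic, which is what we assume the repeated t-test above refers to. The function name and the scores below are hypothetical and are not data from this study. A p-value would additionally require the t distribution with n − 1 degrees of freedom, which is omitted here.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(day_a, day_b):
    """Paired-samples t statistic and degrees of freedom for matched scores."""
    diffs = [a - b for a, b in zip(day_a, day_b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))  # stdev is the sample (n - 1) estimate
    return t, n - 1


# Hypothetical cue discrimination scores for 4 matched sessions on two days.
day1 = [0.90, 0.70, 0.95, 0.80]
day2 = [0.80, 0.80, 0.75, 0.80]
t_stat, dof = paired_t(day1, day2)
print(f"t({dof}) = {t_stat:.3f}")
```

With four matched measurements the statistic has three degrees of freedom, matching the form of the value reported above.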

Figure 6 (with 1 supplement)
Ensemble-specific activity dynamics are maintained across days after learning.

(A) Cue discrimination scores (auROC; CS+ vs. CS-) showing similar behavior across days after learning. Data presented as mean ± SEM. (B) Correlation plots displaying the change in activity during CS+ and CS- trials for all tracked neurons during two imaging sessions late in learning. These responses were highly correlated (*p-values < 0.001). (C) Activity heat maps for all tracked neurons separated by cue (columns) and day (rows) reveal similar population dynamics across days after learning. (D–E) Activity heat maps for each ensemble separated into CS+ trials (D) and CS- trials (E) confirm similar activity patterns across days after learning.

Discussion

Here, we characterize unique excitatory neuronal ensembles in dmPFC that differentially predict behavioral task performance and encode specialized information related to Pavlovian conditioning. The responses of each ensemble differentially emerge across learning in a manner that improves the predictive validity of dmPFC population dynamics for deciphering reward delivery, reward-predictive cue presentation, and task-related behavioral output. Considering the day-to-day stability of dmPFC activity dynamics after learning, our results suggest that each ensemble may be composed of a unique set of cell types. Future studies that characterize the circuit connectivity, gene expression, and behavioral function of each neuronal ensemble, defined based on in vivo activity dynamics, are essential for understanding the dmPFC circuit contributions to reward processing.

Like previous studies, we find heterogeneous activity patterns among dmPFC excitatory output neurons during reward seeking (Murugan et al., 2017; Otis et al., 2017; Otis et al., 2019; Siciliano et al., 2019) and use a spectral clustering algorithm to isolate five unique neuronal ensembles. Despite these findings, how each neuronal ensemble may be composed of unique cell types – for example, based on projection specificity – remains unclear. Previously, using the same Pavlovian reward-seeking task described here, we found that dmPFC→NAc neurons are ‘generally excited’ whereas dmPFC→PVT neurons are ‘generally inhibited’ following the presentation of a CS+, but not CS-, such that their overall activity patterns fit well within Cluster 2 (dmPFC→NAc) and Cluster 5 (dmPFC→PVT; Otis et al., 2017). Interestingly, in that study we found that optogenetic inhibition of dmPFC→PVT neurons facilitates cue-reward learning, whereas optogenetic activation of the pathway prevents learning and cue-evoked reward seeking. These data are consistent with the current findings showing that the proportion of cue-inhibited dmPFC neurons positively predicts behavioral performance in the task (see Figure 2A). Thus, we predict that Cluster 5 is accounted for in part by dmPFC→PVT neurons, although further investigation is required to confirm this idea. Considering the heterogeneity found even in projection-specific recording studies in dmPFC (Murugan et al., 2017; Otis et al., 2017; Otis et al., 2019; Siciliano et al., 2019; Vander Weele et al., 2018), it is unlikely that a single projection pathway could be isolated to a single neuronal ensemble. To unravel the circuit connections of these unique cell types, it will therefore be necessary to selectively label neurons based on their in vivo activity dynamics, allowing for post hoc examination of their connectivity. 
Virally packaged fluorescent proteins that allow light-driven labeling of activated neurons, such as CaMPARI (Fosque et al., 2015), could be useful in this regard but also have limitations that would prevent precision labeling of selected neuronal ensembles (e.g., all activated cells would be labeled during UV light delivery, rather than only cells determined to be within a defined ensemble). Thus, development of novel technologies that allow robust labeling of experimenter-selected neurons is critical for identifying the projection profile of unique neuronal ensembles not only in dmPFC, but throughout the brain.

In addition to distinct circuit connectivity patterns, dmPFC neuronal ensembles may display differences in gene expression that could account for their unique activity dynamics. Although little is known about ensemble-specific gene expression in the dmPFC, cortical excitatory projection neurons are thought to express Camk2α (Dittgen et al., 2004), and there is evidence that the immediate early gene NPAS4 is upregulated in reward-responsive, but not aversion-responsive projection neurons (Ye et al., 2016). Additionally, layer-specific gene expression patterns may be present, such as in the case for genes encoding dopamine receptors (Gaspar et al., 1995), nicotinic acetylcholine receptors (Verhoog et al., 2016), noradrenergic receptors (Santana et al., 2013), and more (for review, see Santana and Artigas, 2017). However, recording experiments from subpopulations of dmPFC excitatory output neurons, defined based on gene expression, during reward seeking have been limited. Altogether, a more thorough characterization of ensemble-specific gene expression is needed to ascertain whether genetic differences account for unique activity patterns among dmPFC excitatory neuronal ensembles. Experiments involving single-cell sequencing ex vivo, such as patch-seq (Cadwell et al., 2016; Cadwell et al., 2017), could provide gene expression readouts from neuronal ensembles detected in vivo to improve our understanding of the gene expression differences that contribute to the evolution of distinct neuronal ensembles.

Our data showing learning-related, stimulus-specific activity patterns among dmPFC excitatory neuronal ensembles are consistent with previous studies. Previous investigations harnessing in vivo electrophysiology have found coordinated activity among undefined dmPFC cell populations during reward seeking, for example, related to consummatory behavior (licking) in a learning task (Horst and Laubach, 2013), initiation of complex behavioral strategies (Powell and Redish, 2014), recently committed errors in behavioral responding (Powell and Redish, 2014), and behavioral actions based on flexible information (such as rule shifting; Bissonette and Roesch, 2015; Del Arco et al., 2017; Durstewitz et al., 2010; Powell and Redish, 2016; Rodgers and DeWeese, 2014). Using waveform matching across days, one study even demonstrated day-to-day stability in behavioral strategy-related firing patterns (Powell and Redish, 2014), similar to another study showing stable fos expression, a marker of activated neurons, in dmPFC neurons across days in an appetitive conditioning task (Brebner et al., 2020). Altogether, these data are consistent with our findings showing day-to-day stability in variable-specific encoding patterns among dmPFC neuronal ensembles.

An important consideration for our study is the existence of heterogeneity not only between experimenter-defined cell clusters, but also within these clusters. We chose to use spectral clustering to define unique response dynamics in dmPFC neurons, as opposed to other methods (e.g., agglomerative, K-means), based on preliminary analyses (see Figure 1—figure supplements 3 and 4). Although these analyses suggested that spectral clustering provides the best separation of dmPFC neurons into groups, likely due to its ability to separate dynamic and non-singular response features, these methods overall are far from perfect. Simply put, we are trying to simplify a heterogeneous pattern of activity into homogeneous groups, which given current methodologies is only possible to some degree. Further advancement of clustering and other computational methodologies should continue to improve our ability to detect and understand unique response patterns within complex brain circuits.

One caveat to our study is the possibility that observed changes in neuronal activity could be driven by factors other than associative learning, such as non-associative learning (e.g., habituation) or feeding in general. However, there are several lines of evidence that suggest otherwise. First, cluster-specific responses or adaptations in activity across learning were generally distinct for CS+ and CS- trials (see Figure 4), suggesting that the observed adaptations are related to cue-reward associative information. Second, cue discrimination, lick, and reward decoding improved over time, despite mice consuming the reward both before and after learning (see Figure 5A and D). Finally, previous evidence in the same behavioral task reveals that both excitatory and inhibitory responses among subpopulations of dmPFC pyramidal neurons are critical for cue-reward associative learning and cue-driven licking but not licking alone (Otis et al., 2017; Otis et al., 2019). Overall, evidence suggests that our identified adaptations in dmPFC activity dynamics are related to cue-reward associative learning. However, the possibility that there are components not related to associative learning should certainly be considered. Another consideration to note is that the observed conditioned licking responses could be influenced by the effects of water restriction, as well as the ratio of sucrose/water used as a reward in our behavioral paradigm (Davey and Cleland, 1982; Harris and Thein, 2005; Tabbara et al., 2016). However, we did not test the influence of these variables in the current experiments.

Here, we identify several distinct dmPFC excitatory neuronal ensembles during a Pavlovian conditioned licking task. Despite the apparent simplicity of this task, our findings reveal complex and specialized coding patterns among these heterogeneous neuronal ensembles, which are unlikely to be specific to one projection pathway or gene expression profile (Otis et al., 2017). Furthermore, our data suggest that unique aspects of reward seeking may be controlled by distinct neuronal ensembles. Functionally targeting each neuronal ensemble independently, such as through ensemble-specific single-cell optogenetic experiments, is therefore critical for understanding how these complex coding patterns control behavioral output (Marshel et al., 2019). Although our results improve our working knowledge of the unique excitatory neuronal ensembles within dmPFC during conditioned reward seeking, they also highlight critical gaps in the field of neuroscience that are important to resolve through new and emerging neurotechnologies.

Materials and methods

Subjects

Male and female C57BL/6J mice (8 weeks of age/25–35 g at study onset; Jackson Labs) were group-housed pre-operatively and single-housed post-operatively under a reversed 12:12 hr light cycle (lights off at 8:00 a.m.) with access to standard chow and water ad libitum. Experiments were performed in the dark phase and in accordance with the NIH Guide for the Care and Use of Laboratory Animals with approval from the Institutional Animal Care and Use Committee at the Medical University of South Carolina.

Surgeries

Mice were anesthetized with isoflurane (0.8–1.5% in oxygen; 1 L/min) and placed within a stereotactic frame (Kopf Instruments) for cranial surgeries. Ophthalmic ointment (Akorn), topical anesthetic (2% Lidocaine; Akorn), analgesic (Ketorolac, 2 mg/kg, i.p.), and subcutaneous sterile saline (0.9% NaCl in water) treatments were given pre- and intra-operatively for health and pain management. Before lens implantation, a virus encoding the calcium indicator GCaMP6s (AAVdj-CaMK2α-GCaMP6s; UNC Vector Core) was unilaterally microinjected into the dmPFC (specifically targeting prelimbic cortex; 400 nL; anterior-posterior [AP], +1.85 mm; medial-lateral [ML], −0.50 mm; dorsal-ventral [DV], −2.45 mm). Next, a microendoscopic gradient refractive index lens (GRIN lens; 4 mm long, 1 mm diameter; Inscopix) was implanted dorsal to dmPFC (AP, +1.85 mm; ML, −0.50 mm; DV, −2.15 mm) as previously described (Otis et al., 2017; Resendez et al., 2016). A custom-made ring (stainless steel; 5 mm ID, 11 mm OD) was then adhered to the skull using dental cement and skull screws. Head rings were scored on the base using a drill for improved adherence. Following surgeries, mice received antibiotics (Cefazolin, 200 mg/kg, sc) and recovered with access to food and water ad libitum for at least 21 days. Histology was performed after the experiments to ensure virus placement in dmPFC and lens placement dorsal to dmPFC GCaMP6s-expressing neurons.

Behavioral procedure

Mild water restriction facilitates appetitive learning in head-restrained mice, particularly when the reinforcer is sucrose mixed in water (Guo et al., 2014). Additionally, mild water restriction plus head-restraint results in minimal signs of distress while allowing simultaneous two-photon imaging across many trials of an appetitive behavioral task (Guo et al., 2014; Goltstein et al., 2018; Otis et al., 2017; Otis et al., 2019; Namboodiri et al., 2019). Thus, we used water restriction in combination with Pavlovian conditioning for a liquid sucrose reward to study appetitive learning in mice. Following recovery from surgery, mice were water restricted (water bottles removed from cages), and ~1 mL of water was delivered every day to a dish placed within each home cage (we gave less or more water during beginning phases of each experiment in attempt to calibrate weights to 90% of starting weights, which were 25–35 g). No health issues related to dehydration arose at any point during or after implementation of this protocol, as previously reported (Guo et al., 2014). Once mice reached 87.5–92.5% of their free drinking weight, they underwent 3 days of 30 min habituation sessions, during which they were head-restrained and received droplets of liquid sucrose (12.5% sucrose in water; ~2.0 μL per droplet, ~0.1 mL total per session) at random intervals through a gravity-driven, solenoid-controlled lick spout (see Figure 1A). Next, mice underwent head-fixed Pavlovian conditioning, wherein two conditioned stimuli (CS; 70 dB; 3 or 12 kHz as described in Otis et al., 2017) were randomly presented 50 times each (see Figure 1B). One tone (CS+) was paired to the delivery of a sucrose reward (12.5% sucrose in water, ~2.0 μL per droplet, ~0.1 mL total per session) after a 1 s trace interval, whereas the other tone (CS-) did not result in sucrose delivery. 
The trace interval was included to allow isolated detection of sensory cue- and sucrose reward-related neuronal activity patterns, as described previously (Otis et al., 2017). The inter-trial interval between the previous reward delivery (CS+ trials) or equivalent time epoch for unrewarded trials (CS- trials) and the next cue was chosen as a random sample from a uniform distribution ranging from 20 to 50 s. Cue discrimination was quantified using the normalized area under a receiver operating characteristic (2 × (auROC-0.5)) formed by the number of baseline-subtracted licks during the CS+ vs. CS- trace intervals (1 s epoch after tone offset). For all behavioral experiments, we classified sessions as ‘early’ or ‘late’ in learning based on animals’ behavioral performance, quantified by their cue discrimination (early, any sessions before auROC < 0.3; late, any sessions after auROC > 0.31). Recordings from early in learning and late in learning sessions were on separate days, as only one session was given per day (see Figure 1—figure supplement 1). Mice received ~1 mL of water placed in a dish in their home cage after each conditioning session. Most mice readily acquired this task, but some required more sessions (without imaging) to discriminate between the cues. A small subset of animals had their head rings fall off during initial phases of the experiment, and thus their neural data is only included for early in learning behavioral sessions (n = 3 mice).
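The trial structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the task code used in the study (which ran through MATLAB and Arduino): the helper name `make_schedule`, its parameters, and the fixed seed are hypothetical.

```python
import random


def make_schedule(n_per_cue=50, iti_range=(20.0, 50.0), seed=0):
    """Return (cue, iti_seconds) pairs for one conditioning session.

    Hypothetical helper: 50 CS+ and 50 CS- trials in random order, each
    paired with an inter-trial interval drawn from a uniform distribution
    spanning 20 to 50 s.
    """
    rng = random.Random(seed)
    cues = ["CS+"] * n_per_cue + ["CS-"] * n_per_cue
    rng.shuffle(cues)  # random interleaving of the two tones
    return [(cue, rng.uniform(*iti_range)) for cue in cues]


schedule = make_schedule()
```

Each entry pairs a cue with the interval that separates that trial's reward epoch (or the equivalent unrewarded epoch) from the next cue.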

Multiphoton imaging

We visualized and longitudinally tracked GCaMP6s-expressing dmPFC neurons throughout Pavlovian reward conditioning using a multiphoton microscope (Bruker Nano Inc) equipped with a hybrid scanning core with galvanometers and fast resonant scanners (>30 Hz; we recorded with four frame averaging to improve signal-to-noise ratio), GaAsP photodetectors with adjustable voltage and gain, a single green/red NDD filter cube, a long working distance 20× air objective designed for optical transmission at infrared wavelengths (Olympus, LCPLN20XIR, 0.45NA, 8.3 mm WD), a moveable objective in the X, Y, and Z dimensions, and a tunable InSight DeepSee laser (Spectra Physics, laser set to 920 nm, ~100 fs pulse width). Data were acquired using PrairieView software and converted into an hdf5 format for motion correction using SIMA (Kaifosh et al., 2014). FOVs were visualized from 0 to 300 μm beneath the GRIN lens and were selected before ‘early in learning’ imaging sessions. Each FOV was separated by at least 50 μm of objective movement in the Z-plane to avoid visualization of the same cells in multiple FOVs. Due to non-linear ray transformation introduced by the GRIN lens, this was especially important when imaging deeper FOVs. To ensure there was no signal bleed-through from superficial FOVs, a full cell layer was visualized between each FOV. Since neurons are ~20 µm in diameter, visualization of three layers (two imaging planes and an in-between plane) led to roughly 50 µm of z-movement and isolation of unique cell layers (Figure 1—figure supplement 2).

We attempted to image each FOV twice, once during a session early in learning and once during a session late in learning. The exact number of conditioning sessions per animal depended on the speed of learning and the number of visualizable FOVs in that animal. For example, if an animal had two FOVs, FOV 1 would be imaged on its first day of conditioning and FOV 2 would be imaged on its second day (these FOVs were ‘early in learning’ if cue discrimination scores remained below 0.3). Once the animal learned (cue discrimination scores above 0.31), both FOVs would be reimaged during separate conditioning sessions (these FOVs were ‘late in learning’). Within each FOV, regions of interest around each cell were manually traced using the ‘polygon selection’ tool in FIJI (Schindelin et al., 2012). Care was taken to only assign regions of interest to visually distinct cells, and each region of interest was confirmed independently by an observer who was blind to the experimental conditions. In cases where neighboring cells or processes overlapped, regions of interest were drawn to exclude areas of overlap. The blind observer also ensured that cells were clearly in view (cells were not counted if the center-of-mass was not in focus) and that neurons were not visible in multiple FOVs (if they were, the imaging plane would not be used; n = 0). To do so, the blind observer examined z-stack videos to determine if the neurons between FOVs were independent (see Figure 1—video 1). Fluorescence trace extraction and all subsequent analyses were performed using custom-written Python code (Namboodiri et al., 2019; Otis et al., 2019).

Data collection and statistics

Behavioral sessions were controlled through a custom MATLAB graphical user interface connected to Arduino and associated electronics. Transistor-transistor logic pulses between the Arduino and the microscope were used to start and stop imaging and behavioral programs, and to allow frame timestamp collection for post hoc synchronization of the behavioral and imaging data. Behavioral data were recorded and extracted using MATLAB, analyzed and graphed using Python and/or Graphpad Prism, and figures were produced using Adobe Illustrator. Behavioral data were presented as normalized auROC ‘cue discrimination’ scores (2 × (auROC-0.5)), comparing licking rates during the 1 s period between each CS+ and reward (1 s trace interval) or CS- and equivalent no-reward epoch (1 s trace interval). A cue discrimination score of −1.0 would therefore suggest more licking during all CS- trials vs. CS+ trials. In contrast, a score of +1 would suggest more licking on all CS+ trials vs. CS- trials. Following data collection, a two-way ANOVA was used to compare baseline-subtracted lick rates (Δ lick rate; calculated as: 1 s trace interval licking frequency – 3 s baseline licking frequency), followed by Bonferroni multiple comparisons tests if appropriate (Otis et al., 2017).
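As a worked illustration of the normalized cue discrimination score, the sketch below computes auROC by pairwise comparison of trials and rescales it to the range −1 to +1. It is a minimal example rather than the analysis code from the study; the function names and the lick counts are hypothetical.

```python
def auroc(pos, neg):
    """Area under the ROC curve by pairwise comparison (ties count as 0.5)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cue_discrimination(cs_plus_licks, cs_minus_licks):
    """Normalized score 2 * (auROC - 0.5), ranging from -1 to +1."""
    return 2.0 * (auroc(cs_plus_licks, cs_minus_licks) - 0.5)


# Hypothetical baseline-subtracted lick counts during the 1 s trace interval
cs_plus = [3, 4, 2, 5, 4]
cs_minus = [0, 1, 0, 1, 0]
score = cue_discrimination(cs_plus, cs_minus)
```

Because every CS+ trial in this toy example has more licks than every CS- trial, the score sits at the upper bound of +1; identical lick distributions would give 0.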

Fluorescence signals from each cell were extracted following motion correction using custom-written Python code. Activity in each cell was then aligned to each CS- and CS+ trial, which included a ~3 s baseline (23 frames), ~3 s cue period (including CS+ or CS- trace intervals; 23 frames), and ~3 s reward period (following sucrose delivery for CS+ trials or equivalent no-reward epoch during CS- trials; 23 frames). This resulted in 69 frames for each CS+ and CS- trial, which was then combined into a 138-column vector of data points for each cell, referred to as a peristimulus time histogram (PSTH). Due to the robust responses of dmPFC neurons late in learning (as also seen in Otis et al., 2017), data included within a 138 × 1511 vector (138 frames, 1511 neurons) from late in learning sessions then underwent principal components analysis to reduce their dimensionality in preparation for clustering, an unbiased means of identifying putative neuronal ensembles in dmPFC. We used an analysis and code that was previously created by others and kindly shared (full description of clustering in Namboodiri et al., 2019). The principal components were determined using the point of inflection on a scree plot, which graphs the PSTH variance explained vs. an increasing number of principal components (Figure 1—figure supplement 3A). Beyond this inflection point, minimal variability can be explained by additional principal components. The data were then projected onto the subspace formed by these principal components (Figure 1—figure supplement 3B), which was fed into the clustering algorithm. We used the Scikit-learn function sklearn.cluster.SpectralClustering to perform spectral clustering on these data, which uses a k-nearest neighbor connectivity matrix to create clusters. The optimal number of clusters and nearest neighbors were determined by checking a range of values for each and choosing the parameters with the maximum silhouette score. 
After spectral clustering, each neuron was assigned a label based on its corresponding cluster. Results from spectral clustering, including the silhouette scores and formed ensembles, were compared to agglomerative (‘hierarchical’) clustering and k-means clustering datasets, each of which was also performed using Scikit-learn. Silhouette plots (Figure 1—figure supplement 3C–E) revealed that spectral clustering performed better than agglomerative and k-means clustering algorithms, although all three clustering algorithms separated neurons in a manner that resulted in similar neuronal response dynamics across five clusters (see Figure 1G–H for spectral clustering and Figure 1—figure supplement 4 for agglomerative and k-means clustering). Due to the improved performance of spectral clustering over agglomerative and k-means, we used spectral clustering for all additional datasets as previously described (Namboodiri et al., 2019).
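The clustering pipeline described above (dimensionality reduction, spectral clustering on a nearest-neighbor graph, and parameter selection by silhouette score) can be sketched as follows. This is an illustrative example on synthetic data, not the shared analysis code referenced in the text: the synthetic PSTH templates, the fixed number of principal components, and the parameter grid are all assumptions.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Synthetic stand-in for the PSTH matrix: 120 neurons x 138 frames
# (69 frames for CS+ trials followed by 69 frames for CS- trials).
n_frames = 138
templates = np.zeros((3, n_frames))
templates[0, 23:46] = 1.0   # excited during the CS+ cue period
templates[1, 23:46] = -1.0  # inhibited during the CS+ cue period
templates[2, 46:69] = 1.0   # excited during the CS+ reward period
psth = np.repeat(templates, 40, axis=0) + 0.5 * rng.standard_normal((120, n_frames))

# Reduce dimensionality. The component count is fixed here as an assumption;
# the study selected it from the inflection point of a scree plot.
scores = PCA(n_components=5, random_state=0).fit_transform(psth)

# Grid search over cluster number and neighbor count, keeping the
# combination with the highest silhouette score.
best = None
for n_clusters in range(2, 7):
    for n_neighbors in (10, 30, 60):
        model = SpectralClustering(
            n_clusters=n_clusters,
            affinity="nearest_neighbors",
            n_neighbors=n_neighbors,
            random_state=0,
        )
        labels = model.fit_predict(scores)
        if len(set(labels)) < 2:
            continue  # silhouette is undefined for a single cluster
        sil = silhouette_score(scores, labels)
        if best is None or sil > best[0]:
            best = (sil, n_clusters, n_neighbors, labels)

best_sil, best_k, best_nn, best_labels = best
```

Swapping `SpectralClustering` for `AgglomerativeClustering` or `KMeans` in the same loop would allow the kind of between-algorithm silhouette comparison described in the text.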

Spatial mapping was performed for all FOVs recorded during late in learning behavioral sessions, since the clustering analysis was performed using activity patterns measured during those sessions (see above). To estimate the relative spatial locations along the AP and ML axes, we measured the central point of each neuron within a 512 × 512 standard deviation projection image following XY plane transformation (to account for GRIN lens light reflection; Figure 1—figure supplement 5A). Additionally, we estimated the relative spatial locations along the DV axis by measuring the amount of objective movement along the Z axis after focusing on the bottom of the lens (Figure 1—figure supplement 5B). Kernel density estimates for cell locations relative to other cells are shown in histograms along each axis in Figure 5. It should be noted that due to non-linear ray transformation through the GRIN lens, variability in lens angles, variability in head ring tilt, and general difficulty identifying the exact location of each neuron post-mortem, these anatomical locations are highly approximate and are not to a precise scale. Despite these limitations, the data provide a general idea of the spread of neurons, grouped by cluster, along the AP, ML, and DV axes in our experiments.

We compared the behavioral performance of each mouse with the number of neurons detected per ensemble in the corresponding FOV visualized during that behavioral session. Specifically, we used Pearson’s correlations to compare cue discrimination scores (normalized auROC, CS+ vs. CS-) with the number of neurons detected in each ensemble during late in learning behavioral sessions (one unique FOV per session; Figure 2A). Additionally, we used Pearson’s correlations to compare the initiation of licking ‘errors’, wherein mice increased licking rates following the presentation of the CS- (normalized auROC, CS- vs. baseline epoch), with the number of detected neurons per ensemble (Figure 2B).
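The correlation analysis above can be illustrated with a small Pearson's r computation. The values below are invented for demonstration only and are not data from the study; `pearson_r` is a hypothetical helper written with the standard library.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical values: one cue discrimination score and one cell count per FOV
discrimination = [0.35, 0.50, 0.62, 0.80, 0.91]
cluster_counts = [4, 7, 9, 12, 15]
r = pearson_r(discrimination, cluster_counts)
```

In practice, a library routine such as `scipy.stats.pearsonr` would also return the associated p-value.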

Decoding analyses were employed to determine whether neuronal activity within each FOV, and each ensemble within each FOV, could predict variables within the behavioral paradigm better than by chance. Specifically, we used these neuronal data to inform a decoder to predict (1) CS+: 2 s CS+ epoch vs. 2 s baseline, (2) CS-: 2 s CS- epoch vs. 2 s baseline, (3) CS+ vs. CS-: 2 s CS+ epoch vs. 2 s CS- epoch, (4) Reward: 1 s epoch starting at sucrose delivery vs. 1 s pre-sucrose baseline, and (5) Licking: relative licking rate for each mouse during CS+ trials; 6 s epoch starting from CS+ onset to include anticipatory licking and sucrose consumption. Decoding was performed on the entire dataset (population decoding, Figure 3A–B), and separately for each ensemble (ensemble contribution, Figure 3C–D). To perform these analyses, we used a binary decoder as described previously (Otis et al., 2017; Otis et al., 2019), implemented using the Scikit-learn functions sklearn.discriminant_analysis, sklearn.svm, and sklearn.decomposition. For population decoding (decoding based on the activity of all neurons within each FOV), decoding scores were normalized to the maximum value observed for a decoded variable (which was CS+ vs. CS-, late in learning). For ensemble-specific decoding (decoding based on the activity of all neurons within each ensemble within each FOV), scores were normalized to the maximum for each variable, such that the contribution of each ensemble to variable decoding could be evaluated. To determine whether decoding performance was significantly better than chance, we compared the decoding accuracy to that of randomized ‘shuffled’ data using two-way ANOVAs (population decoding) or one-way ANOVAs (ensemble contribution), followed by program-recommended post hoc analysis (Dunnett’s for one-way ANOVA, Tukey’s for two-way ANOVA). 
Because shuffled data were not significantly different between ensembles, these data were combined to improve data visualization and simplify analysis (dotted lines in Figure 3C).
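The comparison of decoding accuracy against shuffled data can be sketched as below. This is a simplified illustration, not the decoder used in the study: it uses only a linear discriminant classifier with 5-fold cross-validation on synthetic data, and the trial counts, effect size, and number of shuffles are assumptions.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic data: 100 epochs x 20 neurons; label 1 = cue epoch, 0 = baseline.
n_trials, n_neurons = 100, 20
labels = np.repeat([0, 1], n_trials // 2)
activity = rng.standard_normal((n_trials, n_neurons))
activity[labels == 1] += 1.5  # cue epochs shift every neuron's mean response


def decode(X, y):
    """Mean 5-fold cross-validated accuracy of a linear discriminant decoder."""
    return cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5).mean()


real_accuracy = decode(activity, labels)

# Chance level: repeat the decoding after randomly permuting the labels.
shuffled_accuracy = np.mean(
    [decode(activity, rng.permutation(labels)) for _ in range(20)]
)
```

Restricting the columns of `activity` to the neurons of one ensemble would give an ensemble-specific version of the same comparison.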

Specific neurons could be reliably identified across days based on structure and relative position within each FOV (Figure 4A), allowing us to evaluate the evolution and maintenance of activity in single neurons across days. To do so, single-cell tracking was performed from early to late in learning behavioral sessions to determine neuronal response evolution across learning (Figure 4). Additionally, tracking was performed across two late in learning behavioral sessions each separated by a minimum of 48 hr to evaluate post-learning response adaptation or maintenance (Figure 6). Cell tracking was performed by a student blinded to experimental conditions to reduce the potential for experimenter-related biases. Cells were identified based on relative position and morphology across days, with conservative longitudinal tracking to prevent between-cell comparisons. Following initial cell tracking, a second experimenter confirmed accurate day-to-day cellular co-registration for cross validation. Two-way ANOVAs were used to compare the mean response (3 s baseline vs. 6 s cue/reward epoch) of each cluster across cues (CS+ vs. CS-) and time (e.g., early vs. late in learning; Figure 4D). Additionally, Pearson’s correlation analyses were used to determine linear association of responses for each cluster across days (Figures 4E–F and 6B) or across trials within sessions (Figure 5C and F). We performed control Pearson’s correlation analyses for each tracking experiment wherein the student-identified tracking IDs within each FOV (but not between) were shuffled (Figure 4—figure supplement 1; Figure 6—figure supplement 1) to confirm that correlated activity patterns required accurate cell tracking. Finally, we used a cross-correlation analysis to identify lags for correlated activity patterns for each neuron as compared to all other neurons in each FOV for tracked datasets, which we then separated by cluster and averaged across each cluster-to-cluster comparison. 
To perform the cross-correlation analysis, we used the fluorescence signal of each neuron measured across the entire session to feed the SciPy functions scipy.signal.correlate and scipy.signal.correlation_lags. Optimal lags were then averaged across all neurons for each cluster-to-cluster comparison and plotted within a heat map (Figure 4—figure supplement 2). It should be noted that the cross-correlation analysis has limitations due to the temporal dynamics of the calcium indicator and general two-photon imaging speed, preventing spike-to-spike comparisons across neurons. Rather, the analysis allows assessment of general correlations, and associated lags, across neurons on a longer timescale (hundreds of milliseconds).
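A minimal sketch of how an optimal lag between two fluorescence traces might be obtained with SciPy is shown below. The traces are synthetic, and the helper `optimal_lag`, the mean subtraction, and the 5-frame offset are assumptions made for illustration rather than details taken from the study.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)

# Synthetic traces: neuron_b repeats neuron_a five frames later.
neuron_a = rng.standard_normal(500)
neuron_b = np.roll(neuron_a, 5)


def optimal_lag(x, y):
    """Lag in frames at which x best matches y; positive means x follows y."""
    x = x - x.mean()
    y = y - y.mean()
    corr = signal.correlate(x, y, mode="full")
    lags = signal.correlation_lags(len(x), len(y), mode="full")
    return int(lags[np.argmax(corr)])


lag_frames = optimal_lag(neuron_b, neuron_a)
```

Dividing the lag in frames by the effective frame rate of the recording converts it to seconds, which is how lags on the order of hundreds of milliseconds would be expressed.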

Raw behavioral data and fluorescent signals from imaging datasets, along with example images presented throughout this manuscript, can be found in an open-source data repository (https://doi.org/10.5061/dryad.xksn02vg8).

Data availability

All code and data generated for this study are available on Dryad Digital Repository, accessible here: https://doi.org/10.5061/dryad.xksn02vg8. Raw imaging videos are available on request.

The following data sets were generated
    1. Grant RI
    2. Doncheck EM
    3. Vollmer KM
    4. Winston KT
    5. Romanova EV
    6. Siegler PN
    7. Holman H
    8. Bowen CW
    9. Otis JM
    (2021) Dryad Digital Repository
    Behavioral and Imaging data for: Grant et al. (2021). Specialized coding patterns among dorsomedial prefrontal neuronal ensembles predict conditioned reward seeking.
    https://doi.org/10.5061/dryad.xksn02vg8

References

Decision letter

  1. Mario Penzo
    Reviewing Editor; National Institute of Mental Health, United States
  2. Kate M Wassum
    Senior Editor; University of California, Los Angeles, United States
  3. Maria M Diehl
    Reviewer; Kansas State University, United States
  4. Anthony Burgos-Robles
    Reviewer; University of Texas San Antonio, United States

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

This study offers novel insight on neurophysiological mechanisms occurring in the prefrontal cortex (PFC) during the learning of cue-reward associations. The results will be of great interest to scientists in the behavioral and systems neuroscience fields, but also to the broader scientific audience due to the use of challenging in vivo techniques, computational analyses, and statistical methods.

Decision letter after peer review:

Thank you for submitting your article "Specialized coding patterns among dorsomedial prefrontal neuronal ensembles predict conditioned reward seeking" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Kate Wassum as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Maria M Diehl (Reviewer #2); Anthony Burgos-Robles (Reviewer #3).

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential Revisions:

The reviewers agree that the authors' identification of dmPFC neuronal ensembles with heterogeneous coding patterns offers important insight about the neurophysiological mechanisms governing cue-reward learning. However, as independently outlined below by each one of the reviewers, there are multiple aspects of the study that must be strengthened before the paper can be considered for publication in eLife. With the exception of the optogenetic behavioral manipulations requested by Reviewer # 3, we consider that all other concerns raised by the reviewers must be addressed in full. Specifically, the authors must address:

1) All technical concerns regarding the imaging technique that were raised by Reviewer # 1.

2) All statistical and data analysis concerns raised by Reviewers #1, 2 and 3.

3) Additional clarifications of methods and analyses, and an improved discussion as suggested by the reviewers.

4) The concerns about the behavioral design raised by Reviewer #1.

5) Due to the lack of causality, revise the text to soften the language a bit in some of the sentences describing the interpretation.

Please notice that addressing these concerns will require, at a minimum, the incorporation of new data analyses, validation data, and an extensive revision of the manuscript's text.

If you have not already done so, please include a key resource table.

Reviewer #1:

The manuscript addresses a critical question in cortex and neuroscience in general – how do neuronal coding patterns lead to behavioral outputs and learned behaviors? While the manuscript takes a technically innovative approach there are multiple issues with the behavioral design, imaging, and interpretation.

There are issues with the use of multiple imaging planes in each animal and with the longitudinal co-registration. Regarding the FOVs, the authors report that each imaging plane was separated by 50 μm. However, since GRIN lenses display non-linear ray transformations in both the lateral and axial planes, movement of the external objective in 50 μm steps cannot be assumed to produce a 50 μm change in imaging plane. Even if we are to accept that no cells were double counted, a more critical issue is that collection efficiency, and therefore SNR of the recording, will be altered as a function of the distance of each neuron from the ideal focal plane of the implanted GRIN endoscope. Additionally, more information is needed on the longitudinal registration including how these data were validated and the percentage of neurons tracked.

With respect to the behavior, it is not clear whether the changes are the result of reward learning or more simply related to non-associative variables like habituation, lick rate, or volume consumed.

The manuscript from Grant et al., explores heterogeneity in coding patterns of mPFC pyramidal neurons during reward learning. The manuscript addresses a critical question in cortex and neuroscience in general – how do neuronal coding patterns lead to behavioral outputs and learned behaviors? While the manuscript takes a technically innovative approach there are multiple issues with the behavioral design, imaging, and interpretation. These issues are addressable, and the manuscript has potential for high impact in the field, but to support the current conclusions would require significant additional analysis and experimentation. Issues are listed below:

1. There are major issues with the imaging methodologies, particularly with the use of multiple imaging planes in each animal and with the longitudinal co-registration. Regarding the FOVs, the authors report that each imaging plane was separated by 50 μm. However, since GRIN lenses display non-linear ray transformations in both the lateral and axial planes, movement of the external objective in 50 μm steps cannot be assumed to produce a 50 μm change in imaging plane. Indeed, in the representative images the same vasculature can be seen in multiple planes, and the veins in this area are often smaller than 50 μm. Though it is difficult to discern, it appears that the same cell constellations appear in multiple planes in the representative images.

2. Even if we are to accept that no cells were double counted, a more critical issue for the current claims of the manuscript is that collection efficiency, and therefore SNR of the recording, will be altered as a function of the distance of each neuron from the ideal focal plane of the implanted GRIN endoscope. This is a potentially critical flaw without significant additional analysis. For example, all of the clustering analysis could be highly biased by the number of neurons that were included from each imaging plane, which is likely to vary from animal to animal. While this is always somewhat of a concern with GRIN imaging, because 3-4 imaging planes were used in each animal and clustering analysis was performed on pooled data, it is possible that differences in SNR across imaging planes are driving many of the effects in the manuscript.

3. Regarding longitudinal registration, minimal methodological information is provided, which is concerning given that this is a notoriously difficult endeavor, especially in dense recordings such as these. How were these data validated? Was the data set scored by a second experimenter for cross validation? What percent of neurons were tracked? Was any network analysis of cell location used to verify results?

4. While the issues with the imaging are critical to address, it is likely that in-depth analysis could resolve the problems without the need for additional experiments. However, there are also some problems with the behavioral design – these will either require additional experiments or require that the claims of the manuscript be altered. All of the changes in mPFC dynamics that occur across the behavioral task are claimed to be related to reward learning, but there are several processes that are not parsed in the task design. For example, would some of these changes happen with habituation? Would some of these changes happen with lick rate or volume consumed? None of those are dependent on associative learning and could just as strongly predict the changes that are seen in the dataset.

5. What is the justification for using water restriction combined with a sucrose solution in water as a reinforcer? Given that sucrose functions as a reinforcer without water restriction and that water functions as reinforcer under water deprived conditions, it is unclear whether the water or the sucrose is functioning as the primary reinforcer.

Reviewer #2:

Grant et al., used two-photon calcium imaging of dorsomedial prefrontal cortical neurons to examine the neuronal ensemble activity during a sucrose conditioning task in head-fixed mice. Using a spectral clustering algorithm, the authors characterized ensemble activity into 5 distinct clusters whose activity correlated with various aspects of the task: CS+ responding, CS- responding, CS discrimination, reward responding, and licking behavior. Cluster 1 exhibited excitatory responses to both CSs and reward delivery, Cluster 2 exhibited excitatory responses to CS+ only, Cluster 3 exhibited excitatory responses to both CSs, Cluster 4 exhibited excitatory responses to reward delivery only, and Cluster 5 exhibited inhibitory responses to CS+. Next, the authors determined whether each Cluster predicted licking behavior across each conditioning session for each mouse. They found that the proportions of neurons in Cluster 5 positively correlated with successful licking behavior (licking in response to CS+), whereas proportions of neurons in Cluster 3 positively correlated with licking errors (licking in response to CS-). The authors were next interested in whether the neural activity across all dmPFC neurons (regardless of cluster ensembles) predicted task events or animal licking behavior during early vs. late in learning sessions of the task. Overall, CS presentation, CS discrimination, reward delivery, and licking rate were predicted by dmPFC activity during late, but not early, in learning. Taking into account the cluster ensembles, the authors also identified whether the activity of each cluster could predict CS presentation, CS discrimination, reward delivery, and licking rate. 
Finally, the authors assessed whether the cluster ensembles remain stable after learning; Cluster 1 showed robust responses to both CSs and reward delivery during both early and late in learning sessions of the task, whereas Clusters 2-5 did not, suggesting that these latter Clusters changed their activity patterns as a function of learning. The authors conclude that excitatory neuronal ensembles in dmPFC differentially predict events and behaviors during a sucrose conditioning task and the responses can change across learning. The conclusions of the paper are supported by the data; however, some aspects of the paper need to be clarified, additional analyses performed, and more interpretation of the data is warranted in order to strengthen the importance of this study.

More clarity is needed on how sessions were classified as early or late, and on whether the results would be similar if the classification were based on trial number instead of auROC across sessions.

The cluster analysis would benefit from the addition of location information to determine whether the 5 identified clusters are anatomically segregated. The data would also benefit from a cross-correlation analysis to reveal if there are any significant interactions between neurons within the same FOV, determine whether they are from the same cluster, whether cells active early in learning interact with cells that are active late in learning, and whether these cross-correlations remain stable after learning.

1. Behavioral sessions were classified as either "early in learning" or "late in learning" and are defined based on the animal's performance (using auROC) across multiple sessions of the sucrose conditioning task. The authors perform an independent t-test comparing performance in early vs. late in learning sessions; however, these 2 groups of sessions were pre-defined by the animal performance itself. Therefore, it is inappropriate to use a statistical test since the periods of early and late were selected based on the behavioral data (i.e. doing a statistical test on data that was pre-selected). A statistical test is not needed here, but the authors should emphasize that the number of early vs late in learning sessions were unique to each animal and selected based on their performance (this is only mentioned in the methods, but authors should consider reiterating this point in the results). As a reader, it was very difficult to understand how early and late in learning was defined and I had to go back and read the methods multiple times and look through the results for clarification. I initially thought early vs. late in learning referred to early trials vs. late trials within a session. If early and late was defined based on trial number instead of auROC across sessions, are the behavioral results similar to what was reported? On average, how many sessions did it take for each mouse to reach criteria?

2. Because I was confused about early vs late in learning being within session vs. across session, I was also confused about the data analyzed in Figure 4. How many sessions were in early vs late? Was the miniscope lens advanced after each conditioning session? Was there a subset or a different set of neurons that were recorded from across multiple sessions/days? Also, why do the traces in Figure 1I look nearly identical to the traces in Figure 4B (lower)? Are the cells analyzed in Figure 4 a subset of those shown in Figure 1? Please clarify. I think it would also be helpful to use the same terms consistently throughout the text when referring to the conditioning sessions and to clearly state at the beginning of the results that one session was recorded per day and the lens was advanced to a new FOV (if that was the case) – sometimes sessions are referred to as FOVs or different days.

3. PFC-PVT neurons are located in lateral layers of PFC, whereas PFC-NAc neurons are located in more medial layers within PFC. Based on this anatomical distinction, it would be interesting to determine the location of the 5 different cluster ensembles in the layers of PFC, which might suggest cells in particular clusters project to PVT or NAc. For example, are Cluster 5 neurons largely located in lateral layers of dmPFC based on their similarity in neuronal activity to PFC-PVT neurons, whereas neurons from other Clusters are located in medial layers of PFC? This analysis would provide more evidence that Cluster 5 neurons in lateral dmPFC are likely projecting to PVT and show the inhibitory activity profile, whereas neurons from different Clusters in medial dmPFC are likely projecting to NAc and show an excitatory activity profile. This analysis would also provide more concrete interpretation of the data in the Discussion section.

4. In Figure 5, D1 and D2 are denoted to indicate dmPFC activity "across days after learning" (lines 621-622). Which conditioning sessions do these refer to – which session/day # relative to all sessions for each mouse? Are sessions D1 and D2 the first two days in Late in Learning sessions? Are these neurons a subset of the neurons recorded during the conditioning session and, if so, were they recorded in more ventral regions of dmPFC compared to Early in Learning sessions if the lens was advanced from dorsal to ventral along dmPFC? Could there be differences in neural processing of appetitive cues in dorsal vs. ventral Cg1?

5. One major advantage of 2-photon calcium imaging is the ability to measure calcium dynamics between neurons that are recorded simultaneously (i.e. measured within the same FOV). It would be interesting to perform a cross-correlation analysis to reveal if there are any significant interactions between neurons within the same FOV, determine whether they are from the same cluster, whether cells active early in learning interact with cells that are active late in learning, and whether these cross-correlations remain stable after learning.

6. Looking at activity across early vs late in learning, it was found that Cluster 1 was stable but Clusters 2-5 were not on the basis of responding to CSs. How did Clusters 2-5 change as a function of learning? Were they different based on reward delivery or licking behavior? Additional analyses are needed to strengthen this finding.

7. As it reads, the authors discuss their findings in relation to whether they agree with other studies and tools needed to answer questions about the role of specific cell types in prefrontal circuits for appetitive discrimination tasks. To strengthen the importance of this study, further discussion is needed that includes more interpretation of the data. Doing additional analyses will provide more findings to interpret, so the reader has a better grasp of the importance of this study.

8. In figure 3, CDF is undefined. Please define.

Reviewer #3:

In this study, Otis and colleagues evaluated the neural dynamics in the PFC governing reward learning, particularly those occurring during Pavlovian cue-sucrose associations. In particular, the study characterizes five unique neuronal ensembles exhibiting complex response patterns that seem to be relevant for the encoding of reward-predictive cues, the reward itself, and/or reward-related behavioral responses. The study also shows that the activity of these neuronal ensembles decodes behavioral variables better than chance. Interestingly, the study also shows that the activity of these neuronal ensembles during early stages of learning predicts their activity profile during late stages of learning, which remain stable afterwards.

The following list represents other strengths of the study.

1. New insights are revealed on neurophysiological mechanisms in the PFC governing cue-reward learning, using an in-vivo technique that provides great anatomical resolution, 2-photon calcium imaging.

2. This study doubles down on the validity and power of head-fixed preparations to evaluate neural dynamics and their relationship to behavioral output.

3. Computational and statistical analyses are used to disentangle neuron-to-neuron variability, and to cluster neurons into distinct categories based on their response patterns.

4. In addition, throughout the study, authors describe complex methods in easy-to-understand language and illustrations to facilitate understanding of otherwise complex neurophysiological datasets and analyses.

Despite these strengths, this study could still be improved in a couple of aspects to better support the main claims and conclusions.

1. The initial analysis to cluster neurons together based on their activity patterns during the task produced somewhat confusing results in which for instance neurons exhibiting either excitatory or inhibitory responses to certain task events (e.g., either CS+, CS-, reward, or licks) were clustered together.

2. In addition, this study can greatly benefit from additional experiments (e.g., optogenetics) to manipulate neural activity during certain task epochs to test the importance of the observed activity patterns.

In general, I feel enthusiastic about the prospect of publication for this article. However, I have several concerns that require further attention and revision to improve the overall impact of the study. As it stands, I believe this study is not yet ready for publication in eLife. But if my concerns are properly addressed with substantial revisions, I will feel even more enthusiastic to consider this paper for publication in eLife.

The list below highlights some issues, points of confusion, and suggestions for additional analyses or experiments, with the hope of improving the overall quality of the study.

1. Results from the initial analysis to separate neurons into distinct neuronal ensembles are confusing. While this analysis (PCA) revealed five "unique" neuronal ensembles that supposedly encode specialized information during cue-reward learning, it is quite confusing that within-cluster responses are still very heterogeneous. For example, as shown in the Figure 1H heatmaps, within-cluster responses varied widely and included excitatory responses (in purple), inhibitory responses (in green), and weak responses (colors in between) within each cluster. There is also heterogeneity in the temporal profile of the responses within each cluster. How or why did neurons exhibiting excitatory and inhibitory responses get clustered together? And why did neurons exhibiting very fast responses get clustered together with neurons exhibiting slower responses? What am I missing here? Perhaps the authors could try a different clustering method (e.g., hierarchical analysis) to either confirm their clusters or potentially reveal more homogeneous clusters. After all, in theory there could be many more neuronal clusters due to the many experimental variables analyzed (CS+, CS-, sucrose, sucrose omission, licks), all the possible ways neurons could respond (i.e., excitation, inhibition, no response, fast response, delayed response), and all possible response combinations (e.g., excitation to the cue, but inhibition to sucrose, etc).

2. Figure 1J summarizes the average within-cluster response patterns in PSTH form. Keeping in mind my first issue above, these PSTHs then seem misleading. For instance, the average PSTH for Cluster 2 shows selective excitatory responses to the CS+. Yet, heterogeneity can be appreciated in the heatmaps in Figure 1H, with even some neurons exhibiting inhibitory responses to the CS-. Again, the authors should consider reinforcing or revising these results using a different clustering analysis.

3. In Figure 4, authors attempted to evaluate the evolution of response patterns in the distinct neuronal ensembles across learning. They did so by comparing ensemble activity during early versus late learning sessions. While significant Pearson correlations were detected for most ensembles during CS+ and CS-, I am not convinced that this is the best analysis to explore the evolution of neuronal activity patterns across learning. These results may just indicate that activity patterns may have developed very rapidly early in training, even before significant learning was observed at the behavioral level. To overcome this, authors could instead compare the magnitude of responses across learning sessions, or even at different segments within the early sessions (e.g., first 10 trials, versus 10 subsequent trials, and so on) to better explore whether the magnitude of responses is amplified as training progresses.

4. Caution is recommended for the type of t-test used in some analyses. For example, an independent t-test was used to compare cue discrimination scores between two days in the same subset of animals. Should this rather be a paired sample t-test?

5. Finally, all findings in this study are of correlative nature. Thus, additional experiments are needed to reinforce some of the claims raised in the study. For instance, the last sentence in the abstract says – "Our results characterize the complex dmPFC neuronal ensemble dynamics that relay learning-dependent signals for prediction of reward availability and initiation of conditioned reward seeking". If this is true, then manipulations of neural activity during certain epochs should produce significant changes in behavioral responses. A potential additional experiment could then be optogenetic-mediated inhibition during the CS+ to see whether lick rates are impaired. While this experiment could be performed in a non-ensemble-specific manner (i.e., optogenetic inhibition of all excitatory neurons in the area), it would be even better if the microscope used by the authors has holographic stimulation capabilities to selectively manipulate particular ensembles based on their response pattern. This is consistent with the last suggestion by the authors towards the end of the discussion ("functionally targeting each neuronal ensemble independently…").
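
To illustrate the distinction raised in point 4 above, the following is a minimal sketch of how a paired and an independent t statistic differ when the same animals are measured on two days. The scores are hypothetical values chosen for illustration and are not taken from the manuscript; the function names are likewise illustrative. The independent version here is Welch's form, which is an assumption, since the review does not specify which variant was used.

```python
import math
import statistics


def independent_t(a, b):
    """Welch's t statistic, treating a and b as unrelated samples."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = math.sqrt(va / len(a) + vb / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def paired_t(a, b):
    """Paired t statistic: a one-sample t test on within-animal differences."""
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    diffs = [x - y for x, y in zip(a, b)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se


# Hypothetical cue-discrimination scores for the same five mice on two days.
day1 = [0.60, 0.72, 0.55, 0.81, 0.66]
day2 = [0.65, 0.78, 0.59, 0.88, 0.70]

t_ind = independent_t(day2, day1)
t_pair = paired_t(day2, day1)
print(f"independent t = {t_ind:.2f}, paired t = {t_pair:.2f}")
```

Because each mouse improves by a similar small amount, the between-animal spread dominates the independent test while the paired test removes it, so the two statistics can differ substantially on identical data.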

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Thank you for submitting your article "Specialized coding patterns among dorsomedial prefrontal neuronal ensembles predict conditioned reward seeking" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Kate Wassum as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Maria M Diehl (Reviewer #2); Anthony Burgos-Robles (Reviewer #3).

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential Revisions:

As you will find from the evaluation summaries provided by each of the reviewers below, the general consensus is that the revised manuscript has addressed most of the critiques raised after the initial submission. However, Reviewer #1 has raised a few remaining points that we feel must be addressed for the manuscript to be considered appropriate for publication in eLife.

Specifically, we deem it essential that, in addition to the authors' current graphics on the relative distance of clustered neurons from the GRIN lens, they provide a quantitative analysis of how this distance influences clustering or relation to behavior. For additional details on the requested analyses please refer to the comments from Reviewer #1.

We also ask that when referring to the behavioral procedure the authors adopt terminology that most accurately represents the conditions surrounding behavioral tests (second point from Reviewer #1).

If you have not already done so, please ensure your manuscript complies with the eLife policies for statistical reporting: https://reviewer.elifesciences.org/author-guide/full "Report exact p-values wherever possible alongside the summary statistics and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05."

Please include a key resource table.

Reviewer #1:

The authors were overall responsive to critiques and several of the issues raised in the previous reviews have been addressed. However, some concerns remain, the first of which still requires additional analysis prior to publication.

1. The most significant remaining issue is the potential effect of differential SNR across imaging planes due to the GRIN lens properties. The fact that clusters show differential patterns and not only differences in amplitude does not negate the potential impact of SNR on the clustering – the ability to detect a differential pattern of responses between neurons is dependent on sufficient SNR, which is evidenced directly in the dataset by the fact that some of the clusters are defined by a lack of response. The authors have made great improvements on dealing with this issue from the original submission but given that essentially all the claims in the manuscript are based on the clustering analyses some quantitative assessment should be provided.

I appreciate the authors' caution in using relative measurements to estimate the relative distance of clustered neurons from the GRIN lens – this is the most appropriate way to begin to approach the issue. However, the estimations are only graphically displayed, without quantitative analysis of their influence on clustering or relation to behavior, and the visualization of the data in Figure 1 – Supplement 5 makes it difficult to discern if there are topographical effects due to the number of overlapping points. In Figure 1 – Supplement 5B, how many neurons are in each line across the D/V axis? Is there a correlation between estimated location and probability of cluster membership? This is critical for determining if there is an influence of imaging plane on the clustering analysis. A complementary approach would be to subsample and perform the clustering analysis only from a subset of DV planes at a time and determine reproducibility of cluster membership and their relationships with behavior. This is most concerning for the interpretation of Figure 2, where differing numbers of neurons sampled from each plane across animals could easily produce spurious correlations that reflect sampling bias rather than biological relationships.

2. Regarding the use of concurrent thirst and sucrose to motivate behavior, while it is true that head-fixed procedures often include water deprivation, these procedures were developed to motivate engagement in sensory processing tasks, not to analyze the effects of the reward itself as in the current manuscript. This is highlighted in both of the citations provided by the authors (other than their previous work) – the Goldstein et al., reference also goes on to show that how the deprivation is performed (e.g. water vs food) can dramatically impact the resulting reward-conditioned behaviors. This is not necessarily an inherent flaw in the study, but with the current wording/claims it becomes an issue.

For example, the authors refer to the behavioral procedure as 'Pavlovian sucrose conditioning' throughout – would the conditioned response (anticipatory licking) still occur if only water was delivered? Given that mice typically drink ~4 mL per day and only ~1 mL is provided outside of the behavioral task, a strong argument can be made from the literature that the fluid has a much greater reinforcing/conditioning strength than the sucrose itself. I don't see any utility in empirically testing this, but given that the goal of the study is to examine conditioned reward seeking, at the very least accurate terminology should be used throughout (e.g. Pavlovian conditioned licking or similar). To facilitate integration with the literature it would also be useful to add a discussion point noting that this protocol is likely to influence sucrose palatability (e.g. PMID: 16248727) as well as magnitude and nature of conditioned responses (e.g. PMID: 26913541 and 16812301).

3. The authors should clarify in the methods when the homecage water was provided in relation to behavioral testing, as well as provide an estimate of the range of total fluid and sucrose consumed in a typical session.
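
One way to approach the question in point 1 above (whether estimated imaging plane is associated with cluster membership) is a permutation test on a plane-by-cluster contingency table. The sketch below is only an illustration under stated assumptions: the plane and cluster labels are hypothetical, the function names are invented for this example, and nothing here reproduces the authors' analysis.

```python
import random
from collections import Counter


def chi_square(planes, clusters):
    """Pearson chi-square statistic for a plane-by-cluster contingency table."""
    n = len(planes)
    plane_n = Counter(planes)
    cluster_n = Counter(clusters)
    observed = Counter(zip(planes, clusters))
    stat = 0.0
    for p, pn in plane_n.items():
        for c, cn in cluster_n.items():
            expected = pn * cn / n
            stat += (observed[(p, c)] - expected) ** 2 / expected
    return stat


def permutation_p(planes, clusters, n_perm=500, seed=0):
    """Shuffle cluster labels across neurons to build a null distribution."""
    rng = random.Random(seed)
    observed = chi_square(planes, clusters)
    shuffled = list(clusters)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if chi_square(planes, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical labels: 3 imaging planes with 30 neurons each.
planes = [p for p in range(3) for _ in range(30)]
# Case A: every plane holds the same mix of 5 clusters (no plane effect).
balanced = [i % 5 for _ in range(3) for i in range(30)]
# Case B: cluster identity is fully determined by plane (strong plane effect).
confounded = list(planes)

stat_a, p_a = permutation_p(planes, balanced)
stat_b, p_b = permutation_p(planes, confounded)
print(f"balanced: chi2={stat_a:.1f}, p={p_a:.3f}")
print(f"confounded: chi2={stat_b:.1f}, p={p_b:.3f}")
```

A small p-value in such a test would indicate that cluster assignment depends on imaging plane, which is the sampling-bias scenario the reviewer describes; a large one would not by itself rule out SNR effects.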

https://doi.org/10.7554/eLife.65764.sa1

Author response

Essential Revisions:

The reviewers agree that the authors' identification of dmPFC neuronal ensembles with heterogeneous coding patterns offers important insight about the neurophysiological mechanisms governing cue-reward learning. However, as independently outlined below by each one of the reviewers, there are multiple aspects of the study that must be strengthened before the paper can be considered for publication in eLife. With the exception of the optogenetic behavioral manipulations requested by Reviewer # 3, we consider that all other concerns raised by the reviewers must be addressed in full. Specifically, the authors must address:

1) All technical concerns regarding the imaging technique that were raised by Reviewer # 1.

We now provide substantial new experimental data and analyses to address concerns related to imaging techniques as raised by Reviewer #1, as can be found in new figures:

Figure 1 – Supplement 2

Figure 1 – Supplement 5

Figure 1 – Video 1

Figure 4 – Supplement 1

Figure 6 – Supplement 1

Additionally, we provide clarifications throughout the methods, as described point-by-point below, to address the methodological concerns. It should be noted that we are using techniques (both imaging and behavioral) that are well characterized in the field, as we now indicate through extensive referencing.

2) All statistical and data analysis concerns raised by Reviewers #1, 2 and 3.

We have addressed all statistical and data analysis concerns. In addition to the figures listed above, we have added the following new figures to address these concerns:

Figure 1 – Supplement 1,3-5

Figure 4C, 4D

Figure 4 – Supplement 2

Figure 5

3) Additional clarifications of methods and analyses, and an improved discussion as suggested by the reviewers.

We have clarified our methods, analyses, and have broadened our discussion as described point-by-point below.

4) The concerns about the behavioral design raised by Reviewer #1.

We have addressed these concerns in the methods and discussion, which includes referencing of papers from other labs that were integral in the development of the head-fixed behavioral assay described in the manuscript. It should be noted that the same behavioral assay and associated parameters have been used extensively in the field, including by the lead PI (Otis et al., 2017 Nature; Otis et al., 2019 Neuron; Namboodiri et al., 2019 Nature Neuroscience), due to its power for integrating two-photon imaging with behavioral models of associative learning.

5) Due to the lack of causality, revise the text to soften the language a bit in some of the sentences describing the interpretation.

We have softened the language of the manuscript as appropriately suggested.

Reviewer #1:

The manuscript from Grant et al., explores heterogeneity in coding patterns of mPFC pyramidal neurons during reward learning. The manuscript addresses a critical question in cortex and neuroscience in general – how do neuronal coding patterns lead to behavioral outputs and learned behaviors? While the manuscript takes a technically innovative approach there are multiple issues with the behavioral design, imaging, and interpretation. These issues are addressable, and the manuscript has potential for high impact in the field, but to support the current conclusions would require significant additional analysis and experimentation. Issues are listed below:

1. There are major issues with the imaging methodologies, particularly with the use of multiple imaging planes in each animal and with the longitudinal co-registration. Regarding the FOVs, the authors report that each imaging plane was separated by 50 µm. However, since GRIN lenses display non-linear ray transformations in both the lateral and axial planes, movement of the external objective in 50 µm steps cannot be assumed to produce a 50 µm change in imaging plane. Indeed, in the representative images the same vasculature can be seen in multiple planes, and the veins in this area are often smaller than 50 µm. Though it is difficult to discern, it appears that the same cell constellations appear in multiple planes in the representative images.

The point that cells could be visualized in multiple FOVs is a valid concern, and something that we addressed rigorously prior to performing our experiments (using practices based on the lead PI’s previous studies using microendoscopic GRIN lenses or cannula for deep brain 2-photon imaging and longitudinal cell tracking; see Otis et al., 2017 Nature; Namboodiri et al., 2019 Nature Neuroscience; Otis et al., 2019 Neuron; McHenry et al., 2017 Nature Neuroscience; Rossi et al., 2019 Science). We agree that the non-linear ray transformations through microendoscopic GRIN lenses make it difficult to know exactly the distance we are traveling in the X, Y, and Z imaging planes (which we now estimate in Figure 1 – Supplement 5). This is especially true for deeper fields of view (FOVs), which will have poorer SNR and thus more cells from surrounding FOVs could unfortunately be included in the imaging field. As such, we travel a minimum of 50 microns (objective movement) between each FOV (the number generally increases with greater depth). We now describe our protocol for distinct cell visualization in more detail in the methods (pages 20-21):

“Each FOV was separated by at least 50 μm of objective movement in the Z-plane to avoid visualization of the same cells in multiple FOVs. Due to non-linear ray transformation introduced by the GRIN lens, this was especially important when imaging deeper FOVs. To ensure there was no signal bleed-through from superficial FOVs, a full cell layer was visualized between each FOV. Since neurons are ~20 µm in diameter, visualization of 3 layers (two imaging planes and an inbetween plane) lead to roughly 50 µm of z-movement and isolation of unique cell layers (Figure 1 – Supplement 2).”

Additionally, we include a z-stack video (new Figure 1 – Video 1), which pauses at distinct FOVs to transparently demonstrate that the FOVs under study included non-overlapping neurons. Finally, we include a supplemental figure (new Figure 1 – Supplement 2) wherein we show three FOVs separated along the z-axis overlaid in separate colors, to better visualize the distinct cell constellations between imaging planes. Regarding the vasculature, although we agree that vessels can be small within mPFC, as seen in Figure 1 the capillaries are >3x the diameter of nearby neurons. Furthermore, the center of those capillaries is not in focus between FOVs, confirming that we are visualizing distinct imaging fields. Finally, we now clarify how distinct FOVs were identified and how ROIs were drawn to ensure identification of unique cellular layers (page 21):

“Care was taken to only assign regions of interest to visually distinct cells, and each region of interest was confirmed independently by an observer who was blind to the experimental conditions. In cases where neighboring cells or processes overlapped, regions of interest were drawn to exclude areas of overlap. The blind observer also ensured that cells were clearly in view (cells were not counted if the center-of-mass was not in focus) and that neurons were not visible in multiple FOVs (if they were, the imaging plane would not be used; n = 0). To do so, the blind observer examined z-stack videos to determine if the neurons between FOVs were independent (see Figure 1 – Video 1).”

2. Even if we are to accept that no cells were double counted, a more critical issue for the current claims of the manuscript is that collection efficiency, and therefore SNR of the recording, will be altered as a function of the distance of each neuron from the ideal focal plane of the implanted GRIN endoscope. This is a potentially critical flaw without significant additional analysis. For example, all of the clustering analysis could be highly biased by the number of neurons that were included from each imaging plane, which is likely to vary from animal to animal. While this is always somewhat of a concern with GRIN imaging, because 3-4 imaging planes were used in each animal and clustering analysis was performed on pooled data, it is possible that differences in SNR across imaging planes are driving many of the effects in the manuscript.

The reviewer raises a credible concern, which we now address. First, it is worth noting that SNR is unlikely to influence the clustering results shown in the manuscript, as all clusters show unique response patterns (see Figure 1G-I) rather than unique response amplitudes (only the latter would be accounted for by SNR differences). Second, we now show the estimated relative spatial location of each recorded neuron grouped by cell cluster (new Figure 1 – Supplement 5). Although the data only provide a gross estimation of relative locations (that are not to scale), the figure does provide assurance that each cell cluster is represented regardless of imaging depth in our experiments (see panel B).

Due to the technical limitations of this spatial analysis (due to issues highlighted by the Reviewer), we do not make it a focus of the manuscript. Additionally, we highlight its limitations in both the methods and figure legend.

3. Regarding longitudinal registration, minimal methodological information is provided, which is concerning given that this is a notoriously difficult endeavor, especially in dense recordings such as these data. How were these data validated? Was the data set scored by a second experimenter for cross validation? What percent of neurons were tracked? Was any network analysis of cell location used to verify results?

We agree, this is an excellent point that we first address with clarification in the methods section (see page 25):

“Cell tracking was performed by a student blinded to experimental conditions to reduce the potential for experimenter-related biases. Cells were identified based on relative position and morphology across days, with conservative longitudinal tracking to prevent between-cell comparisons. Following initial cell tracking, a second experimenter confirmed accurate day-to-day cellular co-registration for cross validation.”

Second, we provide new data and new analyses to highlight the accuracy of our longitudinal coregistration across days. Specifically, we provide shuffled datasets wherein we shuffled the tracking labels for neurons within each FOV (but not between FOVs). This shuffling analysis abolished the strong correlations found when tracking neurons from early to late in learning behavioral sessions (see new Figure 4 – Supplement 1) as well as across days in late in learning (maintenance) behavioral datasets (see new Figure 6 – Supplement 1). These new data provide strong evidence that we are indeed tracking the same neurons across days in Figures 4 and 6, and that the correlated activity patterns across days are not related to general correlations in activity between neurons in the same FOVs.
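The logic of this shuffle control can be illustrated with a minimal sketch. It is not the analysis code used for the manuscript: the arrays `early`, `late`, and `fov_ids`, the function names, and the synthetic data are all hypothetical, and the per-neuron Pearson correlation is assumed here as the tracking metric. The key constraint from the text is preserved: tracking labels are permuted within each FOV, never between FOVs.

```python
import numpy as np

def mean_tracked_r(early, late):
    """Mean Pearson r between each neuron's early and late response (rows = neurons)."""
    return float(np.mean([np.corrcoef(e, l)[0, 1] for e, l in zip(early, late)]))

def shuffle_within_fov(late, fov_ids, rng):
    """Permute tracking labels among neurons of the same FOV only."""
    shuffled = late.copy()
    for fov in np.unique(fov_ids):
        idx = np.flatnonzero(fov_ids == fov)
        shuffled[idx] = late[rng.permutation(idx)]
    return shuffled

# Synthetic stand-in data (not from the study): 2 FOVs x 30 neurons, 50 time bins.
rng = np.random.default_rng(0)
fov_ids = np.repeat([0, 1], 30)
template = rng.normal(size=(60, 50))  # each neuron's own response shape
early = template + 0.2 * rng.normal(size=template.shape)
late = template + 0.2 * rng.normal(size=template.shape)

r_tracked = mean_tracked_r(early, late)
r_shuffled = float(np.mean([
    mean_tracked_r(early, shuffle_within_fov(late, fov_ids, rng))
    for _ in range(100)
]))
print(f"tracked r = {r_tracked:.2f}, shuffled r = {r_shuffled:.2f}")
```

In this toy case the correlation is high only when each neuron is paired with itself, which is the pattern one would expect if the same cells are being tracked rather than neighbors with generally similar activity.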

4. While the issues with the imaging are critical to address, it is likely that in-depth analysis could resolve the problems without the need for additional experiments. However, there are also some problems with the behavioral design – these will either require additional experiments or require that the claims of the manuscript be altered. All of the changes in mPFC dynamics that occur across the behavioral task are claimed to be related to reward learning, but there are several processes that are not parsed in the task design. For example, would some of these changes happen with habituation? Would some of these changes happen with lick rate, volume consumed? None of those are dependent on associative learning and could just as strongly predict the changes that are seen in the dataset.

We have softened the language of the text and consider alternative ideas as alluded to by the reviewer and thank them for pointing this out. Altogether, although there are several lines of evidence which posit that the measured adaptations in activity are related to associative learning (habituation is of course another type of non-associative learning), we agree that other mechanisms could be involved as described in our new discussion paragraph (page 15-16):

“One caveat to our study is the possibility that observed changes in neuronal activity could be driven by factors other than associative learning, such as nonassociative learning (e.g., habituation) or feeding in general. However, there are several lines of evidence that suggest otherwise. First, cluster-specific responses or adaptations in activity across learning were generally distinct for CS+ and CS- trials (see Figure 4), suggesting that the observed adaptations are related to cue-reward associative information. Second, cue discrimination, lick, and reward decoding improved over time, despite mice consuming the reward both before and after learning (see peristimulus time histograms; Figure 1C). Finally, previous evidence in the same behavioral task reveals that both excitatory and inhibitory responses among subpopulations of dmPFC pyramidal neurons are critical for cue-reward associative learning and cue-driven licking but not licking alone (Otis et al., 2017; Otis et al., 2019). Overall, evidence suggests that identified adaptations in dmPFC activity dynamics are likely related to cue-reward associative learning. However, the possibility that there are components not related to associative learning should certainly be considered.”

5. What is the justification for using water restriction combined with a sucrose solution in water as a reinforcer? Given that sucrose functions as a reinforcer without water restriction and that water functions as reinforcer under water deprived conditions, it is unclear whether the water or the sucrose is functioning as the primary reinforcer.

The justification for using water restriction along with the water + sucrose reward was to 1) facilitate appetitive learning and 2) increase the number of trials wherein the mice would perform for the reward while head restrained (through anticipatory licking). This has been characterized by Karel Svoboda’s lab as well as others (Guo et al., 2014; Goldstein et al., 2018), and has become standard practice for many of the PI’s papers within Dr. Garret Stuber’s laboratory (Otis et al., 2017; Otis et al., 2019; Namboodiri et al., 2019; Rossi et al., 2019). We now discuss the rationale for using water restriction in combination with liquid sucrose in the methods section (page 19):

“Mild water restriction facilitates appetitive learning in head-restrained mice, particularly when the reinforcer is sucrose mixed in water (Guo et al., 2014). Additionally, mild water restriction plus head restraint results in minimal signs of distress while allowing simultaneous two-photon imaging across many trials wherein a behavioral task is repeatedly performed (Guo et al., 2014; Goldstein et al., 2018; Otis et al., 2017; Otis et al., 2019; Namboodiri et al., 2019). Thus, we used water restriction in combination with Pavlovian conditioning for a liquid sucrose reward to study appetitive learning in mice…”

Reviewer #2:

1. Behavioral sessions were classified as either "early in learning" or "late in learning" and are defined based on the animal's performance (using auROC) across multiple sessions of the sucrose conditioning task. The authors perform an independent t-test comparing performance in early vs. late in learning sessions; however, these 2 groups of sessions were pre-defined by the animal performance itself. Therefore, it is inappropriate to use a statistical test since the periods of early and late were selected based on the behavioral data (i.e. doing a statistical test on data that was pre-selected). A statistical test is not needed here, but the authors should emphasize that the number of early vs late in learning sessions were unique to each animal and selected based on their performance (this is only mentioned in the methods, but authors should consider reiterating this point in the results). As a reader, it was very difficult to understand how early and late in learning was defined and I had to go back and read the methods multiple times and look through the results for clarification. I initially thought early vs. late in learning referred to early trials vs. late trials within a session. If early and late was defined based on trial number instead of auROC across sessions, are the behavioral results similar to what was reported? On average, how many sessions did it take for each mouse to reach criteria?

We agree and thank the reviewer for catching that a t-test is not appropriate, considering that early and late sessions were defined using pre-determined criteria based on the behavioral data itself. Thus, we have removed this analysis from the results and figure/caption. Additionally, we clarify that ‘early-in-learning’ and ‘late-in-learning’ data were collected during separate behavioral sessions on separate days throughout the manuscript (both in the methods and results). Additionally, we show data for each animal illustrating learning performance across days, including the criteria for defining early and late in learning behavioral sessions (new Figure 1 – Supplement 1). Finally, we also now evaluate both behavioral and neuronal response adaptations within early-in-learning and late-in-learning behavioral sessions (across trials rather than across sessions). Overall, these new data reveal quite homogeneous trial-by-trial behavioral and neuronal response patterns within sessions for each cluster (new Figure 5), rather than within-session adaptations as one might expect if learning occurred within a single session.

2. Because I was confused about early vs late in learning being within session vs. across session, I was also confused about the data analyzed in Figure 4. How many sessions were in early vs late? Was the miniscope lens advanced after each conditioning session? Was there a subset or a different set of neurons that were recorded from across multiple sessions/days? Also, why do the traces in Figure 1I look very identical to the traces in Figure 4B (lower)? Are the cells analyzed in Figure 4 a subset of those shown in Figure 1? Please clarify. I think it would also be helpful to use the same terms consistently throughout the text when referring to the conditioning sessions and clearly state at the beginning of the results that one session was recorded per day and the lens was advanced to a new FOV (if that was the case) – sometimes sessions are referred to as FOVs or different days.

Thank you for these observations, and our apologies for any confusion. For clarity, we will respond to each point individually:

How many sessions were in early vs late?

We now indicate this in the results (page 5):

“A single imaging plane (FOV) was selected during each day of training, resulting in 28 FOVs recorded early in learning (10 mice; 28 FOVs, 2092 neurons) and 21 FOVs late in learning (7 mice, 21 FOVs; 1511 neurons; 3 mice did not reach late in learning due to headcap issues)”

Was the miniscope lens advanced after each conditioning session?

We used a moveable two-photon objective to capture fields of view ranging from 0-300 µm beneath the GRIN lens. We now address this on page 20:

“Fields of view (FOVs) were visualized from 0-300 µm beneath the GRIN lens and were selected before “early” behavioral imaging sessions. Each FOV was separated by at least 50 μm of objective movement in the Z-plane to avoid visualization of the same cells in multiple FOVs”

Was there a subset or a different set of neurons that were recorded from across multiple sessions/days?

Yes, we were able to visually track only a subset of neurons across learning, from early to late in learning behavioral sessions. Thus, Figure 1I and 4a are identical because they are the subset of neurons that were tracked across learning, as the Reviewer has indicated. We apologize for the lack of clarity on this issue, and now provide further clarification throughout the methods and results (e.g., page 10):

“Two-photon microscopy enables visual tracking of single, virally labeled neurons across days (Namboodiri et al., 2019; Otis et al., 2017). Thus, we were able to track a subset of the above dmPFC excitatory output neurons from early to late in learning behavioral sessions (n = 5 mice; 9 FOVs; 416 neurons) to evaluate neuronal response evolution across learning”.

I think it would also be helpful to use the same terms consistently throughout the text when referring to the conditioning sessions and clearly state at the beginning of the results that one session was recorded per day and the lens was advanced to a new FOV (if that was the case) – sometimes sessions are referred to as FOVs or different days.

We apologize for the general lack of consistency and agree that this requires adjustments. We have therefore clarified throughout the manuscript as suggested such that we refer to recordings as early in learning or late in learning. Furthermore, we provide a figure to show that early and late in learning sessions are on different days of training (Figure 1 – Supplement 1). Finally, we now clearly state in the results that one session was recorded per day. For example, on page 20:

“Mice readily acquired this task across sessions (one session per day; see Figure 1 – Supplement 1), showing conditioned licking behavior between the CS+ offset and reward delivery (trace interval), but not the CS- offset and equivalent no reward epoch during sessions on later days (deemed ‘late in learning’; Figure 1C-D).”

3. PFC-PVT neurons are located in lateral layers of PFC, whereas PFC-NAc are located in more medial layers within PFC. Based on this anatomical distinction, it would be interesting to determine the location of the 5 different cluster ensembles in the layers of PFC, which might suggest cells in particular clusters project to PVT or NAc. For example, are Cluster 5 neurons largely located in lateral layers of dmPFC based on their similarity in neuronal activity to PFC-PVT neurons, whereas neurons from other Clusters are located in medial layers of PFC? This analysis would provide more evidence that Cluster 5 neurons in lateral dmPFC are likely projecting to PVT and show the inhibitory activity profile, whereas neurons from different Clusters in medial dmPFC are likely projecting to NAc and show an excitatory activity profile. This analysis would also provide more concrete interpretation of the data in the Discussion section.

We agree, identification of the anatomical location of each neuron (and therefore cluster) within dmPFC would provide insight into layer-specific activity dynamics, including those that may control behavior. Thus, we have done our best to approximate the relative position of each neuron, grouped by cluster, within dmPFC (see Figure 1 – Supplement 5). These data show the vast heterogeneity in dmPFC, such that each cluster is represented across axes in our recordings.

It should be noted that there are major caveats to this cell mapping analysis due to methodological challenges that we could not overcome, significant enough that we discuss them up front in the methods and in the supplementary figure legend. Additionally, we do not focus on these data in the main body of the manuscript, and only briefly refer to them in the Results section. Broadly speaking, it is extremely difficult (if not impossible, at least for us) to determine if neurons were in a specific layer of dmPFC due to minor deviations in GRIN lens angle, placement (which are difficult to identify post-mortem as the tissue surrounding the lens is generally damaged when pulling out the lens), headcap orientation (a tilted headcap changes how the light focuses beneath the lens), and due to non-linear ray transformations through the lens itself (brought up by Reviewer #1). Additionally, because dmPFC is not as well layered as other cortices, it is very challenging to see the layers in the first place. Thus, we were hesitant to include these data in the manuscript, as they could be interpreted as defined locations rather than extremely gross estimates of relative locations. We would like to hear more from the Reviewer(s) and Editor as to whether they think these data are worth including as is, or if they could rather be misleading.

4. In Figure 5, D1 and D2 are denoted to indicate dmPFC activity "across days after learning" (lines 621-622). Which conditioning sessions do these refer to – which session/day # relative to all sessions for each mouse? Are sessions D1 and D2 the first two days in Late in Learning sessions? Are these neurons a subset of the neurons recorded during the conditioning session and if so, were they recorded in more ventral regions of dmPFC compared to Early in Learning sessions if the lens was advanced from dorsal to ventral along dmPFC? Could there be differences in neural processing of appetitive cues in dorsal vs. ventral Cg1?

The data shown in Figure 6 refer to two conditioning sessions, each recorded late in learning. The first (session D1) shows data from the first day the specified FOV was eligible to be recorded late in learning (based on the animal’s cue discrimination). The second (session D2) shows data from a second late in learning session, in which the same FOV was recorded from 48 hours or more after session D1. There is no lens movement as these are two-photon recordings not miniscope recordings. We have clarified this in both the methods and Results sections and thank the reviewer for pointing out that this was previously unclear (e.g., see page 25):

“Specific neurons could be reliably identified across days based on structure and relative position within each FOV (Figure 4A), allowing us to evaluate the evolution and maintenance of activity in single neurons across days. To do so, single cell tracking was performed from early to late in learning behavioral sessions to determine neuronal response evolution across learning (Figure 4). Additionally, single cell tracking was performed across two late in learning behavioral sessions each separated by a minimum of 48 hours to evaluate post-learning response adaptation or maintenance (Figure 6).”

5. One major advantage of 2-photon calcium imaging is the ability to measure calcium dynamics between neurons that are recorded simultaneously (i.e. measured within the same FOV). It would be interesting to perform a cross-correlation analysis to reveal if there are any significant interactions between neurons within the same FOV, determine whether they are from the same cluster, whether cells active early in learning interact with cells that are active late in learning, and whether these cross-correlations remain stable after learning.

As suggested by the Reviewer, we have performed extensive cross correlation analyses on all our neurons early and late in learning, grouped by cluster. We show data from tracked neurons so that we could split neurons by cluster both early in learning and late in learning. Interestingly, neurons within the same cluster have little to no lag both early in learning and late in learning, whereas neurons in different clusters generally do have lag (see new Figure 4 – Supplement 2). These data suggest that there may be within-cluster similarities in activity even before appetitive learning has occurred.

While we agree with the Reviewer that this is an interesting analysis, one issue that limits its usefulness is the relatively slow dynamics of the calcium sensor and the speed of imaging itself. This prevents identification of spike-to-spike correlations in activity across neurons, and rather only allows a somewhat slow approximation of correlated activity patterns (which is a far cry from suggesting causal relationships in activity). Thus, while we believe the cross-correlation analysis that we have performed does add to the manuscript, we have not made it a primary focus due to this significant limitation which hinders data interpretation.
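For readers unfamiliar with lag estimation, the following is a minimal sketch of how a peak cross-correlation lag between two simultaneously recorded traces can be computed. It is illustrative only and is not the analysis code used for the manuscript: the function name `peak_lag`, the `max_lag` window, and the synthetic traces are assumptions. Lags are expressed in imaging frames, so the temporal resolution is bounded by frame rate and sensor kinetics, as noted above.

```python
import numpy as np

def peak_lag(x, y, max_lag):
    """Lag (in frames) at which the normalized cross-correlation of x and y peaks.

    A positive lag means that y trails x.
    """
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    n = len(x)
    lags = np.arange(-max_lag, max_lag + 1)
    cc = []
    for k in lags:
        if k >= 0:
            cc.append(np.mean(x[:n - k] * y[k:]))   # compare x(t) with y(t + k)
        else:
            cc.append(np.mean(x[-k:] * y[:n + k]))  # compare x(t) with y(t - |k|)
    return int(lags[int(np.argmax(cc))])

# Synthetic traces (not from the study): y is a noisy copy of x delayed by 3 frames.
rng = np.random.default_rng(1)
base = rng.normal(size=500)
x = base
y = np.roll(base, 3) + 0.1 * rng.normal(size=500)

same_lag = peak_lag(x, x, max_lag=10)
shifted_lag = peak_lag(x, y, max_lag=10)
print(same_lag, shifted_lag)
```

Applied to every pair of neurons in a FOV, a distribution of such lags could then be compared for within-cluster and between-cluster pairs.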

6. Looking at activity across early vs late in learning, it was found that Cluster 1 was stable but Clusters 2-5 were not on the basis of responding to CSs. How did Clusters 2-5 change as a function of learning? Were they different based on reward delivery or licking behavior? Additional analyses are needed to strengthen this finding.

Based on these questions, we have added additional data and analysis to examine mean response evolution during CS+ and CS- trials (new Figure 4C-D). As can be seen in Figure 4, the only cluster that responds early in learning and does not adapt across learning is indeed Cluster 1. Differential encoding of these newly formed clusters is already a primary focus of Figure 3, wherein we show differential encoding patterns of each cluster for the cues, reward, and licking after learning. Thus, these newly developed responses are specific to the cues, reward, and licking as shown in Figure 3. Our reasoning for not redoing the decoding on tracked datasets is that such analysis requires extremely high power (generally >250 neurons), and given the low number of neurons that could be tracked (currently 417 split into 5 clusters), we cannot sufficiently power the analysis for tracking datasets in Figure 4 (and find that it is unnecessary based on findings in Figure 3).

7. As it reads, the authors discuss their findings in relation to whether they agree with other studies and tools needed to answer questions about the role of specific cell types in prefrontal circuits for appetitive discrimination tasks. To strengthen the importance of this study, further discussion is needed that includes more interpretation of the data. Doing additional analyses will provide more findings to interpret, so the reader has a better grasp of the importance of this study.

In response to the excellent suggestions by the Reviewers, we have performed new analyses and show new data (new Figure 1 – Video 1; Figure 1 – Supplements 1, 2, 3, 4 and 5; Figure 4C-D; Figure 4 – Supplements 1 and 2; Figure 5; and Figure 6 – Supplement 1) to provide more findings for interpretation. We refer to these new data and analyses throughout the manuscript (see all new changes in red) and have lengthened the discussion considerably to adequately interpret and address the impact of these new findings. We present what we hope is a much-improved manuscript thanks to the reviewers.

8. In figure 3, CDF is undefined. Please define.

Thank you to the reviewer for catching this error; we now define CDF in the legend.

Reviewer #3:

The list below highlights some issues, confusions, or suggestions for additional analyses or experiments with the hopes to improve the overall quality of the study.

1. Results from the initial analysis to separate neurons into distinct neuronal ensembles are confusing. While this analysis (PCA) revealed five "unique" neuronal ensembles that supposedly encode specialized information during cue-reward learning, it is quite confusing that within-cluster responses are still very heterogeneous. For example, as shown in the Figure 1H heatmaps, within-cluster responses varied a lot across neurons and included excitatory responses (in purple), inhibitory responses (in green), and weak responses (colors in between) within each cluster. There is also heterogeneity in the temporal profile of the responses within each cluster. How or why did neurons exhibiting excitatory and inhibitory responses get clustered together? And why did neurons exhibiting very fast responses get clustered together with neurons exhibiting slower responses? What am I missing here? Perhaps the authors could try a different clustering method (e.g., hierarchical analysis) to either confirm their clusters or potentially reveal more homogeneous clusters. After all, in theory there could be many more neuronal clusters due to the many experimental variables analyzed (CS+, CS-, sucrose, sucrose omission, licks), all the possible ways neurons could respond (i.e., excitation, inhibition, no response, fast response, delayed response), and all possible response combinations (e.g., excitation to the cue, but inhibition to sucrose, etc).

This is an understandable concern that we address through the addition of new data, analysis, and a more thorough description of the data. First, we used PCA to reduce the dimensionality of our dataset, followed by spectral clustering to identify unique activity patterns among neurons (i.e., “neuronal ensembles”). The PCs used to inform the clustering algorithm can now be found in new Figure 1 – Supplement 3A-B. This methodology is based on previous papers showing the suitability of spectral clustering for evaluating heterogenous and dynamic response patterns in cortical circuits (e.g., Namboodiri et al., 2019). The results from this PCA analysis hopefully provide a better framework for the readers to understand the dynamic principal components that were used for the clustering analyses, which is likely the reason that not all cells within each cluster look (by eye) like they obviously fit.

In the initial submission of our manuscript, we did not discuss the alternative clustering approaches that we took to analyze our data, which specifically included agglomerative (“hierarchical”) and k-means clustering. We now add these analyses and the resulting data to the manuscript, as was insightfully suggested by the Reviewer. Overall, our preliminary analysis showed that spectral clustering outperformed agglomerative and k-means algorithms, as evidenced by improved “fitting” of each neuron within its corresponding cluster (see example silhouette plots in new Figure 1 – Supplement 3C-E). Additionally, we show how each algorithm caused separation of neurons into clusters (see new Figure 1 – Supplement 4 for agglomerative and k-means clustering; Figure 1G-I shows spectral clustering results that were included in the initial submission). The idea that spectral clustering results in improved fitting of each neuron into its corresponding cell cluster is observable by looking at the resulting clusters in Figure 1 – Supplement 4. Specifically, although the clusters overall look similar across algorithms, agglomerative and k-means clustering both result in large groups of neurons in Cluster 2 that do not seem to fit well, as compared to the spectral clustering dataset (as is shown in Figure 1G-I).
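As background on the silhouette plots mentioned above, a silhouette value compares how close a data point is to its own cluster with how close it is to the nearest other cluster, and ranges from -1 (poor fit) to 1 (good fit). The sketch below computes these values directly with NumPy on synthetic two-dimensional points. It is a simplified illustration under stated assumptions (Euclidean distance, hypothetical function and variable names, toy data) and is not the pipeline used for the manuscript, which clustered neurons on principal components.

```python
import numpy as np

def silhouette_values(X, labels):
    """Per-sample silhouette values using Euclidean distance."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances
    clusters = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # convention: a singleton cluster gets a silhouette of 0
        a = D[i, own].sum() / (n_own - 1)  # mean distance to own cluster (self excluded)
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s

# Toy data (not from the study): two well-separated groups of 40 points each.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 0.5, size=(40, 2)),
               rng.normal(5.0, 0.5, size=(40, 2))])
good_labels = np.repeat([0, 1], 40)          # labels matching the true groups
random_labels = rng.integers(0, 2, size=80)  # labels unrelated to the structure

good_score = float(silhouette_values(X, good_labels).mean())
random_score = float(silhouette_values(X, random_labels).mean())
print(f"good = {good_score:.2f}, random = {random_score:.2f}")
```

A labeling that respects the structure of the data yields a higher mean silhouette, which is the sense in which one clustering algorithm can be said to "fit" neurons to clusters better than another.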

Although spectral clustering seems to be the best tool to separate dmPFC neuronal responses in our task into distinct groups, the Reviewer brings up a critical point that the clustering method is not perfect. Simply stated, we are trying to simplify a heterogeneous pattern of activity into homogeneous response profiles, which, although possible to some degree, is unlikely to be possible in total. We now describe this in the discussion, as we feel it is an important point to consider not only for our dataset but for most neuronal clustering datasets (see page 15):

“An important consideration for our study is the existence of heterogeneity not only between experimenter-defined cell clusters, but also within these clusters. We chose to use spectral clustering to define unique response dynamics in dmPFC neurons, as opposed to other methods (e.g., agglomerative, k-means), based on preliminary analyses (see Figure 1 – Supplement 3 and 4). Although these analyses suggested that spectral clustering provides the best separation of dmPFC neurons into groups, likely due to its ability to separate dynamic and non-singular response features, these methods overall are far from perfect. Specifically, we are trying to simplify a heterogeneous pattern of activity into homogeneous groups, which given current methodologies is only possible to some degree. Further advancement of clustering and other computational methodologies should continue to improve our ability to detect and understand unique response patterns within complex brain circuits.”

2. Figure 1J summarizes the average within-cluster response patterns in PSTH form. Keeping in mind my first issue above, these PSTHs then seem misleading. For instance, the average PSTHs for Cluster-2 shows selective excitatory responses to the CS+. Yet, heterogeneity can be appreciated in the heatmaps in Figure 1H, with even some neurons exhibiting inhibitory responses to the CS-. Again, the authors should consider reinforcing or revising these results using a different clustering analysis.

This is an understandable point, but as discussed above the heterogeneity observed in each cluster is a result of how the clustering algorithm handles the non-singular response dynamics found in our neuronal dataset (see PCs in Figure 1 – Supplement 3), and its inability to separate the neurons into perfectly unique clusters. Alternative clustering methods that we have used have not resulted in improved clustering, but rather poorer performance (see Figure 1 – Supplement 3 and 4). Furthermore, the clusters identified here are encoding unique task features (Figure 3), confirming that the identified groups of neurons may be uniquely relevant for behavioral control and are separating appropriately.

Due to this within-cluster heterogeneity, we feel that showing the responses of all neurons for each cluster (in the peristimulus heat maps) and the average response across neurons for each cluster (in the peristimulus line plots) is critical for data transparency and description. Thus, we have chosen to include both within the main figure.

3. In Figure 4, authors attempted to evaluate the evolution of response patterns in the distinct neuronal ensembles across learning. They did so by comparing ensemble activity during early versus late learning sessions. While significant Pearson correlations were detected for most ensembles during CS+ and CS-, I am not convinced that this is the best analysis to explore the evolution of neuronal activity patterns across learning. These results may just indicate that activity patterns may have developed very rapidly early in training, even before significant learning was observed at the behavioral level. To overcome this, authors could instead compare the magnitude of responses across learning sessions, or even at different segments within the early sessions (e.g., first 10 trials, versus 10 subsequent trials, and so on) to better explore whether the magnitude of responses is amplified as training progresses.

We agree, this is an excellent point that we now address through the addition of new data.

First, as suggested by the Reviewer, we compare the magnitude of responses across learning sessions (see new Figure 4C-D). Overall, the results demonstrate that the magnitude of responses during CS+ and/or CS- trials changed for Clusters 2-5, but not Cluster 1. These results confirm neuronal response adaptation for Clusters 2-5, but not Cluster 1, across learning.

Second, as also insightfully suggested by the Reviewer we compare the magnitude of behavioral and neuronal responses (separated by cluster) within sessions (first 10 trials versus last 10 trials; see new Figure 5). Results are almost identical from trials at the beginning of each session versus trials at the end of each session, suggesting that behavioral and neuronal responses may not rapidly adapt within sessions (in the task used here).

4. Caution is recommended for the type of t-test used in some analyses. For example, an independent t-test was used to compare cue discrimination scores between two days in the same subset of animals. Should this rather be a paired sample t-test?

This is a good point, and something similarly brought up by Reviewer #2. We have therefore removed the inappropriate independent t-test (Figure 1D; see response to Reviewer #2, Point #1) or changed the t-test to paired rather than independent as suggested here (Figure 6A).
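The practical difference between the two tests can be seen in a small worked example. The numbers below are hypothetical cue discrimination scores for five animals on two days, chosen only for illustration; the function names are likewise assumptions. When the same animals are measured twice, the paired statistic is computed on within-animal differences and therefore removes between-animal variability, whereas the independent-samples statistic (shown here in its unpooled form) does not.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(a, b):
    """Paired t statistic: mean within-subject difference over its standard error."""
    d = [y - x for x, y in zip(a, b)]
    return mean(d) / (stdev(d) / sqrt(len(d)))

def independent_t(a, b):
    """Independent-samples t statistic (unpooled variances)."""
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(b) - mean(a)) / se

# Hypothetical scores (not from the study): same five animals on two days.
day1 = [0.55, 0.70, 0.85, 0.60, 0.90]
day2 = [0.60, 0.76, 0.89, 0.65, 0.96]

t_paired = paired_t(day1, day2)
t_indep = independent_t(day1, day2)
print(f"paired t = {t_paired:.1f}, independent t = {t_indep:.2f}")
```

Here every animal improves by a similar small amount, so the paired statistic is large while the independent statistic is small because the spread between animals dominates.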

5. Finally, all findings in this study are of correlative nature. Thus, additional experiments are needed to reinforce some of the claims raised in the study. For instance, the last sentence in the abstract says – "Our results characterize the complex dmPFC neuronal ensemble dynamics that relay learning-dependent signals for prediction of reward availability and initiation of conditioned reward seeking". If this is true, then manipulations of neural activity during certain epochs should produce significant changes in behavioral responses. A potential additional experiment could then be optogenetic-mediated inhibition during the CS+ to see whether lick rates are impaired. While this experiment could be performed in a non-ensemble-specific manner (i.e., optogenetic inhibition of all excitatory neurons in the area), it would be even better if the microscope used by the authors has holographic stimulation capabilities to selectively manipulate particular ensembles based on their response pattern. This is consistent with the last suggestion by the authors towards the end of the discussion ("functionally targeting each neuronal ensemble independently…").

Yes, we completely agree with the assessment of the Reviewer that this study is missing some sort of functional manipulation. However, we believe that the ensemble-specific nature of these manipulations is critical, since non-ensemble-specific manipulation could produce equal and opposing effects on behavior (Otis et al., 2017) and would poorly inform us of how distinct ensemble dynamics contribute to behavior. Despite the necessity of these experiments, current viral strategies for simultaneous recording and manipulation of neural activity are limited and have presented many technical challenges (though we have tried extensively, based on viral approaches used by Rafael Yuste, Karel Svoboda, Karl Deisseroth, and others). As these technologies improve, we hope to be among the first groups to perform ensemble-specific manipulations in dmPFC through a GRIN lens during reward seeking. Given these technological challenges, to address the Reviewer’s comment, we have softened the language regarding the impact of our findings and stressed the need for improved neurotechnologies to undertake these experiments (e.g., see page 16).

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Essential Revisions:

Reviewer #1:

The authors were overall responsive to critiques and several of the issues raised in the previous reviews have been addressed. However, some concerns remain, the first of which still requires additional analysis prior to publication.

1. The most significant remaining issue is the potential effect of differential SNR across imaging planes due to the GRIN lens properties. The fact that clusters show differential patterns and not only differences in amplitude does not negate the potential impact of SNR on the clustering – the ability to detect a differential pattern of responses between neurons is dependent on sufficient SNR, which is evidenced directly in the dataset by the fact that some of the clusters are defined by a lack of response. The authors have made great improvements on dealing with this issue from the original submission but given that essentially all the claims in the manuscript are based on the clustering analyses some quantitative assessment should be provided.

I appreciate the authors' caution in using relative measurements to estimate the relative distance of clustered neurons from the GRIN lens – this is the most appropriate way to begin to approach the issue. However, the estimations are only graphically displayed, without quantitative analysis of their influence on clustering or relation to behavior, and the visualization of the data in figure 1 S5 makes it difficult to discern if there are topographical effects due to the number of overlapping points. In Figure 1 Sup 5B, how many neurons are in each line across the D/V axis? Is there a correlation between estimated location and probability of cluster membership? This is critical for determining if there is an influence of imaging plane on the clustering analysis. A complementary approach would be to subsample and perform the clustering analysis only from a subset of DV planes at a time and determine reproducibility of cluster membership and their relationships with behavior. This is most concerning for the interpretation of Figure 2, where a differential number of neurons sampled from each plane across animals could easily produce spurious correlations that reflect sampling bias rather than biological relationships.

We have now quantified the number of neurons along the A/P, M/L, and D/V axes relative to the total number of neurons for each cluster. These data are displayed as histograms along each axis in Figure 1 – S5 and provide further confirmation that each identified cluster is spread along each axis. Although we appreciate the Reviewer’s concern that cells could be difficult to identify if not in superficial layers, this simply is not the case, as can be visualized in Figure 1 – S5. Neurons from all clusters are found in the middle and deeper layers.
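One way to quantify the relationship between estimated location and cluster membership is sketched below. It assumes each neuron has a cluster label and a spatial bin label (for example, a D/V bin); the function names, labels, and data are hypothetical. The first function gives the per-cluster fractions that a normalized histogram would display, and the second gives a Pearson chi-square statistic for independence between cluster and bin. Neither is taken from the study's analysis code.

```python
from collections import Counter

def cluster_by_bin_fractions(cluster_ids, bin_ids):
    """Fraction of each cluster's neurons that falls in each spatial bin."""
    counts = Counter(zip(cluster_ids, bin_ids))
    totals = Counter(cluster_ids)
    bins = sorted(set(bin_ids))
    return {c: [counts[(c, b)] / totals[c] for b in bins] for c in sorted(totals)}

def chi_square_independence(cluster_ids, bin_ids):
    """Pearson chi-square statistic and df for the cluster x bin contingency table."""
    n = len(cluster_ids)
    counts = Counter(zip(cluster_ids, bin_ids))
    row = Counter(cluster_ids)   # neurons per cluster
    col = Counter(bin_ids)       # neurons per spatial bin
    chi2 = 0.0
    for c in row:
        for b in col:
            expected = row[c] * col[b] / n
            chi2 += (counts[(c, b)] - expected) ** 2 / expected
    return chi2, (len(row) - 1) * (len(col) - 1)

# Hypothetical labels: two clusters, each split evenly across two D/V bins.
clusters = [1, 1, 1, 1, 2, 2, 2, 2]
dv_bins = ["dorsal", "ventral", "dorsal", "ventral",
           "dorsal", "ventral", "dorsal", "ventral"]

fractions = cluster_by_bin_fractions(clusters, dv_bins)
chi2, df = chi_square_independence(clusters, dv_bins)
print(fractions, chi2, df)
```

A statistic near zero, as in this made-up example, indicates that cluster membership does not depend on the spatial bin. With real data the statistic would be compared against a chi-square distribution with the returned degrees of freedom, and sparse bins may call for an exact or permutation test instead.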

2. Regarding the use of concurrent thirst and sucrose to motivate behavior, while it is true that head-fixed procedures often include water deprivation, these procedures were developed to motivate engagement in sensory processing tasks, not to analyze the effects of the reward itself as in the current manuscript. This is highlighted in both of the citations provided by the authors (other than their previous work) – the Goldstein et al., reference also goes on to show that how the deprivation is performed (e.g. water vs food) can dramatically impact the resulting reward-conditioned behaviors. This is not necessarily an inherent flaw in the study, but with the current wording/claims it becomes an issue.

For example, the authors refer to the behavioral procedure as 'Pavlovian sucrose conditioning' throughout – would the conditioned response (anticipatory licking) still occur if only water was delivered? Given that mice typically drink ~4 mL per day and only ~1 mL is provided outside of the behavioral task, a strong argument can be made from the literature that the fluid has a much greater reinforcing/conditioning strength than the sucrose itself. I don't see any utility to empirically testing this, but given that the goal of the study is to examine conditioned reward seeking at the very least accurate terminology should be used throughout (e.g. Pavlovian conditioned licking or similar). To facilitate integration with the literature it would also be useful to add a discussion point noting that this protocol is likely to influence sucrose palatability (e.g. PMID: 16248727) as well as magnitude and nature of conditioned responses (e.g. PMID: 26913541 and 16812301).

We agree and thank the reviewer for making this distinction. We have now modified the text where applicable, for example to read “Pavlovian conditioned licking” as opposed to “Pavlovian sucrose conditioning” which seems more appropriate (for example, see page 4):

“Here we use in vivo two-photon calcium imaging to measure and longitudinally track the activity dynamics of single dmPFC excitatory output neurons throughout a Pavlovian conditioned licking task.”

In addition, we have added a discussion point as the reviewer rightfully suggested to address the issue of water restriction and palatability potentially influencing our results (see page 16):

“Another consideration to note is that the observed conditioned licking responses could be influenced by the effects of water restriction, as well as the ratio of sucrose/water used as a reward in our behavioral paradigm (Davey and Cleland, 1982; Harris and Thein, 2005; Tabbara et al., 2016). However, we did not test the influence of these variables in the current experiments.”

3. The authors should clarify in the methods when the homecage water was provided in relation to behavioral testing, as well as provide an estimate of the range of total fluid and sucrose consumed in a typical session.

This information has been added to the Methods section (see pages 19 and 20):

“(12.5% sucrose in water, ~2.0 μl per droplet, ~0.1 mL total per session)”

“Mice received ~1 mL of water placed in a dish in their home cage after each conditioning session.”
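The approximate volumes quoted above imply the following back-of-the-envelope figures. This sketch assumes that all droplets are the same size and fully consumed, which the text does not state. The ~4 mL per day figure is the Reviewer's estimate of typical intake, not a measurement from this study.

```python
# Approximate values quoted in the Methods excerpts above.
DROPLET_UL = 2.0          # ~2.0 microliters per droplet
SESSION_TOTAL_ML = 0.1    # ~0.1 mL of sucrose solution per session
HOMECAGE_ML = 1.0         # ~1 mL of water after each session

# Reviewer's estimate of typical daily intake (not measured in the study).
TYPICAL_DAILY_ML = 4.0

droplets_per_session = SESSION_TOTAL_ML * 1000 / DROPLET_UL   # microliters / microliters
daily_fluid_ml = SESSION_TOTAL_ML + HOMECAGE_ML
fraction_of_typical = daily_fluid_ml / TYPICAL_DAILY_ML

print(f"~{droplets_per_session:.0f} droplets per session")
print(f"~{daily_fluid_ml:.1f} mL fluid per day (~{fraction_of_typical:.0%} of typical intake)")
```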

https://doi.org/10.7554/eLife.65764.sa2

Article and author information

Author details

  1. Roger I Grant

    Department of Neuroscience, Medical University of South Carolina, Charleston, United States
    Contribution
    Conceptualization, Formal analysis, Investigation, Methodology, Writing - original draft, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0003-4609-5773
  2. Elizabeth M Doncheck

    Department of Neuroscience, Medical University of South Carolina, Charleston, United States
    Contribution
    Investigation
    Competing interests
    No competing interests declared
  3. Kelsey M Vollmer

    Department of Neuroscience, Medical University of South Carolina, Charleston, United States
    Contribution
    Investigation
    Competing interests
    No competing interests declared
  4. Kion T Winston

    Department of Neuroscience, Medical University of South Carolina, Charleston, United States
    Contribution
    Investigation
    Competing interests
    No competing interests declared
  5. Elizaveta V Romanova

    Department of Neuroscience, Medical University of South Carolina, Charleston, United States
    Contribution
    Investigation
    Competing interests
    No competing interests declared
  6. Preston N Siegler

    Department of Neuroscience, Medical University of South Carolina, Charleston, United States
    Contribution
    Investigation
    Competing interests
    No competing interests declared
  7. Heather Holman

    Department of Neuroscience, Medical University of South Carolina, Charleston, United States
    Contribution
    Investigation
    Competing interests
    No competing interests declared
  8. Christopher W Bowen

    Department of Neuroscience, Medical University of South Carolina, Charleston, United States
    Contribution
    Conceptualization, Formal analysis, Supervision, Investigation, Writing - original draft, Writing - review and editing
    Competing interests
    No competing interests declared
  9. James M Otis

    1. Department of Neuroscience, Medical University of South Carolina, Charleston, United States
    2. Hollings Cancer Center, Medical University of South Carolina, Charleston, United States
    Contribution
    Conceptualization, Formal analysis, Supervision, Investigation, Methodology, Writing - original draft, Writing - review and editing
    For correspondence
    otis@musc.edu
    Competing interests
    No competing interests declared
    ORCID: 0000-0003-0953-9283

Funding

National Institute on Drug Abuse (R01-DA051650)

  • James M Otis

Medical University of South Carolina

  • James M Otis

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

The authors would like to thank Vijay MK Namboodiri and Garret D Stuber for creating and sharing clustering code for imaging analysis.

Ethics

Animal experimentation: Experiments were performed in the dark phase and in accordance with the NIH Guide for the Care and Use of Laboratory Animals with approval from the Institutional Animal Care and Use Committee at the Medical University of South Carolina (Approval ID: IACUC-2018-00363; Renewed November 30, 2020).

Senior Editor

  1. Kate M Wassum, University of California, Los Angeles, United States

Reviewing Editor

  1. Mario Penzo, National Institute of Mental Health, United States

Reviewers

  1. Maria M Diehl, Kansas State University, United States
  2. Anthony Burgos-Robles, University of Texas San Antonio, United States

Publication history

  1. Received: December 15, 2020
  2. Accepted: June 22, 2021
  3. Accepted Manuscript published: June 29, 2021 (version 1)
  4. Version of Record published: July 13, 2021 (version 2)

Copyright

© 2021, Grant et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • Page views: 1,351
  • Downloads: 208
  • Citations: 0

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

