Neuroscience

Interneuron-specific gamma synchronization indexes cue uncertainty and prediction errors in lateral prefrontal and anterior cingulate cortex

  1. Kianoush Banaie Boroujeni (corresponding author)
  2. Paul Tiesinga
  3. Thilo Womelsdorf (corresponding author)
  1. Department of Psychology, Vanderbilt University, United States
  2. Donders Institute for Brain, Cognition and Behaviour, Radboud University Nijmegen, Netherlands
  3. Department of Biology, Centre for Vision Research, York University, Canada
Research Article
Cite this article as: eLife 2021;10:e69111 doi: 10.7554/eLife.69111

Abstract

Inhibitory interneurons are believed to realize critical gating functions in cortical circuits, but it has been difficult to ascertain the content of gated information for well-characterized interneurons in primate cortex. Here, we address this question by characterizing putative interneurons in primate prefrontal and anterior cingulate cortex while monkeys engaged in attention-demanding reversal learning. We find that subclasses of narrow spiking neurons have a relatively suppressive effect on the local circuit, indicating that they are inhibitory interneurons. One of these interneuron subclasses showed prominent firing rate modulations and 35–45 Hz gamma-synchronous spiking during periods of uncertainty in both lateral prefrontal cortex (LPFC) and anterior cingulate cortex (ACC). In LPFC, this interneuron subclass activated when the uncertainty of attention cues was resolved during flexible learning, whereas in ACC it fired and gamma-synchronized when outcomes were uncertain and prediction errors were high during learning. Computational modeling of this interneuron-specific gamma band activity in simple circuit motifs suggests it could reflect a soft winner-take-all gating of information with a high degree of uncertainty. Together, these findings elucidate an electrophysiologically characterized interneuron subclass in the primate that forms gamma-synchronous networks in two different areas when resolving uncertainty during adaptive goal-directed behavior.

Introduction

Inhibitory interneurons in prefrontal cortex are frequently reported to be altered in neuropsychiatric diseases, with debilitating consequences for cognitive functioning. Groups of fast spiking interneurons with basket cell or chandelier morphologies have consistently been found to be abnormal in individuals with schizophrenia and have been linked to dysfunctional working memory and reduced control of attention (Dienel and Lewis, 2019). Altered functioning of a non-fast spiking interneuron class is linked to reduced GABAergic tone in individuals with severe major depression (Levinson et al., 2010; Fee et al., 2017). These findings suggest that the circuit functions of different interneuron subtypes in prefrontal cortices are important for regulating specific aspects of cognitive and affective functioning.

But it has remained a challenge to identify how individual interneuron subtypes support specific cognitive or affective functions in the nonhuman primate. In rodent prefrontal and anterior cingulate cortices, cells with distinguishable functions differentially express cholecystokinin (CCK), parvalbumin (PV), or somatostatin (SOM), amongst others (Roux and Buzsáki, 2015; Cardin, 2018). Prefrontal CCK-expressing basket cells have been shown to impose inhibition that is required during the choice epoch, but not during the delay epoch, of a working memory task (Nguyen et al., 2020). In contrast, retention of visual information during working memory delays has been shown to specifically require activation of PV+ fast spiking interneurons (Lagler et al., 2016; Kamigaki and Dan, 2017; Nguyen et al., 2020). In the same prefrontal circuits, PV+ neurons have also been associated with attentional orienting (Kim et al., 2016), shifting of attentional sets and response strategies during reward learning (Cho et al., 2015; Canetta et al., 2016; Cho et al., 2020), and spatial reward choices (Lagler et al., 2016), among other functions (Pinto and Dan, 2015). Distinct from PV+ neurons, somatostatin-expressing (SOM+) neurons have been shown to be necessary during the initial encoding phase of a working memory task but not during the delay (Abbas et al., 2018), and in anterior cingulate cortex they activate specifically during the approach to reward sites (Kvitsiani et al., 2013; Urban-Ciecko and Barth, 2016). Taken together, these findings illustrate that rodent prefrontal interneurons expressing PV, SOM, or CCK fulfill separable roles at different processing stages of goal-directed task performance (Pinto and Dan, 2015; Lagler et al., 2016).

The rich insights into cell-specific circuit functions in rodent prefrontal cortices stand in stark contrast to the limited empirical data from primate prefrontal cortex. While there have been recent advances in optogenetic tools for use in primates (Acker et al., 2016; Dimidschstein et al., 2016; Gong et al., 2020), most existing knowledge about cell-specific circuit functions is inferred indirectly from studies that distinguish only one group of putative interneurons with narrow action potential spike width. Compared to broad spiking neurons, narrow spiking, putative interneurons in lateral prefrontal cortex have been found to be more likely to encode categorical information during working memory delays (Diester and Nieder, 2008), and to show stronger stimulus onset responses during cognitive control tasks (Johnston et al., 2009), stronger attentional modulation (Thiele et al., 2016), more location-specific encoding of task rules (Johnston et al., 2009), stronger reduction of firing selectivity for task-irrelevant stimulus features (Hussar and Pasternak, 2009), stronger encoding of errors and loss (Shen et al., 2015; Sajad et al., 2019), more frequent encoding of outcome history (Kawai et al., 2019), and stronger encoding of feature-specific reward prediction errors (Oemisch et al., 2019), amongst other unique firing characteristics (Constantinidis and Goldman-Rakic, 2002; Ardid et al., 2015; Rich and Wallis, 2017; Voloh and Womelsdorf, 2018; Torres-Gomez et al., 2020).

These summarized findings suggest that there are subtypes of narrow spiking neurons that are particularly important for regulating prefrontal circuit functions. But it is unclear whether these narrow spiking neurons are inhibitory interneurons and to which interneuron subclass they belong. Comparisons of protein expression with action potential spike width in prefrontal cortex have shown that >95% of all PV+ and ~87% of all SOM+ interneurons show narrow spike width (Ghaderi et al., 2018; Torres-Gomez et al., 2020), while narrow spikes are also known to occur in ~20% of VIP interneurons (Torres-Gomez et al., 2020) among other GABAergic neurons (Krimer et al., 2005; Zaitsev et al., 2009), and (at least in primate motor cortex) in a subgroup of pyramidal cells (Soares et al., 2017). In addition, electrophysiological characterization has revealed at least three different types of firing patterns in narrow spiking neurons of monkeys during attention-demanding tasks (Ardid et al., 2015; Dasilva et al., 2019; Trainito et al., 2019). Taken together, these insights raise the possibility that spike width and electrophysiological properties may allow identification of the interneuron subtypes that are particularly important for prefrontal cortex functions.

Here, we investigated this possibility by recording narrow spiking cells in nonhuman primate prefrontal and cingulate cortex during an attention-demanding reversal learning task. We found that in both areas three narrow spiking neuron classes are well distinguished and show a suppressive influence on local circuit activity compared to broad spiking neurons, supporting their labeling as inhibitory interneurons. Among these interneurons, the same subtype showed significant functional correlations in both ACC and LPFC, firing more strongly to reward-predictive cues while their predictive value was still being learned during the reversal (in LPFC), and firing more strongly to outcomes when they were most unexpected during the reversal (in ACC). Notably, in both ACC and LPFC, these functions were evident in 35–45 Hz gamma rhythmic synchronization of this interneuron subclass to the local field potential.

Results

We used a color-based reversal paradigm that required subjects to learn which of two colors was rewarded, as described previously (Oemisch et al., 2019). The rewarded color reversed every ~30–40 trials. Two different colors were assigned to stimuli appearing randomly to the left and right of a central fixation point (Figure 1A). During the task, the color information was presented independently of the upward/downward direction of motion of the stimuli. The upward/downward direction instructed the direction of the saccade that animals had to make in response to a Go event in order to receive reward. Motion was thus the cue for an overt choice (with saccadic eye movements), while color was the cue for covert selective attention. Color was shown either before (as Feature 1) or after the motion onset (as Feature 2) (Figure 1B). Both monkeys (H and K) took on average seven trials to reach criterion performance; that is, they learned which color was rewarded within seven trials (Figure 1C). The asymptotic performance accuracy was 83% for monkey H and 86% for monkey K (see Materials and methods).

Figure 1 (with 2 supplements). Task paradigm and cell classification.

(A) Trials required animals to covertly attend one of two peripheral stimuli until a dimming (Go event) instructed them to make a saccade in the direction of motion of the attended stimulus. During the trial, the two stimuli were initially static and black/white and then were either colored first or started moving first. Following this Feature 1 onset, the other feature (Feature 2) was added 0.5–0.9 s later. (B) The rewarded color (red or green) reversed after blocks of at least 30 trials. (C) Two monkeys learned the reward-associated color through trial and error, as evident in increased accuracy in choosing the rewarded stimulus (y-axis) over trials since reversal (x-axis). (D) Recorded areas (details in Figure 1—figure supplement 1). (E) Top: average normalized action potential waveforms of recorded neurons were narrow (red) or broad (blue). Bottom: the inferred hyperpolarization rate and repolarization duration distinguish the neurons. (F) Average spike-triggered multiunit modulation for narrow and broad spiking neurons (error bars are SE). Spikes and MUA were recorded on different electrodes. The bottom panel zooms into ±20 ms around the spike time and shows the difference between neuron classes (in green). (G) Histogram of post-to-pre spike AUC ratios for narrow (red) and broad (blue) spiking neurons. (H) Average ratio of post- to pre-spike-triggered MUA for narrow and broad cell classes in ACC (left) and LPFC (right). Values < 0 indicate reduced post- versus pre-spike MUA modulation. Error bars are SE.

Characterizing narrow spiking neurons as inhibitory interneurons

During reversal performance, we recorded the activity of 329 single neurons in LPFC areas 46/9 and anterior area 8 (monkey H/K: 172/157) and 397 single neurons in dorsal ACC area 24 (monkey H/K: 213/184) (Figure 1D, Figure 1—figure supplement 1). The average action potential waveform shape distinguished neurons with broad and narrow spikes, similar to previous studies in LPFC and ACC (Gregoriou et al., 2012; Ardid et al., 2015; Westendorff et al., 2016; Dasilva et al., 2019; Oemisch et al., 2019; Figure 1E). Prior biophysical modeling has shown that the extracellular action potential waveform shape, including its duration, is directly related to transmembrane currents and to the intracellularly measurable action potential shape and duration (Gold et al., 2006; Bean, 2007; Gold et al., 2007; Buzsáki et al., 2012). Based on this knowledge, we quantified the duration of extracellularly recorded spikes by their inferred hyperpolarization rate and inferred time of repolarization (see Materials and methods, Figure 1—figure supplement 2A,B). These measures split narrow and broad spiking neurons into a bimodal distribution (calibrated Hartigan’s dip test for bimodality, p<0.001), which was better fit by two Gaussians than by one (Figure 1E; Bayesian information criterion for the two- and one-Gaussian fits: 4.0450 and 4.8784, where a lower value indicates a better model). In LPFC, 21% of neurons had narrow spikes (n = 259 broad, n = 70 narrow cells), and in ACC, 17% of neurons had narrow action potentials (n = 331 broad, n = 66 narrow cells).
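The one- versus two-Gaussian model comparison can be illustrated with a minimal sketch. It assumes a single waveform feature (the study used two, the inferred hyperpolarization rate and repolarization time), fits a two-component mixture by expectation–maximization, and compares BIC = k ln n − 2 ln L for both models. The feature values are synthetic, the helper names are hypothetical, and the dip test is not reproduced here.

```python
import numpy as np

def gaussian_pdf(x, mu, sd):
    """Normal density; broadcasts over x and component parameters."""
    return np.exp(-((x - mu) ** 2) / (2 * sd**2)) / np.sqrt(2 * np.pi * sd**2)

def loglik_one_gaussian(x):
    """Log-likelihood of a single Gaussian at its maximum-likelihood parameters."""
    return np.sum(np.log(gaussian_pdf(x, x.mean(), x.std())))

def loglik_two_gaussians(x, n_iter=200):
    """Fit a two-component 1D Gaussian mixture by EM and return its log-likelihood."""
    mu = np.percentile(x, [25, 75])        # start the components at the quartiles
    sd = np.full(2, x.std())
    w = np.full(2, 0.5)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point, shape (n, 2).
        dens = w * gaussian_pdf(x[:, None], mu, sd)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixing weights, means and standard deviations.
        nk = resp.sum(axis=0)
        w = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sd = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return np.sum(np.log((w * gaussian_pdf(x[:, None], mu, sd)).sum(axis=1)))

def bic(loglik, n_params, n):
    """Bayesian information criterion; lower values indicate a better model."""
    return n_params * np.log(n) - 2.0 * loglik

# Synthetic feature values (arbitrary units): a small narrow and a larger broad group.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(1.0, 0.15, 120), rng.normal(2.0, 0.25, 480)])
bic_one = bic(loglik_one_gaussian(x), n_params=2, n=len(x))   # mean, sd
bic_two = bic(loglik_two_gaussians(x), n_params=5, n=len(x))  # 2 means, 2 sds, 1 weight
print(f"BIC one Gaussian: {bic_one:.1f}, two Gaussians: {bic_two:.1f}")
```

The parameter counts (2 versus 5) are what make BIC penalize the mixture; the two-component model wins only when the gain in log-likelihood outweighs the extra parameters.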

To assess the excitatory or inhibitory identity of the broad and narrow spiking neuron classes (B- and N-type neurons), we estimated the power of multi-unit activity (MUA) in their vicinity (recorded on electrodes other than that of the spiking neuron) around the time of spiking for each cell, and tested how this spike-triggered MUA power changed before versus after the cell fired a spike (see Materials and methods). This approach assumes that an excitatory neuron spikes concomitantly with neurons in the local population, which is reflected in a symmetric rise and fall of MUA before and after its spike. In contrast, inhibitory neurons are expected to spike when MUA rises, but once the spike occurs, it should contribute to suppressing local MUA, which should be reflected in a faster drop in MUA after the spike (Oemisch et al., 2015). We found that B-type cells showed on average a symmetric pre- to post-spike-triggered MUA modulation, indicative of excitatory participation in local activity (Figure 1F). In contrast, spikes of N-type cells were followed by a faster drop in MUA, indicating an inhibitory influence on MUA (Figure 1F). The excitatory and inhibitory effects on local MUA were consistent across the population and significantly distinguished B- and N-type neurons (Figure 1G; MUA modulation index, (post-spike MUA − pre-spike MUA)/pre-spike MUA, for B- vs N-type cells: Wilcoxon test, p=0.001). This distinction was evident in both ACC and LPFC (Figure 1H; for N-type cells the MUA modulation index differed from zero, Wilcoxon test, ACC: p<0.001, LPFC: p=0.03; for B-type cells the difference was not significant). These findings suggest that narrow spiking cells consist mostly of inhibitory interneurons (see Discussion).
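The pre/post comparison behind the MUA modulation index can be sketched on synthetic data. This is a minimal illustration under assumptions: the MUA power envelope is taken as already extracted (here a synthetic trace with a baseline of 1), "pre" and "post" are summed areas in equal windows on either side of the spike sample, and the window length and helper names (`spike_triggered_average`, `mua_modulation_index`, `make_signal`) are hypothetical. The study's exact windows and normalization are described in Materials and methods.

```python
import numpy as np

def spike_triggered_average(signal, spike_idx, half_win):
    """Average `signal` over +/- half_win samples around each spike (edge spikes skipped)."""
    segments = [signal[i - half_win:i + half_win + 1]
                for i in spike_idx
                if i - half_win >= 0 and i + half_win < len(signal)]
    return np.mean(segments, axis=0)

def mua_modulation_index(sta, half_win):
    """(post - pre) / pre, using the area under the STA after vs before the spike sample."""
    pre = sta[:half_win].sum()        # samples strictly before the spike
    post = sta[half_win + 1:].sum()   # samples strictly after the spike
    return (post - pre) / pre

def make_signal(profile, n_spikes=50, spacing=100):
    """Synthetic MUA power: baseline of 1 with `profile` centred on each spike time."""
    half = len(profile) // 2
    signal = np.ones((n_spikes + 1) * spacing)
    spikes = np.arange(1, n_spikes + 1) * spacing
    for s in spikes:
        signal[s - half:s + half + 1] = profile
    return signal, spikes

half_win = 20
t = np.arange(-half_win, half_win + 1)
# "Excitatory-like" profile: MUA rises and falls symmetrically around the spike.
excitatory = 1.0 + 0.5 * np.exp(-(t / 8.0) ** 2)
# "Inhibitory-like" profile: same rise before the spike, faster drop after it.
inhibitory = np.where(t <= 0, excitatory, 1.0 + 0.5 * np.exp(-(t / 2.0) ** 2))

idx_exc = mua_modulation_index(spike_triggered_average(*make_signal(excitatory), half_win), half_win)
idx_inh = mua_modulation_index(spike_triggered_average(*make_signal(inhibitory), half_win), half_win)
print(f"excitatory-like: {idx_exc:+.3f}, inhibitory-like: {idx_inh:+.3f}")
```

A symmetric profile gives an index of essentially zero, while a faster post-spike drop gives a negative index, which is the signature the text attributes to N-type cells.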

Putative interneurons in prefrontal cortex index choices when choice probability is low

To discern how B- and N-type neurons encoded the learning of the rewarded color during reversal, we analyzed neuronal response modulation around color onset, which instructed animals to covertly shift attention to the stimulus with the reward-predicting color. In addition to this color cue (acting as attention cue), we also analyzed activity around the motion onset, which served as action cue: its direction of motion indicated the saccade direction the animal had to make to receive reward. This action cue could occur either 0.5–0.9 s before or 0.5–0.9 s after the color cue. Many neurons in LPFC selectively increased their firing to the color attention cue with no apparent modulation to the motion action cue (n = 71 cells with firing increases to the color but not the motion cue; for examples, see Figure 2A,B). These neurons increased firing to the color onset whether it was the first or the second feature presented, but did not respond to the motion onset when it was shown as the first or second feature (for more examples, see Figure 2—figure supplement 1).

Figure 2 (with 4 supplements). Firing rate modulation of narrow and broad spiking neurons to the color cue correlates with choice probability.

(A, B) Spike rasters for example neurons around the onset of Feature 1 and Feature 2 when Feature 1 was color (magenta) or motion (green). Both neurons responded more strongly to the color than to the motion onset, irrespective of whether it was shown as the first or second feature during a trial. (C) Narrow spiking neurons (red) in LPFC respond to the color onset when it occurred as Feature 2 (upper panel) or as Feature 1 (bottom panel). (D) Same as C for ACC, showing no or weak feature onset responses. (E) Firing rates of narrow spiking neurons (red) in LPFC correlate with the choice probability of the to-be-chosen stimulus (left). The average rate × choice probability correlation in LPFC was significantly larger in narrow than in broad spiking neurons (right). (F) Same as E for ACC, showing no significant correlations with choice probability. Source data 1. Correlation data and script for plotting panels E and F.

We found that N-type neurons in LPFC transiently changed their firing to the attention cue whether it occurred early or late relative to the action cue (significant increase within 25–275 ms post-cue for Feature 1 and within 50–250 ms post-cue for Feature 2, p<0.05, randomization statistics; n = 21 N-type cells with increases and seven with decreases to the color cue; Figure 2C). This attention cue-specific increase was absent in B-type neurons in LPFC (n.s., randomization statistics; n = 44 B-type cells with increases and n = 35 with decreases to the color cue; Figure 2C). In contrast to LPFC, N- and B-type neurons in ACC did not show an on-response to the color cue (n = 36/6 B-/N-type cells with increased and n = 31/12 B-/N-type cells with decreased firing to the color cue, out of a total of n = 216/50 B-/N-type cells included in this analysis) (Figure 2D).

The N-type-specific response to the attention cue might carry information about the rewarded stimulus color or the rewarded stimulus location. We found that the proportion of neurons whose firing rate significantly distinguished rewarded and nonrewarded colors sharply increased for N-type cells after the onset of the color cue in LPFC (proportion of color-selective responses within 0–0.5 s after cue onset: 18%, n = 10 of 54 N-type cells; randomization test, p<0.05 within 175–575 ms after cue onset), but not in ACC (cells with significant information: 6%, n = 3 of 50 N-type cells; n.s., randomization test within 300–700 ms after cue onset) (Figure 2—figure supplement 2A,B). Similar to the selectivity for the rewarded stimulus color, N-type cells in LPFC (but not in ACC) showed significant encoding of the right versus left location of the rewarded stimulus (LPFC: 22% with reward location information, n = 12 of 54 N-type cells, randomization test, p<0.05 within 200–500 ms after cue onset; ACC: 10% with reward location information, n = 5 of 50 N-type cells, n.s., randomization test) (Figure 2—figure supplement 2C,D).
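The randomization tests used throughout this section can be illustrated for a single neuron and a single time window. This is a generic label-shuffling sketch, not the study's exact procedure (its time windows and any correction across time bins are described in Materials and methods); the firing rates are synthetic and `permutation_test` is a hypothetical helper.

```python
import random
from statistics import mean

def permutation_test(rates_a, rates_b, n_perm=2000, seed=1):
    """Two-sided randomization test on the difference of mean firing rates."""
    rng = random.Random(seed)
    observed = mean(rates_a) - mean(rates_b)
    pooled = list(rates_a) + list(rates_b)
    n_a = len(rates_a)
    n_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)               # randomly reassign trials to the two conditions
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            n_extreme += 1
    # The +1 terms count the observed labelling itself, so p is never exactly zero.
    return observed, (n_extreme + 1) / (n_perm + 1)

# Hypothetical post-cue rates (spikes/s) for one neuron, grouped by which colour was rewarded.
gen = random.Random(0)
rates_red = [gen.gauss(12.0, 2.0) for _ in range(40)]
rates_green = [gen.gauss(8.0, 2.0) for _ in range(40)]
diff, p = permutation_test(rates_red, rates_green)
print(f"difference = {diff:.2f} spikes/s, p = {p:.4f}")
```

Shuffling trial labels builds the null distribution directly from the data, so no assumption of normally distributed firing rates is needed.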

The color-specific firing increase and the encoding of the rewarded color by N-type neurons in LPFC suggest that they support reversal learning performance. We tested this by correlating their firing rates around color cue onset with the trial-by-trial variation of the probability of choosing the stimulus with the rewarded color. Choice probability, p(choice), was calculated with a reinforcement learning model that learned to optimize choices based on reward prediction errors (see Equation 3 in Materials and methods and Oemisch et al., 2019). Choice probability was low (near 0.5) early during learning and rose after each reversal to reach a plateau after ~10 trials (Figure 1C; for example blocks, see Figure 2—figure supplement 3A). During the post-color onset period, 17% (n = 20 of 120) of B-type cells and 27% (n = 11 of 41) of N-type cells in LPFC significantly correlated their firing with p(choice), which was more than expected by chance (binomial test, B-type cells: p<0.001; N-type cells: p<0.001). On average, N-type cells in LPFC showed positive correlations (Pearson r = 0.068, Wilcoxon rank test, p=0.011), while B-type neurons showed on average no correlation (Wilcoxon rank test, p=0.20) (Figure 2E). The positive p(choice) correlations of N-type neurons in LPFC grew following color onset and remained significant for 0.7 s (N = 41 N-type neurons, randomization test, p<0.05 from 0 to 0.7 s post-cue, Figure 2E). N-type neurons in LPFC of both monkeys showed a similar pattern of response to the attention cue and a positive correlation of firing rate with p(choice) (Figure 2—figure supplement 4A–C). Compared to LPFC, significantly fewer N-type cells in ACC correlated their firing with choice probability (6%, n = 2 of 33 in ACC, versus 27% in LPFC; χ² test for difference in proportions, χ² = 5.45, p=0.019) and showed no p(choice) correlations over time (Wilcoxon rank test, p=0.49, n.s., Figure 2F).
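The population-level statistics in the paragraph above can be recomputed from the reported counts. The sketch below assumes that chance level is 5% of neurons (the per-neuron significance threshold) with a one-sided binomial test, and uses a Pearson χ² test without continuity correction for the ACC versus LPFC comparison; with the counts 2 of 33 and 11 of 41 this yields χ² ≈ 5.45 (p ≈ 0.02), consistent with the reported values. The helper names are hypothetical.

```python
from math import comb, erfc, sqrt

def binomial_tail(k, n, p=0.05):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more 'significant' neurons."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def chi2_two_proportions(sig_a, n_a, sig_b, n_b):
    """Pearson chi-square on a 2x2 table without continuity correction (df = 1)."""
    table = [[sig_a, n_a - sig_a], [sig_b, n_b - sig_b]]
    total = n_a + n_b
    col_totals = [sig_a + sig_b, total - sig_a - sig_b]
    stat = 0.0
    for row, row_total in zip(table, (n_a, n_b)):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / total
            stat += (observed - expected) ** 2 / expected
    # For one degree of freedom the chi-square survival function is erfc(sqrt(stat / 2)).
    return stat, erfc(sqrt(stat / 2))

# Counts taken from the text above.
p_b_type = binomial_tail(20, 120)   # LPFC B-type cells correlated with p(choice)
p_n_type = binomial_tail(11, 41)    # LPFC N-type cells correlated with p(choice)
stat, p = chi2_two_proportions(2, 33, 11, 41)  # ACC vs LPFC N-type proportions
print(f"binomial: {p_b_type:.1e}, {p_n_type:.1e}; chi2 = {stat:.2f}, p = {p:.3f}")
```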

Putative interneurons in anterior cingulate cortex index high reward prediction errors

Choice probabilities (p(choice)) increase during reversal learning as reward prediction errors (RPEs) of outcomes decrease. In our task, this was evident in an anticorrelation between p(choice) and RPE (r = −0.928; Figure 2—figure supplement 3A,B), with lower p(choice) (near 0.5) and high RPE over multiple trials early in the reversal learning blocks, when the animals adjusted to the newly rewarded color (Figure 2—figure supplement 3E,F). Prior studies have shown that RPEs are prevalently encoded in the ACC (Kennerley et al., 2011; Oemisch et al., 2019). We therefore reasoned that RPEs might preferentially be encoded by narrow spiking putative interneurons. First, we analyzed N- and B-type cell responses to the reward. In both LPFC and ACC, N- and B-type cells on average increased firing after reward onset (p<0.05, randomization test; LPFC: n = 26 of 54 N-type and 18 of 188 B-type cells with increases, and n = 14 of 54 N-type and 5 of 188 B-type cells with decreased firing; ACC: n = 30 of 50 N-type and 13 of 216 B-type cells with increases, and n = 19 of 50 N-type and 8 of 216 B-type cells with decreased firing). However, the N- and B-type responses to the reward were not significantly different in ACC or LPFC (n.s., randomization test, Figure 3A,B). We estimated trial-by-trial RPEs with the same reinforcement learning model that provided p(choice) for the previous analysis. RPE is calculated as the difference between the received outcome R and the expected value V of the chosen stimulus, RPE = R − V (see Materials and methods). We found that on average 23% of LPFC and 35% of ACC neurons showed significant firing rate correlations with RPE in the post-outcome epoch, with only moderately and non-significantly more N-type than B-type neurons having significant rate-RPE correlations (LPFC: n = 9 N-type, n = 31 B-type neurons, χ² test, p=0.64; ACC: n = 15 N-type, n = 47 B-type neurons, χ² test, p=0.83; Figure 3C,D).
However, a time-resolved analysis of the average correlation strength revealed a significant positive firing rate × RPE correlation 0.2–0.6 s after reward onset for ACC N-type neurons, which was absent in LPFC (ACC: n = 43 N-type neurons, randomization test, p<0.05; LPFC: n = 31 N-type neurons, no significant time bin; Figure 3E,F). In ACC, the positive correlation between N-type firing rate and RPE was evident in both monkeys (Figure 2—figure supplement 4D).
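How a reinforcement learning model yields both p(choice) and RPE on each trial can be sketched with a simple learner. This assumes a Rescorla–Wagner value update over the two colors and a softmax choice rule; the learning rate `alpha`, inverse temperature `beta`, block length, and the function `simulate_block` are hypothetical, and the model actually used (Equation 3 in Materials and methods; Oemisch et al., 2019) may include further terms.

```python
import math
import random

def simulate_block(n_trials=80, reversal_at=40, alpha=0.3, beta=5.0, seed=0):
    """Rescorla-Wagner values for two colours with a softmax choice rule.

    Returns, per trial, p(choice) for the currently rewarded colour and the RPE.
    alpha (learning rate) and beta (inverse temperature) are illustrative values.
    """
    rng = random.Random(seed)
    values = [0.5, 0.5]                  # expected value V of colour 0 and colour 1
    p_choice, rpes = [], []
    for trial in range(n_trials):
        rewarded = 0 if trial < reversal_at else 1   # the rewarded colour reverses
        # Softmax over two options: probability of choosing colour 0.
        p0 = 1.0 / (1.0 + math.exp(-beta * (values[0] - values[1])))
        p_choice.append(p0 if rewarded == 0 else 1.0 - p0)
        choice = 0 if rng.random() < p0 else 1
        reward = 1.0 if choice == rewarded else 0.0
        rpe = reward - values[choice]            # RPE = R - V of the chosen stimulus
        values[choice] += alpha * rpe            # update only the chosen colour
        rpes.append(rpe)
    return p_choice, rpes

p_choice, rpes = simulate_block()
print([round(p, 2) for p in p_choice[36:44]])   # trials around the reversal
```

Right after the reversal, p(choice) for the newly rewarded color drops below 0.5 and recovers as the values are relearned; trial-wise sequences like these are what the firing rates would be correlated against.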

Figure 3. Firing rate modulation to trial outcomes correlates with reward prediction errors.

(A, B) Narrow (red) and broad (blue) spiking neurons in LPFC (A) and ACC (B) on average activate to the reward outcome. (C, D) Proportion of narrow and broad spiking neurons in LPFC (C) and ACC (D) with significant firing rate × reward prediction error correlations in the 0–0.75 s after trial outcomes were received. (E, F) Time course of firing rate × reward prediction error correlations for narrow and broad spiking neurons in LPFC (E) and ACC (F) around the time of reward onset. The horizontal bar denotes times with significant correlations. Source data 1. Correlation data and script for plotting panels E and F.

Classification of neural subtypes of putative interneurons

We next asked whether the narrow spiking, putative interneurons whose firing indexed relatively lower p(choice) in LPFC and relatively higher RPE in ACC are from the same electrophysiological cell type, or e-type (Markram et al., 2015; Gouwens et al., 2019). Prior studies have distinguished different narrow spiking e-types using the cells’ spike train pattern and spike waveform duration (Ardid et al., 2015; Dasilva et al., 2019; Trainito et al., 2019; Banaie Boroujeni et al., 2020b). We followed this approach, using a cluster analysis to distinguish e-types based on spike waveform duration parameters (inferred hyperpolarization rate and time to 25% repolarization, Figure 1—figure supplement 2A,B), on whether their spike trains showed regular or variable interspike intervals (local variability, ‘LV’, Figure 1—figure supplement 2D), and on whether firing was more or less variable relative to the mean interspike interval (coefficient of variation, ‘CV’, Figure 1—figure supplement 2C). LV and CV are moderately correlated (r = 0.26, Figure 1—figure supplement 2E), with LV indexing the local similarity of adjacent interspike intervals, while CV more strongly reflects the global variance across periods of higher and lower firing (Shinomoto et al., 2009). We ran k-means clustering on neurons in ACC and LPFC using these variables together with firing rate (details in Materials and methods). Clustering resulted in eight e-types (Figure 4A–C). Cluster boundaries were highly reliable (Figure 4—figure supplement 1). Moreover, the assignment of a cell to its class was statistically consistent and reliably evident for cells from each monkey independently (Figure 4—figure supplement 2). Narrow spiking neurons fell into three e-types. The first, the N1 e-type (n = 18, 13% of narrow spiking neurons), showed high firing rates and highly regular spike trains (low LVs, mean LV 0.47, SE 0.05).
The second, the N2 e-type (n = 27, 20% of narrow spiking neurons), showed on average Poisson spike train variability (LVs around 1) and the narrowest waveforms, and the N3 e-type (n = 91, 67% of narrow spiking neurons) showed intermediate narrow waveform duration and regular firing (LVs < 1, mean LV 0.84, SE 0.02) (Figure 4C). Neurons within an e-type showed similar feature characteristics irrespective of whether they were recorded in ACC or LPFC. For example, N3 e-type neurons from ACC and LPFC were indistinguishable in their firing and action potential characteristics (LV, ACC/LPFC = 0.79/0.88, rank-sum test, p=0.06; CV, ACC/LPFC = 1.19/1.31, rank-sum test, p=0.07; firing rate, ACC/LPFC = 4.41/4.29, rank-sum test, p=0.71; action potential repolarization time (hyperpolarization rate), ACC/LPFC = 0.18 s (97 s⁻¹)/0.17 s (93 s⁻¹)).
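The feature extraction and clustering steps described above can be sketched as follows. LV follows the Shinomoto et al. (2009) definition, LV = 3/(n−1) Σ ((Iᵢ − Iᵢ₊₁)/(Iᵢ + Iᵢ₊₁))², which is 0 for a perfectly regular train and about 1 for a Poisson train. The clustering is plain k-means on z-scored features with a deterministic farthest-first initialization, a choice made here for reproducibility; the feature vectors, the two-cluster setting, and all function names are hypothetical and far simpler than the eight-cluster analysis in the study.

```python
import numpy as np

def local_variation(isi):
    """LV (Shinomoto et al., 2009): 0 for perfectly regular trains, ~1 for Poisson trains."""
    isi = np.asarray(isi, dtype=float)
    a, b = isi[:-1], isi[1:]                 # pairs of adjacent interspike intervals
    return 3.0 * np.mean(((a - b) / (a + b)) ** 2)

def coefficient_of_variation(isi):
    """CV: standard deviation of the ISIs divided by their mean."""
    isi = np.asarray(isi, dtype=float)
    return isi.std() / isi.mean()

def kmeans(features, k, n_iter=100):
    """Lloyd's k-means on z-scored features with farthest-first initialisation."""
    z = (features - features.mean(axis=0)) / features.std(axis=0)
    centroids = [z[0]]
    for _ in range(1, k):
        # Next centroid: the point farthest from all centroids chosen so far.
        d = np.min([((z - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(z[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        dist = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        updated = np.array([z[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                            for j in range(k)])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels, centroids

rng = np.random.default_rng(0)
regular_isi = np.full(1000, 0.1)                 # clock-like firing every 100 ms
poisson_isi = rng.exponential(0.1, 20000)        # Poisson firing at 10 spikes/s
print(local_variation(regular_isi), local_variation(poisson_isi))

# Hypothetical per-neuron features: [repolarization time, LV, CV, firing rate].
group_a = rng.normal([0.2, 0.5, 0.5, 10.0], [0.01, 0.05, 0.05, 0.5], size=(30, 4))
group_b = rng.normal([0.4, 1.2, 1.3, 3.0], [0.01, 0.05, 0.05, 0.5], size=(30, 4))
labels, _ = kmeans(np.vstack([group_a, group_b]), k=2)
print(labels)
```

Z-scoring matters here because firing rate (spikes/s) and LV (unitless, around 1) live on very different scales; without it, Euclidean distance would be dominated by firing rate.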

Figure 4 (with 2 supplements). Clustering of e-type subclasses of cells using their spike width, firing variability, and firing rate.

(A) Dendrogram of cluster distances for neuron classes with broad spikes (five subclasses, blue) and narrower spikes (three subclasses, orange and red). (B) For each e-type (x-axis), the average LV, CV, and firing rate. The rightmost point shows the average for all e-types combined. (C) Illustration of the average spike waveform, an example spike train raster, and the local variability (LV, upper histograms) for each clustered e-type. The bottom grey LV histogram includes all recorded cells to allow comparison with the e-type-specific distributions. (D) Average post- to pre-spike MUA modulation (y-axis) for neurons of the different e-types. Values below 0 reflect reduced multiunit firing after the neuron fires a spike compared to before the spike, indicating a relatively suppressive relationship. Only the N3 e-type showed a systematically reduced post-spike MUA modulation. MUA was always recorded from other electrodes near the spiking neuron. Source data 2. Data and script used for clustering (panel A) and data used for plotting panels B and C.

Beyond the narrow spiking classes, spike trains and LV distributions showed five broad spiking neuron e-types. The B1–B5 e-types ranged from irregular burst firing in e-types B2, B3, and B4 (LV > 1; class B2 mean LV 1.20, SE 0.02; class B3 mean LV 0.93, SE 0.02; class B4 mean LV 1.24, SE 0.03), through regular firing in B1 (LV < 1; mean LV 0.75, SE 0.02), to regular non-Poisson firing in B5 (LV > 1; mean LV 1.68, SE 0.02) (number and % of broad spiking cells: B1: 109 (18%), B2: 103 (17%), B3: 94 (16%), B4: 146 (25%), B5: 138 (23%)) (Figure 4B,C). LV values > 1 indicate bursty firing, as supported by a positive correlation between the LV of neurons and their probability of firing bursts, defined as spikes occurring ≤5 ms apart (r = 0.44, p<0.001, Figure 1—figure supplement 2F). We next calculated the post- to pre-spike-triggered MUA modulation ratio for each e-type. Across all e-types, only the ratio for the N3 e-type was significantly different from zero (p<0.05, FDR-corrected) (Figure 4D). Comparisons between cell classes showed that the spike-triggered MUA modulation ratio for the N3 e-type differed significantly from that of the B4 (p=0.02) and B5 (p=0.03) e-types.
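The FDR correction across e-types can be illustrated with the Benjamini–Hochberg step-up procedure. The text says only "FDR-corrected", so assuming Benjamini–Hochberg is itself an assumption. The p-values below are hypothetical inputs chosen to show a nominally significant test (p = 0.03) that does not survive correction; they are not the study's values, and `benjamini_hochberg` is a hypothetical helper.

```python
def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure; True marks tests that survive FDR control at q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])   # indices from smallest to largest p
    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * q.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * q:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject

# Hypothetical uncorrected p-values for eight e-types (B1-B5, N1-N3), not the study's values.
e_types = ["B1", "B2", "B3", "B4", "B5", "N1", "N2", "N3"]
pvals = [0.20, 0.60, 0.35, 0.09, 0.07, 0.03, 0.80, 0.003]
print([name for name, r in zip(e_types, benjamini_hochberg(pvals)) if r])
```

With eight tests, the second-smallest p-value must fall below 2/8 × 0.05 = 0.0125 to be rejected, so a nominal p = 0.03 is not enough.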

The same interneuron subclass indexes p(choice) in LPFC and RPE in ACC

The distinct e-types allowed us to test how each correlated its firing with choice probability and with RPE. We found that the only e-type with a significant average correlation between firing and choice probability during the cue period was the N3 e-type in LPFC (r = 0.08, Kruskal–Wallis test, p=0.04; randomization test for difference from zero, Tukey-Kramer multiple comparison corrected, p<0.05; Figure 5A,B). Consistent with this correlation, neurons of the N3 e-type in LPFC also significantly increased firing to the color cue, irrespective of whether it appeared early or late in the trial (p<0.05 during 0.04–0.2 s after Feature 2 onset, and p<0.05 during 0.175–0.225 s after Feature 1 onset, Figure 5—figure supplement 1). The on-average positive correlation of firing rate and p(choice) was also evident in an example N3 e-type cell (Figure 5—figure supplement 2A–C). No other e-type in LPFC or ACC showed significant correlations with choice probability. In LPFC, a linear classifier trained on multiclass p(choice) values was able to label N3 e-type neurons based on their p(choice) values with an accuracy of 31% (Figure 5—figure supplement 3A).

Figure 5 (with 3 supplements). E-type-specific correlations with choice probability and reward prediction error in LPFC and ACC.

(A, B) Firing rate × choice probability correlations for neurons of each e-type subclass in LPFC (A) and ACC (B). Only N3 e-type neurons in LPFC show significant correlations. (C, D) Firing rate × reward prediction error correlations for neurons of each e-type subclass in LPFC (C) and ACC (D). N3 e-type neurons in ACC show significant positive correlations, and the B3 e-type shows negative firing rate × RPE correlations. Grey shading denotes significance at p<0.05 (multiple comparison corrected). Error bars are SE. Source data 1. Correlation data and script for plotting panels A–D.

Paralleling the N3 e-type result in LPFC, in ACC the N3 e-type was the only narrow spiking subclass with a significant firing rate correlation with reward prediction errors (RPE) (n = 30 neurons; r = 0.09, Kruskal–Wallis test, p=0.01; randomization test for difference from zero, Tukey-Kramer multiple comparison corrected, p<0.05; Figure 5C,D). The only other e-type with a significant firing rate × RPE correlation was the B4 class, which fired more strongly with lower RPEs (n = 18 neurons; r = −0.08, Kruskal–Wallis test, p=0.01; randomization test for difference from zero, multiple comparison corrected, p<0.05). There was no subtype-specific RPE correlation in LPFC (Figure 5C,D). The average positive correlation of firing rate and RPE was also evident in example ACC N3 e-type cells (Figure 5—figure supplement 2D–F). In ACC, a linear classifier trained on multiclass RPE values was able to label N3 e-type neurons from their RPE values with an accuracy of 34% (Figure 5—figure supplement 3B).

Narrow spiking neurons synchronize to theta, beta, and gamma band network rhythms

Prior experimental studies have suggested that interneurons have unique relationships to oscillatory activity (Puig et al., 2008; Cardin et al., 2009; Sohal et al., 2009; Vinck et al., 2013; Womelsdorf et al., 2014a; Chen et al., 2017; Voloh and Womelsdorf, 2018; Shin and Moore, 2019; Banaie Boroujeni et al., 2020b; Onorato et al., 2020), raising the possibility that the N3 e-type neurons also contribute to p(choice) and RPE processing through neuronal synchronization. To test this, we first inspected the spike-triggered LFP averages (STAs) of neurons and found that the STAs of many N3 e-type neurons showed oscillatory sidelobes in the 10–30 Hz range (Figure 6A). We quantified this phase synchrony by calculating the spike-LFP pairwise phase consistency (PPC) and extracting statistically significant peaks in the PPC spectrum (Vinck et al., 2012; Banaie Boroujeni et al., 2020a), which confirmed significant synchrony peaks in the theta/alpha, beta and low gamma frequency ranges (Figure 6B). The density of spike-LFP synchrony peaks, measured as the proportion of neurons that show reliable PPC peaks (see Materials and methods), revealed a high prevalence of 15–30 Hz beta synchrony for broad spiking neurons in both ACC and LPFC, a ~5–12 Hz synchrony peak that was unique to ACC, and a high prevalence of 35–45 Hz gamma synchronization in narrow spiking cells (but not in broad spiking cells) in both areas (Figure 6C; Voloh et al., 2020). The synchrony peak densities of the N3 e-type neurons mirrored this overall pattern, with beta to gamma band synchrony peaks in LPFC, and 5–12 Hz theta/alpha and gamma synchrony peaks in ACC (Figure 6C) (for peak densities of other e-types, see Figure 6—figure supplement 1).
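Pairwise phase consistency is the average cosine of the phase difference over all pairs of spike phases, and it can be computed from the resultant vector without looping over pairs. The sketch below shows the simplest, all-pairs estimator, assuming that spike phases (in radians) relative to the LFP at one frequency have already been extracted. The variants described in Vinck et al., 2012 additionally exclude pairs of spikes from the same trial, which this sketch does not do; the function names are hypothetical.

```python
import cmath
import math


def pairwise_phase_consistency(phases):
    """All-pairs PPC of spike phases (radians) at one LFP frequency.

    Uses |sum_j exp(i*theta_j)|**2 = N + sum_{j != k} cos(theta_j - theta_k),
    so the mean cosine over the N*(N-1) ordered pairs is
        PPC = (|sum_j exp(i*theta_j)|**2 - N) / (N * (N - 1)).
    PPC is 1 for identical phases and close to 0 for uniformly spread phases.
    """
    n = len(phases)
    if n < 2:
        raise ValueError("PPC needs at least two spike phases")
    resultant = sum(cmath.exp(1j * theta) for theta in phases)
    return (abs(resultant) ** 2 - n) / (n * (n - 1))


def ppc_brute_force(phases):
    """Reference O(N^2) version: mean cos of phase differences over all pairs j != k."""
    n = len(phases)
    total = sum(
        math.cos(phases[j] - phases[k])
        for j in range(n)
        for k in range(n)
        if j != k
    )
    return total / (n * (n - 1))
```

Unlike the squared phase-locking value, the pairwise estimator is not inflated by small spike counts, which is why it is preferred when comparing neurons with different firing rates.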

Figure 6 (with 1 supplement)
Spike-LFP phase synchronization.

(A) Average spike-triggered local field potential fluctuations of nine N3 e-type neurons showing transient LFP oscillations from 5 Hz up to ~30 Hz. Black vertical line is the time of the spike. Red lines denote the LFP after adaptive spike artifact removal (raw traces in gray). (B) Peak-normalized pairwise phase consistency for each spike-LFP pair (y-axis), rank ordered according to the frequency (x-axis) with peak PPC. (C) Proportion of significant spike-LFP synchronization peaks for neurons in LPFC (left) and ACC (right), for narrow and broad spiking neurons (upper rows) and for N3 e-type neurons (bottom row).

Interneuron-specific gamma synchronization following cues in LPFC and outcomes in ACC

The overall synchrony patterns leave open whether synchrony is task-modulated or conveys information about choices and prediction errors. We addressed these questions by calculating time-resolved spike-LFP phase synchronization around color-cue onset (for LPFC) and around reward onset (for ACC), separately for trials with high and low choice probabilities (LPFC) and with high and low reward prediction errors (ACC). In LPFC, N3 e-type neurons showed a sharp increase in 35–45 Hz gamma band synchrony shortly after the color cue was presented when choice probabilities were low (i.e. when the animals were uncertain which stimulus was rewarded), while broad spiking neurons did not show gamma synchrony (Figure 7A–C) (N3 e-type vs broad spiking cell difference in gamma synchrony 0–700 ms after color-cue onset: p<0.05, randomization test, multiple comparison corrected). When choice probabilities were high, both N3 e-type neurons and broad spiking neurons in LPFC showed significant increases in 20–35 Hz beta band synchronization (Figure 7D,E), with N3 e-type neurons synchronizing significantly more strongly to beta than broad spiking neurons (Figure 7F) (p<0.05, randomization test, multiple comparison corrected). These effects were restricted to the color-cue period: LPFC broad spiking neurons and N3 e-type neurons did not show spike-LFP synchronization after reward onset in low or high RPE trials (Figure 7—figure supplement 1A–D). Moreover, the gamma synchrony in low p(choice) trials was not found in other narrow spiking or broad spiking e-types, and the LPFC N3 e-type showed stronger gamma synchrony than broad spiking classes in the low p(choice) trials (p=0.02, Tukey-Kramer multiple comparison corrected) (Figure 7—figure supplement 1E–F).
Other cell classes in LPFC did not differ in 35–45 Hz gamma synchrony in the 0–0.7 s after reward onset in high or low RPE trials, or in the 0–0.7 s after color onset in high p(choice) trials (Figure 7—figure supplement 1E–H; see Figure 7—figure supplement 2A for time-frequency maps of all cell classes around cue onset).
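The trial split and the group comparison used above can be sketched as follows: trials are divided into the 50% with the lowest and the 50% with the highest choice probability (or RPE), and a randomization test asks whether two groups of neurons (e.g., N3 e-type versus broad spiking) differ in their mean synchrony. This is a minimal illustration under stated assumptions: one synchrony value per neuron is taken as given (for example, baseline-normalized PPC in one time-frequency bin), the correction for multiple comparisons across time-frequency bins is not shown, and the function names are hypothetical.

```python
import random
import statistics


def median_split(trial_values):
    """Return indices of the 50% lowest and 50% highest trials by value.

    With an odd number of trials, the middle trial is assigned to neither
    half (an assumption of this sketch).
    """
    order = sorted(range(len(trial_values)), key=lambda i: trial_values[i])
    half = len(order) // 2
    return order[:half], order[len(order) - half:]


def permutation_test_diff(group_a, group_b, n_perm=10000, seed=0):
    """Two-sided randomization test for a difference in group means.

    Group labels are shuffled across neurons to build the null distribution
    of the absolute mean difference.
    """
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)


# Hypothetical usage:
# low_trials, high_trials = median_split(p_choice_per_trial)
# p = permutation_test_diff(n3_sync_low, broad_sync_low)
```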

Figure 7 (with 4 supplements)
Spike-LFP phase synchronization in LPFC around the color onset for trials with low and high choice probability.

(A) Spike-LFP pairwise phase consistency for broad spiking neurons in LPFC around the time of color onset (x-axis) for the 50% of trials with the lowest choice probabilities. (B) Same as (A) for neurons of the N3 e-type. Black contour line denotes statistically significant increased phase synchrony relative to the pre-color onset period. (C) Statistical comparison of spike-LFP synchrony for N3 e-type neurons (orange) versus broad spiking neurons (blue) for low choice probability trials in LPFC. Synchrony is normalized by the pre-color onset synchrony. Gray shading denotes significant differences (p<0.05) between broad spiking and N3 e-type neurons. (D,E,F) Same format as (A,B,C) but for the 50% of trials with the highest choice probability. Source data 3. Coherence data and script for plotting panels A–F.

In ACC, the N3 e-type neurons synchronized in a 35–42 Hz gamma band following reward onset when RPEs were high (i.e. when outcomes were unexpected); this synchrony was weaker and emerged later when RPEs were low, and it was absent in broad spiking neurons (Figure 8). In contrast to this gamma synchronization at high RPE, low RPE trials triggered increased spike-LFP synchronization at ~6–14 Hz theta/alpha frequencies in the N3 e-type neurons (Figure 8C). The increase in 6–14 Hz synchrony was significantly stronger in the N3 e-type than in broad spiking neurons in the 0–0.7 s post-reward period (Figure 8F). These gamma and theta band effects of the N3 e-type neurons in ACC were restricted to the reward period; that is, they were absent in the color-cue period for trials with high or low p(choice) (Figure 7—figure supplement 3A–D). Compared with the other e-types, the N3 e-type gamma-synchronized significantly more strongly in the reward period when RPEs were high (p=0.04, Tukey-Kramer, multiple comparison corrected) (Figure 7—figure supplement 3E). Other e-type classes did not differ in their spike-LFP synchronization in the 35–45 Hz gamma band in low or high RPE trials, with the exception of the ACC B2 class, which synchronized in high RPE trials in a higher (>50 Hz) gamma band (Figure 7—figure supplement 3E–H; see Figure 7—figure supplement 2B for time-frequency maps of all cell classes around reward onset).

Figure 8
Spike-LFP phase synchronization in ACC during outcome processing for trials with low and high reward prediction errors.

(A) Spike-LFP pairwise phase consistency for broad spiking neurons in ACC around reward onset (x-axis) for the 50% of trials with the lowest reward prediction errors. (B) Same as (A) for neurons of the N3 e-type. Black contour line denotes statistically significant increased phase synchrony relative to the pre-reward period. (C) Statistical comparison of spike-LFP synchrony (normalized by the pre-reward synchrony) for N3 e-type neurons (orange) versus broad spiking neurons (blue) in ACC for trials ending in low reward prediction errors. Gray shading denotes frequencies with significant differences (p<0.05) between broad spiking and N3 e-type neurons. (D,E,F) Same format as (A,B,C) but for the 50% of trials with the highest reward prediction errors. Source data 3. Coherence data and script for plotting panels A–F.

The spike-LFP synchronization results in LPFC and ACC were unchanged when the average reward-onset-aligned LFP or the average color-cue-aligned LFP was subtracted prior to analysis, which controls for a possible influence of lower-frequency evoked potentials (Figure 7—figure supplement 4).
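The evoked-potential control amounts to removing the event-locked average from every trial before computing spike-LFP synchrony. The sketch below assumes that each trial is a list of LFP samples aligned to the same event (reward onset or color-cue onset) and that the average is taken over the same trials that are analyzed; it is an illustration, not the authors' code, and the function name is hypothetical.

```python
def subtract_evoked(trials):
    """Remove the trial-averaged, event-aligned LFP (evoked potential) from each trial.

    `trials` is a list of equal-length sample lists, all aligned to the same
    event. The returned trials contain only the non-phase-locked residual.
    """
    n_trials = len(trials)
    n_samples = len(trials[0])
    if any(len(t) != n_samples for t in trials):
        raise ValueError("all trials must have the same length")
    # Average across trials at each time sample: the evoked potential.
    evoked = [sum(t[s] for t in trials) / n_trials for s in range(n_samples)]
    return [[t[s] - evoked[s] for s in range(n_samples)] for t in trials]
```

If synchrony survives this subtraction, it cannot be explained solely by a stimulus-locked, low-frequency deflection shared across trials.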

Circuit models of interneuron-specific switches between gamma and beta or theta synchronization

The previous results showed that neurons of the N3 e-type engaged in transient ~35–45 Hz gamma band synchronization in trials characterized by uncertainty. In LPFC, gamma synchronization was evident when expected stimulus values were uncertain (reflected in low p(choice)), and in ACC, gamma synchronization emerged when reward outcomes were uncertain (reflected in high RPE). In contrast, there was no gamma band synchrony when choices were certain (high p(choice)) and reward outcomes were predictable (low RPE). In these trials, N3 e-type neurons instead showed beta synchronization to the cue (in LPFC) or theta band synchronization to reward onset (in ACC). These findings suggest that oscillatory activity signatures can inform us about the circuit motifs that possibly underlie uncertainty-related computations. Because these computations are formally described in the reinforcement learning framework, we can propose links between specific computations, oscillatory activity signatures and their putative circuits, as proposed in the Dynamic Circuit Motifs framework (Womelsdorf et al., 2014b).

To show the feasibility of this approach, we devised two circuit models that reproduce the gamma band activity signatures in LPFC and ACC using populations of inhibitory cells modeled to correspond to N3 e-type cells (for modeling details, see Appendix 1). First, we modeled a putative LPFC circuit. Here, N3 e-type neurons showed gamma synchronization when p(choice) was low, which happens in trials in which the values of the two available objects are similar and choosing between them is difficult (see Equation 3 in Materials and methods). We predicted that in this situation, gamma synchronization of the N3 e-type reflects the resolution of competition among inputs from similarly active pyramidal cell populations encoding the expected values of the two objects. To test whether this scenario is plausible, we conceptualized and then simulated a circuit that modeled the activity of an N3 e-type neuron population, which we presumed to be PV+ fast spiking basket cells (see Discussion), activated by two excitatory pyramidal cell populations (Es) whose activity scales with the value of the stimuli (Figure 9A). Such an E-I network can synchronize by way of mutual inhibition at beta or gamma frequencies, depending on the total amount of drive the network receives (Wang and Buzsáki, 1996; White et al., 1998; Tiesinga and José, 2000). When both stimuli have similar values and the choice probability is relatively low, the drive to the network is high and it synchronizes in the gamma band. In contrast, when one object has a much larger value than the other, resulting in a high choice probability for that stimulus, the net level of drive makes the network synchronize in the beta band. We observed such a switch from gamma to beta frequencies in N3 e-type interneurons in LPFC when choice probabilities changed from low to high (Figure 7).
To show that such a gamma-to-beta switch can indeed follow from such an E-I network as a function of input diversity, we ran simulations of a firing rate E-I model (Keeley et al., 2017), described in detail in Appendix 1, which reproduces the gamma-beta switch (Figure 9—figure supplement 1). The network model simulations suggest that N3 e-type inhibition in LPFC after color-cue onset might accomplish two functions. First, it leads to a normalization that transforms object values into a choice probability (a soft winner-take-all gating of values; see Equation 3 in Materials and methods). Second, its gamma synchrony indexes the resolution of strong competition when similar excitatory drive originates from different sources (Figure 9A).
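The normalization step can be illustrated with a softmax that maps object values to choice probabilities. This sketch assumes that Equation 3 (in Materials and methods, not reproduced in this section) has the common softmax form with an inverse temperature parameter; the value of `beta` and the function name are hypothetical. It shows why similar values give a low p(choice) for either object (close to 0.5 with two objects), whereas a large value difference gives a high p(choice) for the more valuable object.

```python
import math


def choice_probabilities(values, beta=5.0):
    """Softmax over object values: a soft winner-take-all normalization.

    `beta` is a hypothetical inverse temperature; larger beta moves the
    mapping closer to a hard winner-take-all.
    """
    m = max(values)  # subtract the maximum for numerical stability
    weights = [math.exp(beta * (v - m)) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


# Similar values: the choice is uncertain (p(choice) is 0.5 for each object).
print(choice_probabilities([0.5, 0.5]))
# Dissimilar values: the first entry is about 0.98.
print(choice_probabilities([0.9, 0.1]))
```

The "soft" in soft winner-take-all corresponds to a finite `beta`: the stronger input is favored, but the weaker one is not silenced completely.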

Figure 9 (with 2 supplements)
Hypothetical link of the observed gamma band synchronization of the N3 e-type to circuit motifs and their putative functional correlate.

(A) The N3 e-type in LPFC synchronized at gamma when p(choice) was relatively low and at beta frequencies otherwise. The switch from gamma to beta synchronization can be parsimoniously reproduced in a circuit model with an interneuron (I) population receiving inputs from two excitatory (E) populations. When the input is diverse (similar p(choice)), the simulated circuit shows gamma activity (left), whereas when one excitatory population dominates, it engages in beta synchronization (simulation details in Appendix 1). At the functional level, this activity signature could correspond to choosing among similarly valued stimuli (left) versus choosing among stimuli with different values (bottom row). (B) In ACC, the N3 e-type synchronized at gamma when the prediction error was large and at theta frequencies otherwise. The switch from gamma to theta synchronization can be parsimoniously reproduced in a circuit model with two I populations that have different time constants and are reciprocally connected to an E population. When the faster spiking I1 population is activated more strongly, either directly by an external source or, putatively, through disinhibition by another interneuron population, the network synchronizes at gamma; otherwise, the I2 population imposes slower theta rhythmic synchrony on the network (simulation details in Appendix 1). Bottom: the activity states were functionally linked to trials in which outcomes mismatched expectations (high RPE) or matched the expected outcomes (low RPE).

Second, we conceptualized and simulated a circuit model that reproduces the oscillatory findings in ACC, where the N3 e-type neurons gamma-synchronized when outcomes were unexpected (high RPE) but synchronized in the theta band otherwise (low RPE). Such a gamma/theta switch is different from the gamma/beta switch seen in LPFC (see above). A parsimonious circuit realizing such a switch uses two separate interneuron populations (Is) that inhibit a common group of pyramidal cells (Es): a fast interneuron population (I1), presumed to be PV+ and to correspond to the N3 e-type (see Discussion), and a slower interneuron population (I2) (Figure 9B). When both are reciprocally connected with an excitatory population (E), an oscillatory regime emerges whose frequency depends on which interneuron population receives more excitatory drive (details in Appendix 1). When the I1 population receives stronger drive, gamma frequency synchronization dominates the network, while relatively stronger drive to the I2 population causes neurons in the network to switch to slower, theta band synchronization. We documented this gamma/theta switch in firing rate simulations in detail in Appendix 1. The activity signatures of this E-I-I model resemble the empirical activity signatures. The theta synchronous activity, which reflects the activity of I2 neurons, corresponds to low RPE trials, in which a reward R is received and the value V of the chosen stimulus was relatively high (with a high V and a large R, the RPE, computed as RPE = R − V, is small; see Equation 1 in Materials and methods) (Watabe-Uchida et al., 2017). In contrast, the gamma synchronous state that emerged with larger drive to the I1 neurons in the model corresponds to high RPE trials, in which a reward R is received but the value V of the chosen stimulus was relatively low. This circuit motif is plausible if one assumes that the I1 population is disinhibited when the chosen stimulus value is low.
Such disinhibition could be achieved by lowering the drive to I2 cells (which may require high values to be activated) or by a separate disinhibitory circuit (for details see Appendix 1). In summary, the E-I-I motif reproduces the switch from gamma to theta synchronization that we observed in ACC N3 e-type neurons. At the functional level, the circuit suggests that the emergence of gamma activity in this network indexes the detection of a mismatch between the received reward (as one source of excitation) and the chosen stimulus value (as another source of excitation) (Figure 9B).
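The mapping from trials to the two model regimes can be sketched directly from the definition RPE = R − V given in the text. The sketch is illustrative only: it assumes a scalar reward and chosen value per trial, splits trials at the median RPE (mirroring the 50% splits in Figure 8), and attaches the qualitative regime predicted by the E-I-I motif. The function names, the example values and the regime labels are hypothetical, and the sketch does not simulate the circuit itself.

```python
import statistics

# Qualitative regimes of the E-I-I motif described in the text (labels are illustrative).
PREDICTED_REGIME = {
    "high": "gamma (I1 dominates)",
    "low": "theta (I2 dominates)",
}


def reward_prediction_errors(rewards, chosen_values):
    """RPE = R - V for each trial."""
    return [r - v for r, v in zip(rewards, chosen_values)]


def label_rpe_trials(rpes):
    """Label trials 'high' or 'low' relative to the median RPE (ties count as 'low')."""
    med = statistics.median(rpes)
    return ["high" if x > med else "low" for x in rpes]


rewards = [1.0, 1.0, 1.0, 1.0]
chosen_values = [0.9, 0.2, 0.8, 0.3]  # hypothetical chosen-stimulus values
labels = label_rpe_trials(reward_prediction_errors(rewards, chosen_values))
print([PREDICTED_REGIME[lab] for lab in labels])
```

A rewarded trial with a low chosen value (0.2 or 0.3 here) yields a large RPE and falls into the gamma regime, matching the mismatch-detection interpretation above.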

The described circuits provide proofs of concept that the synchronization patterns we observed in N3 e-type interneurons in ACC and LPFC during periods of uncertain values and outcomes can originate from biologically plausible circuits. These results motivate future studies that derive and test quantitative predictions from these circuit motifs.

Discussion

We found that narrow spiking neurons in the medial and lateral prefrontal cortex of macaques cause a fast drop in local multiunit activity, indicative of inhibitory interneurons. In LPFC, these putative interneurons increased their firing rates at color-cue onset, encoded the rewarded color and correlated their rates with choice probabilities, while in ACC their firing correlated with reward prediction errors during the processing of the reward outcome. These functional signatures were specifically linked to a putative interneuron subtype (N3) that showed intermediate-narrow action potential waveforms and more regular firing than expected from a Poisson process (LVs of N3 e-type neurons: 0.84). Moreover, this putative interneuron e-type (N3) engaged in prominent event-triggered 35–45 Hz gamma band synchronization in each of the recorded brain areas. In LPFC, the N3 e-type synchronized at gamma to the cue when choice probabilities were low and uncertain, and in ACC the N3 e-type synchronized at gamma to reward onset when the RPE was high and the reward outcome was unexpected. Thus, the same e-type showed functional firing correlations and gamma synchrony in LPFC and in ACC during periods of uncertainty about cues and outcomes, respectively. Taken together, these findings point to a special role of the same interneuron type in realizing the area-specific functional contributions of LPFC and ACC to the color-based reversal learning task. This interpretation highlights several aspects of interneuron-specific circuit functions.

Characterizing narrow spiking interneurons in vivo

The first implication of our findings is that narrow spiking neurons can be reliably subdivided into three subtypes based on their electrophysiological firing profiles. Distinguishing three narrow spiking neuron types in vivo during complex task performance is a significant step forward: it complements previous electrophysiological distinctions of three interneuron types in vitro (Zaitsev et al., 2009; Torres-Gomez et al., 2020) or in vivo (Ardid et al., 2015; Dasilva et al., 2019; Shin and Moore, 2019; Banaie Boroujeni et al., 2020b), as well as the finer-grained electrophysiological characterization of ‘e-types’ in vitro that has been achieved with a rich battery of current injection patterns that are difficult to apply in the awake, behaving primate (Markram et al., 2004; Monyer and Markram, 2004; Medalla et al., 2017; Gouwens et al., 2019). This in vitro ‘e-typing’ has distinguished eleven (Markram et al., 2015) or thirteen (Gouwens et al., 2019) distinct interneuron e-types in rodent somatosensory and mouse visual cortex, respectively. In visual cortex, these classes included six fast spiking subclasses showing variably transient, sustained or pause-delay response patterns (Gouwens et al., 2019). Notably, the fast spiking interneuron classes in that study were characterized by a low coefficient of variation (CV) and low bursting, reflected in a low Local Variability (LV), and a feature-importance analysis showed that the narrow action potential width and the firing rate of these neurons were most diagnostic for separating fast spiking from other neuron classes (Figure 2i, S9, and S14 in Gouwens et al., 2019). Because current injection responses were not available, our study used these diagnostic metrics (LV, CV, AP width and firing rate) directly for clustering, and distinguished three interneuron e-types in the monkey compared to six fast spiking interneuron e-types in the mouse study.
These results suggest that our three interneuron e-types likely encompass further subclasses, which future studies should aim to distinguish in order to narrow the gap between the in vivo e-types that we and others report in the monkey and the in vitro e-types in rodents, which are more easily mapped onto specific molecular, morphological and genetic make-ups (Markram et al., 2015; Gouwens et al., 2019). As a caveat, this cross-species mapping of cell types might also reveal cell classes and unique cell class characteristics in nonhuman primate cortices that are not similarly evident in rodents, as recently demonstrated in a cross-species study of non-fast spiking gamma rhythmic neurons in early visual cortex that were evident in the primate but not in mice (Onorato et al., 2020).
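The firing-regularity metrics used for clustering can be computed from a neuron's inter-spike intervals (ISIs). The sketch assumes the standard definitions: CV is the standard deviation of the ISIs divided by their mean, and LV compares each ISI with the next, LV = 3/(n−1) Σ ((I_i − I_{i+1})/(I_i + I_{i+1}))², which is about 0 for perfectly regular firing and about 1 for a Poisson process, so the N3 LV of 0.84 indicates firing more regular than Poisson. Whether the study used exactly these estimators (e.g., population versus sample standard deviation for CV) is an assumption, and the function names and the simulated 20 spikes/s train are hypothetical.

```python
import random
import statistics


def local_variation(isis):
    """LV = 3/(n-1) * sum_{i=1}^{n-1} ((I_i - I_{i+1}) / (I_i + I_{i+1}))**2.

    About 0 for perfectly regular firing, about 1 for a Poisson process,
    and above 1 for bursty firing.
    """
    n = len(isis)
    if n < 2:
        raise ValueError("need at least two inter-spike intervals")
    total = sum(((a - b) / (a + b)) ** 2 for a, b in zip(isis[:-1], isis[1:]))
    return 3.0 * total / (n - 1)


def coefficient_of_variation(isis):
    """CV = (population) standard deviation / mean of the inter-spike intervals."""
    return statistics.pstdev(isis) / statistics.fmean(isis)


# Illustration: a perfectly regular train versus a simulated Poisson train.
regular = [0.05] * 200
rng = random.Random(1)
poisson = [rng.expovariate(20.0) for _ in range(20000)]  # hypothetical 20 spikes/s
print(local_variation(regular), coefficient_of_variation(regular))  # both 0.0
print(local_variation(poisson), coefficient_of_variation(poisson))  # both close to 1
```

LV is useful alongside CV because it compares only neighboring intervals, so slow rate changes across a session inflate CV but leave LV largely unaffected.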

With regard to the specific interneuron e-types, we believe that the N3 e-type, which showed functional correlations in two areas, mostly encompasses parvalbumin-expressing (PV+) neurons, because their narrow spikes, regular inter-spike intervals and propensity to synchronize at gamma resemble the regular firing and gamma synchrony described for PV+ cells in rodents (Cardin et al., 2009; Tiesinga, 2012; Stark et al., 2013; Amilhon et al., 2015; Chen et al., 2017; Gouwens et al., 2019). Moreover, similar to the N3 e-type responses to the attention cue, rodent dorsomedial frontal PV+ neurons systematically activate to preparatory cues, while somatostatin neurons respond significantly less (Pinto and Dan, 2015). However, PV+ neurons are heterogeneous and include chandelier cells and variably sized basket cells (Markram et al., 2004; Markram et al., 2015; Gouwens et al., 2019). It might therefore be an important observation that the N3 e-type was distinguished from other narrow spiking neurons by a lower firing rate and an intermediate-narrow action potential shape, as opposed to the narrowest waveforms and highest firing rates shown by N1 e-types. The tentative suggestion that N3 e-type neurons are mostly PV+ cells also implies, for the primate brain, that they are not calretinin (CR+) or calbindin (CB+) expressing cells, as these expression profiles do not appear to overlap (Dombrowski et al., 2001; Medalla and Barbas, 2009; Raghanti et al., 2010; Torres-Gomez et al., 2020).

What is the circuit role of the N3 interneuron e-type?

Assuming that N3 e-type neurons are partly PV+ neurons, we speculate that they provide gamma rhythmic inhibition to local circuit pyramidal cells close to the soma, where they impose output gain control (Tiesinga et al., 2004; Bartos et al., 2007; Womelsdorf et al., 2014b; Tremblay et al., 2016). In our task, such local inhibition was linked to how uncertain the expected values of stimuli were (reflected in low choice probabilities) or how unexpected reward outcomes were (reflected in high RPEs). These conditions mark periods that require behavioral adaptation, for which N3 e-type mediated inhibition could be instrumental. For example, in LPFC, pyramidal cells that encoded the rewarded color in trials before the uncued reversal become irrelevant when the reversal links reward to the alternative color, and hence need to be suppressed during the reversal. This suppression of neurons encoding the previously relevant but now irrelevant color might be realized through activation of N3 e-type neurons. Similarly, N3 e-type activation in ACC may reflect a rise in inhibition when an unexpected outcome (high RPE) is detected. This activation might therefore facilitate the updating of value expectations to reduce future prediction errors (Sutton and Barto, 2018; Oemisch et al., 2019).

The putative functions of N3 e-type activity described above suggest how these neurons might contribute to transforming inputs into outputs in a neural circuit. To understand this process, we devised and simulated circuit models of the activity signatures of inhibitory cells in LPFC and ACC (Figure 9, Appendix 1). For LPFC, we devised an E-E-I circuit in which the interneuron (I) population synchronized at gamma when the excitatory drive from two E-cell populations was similar (Appendix 1, Figure 9—figure supplement 1). This mimics the situation in which the values of two objects are similar, resulting in a low choice probability. According to this circuit, the function of the I cells that putatively correspond to the N3 e-type neurons in LPFC is twofold: they normalize the activity of the excitatory cells, and they are instrumental in gating the activity of one excitatory cell population over the other when the populations compete. Such competition arises specifically when choice probabilities are low, because a low p(choice) indicates that the expected values of the stimuli to choose from are similar, which makes the choice difficult. We therefore speculate that the putative circuit function of the N3 e-type cells in LPFC is the gating of competing excitatory inputs (Figure 9A).

For ACC, we devised an E-I-I circuit in which the N3 e-type population putatively corresponded to one population of fast spiking inhibitory neurons (I1) that synchronized at gamma when receiving stronger excitatory drive than another population of slower inhibitory neurons (I2) (Figure 9—figure supplement 1B). The enhanced excitation of the I1 over the I2 population was modeled to correspond to trials with high RPE, which occurred when a reward (R) was received but the expected value (V) of the chosen stimulus was relatively low (a large RPE, defined as the difference R − V). In this situation, stronger excitatory drive, and consequently gamma synchronous activity, could follow from disinhibition of the I1 population. Such disinhibition could originate from reduced inhibition from I2 cells in trials with low stimulus value, or from disinhibition by other neurons. These scenarios deserve explicit testing in future studies (for further discussion, see Appendix 1). They gain plausibility from anatomical studies reporting that a large proportion of connections to interneurons target disinhibitory interneurons that express calretinin and are distinct from the fast spiking PV+ neurons that more likely include the N3 e-type neurons (Medalla and Barbas, 2009; Medalla and Barbas, 2010). In summary, the proposed circuit model for ACC suggests that N3 e-type neurons activate when there is a mismatch between reward and chosen value. Activation of N3 e-type neurons may thus be a (bio)marker that predictions need to be updated to improve future performance.

We acknowledge that the proposed circuit models are merely a proof of concept that the observed neuronal activities can originate in plausible, previously described E-I motifs. They are not full biophysical implementations of the reversal learning task, and they entail finer predictions that await quantitative testing in future studies. They motivate combined electrophysiological and optogenetic studies in the primate to clarify cell-type-specific circuit functions during higher cognitive operations.

Interneuron-specific gamma synchronization: Comparison to previous studies

Two major findings of our study pertain to spike-LFP gamma band synchronization. First, N3 e-type neurons showed an event-triggered increase in synchrony in the same 35–45 Hz gamma frequency band in both LPFC and ACC when there was uncertainty about the correct choice (low p(choice)) or about the outcome (high RPE) (see Figures 7C and 8F). Synchronization of the N3 e-type switched from gamma to beta frequencies in LPFC when choices became more certain, and to theta frequencies in ACC when outcomes became more certain. An intrinsic propensity for generating gamma rhythmic activity, for example through GABAA receptor time constants, is well described for PV+ interneurons (Wang and Buzsáki, 1996; Bartos et al., 2007; Womelsdorf et al., 2014b; Chen et al., 2017) and has been documented even at moderate excitatory feedforward drive, which might be more typical of prefrontal cortices than of earlier visual cortices (Cardin et al., 2009; Vinck et al., 2013; Shin and Moore, 2019; Onorato et al., 2020).

Our findings provide strong empirical evidence that narrow spiking interneurons are the main carriers of gamma rhythmic activity in nonhuman primate prefrontal cortex during cue and outcome processing (Whittington et al., 2000; Hasenstaub et al., 2005; Bartos et al., 2007; Hasenstaub et al., 2016; Chen et al., 2017; Shin and Moore, 2019). This conclusion resonates with rodent studies documenting that interneurons in infra-/prelimbic and cingulate cortex engage in gamma synchrony (Fujisawa and Buzsáki, 2011; Cho et al., 2015).

The second major implication of the gamma synchronous N3 e-type neurons is that gamma band synchrony was associated with task epochs in which neural circuits realize a function that can be considered ‘area specific’. In LPFC, the gamma increase was triggered by the color-cue onset of two peripherally presented stimuli that instructed covert shifts of attention. As captured by our circuit model (Figure 9A), cue-related gamma was restricted to periods when object values were similar and the animal was still learning which object was most reward predictive. The control of learning about what is relevant during cognitively demanding tasks is a key function of the LPFC, suggesting that gamma activity emerges when this key function is called upon (Miller and Cohen, 2001; Szczepanski and Knight, 2014; Cho et al., 2020). A similar scenario holds for the ACC, whose central function is often considered to be monitoring and evaluating task performance and detecting when outcomes should trigger a change in behavioral strategy (Shenhav et al., 2013; Heilbronner and Hayden, 2016; Alexander and Brown, 2019; Fouragnan et al., 2019). In ACC, the gamma increase was triggered by an unexpected, rewarded outcome (high RPE). Thus, the N3 e-type specific gamma band signature occurred specifically in those trials with conflicting stimulus values that require behavioral control to reduce prediction errors through future performance (Figure 9A). Taken together, the ACC and LPFC findings suggest that gamma activity of N3 e-type neurons indexes a key function of each of these brain areas, consistent with recent causal evidence from rodent optogenetics (Cho et al., 2020).

Consistent with the proposed importance of interneurons for area-specific key functions, prior studies have documented the functional importance of inhibition in these circuits. Blocking inhibition with GABA antagonists such as bicuculline not only renders fast spiking interneurons nonselective during working memory tasks but also abolishes the spatial tuning of regular spiking (excitatory) cells in monkeys (Sawaguchi et al., 1989; Rao et al., 2000), disturbs accuracy in attention tasks (Paine et al., 2011) and reduces set shifting flexibility by enhancing perseveration (Enomoto et al., 2011). Similarly, abnormally enhancing GABAA receptor activation with the agonist muscimol impairs working memory and set shifting behavior (Rich and Shapiro, 2007; Urban et al., 2014) and can result in maladaptive impulsive behaviors (Paine et al., 2015) or, when applied in anterior cingulate cortex, in perseveration (Amiez et al., 2006). Thus, altered inhibition in medial and lateral prefrontal cortex is closely linked to an inability to adjust attentional strategies after unexpected outcomes. This evidence supports our suggestion that inhibitory neurons are important for resolving uncertainty during adaptive behavior.

Taken together, our interneuron-specific findings in primate LPFC and ACC stress the importance of interneurons in shaping circuit activity beyond merely balancing excitation. Multiple theoretical accounts have stressed that some types of interneurons ‘control information flow’ (Fishell and Kepecs, 2020) by filtering synaptic inputs to an area and gain-controlling the output from that area (Akam and Kullmann, 2010; Kepecs and Fishell, 2014; Womelsdorf et al., 2014b; Roux and Buzsáki, 2015; Cardin, 2018). Testing these circuit functions of interneurons has so far been largely limited to studies using molecular tools. Our study addresses this limitation by characterizing putative interneurons, delineating their suppressive effects on the circuit, and highlighting their functional activation during reversal learning. The observed interneuron-specific, gamma synchronous coding of choice probabilities and prediction errors strongly motivates further study of cell-type-specific circuit mechanisms of higher cognitive functions.

Materials and methods

All animal care and experimental protocols were approved by the York University Council on Animal Care (ethics protocol 2015–15 R2) and were in accordance with the Canadian Council on Animal Care guidelines.

Electrophysiological recording

Data were collected from the anterior cingulate cortex and lateral prefrontal cortex of two male rhesus macaques (Macaca mulatta), as described in full in Oemisch et al., 2019. Extracellular recordings were made with tungsten electrodes (impedance 1.2–2.2 MOhm, FHC, Bowdoinham, ME) through rectangular recording chambers implanted over the right hemisphere. Electrodes were lowered daily through guide tubes using software-controlled precision micro-drives (NAN Instruments Ltd., Israel). Wideband local field potential (LFP) data were recorded with a multi-channel acquisition system (Digital Lynx SX, Neuralynx) at a 32 kHz sampling rate. Spiking activity was obtained after 300–8000 Hz bandpass filtering, further amplification, and digitization at a 32 kHz sampling rate. Sorting and isolation of single unit activity was performed offline with Plexon Offline Sorter, based on the first two principal components of the spike waveforms and the temporal stability of isolated neurons. Only well-isolated neurons were considered for analysis (Ardid et al., 2015). Experiments were performed in a custom-made sound-attenuating isolation chamber. Monkeys sat in a custom-made primate chair, viewed visual stimuli on a computer monitor (60 Hz refresh rate, viewing distance 57 cm), and performed a feature-based attention task for liquid reward delivered by a custom-made valve system, as described in Oemisch et al., 2019.

Anatomical reconstruction of recording locations

Recording locations were identified using MRI images obtained following initial chamber placement. During MR scanning, we placed a grid marking the chamber center and peripheral positions as well as a diluted iodine solution inside the chamber for visualization. This allowed the referencing of target regions to the chamber center in the resulting MRI images. The positioning of electrodes was estimated daily using the MRI images and audible profiles of spiking activity. The relative coarseness of the MRI images did not allow us to differentiate the specific layer of recording locations in lateral prefrontal and anterior cingulate cortices.

Task paradigm

The task (Figure 1) required centrally fixating a dot and covertly attending one of two peripherally presented stimuli (5° eccentricity), depending on color-reward associations. Stimuli were block sine gratings with a 2.0° radius and rounded-off edges, moving within a circular aperture at 0.8°/s with a spatial frequency of 1.2 cycles/°. Color-reward associations were reversed without a cue once a learning criterion was reached after a minimum of 30 trials (or after a maximum of 100 trials), which makes this a color-based reversal learning task.

Each trial began with the appearance of a gray central fixation point, which the monkey had to fixate. After 0.5–0.9 s, two black/white gratings appeared to the left and right of the central fixation point. After another 0.4 s, the two gratings either changed color to green and red (monkey K: cyan and yellow) or started moving in opposite directions (up and down), and 0.5–0.9 s later the second stimulus feature that had not yet been presented was added. For example, if the gratings changed color after 0.4 s, they started moving in opposite directions another 0.5–0.9 s later. After a further 0.4–1 s, the red and green stimuli either dimmed simultaneously for 0.3 s or dimmed 0.55 s apart, with either the red or the green stimulus dimming first. Dimming of the rewarded stimulus was the GO cue to make a saccade to one of two response targets displayed above and below the central fixation point. Dimming of the non-rewarded stimulus therefore represented a NO-GO cue, requiring the monkey to withhold a response and wait until the rewarded stimulus dimmed. The monkeys had to maintain central fixation until this dimming event occurred.

A saccadic response following the dimming was rewarded if it was made to the response target that corresponded to the (up- or down-ward) movement direction of the stimulus with the color that was associated with reward in the current block of trials, for example if the red stimulus was the currently rewarded target and was moving upward, a saccade had to be made to the upper response target at the time the red stimulus dimmed. A saccadic response was not rewarded if it was made to the response target that corresponded to the movement direction of the stimulus with the non-reward associated color. Hence, a correct response to a given stimulus must match the motion direction of that stimulus as well as the timing of the dimming of that stimulus. This design ensures the animal could not anticipate the time of dimming of the current target stimulus (which could occur before, after, or at the same time as the second stimulus), and thus needed to attend continuously until the ‘Go-signal’ (dimming) of that stimulus occurred. If dimming of the target stimulus occurred after dimming of the second/distractor stimulus, the animal had to ignore dimming of the second stimulus and wait for dimming of the target stimulus. A correct response was followed by 0.33 ml of water reward.

The color-reward association remained constant for 30 to a maximum of 100 trials. Performance of 90% rewarded trials (calculated as running average over the last 12 trials) automatically induced a block change. The block change was un-cued, requiring monkeys to use the trial’s reward outcome to learn when the color-reward association was reversed. Reward was delivered deterministically.
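
The block-change rule described above can be sketched as a small Python function. This is an illustrative sketch, not the authors' task code: the function name is hypothetical, and we assume that the 90% running-average criterion is only evaluated once at least 30 trials have been completed and that a block always ends after 100 trials.

```python
def block_should_change(outcomes, min_trials=30, max_trials=100,
                        window=12, criterion=0.90):
    """Hypothetical sketch of the un-cued block-change rule.

    outcomes: outcomes of the trials so far in the current block
              (1 = rewarded, 0 = not rewarded).
    Assumption: the running-average criterion is checked only after
    min_trials, and the block always ends at max_trials.
    """
    n = len(outcomes)
    if n >= max_trials:
        return True  # maximum block length reached
    if n < max(min_trials, window):
        return False  # too early for a block change
    running_avg = sum(outcomes[-window:]) / window  # last 12 trials
    return running_avg >= criterion


# Usage: 20 errors followed by 12 rewarded trials trigger a reversal at trial 32.
print(block_should_change([0] * 20 + [1] * 12))  # True
```

With a 12-trial window, the 90% criterion is met by 11 or 12 rewarded trials out of the last 12.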

In contrast to color, other stimulus features (motion direction and stimulus location) were only randomly related to reward outcome – they were pseudo-randomly assigned on every trial. This task ensured that behavior was guided by attention to one of two colors, which was evident in monkeys choosing the stimulus with the same color following correct trials with 89.5% probability (88.7%/90.3% for monkey H/K), which was significantly different from chance (t-test, both p<0.0001).

Monkeys performed the task at 83/86% accuracy (monkeys H/K; excluding fixation break errors). Of the 17/14% error trials, on average 50/50% were erroneous responses to the dimming of the distractor when it dimmed before the target, 34/37% were erroneous responses when target and distractor dimmed simultaneously but the monkey chose the distractor's direction, and 16/13% were responses made when the target dimmed before the distractor but the choice was erroneously made in the direction of the distractor.

Behavioral analysis of the animal’s learning status

To characterize the reversal learning status of the animals, we determined the trial during a block when the monkey showed consistent above chance choices of the rewarded stimulus using the expectation maximization algorithm and state–space framework introduced by Smith et al., 2004, and successfully applied to reversal learning in our previous work (Balcarras et al., 2016; Hassani et al., 2017; Oemisch et al., 2019). This framework entails a state equation that describes the internal learning process as a hidden Markov or latent process and is updated with each trial. The learning state process estimates the probability of a correct (rewarded) choice in each trial and thus provides the learning curve of subjects. The algorithm estimates learning from the perspective of an ideal observer that takes into account all trial outcomes of subjects’ choices in a block of trials to estimate the probability that the single trial outcome is reward or no reward. This probability is then used to calculate the confidence range of observing a rewarded response. We identified a ‘Learning Trial’ as the earliest trial in a block at which the lower confidence bound of the probability for a correct response exceeded the p=0.5 chance level.
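
The learning-trial criterion can be illustrated with a simplified state-space model in Python. This is a sketch under stated assumptions, not the Smith et al., 2004 implementation: the process noise `sigma_eps` is fixed at 0.3 instead of being estimated by expectation maximization, the initial latent state is fixed at chance, and the 95% bounds are Gaussian intervals on the latent state mapped through the logistic function. The function names are hypothetical.

```python
import math


def _logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def learning_trial(outcomes, p_chance=0.5, sigma_eps=0.3, z=1.96):
    """Earliest 1-based trial whose lower confidence bound on P(correct)
    exceeds p_chance, or None if no trial qualifies.

    Latent random walk x_k = x_{k-1} + eps_k, eps_k ~ N(0, sigma_eps^2), and
    P(correct on trial k) = logistic(mu + x_k).
    Assumptions: sigma_eps is fixed (no EM step) and x_0 = 0 (chance level).
    """
    mu = math.log(p_chance / (1.0 - p_chance))
    K = len(outcomes)
    x_pred, v_pred = [0.0] * K, [0.0] * K
    x_filt, v_filt = [0.0] * K, [0.0] * K
    x_prev, v_prev = 0.0, 0.0
    # Forward (filter) pass: one-step prediction, then a Newton solve for the
    # posterior mode of x_k given the outcome of trial k.
    for k, n_k in enumerate(outcomes):
        x_pred[k] = x_prev
        v_pred[k] = v_prev + sigma_eps ** 2
        x = x_pred[k]
        for _ in range(50):
            p = _logistic(mu + x)
            g = x - x_pred[k] - v_pred[k] * (n_k - p)
            step = g / (1.0 + v_pred[k] * p * (1.0 - p))
            x -= step
            if abs(step) < 1e-10:
                break
        p = _logistic(mu + x)
        x_filt[k] = x
        v_filt[k] = 1.0 / (1.0 / v_pred[k] + p * (1.0 - p))
        x_prev, v_prev = x_filt[k], v_filt[k]
    # Backward (fixed-interval smoother) pass, so that each trial's estimate
    # uses all outcomes in the block, as an ideal observer would.
    x_s, v_s = x_filt[:], v_filt[:]
    for k in range(K - 2, -1, -1):
        a = v_filt[k] / v_pred[k + 1]
        x_s[k] = x_filt[k] + a * (x_s[k + 1] - x_pred[k + 1])
        v_s[k] = v_filt[k] + a * a * (v_s[k + 1] - v_pred[k + 1])
    for k in range(K):
        # The logistic is monotonic, so a bound on x maps to a bound on P(correct).
        lower = _logistic(mu + x_s[k] - z * math.sqrt(v_s[k]))
        if lower > p_chance:
            return k + 1
    return None
```

For example, `learning_trial([0] * 10 + [1] * 30)` returns a trial after the run of errors, whereas an alternating sequence of errors and rewards never reaches the criterion.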

Reinforcement learning modeling to estimate choice probability and expected value of color

The color reversal task required monkeys to learn from trial outcomes when the color reward association reversed to the alternate color. This color-based reversal learning is well accounted for by an attention augmented Rescorla Wagner reinforcement learning model (‘attention-augmented RL’) that we previously tested against multiple competing models (Balcarras et al., 2016; Hassani et al., 2017; Oemisch et al., 2019). Here, we use this model to estimate the trial-by-trial fluctuations of the expected value for the rewarded color, the choice probability p(choice) of the animal’s stimulus selection and the positive reward prediction error (RPE, ‘R-V’, see Equation 1, below). P(choice) increased and RPE decreased with learning similar to the increase in the probability of the animal to make rewarded choices (Figure 2—figure supplement 3). They were highly anticorrelated (r = −0.928) (Figure 2—figure supplement 3A).

The attention augmented RL is a standard Q Learning model with an added decay constant that reduces the value of those features that are part of the non-chosen (i.e. non-attended) stimulus on a given trial. On each trial t this model updates the value V for features i of the chosen stimulus according to

$$V_{i,t+1} = V_{i,t} + \eta\,(R_t - V_{i,t}) \tag{1}$$

where $R_t$ denotes the trial outcome (0 = non-rewarded, 1 = rewarded) and $\eta$ is the learning rate, bounded to [0, 1]. On the same trial, the values of features $i$ of the non-chosen stimulus decay according to

$$V_{i,t+1} = (1-\omega)\,V_{i,t} \tag{2}$$

with $\omega$ denoting the decay parameter. Following these value updates, the next choice $C_{t+1}$ is made by a softmax rule applied to the summed values of the features belonging to each stimulus, where $\beta$ is the softmax inverse temperature. We denote the stimulus by the index $j$ and the set of features belonging to it by $s_j$ (for instance, color x, location y, direction z):

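$$P(C_{t+1} = j) = \frac{\exp\left(\beta \sum_{i \in s_j} V_{i,t}\right)}{\sum_{j'} \exp\left(\beta \sum_{i \in s_{j'}} V_{i,t}\right)} \tag{3}$$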

Equation 3 defines the choice probability, p(choice), that is used for the neuronal analyses in this manuscript (Sutton and Barto, 2018). P(choice) increases with trials since reversal (Figure 2—figure supplement 3D), indicating that choice uncertainty decreases as more information is gathered about the values of the stimuli.
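
Equations 1–3 can be sketched in Python as a single-trial value update plus a softmax choice rule. This is a minimal illustration, not the fitted model: the feature labels, parameter values, and the dictionary representation of feature values are assumptions made for the example.

```python
import numpy as np


def rl_update(values, chosen_feats, nonchosen_feats, reward, eta, omega):
    """One trial of the attention-augmented Rescorla-Wagner update.

    values: dict mapping feature name -> value V_i,t (hypothetical layout).
    Chosen features move toward the outcome (Eq. 1); features of the
    non-chosen stimulus decay toward zero (Eq. 2).
    Returns the updated values and the prediction error R - V per chosen feature.
    """
    new = dict(values)
    rpe = {}
    for f in chosen_feats:
        rpe[f] = reward - values[f]
        new[f] = values[f] + eta * rpe[f]
    for f in nonchosen_feats:
        new[f] = (1.0 - omega) * values[f]
    return new, rpe


def choice_probabilities(values, stimuli, beta):
    """Softmax over stimuli, each scored by the summed value of its features (Eq. 3)."""
    scores = np.array([beta * sum(values[f] for f in s) for s in stimuli])
    scores -= scores.max()  # numerical stability; leaves the softmax unchanged
    e = np.exp(scores)
    return e / e.sum()
```

For example, starting from values of 0.5 for all features, a rewarded choice of the red, left, upward-moving stimulus with `eta=0.2` raises the value of red to 0.6 (RPE = 0.5) and, with `omega=0.1`, lowers the value of green to 0.45.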

We optimized the model by minimizing the negative log likelihood over all trials, using up to 20 iterations of the simplex optimization method to initialize a subsequent call to the MATLAB function fmincon, which uses derivative information. We used an 80/20% (training/test) cross-validation procedure, repeated 50 times, to quantify how well the model predicted the data. In each cross-validation run, the model parameters were optimized on the training dataset, and we then quantified the log-likelihood of the independent test dataset given the training dataset's optimal parameter values. The cross-validation results were compared across multiple models in a previous study (Oemisch et al., 2019); here, we used the best-fitting model from that work.

Waveform analysis

We initially analyzed 750 single units and excluded 24 units that showed double troughs or had fewer than 50 spikes overall. We then analyzed the remaining 726 highly isolated cells in ACC (397 cells) and PFC (149 cells in area 8 and 180 cells in dLPFC). We trough-aligned all action potentials (APs) and normalized them to the range of −1 (trough) to 1 (peak). APs were then interpolated from their original time step of 1/32000 s to a new time step of 1/320000 s. To characterize AP waveforms, we initially computed three measures, trough to peak (T2P), time for repolarization (T4R), and hyperpolarization rate (HR), according to Equations 4–6:

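$$T2P = t_{\mathrm{trough}} - t_{\mathrm{peak}} \tag{4}$$
$$T4R = t_{0.75\times\mathrm{peak}} - t_{\mathrm{peak}} \tag{5}$$
$$HR = \frac{1}{t_{V_{\mathrm{peak}}} - t_{V_{0.63\times\mathrm{peak}}}} \tag{6}$$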

where $t_{\mathrm{peak}}$ is the time of the most positive value (peak) of the spike waveform, $t_{\mathrm{trough}}$ is the time of its most negative value, $t_{0.75\times\mathrm{peak}}$ is the time after the peak at which the waveform voltage equals 75% of the peak, and $t_{V_{0.63\times\mathrm{peak}}}$ is the time before the peak at which the voltage equals 63% of the peak (Figure 1—figure supplement 2A,B). We used Hartigan's dip test to test the hypothesis that the distributions were unimodal (P<0.05). HR and T2P were highly correlated (r = −0.76). We chose HR because it rejected the null hypothesis of unimodality in Hartigan's dip test (P=0.01). We then used HR and T4R to characterize waveform dynamics. The T4R interval likely describes waveform dynamics during a period in which calcium-activated potassium channels are activated and most voltage-gated potassium channels are closed, whereas HR reflects a time interval in which most sodium channels are closed and potassium channels contribute more strongly to the waveform dynamics (Bean, 2007).
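
Equations 4–6 can be computed from a single waveform with a short NumPy sketch. It is illustrative only: the function name is hypothetical, the peak is assumed to be the maximum after the trough, normalization is assumed to be a linear rescaling of trough and peak to −1 and 1, and the 75%/63% crossing times are taken at the first/last sample beyond the threshold rather than by sub-sample interpolation.

```python
import numpy as np


def waveform_measures(wf, fs=32000, upsample=10):
    """T2P (s), T4R (s) and HR (1/s) of one trough-aligned spike waveform."""
    t = np.arange(len(wf)) / fs
    t_fine = np.arange(0.0, t[-1], 1.0 / (fs * upsample))  # 1/320000 s step
    w = np.interp(t_fine, t, wf)
    i_trough = int(np.argmin(w))
    i_peak = i_trough + int(np.argmax(w[i_trough:]))  # assumed: peak after trough
    # Linear rescaling so that trough -> -1 and peak -> +1
    w = 2.0 * (w - w[i_trough]) / (w[i_peak] - w[i_trough]) - 1.0
    peak = w[i_peak]
    # First sample after the peak at or below 75% of the peak voltage
    after = np.nonzero(w[i_peak:] <= 0.75 * peak)[0]
    # Last sample between trough and peak at or below 63% of the peak voltage
    before = np.nonzero(w[i_trough:i_peak] <= 0.63 * peak)[0]
    t_trough, t_peak = t_fine[i_trough], t_fine[i_peak]
    t_075 = t_fine[i_peak + after[0]]
    t_063 = t_fine[i_trough + before[-1]]
    return {"T2P": t_trough - t_peak,       # Eq. 4
            "T4R": t_075 - t_peak,          # Eq. 5
            "HR": 1.0 / (t_peak - t_063)}   # Eq. 6
```

Because $t_{\mathrm{trough}}$ precedes $t_{\mathrm{peak}}$, T2P as written in Equation 4 is negative; its magnitude is the trough-to-peak duration.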

T4R, HR, and their first principal component were each fitted with a bimodal Gaussian distribution. We applied Akaike's and Bayesian information criteria to the two- vs one-Gaussian fits to select the best fit for each waveform measure.

Data analysis

Analysis of spiking and local field potential activity was done with custom MATLAB code (Mathworks, Natick, MA), utilizing functions from the open-source Fieldtrip toolbox (http://www.ru.nl/fcdonders/fieldtrip/).

For all statistical tests performed on time series, we used permutation (randomization) tests with multiple-comparison correction, with primary and secondary alpha levels of 0.05, unless a different multiple-comparison correction is explicitly mentioned.

Spike-triggered multiunit modulation

We used spike-triggered multiunit analysis to estimate whether a neuron's spiking was accompanied by increases or decreases in the surrounding neural activity, measured on a different electrode located ~200–450 μm from the electrode recording the spiking activity. To compute the relative multi-unit activity (MUA) before and after spike occurrences, we bandpass filtered the wideband signal to 800–3000 Hz and rectified it to positive values. For each single unit, we extracted a [−50, 50] ms period around each spike, aligned to the spike trough, and estimated the power time course of the signal using a sliding median filter (window length = 5 ms) applied every 0.5 ms. For a given single unit, we Z-transformed each spike-aligned, median-filtered peak-amplitude trace by subtracting its mean and dividing by its standard deviation, which normalized the MUA around the spike times. We then averaged the Z-transformed MUA across all spikes of each single unit. To compare post-spike with pre-spike MUA, we computed the spike-triggered MUA modulation ratio, $\mathrm{SMUM} = (\mathrm{MUA}_{\mathrm{post}} - \mathrm{MUA}_{\mathrm{pre}})/\mathrm{MUA}_{\mathrm{pre}}$, where pre-spike MUA was the mean over the 10 ms before the spike and post-spike MUA was the mean over the 10 ms after the spike.

To compare spike-triggered MUA modulation of broad vs narrow spiking neurons, we used the Wilcoxon test on the computed ratio, under the null hypothesis that narrow and broad spiking neurons do not differ in the change of MUA strength from before to after the spike. We also performed the test for each individual group compared with the population.

We also tested whether spike-triggered MUA modulation varied with the distance between the tip of the electrode recording the spiking neuron and the electrode recording the MUA, but found no distance dependency (Wilcoxon test, n.s.).

Analysis of firing statistics

To analyze firing statistics, we followed the procedures described by Ardid et al., 2015, and computed for each neuron the mean firing rate (FR), the Fano factor (FF, variance over mean of the spike count in consecutive 100 ms time windows), the coefficient of variation (CV, standard deviation over mean of the inter-spike intervals, Figure 1—figure supplement 2C), and the local variation (LV, Figure 1—figure supplement 2D), a measure of local variability of spike trains. LV measures the regularity/burstiness of spike trains: it is proportional to the sum, over consecutive pairs of inter-spike intervals, of the squared ratio between their difference and their sum (Shinomoto et al., 2009).
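
The four firing statistics can be sketched in NumPy for a single spike train. This is an illustrative sketch: the function name is hypothetical, spike times are assumed to be in seconds, and FF is computed as variance over mean of counts across consecutive, non-overlapping 100 ms windows of one continuous recording. The LV constant 3/(n − 1) follows Shinomoto et al., 2009, so that LV ≈ 1 for a Poisson process and ≈ 0 for perfectly regular firing.

```python
import numpy as np


def firing_statistics(spike_times, t_start, t_stop, win=0.1):
    """FR (spikes/s), FF, CV and LV of one spike train (times in s)."""
    st = np.sort(np.asarray(spike_times, float))
    fr = st.size / (t_stop - t_start)
    # Fano factor over consecutive, non-overlapping windows of length `win`
    n_win = int(np.floor((t_stop - t_start) / win + 1e-9))
    edges = t_start + win * np.arange(n_win + 1)
    counts, _ = np.histogram(st, bins=edges)
    ff = counts.var() / counts.mean()
    isi = np.diff(st)
    cv = isi.std() / isi.mean()
    # Local variation: 3/(n-1) * sum of ((I_i - I_{i+1}) / (I_i + I_{i+1}))^2
    lv = 3.0 / (isi.size - 1) * np.sum(
        ((isi[:-1] - isi[1:]) / (isi[:-1] + isi[1:])) ** 2)
    return {"FR": fr, "FF": ff, "CV": cv, "LV": lv}
```

A perfectly regular 20 Hz train gives CV = LV = 0, while exponentially distributed inter-spike intervals give CV and LV close to 1.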

Cell clustering technique

We followed the procedures described in Ardid et al., 2015, with minor adjustments, to test whether neurons fall into different clusters according to their waveform measures and firing statistics. For the main clustering analysis, we used the K-Means clustering algorithm from the MATLAB/GNU Octave open-source code freely available in the public Git repository https://bitbucket.org/sardid/clusteringanalysis. We used the K-Means clustering algorithm to characterize subclasses of cells within the dataset based on the Euclidean distances between neuronal measures. We initially used three waveform measures: hyperpolarization rate, time for repolarization, and their first principal component. For the firing statistics, we used local variation, coefficient of variation, Fano factor, and firing rate. The K-Means clustering algorithm is sensitive to duplicated and uninformative measures. We therefore set a threshold of 0.9 on Spearman's correlation coefficient to exclude highly correlated measures (the first principal component was excluded). To reduce biases due to differences in variable magnitude, we z-score transformed each measure and normalized it to a range of [0, 1]. We then computed the percent of the overall variance in our data explained by each measure and sorted the measures by the variance they explained. To disregard uninformative measures, we set a cut-off criterion of 90% on the cumulative sorted variance explained across measures. Based on this criterion, the Fano factor was excluded from the K-Means clustering (Figure 4—figure supplement 2A).
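
The measure-screening steps can be sketched in NumPy. This is a hypothetical reconstruction, not the code from the clusteringanalysis repository: we assume that, of two measures with |Spearman ρ| > 0.9, the one listed later is dropped, and that the variance criterion keeps the smallest set of top-ranked measures whose cumulative share of variance reaches 90%.

```python
import numpy as np


def _spearman(a, b):
    """Spearman correlation as the Pearson correlation of ranks (no tie handling)."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return np.corrcoef(ra, rb)[0, 1]


def select_measures(X, names, rho_max=0.9, cum_var=0.90):
    """Screen the columns of X (cells x measures) before k-means clustering.

    1. Drop a measure if |Spearman rho| > rho_max with an already kept one.
    2. z-score each kept measure, then rescale it to [0, 1].
    3. Keep the top measures until their cumulative share of variance
       reaches cum_var (assumed reading of the 90% criterion).
    """
    X = np.asarray(X, float)
    keep = []
    for j in range(X.shape[1]):
        if all(abs(_spearman(X[:, j], X[:, i])) <= rho_max for i in keep):
            keep.append(j)
    Z = (X[:, keep] - X[:, keep].mean(0)) / X[:, keep].std(0)
    Z = (Z - Z.min(0)) / (Z.max(0) - Z.min(0))
    share = Z.var(0) / Z.var(0).sum()
    order = np.argsort(share)[::-1]
    n_keep = int(np.searchsorted(np.cumsum(share[order]), cum_var)) + 1
    return [names[keep[i]] for i in order[:n_keep]], Z[:, order[:n_keep]]
```

In this sketch a duplicated measure is removed by the correlation step, and a nearly constant measure, which contributes little variance after rescaling, is removed by the cumulative-variance step.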

Determining cluster numbers

We used a set of statistical indices to determine the range of cluster numbers that best explains our data. These indices evaluate the quality of the k-means clustering (Ardid et al., 2015): the Rand, Mirkin, Hubert, Silhouette, Davies-Bouldin, Calinski-Harabasz, Hartigan, Homogeneity and Separation indexes (Figure 4—figure supplement 1A). We then ran 50 replicates of k-means clustering for k = 1–40 clusters. For each k, we chose the best replicate based on the minimum sum of squared Euclidean distances of all cluster elements from their respective centroids. While the validity measures improved with increasing number of clusters, the improvement slowed for more than 5 clusters, suggesting that a range of 5–15 clusters could account for our dataset. We then used a meta-clustering algorithm to determine the most appropriate number of clusters: n = 500 realizations of the k-means (from k = 5 to k = 15) were run. For each k and n, 50 replicates of the clustering were run and the best replicate was selected. For each k and across n, we computed the probability that different pairs of elements belonged to the same cluster. To separate reliable from spurious clusters, we used a probability threshold (p≥0.9) and considered only reliable clusters with at least five neurons, to remove clusters composed of outliers. From the matrix of pairwise co-assignment probabilities thresholded with this criterion (p≥0.9), clustering with 8 classes grouped the highest number of cells together (100%, Figure 4—figure supplement 1B). The final clustering was visualized with a dendrogram based on squared Euclidean distances between the cluster centroids. Finally, we validated the chosen number of clusters using Akaike's and Bayesian information criteria, which showed the smallest values for k = 8 (AIC: [−17712, −17735, −18476, −11114] and BIC: [−1.7437, −1.7368, −1.8109, −1.0747], for k = [6, 7, 8, 9]).
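
The co-assignment step of the meta-clustering can be sketched in NumPy. This is a simplified illustration with hypothetical names: a plain Lloyd's k-means stands in for the repository's implementation, and reliable clusters are approximated as connected groups of cells linked by a co-assignment probability of at least 0.9.

```python
import numpy as np


def kmeans(X, k, rng, n_iter=100):
    """Plain Lloyd's k-means; returns (labels, sum of squared distances)."""
    C = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = ((X[:, None, :] - C[None]) ** 2).sum(-1).argmin(1)
        new_C = np.array([X[labels == j].mean(0) if np.any(labels == j) else C[j]
                          for j in range(k)])
        converged = np.allclose(new_C, C)
        C = new_C
        if converged:
            break
    labels = ((X[:, None, :] - C[None]) ** 2).sum(-1).argmin(1)
    return labels, float(((X - C[labels]) ** 2).sum())


def coassignment(X, k, n_real=20, n_rep=50, seed=0):
    """P[i, j] = fraction of realizations in which cells i and j share a cluster.
    Each realization keeps the best of n_rep replicates (lowest SSE)."""
    rng = np.random.default_rng(seed)
    P = np.zeros((len(X), len(X)))
    for _ in range(n_real):
        labels, _ = min((kmeans(X, k, rng) for _ in range(n_rep)), key=lambda r: r[1])
        P += labels[:, None] == labels[None, :]
    return P / n_real


def reliable_groups(P, thr=0.9, min_size=5):
    """Connected groups of cells linked by co-assignment probability >= thr."""
    n = len(P)
    seen = np.zeros(n, dtype=bool)
    groups = []
    for i in range(n):
        if seen[i]:
            continue
        seen[i] = True
        stack, comp = [i], []
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in np.nonzero((P[u] >= thr) & ~seen)[0]:
                seen[v] = True
                stack.append(v)
        if len(comp) >= min_size:  # drop small groups made of outliers
            groups.append(sorted(comp))
    return groups
```

The fraction of cells falling into reliable groups, `sum(len(g) for g in groups) / len(X)`, can then be compared across k; the study used 500 realizations per k rather than the small defaults shown here.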

Validation of the identified cell classes

We used dataset randomization (n = 200 realizations), as in Ardid et al., 2015, to validate our meta-clustering analysis with two validity measures. First, in each realization, each of the eight clusters was associated with the closest cell class in Figure 4A,B. Across all realizations and for each cell class, we computed the difference between the mean of all clusters associated with that cell class and the mean of all clusters not associated with it, and compared it with the same difference when clusters were randomly assigned to cell classes (Figure 2—figure supplement 4C). Second, we validated the reliability of cell class assignment using n = 200 realizations of a randomization procedure that calculated the proportion of cells consistently assigned to a class relative to other cells assigned to that class. The proportion of class-matching cells was systematically higher than that obtained with a bootstrap procedure with random assignment of class labels (Figure 2—figure supplement 4D). We further validated the meta-clustering results for each monkey separately, analogous to the procedure described above: first, according to the distances of clusters (Figure 4—figure supplement 2E), and second, according to the percentage of matching cells (Figure 4—figure supplement 2F).

Correlation of local variation with burst index

The local variation (LV) measures how regular consecutive interspike intervals are, yielding higher values when neurons fire spikes at short interspike intervals (ISIs; bursts) interspersed with pauses. We quantified how LV correlated with the likelihood of neurons to fire bursts. We calculated the burst proportion as the number of ISIs < 5 ms divided by the number of ISIs < 100 ms, similar to Constantinidis et al., 2002. To control for the effect of firing rate on this measure, we normalized it by the value expected for a Poisson distribution of ISIs at the neuron's firing rate.
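
The burst index can be sketched in NumPy. The Poisson normalization here is our assumption about how the rate correction was done: for exponentially distributed ISIs with rate r (spikes/ms), the expected ratio of ISIs < 5 ms to ISIs < 100 ms is (1 − e^{−5r}) / (1 − e^{−100r}), with r estimated from the neuron's mean ISI. The function name is hypothetical.

```python
import numpy as np


def burst_index(spike_times_s):
    """Burst proportion (#ISI < 5 ms / #ISI < 100 ms), normalized by the
    proportion expected for Poisson firing at the same mean rate (assumption)."""
    isi_ms = np.diff(np.sort(np.asarray(spike_times_s, float))) * 1000.0
    n_long = np.sum(isi_ms < 100.0)
    if n_long == 0:
        return float("nan")
    observed = np.sum(isi_ms < 5.0) / n_long
    rate = isi_ms.size / isi_ms.sum()  # spikes per ms
    expected = (1.0 - np.exp(-5.0 * rate)) / (1.0 - np.exp(-100.0 * rate))
    return observed / expected
```

With this normalization, Poisson firing gives a burst index near 1 (log(BI) ≈ 0), so neurons with log(BI) > 0 burst more than expected by chance.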

Using the burst index (BI) computed for each neuron, we grouped neurons in PFC and ACC into two subgroups with high and low burst proportion (Log(BI)>0 and Log(BI)<0, respectively). We computed the proportion of neurons in each group that showed a significant correlation with RPE (in ACC) and with choice probability (in PFC). In PFC, 25% of high-BI neurons and 27.5% of low-BI neurons were significantly correlated with choice probability. In ACC, 47% of high-BI neurons and 35.2% of low-BI neurons were significantly correlated with RPE. Chi-square tests showed no significant differences between the low- and high-BI groups in the proportion of cells significantly correlated with RPE (in ACC, p=0.15) or with choice probability (in PFC, p=0.75). The correlation of LV and BI for all neurons is shown in Figure 1—figure supplement 2F.

Spike-LFP synchronization analysis

We used the Adaptive Spike Removal method on the wideband signal to remove artifactual spike current leakage into the LFP (details in Banaie Boroujeni et al., 2020a). We then used the FieldTrip toolbox on the spike-removed data to compute the Fourier transform of the local field potential (LFP). Spike-removed signals were resampled at 1000 Hz. For each frequency, the Fourier transform was computed over five complete cycles using an adaptive window centered on each spike (two and a half cycles before and after the spike). We then computed the pairwise phase consistency (PPC) to measure spike-LFP synchronization.
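
The spike-phase extraction and PPC computation can be sketched in NumPy. This is not FieldTrip's implementation: we assume a Hanning taper over five cycles centered on each spike, take the phase of a single Fourier coefficient at the frequency of interest, and skip spikes too close to the recording edges. The PPC estimator used is the standard one, the average cosine of all pairwise phase differences, computed efficiently as (|Σe^{iθ}|² − N)/(N(N − 1)).

```python
import numpy as np


def spike_lfp_phase(lfp, fs, spike_idx, freq, n_cycles=5):
    """LFP phase (radians) at each spike, from a Hanning-tapered Fourier
    coefficient over n_cycles cycles centered on the spike."""
    half = int(round(n_cycles / 2 / freq * fs))  # samples per half-window
    t = np.arange(-half, half + 1) / fs
    kernel = np.hanning(t.size) * np.exp(-2j * np.pi * freq * t)
    phases = []
    for i in spike_idx:
        if i - half < 0 or i + half >= lfp.size:
            continue  # skip spikes too close to the edges
        phases.append(np.angle(np.sum(lfp[i - half:i + half + 1] * kernel)))
    return np.array(phases)


def ppc(phases):
    """Pairwise phase consistency: mean cosine of all pairwise phase differences."""
    phases = np.asarray(phases, float)
    n = phases.size
    if n < 2:
        return np.nan
    s = np.sum(np.exp(1j * phases))
    return (np.abs(s) ** 2 - n) / (n * (n - 1))
```

Unlike the squared resultant vector length, PPC has no positive bias for small spike counts, which is why it is suitable for comparing neurons with different firing rates.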

To determine at which frequency bands single neurons showed reliable spike-LFP PPC, we adapted a permutation test to construct a permutation distribution of spike-LFP PPC under the null hypothesis of no statistical dependency between spike times and LFP phase, while preserving the dependencies of spike phases across frequencies. We then identified each band of significant frequencies and computed, for each band, the summed PPC value (PPC is unbiased by the number of spikes). Significance was then determined based on this PPC band mass.

To determine whether the spectrum of spike-LFP synchronization (PPC) contained statistically significant peaks, we used four criteria similar to Ardid et al., 2015, which ensure that only frequencies with reliable, phase-consistent spiking are identified. First, detected peaks had to be significant in a Rayleigh test (p<0.05), rejecting the hypothesis of a homogeneous phase distribution. Second, each peak had to have a PPC value greater than 0.005. Third, each peak had to have a prominence of at least 0.0025 relative to its neighboring minima, to disregard locally noisy and possibly spurious PPC peaks. Fourth, detected peaks had to have a PPC value greater than 25% of the PPC range.
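
The four peak criteria can be sketched with SciPy's `find_peaks`. This is an illustration under assumptions: per-frequency Rayleigh p-values are taken as a precomputed input, SciPy's prominence definition stands in for "prominence from neighboring minima", and "greater than 25% of the PPC range" is read as exceeding the spectrum minimum plus 25% of its range. The function name is hypothetical.

```python
import numpy as np
from scipy.signal import find_peaks


def significant_ppc_peaks(freqs, ppc_spectrum, rayleigh_p, alpha=0.05,
                          min_ppc=0.005, min_prom=0.0025, range_frac=0.25):
    """Frequencies of PPC peaks that pass all four criteria."""
    ppc_spectrum = np.asarray(ppc_spectrum, float)
    lo, hi = ppc_spectrum.min(), ppc_spectrum.max()
    idx, _ = find_peaks(ppc_spectrum, prominence=min_prom)  # criterion 3
    keep = [i for i in idx
            if rayleigh_p[i] < alpha                             # criterion 1
            and ppc_spectrum[i] > min_ppc                        # criterion 2
            and ppc_spectrum[i] > lo + range_frac * (hi - lo)]   # criterion 4
    return np.asarray(freqs)[keep]
```

In a spectrum with a large 40 Hz peak and a small 10 Hz bump, the bump can pass the prominence and absolute-PPC criteria yet still be rejected by the 25%-of-range criterion.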

Statistical analysis on the class-specific PPC peak distribution

To determine whether classes showed a significant proportion of PPC peaks in a specific frequency band, we drew 1000 random samples of neurons from the population, each of the same size as the class. We computed the mean of each sample to construct a distribution of sample means under the null hypothesis that no class shows a proportion of PPC peaks in a frequency band different from the population. The peak proportion of each class was then compared with the 95% confidence interval of this sample distribution. This procedure was done separately for classes of neurons in PFC and ACC (Figure 6 and Figure 6—figure supplement 1).
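
The class-vs-population comparison can be sketched in NumPy. Assumptions for illustration: each neuron contributes a value such as 1/0 for having a PPC peak in a given band, samples are drawn without replacement, and the 95% interval is taken from the 2.5th and 97.5th percentiles of the sample means. The function name is hypothetical.

```python
import numpy as np


def class_vs_population(values, class_mask, n_samples=1000, ci=95, seed=0):
    """Compare a class mean with the means of random same-size samples
    drawn from the whole population; returns (class mean, CI, significant)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, float)
    class_mask = np.asarray(class_mask, bool)
    k = int(class_mask.sum())
    null = np.array([values[rng.choice(values.size, k, replace=False)].mean()
                     for _ in range(n_samples)])
    lo, hi = np.percentile(null, [(100 - ci) / 2, 100 - (100 - ci) / 2])
    m = values[class_mask].mean()
    return m, (lo, hi), bool(m < lo or m > hi)
```

The same resampling logic applies to the event-response analysis below, with time courses instead of scalar proportions.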

Analysis of the firing onset-responses to the color onsets and error/reward outcome onsets

For each neuron, the spike density was computed using a Gaussian window of 600 ms (SD 50 ms) around the cue onsets, error outcome onsets and reward onsets across trials. We then z-score transformed each cell's event-aligned mean response over trials by subtracting the pre-onset mean of the spike density and dividing by its pre-onset standard deviation (a window of [−500 ms, 0 ms] before event onset). To investigate class-specific event responses, we used a permutation approach, randomly selecting 1000 samples with the same size as each class. We then constructed a distribution of sample means under the null hypothesis that no class shows an event response different from the sample population. Cell classes with a significantly different response than the population were identified during the periods in which their response was more extreme than two standard deviations of the sample population. We performed these tests separately for classes in PFC and ACC and for each event onset: color cue, motion cue, error outcome, and reward outcome.
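
The spike-density and baseline normalization steps can be sketched in NumPy. This is an illustrative sketch with hypothetical names, assuming spike times in seconds relative to event onset, a Gaussian kernel with SD 50 ms evaluated on a fixed time grid, and z-scoring of the trial-averaged density against the [−500, 0) ms window.

```python
import numpy as np


def spike_density(spike_times, t_grid, sd=0.05):
    """Gaussian-kernel spike density (spikes/s) evaluated on t_grid (s)."""
    d = t_grid[:, None] - np.asarray(spike_times, float)[None, :]
    return np.exp(-0.5 * (d / sd) ** 2).sum(axis=1) / (sd * np.sqrt(2 * np.pi))


def baseline_z(mean_sdf, t_grid, base=(-0.5, 0.0)):
    """z-score an event-aligned mean response against its pre-onset window."""
    m = (t_grid >= base[0]) & (t_grid < base[1])
    return (mean_sdf - mean_sdf[m].mean()) / mean_sdf[m].std()
```

Typical use is to compute `spike_density` per trial, average across trials, and pass the mean to `baseline_z`; the resulting z-scored response of each neuron then enters the class-level resampling described above.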

For the comparison of event onset responses between broad and narrow spiking cells, we randomly shuffled the neuron labels and constructed a distribution of 1000 randomly sampled differences between the mean responses of narrow and broad spiking cells. We then computed the 95% CI of the population samples and identified the most extreme 5% of time courses relative to the 95% CI, under the null hypothesis that broad and narrow spiking neurons do not differ in their mean responses to the event onsets.

Analysis of effect size of the firing onset-responses to the cue onsets and error/reward outcome event onsets

For the effect size analysis of cell class-specific responses to each onset, we computed, for each of 1000 randomly labeled samples, the mean difference between the cell class and the sample divided by their pooled standard deviation (Cohen's d). We then averaged the 1000 unsigned Cohen's d values computed for each cell class. The procedure was done separately for ACC and LPFC classes and for cue onsets and error/reward outcome onsets (Supplementary file 1).
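
The effect size procedure can be sketched in NumPy. Assumptions for illustration: each neuron is summarized by one response value (for example, its mean z-scored response in a post-onset window), and each random sample has the size of the class and is drawn without replacement from all neurons. The function names are hypothetical.

```python
import numpy as np


def cohens_d(a, b):
    """Mean difference divided by the pooled (ddof=1) standard deviation."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = a.size, b.size
    pooled = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1))
                     / (na + nb - 2))
    return (a.mean() - b.mean()) / pooled


def class_effect_size(values, class_mask, n_samples=1000, seed=0):
    """Average unsigned Cohen's d between a class and random same-size samples."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, float)
    cls = values[np.asarray(class_mask, bool)]
    ds = [abs(cohens_d(cls, values[rng.choice(values.size, cls.size, replace=False)]))
          for _ in range(n_samples)]
    return float(np.mean(ds))
```

For example, `cohens_d([1, 2, 3], [4, 5, 6])` is −3, because both groups have unit variance and their means differ by 3.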

Analysis of time-resolved spike-LFP coherence under different behavioral conditions

To analyze spike-LFP phase synchronization in trials with the 50% lowest vs 50% highest reward prediction error (RPE) for ACC neurons, and in trials with the 50% lowest vs 50% highest choice probability (p(choice)) for LPFC neurons, we computed time-resolved spike-LFP pairwise phase consistency. First, we divided trials into high and low RPE and p(choice) groups (by a median split within each experimental session). Then, for each neuron and each RPE and p(choice) condition, we extracted spikes and their phases relative to the LFP at frequencies of 4–80 Hz (1 Hz resolution) by applying a Fourier transform to the Hanning-tapered LFP signal (±2.5 frequency cycles around each spike). We then computed the PPC in moving windows of ±350 ms, every 50 ms, around the outcome onset (for RPE) and around the color onset (for p(choice)). We included only neurons with at least 50 spikes across trials, using on average 44 (SE 2) trials. To control for spike number, we repeated the procedure 500 times with a random subsample of 50 spikes of a neuron for each window before computing the PPC. For each neuron, behavioral condition, and window, we calculated the average PPC over the random subsamples.

Statistical analysis of time-resolved spike-LFP coherence for putative interneurons and broad spiking neurons

Statistics on the time-resolved coherence were computed in two steps. In the first step, we tested, for each post-event time window, the null hypothesis that N3-type neurons and broad spiking neurons showed the same spike-LFP synchronization strength after event onset as in the time windows before the event. To test this, we first normalized the time-resolved coherence of each neuron to its baseline coherence (−850 ms to 0 ms) before reward onset or attention-cue onset (in ACC and LPFC, respectively). We then drew 1000 random samples of neurons from the population, each with the same size as the N3 class or the broad spiking class, under the null hypothesis that N3 class and broad spiking neurons do not show an event-triggered synchronization pattern different from that of the population. For each sample we extracted the 95% CIs, and across the population of samples we took the most extreme 5% of these CIs as the final, multiple-comparison-corrected 95% confidence intervals. We then identified the time periods and frequencies at which the average normalized PPC of the N3 class or of the broad spiking neurons was more extreme than these confidence intervals; the regions of significance are outlined by black contours. In the second step, we asked whether N3 class neurons show a different average synchrony strength in a 0–500 ms window aligned to attention-cue onset (in LPFC, for high and low choice probability conditions) and to reward onset (in ACC, for high and low reward prediction error conditions). We drew 1000 random samples from the broad spiking neurons, each with the same size as the N3 class, and computed their average pre-onset-normalized synchrony in this post-onset period.
We then took the most extreme 5% of the 95% confidence intervals across the 1000 samples and across frequencies, under the null hypothesis that N3 class cells do not differ in post-onset synchrony strength from broad spiking cells at any frequency. Frequency bands more extreme than these confidence intervals were considered significantly different (multiple-comparison-adjusted alpha level = 0.05; Figures 7 and 8).
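The text leaves some freedom in how the "most extreme 5% of the per-sample CIs" are formed. One common way to obtain a family-wise corrected threshold from random neuron subsamples is a max-statistic: keep the largest and smallest mean over all frequency-time bins for each sample, and take the outer quantiles of those extremes. The sketch below assumes that reading; `null_thresholds` and `significant_bins` are hypothetical names, and the input is assumed to be an array of baseline-normalized PPC of shape (neurons, frequencies, time windows).

```python
import numpy as np


def null_thresholds(norm_ppc, group_size, n_samples=1000, alpha=0.05, seed=None):
    """Lower/upper thresholds for the mean of `group_size` randomly drawn neurons.

    norm_ppc: (n_neurons, n_freqs, n_times) baseline-normalized PPC of the whole
    population. Taking the min/max over all bins per sample controls the
    family-wise error across bins (max-statistic approach, an assumption here).
    """
    rng = np.random.default_rng(seed)
    n_neurons = norm_ppc.shape[0]
    max_stats = np.empty(n_samples)
    min_stats = np.empty(n_samples)
    for i in range(n_samples):
        idx = rng.choice(n_neurons, size=group_size, replace=False)
        sample_mean = norm_ppc[idx].mean(axis=0)
        max_stats[i] = sample_mean.max()
        min_stats[i] = sample_mean.min()
    return np.quantile(min_stats, alpha / 2), np.quantile(max_stats, 1 - alpha / 2)


def significant_bins(norm_ppc, group_idx, **kwargs):
    """Boolean (n_freqs, n_times) mask of bins where the group mean lies outside
    the corrected null thresholds (candidate regions for contour plots)."""
    lower, upper = null_thresholds(norm_ppc, len(group_idx), **kwargs)
    group_mean = norm_ppc[group_idx].mean(axis=0)
    return (group_mean < lower) | (group_mean > upper)
```

The same thresholding applies to the second step by drawing the random samples from broad spiking neurons only and reducing the time axis to the 0–500 ms average before taking the extremes across frequencies.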

Analysis of spike-LFP synchronization controlled for event evoked LFP

This analysis controls for the possibility that the synchronization results are confounded by event-evoked LFP signals. First, we extracted the LFP aligned to color-cue onset and to reward onset on each trial and averaged it across trials in a −0.5 to 1 s window around each event to obtain the event-evoked LFP. We then subtracted this average evoked LFP from the individual trials and repeated the synchronization and statistical analyses described above on the evoked-LFP-subtracted trials. Subtracting the event-evoked LFP did not change the results (Figure 7—figure supplement 4).
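The evoked-LFP control amounts to epoching the LFP around each event and removing the trial average. A minimal NumPy sketch, assuming a continuous 1-D LFP sampled at `fs` Hz and event times given as sample indices (`extract_epochs` and `subtract_evoked` are hypothetical names):

```python
import numpy as np


def extract_epochs(lfp, fs, event_samples, t_pre=0.5, t_post=1.0):
    """Stack LFP epochs from t_pre s before to t_post s after each event sample."""
    pre, post = int(round(t_pre * fs)), int(round(t_post * fs))
    return np.stack([lfp[e - pre:e + post] for e in event_samples])


def subtract_evoked(epochs):
    """Remove the trial-averaged (event-evoked) LFP from every trial.

    epochs: (n_trials, n_samples) array aligned to one event type; the residual
    keeps trial-by-trial fluctuations that are not phase-locked to the event.
    """
    evoked = epochs.mean(axis=0, keepdims=True)
    return epochs - evoked
```

The subtraction is done separately for each event type (color cue, reward), so that a cue-evoked waveform is not removed from reward-aligned data.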

Statistical analysis of functional spike-LFP gamma synchronization for neuron types

We analyzed how the distribution of PPC values of each e-type differs from those of the other e-types in the high and low RPE/p(choice) conditions. For each area, we computed the average PPC of each neuron and condition in the 35–45 Hz range. We used a Kruskal-Wallis test to assess whether neuron types show different synchronization, followed by Tukey-Kramer-corrected multiple comparisons to determine whether any class differs from the others. These analyses were done separately for each area and each behavioral condition. No significant differences were observed in the more certain conditions (high p(choice) and low RPE). Consistent with the time-resolved results, only the N3 class showed stronger gamma synchrony, in the low p(choice) condition in LPFC and in the high RPE condition in ACC (Figure 7—figure supplement 1; Figure 7—figure supplement 3).
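The band averaging and the Kruskal-Wallis statistic can be sketched without external statistics packages. This minimal version computes H without the tie correction (adequate for continuous PPC values) and does not compute a p-value or the Tukey-Kramer follow-up; `scipy.stats.kruskal` returns the tie-corrected H together with its chi-square p-value. `band_mean` and `kruskal_wallis_h` are hypothetical names.

```python
import numpy as np


def band_mean(ppc, freqs, lo=35.0, hi=45.0):
    """Per-neuron mean PPC over lo..hi Hz; ppc has shape (n_neurons, n_freqs)."""
    freqs = np.asarray(freqs)
    in_band = (freqs >= lo) & (freqs <= hi)
    return np.asarray(ppc)[:, in_band].mean(axis=1)


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (no tie correction; assumes continuous values).
    Under H0 it is approximately chi-square distributed with k - 1 df."""
    sizes = [len(g) for g in groups]
    data = np.concatenate([np.asarray(g, dtype=float) for g in groups])
    n = data.size
    ranks = np.empty(n)
    ranks[np.argsort(data, kind="mergesort")] = np.arange(1, n + 1)
    bounds = np.cumsum([0] + sizes)
    rank_term = sum(ranks[a:b].sum() ** 2 / (b - a) for a, b in zip(bounds[:-1], bounds[1:]))
    return 12.0 / (n * (n + 1)) * rank_term - 3.0 * (n + 1)
```

For example, three e-types with completely separated values ([1..5], [6..10], [11..15]) give H = 12.5, well above the 5.99 critical value of a chi-square distribution with 2 degrees of freedom at alpha = 0.05.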

Analysis of narrow vs. broad and cell class-specific firing correlations with reversal learning

To investigate whether the firing rate of cells correlates with the learning state, we correlated the firing rate of single neurons with two model variables: the probability of the chosen stimulus (choice probability, p(choice)) and the positive reward prediction error (RPEpos). We excluded neurons with fewer than 30 trials of neural activity. For each neuron, the event-onset response was normalized by subtracting the mean pre-onset firing across all trials (−0.5 s to event onset) and dividing by the standard deviation of firing across all trials in that period. We then computed, for each neuron, the Spearman correlation coefficient between p(choice) values and the normalized firing rate in a ±200 ms window moved in 25 ms steps relative to color-cue onset. We used the same procedure for the normalized firing rate after reward onset and RPEpos values. To test whether narrow and broad spiking neurons correlate their firing rate differently with model values, we randomly shuffled cell labels and constructed a distribution of 1000 differences between the mean correlations of neurons randomly assigned to the broad and narrow groups, under the null hypothesis that correlations do not depend on the spike waveform group. We then took the most extreme 5% of these sampled differences of means across the time course to define the 95% confidence interval for testing the null hypothesis. We also tested whether cells of different classes showed different correlations of firing rate with p(choice) or RPEpos using a Kruskal-Wallis test with cell class as the grouping variable. To test which class shows correlations different from the population mean, we randomly shuffled cell class labels 1000 times and computed the mean difference between each randomly labeled cell class and the population.
We then constructed a distribution of these mean differences under the null hypothesis that no class shows a mean correlation different from the population mean, and took the top 5% of samples to define the 95% confidence interval. Classes whose mean difference in correlation from the population was more extreme than this CI were marked as significant. All procedures were performed separately for neurons in ACC and LPFC and for both p(choice) and RPEpos values. In addition to the correlations of firing rate with p(choice) and with RPEpos, we also calculated the time-resolved correlation of each neuron's firing rate with the number of trials since reversal. B-type and N-type neurons in LPFC and in ACC did not change their firing differently as a function of the raw trial count since reversal. This lack of correlation with trial number held for both the color-cue period and the reward period of the task (data not shown).
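The sliding-window Spearman correlation and the label-shuffle test for narrow vs. broad neurons can be sketched as below. Because the baseline z-scoring described above is one affine transform shared by all trials of a neuron, it does not change that neuron's Spearman coefficient, so the sketch omits it (it matters only when averaging rates across neurons). The max-over-windows null used for the time-course-wide threshold is an assumption about how "the most extreme 5% ... through their time course" was implemented; all function names are hypothetical.

```python
import numpy as np


def rankdata(a):
    """Ranks starting at 1, with tied values given their average rank."""
    a = np.asarray(a, dtype=float)
    ranks = np.empty(a.size)
    ranks[np.argsort(a, kind="mergesort")] = np.arange(1, a.size + 1)
    _, inv = np.unique(a, return_inverse=True)
    inv = inv.ravel()
    return (np.bincount(inv, weights=ranks) / np.bincount(inv))[inv]


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks; NaN if either is constant."""
    rx, ry = rankdata(x), rankdata(y)
    if rx.std() == 0 or ry.std() == 0:
        return np.nan
    return float(np.corrcoef(rx, ry)[0, 1])


def sliding_correlation(spike_times, values, centers, half_width=0.2):
    """Spearman correlation between per-trial values (e.g. p(choice)) and the firing
    rate in [c - half_width, c + half_width) for each window centre c (s).
    spike_times: one array of spike times (s, relative to event onset) per trial."""
    rates = np.array([[np.sum((np.asarray(st) >= c - half_width) & (np.asarray(st) < c + half_width))
                       for c in centers] for st in spike_times]) / (2 * half_width)
    return np.array([spearman(values, rates[:, j]) for j in range(len(centers))])


def label_shuffle_test(corrs, is_narrow, n_perm=1000, seed=None):
    """Narrow-minus-broad difference of mean correlations per window, tested against
    shuffled labels; corrs: (n_neurons, n_windows) without NaNs. The max |difference|
    over windows per shuffle gives one threshold for the whole time course."""
    rng = np.random.default_rng(seed)
    corrs = np.asarray(corrs, dtype=float)
    labels = np.asarray(is_narrow, dtype=bool)
    observed = corrs[labels].mean(axis=0) - corrs[~labels].mean(axis=0)
    null_max = np.empty(n_perm)
    for k in range(n_perm):
        perm = rng.permutation(labels)
        null_max[k] = np.abs(corrs[perm].mean(axis=0) - corrs[~perm].mean(axis=0)).max()
    threshold = np.quantile(null_max, 0.95)
    return observed, np.abs(observed) > threshold
```

Window centres every 25 ms (e.g. `np.arange(-0.5, 1.0, 0.025)`) with `half_width=0.2` reproduce the ±200 ms window stepped in 25 ms described in the text.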

Training classifiers for predicting cell classes from their correlations with learning variables

We used a machine learning approach to test how accurately cells can be assigned to a cell class based on their functional properties. To train the classifiers, we used the correlation of each cell's firing rate with RPE or p(choice), separately for LPFC and ACC. We tested whether the functional correlations of cells allow them to be reliably classified into their true class label (from the k-means clustering) rather than into alternative classes. We used a multiclass support vector machine (SVM) with one-versus-one comparisons of the identified cell classes and 10-fold cross-validation. A vector of correlation values (one element per neuron) was used together with a vector of cluster labels (from our clustering results) to train the SVM. The classifier used a Gaussian radial basis function kernel with a scaling factor of 1. For each classifier, we only considered classes that contained ≥5 cells and were present in all folds. Because classes N1 and N2 did not meet these criteria, we excluded them as classes and instead randomly distributed their cells among the other classes (weighted by class size) as an internal noise factor. For each learning measure (RPE and p(choice)) and each area (LPFC and ACC), we subsampled each cluster to half the size of the smallest cluster, ensuring equal cell numbers per cluster in each subsample. We constructed the confusion matrix as the ratio of the outcome matrix to the total count across all 1000 test subsamples and used a binomial test (FDR-corrected p<0.05) to identify confusion-matrix cells significantly above chance (chance level defined as one divided by the number of classes). Classifier predictions based on the correlation of firing rate with RPE in LPFC, and with p(choice) in ACC, were close to chance level (not shown).
However, in ACC the N3 class was predicted with an accuracy of 0.34 from its correlation with RPE, and in LPFC the N3 class was predicted with an accuracy of 0.31 from its correlation with p(choice) (Figure 5—figure supplement 3).
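The SVM itself is not reproduced here (scikit-learn's `SVC`, for instance, handles multiclass problems with a one-versus-one scheme). The sketch below covers only the final step: testing each cell of a pooled confusion-count matrix against chance (1 / number of classes) with a one-sided binomial test and Benjamini-Hochberg FDR correction. It uses only the Python standard library; the function names are hypothetical, and pooling raw counts across subsamples is an assumption about how the binomial test was applied.

```python
from math import exp, lgamma, log, log1p


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    logs = [lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1) + i * log(p) + (n - i) * log1p(-p)
            for i in range(k, n + 1)]
    top = max(logs)
    return min(1.0, exp(top) * sum(exp(v - top) for v in logs))


def benjamini_hochberg(pvals, q=0.05):
    """Reject flags controlling the false discovery rate at level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= q * rank / m:
            cutoff = rank
    reject = [False] * m
    for i in order[:cutoff]:
        reject[i] = True
    return reject


def above_chance_cells(counts, q=0.05):
    """(true, predicted) cells of a confusion-count matrix whose count exceeds
    chance = 1 / n_classes (one-sided binomial test, BH-FDR corrected)."""
    chance = 1.0 / len(counts)
    cells, pvals = [], []
    for t, row in enumerate(counts):
        n = sum(row)
        for j, c in enumerate(row):
            cells.append((t, j))
            pvals.append(binom_sf(c, n, chance))
    return [cell for cell, rej in zip(cells, benjamini_hochberg(pvals, q)) if rej]
```

With three classes, for instance, a row of counts [80, 10, 10] flags only its diagonal cell, while a row near [33, 33, 34] flags nothing.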

Analysis of the information coding cells for the rule identity and target location

To determine what proportion of neurons systematically carry information about rule identity (Red vs. Green) or target location (Left vs. Right) relative to color-cue onset and to reward/error outcome onset, we considered neurons for which we had at least 20 trials per condition. We used a ±200 ms window moved in 25 ms steps relative to cue onset or error/reward outcome onset. For each window, we performed a nonparametric rank-sum test between the two conditions under the null hypothesis that neurons do not fire preferentially for a specific color or location (Figure 2—figure supplement 2). For narrow and broad spiking neurons, we computed the proportion of neurons whose firing rate differed significantly between conditions (p<0.05). We then randomly shuffled these proportions over the time course and computed the 95% CI under the null hypothesis that the proportion of significant neurons in each group does not differ from the pre-onset proportions.
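The per-window rank-sum test and the resulting proportion of selective neurons can be sketched as follows. This minimal version uses the normal approximation with average ranks for ties but no tie correction of the variance, which makes it slightly conservative; `scipy.stats.ranksums` is a standard alternative. The input layout (one trials-by-windows rate array per neuron and condition) and all function names are assumptions for illustration.

```python
import numpy as np
from math import erfc, sqrt


def average_ranks(a):
    """Ranks starting at 1, with tied values given their average rank."""
    a = np.asarray(a, dtype=float)
    ranks = np.empty(a.size)
    ranks[np.argsort(a, kind="mergesort")] = np.arange(1, a.size + 1)
    _, inv = np.unique(a, return_inverse=True)
    inv = inv.ravel()
    return (np.bincount(inv, weights=ranks) / np.bincount(inv))[inv]


def ranksum_p(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, no tie
    correction of the variance, hence slightly conservative with ties)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(np.concatenate([np.asarray(x, float), np.asarray(y, float)]))
    w = ranks[:n1].sum()
    mu = n1 * (n1 + n2 + 1) / 2.0
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return erfc(abs(w - mu) / sigma / sqrt(2.0))


def proportion_selective(rates_a, rates_b, alpha=0.05):
    """Fraction of neurons with p < alpha in each window.
    rates_a, rates_b: per-neuron arrays (n_trials, n_windows) for the two
    conditions (e.g. Red vs. Green rule); trial counts may differ."""
    sig = [[ranksum_p(a[:, j], b[:, j]) < alpha for j in range(a.shape[1])]
           for a, b in zip(rates_a, rates_b)]
    return np.mean(sig, axis=0)
```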

Analysis of cell class firing statistics measures

For each firing statistic (firing rate, local variation, and coefficient of variation), we performed a nonparametric Kruskal-Wallis test with cell class as the grouping variable to test for a main effect of cell class. We then performed rank-sum multiple comparisons for pairwise comparisons between cell classes (p<0.05).
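The three firing statistics can be computed from a neuron's spike times as sketched below. The sketch assumes that the coefficient of variation refers to interspike intervals and uses the local variation measure of Shinomoto et al. (2003); the function name `firing_statistics` is hypothetical. Both measures equal about 1 for a Poisson process and 0 for a perfectly regular spike train, but LV is less sensitive to slow rate changes than CV.

```python
import numpy as np


def firing_statistics(spike_times, duration):
    """Firing rate (Hz), coefficient of variation (CV) and local variation (LV)
    of interspike intervals, with
    LV = 3/(n-1) * sum(((I_i - I_{i+1}) / (I_i + I_{i+1}))**2) over n ISIs."""
    st = np.sort(np.asarray(spike_times, dtype=float))
    rate = st.size / duration
    isi = np.diff(st)
    if isi.size < 2:
        return float(rate), np.nan, np.nan
    cv = isi.std() / isi.mean()
    lv = 3.0 / (isi.size - 1) * np.sum(((isi[:-1] - isi[1:]) / (isi[:-1] + isi[1:])) ** 2)
    return float(rate), float(cv), float(lv)
```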

Analysis of PPC strength for learning correlated cells vs non-correlated cells

We grouped neurons by waveform (narrow vs. broad) and then split each group into neurons whose post-onset firing was significantly correlated with learning values and neurons whose firing was not (p(choice) × firing rate after cue onset in LPFC, and RPEpos × firing rate after reward onset in ACC). Within each waveform group, we randomly shuffled these labels and computed the difference in the proportion of PPC peaks between neurons labeled as significantly correlated with learning state and those labeled as not correlated. We constructed a distribution of 1000 such differences under the null hypothesis that, within each waveform group, the proportion of PPC peaks does not differ between neurons whose firing rate was significantly correlated with learning values and those whose firing rate was not. We then took the most extreme 5% of these differences to define the 95% CI over the population of samples.
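For one waveform group, the shuffle test on proportions of PPC peaks can be sketched as below. It assumes that each neuron has already been summarized by two booleans (whether it shows a PPC peak, and whether its firing is significantly correlated with the learning variable); the function name is hypothetical.

```python
import numpy as np


def proportion_difference_test(has_peak, is_correlated, n_perm=1000, seed=None):
    """Difference in the proportion of neurons with a PPC peak between
    learning-correlated and non-correlated neurons, against a two-sided 95% CI
    from shuffled correlation labels."""
    rng = np.random.default_rng(seed)
    peak = np.asarray(has_peak, dtype=bool)
    labels = np.asarray(is_correlated, dtype=bool)

    def diff(lab):
        return peak[lab].mean() - peak[~lab].mean()

    observed = diff(labels)
    null = np.array([diff(rng.permutation(labels)) for _ in range(n_perm)])
    lo, hi = np.quantile(null, [0.025, 0.975])
    return observed, (lo, hi), bool(observed < lo or observed > hi)
```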

Appendix 1

1. Overview of circuit modeling

We constructed circuit motifs to account for our experimental observation that gamma synchronization characterized cue- and reward-onset-triggered activity when choice probabilities were low (near 0.5) and reward prediction errors were relatively high. These circuit motifs provide a proof of concept that the empirical observations can follow from biologically plausible motifs. They also provide predictions that can be tested in future studies.

One circuit motif is comprised of two populations of excitatory cells (E1 and E2) and one population of interneurons (I). This ‘E-E-I’ motif (Figure 9A, Figure 9—figure supplement 1) was constructed to test the gamma to beta synchronization switch that the N3 e-type interneuron population in LPFC showed in the empirical analysis. The second circuit motif is comprised of two populations of inhibitory neurons (I1 and I2) and only one population of excitatory neurons (E). This ‘E-I-I’ motif (Figure 9B, Figure 9—figure supplement 2) was constructed to test the theta to gamma switch that the N3 e-type interneuron population in ACC showed empirically.

2. E-E-I circuit motif realizing the switch from gamma to beta frequency synchronization

2.1 E-E-I Network architecture

We simulated a simple E-E-I model with two excitatory populations recurrently connected with one inhibitory population that is conceived of as reflecting the interneurons of the N3 e-type (Figure 9—figure supplement 1B, Figure 9A). Each population was represented by two variables: a firing rate r modeled after the work of Hahn and colleagues (Hahn et al., 2020), and a synaptic variable s modeled as in Keeley et al., 2017. The full description of the model is given below. Both E populations are reciprocally connected to the I population. We assume that the E cells receive input representing the aggregate values of the objects. We model an increase in the value of object one by increasing the drive to the E1 population while concomitantly reducing the drive to E2, such that their sum remains the same.

2.2 Model equations for the E-E-I circuit model

The activity of each population is represented by two vectors, $\mathbf{r}=(r_{E1},r_{E2},r_I)$, representing the firing rates, and $\mathbf{s}=(s_{E1},s_{E2},s_I)$, representing the synaptic inputs. They satisfy the following coupled differential equations

$$\tau\,\frac{d\mathbf{r}}{dt}=-\mathbf{r}+\alpha\,G\left(W\mathbf{s}+\mathbf{I}+\mathbf{I}_{noise}\right)$$

and

$$\tau_{syn}\,\frac{d\mathbf{s}}{dt}=-\mathbf{s}+\gamma\,F(\mathbf{r})\,(1-\mathbf{s})$$

where $\tau=(\tau_{E1},\tau_{E2},\tau_I)=(1.5385,1.5385,1.5385)$ is the firing rate time scale, $\tau_{syn}=(\tau_{syn,E1},\tau_{syn,E2},\tau_{syn,I})=(2.3077,2.3077,15.3846)$ is the synaptic time scale, $\alpha=(\alpha_{E1},\alpha_{E2},\alpha_I)=(2.5,2.5,5)$ is a scaling variable to adjust the mean firing rate of each population, $\gamma=(\gamma_{E1},\gamma_{E2},\gamma_I)=(4,4,3)$ is the scale of the synaptic onset rate, $\mathbf{I}=(I_{E1},I_{E2},I_I)$ is the drive to each population, and $W$ is a 3 by 3 connection strength matrix:

$$W=\begin{pmatrix}2.0 & 0 & -2.6414\\ 0 & 2.0 & -2.6414\\ 3.0 & 3.0 & -0.1\end{pmatrix}$$

We write $I_{E1}=I_0+I_{max}\,x$ and $I_{E2}=I_0+I_{max}(1-x)$, where $x$ varies between 0 and 1. Here $I_I=0$, $I_0=0.8$, and $I_{max}=0.4$. The noise current $\mathbf{I}_{noise}$ had a standard deviation of 0 for the simulations shown in this note. It can be used to induce transient oscillations when there is a stable fixed point with eigenvalues that have an imaginary part.

The firing rate response function is

$$G(x)=\frac{x}{1-e^{-x}},$$

and the one for the synaptic inputs is

$$F(r)=\frac{1}{1+\exp\left(\frac{\theta-r}{k}\right)}$$

Here $\theta=(\theta_{E1},\theta_{E2},\theta_I)=(5,5,10)$ is the activation threshold for the synapse and $k=(k_{E1},k_{E2},k_I)=(0.5,0.5,1.0)$ is the sharpness of the synaptic activation function.
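To make the model equations concrete, the E-E-I motif can be integrated numerically with the parameters listed above. This is a minimal sketch, not the authors' simulation code: it assumes noise-free forward-Euler integration with a fixed step, starts from r = s = 0, and treats all products with the parameter vectors as elementwise. Frequencies returned by the hypothetical helper `dominant_frequency` are in cycles per model time unit; mapping them to Hz requires an additional assumption about the time unit. We have not checked that this sketch reproduces the figure supplements quantitatively.

```python
import numpy as np


def G(x):
    """Rate response G(x) = x / (1 - exp(-x)), with its limit G(0) = 1."""
    x = np.asarray(x, dtype=float)
    out = np.ones_like(x)
    nz = np.abs(x) > 1e-12
    out[nz] = x[nz] / (-np.expm1(-x[nz]))
    return out


def F(r, theta, k):
    """Synaptic activation F(r) = 1 / (1 + exp((theta - r) / k))."""
    return 1.0 / (1.0 + np.exp((theta - r) / k))


# E-E-I parameters from section 2.2, population order (E1, E2, I)
TAU = np.array([1.5385, 1.5385, 1.5385])
TAU_SYN = np.array([2.3077, 2.3077, 15.3846])
ALPHA = np.array([2.5, 2.5, 5.0])
GAMMA = np.array([4.0, 4.0, 3.0])
THETA = np.array([5.0, 5.0, 10.0])
K = np.array([0.5, 0.5, 1.0])
W = np.array([[2.0, 0.0, -2.6414],
              [0.0, 2.0, -2.6414],
              [3.0, 3.0, -0.1]])


def simulate_eei(x, i0=0.8, imax=0.4, t_max=300.0, dt=0.02):
    """Noise-free forward-Euler run from r = s = 0; returns rates (n_steps, 3)."""
    drive = np.array([i0 + imax * x, i0 + imax * (1.0 - x), 0.0])
    r, s = np.zeros(3), np.zeros(3)
    rates = np.empty((int(round(t_max / dt)), 3))
    for step in range(rates.shape[0]):
        dr = (-r + ALPHA * G(W @ s + drive)) / TAU
        ds = (-s + GAMMA * F(r, THETA, K) * (1.0 - s)) / TAU_SYN
        r, s = r + dt * dr, s + dt * ds
        rates[step] = r
    return rates


def dominant_frequency(signal, dt):
    """Frequency (cycles per model time unit) of the largest non-DC spectral peak."""
    sig = np.asarray(signal, dtype=float) - np.mean(signal)
    power = np.abs(np.fft.rfft(sig)) ** 2
    freqs = np.fft.rfftfreq(sig.size, d=dt)
    return float(freqs[1 + np.argmax(power[1:])])
```

Sweeping `x` from 0.5 towards 1 and applying `dominant_frequency` to the second half of the I-population rate is one way to explore the soft winner-take-all behavior and the frequency shift described in 2.3. Because E1 and E2 interact only through the shared I population, the E population with the larger drive keeps the higher rate throughout the run.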

2.3 Simulation results of E-E-I model

When the drive to E1 increases, the activity of population E1 increases whereas that of E2 decreases, with the level of I activity varying only moderately with E1 drive (Figure 9—figure supplement 1A). The circuit thus executes a soft version of the winner-take-all mechanism: the E population with the larger drive suppresses the one with the smaller drive. We chose parameters such that the network displayed oscillations by first locating a Hopf bifurcation using a continuation approach implemented in the software auto07 (Doedel et al., 1991). A Hopf bifurcation is signaled when the Jacobian at the fixed point has a pair of complex conjugate eigenvalues whose real part becomes positive at the bifurcation (Strogatz, 1994). For small amplitudes, the oscillation frequency is directly related to the imaginary part of these eigenvalues. Stable oscillations appear in the model, with the frequency increasing from beta for low E1 drive to gamma when the drives to E1 and E2 are similar (Figure 9—figure supplement 1B). The power of these oscillations roughly follows the mean activity of each population.
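The eigenvalue criterion for a Hopf bifurcation can be illustrated numerically. The sketch below does not perform continuation (as auto07 does); it only evaluates a finite-difference Jacobian at a known fixed point and reports Im(λ)/(2π) for the complex pair. As a stand-in system it uses the Hopf normal form, whose eigenvalues at the origin are μ ± iω. Applying it to the E-E-I model would additionally require the model's six-dimensional right-hand side and a fixed point found, for example, with a root finder. All names are hypothetical.

```python
import numpy as np


def numerical_jacobian(f, x0, h=1e-6):
    """Central finite-difference Jacobian of the vector field f at the point x0."""
    x0 = np.asarray(x0, dtype=float)
    cols = []
    for j in range(x0.size):
        dx = np.zeros_like(x0)
        dx[j] = h
        cols.append((np.asarray(f(x0 + dx)) - np.asarray(f(x0 - dx))) / (2.0 * h))
    return np.column_stack(cols)


def linear_oscillation_frequency(f, x_fixed):
    """Eigenvalues at a fixed point and Im(lambda) / (2*pi) of the fastest complex
    pair (0.0 if all eigenvalues are real). A positive real part of that pair means
    the fixed point has lost stability to oscillations."""
    eig = np.linalg.eigvals(numerical_jacobian(f, x_fixed))
    imag = np.abs(eig.imag)
    freq = imag.max() / (2.0 * np.pi) if np.any(imag > 1e-9) else 0.0
    return eig, freq


def hopf_normal_form(state, mu=0.1, omega=np.pi):
    """Toy system with a fixed point at the origin and eigenvalues mu +/- i*omega."""
    x, y = state
    rho2 = x * x + y * y
    return np.array([mu * x - omega * y - x * rho2, omega * x + mu * y - y * rho2])
```

With μ = 0.1 and ω = π, the origin is just past the bifurcation (real part 0.1 > 0) and the small-amplitude frequency is 0.5 cycles per time unit.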

3. E-I-I circuit motif realizing the switch from gamma to theta frequency synchronization

3.1 E-I-I network architecture

We constructed a second model to account for the switch between theta and gamma synchronization (Figure 9—figure supplement 2, Figure 9B). This model has two types of interneurons (the I1 and I2 populations) and one E cell population (E), reciprocally connected. They form two PING-type motifs, similar to Domhof and Tiesinga, 2021, who focused on beta/gamma frequency switches (see 4. Discussion). The first motif, with I1, forms a fast circuit generating gamma; the second, with I2, forms a slow circuit generating theta. Each motif can create its own oscillation, but when one circuit is dominant it takes over the other and imposes its frequency. We assume that interneuron population I1 corresponds to PV neurons because they have faster dynamics. We simulate the case of rewarded trials, in which the RPE is low when the expected value is high, and high when the expected value is low. We further assume that the value-associated drive to I1 is part of a disinhibitory circuit, that is, it is an inhibitory input to I1 that reflects the expected value. In other words, when RPE varies from low to high values, the drive to I1 varies from low to high.

3.2 Model equations for the E-I-I circuit model

The network is simulated using the same modeling framework as in 2.2 (above), but now there are two I populations, I1 and I2, and only one E population, so the vectors change accordingly: $\mathbf{r}=(r_E,r_{I1},r_{I2})$ and $\mathbf{s}=(s_E,s_{I1},s_{I2})$; $\tau=(\tau_E,\tau_{I1},\tau_{I2})=(1,1,5)$; $\tau_{syn}=(\tau_{syn,E},\tau_{syn,I1},\tau_{syn,I2})=(1.5,5,45)$; $\alpha=(\alpha_E,\alpha_{I1},\alpha_{I2})=(2.5,5,5)$; $\gamma=(\gamma_E,\gamma_{I1},\gamma_{I2})=(4,3,3)$; and $\mathbf{I}=(I_E,I_{I1},I_{I2})$. Here $I_{I1}=I_{01}+I_{max1}\,x$ with $I_{01}=-3$ and $I_{max1}=3$; $I_E=0.71646$; $I_{I2}=-0.3$. The noise current $\mathbf{I}_{noise}$ has a standard deviation of 0. $W$ is the following 3 by 3 matrix:

$$W=\begin{pmatrix}2.0 & -1.3207 & -1.3207\\ 3.0 & -0.1 & 0\\ 3.0 & 0 & -0.1\end{pmatrix}$$

The response functions $G$ and $F$ are identical to those specified in model 1 (see 2.2), with the parameter values for $F$: $\theta=(\theta_E,\theta_{I1},\theta_{I2})=(5,10,10)$ and $k=(k_E,k_{I1},k_{I2})=(0.5,1.0,1.0)$.
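The E-I-I motif can be simulated in the same way, with the population order (E, I1, I2) and the I1 drive $-3+3x$. As for the first model, this is a hedged sketch rather than the authors' code: it assumes noise-free forward-Euler integration with a fixed step from r = s = 0 and elementwise products with the parameter vectors. The longer run accounts for the slow I2 synapse ($\tau_{syn,I2}=45$). Whether, and at which frequency, a given run oscillates has not been checked against Figure 9—figure supplement 2.

```python
import numpy as np


def G(x):
    """Rate response G(x) = x / (1 - exp(-x)), with G(0) = 1 (as in model 1)."""
    x = np.asarray(x, dtype=float)
    out = np.ones_like(x)
    nz = np.abs(x) > 1e-12
    out[nz] = x[nz] / (-np.expm1(-x[nz]))
    return out


def F(r, theta, k):
    """Synaptic activation F(r) = 1 / (1 + exp((theta - r) / k))."""
    return 1.0 / (1.0 + np.exp((theta - r) / k))


# E-I-I parameters from section 3.2, population order (E, I1, I2)
TAU = np.array([1.0, 1.0, 5.0])
TAU_SYN = np.array([1.5, 5.0, 45.0])
ALPHA = np.array([2.5, 5.0, 5.0])
GAMMA = np.array([4.0, 3.0, 3.0])
THETA = np.array([5.0, 10.0, 10.0])
K = np.array([0.5, 1.0, 1.0])
W = np.array([[2.0, -1.3207, -1.3207],
              [3.0, -0.1, 0.0],
              [3.0, 0.0, -0.1]])


def simulate_eii(x, t_max=1000.0, dt=0.05):
    """Noise-free forward-Euler run; x in [0, 1] sets the I1 drive -3 + 3x."""
    drive = np.array([0.71646, -3.0 + 3.0 * x, -0.3])
    r, s = np.zeros(3), np.zeros(3)
    rates = np.empty((int(round(t_max / dt)), 3))
    for step in range(rates.shape[0]):
        dr = (-r + ALPHA * G(W @ s + drive)) / TAU
        ds = (-s + GAMMA * F(r, THETA, K) * (1.0 - s)) / TAU_SYN
        r, s = r + dt * dr, s + dt * ds
        rates[step] = r
    return rates
```

Comparing runs at `x = 0` (low RPE) and `x = 1` (high RPE) shows the higher I1 firing reported in 3.3; a spectral-peak estimate of the second half of each run can then be used to compare oscillation frequencies.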

3.3 Simulation Results of E-I-I model

We again used auto07 to find Hopf bifurcations, from which we started the exploration of the network dynamics. When we increased the drive to I1 the firing rate of I1 increased (Figure 9—figure supplement 2A) and the oscillation frequency increased from around the theta band to gamma frequencies (Figure 9—figure supplement 2B).

4. Discussion of circuit motifs, relation to other models and experiment

The E-E-I motif provides a proof of principle for the link between diversity of input and oscillation frequency (see Figure 9—figure supplement 1 and Figure 9A). We increased the drive to E1 and reduced the drive to E2 such that their sum remained constant, and studied the oscillation frequency of the I population. High drive to E1 and low drive to E2 (and vice versa) corresponds to diverse inputs, as occurs in a reversal block after learning of values is completed (in the ‘steady state’), when one object has a high value and the other a low value. In this regime, oscillations are prominent in the beta frequency range (Figure 9—figure supplement 1A). But when the drives to the E1 and E2 populations are similar, indexing the situation of low p(choice), i.e., p(choice) near 0.5, the I population increases its oscillation frequency to the gamma range (Figure 9—figure supplement 1). Hence, in the model, competition between two similarly valued objects that results in a low choice probability is indexed by gamma oscillations of the inhibitory cell population, while otherwise beta synchrony predominates. This result matches the core oscillatory signature we observed in LPFC around color-cue onset. It suggests that the transient gamma increase of the N3 e-type might reflect the gating of diverse inputs, as has been suggested by larger-scale modeling of similar circuit motifs (Buia and Tiesinga, 2008; Sherfey et al., 2018; Sherfey et al., 2020).

The second circuit, an E-I-I motif, provides a proof of principle for the link between increased activation of a ‘fast’ interneuron population (I1) and a switch from theta to gamma oscillations. Here, theta synchronous activity driven by the I2 neurons corresponds to low RPE trials (after learning of values is completed), in which a reward R is received and the value V of the chosen stimulus was relatively high; because the RPE is computed as R − V, a large R combined with a high V yields a low RPE (see Equation 2 in Materials and methods of main text) (Watabe-Uchida et al., 2017). In contrast, the gamma synchronous state that emerges with larger drive to the I1 neurons in the model corresponds to high RPE trials, in which a reward R is received but the value of the chosen stimulus was relatively low (low V). This circuit motif is plausible when one assumes that the I1 neuron population is disinhibited when the chosen stimulus value is low. Such disinhibition can be achieved by lowering the drive to I2 cells, or by assuming a separate disinhibitory circuit involving other inhibitory cells. In the model simulation we only explored the former assumption. In summary, the E-I-I motif reproduces the switch between gamma and theta synchronization that we observed during learning in ACC N3 e-type neurons. At the functional level, the circuit suggests that the emergence of gamma activity in this network indexes the detection of a discrepancy between the received reward (as one source of input) and the chosen stimulus value (as another source of input).

The oscillation frequency observed in these two models was not directly related to biophysical time scales, such as synaptic or membrane time scales or the rate constants for the opening and closing of ionic channels, as would be the case in models based on Hodgkin-Huxley type channels (Tiesinga et al., 2001). Rather, it arose from the product of the two effective time scales (firing rate and synaptic) in the model. Therefore, these models serve as a proof of principle, indicating how populations may be wired to produce oscillations with different frequencies, but they cannot make conclusive predictions regarding the dynamics of the underlying interneurons, that is, whether they are PV or SOM, or what type of spike patterns they produce. For this type of insight, proper network models composed of biophysical neuron models need to be constructed. Nevertheless, we think it is reasonable to identify faster interneuron populations with PV+ interneurons given prior modeling studies (see next paragraph), and thereby putatively link them to the N3 e-type (see also Discussion of the main text).

Similar reservations hold for the mechanism by which oscillations are generated, such as for instance ING versus PING (Whittington et al., 2000; Tiesinga and Sejnowski, 2009; Tiesinga, 2012). Model 1 is functionally a soft winner-take-all model, but the oscillations could emerge by way of an ING motif, potentially heterogeneously activated, when individual interneurons receive a different mix of inputs from E1 and E2. Previous simulations by us and others (Wang and Buzsáki, 1996; White et al., 1998; Tiesinga and José, 2000; Tiesinga and Sejnowski, 2004) show that this would be feasible. Model 2 is comprised of two competing E-I motifs, which our recent simulations indicate (Domhof and Tiesinga, 2021) could implement switches when one motif is more strongly activated than the other. Our simulations do not exclude the possibility that the I1 population synchronizes by the ING mechanism, but it would in our opinion represent a less parsimonious explanation.

The involvement of ING and PING mechanisms in beta and gamma oscillations is well established. For theta oscillations, other mechanisms have also been proposed, for instance intrinsic membrane resonance (Hutcheon and Yarom, 2000) in pyramidal cells (Tiesinga et al., 2001) activated by neuromodulatory tone, or in a specific type of interneuron (Rotstein et al., 2005), which does need to be reciprocally connected to a fast interneuron for theta oscillations to emerge. In other models, slower synaptic time scales were instrumental (White et al., 2000). As resonance mechanisms were not explicitly modeled, our simulations do not directly speak to whether the empirical findings rely on resonance properties. We therefore cannot conclusively exclude them until a more comprehensive modeling study is conducted that takes into account not only synaptic time scales but also the intrinsic dynamics of all involved neuron classes together with their task-dependent firing rate dynamics. A comprehensive review of cortical rhythms and their mechanisms can be found in Wang, 2010.

Data availability

Source neural data and MATLAB scripts for reproducing the main figures with the data are included in the manuscript as supporting files Source Data 1, 2, and 3.

References

  1. Strogatz SH (1994) Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering. Reading, Mass.: Addison-Wesley.
  2. Sutton RS, Barto AG (2018) Reinforcement Learning: An Introduction (2nd Edition). MIT Press.

Decision letter

  1. Saskia Haegens
    Reviewing Editor; Columbia University College of Physicians and Surgeons, United States
  2. Michael J Frank
    Senior Editor; Brown University, United States

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

This paper will be of interest to system neuroscientists studying reinforcement learning, as well as neuroscientists in the field of brain rhythms. The work sheds new light on the specialization of individual cell types in the cortex of animals engaged in a challenging task. The authors combine many different techniques (single-cell recordings, clustering of cell types, behavioral modeling, spike-field coherence) in order to understand the differential contributions of subclasses of cell types to cortical computations during a reversal-learning task. The questions asked by the authors in this paper are interesting and their treatment is thorough, with many controls. As a result, this work is a valuable addition to the field.

Decision letter after peer review:

[Editors’ note: the authors submitted for reconsideration following the decision after peer review. What follows is the decision letter after the first round of review.]

Thank you for submitting your work entitled "Interneuron Specific Gamma Synchrony Indexes Cue Uncertainty and Prediction Errors in Prefrontal and Cingulate Cortex" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and a Senior Editor. The reviewers have opted to remain anonymous.

We are sorry to say that, after consultation with the reviewers, we have decided that your work will not be considered further for publication by eLife. While we found the work overall potentially interesting, and the reviewers each found merit in particular elements, several major concerns were raised regarding key components (including the computational model and the focus on the N3 subtype). You will find these detailed in the reviews attached below.

In its current state, the paper tries many things at once (neuron subtype clustering, reward processing, computational modeling), which by itself is laudable, but as it stands it is coming short at tying these aspects together.

We are not inviting a revision because the amount of work we consider required is too extensive. However, given there was considerable interest in elements of the work, we want to point out you are free to decide to rework the current manuscript into a new version and submit this new manuscript to eLife. Note that if you choose to submit a new manuscript it would go through the regular process again (i.e., consideration by editors) and if selected to be sent out for review could potentially go back to the same reviewers and/or new ones.

Reviewer #1:

Boroujeni et al. recorded extracellular spikes from single neurons in brain areas LPFC and ACC in two awake behaving macaques that were performing a reward reversal learning task. They classified the recorded neurons into various subtypes, and investigated how neuronal activity in these different subtypes related to the variables of the behavioural task.

The paper's clear and primary strength is the classification of extracellularly recorded neurons into broad- and narrow-spiking neurons, and even further into subtypes of these two classes. While a split based purely on spike waveform shape into broad- and narrow-spiking is relatively common, the cluster-based classification into subtypes based on various additional parameters like spike variability is novel and potentially illuminating. The authors furthermore convincingly demonstrate that the recorded narrow-spiking neurons (often labelled "putative inhibitory interneurons") are indeed likely inhibitory in nature, by showing that the net effect of a spike in these cells on the surrounding population spike rate is negative. The analysis choices in this part of the paper were clear, well-motivated, and well-presented.

However, the bulk of the paper is taken up by the relationship between neuronal spiking and variables from the behavioural task, specifically choice probability (p(choice)) and reward prediction error (RPE). Here, the conclusions appear not backed up by the data, for several reasons.

First of all, the authors only present results for correlations with RPE in the reward window, and results for correlations with p(choice) in the stimulus windows. One of the main conclusions of the paper is that LPFC neurons code for p(choice) whereas ACC neurons code for RPE. However, correlations with RPE in the stimulus windows and p(choice) in the reward window are never shown. Furthermore, the authors demonstrate that, purely given the task structure, RPE and p(choice) are almost perfectly negatively correlated (r = -0.928, Figure S4). It is therefore very possible that the crucial split is not between p(choice) and RPE as the determinant of neural activity, but simply the time window in which these are analyzed.

Second, the authors present a "circuit architecture" that might account for the observed results. In the Results, this model is presented as though it were a computationally implemented biophysical neural circuit model that makes predictions that are in line with the observed data. I cannot find details of the implementation of such a model in the Methods, which makes the status of the predictions here unclear. It is not explained why two equally-valued objects would lead to gamma synchronization, whereas two objects of unequal value lead to beta synchronization (the key conclusion derived from the model). This appears to depend on total input strength, but it is hard to see why 0.5 + 0.5 (equal value, numbers provided by authors) would result in higher input than 0.8 + 0.2 (unequal value, again numbers from this paper, Figure 9). These choices, and others, appear arbitrary. In general, the description of the model in Results reads more like an interpretation/Discussion section than an outline of model-derived Results.

Third, the presented empirical evidence for narrow-spiking cells (or, more specifically, the N3-subtype) engaging preferentially in gamma-band synchronization, whereas broad-spiking cells engage preferentially in beta-band synchronization, is modest. Interneuron engagement in gamma rhythms is expected from the literature, of course, but in the present dataset this is less clear-cut. In particular, the spectral peaks in Figure 6C are quite similar between broad- and narrow-spiking, and labelling the former "beta" but the latter "gamma" requires a more thorough analysis than is now presented.

Fourth, there are some issues with reporting, where occasionally results are only reported for the narrow-spiking cells and not for the broad-spiking cells, or it is unclear whether a stated result holds for all or just a subset of cells, etc.

Finally, all results are shown aggregated over two animals, while it is important to know how the key results hold in the two animals separately.

I mention some additional recommendations here.

At the very least, correlation analyses for both p(choice) and RPE should be shown for all time windows, to allow a proper assessment. If the authors indeed wish to maintain the hard claim of a dissociation ACC<>RPE and LPFC<>p(choice) this should explicitly be tested by e.g. directly comparing the correlations with the two behavioural variables.

The model should be specified in much more detail. Specifically, the assumptions built into it should be clearly defined, and the quantitative predictions derived from it should be presented.

I understand that the data are not yet publicly released, as others from the same lab are still working on the same data (which is common in the field). However, I would urge the authors to make the source code for all reported analyses publicly available already, to greatly improve transparency and replicability. ("Upon reasonable request" is not sufficient for this goal.)

In general, the narrative could be streamlined; as it currently stands, the manuscript is hard to read.

Reviewer #2:

This paper studies the role of lateral prefrontal cortex (LPFC) and anterior cingulate cortex (ACC) in reversal learning. The authors suggest that LPFC plays a role in computing the probability that the animal will make a certain choice (termed choice probability), whereas ACC signals the reward prediction error. Interestingly, narrow spiking cells (putatively inhibitory neurons also known as fast spiking units) had a higher correlation with these task-relevant parameters, compared to broad spiking cells (putatively excitatory neurons also known as regular spiking units).

Next, the authors define electrophysiological cell types (termed e-types), based on spike waveform and firing patterns. The narrow spiking cells are subdivided into 3 subclasses, termed N1, N2 and N3. Notably, the same subclass of narrow spiking cells, N3, had a correlation with choice probability in LPFC and a correlation with reward prediction error in ACC. Neither of the other narrow spiking subtypes had a significant correlation with either parameter in either area.

In the final part of the paper, the authors examine the phase-locking behavior of these N3 cells to the local field potential (LFP). They find that in LPFC, N3 cells phase-lock to gamma (35–45 Hz) during the initial learning stage shortly after rule reversal, but as learning progresses and performance reaches a new plateau, their phase locking switches to the beta band (15–30 Hz). Perhaps most remarkably, the N3 cells in ACC showed similar phase-locking behavior that depended on the stage of reversal learning: they phase-locked to gamma only when the reward prediction error was high (i.e., shortly after rule reversal).

These results are generally well supported by rigorous statistics and sophisticated analyses. However, there are several weaknesses. First, while the claim that LPFC encodes choice probability is well supported, the claim that ACC encodes reward prediction error is not as well substantiated. As seen in Figure 3, the percentage of neurons showing a significant correlation between their firing rate and reward prediction error is not very different between LPFC and ACC, and it is quite similar between broad spiking and narrow spiking units within ACC.

Second, the authors build a reinforcement learning model to calculate "Choice Probability", which quantifies the probability that the animal will select the rewarded stimulus. According to this definition, choice probability should dip upon reversal, and rise to a new plateau after several trials. However, this metric is fairly unintuitive, not to mention in conflict with existing nomenclature (e.g., Nienborg, Cohen and Cumming 2012). It would be helpful to have an accompanying plot of how the firing rate and phase locking behavior of each neuronal type changes as a function of trials after reversal.

Third, the extent to which choice probability encoding neurons and reward prediction error encoding neurons in each area fall into a specific e-type is not shown.

Undoubtedly, it is noteworthy and remarkable that N3 is the only e-type that shows a positive correlation with choice probability in lateral prefrontal cortex and a positive correlation with reward prediction error in ACC (Figure 5). But do all choice probability encoding neurons in LPFC and reward prediction error encoding neurons in ACC fall into the N3 e-type?

Further, the task-dependent phase-locking behavior of e-types other than N3 is not shown. Given that N3 is the only NS e-type that shows a relationship with task-relevant parameters, I would expect the learning-dependent phase-locking behavior to also be unique to N3, but this result is not presented in this paper.

Finally, the conceptual model in Figure 9 captures the results presented in this paper and gives rise to testable predictions. It seems that some predictions of this model should be testable with the presented data. For example, the prediction that in LPFC, broad spiking cells fall into two functional categories, whereas N3 cells are more functionally homogeneous, would be an interesting prediction to test. Further, the prediction that in ACC, broad spiking cells encode reward whereas N3 cells encode reward prediction error is easily testable and would strengthen the conclusions of this paper.

The main finding of this paper, that a specific electrophysiological subclass of narrow spiking cells serves an important role in reversal learning by preferentially phase-locking to gamma band LFP, would be of broader significance and impact if this finding could be generalized to other brain regions, behavioral tasks and model species. That said, there are already several papers in the literature that define e-types. Specifically, Markram et al. (2015) define 11 e-types; Gouwens et al. (2019) define 6 e-types that constitute narrow spiking cells (referred to as fast spiking cells in Gouwens et al.). For the sake of future efforts to study e-types and their functional roles, it would be important to reconcile these disparate definitions of e-types.

Moreover, there are at least two other papers showing that subclasses of narrow spiking neurons have different relationship with gamma (Shin and Moore 2019; Onorato et al., 2020). It would be very interesting and important to know whether the 3 narrow spiking e-types discussed in this paper match up with the subclasses in the two aforementioned papers.

In sum, this paper is a valuable addition to the reinforcement learning literature as well as neuronal cell types and neural oscillations literature. Some additional analyses could strengthen the conclusions of this paper. It is unclear how the e-types defined in this paper will tie into other neuronal categorizations in recent literature. This link to prior work will be important for broader significance.

Comments for the authors:

I. Comments on Figures

1. Figure 2 and Figure S6 show the PSTH aligned to Feature 1 and Feature 2 based on the cue order (Motion first vs Color first). It would be highly relevant to also show the PSTH aligned to Feature 1, Feature 2 and Reward based on behavioral outcome (correct vs incorrect, and there are at least 3 different types of error outcomes; please see my comment III-2 in Comments on Methods below for elaboration).

In particular, PSTH aligned to reward conditioned on behavioral outcome is crucial for interpreting Figure 3.

2. Figures 2 and 3: The correlation between firing rate and Choice Probability / RPE is interesting, but not very intuitive. It would be helpful to have a plot of Choice Probability and Reward Prediction Error as a function of trials since reversal, as well as the firing rate for each cell type and brain area as a function of trials since reversal. This way we can see whether LPFC NS firing rate after color cue onset tracks Choice Probability, and whether ACC NS firing rate after reward tracks RPE.

3. The firing rate unit for Figure 4B is missing, both in the figure and in the main text.

The firing rate in the Figure 4C rastergrams seems very different from the average firing rate in Figure 4B. For example, the N1 rastergram in Figure 4C seems to show ~5 spikes per 100 ms, which would be ~50 Hz, but the average firing rate for N1 is 4 Hz. Can the authors explain this discrepancy?

Also, please discuss why the narrow spiking firing rate is so low (assuming the firing rate unit was Hz, the mean firing rate is < 2 Hz for N2 and N3). Narrow spiking firing rates have typically been reported to be ~10 Hz in vivo.

4. Figure 5: It is remarkable that N3 is the only e-type that shows a positive correlation with choice probability in LPFC and a positive correlation with reward prediction error in ACC. To what extent do choice probability encoding neurons and reward prediction error encoding neurons in each area fall into a specific e-type? I would like to know whether a neuron's e-type is predictable from task-dependent functional properties of the neuron.

5. Figure 6C: suggest plotting N3 in the same plot as Broad Spiking and Narrow Spiking units such that the magnitude can be compared more easily.

In addition, please clarify what the y-axis of Figure 6C means (Peak densities of spike-LFP synchronization (PPC)). Is this simply the average PPC spectrum, or is it normalized for each unit in some way? I would recommend plotting the former, so that it is possible to compare which e-types have the best locking properties in which frequency band.
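To make the averaging question concrete, here is a minimal sketch of the pairwise phase consistency (PPC) estimator of Vinck et al. (2010), together with the two ways of summarizing spectra across units that the comment contrasts. It assumes spike phases (in radians) have already been extracted from the LFP at each frequency; the function names are hypothetical, and this is not the manuscript's pipeline (which may use a trial-corrected PPC variant).

```python
import cmath
import math

def ppc(phases):
    """Pairwise phase consistency: mean cos(theta_i - theta_j) over all spike pairs i < j.

    Uses |sum_k exp(i*theta_k)|**2 = N + 2 * sum_{i<j} cos(theta_i - theta_j).
    """
    n = len(phases)
    if n < 2:
        raise ValueError("PPC needs at least two spikes")
    resultant = sum(cmath.exp(1j * p) for p in phases)
    return (abs(resultant) ** 2 - n) / (n * (n - 1))

def mean_spectrum(unit_spectra):
    """Average raw PPC across units at each frequency (absolute magnitudes preserved)."""
    return [sum(col) / len(col) for col in zip(*unit_spectra)]

def mean_normalized_spectrum(unit_spectra):
    """Average after scaling each unit's spectrum to its own maximum.

    Assumes every unit has a positive maximum; magnitude differences between units are lost.
    """
    scaled = [[v / max(s) for v in s] for s in unit_spectra]
    return mean_spectrum(scaled)

# Hypothetical phases for one unit at one frequency, clustered near 0 rad.
example = [0.1 * math.sin(k) for k in range(50)]
print(round(ppc(example), 3))
```

Only the raw average lets two e-types be compared on the same absolute scale; the normalized average shows where each unit peaks but not how strongly it locks.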

6. Figure 7 and 8: It's very interesting that initially after reversal, N3 locks to gamma but later, as performance reaches a new plateau, N3 locks to beta. If you plot trial since reversal on the x-axis, and plot the peak of PPC spectra (averaged across N3 cells) on the y-axis, do you see a gradual change in peak frequency or is it more of a step function change after each reversal? Relatedly, if you plot the histogram of PPC spectra peak frequency across N3 cells, is it a bimodal distribution (one peak in beta and another peak in gamma) or is it unimodal?

7. It would be interesting to know the behavior-dependent phase locking of the other e-types as well. I suggest adding panels equivalent to Figures 7C, 7F, 8C and 8F for all e-types as a supplemental figure.

8. Were LPFC and ACC recorded simultaneously? If so, it would be very interesting to see if inter-area coherence mimics the changes in PPC. For example, does the gamma band coherence go up in the first few trials after reversal, followed by an increase in beta band coherence as behavioral performance plateaus?

9. Figure 9 outlines a really nice hypothesis that gives rise to testable predictions. Some of these predictions are testable within the data presented in this paper. I think it would significantly strengthen this paper if some of these predictions could be tested:

Figure 9 hypothesizes that in LPFC, Broad Spiking neurons should encode Value predictions; e.g., red-selective neurons that, after learning, fire more when red is being rewarded compared to when green is being rewarded. These Value-predictive neurons should fire similarly during learning, and are perhaps even predictive of the animal's choice on a trial-by-trial basis (e.g., on trials in which red-selective neurons fired more during learning, the animal saccades according to the red stimulus). In contrast, N3 neurons should show no such Value-predictive behavior. Is there evidence of such prediction in the data?

Relatedly, Figure 9 hypothesizes that in ACC, Broad Spiking neurons encode reward, whereas N3 encode RPE. According to this prediction, N3 activity should be higher for "surprise correct" trials shortly after reversal, and go down as performance plateaus, whereas Broad Spiking neurons should be excited by reward the same amount regardless of whether it is shortly after reversal or after behavioral performance has reached plateau. Is this seen in data? I think this would be made clear if the PSTH aligned to reward were plotted, as suggested in Comment 1.

II. Comments on Main Text

1. "We next asked whether the narrow spiking, putative interneurons that encode p(choice) in LPFC and RPE in ACC are from the same electrophysiological cell type, or e-type (Markram et al., 2015)."

There are ~11 e-types described in Markram et al., 2015. Further, Gouwens…Koch 2019 NN describes ~6 sub-e-types of Fast Spiking cells. I recommend that the authors speculate on how previously reported e-types match up with the e-types described in this paper.

2. "Prior studies have suggested that interneurons have unique relationships to oscillatory activity (Cardin et al., 2009; Vinck et al., 2013; Voloh and Womelsdorf, 2018; Womelsdorf et al., 2014a),"

I suggest adding Chen…Zhang 2017 Neuron to this list of references.

3. Discussion section: There are at least two other papers showing that subclasses of narrow spiking neurons have different relationship with gamma (Shin and Moore 2019 Neuron; Onorato…Vinck 2020 Neuron). It would be an interesting addition to the Discussion section to speculate on whether the 3 narrow spiking e-types discussed in this paper match up with the subclasses in the two aforementioned papers.

III. Comments on Methods

1. In general, the Method section is not consistent about referring to relevant figures for the analyses being described. It would really help the reader if the analyses that went into each figure were clarified: e.g., "Statistical Analysis of time resolved spike-LFP coherence for putative interneurons and broad spiking neurons (Figure 7, 8)"

2. Task design: "Color-reward associations were reversed without cue after 30 trials or until a learning criterion was reached, which makes this task a color-based reversal learning task. "

It seems that a strategy that a monkey might employ would be to count the number of trials after reversal to anticipate when the next reversal would happen, which would rely on a different mental strategy than reversal learning tasks where the reversal points are not predictable. Is there any behavioral evidence that would discount the possibility that the monkeys are counting?

"Hence, a correct response to a given stimulus must match the motion direction of that stimulus as well as the timing of the dimming of that stimulus."

In this task, there appears to be one way to be correct, but several distinct ways of being incorrect. First, the monkey could be incorrect in both the timing and the saccade direction. Second, the monkey could be correct with the timing but incorrect with the direction. Third, the monkey could be correct with the direction but incorrect with the timing. The third outcome could be further subdivided into premature response versus late response. The reason why a monkey might make each mistake is different. Only the first scenario supports the possibility that the monkey thought the other color was being rewarded, e.g., shortly after reversal. It would be interesting to know the proportion of each error type as a function of trials since reversal. Furthermore, I would expect the negative reward prediction error to be most prominent in the first type of error. Hence, it would make sense to me if only the first type of error was considered when calculating choice probability and reward prediction error.

3. "Here, we use this model to estimate the trial-by-trial fluctuations of the expected value (EV) for the rewarded color and the choice probability (CP) of the animal's stimulus selection. EV and CP increase with learning similar to the increase in the probability of the animal to make rewarded choices, causing all three variables to correlate (Figure 4E, F)."

Figure 4 does not have E-F panels.

4. Behavioral analysis: I could not find a formal definition of Choice Probability and Reward Prediction Error anywhere. I assume Equation 4 defines Choice Probability, while R_t − V_t defines RPE? I suggest making these definitions clear in the Methods, as well as the main text and the figure legend.

Choice Probability is abbreviated in at least three different ways throughout the manuscript (e.g., p(choice), CP, CHP). Please be consistent.

Note on terminology: Choice Probability commonly refers to the relationship between the activity of individual sensory neurons and the animal's behavioral choice (see Nienborg, Cohen and Cumming 2012 ARN). The duplicate terminology may be confusing for some readers. I suggest using a different term (e.g., Probability of Choice).
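For concreteness, a minimal sketch of the two definitions the reviewer infers: p(choice) as a softmax over learned option values, and RPE = R_t − V_t for the chosen option. It assumes a plain two-option Rescorla-Wagner rule; the manuscript's model is attention-augmented and its Equation 4 may differ, and `alpha`, `beta` and the option labels are hypothetical.

```python
import math

def p_choice(values, target, beta):
    """Softmax probability of selecting `target` given the current option values V."""
    weights = {k: math.exp(beta * v) for k, v in values.items()}
    return weights[target] / sum(weights.values())

def rw_update(values, chosen, reward, alpha):
    """Rescorla-Wagner update of the chosen option; returns RPE = R_t - V_t."""
    rpe = reward - values[chosen]
    values[chosen] += alpha * rpe
    return rpe

# Hypothetical trials just after a reversal: "red" is now rewarded and is chosen.
values = {"red": 0.5, "green": 0.5}
for trial in range(5):
    p = p_choice(values, "red", beta=5.0)
    rpe = rw_update(values, "red", reward=1.0, alpha=0.3)
    print(f"trial {trial}: p(choice=red)={p:.3f}  RPE={rpe:.3f}")
```

Under this sketch, p(choice) rises and RPE shrinks across consecutive rewarded trials, which is why the two variables are strongly correlated with each other and with trials since reversal.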

5. "We then quantified the log-likelihood of the independent test dataset given the training datasets optimal parameter values."

Where is this result plotted? What is the model performance in predicting test dataset?

6. Waveform analysis: It would help to add a diagram of T2P, T4R and HR in Figure 4.

Relatedly, the trough comes before the peak in extracellular spike waveforms (as is apparent in Figure 4C), so shouldn't T2P be defined as (t_peak − t_trough) to obtain a positive value?
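A minimal sketch of the sign convention the reviewer proposes, assuming the peak is taken as the maximum that follows the trough and a hypothetical sampling rate `fs_hz`; the manuscript's exact T2P, T4R and HR definitions are not reproduced here.

```python
def trough_to_peak_ms(waveform, fs_hz):
    """Trough-to-peak duration in ms, defined as t_peak - t_trough.

    The trough is the global minimum; the peak is the maximum at or after the trough,
    so the result is non-negative for a typical extracellular spike.
    """
    trough = min(range(len(waveform)), key=waveform.__getitem__)
    tail = waveform[trough:]
    peak = trough + max(range(len(tail)), key=tail.__getitem__)
    return (peak - trough) * 1000.0 / fs_hz

# Hypothetical spike sampled at 1 kHz (real recordings use much higher rates).
spike = [0.0, -1.0, -3.0, -1.0, 0.5, 1.0, 0.2]
print(trough_to_peak_ms(spike, fs_hz=1000.0))  # prints 3.0
```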

7. "LV is a measure of regularity/burstiness of spike train and is proportional to the square of the difference divided by sum of two consecutive interspike intervals (Shinomoto et al., 2009)."

This sentence should go in the main text, because the way LV is described in the main text makes it sound as though LV and CV measure the same thing: "regular or variable interspike intervals (local variability 'LV'), or more or less variable firing relative to their mean interspike interval (coefficient of variation 'CV')."
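A minimal sketch of why LV and CV are not interchangeable, assuming ISIs in ms and the LV definition of Shinomoto et al. cited in the manuscript. Two hypothetical spike trains are built from exactly the same intervals in different orders: their CV is identical, but LV differs sharply because it only compares consecutive intervals (LV is about 1 for a Poisson train and 0 for a perfectly regular one).

```python
import statistics

def local_variation(isis):
    """Local variation LV = 3/(n-1) * sum_i ((I_i - I_{i+1}) / (I_i + I_{i+1}))**2."""
    n = len(isis)
    if n < 2:
        raise ValueError("LV needs at least two interspike intervals")
    total = sum(((a - b) / (a + b)) ** 2 for a, b in zip(isis, isis[1:]))
    return 3.0 * total / (n - 1)

def coefficient_of_variation(isis):
    """CV = standard deviation of the intervals divided by their mean (global spread)."""
    return statistics.pstdev(isis) / statistics.mean(isis)

# Two hypothetical trains built from the same ISIs (ms), in different orders.
alternating = [5.0, 15.0] * 50        # short-long-short-long: locally irregular
blocked = [5.0] * 50 + [15.0] * 50    # slow rate change: locally regular
for name, isis in (("alternating", alternating), ("blocked", blocked)):
    print(name, round(coefficient_of_variation(isis), 3), round(local_variation(isis), 3))
```

Both trains have CV = 0.5, but LV is 0.75 for the alternating train and about 0.008 for the blocked one.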

8. Given how central the clustering analysis in Figure 4A is to the rest of the paper, the exact parameters that went into this analysis (HR, T4R, LV, CV, FR) should be made clear in the main text.

In addition, this clustering analysis is key to the reproducibility of e-types in other datasets. The authors have stated that "All data and code is available upon reasonable request." However, in my opinion, at least the code for the e-type clustering analysis should be made publicly available.

9. "Correlation of local variation with burst index"

Burst index is defined here but not plotted in any figure. I suggest adding a plot depicting the relationship between local variation and burst index.

10. "First, we divided trials into two groups of high and low RPE and CHP values (trials were assigned based on their median value for each neuron)."

I understood RPE and Choice Probability to be values unique to each trial, rather than to each neuron? If so, the median value should be specific to each behavior session, not to each neuron? Please clarify.
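A minimal sketch of the per-session split the reviewer implies, assuming each trial record carries a session label and a model-derived value (the field names `session` and `rpe` are hypothetical). When every neuron in a session contributes all of that session's trials, a per-neuron median and a per-session median give the same split; they diverge only when neurons are held for different subsets of trials.

```python
import statistics
from collections import defaultdict

def median_split_by_session(trials, key):
    """Label each trial 'high' or 'low' relative to the median of `key` within its session.

    Trials exactly at the median are labeled 'low' (an arbitrary tie rule).
    """
    values_by_session = defaultdict(list)
    for trial in trials:
        values_by_session[trial["session"]].append(trial[key])
    medians = {s: statistics.median(v) for s, v in values_by_session.items()}
    return ["high" if t[key] > medians[t["session"]] else "low" for t in trials]

# Hypothetical trials from two sessions with different RPE ranges.
trials = [
    {"session": "A", "rpe": 0.10},
    {"session": "A", "rpe": 0.90},
    {"session": "B", "rpe": 0.20},
    {"session": "B", "rpe": 0.40},
]
print(median_split_by_session(trials, "rpe"))  # ['low', 'high', 'low', 'high']
```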

11. "We included only neurons with at least 50 spikes per time window."

Does this sentence mean 50 spikes per time window per trial? For a 700 ms time window, this would mean that the neuron would have to fire at ~70 Hz to be included in this analysis! If this sentence means 50 spikes per time window across trials, please clarify. In that case, please also clarify the range of trial numbers that went into this analysis.
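The reviewer's arithmetic for the two possible readings of the criterion, with N_trials standing for the (unstated) number of trials pooled per time window:

```latex
\text{per trial:}\quad \frac{50\ \text{spikes}}{0.7\ \text{s}} \approx 71\ \text{Hz}
\qquad
\text{pooled:}\quad \frac{50\ \text{spikes}}{0.7\ \text{s} \times N_{\text{trials}}} \approx \frac{71}{N_{\text{trials}}}\ \text{Hz}
```

For example, with a hypothetical N_trials = 100, the pooled criterion would require only about 0.7 Hz.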

Reviewer #3:

In this work, Boroujeni et al. investigated the role of different cellular subtypes in the lateral prefrontal cortex (LPFC) and anterior cingulate cortex (ACC) of the rhesus macaque as the animals performed an attention-demanding reversal-learning task. The authors use an attention-augmented reinforcement learning model to track the trial-by-trial values of key decision-making variables, which were then correlated with the neural activity. The cellular population was separated into broad and narrow spiking neurons using features computed from the extracellularly recorded waveforms. The authors find that the activity of the narrow spiking cells in the LPFC is correlated with the choice probability, whereas the activity of narrow spiking cells in the ACC is correlated with reward prediction errors. Interestingly, the authors find that further splitting the population of broad and narrow spiking cells into subtypes revealed that both the choice probability in LPFC and the reward prediction error in the ACC were encoded by a specific subtype of putative interneuron. The authors show that the spike-field phase synchronization of this putative interneuron subgroup is also modulated by choice probability in the LPFC and reward prediction error in the ACC, mirroring the result from their single-unit correlation analysis. The authors use these results to propose a biologically plausible circuit model of how learning in such a task might be implemented through interneuron-specific synchronization.

While many of the results in the paper seem robust, some of the conclusions drawn by the authors rest on analyses and methods that require further validation and controls.

1. The clustering of the cell population into 5 broad-spiking and 3 narrow-spiking subtypes is perhaps one of the most critical results that requires further validation, since many of the conclusions in the paper rely on the outcome of this analysis. The validation that the authors include in the paper (Figures S5C and S5D) addresses concerns regarding clustering quality, but it is still unclear how meaningful the separation into these 8 clusters actually is. The clustering is also performed on the pooled data across both animals, but the authors should also have shown what the clustering looks like when performed independently on the population from each animal, and whether there is a meaningful correspondence between the sets of clusters recovered in the two populations.

2. Most of the follow-up analysis focuses on the comparison between one specific interneuron subtype (N3) and all broad-spiking cells. I imagine that the reason for this is two-fold: (1) the N3 subtype is the only one that showed a significant modulatory effect on the multi-unit activity (Figure 4D), and (2) it seems to be special in the sense that the activity of the N3 cells is significantly correlated with choice probability in LPFC in addition to reward prediction error in ACC. While the reasons for showing key results only for the N3 type can be appreciated, the authors should have included additional control analyses to demonstrate that their results are indeed specific to the N3 subtype. For example, in Figures 7 and 8, the authors show a comparison of the spike-LFP phase synchronization between N3 and broad spiking cells, but no further characterization of subtypes within the broad spiking cells or of the other narrow spiking types (i.e., N1 and N2).

3. The authors show that the spike-field phase synchronization of the N3 subgroup is also modulated by choice probability in the LPFC (Figure 7) and reward prediction error in the ACC (Figure 8), mirroring the result from their single-unit correlation analysis (Figures 2 and 3). Unlike in their firing rate analysis, however, they do not show anatomical specialization in these analyses, even though the model they propose in Figure 9 clearly shows that they hypothesize this to be the case. It would be very interesting to show the analysis performed in Figure 7 for the ACC N3 population, and likewise, the analysis performed in Figure 8 for the LPFC N3 population.

4. Behavior

a. In Figure 1C, I imagine that the proportion of rewarded choices at reversal (t=0, not shown) is equal to one minus the asymptotic performance? So around 0.1?

b. If the stimulus-reward pairings are fully deterministic, why does the monkey require so many trials (on average about 7, I believe) to reach asymptotic performance again?

c. Related to the previous question, is there any change in this acquisition time over the course of a session (as they experience more and more reversals)?

d. Can you show some example fits of the reinforcement learning model? For example, the choice probability and expected value as a function of the trial number around a reversal.

5. Single Units

a. The authors correlate the neural activity with model-derived variables, like the probability of choice and the prediction error. The distributions of these variables, however (as indicated in Figures S4B and S4C), are very skewed, and it seems that most of the variability comes from the few trials (around 10) that it takes to reach asymptotic performance after a reversal. It would be interesting to know what this correlation represents. Are the cells truly tracking small changes in p(choice) and PE, or does this reflect more of a discrete switch? Maybe the authors could show some scatter plots of firing rate vs. p(choice) for some example cells. How well can p(choice) and PE be decoded from the neural population?

6. Electrophysiology/Clustering

It seems that a lot of the results in the paper rely on clustering analysis. The authors have been cautious in their approach (i.e., validating the results), but given that a lot depends on the reliability of these results, I think it would be wise to add a few more control analyses. I am not sure how feasible these are, but worth considering nonetheless:

a. Another way of validating the clustering is to do it across animals. From what I understood, the clustering (for e-types) is done using data from both animals. How well would a clustering model fit within one animal predict the clustering in the other animal?

7. Spike field coherence

a. Can the authors comment on the effect of ERPs?

b. Simply controlling for the number of spikes between conditions is not necessarily sufficient. If you have a cell that responds to one condition but does not respond to another condition, the spikes for condition 1 are going to be much more clustered in time than for condition 2. Therefore the underlying LFP is not sampled in the same way between the two conditions.

c. Is it possible to show that the spike-field coherence results are also anatomically specific? Does the synchrony of cells in the ACC and LPFC mirror the single-unit results, i.e. reward prediction error in ACC but not LPFC and choice probability in LPFC but not ACC?

[Editors’ note: further revisions were suggested prior to acceptance, as described below.]

Thank you for submitting your article "Interneuron Specific Gamma Synchronization Indexes Cue Uncertainty and Prediction Errors in Lateral Prefrontal and Anterior Cingulate Cortex" for consideration by eLife. Your article has been reviewed by 2 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Michael Frank as the Senior Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

We believe that the manuscript has improved substantially since the initial submission, and appreciate that you did quite a lot of work to address the concerns raised previously. Reviewers and editors agreed the new analysis makes a much stronger case, and that this work will make a valuable addition to the field. Reviewers raised a few remaining issues, please find these below. We ask you to address these and invite you to submit a revised version of your manuscript at your earliest convenience.

Reviewer #1:

This paper studies the role of lateral prefrontal cortex (LPFC) and anterior cingulate cortex (ACC) in reversal learning. The authors suggest that LPFC plays a role in computing the probability that the animal will make a certain choice (termed choice probability), whereas ACC signals the reward prediction error. Interestingly, narrow spiking cells (putatively inhibitory neurons also known as fast spiking units) had a higher correlation with these task-relevant parameters, compared to broad spiking cells (putatively excitatory neurons also known as regular spiking units).

Next, the authors define electrophysiological cell types (termed e-types), based on spike waveform and firing patterns. The narrow spiking cells are subdivided into 3 subclasses, termed N1, N2 and N3. Notably, the same subclass of narrow spiking cells, N3, had a correlation with choice probability in LPFC and a correlation with reward prediction error in ACC. Neither of the other narrow spiking subtypes had a significant correlation with either parameter in either area.

In the final part of the paper, the authors examine the phase-locking behavior of these N3 cells to the local field potential (LFP). They find that in LPFC, N3 cells phase-lock to gamma (35–45 Hz) during the initial learning stage shortly after rule reversal, but as learning progresses and performance reaches a new plateau, their phase locking switches to the beta band (15–30 Hz). Perhaps most remarkably, the N3 cells in ACC showed similar phase-locking behavior that depended on the stage of reversal learning: they phase-locked to gamma only when the reward prediction error was high (i.e., shortly after rule reversal).

The main finding of this paper, that a specific electrophysiological subclass of narrow spiking cells serves an important role in reversal learning by preferentially phase-locking to gamma band LFP, would be of broader significance and impact if this finding could be generalized to other brain regions, behavioral tasks and model species. This paper cites several precedents in the literature that define e-types. Specifically, Markram et al. (2015) define 11 e-types; Gouwens et al. (2019) define 6 e-types that constitute narrow spiking cells (referred to as fast spiking cells in Gouwens et al.). For the sake of future efforts to study e-types and their functional roles, it would be important to reconcile these disparate definitions of e-types.

Moreover, as mentioned in the Discussion section of this paper, there are several other papers showing that subclasses of narrow spiking neurons have different relationship with gamma (Shin and Moore 2019; Onorato et al., 2020). It would be very interesting and important to know whether the 3 narrow spiking e-types discussed in this paper match up with the subclasses in the aforementioned papers.

In sum, this paper is a valuable addition to the reinforcement learning literature as well as neuronal cell types and neural oscillations literature. However, it is unclear how the e-types defined in this paper will tie into other neuronal categorizations in recent literature. This link to prior work will be important for broader significance.

Comments for the authors:

This paper has made significant improvements from the previous version. Most importantly, the implementation details of the circuit simulation are clarified. The vast majority of my prior concerns have been addressed. I have only a few suggestions remaining.

1. Given that reward prediction error analysis is critical to the thesis of this paper, I am still of the opinion that it would be important to include the PSTH aligned to the reward, for narrow spiking and broad spiking neurons (as in Figure 2) as well as for important e-types (as in Figure S3).

2. The added classifier analysis of predicting cell classes from their correlations with learning variables is very interesting. However, it is not clear to me exactly what was used to train the SVM. As I currently understand this analysis, in LPFC the correlation between firing rate and p(choice) was calculated for each neuron, and this one-dimensional vector, of size (number of neurons) × 1, was used to train the SVM. Is this the case? Please clarify.

3. Figure S5 E and F: it is hard to see a trend in these plots. I suggest either making the dots transparent; or plotting the data as a 2D-histogram. This way it would be possible to discern where the data is the densest.

4. In Methods, the equation numbers are not unique (Equation 2 and Equation 3 each appear twice). Please correct.

5. The following sentences in the Supplementary Online Information need to be corrected as indicated:

"These circuit motifs are provided to provide a proof-of-concept that the observations can follows from biologically plausible motifs. These circuits motifs also provide predictions which can be tested in future studies."

Reviewer #2:

In this work, Boroujeni et al. investigated the role of different cellular subtypes in the lateral prefrontal cortex (LPFC) and anterior cingulate cortex (ACC) of the rhesus macaque as the animals performed an attention-demanding reversal-learning task. The authors use an attention-augmented reinforcement learning model to track the trial-by-trial values of key decision-making variables, which were then correlated with the neural activity. The cellular population was separated into broad and narrow spiking neurons using features computed from the extracellularly recorded waveforms. The authors find that the activity of the narrow spiking cells in the LPFC is correlated with the choice probability, whereas the activity of narrow spiking cells in the ACC is correlated with reward prediction errors. Interestingly, the authors find that further splitting the population of broad and narrow spiking cells into subtypes revealed that both the choice probability in LPFC and the reward prediction error in the ACC were encoded by a specific subtype of putative interneuron. The authors show that the spike-field phase synchronization of this putative interneuron subgroup is also modulated by choice probability in the LPFC and reward prediction error in the ACC, mirroring the result from their single-unit correlation analysis. The authors use these results to propose a biologically plausible circuit model of how learning in such a task might be implemented through interneuron-specific synchronization.

The analysis is thorough and the authors present a nice narrative of the results, even though in some cases my interpretation of the data is a little more mixed than what is written in the paper. For example, the authors are eager to point out that their results are "interneuron specific" and yet the data that they show suggests otherwise. Take the spike-LFP synchronization results shown in Figure S15, where it seems that the modulation of pairwise phase consistency with p(choice) could also be present for the B1 cluster of cells in addition to the N3 group (no stats shown). The same could be true for the B2 type in the ACC, which seems to show differential effects for high and low RPE.

Are these real effects, or are they anomalies driven by a few outliers? In either case, please clarify.

Thank you for including the new supplementary figures; I appreciate the considerable work that must have gone into preparing the new controls for the second submission of the paper. The addition of the example model fits (Figure S5) and the correlation of the firing rate of the two example cells with the RPE and p(choice) (Figure S10) are very nice. I would recommend that the authors move the two examples in Figure S10 to one of the main figures. In the first submission, the paper focused predominantly on the N3 subtype and its specialized functional properties in ACC and LPFC. The new figures, however (specifically Figure S15), show that the story is a little more mixed than originally presented. B1 in LPFC, for example, shows differential effects for high and low p(choice), and B2 in ACC shows differential effects for high and low RPE. In any case, the new figures provide a much more complete story, and I feel they have made the paper stronger.

https://doi.org/10.7554/eLife.69111.sa1

Author response

[Editors’ note: the authors resubmitted a revised version of the paper for consideration. What follows is the authors’ response to the first round of review.]

Reviewer #1:

Boroujeni et al. recorded extracellular spikes from single neurons in brain areas LPFC and ACC in two awake behaving macaques that were performing a reward reversal learning task. They classified the recorded neurons into various subtypes, and investigated how neuronal activity in these different subtypes related to the variables of the behavioural task.

The paper's clear and primary strength is the classification of extracellularly recorded neurons into broad- and narrow-spiking neurons, and even further into subtypes of these two classes. While a split based purely on spike waveform shape into broad- and narrow-spiking is relatively common, the cluster-based classification into subtypes based on various additional parameters like spike variability is novel and potentially illuminating. The authors furthermore convincingly demonstrate that the recorded narrow-spiking neurons (often labelled "putative inhibitory interneurons") are indeed likely inhibitory in nature, by showing that the net effect of a spike in these cells on the surrounding population spike rate is negative. The analysis choices in this part of the paper were clear, well-motivated, and well-presented.
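The net-effect measure described here can be sketched as a spike-triggered average of multiunit activity (MUA) around a neuron's spikes, with the post-spike mean compared against a pre-spike baseline. This is a minimal illustration under assumed window lengths and a toy MUA trace, not the authors' spike-triggered MUA pipeline. A negative difference corresponds to a suppressive, putatively inhibitory, net effect.

```python
import numpy as np

def spike_triggered_average(signal, spike_idx, half_window):
    """Average `signal` over +/- half_window samples around each spike index.

    Spikes too close to either edge to fit the full window are skipped.
    """
    segments = [
        signal[s - half_window : s + half_window + 1]
        for s in spike_idx
        if s - half_window >= 0 and s + half_window < len(signal)
    ]
    return np.mean(segments, axis=0)

def post_minus_pre(sta, half_window, post_len):
    """Mean of the post-spike samples minus the mean of the pre-spike baseline."""
    pre = sta[:half_window]
    post = sta[half_window + 1 : half_window + 1 + post_len]
    return post.mean() - pre.mean()

# Toy MUA trace (hypothetical): a constant rate that dips for 10 samples after each spike.
mua = np.ones(1000)
spikes = np.array([100, 300, 500])
for s in spikes:
    mua[s + 1 : s + 11] = 0.5
sta = spike_triggered_average(mua, spikes, half_window=20)
effect = post_minus_pre(sta, half_window=20, post_len=10)  # negative: net suppression
```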

However, the bulk of the paper is taken up by the relationship between neuronal spiking and variables from the behavioural task, specifically choice probability (p(choice)) and reward prediction error (RPE). Here, the conclusions do not appear to be backed up by the data, for several reasons.

First of all, the authors only present results for correlations with RPE in the reward window, and results for correlations with p(choice) in the stimulus windows. One of the main conclusions of the paper is that LPFC neurons code for p(choice) whereas ACC neurons code for RPE. However, correlations with RPE in the stimulus windows and p(choice) in the reward window are never shown. Furthermore, the authors demonstrate that, purely given the task structure, RPE and p(choice) are almost perfectly negatively correlated (r = -0.928, Figure S4). It is therefore very possible that the crucial split is not between p(choice) and RPE as the determinant of neural activity, but simply the time window in which these are analyzed.

We believe there is a misunderstanding regarding some aspects of our data that we aim to address in three ways.

Firstly, conceptually, the fact that RPE and p(choice) are anti-correlated (at r = -0.96, revised Figure S5) does not change the interpretation of our findings during the cue and reward periods. We report highly specific effects at the moment in the trial when p(choice) and RPE are computed. By definition, a reward prediction error arises after the reward is processed and compared with an expected value, not when a cue is processed. When a cue is processed, the expected value of the cued information is reactivated and translated into a choice probability. We therefore show for LPFC and ACC, and for broad and narrow spiking neuron types, that after a cue p(choice) correlates with the firing of narrow spiking neurons in LPFC, but not in ACC and not with the firing of broad spiking neurons. Similarly, the RPE is computed when the outcome is received, and during that time the RPE correlates with the firing of narrow spiking neurons in ACC, but not with broad spiking neurons and not in LPFC.

To address the reviewer’s concern that the cue-triggered correlation with p(choice) might in fact be a correlation with RPE (and vice versa), we adjusted the interpretation in the text. The revised text more clearly emphasizes that periods of low choice probability and high prediction error demarcate a time of enhanced uncertainty about the relevant stimulus color. From this perspective, the firing and gamma synchrony correlations of narrow spiking neurons in LPFC and ACC reflect not p(choice) and RPE per se, but a period of uncertainty about cues and outcomes. In the revised abstract we write:

“One of these interneuron subclasses showed prominent firing rate modulations and (35-45 Hz) gamma synchronous spiking during periods of uncertainty in both, lateral prefrontal cortex (LPFC) and in anterior cingulate cortex (ACC). […] Computational modeling of this interneuron-specific gamma band activity in simple circuit motifs suggests it could reflect a soft winner-take-all gating of information having high degrees of uncertainty.” (abstract)

We hope that these changes conceptually address the reviewer’s concern.

Secondly, we want to point out a methodological aspect. Partial correlations of firing rate with p(choice) and RPE become ambiguous when the two variables are as highly correlated as they are here. We therefore refrain from performing partial correlations. What matters is the clear expectation of which latent decision variable (p(choice), after cue onset) and which learning variable (RPE, after reward onset) should emerge at which time point in the trial (cue period versus reward period).
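The ambiguity can be quantified with the variance inflation factor. For two predictors correlated at r, the sampling variance of each regression coefficient is inflated by 1/(1 - r^2) relative to uncorrelated predictors. With r = -0.96, as reported for the revised Figure S5, only 1 - 0.96^2, or about 7.8%, of the variance in one predictor is not shared with the other. The sketch below uses a generic residual-based partial correlation. The helper names are hypothetical, and the synthetic data serve only as a sanity check.

```python
import numpy as np

def variance_inflation(r):
    """Variance inflation factor for either of two predictors correlated at r."""
    return 1.0 / (1.0 - r**2)

def residualize(a, b):
    """Residuals of a after ordinary least-squares regression on b (with intercept)."""
    design = np.column_stack([np.ones_like(b), b])
    beta, *_ = np.linalg.lstsq(design, a, rcond=None)
    return a - design @ beta

def partial_corr(y, x1, x2):
    """Correlation of y and x1 after removing the linear contribution of x2 from both."""
    return np.corrcoef(residualize(y, x2), residualize(x1, x2))[0, 1]

vif = variance_inflation(-0.96)    # about 12.8: coefficient variances inflated ~13-fold
unique_share = 1.0 - (-0.96) ** 2  # about 0.078: variance not shared between predictors

# Sanity check on synthetic data: if y equals x1, partialling out x2 leaves a correlation of 1.
rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
pc = partial_corr(x1, x1, x2)
```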

Thirdly, experimentally dissociating choice probability and reward prediction errors might be interesting. In our experiment, as in most reinforcement learning tasks, p(choice) and RPE cannot be dissociated cleanly. This has, however, not been the focus of our manuscript.

Second, the authors present a "circuit architecture" that might account for the observed results. In the Results, this model is presented as though it were a computationally implemented biophysical neural circuit model that makes predictions that are in line with the observed data. I cannot find details of the implementation of such a model in the Methods, which makes the status of the predictions here unclear. It is not explained why two equally-valued objects would lead to gamma synchronization, whereas two objects of unequal value lead to beta synchronization (the key conclusion derived from the model). This appears to depend on total input strength, but it is hard to see why 0.5 + 0.5 (equal value, numbers provided by authors) would result in higher input than 0.8 + 0.2 (unequal value, again numbers from this paper, Figure 9). These choices, and others, appear arbitrary. In general, the description of the model in Results reads more like an interpretation/Discussion section than an outline of model-derived Results.

We apologize for the confusion about the computational circuit explanation. We addressed this in the revised manuscript: a new Supplementary Online Information provides a detailed description of the circuits and their implementation in firing rate models, new Suppl. Figures S17 and S18 show the simulation results, and Figure 9 has been simplified.

The Suppl. Online Information now provides the implementation details and a discussion of other architectures that could underlie the empirically observed cell-type-specific gamma synchronization effects. It has four sections:

1. Overview of circuit modeling

2. E-E-I circuit motif realizing the switch from gamma to beta frequency synchronization

3. E-I-I circuit motif realizing the switch from gamma to theta frequency synchronization

4. Discussion of circuit motifs, relation to other models and experiment

We also clarify in the Results section of the main text that the circuit motifs are conceptualized in order to understand the possible circuit function of the observed gamma synchronization of the N3 interneuron class. We kindly refer to the Results section entitled “Circuits model of interneuron-specific switches of gamma to beta or theta synchronization”. In that section we state that

“… oscillatory activity signatures might inform us about the possible circuit motifs underlying uncertainty-related computations. These computations are formally described in the reinforcement learning framework allowing us to propose a linkage of specific computations to oscillatory activity signatures and their putative circuits as proposed in the Dynamic Circuits Motif framework (Womelsdorf et al., 2014).”

We then go on to explicitly refer to the Supplementary Online Information that contains detailed descriptions of the model details and the simulation results:

“To show the feasibility of this approach we devised two circuit models that reproduce the gamma band activity signatures in LPFC and ACC using populations of inhibitory cells modeled to correspond to N3 e-type cells (for modeling details, see Suppl. Online Information)”

The revised text also explicitly states that

“… [t]he described circuits provide proofs-of-concept that the synchronization patterns we observed in the N3 e-type interneurons in ACC and LPFC during periods of uncertain values and outcomes can originate from biologically realistic circuits.”

We believe the addition of the circuit models enhances the impact of the paper by explicitly pointing to possible circuit functions of the observed interneuron-specific gamma activity.

Third, the presented empirical evidence for narrow-spiking cells (or, more specifically, the N3-subtype) engaging preferentially in gamma-band synchronization, whereas broad-spiking cells engage preferentially in beta-band synchronization, is modest. Interneuron engagement in gamma rhythms is expected from the literature, of course, but in the present dataset this is less clear-cut. In particular, the spectral peaks in Figure 6C are quite similar between broad- and narrow-spiking, and labelling the former "beta" but the latter "gamma" requires a more thorough analysis than is now presented.

The highly specific gamma synchronization effects of the N3 e-type neurons are not shown in Figure 6, to which the reviewer refers. We revised the text to make this clearer, added new analysis results, and describe the statistics more explicitly to convey our specific findings:

1. We explicitly report that both narrow and broad spiking neurons show beta spike-LFP peaks (Figure 6, Suppl. Figure S4). This analysis was task-epoch independent. Please also note that we describe these results in the section entitled “Narrow spiking neurons synchronize to theta, beta and gamma band network rhythms.”, including links to the relevant literature. This shows that we do not claim a simple beta–gamma dissociation.

2. Among the individual cell classes, the spike-LFP synchrony showed significant ~40 Hz gamma peaks in ACC for classes N2 and N3. The other cell classes in ACC and all of the cell classes in LPFC did not show 40 Hz gamma in a task-epoch independent way. These results (for all cell classes in each area) are provided explicitly in Suppl. Figure S12.

We consider the results from 1) and 2) not essential for our manuscript’s main message. They are provided to give more comprehensive information about the cell classes, because the clustering procedure did not take synchrony into account. As another reviewer pointed out, the more information we provide about the cell classes, the easier it will be in the future to identify possible subclasses and compare results from different studies.

3. We provide statistical evidence that transient gamma spike-LFP synchronization emerges only for the N3 class: in LPFC after cue onset during low choice probability trials, and in ACC after reward onset during trials with high reward prediction errors. The statistics for this cell-class-specific transient gamma increase are provided in Figure 7C (for LPFC) and in Figure 8F (for ACC). The specificity of these findings is now documented in the new Suppl. Figures S13 and S14 (please also see below).

4. In response to the reviewer, we now provide in a new Suppl. Figure S15, for each of the cell subclasses, the time-frequency spike-LFP synchronization aligned to cue onset for low and high p(choice) trials in LPFC and aligned to reward onset for high and low RPE trials in ACC. These figures are noisy because of the typically low number of spikes, but they do show that class N3 has some 35–45 Hz synchrony in each brain area that other cell types do not show. We revised the text to describe these new results.
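For reference, the pairwise phase consistency (PPC) mentioned by Reviewer #2 can be computed from spike phases as the average cosine of the phase difference across all spike pairs (Vinck et al., 2010). Its expected value does not depend on the number of spikes, but its variance grows when spikes are few, which is why low-rate cell classes yield noisy maps. The sketch below assumes the LFP phase at each spike time has already been extracted, for example from a band-passed LFP. Trial-wise variants that exclude same-trial pairs are not shown, and the exact variant used in the manuscript is not specified here.

```python
import numpy as np

def pairwise_phase_consistency(phases):
    """Pairwise phase consistency: mean cos(theta_j - theta_k) over all spike pairs j != k.

    Uses |sum_j exp(i*theta_j)|^2 = N + sum_{j != k} cos(theta_j - theta_k)
    to avoid an explicit O(N^2) loop over pairs.
    """
    phases = np.asarray(phases, dtype=float)
    n = phases.size
    if n < 2:
        raise ValueError("PPC requires at least two spikes")
    resultant = np.exp(1j * phases).sum()
    return (np.abs(resultant) ** 2 - n) / (n * (n - 1))

locked = pairwise_phase_consistency(np.zeros(10))  # identical phases give 1.0
rng = np.random.default_rng(3)
random_ppc = pairwise_phase_consistency(rng.uniform(-np.pi, np.pi, size=30))  # near 0 on average
```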

Fourth, there are some issues with reporting, where occasionally results are only reported for the narrow-spiking cells and not for the broad-spiking cells, or it is unclear whether a stated result holds for all or just a subset of cells, etc.

Thank you for pointing this out. We added missing information and carefully checked that the revised text reports all main results also for the broad spiking cells. The newly added results did not change the conclusions from the study.

Finally, all results are shown aggregated over two animals, while it is important to know how the key results hold in the two animals separately.

To address this comment, we added the results of the key analyses for each monkey separately as supplementary results. In summary, both animals show similar patterns in their responses to color onset and reward onset and in their firing rate correlations, and we verified the cell clustering separately for each animal:

“New Supplementary Figure S6 shows the main firing rate results for each of the two monkeys separately. For both monkeys narrow spiking neurons have stronger event-related firing and correlation with choice probabilities and reward prediction errors. This supports our main findings.”

We added new result panels in Supplementary Figure S8E and F that show monkey-specific validation of the cell clusters using two established methods. Both methods showed that the cell classes were reliable in each monkey. This was also true for the main interneuron class N3 that carries the main results of our manuscript.

I mention some additional recommendations here.

At the very least, correlation analyses for both p(choice) and RPE should be shown for all time windows, to allow a proper assessment. If the authors indeed wish to maintain the hard claim of a dissociation ACC<>RPE and LPFC<>p(choice) this should explicitly be tested by e.g. directly comparing the correlations with the two behavioural variables.

Please also see our reply to the comment starting “However, the bulk of the paper…” above. We added the results of the suggested correlation analyses. We indeed see only an ACC<>RPE and not an LPFC<>RPE effect, and only an LPFC<>p(choice) and not an ACC<>p(choice) effect.

RPE and p(choice) are highly negatively correlated at r = -0.928 (shown in Suppl. Figure S5B), so that a third variable (firing rate) that correlates positively with one variable will tend to correlate negatively with the other. However, choice probability is most meaningfully tested after color cue onset (the color values are needed to compute p(choice)), and RPE is most meaningfully correlated after reward onset (when the value of the chosen stimulus is compared to the experienced reward).
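How strongly a correlation with one variable constrains the correlation with the other can be made explicit with a standard bound. Given corr(rate, p(choice)) and corr(p(choice), RPE), the admissible corr(rate, RPE) lies in an interval fixed by the requirement that the 3x3 correlation matrix be positive semidefinite. The sketch below (helper name hypothetical) uses r = -0.928 from Suppl. Figure S5B. A moderate correlation with one variable forces the sign of the correlation with the other, whereas a weak one does not.

```python
import math

def admissible_corr_range(r_ya, r_ab):
    """Interval of corr(y, b) consistent with corr(y, a) = r_ya and corr(a, b) = r_ab.

    Derived from the requirement that the 3x3 correlation matrix of (y, a, b)
    be positive semidefinite.
    """
    slack = math.sqrt((1.0 - r_ya**2) * (1.0 - r_ab**2))
    center = r_ya * r_ab
    return center - slack, center + slack

# a = p(choice), b = RPE, y = firing rate; r_ab = -0.928 as in Suppl. Figure S5B.
weak = admissible_corr_range(0.2, -0.928)    # roughly (-0.55, 0.18): sign of corr(y, b) not fixed
strong = admissible_corr_range(0.6, -0.928)  # roughly (-0.85, -0.26): corr(y, b) must be negative
```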

We adjusted the text to make this more explicit and refer directly in the main text to the definitions of p(choice) (formalized in Equation 4, Methods) and of RPE (formally described in Equation 2, Methods).
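For readers unfamiliar with the notation, a basic Rescorla–Wagner learner illustrates the RPE logic: the RPE is the received reward minus the expected value of the chosen stimulus, and the value is updated by a fraction of that error. This is a textbook sketch only. Equation 2 of the Methods belongs to an attention-augmented model that is not reproduced here, and the learning rate and reward sequence below are hypothetical.

```python
def rescorla_wagner_update(value, reward, alpha):
    """One trial of a basic Rescorla-Wagner learner: returns (rpe, updated value)."""
    rpe = reward - value             # prediction error: outcome minus expectation
    return rpe, value + alpha * rpe  # move the value a fraction alpha toward the outcome

def run_trials(rewards, alpha=0.3, v0=0.0):
    """Sequence of RPEs for one stimulus across trials (hypothetical parameters)."""
    value, rpes = v0, []
    for reward in rewards:
        rpe, value = rescorla_wagner_update(value, reward, alpha)
        rpes.append(rpe)
    return rpes

rpe, v = rescorla_wagner_update(value=0.8, reward=1.0, alpha=0.5)  # rpe ~ 0.2, v ~ 0.9
rpes = run_trials([1, 1, 1, 0])  # last entry is negative: reward omitted after learning
```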

In response to another comment, we also added the results of the synchronization analysis for both ACC and LPFC for low and high p(choice) and low and high RPE (shown in the new Suppl. Figures S13 and S14).

The model should be specified in much more detail. Specifically, the assumptions built into it should be clearly defined, and the quantitative predictions derived from it should be presented.

We agree. A substantially revised Results section summarizes the circuit motifs. The circuit hypotheses shown in Figure 9 have been simplified and streamlined. We added the details of the circuit motifs that reproduce the gamma synchronization effects in LPFC and in ACC and describe the assumptions explicitly. We now also show simulation results in a new Suppl. Online Information and in Suppl. Figures S17 and S18.

The circuit models are not provided to generate quantitative predictions. We specify this now explicitly in the revised Results and write e.g.:

“To show the feasibility of this approach we devised two circuit models that reproduce the gamma band activity signatures in LPFC and ACC using populations of inhibitory cells modeled to correspond to N3 e-type cells (for modeling details, see Suppl. Online Information) …”

The revised text describes the circuit models as intended to …

“… provide proofs-of-concept that the synchronization patterns we observed in the N3 e-type interneurons in ACC and LPFC during periods of uncertain values and outcomes can originate from biologically realistic circuits.”

The model results also support our main interpretation that the gamma activity might indicate the gating of competing information (in LPFC) and the detection of mismatches between the experienced reward and the expected value of the chosen stimulus (in ACC).

Please also see our second response to reviewer #1 above.

I understand that the data are not yet publicly released, as others from the same lab are still working on the same data (which is common in the field). However, I would urge the authors to make the source code for all reported analyses publicly available already, to greatly improve transparency and replicability. ("Upon reasonable request" is not sufficient for this goal.)

We fully agree. We mentioned the open-source code used for clustering in the Methods section. The source code for the adaptive spike removal is now available (https://github.com/banaiek/ASR).

We will also add a new GitHub link for the complete spike-triggered MUA analysis (with example scripts) upon publication of this paper.

All analysis code will also be linked and made available on the lab website via http://accl.psy.vanderbilt.edu/resources/code/

In general, the narrative could be streamlined a bit, as it currently stands the manuscript is hard to read.

We attempted to streamline the narrative (starting with an improved abstract). The revised manuscript has all changes highlighted in red font.

Thank you for the many helpful constructive comments.

Reviewer #2:

This paper studies the role of lateral prefrontal cortex (LPFC) and anterior cingulate cortex (ACC) in reversal learning. The authors suggest that LPFC plays a role in computing the probability that the animal will make a certain choice (termed choice probability), whereas ACC signals the reward prediction error. Interestingly, narrow spiking cells (putatively inhibitory neurons also known as fast spiking units) had a higher correlation with these task-relevant parameters, compared to broad spiking cells (putatively excitatory neurons also known as regular spiking units).

Next, the authors define electrophysiological cell types (termed e-types), based on spike waveform and firing patterns. The narrow spiking cells are subdivided into 3 subclasses, termed N1, N2 and N3. Notably, the same subclass of narrow spiking cells, N3, had a correlation with choice probability in LPFC and a correlation with reward prediction error in ACC. Neither of the other narrow spiking subtypes had a significant correlation with either parameter in either area.

In the final part of the paper, the authors examine the phase-locking behavior of these N3 cells to the local field potential (LFP). They find that in LPFC, N3 cells phase lock to gamma (35 – 45 Hz) during the initial learning stage shortly after rule reversal, but as learning progresses and performance reaches a new plateau, their phase locking switches to the beta-band (15 – 30 Hz). Perhaps most remarkably, the N3 cells in ACC showed a similar reversal learning stage dependent phase locking behavior; to elaborate, they phase-locked to gamma only when the reward prediction error was high (i.e., shortly after rule reversal).

These results are generally well supported by rigorous statistics and sophisticated analyses. However, there are several weaknesses.

First, while the claim that LPFC encodes choice probability is well supported, the claim that ACC encodes reward prediction error is not as well substantiated. As seen in Figure 3, the percentage of neurons showing a significant correlation between their firing rate and reward prediction error is not very different between LPFC and ACC, and is quite similar between broad spiking and narrow spiking units within ACC.

We did not intend to convey that RPE is not encoded in LPFC, or only encoded in ACC.

We adjusted the text to clarify this and write about Figure 3 that “…23% of LPFC and 35% of ACC…” neurons show significant rate<>RPE correlations, reporting that the proportions are significantly different from zero. This corresponds well to our previous work, which comprehensively surveyed how different types of RPEs are encoded in LPFC and in ACC and found RPE encoding in LPFC to be slightly less likely and slightly later than in ACC, both for single isolated neurons, as in this study, and for multiunit activity (Oemisch et al., 2019).

What we found is that the positive correlation of RPE and firing rate had a stronger effect size in narrow spiking ACC neurons (Figure 3D) and is driven by a significant correlation of the N3 subclass (Figure 5D).

No other narrow or broad spiking neuron class shows a significant positive average correlation.

We adjusted the text at various places to convey this more clearly in the revised manuscript. E.g., we write

“…time-resolved analysis of the strength of the average correlations revealed a significant positive firing x RPE correlation in the 0.2-0.6 s after reward onset for ACC N-type neurons, which was absent in LPFC (ACC, n=43 N-type neurons, randomization test p<0.05; LPFC: n=31 N-type neurons, no time bin with sign.; Figure 3C,D).”
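A randomization test of this kind can be sketched for a single time bin. Compute the mean rate–RPE correlation across neurons, then build a null distribution by shuffling trial labels independently within each neuron. This is an assumed, simplified procedure (one bin, one-sided, no correction across bins); the exact test used in the manuscript is not specified here, and the data below are synthetic.

```python
import numpy as np

def mean_corr(rates, predictor):
    """Mean Pearson correlation across neurons; both arrays are (n_neurons, n_trials)."""
    return np.mean([np.corrcoef(r, p)[0, 1] for r, p in zip(rates, predictor)])

def randomization_test(rates, predictor, n_perm=200, seed=0):
    """One-sided p-value for a positive mean correlation.

    The null distribution is built by shuffling the predictor across trials
    independently within each neuron, which breaks the rate-predictor pairing.
    """
    rng = np.random.default_rng(seed)
    observed = mean_corr(rates, predictor)
    null = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = np.array([rng.permutation(p) for p in predictor])
        null[i] = mean_corr(rates, shuffled)
    p_value = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, p_value

# Synthetic example: 20 neurons x 50 trials with rates that track the predictor.
data_rng = np.random.default_rng(42)
rpe_per_trial = data_rng.normal(size=(20, 50))
rates = rpe_per_trial + 0.1 * data_rng.normal(size=(20, 50))
observed, p_value = randomization_test(rates, rpe_per_trial)
```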

We confirmed that this result was evident in each of the monkeys when tested separately (and added this result to the revised manuscript).

Second, the authors build a reinforcement learning model to calculate "Choice Probability", which quantifies the probability that the animal will select the rewarded stimulus. According to this definition, choice probability should dip upon reversal, and rise to a new plateau after several trials. However, this metric is fairly unintuitive, not to mention in conflict with existing nomenclature (e.g., Nienborg, Cohen and Cumming 2012). It would be helpful to have an accompanying plot of how the firing rate and phase locking behavior of each neuronal type changes as a function of trials after reversal.

We did not explicitly describe p(choice) in the original manuscript and apologize for this oversight. Following the reviewer’s suggestion, the revised manuscript now shows the progression of choice probabilities across trials since reversal in an extended Supplementary Figure S5. As expected, choice probabilities correlate positively with the trial number since reversal (r = 0.27).

We also adjusted the text to convey more explicitly how the reinforcement learning literature uses a softmax (or Boltzmann) equation to translate stimulus values into choice probabilities. We revised the Methods section accordingly and write:

“Equation 4 defines the choice probability, or p(choice), that is used for the neuronal analysis of this manuscript (Sutton and Barto, 2018). P(choice) increases with trials since reversal (Supplementary Figure S5A,E), indicating a reduction in the uncertainty of the choice the more information is gathered about the value of the stimuli.”
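The softmax step can be illustrated with the textbook form (Sutton and Barto, 2018). This is a sketch, not a reproduction of Equation 4, whose attention-augmented value terms are not shown here; the inverse temperature `beta` and the example values are hypothetical. With equal values (for example 0.5 and 0.5) the two choice probabilities are 0.5 regardless of `beta`, the most uncertain case for two options.

```python
import numpy as np

def softmax_choice_prob(values, beta):
    """Softmax (Boltzmann) choice probabilities; beta is the inverse temperature."""
    v = beta * np.asarray(values, dtype=float)
    v -= v.max()  # subtracting the maximum avoids overflow and leaves the result unchanged
    e = np.exp(v)
    return e / e.sum()

p_equal = softmax_choice_prob([0.5, 0.5], beta=10.0)    # [0.5, 0.5]
p_unequal = softmax_choice_prob([0.8, 0.2], beta=5.0)   # first option ~0.95
```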

We refrain from adding a rate or phase locking analysis as a function of trials since reversal because this would require trial-by-trial estimates of phase synchronization in short time windows around cue or reward onset. The isolated neurons fire few spikes in these epochs (Figure 4B), which renders the estimates of phase consistency noisy. We believe the trial-by-trial analysis will be more informative for larger datasets that do not split neurons into subclasses containing few, low-firing neurons (given the small proportion of interneurons in the population).

Third, the extent to which choice probability encoding neurons and reward prediction error encoding neurons in each area falls into a specific e-type is not shown.

Undoubtedly, it is noteworthy and remarkable that N3 is the only e-type that shows a positive correlation with choice probability in lateral prefrontal cortex and a positive correlation with reward prediction error in ACC (Figure 5). But do all choice probability encoding neurons in LPFC and reward prediction error encoding neurons in ACC fall into the N3 e-type?

The key finding is that the N3 e-type shows the strongest correlations with p(choice) (in LPFC) and RPE (in ACC). This does not imply that neurons in other e-type classes do not significantly encode p(choice) and RPE. In each class, a small fraction of neurons individually shows significant correlations.

We clarify this in the text and added a new classification analysis to address the reviewer’s question.

First, we adjusted the text at various places to more explicitly refer to the “proportion of significance” and to the “strength / effect size” of the correlation when describing the results. E.g., we write

“… on average 23% of LPFC and 35% of ACC neurons showed significant firing rate correlations with RPE … ”, while “… analysis of the strength of the average correlations revealed …”

Second, to convey the statistical nature of the main findings quantitatively, we added an analysis that predicted the class label of an individual cell from the correlation of its firing rate with p(choice) in LPFC and with RPE in ACC. The confusion matrices shown in the newly added Supplementary Figure S11 reveal that p(choice) significantly predicted a neuron’s class for various e-types, not only for the N3 e-type (for which the correlations were strongest, as shown in other analyses). We added to the main text that in

“… LPFC, a linear classifier trained on multiclass p(choice) values was able to label N3 e-type from their p(choice) value with an accuracy of 31% (Suppl. Figure S11A).” and in

“…ACC a linear classifier trained on multiclass RPE values was able to label N3 e-type from their RPE value with an accuracy of 34% (Suppl. Figure S11B).”
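Per-class accuracies read from a confusion matrix, such as the 31% and 34% above, are interpretable only relative to chance. For K equally frequent classes, chance is 1/K; with unequal class sizes, a label-permutation null is safer. The number of e-types entering the classifier is not restated in this excerpt, so the sketch below uses a hypothetical 3-class toy matrix and hypothetical helper names.

```python
import numpy as np

def per_class_accuracy(confusion):
    """Recall per true class: diagonal count divided by row total (rows = true labels)."""
    confusion = np.asarray(confusion, dtype=float)
    return np.diag(confusion) / confusion.sum(axis=1)

def overall_accuracy(confusion):
    """Fraction of all cells assigned their true label."""
    confusion = np.asarray(confusion, dtype=float)
    return np.trace(confusion) / confusion.sum()

def chance_level(n_classes):
    """Expected accuracy of uniform random guessing among n_classes labels."""
    return 1.0 / n_classes

# Hypothetical 3-class confusion matrix (rows: true e-type, columns: predicted e-type).
toy = [[3, 1, 0],
       [1, 2, 1],
       [0, 0, 4]]
recall = per_class_accuracy(toy)   # [0.75, 0.5, 1.0]
overall = overall_accuracy(toy)    # 0.75
```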

Further, the task-dependent phase locking behavior of e-types other than N3 is not shown. Given that N3 is the only NS e-type that shows a relationship with task-relevant parameters, I would expect the task learning dependent phase-locking behavior to also be unique to N3, but this result is not presented in this paper.

We added the phase locking for all cell types around cue onset for low and high p(choice) in LPFC and around reward onset for low and high RPE in ACC as new Suppl. Figure S15. The plots show that the N3 e-type is the only narrow spiking e-type with a gamma response after the cue (in LPFC) or after the reward (in ACC) in the period of enhanced reward uncertainty (low p(choice) and high RPE).

Additionally, as requested, we now provide for each cell type the gamma spike-LFP phase locking for low and high p(choice) in LPFC, and for low and high RPE in ACC, in the new Suppl. Figures S13E-H and S14E-H.

Notably, gamma synchrony is evident most clearly for the N3 e-type in LPFC during low p(choice) (Suppl. Figure S13E) and the N3 e-type in ACC during high RPE trials (Suppl. Figure S14H).

These results support our original conclusions. The revised text is adjusted to refer to these new results.

Finally, the conceptual model in Figure 9 captures the results presented in this paper and gives rise to testable predictions. It seems that some predictions of this model should be testable with the presented data. For example, the prediction that in LPFC, broad spiking cells fall into two functional categories, whereas N3 cells are more functionally homogeneous, would be an interesting prediction to test. Further, the prediction that in ACC, broad spiking cells encode reward whereas N3 cells encode reward prediction error is easily testable and would strengthen the conclusions of this paper.

In response to editorial and other comments, we simplified the scope of the circuit modeling. We devised these models to account for the observed gamma synchronization of the N3 interneuron class and to inform us about possible circuit functions of this gamma synchronization. Testing predictions about the model’s broad spiking neurons was not the intention and would exceed the scope of this manuscript.

In the revised manuscript we make this explicit by writing, for example, in the Results section:

“The described circuits provide proofs-of-concept that the synchronization patterns we observed in the N3 e-type interneurons in ACC and LPFC during periods of uncertain values and outcomes can originate from biologically realistic circuits. The results justify future studies testing detailed predictions that can be derived from these circuit motifs.”

The main finding of this paper, that a specific electrophysiological subclass of narrow spiking cells serve important roles in a reversal learning by preferentially phase-locking to gamma band LFP, would be of broader significance and impact if this finding could be generalized to other brain regions, behavioral tasks and model species. That said, there are already several papers in the literature that define e-types. Specifically, Markram et al. (2015) define 11 e-types; Gouwens et al. 2019 define 6 e-types that constitute narrow spiking cells (referred to as fast spiking cells in Gouwens et al). For sake of future efforts to study e-types and their functional roles, it would be important to reconcile these disparate definitions of e-types.

We agree. But we particularly want to highlight that our findings in two brain areas in the monkey are already an important, newly achieved milestone. To our knowledge, no other paper characterizes specific interneuron functions in ACC and LPFC of the nonhuman primate and succeeds in finding a functional similarity for the same electrophysiological cell type.

To address the reviewer’s suggestions, we added a more explicit discussion that is aimed at linking the different e-typing approaches from in-vitro and from in-vivo studies. We write in the revised Discussion section:

“The first implication of our findings is that narrow spiking neurons can be reliably subdivided in three subtypes based on their electrophysiological firing profiles. […] These results illustrate that our three interneuron e-types will encompass further subclasses that future studies should aim to distinguish in order to narrow the gap between the in-vivo e-types that we and others report in the monkey, and the in-vitro e-types in the rodents that are more easily mapped onto specific molecular, morphological and genetic make-ups (Markram et al., 2015; Gouwens et al., 2019).”

Moreover, there are at least two other papers showing that subclasses of narrow spiking neurons have different relationships with gamma (Shin and Moore, 2019; Onorato et al., 2020). It would be very interesting and important to know whether the 3 narrow spiking e-types discussed in this paper match up with the subclasses in the two aforementioned papers.

We agree these are very important references and added them at different places in the revised Discussion section.

Please see the quote in the previous reply. Additionally, when discussing the importance of linking e-types to morphological and molecular types in mice in future work, we refer to the Onorato study (of which the last author of this study is a co-author) and write:

“…As a caveat, this mapping of cell types between species might also reveal cell classes and unique cell class characteristics in nonhuman primate cortices that are not similarly evident in rodents as recently demonstrated in a cross-species study of non-fast spiking gamma rhythmic neurons in early visual cortex that were exclusively evident in the primate and not in mice (Onorato et al., 2020).”

When discussing the cell classes in which gamma synchrony has been observed, we now write, e.g., in the revised text:

“Such an intrinsic propensity for generating gamma rhythmic activity through, e.g., the GABAA-ergic time constant, is well described for PV+ interneurons (Wang and Buzsaki, 1996; Bartos et al., 2007; Womelsdorf et al., 2014b; Chen et al., 2017) and, for some cells, even for states with relatively low excitatory feedforward drive that might be more typical for prefrontal cortices than for early visual cortices (Cardin et al., 2009; Vinck et al., 2013; Shin and Moore, 2019; Onorato et al., 2020).”

In sum, this paper is a valuable addition to the reinforcement learning literature as well as neuronal cell types and neural oscillations literature. Some additional analyses could strengthen the conclusions of this paper. It is unclear how the e-types defined in this paper will tie into other neuronal categorizations in recent literature. This link to prior work will be important for broader significance.

Thank you for the positive comment. We fully agree on the importance of linking to other classification schemes and hope the revised Discussion and the added details in the supplementary materials will support this goal.

Comments for the authors:

I. Comments on Figures

1. Figure 2 and Figure S6 show the PSTH aligned to Feature 1 and Feature 2 based on the cue order (Motion first vs Color first). It would be highly relevant to also show the PSTH aligned to Feature 1, Feature 2 and Reward based on behavioral outcome (correct vs incorrect, and there are at least 3 different types of error outcomes; please see my comment III-2 in Comments on Methods below for elaboration). In particular, PSTH aligned to reward conditioned on behavioral outcome is crucial for interpreting Figure 3.

We appreciate your point of view. But this manuscript is about encoding of positive reward prediction errors and choice probabilities during learning with multiple sub-conditions and result panels (with 17 supplementary figures). The proposed (hit vs miss) error type analysis would be a different manuscript and beyond the scope of this paper, or it would add complexity that the revisions aim to reduce.

2. Figures 2 and 3: The correlation between firing rate and Choice Probability / RPE is interesting, but not very intuitive. It would be helpful to have a plot of Choice Probability and Reward Prediction Error as a function of trials since reversal, as well as the firing rate for each cell type and brain area as a function of trials since reversal. This way we can see whether LPFC NS firing rate after color cue onset tracks Choice Probability, and whether ACC NS firing rate after reward tracks RPE.

We added example learning blocks showing the progression of choice probabilities and RPE (and the value of the chosen stimuli) over trials in an extended Suppl. Figure S5A. We show the overall change of choice probabilities and RPE since the first trial after reversal in Suppl. Figure S5E,F and report the correlations of these variables with trials since reversal (r = 0.13 for p(choice) and r = 0.23 for RPE). We showed more examples in an earlier publication (Oemisch et al., 2019).

To provide more insight into the firing rate changes with p(choice) and RPE, we provide an example for LPFC and ACC in new Suppl. Figure S10 and write, e.g.:

“The on-average positive correlation of firing rate and p(choice) was also evident in an example N3 e-type cell (Suppl. Figure S10A-C).”

Please note that the average correlation of the N3 cell class’s firing rate with p(choice) is significant but weak: r = 0.08 in LPFC (and r = 0.09 with RPE in ACC), so that trial-by-trial rate plots will be noisy. The key result is that the narrow spiking neurons do show on average an on-response to the cue and reward onset and that only one of these classes (N3) shows significant correlations (and gamma synchronization).

3. Figure 4B: the firing rate unit is missing in both the figure and the main text.

Figure 4C rastergram firing rate seems massively different from the average firing rate in 4B? e.g., for Figure 4C rastergram for N1, there seems to be ~5 spikes per 100ms, which would be ~50Hz, but the average firing rate for N1 is 4Hz?

Also, please discuss why the narrow spiking firing rate is so low (assuming the firing rate unit was Hz, mean firing rate is <2Hz for N2 and N3). Narrow spiking firing rates have typically been reported to be ~10Hz in vivo.

Thank you for pointing us to this. The firing rate axis is log(spikes/s), so the firing rates for N1, N2, and N3 are on average 20, 3.8, and 4.3 spikes/s, similar to previous studies / datasets in monkey prefrontal cortex (e.g., Ardid et al., 2015).

We corrected this in Figure 4B and the main text. Figure 4C shows example rasters selected primarily to convey how regular or irregular the firing was (as captured by LV and CV). Their mean rates may therefore differ from the average class-specific firing rates.

4. Figure 5: It is remarkable that N3 is the only e-type that shows a positive correlation with choice probability in LPFC and a positive correlation with reward prediction error in ACC. To what extent do choice probability encoding neurons and reward prediction error encoding neurons in each area fall into a specific e-type? I would like to know whether a neuron's e-type is predictable from task-dependent functional properties of the neuron.

That is an interesting question. To address this, we trained a classifier on the correlation values of e-types for p(choice) and RPE, and then predicted the class labels of neurons in a one-versus-all classification process with the trained classifier. Two N-type classes (N1 and N2) were not large enough to survive the sampling we implemented in the analysis. Of the other 6 e-types in ACC, the N3 e-type and two broad spiking classes were predicted significantly above chance level. However, only the N3 e-type was not significantly confused with classifiers trained on other e-types.

We added the results in Supplementary Figure S11 and adjusted the methods and results text. We summarize the results by writing in the revised text:

“In LPFC, a linear classifier trained on multiclass p(choice) values was able to label N3 e-type neurons from their p(choice) value with an accuracy of 31% (Suppl. Figure S11A).” and “In ACC a linear classifier trained on multiclass RPE values was able to label N3 e-type neurons from their RPE value with an accuracy of 34% (Suppl. Figure S11B).”
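The underlying question, whether a neuron's e-type can be predicted from a single task-related value, can be illustrated with a minimal sketch. It deliberately swaps in a simpler technique than the linear classifier described above: a leave-one-out nearest-class-mean classifier on one scalar feature per neuron (e.g., its firing-rate correlation with p(choice)). The function name and toy values are hypothetical, and the authors' resampling and chance-level statistics are not reproduced.

```python
from collections import defaultdict


def loo_nearest_mean_accuracy(values, labels):
    """Leave-one-out nearest-class-mean classification on one scalar feature.

    values: per-neuron feature (hypothetical example: correlation of rate with p(choice))
    labels: per-neuron e-type label (e.g. "N3", "B1")
    Returns {label: fraction of that class's neurons assigned back to it when held out}.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for i, (x, true_label) in enumerate(zip(values, labels)):
        # Class means are computed without the held-out neuron i.
        sums, counts = defaultdict(float), defaultdict(int)
        for j, (v, lab) in enumerate(zip(values, labels)):
            if j != i:
                sums[lab] += v
                counts[lab] += 1
        means = {lab: sums[lab] / counts[lab] for lab in counts}
        predicted = min(means, key=lambda lab: abs(x - means[lab]))
        total[true_label] += 1
        correct[true_label] += int(predicted == true_label)
    return {lab: correct[lab] / total[lab] for lab in total}
```

With k balanced classes, chance accuracy is roughly 1/k; in practice, significance would be judged against a distribution obtained by shuffling the labels.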

5. Figure 6C: suggest plotting N3 in the same plot as Broad Spiking and Narrow Spiking units such that the magnitude can be compared more easily.

In addition, please clarify what the y-axis of Figure 6c means (Peak densities of spike-LFP synchronization (PPC)). Is this simply the average PPC spectra? Or normalized for each unit in some way? I would recommend plotting the former, such that it is possible to compare which e-types have the best locking properties to which frequency band.

The magnitudes are directly comparable and the axis limits are identical: Figure 6C shows the proportion of neurons in each population with a reliable (significant) PPC peak at a given frequency. We clarified this in the revised figure axis and legend. The Methods section describes the computation explicitly.

Similarly, revised Supplementary Figure S12 shows detailed information about the likelihood of significant PPC peaks for each cell class in each brain area (with identical y-axis limits to ease comparison).

The values are directly comparable between plots.
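To make the PPC quantity concrete, assuming PPC here denotes the pairwise phase consistency of Vinck et al. (2010): it is the average cosine of the phase difference over all pairs of spikes, which can be computed from the resultant vector as (|Σ exp(iφ)|² - N) / (N(N - 1)). The sketch below assumes spike phases relative to a band-passed LFP are already available. The significance test that defines a "reliable" peak in Figure 6C is not reproduced, and the helper names are hypothetical.

```python
import cmath


def pairwise_phase_consistency(phases):
    """PPC of spike-LFP phases in radians (Vinck et al., 2010).

    Equals the mean cosine of the phase difference over all spike pairs:
    1 for identical phases, about 0 for uniformly random phases.
    """
    n = len(phases)
    if n < 2:
        raise ValueError("PPC requires at least two spikes")
    resultant = sum(cmath.exp(1j * phi) for phi in phases)
    # |sum of unit vectors|^2 = n + 2 * sum_{i<j} cos(phi_i - phi_j)
    return (abs(resultant) ** 2 - n) / (n * (n - 1))


def proportion_with_peak(significant_peaks_per_neuron, frequency_hz):
    """Fraction of neurons whose set of significant PPC peak frequencies contains frequency_hz."""
    neurons = list(significant_peaks_per_neuron)
    hits = sum(1 for peaks in neurons if frequency_hz in peaks)
    return hits / len(neurons)
```

Unlike the squared phase-locking value, this estimator is not biased upward by small spike counts, which matters for low-rate cell classes.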

6. Figure 7 and 8: It's very interesting that initially after reversal, N3 locks to gamma but later, as performance reaches a new plateau, N3 locks to beta. If you plot trial since reversal on the x-axis, and plot the peak of PPC spectra (averaged across N3 cells) on the y-axis, do you see a gradual change in peak frequency or is it more of a step function change after each reversal? Relatedly, if you plot the histogram of PPC spectra peak frequency across N3 cells, is it a bimodal distribution (one peak in beta and another peak in gamma) or is it unimodal?

That is an interesting approach. We tried it per your recommendation. However, most of the classes did not pass the minimum spike number criterion per time window per trial, and the result is too noisy to be meaningful. We are actively working on a new experiment that increases the number of learning blocks per recorded cell to more than 30, and we expect this new design will allow more fine-grained trial-by-trial predictions of learning (and show for how many neurons the learning-related change is one-shot or gradual).

For the revised circuit models (Figure 9 and Suppl. Figures S17 and S18), we describe mechanisms for a gradual gamma-to-beta switch (for LPFC) and a gamma-to-theta switch (for ACC). We also added an explicit discussion of possible alternative switch mechanisms in the Suppl. Online Information. In the models we proposed and simulated, the switch depends on the heterogeneity of the inputs to the network cells (we kept the overall excitatory drive constant in both tested models). This provides a starting point for testing more precise predictions about how rapidly the switch occurs, and our manuscript can provide the rationale for these more specific tests in the future.

7. It would be interesting to know the behavior-dependent phase locking of other e-types as well. I suggest adding Figure 7 and 8 C and F for all e-types as a supplemental figure.

We added as new Suppl. Figure S15 the suggested information with time frequency plots for low and high p(choice) in LPFC and low and high RPE in ACC for B1-5 and N1-N3.

8. Were LPFC and ACC recorded simultaneously? If so, it would be very interesting to see if inter-area coherence mimics the changes in PPC. For example, does the gamma band coherence go up in the first few trials after reversal, followed by an increase in beta band coherence as behavioral performance plateaus?

That is a really interesting point but unfortunately the number of data points recorded simultaneously is not sufficient for this analysis.

9. Figure 9 outlines a really nice hypothesis that gives rise to testable predictions. Some of these predictions are testable within the data presented in this paper. I think it would significantly strengthen this paper if some of these predictions could be tested:

The original model we proposed was a major issue for the editor and other reviewers as it was not detailed enough. We therefore adjusted the model section and now present a model that has all implementation details presented in a new Supplementary Online Information.

We devised and implemented the circuit model primarily to document how the enhanced gamma during learning can emerge in the prefrontal cortex circuits (where gamma during learning switches to beta after learning) and in the anterior cingulate circuits (where gamma during learning switches to theta after learning).

We adjusted Figure 9 to show only the core circuit motifs that could reproduce the observed gamma to beta switch and the gamma to theta switch. We implemented and simulated the circuits in a firing rate model and show the results in new Supplementary Figures S17 and S18, and provide the details in an extended Suppl. Online Information that has these four sections:

1. Overview of circuit modeling

2. E-E-I circuit motif realizing the switch from gamma to beta frequency synchronization

3. E-I-I circuit motif realizing the switch from gamma to theta frequency synchronization

4. Discussion of circuit motifs, relation to other models and experiment

With all these changes, we hope to provide a useful interpretation of our gamma synchrony finding. Testing the predictions from these circuits needs to await more quantitative and biophysically detailed modeling that is beyond the scope of this paper (we plan to provide this in a follow-up modeling paper).
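The E-E-I and E-I-I motifs themselves are specified in the Suppl. Online Information. As a generic illustration of what a firing-rate model of interacting populations involves, the sketch below integrates a standard two-population excitatory-inhibitory (Wilson-Cowan-type) rate model with forward Euler. It is not the authors' motif: the weights, sigmoid parameters, time constants, drive and initial conditions are all illustrative assumptions. Whether such a network oscillates, and at which frequency, depends on these choices, and the last helper gives only a crude frequency readout.

```python
import math


def logistic(x, gain, threshold):
    """Sigmoidal population response, bounded between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-gain * (x - threshold)))


def simulate_ei(w_ee=16.0, w_ei=12.0, w_ie=15.0, w_ii=3.0, drive_e=1.25, drive_i=0.0,
                tau_e_ms=10.0, tau_i_ms=10.0, dt_ms=0.1, steps=10000):
    """Forward-Euler integration of a two-population E-I rate model.

    w_xy is the weight onto population x from population y (illustrative values).
    Returns the excitatory and inhibitory rate traces (dimensionless, between 0 and 1).
    """
    e, i = 0.1, 0.05  # assumed initial rates
    e_trace, i_trace = [], []
    for _ in range(steps):
        input_e = w_ee * e - w_ei * i + drive_e
        input_i = w_ie * e - w_ii * i + drive_i
        e += dt_ms / tau_e_ms * (-e + logistic(input_e, gain=1.3, threshold=4.0))
        i += dt_ms / tau_i_ms * (-i + logistic(input_i, gain=2.0, threshold=3.7))
        e_trace.append(e)
        i_trace.append(i)
    return e_trace, i_trace


def crossing_frequency_hz(trace, dt_ms):
    """Crude rhythm estimate: upward crossings of the trace mean per second."""
    mean = sum(trace) / len(trace)
    ups = sum(1 for a, b in zip(trace, trace[1:]) if a < mean <= b)
    return ups / (len(trace) * dt_ms / 1000.0)
```

In models of this kind, the oscillation frequency is largely set by the time constants and loop gains, so changing the inputs to the populations (as in the revised models) is one way to move the rhythm between frequency bands.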

We summarize the purpose of the circuit modeling and the limitation of our approach in the Suppl. Online Information by writing:

"… these models serve as a proof of principle, indicating how populations may be wired up to produce oscillations with different frequencies, but they cannot make conclusive predictions regarding the dynamics of the underlying interneurons, i.e. whether they are PV or SOM, or what type of spike patterns they produce. […] Nevertheless, we think it is reasonable to identify faster interneuron populations with PV+ interneurons given prior modeling studies (see next paragraph), and thereby putatively link them to the N3 e-type (see also Discussion of the main text)."

Figure 9 hypothesizes that in LPFC, Broad Spiking neurons should encode Value predictions; e.g., red-selective neurons that, after learning, fire more when red is being rewarded compared to when green is being rewarded. These Value-predictive neurons should fire similarly during learning, and are perhaps even predictive of the animal's choice on a trial-by-trial basis (e.g., on trials in which red-selective neurons fired more during learning, the animal saccades according to the red stimulus). In contrast, N3 neurons should show no such Value-predictive behavior. Is there evidence of such prediction in the data?

Relatedly, Figure 9 hypothesizes that in ACC, Broad Spiking neurons encode reward, whereas N3 encode RPE. According to this prediction, N3 activity should be higher for "surprise correct" trials shortly after reversal, and go down as performance plateaus, whereas Broad Spiking neurons should be excited by reward the same amount regardless of whether it is shortly after reversal or after behavioral performance has reached plateau. Is this seen in data? I think this would be made clear if the PSTH aligned to reward were plotted, as suggested in Comment 1.

Please see our reply to the previous comment regarding the revised and more detailed circuit models. They were conceived to reproduce the oscillatory signatures rather than as a reinforcement learning network that learns values in pyramidal cell populations. We therefore refrain from more detailed analyses of value coding in different cell populations.

This paper is about cell-specific coding of uncertain cues (low p(choice)) and uncertain outcomes (high RPE), not about the coding of value in ACC and LPFC. We plan to add the proposed analyses in future work with a separate set of analyses.

II. Comments on Main Text

1. "We next asked whether the narrow spiking, putative interneurons that encode p(choice) in LPFC and RPE in ACC are from the same electrophysiological cell type, or e-type (Markram et al., 2015)."

There are ~11 e-types described in Markram et al., 2015. Further, Gouwens…Koch 2019 NN describes ~6 sub-e-types of Fast Spiking cells. I recommend that the authors speculate on how previously reported e-types match up with the e-types described in this paper.

We appreciate this comment and added a discussion of the putative relationship of the in-vitro and the in-vivo e-types to the Discussion section. Please also see the reply to the reviewer 1 comment starting “At the very least, correlation analyses…” above.

2. "Prior studies have suggested that interneurons have unique relationships to oscillatory activity (Cardin et al., 2009; Vinck et al., 2013; Voloh and Womelsdorf, 2018; Womelsdorf et al., 2014a),"

I suggest adding Chen…Zhang 2017 Neuron to this list of references.

Thank you for pointing us to this. We added Chen, G., Zhang, Y., Li, X., Zhao, X., Ye, Q., Lin, Y., … and Zhang, X. (2017). Distinct inhibitory circuits orchestrate cortical beta and gamma band oscillations. Neuron, 96(6), 1403-1418.

3. Discussion section: There are at least two other papers showing that subclasses of narrow spiking neurons have different relationship with gamma (Shin and Moore 2019 Neuron; Onorato…Vinck 2020 Neuron). It would be an interesting addition to the Discussion section to speculate on whether the 3 narrow spiking e-types discussed in this paper match up with the subclasses in the two aforementioned papers.

Thank you for pointing us to the papers. We added specific sentences in various parts of the Discussion. We outlined this in detail in the reply to the comment starting “Moreover, there are at least two other papers…” above.

III. Comments on Methods

1. In general, the Method section is not consistent about referring to relevant figures for the analyses being described. It would really help the reader if the analyses that went into each figure were clarified: e.g., "Statistical Analysis of time resolved spike-LFP coherence for putative interneurons and broad spiking neurons (Figure 7, 8)"

We explicitly added the specific Figure numbers / Suppl. Figure numbers to the individual methods sections in the revised submission. We also changed the order of some methods sections to better reflect the order in which they are applied in the Results section.

2. Task design: "Color-reward associations were reversed without cue after 30 trials or until a learning criterion was reached, which makes this task a color-based reversal learning task. "

It seems that a strategy that a monkey might employ would be to count the number of trials after reversal to anticipate when the next reversal would happen, which would rely on a different mental strategy than reversal learning tasks where the reversal points are not predictable. Is there any behavioral evidence that would discount the possibility that the monkeys are counting?

Yes, in previous studies we compared quantitatively which of many different reinforcement learning (RL) models, Bayesian models and hybrid Bayes-RL best accounts for the behavior of the animals (Oemisch et al., 2019). The optimal Bayesian model would come closest to very fast changes of behavior after or at the time of reversal but it did not fit the monkeys’ behavior well. Monkeys are slower than predicted by optimal models and are slower than predicted from anticipating the trial of reversal (as suggested by the reviewer).

For estimating the RPE and p(choice), we used the best-fitting (cross-validated) RL model. We refer explicitly to this in the revised methods:

“This color-based reversal learning is well accounted for by an attention augmented Rescorla Wagner reinforcement learning model (‘attention-augmented RL’) that we previously tested against multiple competing models (Balcarras et al., 2016; Hassani et al., 2017; Oemisch et al., 2019). Here, we use this model …”.

"Hence, a correct response to a given stimulus must match the motion direction of that stimulus as well as the timing of the dimming of that stimulus."

In this task, there appears to be one way to be correct, but several distinct ways of being incorrect. First, the monkey could be incorrect in both the timing and the saccade direction. Second, the monkey could be correct with the timing but incorrect with the direction. Third, the monkey could be correct with the direction but incorrect with the timing. The third outcome could be further subdivided into premature response versus late response. The reason why a monkey might make each mistake is different. Only the first scenario supports the possibility that the monkey thought the other color was being rewarded, e.g., shortly after reversal. It would be interesting to know the proportion of each error type as a function of trials since reversal. Furthermore, I would expect the negative reward prediction error to be most prominent in the first type of error. Hence, it would make sense to me if only the first type of error was considered when calculating choice probability and reward prediction error.

The reviewer is correct about the error types. We added error distribution information to the revised text (in the methods where other behavioral results were described already in order to not disrupt the flow of the result section). We write:

“Monkeys H / K performed the task with 83 / 86 % accuracy (excluding fixation break errors). Of the 17 / 14 % errors, on average 50 / 50 % were erroneous responses to the dimming of the distractor when it dimmed before the target, 34 / 37 % were erroneous responses at the time when target and distractor dimmed simultaneously but the monkey chose the distractor direction, and 16 / 13 % were responses made when the target dimmed before any distractor dimming but in the direction of the distractor.”
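As a quick worked check of how rare each error type is relative to all choice trials, the reported percentages can be multiplied out. The helper below is a hypothetical illustration, not part of the authors' analysis, applied to monkey H's reported (averaged) values.

```python
def errors_as_percent_of_trials(accuracy_pct, split_pct_of_errors):
    """Convert an error-type split (in % of errors) into % of all choice trials."""
    error_pct = 100.0 - accuracy_pct
    return {kind: error_pct * share / 100.0 for kind, share in split_pct_of_errors.items()}


# Monkey H: 83 % accuracy; errors split 50 / 34 / 16 % as reported above.
monkey_h = errors_as_percent_of_trials(83, {
    "distractor dimmed first": 50,
    "simultaneous dimming, distractor direction": 34,
    "target dimmed first, distractor direction": 16,
})
```

For monkey H this gives roughly 8.5 %, 5.8 % and 2.7 % of choice trials. The same calculation for monkey K (86 % accuracy; 50 / 37 / 13 %) gives 7.0 %, 5.2 % and 1.8 %.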

For computing the choice probabilities, we use all trials where the monkey made a choice rather than a fixation break or a no-response. In the model, the choice probability is not computed only based on color values, but also on location and motion values (location and motion direction of the stimuli are sources of uncertainty when making a choice). When monkeys make an unrewarded choice at the wrong time, in the wrong direction, or to the wrong stimulus, this behavior is still a choice initiated by the animal. The choice probability estimate is not supposed to only include choices of one color versus another color. Rather, the choice probability estimate was used because it quantifies how certain the animal was when making the choice. (Analysis of the encoding of color-value estimates is not part of this manuscript but is a separate project.)

Regarding RPEs, this paper reports cell-specific correlations with positive RPEs on correct trials (as stated in the Methods section). The errors that lead to negative RPEs are too rare in this design to allow a robust analysis in classes with low numbers of cells and low single-neuron firing rates. A probabilistic reward schedule might be useful in the future to obtain more error trials. A separate analysis of negative versus positive prediction errors, and of how they are distributed and carry different feature information, is provided on a larger dataset in our previous work (Oemisch et al., 2019).

3. "Here, we use this model to estimate the trial-by-trial fluctuations of the expected value (EV) for the rewarded color and the choice probability (CP) of the animal's stimulus selection. EV and CP increase with learning similar to the increase in the probability of the animal to make rewarded choices, causing all three variables to correlate (Figure 4E, F)."

Figure 4 does not have E-F panels.

We corrected the sentence and added the correct Figure reference (which is revised Suppl Figure S5B).

4. Behavioral analysis: I could not find a formal definition of Choice Probability and Reward Prediction Error anywhere. I assume Equation 4 defines Choice Probability, while Rt-Vt defines RPE? I suggest making these definitions clear in the Methods, as well as the main text and the figure legend.

We added the definition in the revised text and methods and write e.g.:

“RPE is calculated as the difference of received outcomes R and expected value V (see Methods).” and

“positive reward prediction error (RPE, ‘R-V’, see Equation 2, below).” and

“Equation 4 defines the choice probability, or p(choice), that is used for the neuronal analysis of this manuscript (Sutton and Barto, 2018). P(choice) increases with trials since reversal (Supplementary Figure S5D), indicating a reduction in the uncertainty of the choice the more information is gathered about the value of the stimuli.”
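To make these definitions concrete, the sketch below pairs a plain Rescorla-Wagner update, whose RPE is R - V, with a softmax choice rule, assuming Equation 4 takes the usual softmax form (Sutton and Barto, 2018). This is a simplification: the attention-augmented model used in the paper (Oemisch et al., 2019) also weights feature values by attention and includes location and motion values, which are omitted here. The learning rate alpha, the inverse temperature beta, the assumption that the newly rewarded color is chosen on every trial, and the counterfactual update of the unchosen color toward 0 are all hypothetical.

```python
import math


def softmax_choice_probability(values, beta):
    """p(choice) for each option under a softmax rule with inverse temperature beta."""
    top = max(values)
    weights = [math.exp(beta * (v - top)) for v in values]  # shift by max for numerical stability
    total = sum(weights)
    return [w / total for w in weights]


def rescorla_wagner_step(value, reward, alpha):
    """One Rescorla-Wagner update; returns (new value, RPE) with RPE = R - V."""
    rpe = reward - value
    return value + alpha * rpe, rpe


def simulate_after_reversal(n_trials, alpha=0.3, beta=5.0):
    """Track p(choice) and RPE for the newly rewarded color after a reversal."""
    values = [0.0, 1.0]  # index 0: newly rewarded color, index 1: previously rewarded color
    p_choice, rpes = [], []
    for _ in range(n_trials):
        p_choice.append(softmax_choice_probability(values, beta)[0])
        values[0], rpe = rescorla_wagner_step(values[0], reward=1.0, alpha=alpha)
        # Assumed counterfactual update: the unchosen color moves toward 0.
        values[1], _ = rescorla_wagner_step(values[1], reward=0.0, alpha=alpha)
        rpes.append(rpe)
    return p_choice, rpes
```

Under these assumptions, p(choice) rises and the positive RPE on rewarded trials shrinks with trials since reversal, i.e. uncertain cues (low p(choice)) and uncertain outcomes (high RPE) both cluster early in a learning block.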

Choice Probability is abbreviated in at least three different ways throughout the manuscript (e.g., p(choice), CP, CHP). Please be consistent.

Yes, pardon. We corrected this and now use “p(choice)” consistently throughout the text.

Note on terminology: Choice Probability commonly refers to the relationship between the activity of individual sensory neurons and the animal's behavioral choice (see Nienborg, Cohen and Cumming 2012 ARN). The duplicate terminology may be confusing for some readers. I suggest using a different term (e.g., Probability of Choice).

We understand the concern. By using the term p(choice) more directly in the text and defining more explicitly how p(choice) was calculated in Equation 4 we hope this reduces ambiguity with other ways of calculating or using it.

5. "We then quantified the log-likelihood of the independent test dataset given the training datasets optimal parameter values."

Where is this result plotted? What is the model performance in predicting test dataset?

We validated the model that we used here in previous work on the same dataset and do not want to repeat this. We added more details about this in the revised methods section:

“The cross-validation results were compared across multiple models in a previous study (Oemisch et al., 2019). Here, we used the best-fitting model based on this prior work.”

And we write at an early place in the text:

“This color-based reversal learning is well accounted for by an attention augmented Rescorla Wagner reinforcement learning model (‘attention-augmented RL’) that we previously tested against multiple competing models (Balcarras et al., 2016; Hassani et al., 2017; Oemisch et al., 2019). Here, we use this model to estimate …”

6. Waveform analysis: It would help to add a diagram of T2P, T4R and HR in Figure 4.

Relatedly, trough comes before the peak in extracellular spike waveforms (as apparent in Figure 4C) – T2P should be (tpeak-ttrough) in order to be a positive value?

We added the diagrams in a new Suppl. Figure S2A-B and refer to it in the revised Results. We corrected the term “peak to trough” to the correct “trough to peak” (T2P).
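The sign convention the reviewer raises can be checked directly on a sampled waveform: taking the peak after the trough makes T2P = t_peak - t_trough non-negative. The sketch below assumes a negative-going trough followed by a positive peak and a known sampling rate; the toy waveform and the 40 kHz rate are hypothetical. The other waveform features (hyperpolarization rate, time to 25% repolarization) are not reproduced because their exact definitions are given in the Methods rather than here.

```python
def trough_to_peak_ms(waveform, sampling_rate_hz):
    """Trough-to-peak duration t_peak - t_trough (ms) of an extracellular spike waveform.

    The trough is the global minimum; the peak is the maximum at or after the trough,
    so the result is never negative.
    """
    trough = min(range(len(waveform)), key=lambda k: waveform[k])
    after = waveform[trough:]
    peak = trough + max(range(len(after)), key=lambda k: after[k])
    return (peak - trough) * 1000.0 / sampling_rate_hz
```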

7. "LV is a measure of regularity/burstiness of spike train and is proportional to the square of the difference divided by sum of two consecutive interspike intervals (Shinomoto et al., 2009)."

This sentence should go in the main text. The reason being; the way LV is described in the main text makes it sound like LV and CV measure the same things: "regular or variable interspike intervals (local variability 'LV'), or more or less variable firing relative to their mean interspike interval (coefficient of variation 'CV')."

We adjusted the revised text and now write:

“LV and CV are moderately correlated (r=0.26, Suppl. Figure S2E), with LV indexing the local similarity of adjacent interspike intervals, while CV is more reflective of the global variance of higher and lower firing periods (Shinomoto et al., 2009).”
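The distinction between LV and CV can be made concrete. Following Shinomoto and colleagues, LV = 3/(n - 1) Σ ((I_i - I_{i+1}) / (I_i + I_{i+1}))² over the n interspike intervals I_i, so it only compares neighboring intervals, whereas CV is the standard deviation of all intervals divided by their mean. The sketch below is a minimal standard-library implementation; using the population standard deviation for CV is an assumption.

```python
from statistics import mean, pstdev


def local_variation(isis):
    """LV: 3/(n-1) * sum(((I_i - I_{i+1}) / (I_i + I_{i+1}))**2) over n interspike intervals."""
    n = len(isis)
    if n < 2:
        raise ValueError("LV requires at least two interspike intervals")
    total = sum(((a - b) / (a + b)) ** 2 for a, b in zip(isis, isis[1:]))
    return 3.0 * total / (n - 1)


def coefficient_of_variation(isis):
    """CV: standard deviation of all interspike intervals divided by their mean."""
    return pstdev(isis) / mean(isis)
```

A perfectly regular train gives LV = CV = 0, and a Poisson train gives values near 1 for both. Slow changes in firing rate inflate CV much more than LV, whereas strictly alternating intervals (e.g., 5, 15, 5, 15 ms) give LV = 0.75 but CV = 0.5.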

8. Given how central the clustering analysis in Figure 4A is to the rest of the paper, the exact parameters that went into this analysis (HR, T4R, LV, CV, FR) should be made clear in the main text.

In addition, this clustering analysis is key to the reproducibility of e-types in other datasets. The authors have stated that "All data and code is available upon reasonable request." However, in my opinion, at least the code for the e-type clustering analysis should be made publicly available.

The code for clustering analysis is publicly available (and already used in other publications) and it is now cited explicitly in the paper along with a clarification of the parameters and reference to a new Suppl. Figure illustrating them in example sketches. We write in the revised results:

“Prior studies have distinguished different narrow spiking e-types using the cells’ spike train pattern and spike waveform duration (Ardid et al., 2015; Dasilva et al., 2019; Trainito et al., 2019; Banaie Boroujeni et al., 2020c). We followed this approach using a cluster analysis to distinguish e-types based on spike waveform duration parameters (inferred hyperpolarization rate and time to 25% repolarization, Suppl. Figure S2A,B), on whether their spike trains showed regular or variable interspike intervals (local variability ‘LV’, Suppl. Figure S2D), or more or less variable firing relative to their mean interspike interval (coefficient of variation ‘CV’, Suppl. Figure S2C).”

9. "Correlation of local variation with burst index"

Burst index is defined here, but not plotted in any figure. I suggest adding a plot depicting the relationship between local variation and burst index, which would be informative.

We added this in a new Supplementary Figure S2E and F, which shows the correlation of LV and CV (r=0.27) and of LV and burst index (r=0.443).

10. "First, we divided trials into two groups of high and low RPE and CHP values (trials were assigned based on their median value for each neuron)."

I understood RPE and Choice Probability to be values unique to each trial, rather than to each neuron? If so, the median value should be specific to each behavior session, not to each neuron? Please clarify.

The reviewer is correct; we corrected and clarified the sentence. The median of p(choice) and RPE is calculated across the trials of each session, not for each neuron.
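The corrected procedure, with the median taken over the trials of a session rather than per neuron, could be sketched as follows. Neurons recorded over the same trials of a session then share the same high/low split. The (session, value) data layout and the tie handling (trials exactly at the median count as "low") are assumptions.

```python
from collections import defaultdict
from statistics import median


def median_split_by_session(trials):
    """Label each trial 'high' or 'low' relative to the median of its own session.

    trials: list of (session_id, value) pairs, e.g. value = trial-wise p(choice) or RPE.
    Trials exactly at the session median are labeled 'low' (a tie-breaking assumption).
    """
    by_session = defaultdict(list)
    for session, value in trials:
        by_session[session].append(value)
    medians = {session: median(values) for session, values in by_session.items()}
    return ["high" if value > medians[session] else "low" for session, value in trials]
```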

11. "We included only neurons with at least 50 spikes per time window."

Does this sentence mean 50 spikes per time window per trial? For a 700ms time window, this would mean that the neuron would have to be firing at ~70Hz in order to be included in this analysis! If this sentence means 50 spike per time window across trials, please clarify. In this case, please also clarify the range of trial number that went into this analysis.

It is ≥50 spikes across trials. We adjusted the text and also added the number of available trials. We write in the methods:

“We included only neurons with at least 50 spikes across trials, using on average 44 (SE= 2) trials per condition.”
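The difference between the two readings of the criterion is simple arithmetic: the minimum mean rate needed to reach a spike count is the count divided by the total observation time. The sketch below uses the reviewer's 700 ms window and the reported average of 44 trials per condition. Since 44 is an average, the per-neuron threshold varies, and the function name is hypothetical.

```python
def min_mean_rate_hz(min_spikes, window_s, n_trials=1):
    """Lowest mean firing rate (Hz) that yields min_spikes over n_trials windows of window_s seconds."""
    return min_spikes / (window_s * n_trials)


per_trial_reading = min_mean_rate_hz(50, 0.7)                 # about 71 Hz
across_trials_reading = min_mean_rate_hz(50, 0.7, n_trials=44)  # about 1.6 Hz
```

The same arithmetic underlies the raster comparison in comment I-3: 5 spikes in 100 ms correspond to 50 spikes/s.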

Reviewer #3:

In this work, Boroujeni et al. investigated the role of different cellular subtypes in the lateral prefrontal cortex (LPFC) and anterior cingulate cortex (ACC) of the rhesus macaque as the animals performed an attention-demanding reversal-learning task. The authors use an attention-augmented reinforcement learning model to track the trial-by-trial values of key decision-making variables which were then correlated against the neural activity. The cellular population was separated into broad and narrow spiking neurons using features computed from the extracellularly recorded waveforms. The authors find that the activity of the narrow spiking cells in the LPFC is correlated with the choice probability, whereas the activity of narrow spiking cells in the ACC is correlated with reward prediction errors. Interestingly, the authors find that further splitting the population of broad and narrow spiking cells into subtypes revealed that both the choice probability in LPFC and the reward prediction error in the ACC were encoded by a specific subtype of putative interneuron. The authors show that the spike-field phase synchronization of this putative interneuron subgroup is also modulated by choice probability in the LPFC and reward prediction error in the ACC, mirroring the result from their single-unit correlation analysis. The authors use these results to propose a biologically plausible circuit model of how learning in such a task might be implemented through interneuron specific synchronization.

While many of the results in the paper seem robust, some of the conclusions drawn by the authors rest on analyses and methods that require further validation and controls.

1. The clustering of the cell population into 5 broad-spiking and 3 narrow-spiking subtypes is perhaps one of the most critical results that requires further validation, since a lot of the conclusions in the paper rely on the outcome of this analysis. The validation that the authors include in the paper (Figure S5C, S5D) addresses concerns regarding the clustering quality, but it's still unclear how meaningful this separation into these 8 clusters actually is. The clustering is also performed on the pooled data across both animals, but the authors should have also shown what the clustering looks like when performed independently on the population from each animal, and if there is a meaningful correspondence between the sets of clusters recovered in the two populations.

We addressed the specific suggestions for further validating the clustering (its reliability and its consistency within each monkey) with additional analyses and result figures. The additional validation of the cluster number and the cluster boundaries showed that they are highly reliable, also in each monkey individually. We added new Suppl. Figure S7 and new Suppl. Figure S8 and adjusted the results text by writing:

“Cluster boundaries were highly reliable (Suppl. Figure S7). Moreover, assignment of a cell to its class was statistically consistent and also reliably evident for cells from each monkey independently (Suppl. Figure S8).”

We describe the methods in a new section “Determining cluster numbers” and write:

“We used a set of statistical indices to determine a range for the number of clusters that best explains our data. […] We validated the final number of clusters using the Akaike and Bayesian information criteria, which showed the smallest values for k=8 (AIC: [-17712, -17735, -18476, -11114] and BIC: [-1.7437, -1.7368, -1.8109, -1.0747], for k = [6, 7, 8, 9]).”
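Selecting the number of clusters by an information criterion amounts to taking the candidate k with the lowest score. The sketch below applies that rule to the AIC and BIC values quoted above (the BIC values are used exactly as reported, whatever their scaling). The generic `aic`/`bic` helpers show the textbook formulas and are not the authors' code; the likelihood and parameter count of their clustering model are not given here.

```python
import math

def aic(log_lik: float, n_params: int) -> float:
    # Akaike information criterion: 2p - 2 ln L (lower is better)
    return 2 * n_params - 2 * log_lik

def bic(log_lik: float, n_params: int, n_obs: int) -> float:
    # Bayesian information criterion: p ln n - 2 ln L (lower is better)
    return n_params * math.log(n_obs) - 2 * log_lik

def best_k(ks, scores):
    # Return the candidate k whose criterion value is smallest.
    return min(zip(scores, ks))[1]

ks = [6, 7, 8, 9]
aic_reported = [-17712, -17735, -18476, -11114]
bic_reported = [-1.7437, -1.7368, -1.8109, -1.0747]
print(best_k(ks, aic_reported), best_k(ks, bic_reported))
```

Both criteria point to k=8 for the reported values.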

For the monkey specific analysis, we added to the revised methods:

“We further validated the meta-clustering results for each monkey separately. We validated the results analogous to what is described above. […] Second, validation according to the percent number of cells matches for each monkey (Suppl. Figure S8F).”

Overall, the clustering results in our manuscript are highly consistent with similar analyses performed on (at least) two different prior datasets (Ardid et al., 2015; DaSilva et al., 2019). We are currently working on a review paper that summarizes and compares the clustering based on the spike train parameters we use. Our reply to comment 7 of reviewer 2 also describes how the clustered e-types can be linked to e-types reported in recent in-vitro clustering results from mouse visual cortex (Gouwens et al., 2019).

2. Most of the follow-up analysis focuses on the comparison between one specific interneuron subtype (N3) and all broad-spiking cells. I imagine that the reason for this is two-fold: (1) the N3 subtype is the only one that showed a significant modulatory effect on the multi-unit activity (Figure 4D), and (2) it seems to be special in the sense that the activity of the N3 cells is significantly correlated with choice probability in LPFC in addition to reward prediction error in ACC. While the reasons for showing key results only for the N3-type can be appreciated, the authors should have included additional control analyses to demonstrate that their results are indeed specific to the N3 subtype. For example, in Figures 7 and 8, the authors show a comparison of the spike-LFP phase synchronization between N3 and broad spiking cells, but no further characterization of subtypes within the broad spiking cells or the other narrow spiking types (i.e. N1, N2).

We agree. That was an oversight on our part. We added multiple control analyses and results for the other cell classes. These analyses support the special role of N3 cells in LPFC and in ACC. Perhaps most relevant are the spike-LFP results in the newly added Supplementary Figures S13, S14 and S15.

Suppl. Figure S15A shows the time-frequency results for each cell class in LPFC around cue onset in low and high p(choice) trials and reveals that the gamma increase to the feature cue onset is specific to the N3 class. Suppl. Figure S15B shows the time-frequency results for each cell class in ACC around reward onset in low and high RPE trials and reveals that the gamma increase to the reward onset in high RPE trials is specific to the N3 class. These time-frequency plots are noisy because many neurons fire only a few spikes in the time windows of interest and there are only a few cells in each class.

To illustrate the effect on gamma band spike-LFP synchronization more clearly and to simplify the statistical analysis, we extracted the average gamma band effect for all cell classes and sub-conditions and summarize them directly in Suppl. Figures S13 and S14:

The gamma effects for the cell classes in LPFC are shown in Suppl. Figure S13: across low and high p(choice) and low and high RPE, the N3 class stands out by showing selectively higher gamma synchrony in the low p(choice) condition (Suppl. Figure S13E).

The results in ACC are similar: Suppl. Figure S14 shows higher gamma synchrony, particularly for the N3 class, for high RPE but not for low RPE or for high or low p(choice) (Suppl. Figure S14H).

We adjusted the revised methods sections to describe these results fairly:

For LPFC:

“… N3 e-type neurons synchronizing significantly stronger to beta than broad spiking neuron types (Figure 7F) (p<0.05 randomization test, multiple comparison corrected). […] There was no difference in gamma synchrony of other cell classes in LPFC in the 0-0.7 s after reward onset in the high or low RPE trials, or around the (0.7 s) color onset in the high p(choice) trials (Suppl. Figure S13E-H, see Suppl. Figure S15A for time-frequency maps for all cell classes around cue onset).”

For ACC:

“… the N3 e-type neurons synchronized in a 35-42 Hz gamma band following the reward onset when RPEs were high (i.e. when outcomes were unexpected), which was weaker and emerged later when RPEs were low, and which was absent in broad spiking neurons (Figure 8). […] E-type classes did not differ in their spike-LFP gamma in low RPE trials, or during the color cue period in high or low p(choice) trials (Suppl. Figure S14E-H, see Suppl. Figure S15B for time-frequency maps for all cell classes around reward onset).”

3. The authors show that the spike-field phase synchronization of the N3 subgroup is also modulated by choice probability in the LPFC (Figure 7) and reward prediction error in the ACC (Figure 8), mirroring the result from their single-unit correlation analysis (Figures 2 and 3). Unlike their firing rate analysis however, they do not show anatomical specialization in these analyses, even though the model they propose in Figure 9 clearly shows that they hypothesize this to be the case. It would be very interesting to show the analysis performed in Figure 7 for the ACC N3 population, and likewise, the analysis performed in Figure 8 for the LPFC N3 population.

We added the suggested analyses as new supplementary figures, which support the specificity of the observed effects:

Supplementary Figure S13A-E shows the reward onset aligned analysis for the LPFC broad spiking neurons and the N3 e-type in analogy to Figure 7 in the main text. There is no gamma spike-LFP synchrony for these neurons in the reward period in low or high RPE trials.

Supplementary Figure S14A-E shows the color onset aligned analysis for the ACC broad spiking neurons and the N3 e-type neurons in analogy to Figure 8 in the main text. There is no gamma spike-LFP synchrony for these neurons in the color cue period for low or high p(choice).

4. Behavior

a. In Figure 1C, I imagine that the proportion of rewarded choices at reversal (t=0, not shown) is equal to one minus the asymptotic performance? So around 0.1?

Thank you for catching this. The original plot erroneously showed a probability estimate of correct choices rather than the proportion of correct choices. We corrected the panel to show “proportion correct choices” and now also show the last trials prior to the reversal.

b. If the stimulus-reward pairings are fully deterministic, why does the monkey require so many trials (on average 7 I believe it was) to reach asymptotic performance again?

We believe the animals learn suboptimally slowly in this task because it is a difficult, attention-demanding task. Their choice is made according to the motion direction of the stimulus with the rewarded color. When a choice is not rewarded, this can be due not only to the wrong stimulus color, but also to the wrong motion direction or even to attending the wrong location. This makes the task highly demanding and dependent on a good representation of the currently rewarded color. This task is very different from spatial reversal tasks or simple object-based reversal tasks in which animals are allowed to look at the specific stimuli (in our task, animals do not make a saccade to the peripheral stimuli but to upward and downward presented response targets).

Please note that in previous work we tried fitting the behavior with optimal Bayesian models and found that they failed to account for the behavior because they learned too fast (Oemisch et al., 2019). Instead, a reinforcement learning model with a selective decay (forgetting) of nonchosen (or: non-attended) stimulus values consistently outperformed other models. We use this so-called attention-augmented reinforcement learning model to account for the monkeys' choices.
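To illustrate how trial-wise p(choice) and RPE regressors can be derived from a monkey's observed choices and outcomes, here is a minimal sketch of a feature-value learner with selective decay of non-chosen values. It is not the attention-augmented model of Oemisch et al., 2019: the learning rate `alpha`, decay rate `omega`, softmax inverse temperature `beta`, the decay toward zero, and the initial values are all assumptions chosen for illustration.

```python
import numpy as np

def softmax(values, beta):
    # Subtract the max before exponentiating for numerical stability.
    z = beta * (values - values.max())
    e = np.exp(z)
    return e / e.sum()

def run_model(choices, rewards, n_features=2, alpha=0.3, omega=0.2,
              beta=5.0, init_values=None):
    """Return trial-wise p(choice) of the chosen feature and the RPE.

    Hypothetical parameters: alpha (learning rate), omega (decay of
    non-chosen values toward 0), beta (softmax inverse temperature)."""
    v = (np.zeros(n_features) if init_values is None
         else np.asarray(init_values, dtype=float).copy())
    p_choice, rpe = [], []
    for c, r in zip(choices, rewards):
        p_choice.append(softmax(v, beta)[c])   # probability of the observed choice
        delta = r - v[c]                       # reward prediction error
        rpe.append(delta)
        v[c] += alpha * delta                  # learn on the chosen feature only
        nonchosen = np.arange(n_features) != c
        v[nonchosen] *= 1.0 - omega            # selective decay of non-chosen values
    return np.array(p_choice), np.array(rpe)

# Color 1 was rewarded before the reversal; color 0 is rewarded after it.
choices = [1, 1, 0, 1, 0, 0, 0, 0]
rewards = [0, 0, 1, 0, 1, 1, 1, 1]
p, d = run_model(choices, rewards, init_values=[0.0, 0.9])
print(np.round(p, 2))
print(np.round(d, 2))
```

In this toy block, p(choice) on rewarded trials rises from about 0.10 to about 0.96 while the RPE falls from 1.0 to about 0.26, the same qualitative anticorrelation between p(choice) and RPE that is reported for the task.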

c. Related to the previous question, is there any change in this acquisition time over the course of a session (as they experience more and more reversals)?

There is no systematic change in reversal acquisition time over the course of the session.

d. Can you show some example fits of the reinforcement learning model? For example, the choice probability and expected value as a function of the trial number around a reversal.

Thank you for this suggestion. We added some example blocks that show how the estimated RPE, the choice probability, and the value of the chosen stimulus vary trial-by-trial along with the correct/error outcome the monkey experiences. This is added in a revised Suppl. Figure S5. More extensive analyses of the model, including a comparison to alternative models that fit (and predict) the data worse, have been published previously in Oemisch et al., 2019.

We refer to the new figure panel by writing in the Results section:

“Choice probabilities (p(choice)) increase during reversal learning when reward prediction errors (RPEs) of outcomes decrease, evident in an anticorrelation of p(choice) and RPE of r=-0.928 in our task (Suppl. Figure S5A,B), with lower p(choice) (near ~0.5) and high RPE over multiple trials early in the reversal learning blocks when the animals adjusted to the newly rewarded color (Suppl. Figure S5E,F).”

5. Single Units

a. The authors correlate the neural activity with model-derived variables, such as the probability of choice and the prediction error. The distributions of these variables, however (as indicated in Figures S4B and S4C), are very skewed, and it seems like most of the variability comes from the few trials (around 10) that it takes to reach asymptotic performance after a reversal. It would be interesting to know what this correlation represents. Are the cells truly tracking small changes in the P(choice) and PE or does this reflect more of a discrete switch? Maybe the authors could show some scatters, firing rate vs. P(choice), of some example cells. How well can p(choice) and PE be decoded from the neural population?

We cannot conclusively answer how gradual or discrete the correlation is in the current dataset, given the small number of neurons, their low firing rates, and the relatively few learning blocks during which they were recorded. The average correlations of N3 firing rates with p(choice) in LPFC (r=0.08) and with RPE in ACC (r=0.09) were significant but weak.

To address the reviewer's question, we now visualize in a new Suppl. Figure S10 two example N3 e-type cells, showing the positive correlation of firing rate and RPE in ACC and the positive correlation of firing rate and p(choice) in LPFC. We pooled all trials and sorted them by their RPE or p(choice) value. We then plotted the trial rasters sorted by RPE or p(choice), together with heatmaps normalized to the pre-onset period. A decoding analysis of rate and synchrony is beyond the scope of this manuscript and will be performed comprehensively in a separate project.
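Sorting trials by a model variable for a raster display, and correlating the per-trial firing rate with that variable, can be sketched as follows. The 0-0.7 s counting window, the function name, and the use of a plain Pearson correlation per neuron are assumptions for illustration, not a description of the exact analysis behind Suppl. Figure S10.

```python
import numpy as np

def sort_and_correlate(spike_times, variable, t_start=0.0, t_stop=0.7):
    """Sort trials by a model variable and correlate firing rate with it.

    spike_times: one sequence of spike times (s, aligned to event onset) per trial.
    variable: one model value per trial, e.g. RPE or p(choice).
    Returns the trial order (ascending variable), per-trial rates (Hz) and
    the Pearson correlation between rate and variable.
    """
    variable = np.asarray(variable, dtype=float)
    counts = np.array([
        np.count_nonzero((np.asarray(st) >= t_start) & (np.asarray(st) < t_stop))
        for st in spike_times
    ])
    rates = counts / (t_stop - t_start)
    order = np.argsort(variable)             # raster rows from low to high value
    r = np.corrcoef(rates, variable)[0, 1]
    return order, rates, r

# Hypothetical example: three trials, spike times in seconds after onset.
order, rates, r = sort_and_correlate([[0.1], [0.1, 0.2], [0.1, 0.2, 0.3]],
                                     [0.9, 0.1, 0.5])
```

Indexing a trials-by-time raster with `order` gives the sorted display; with only a few low-rate trials, as the reply notes, a single correlation value can be noisy.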

6. Electrophysiology/Clustering

It seems that a lot of the results in the paper rely on clustering analysis. The authors have been cautious in their approach (i.e., validating the results), but given that a lot depends on the reliability of these results, I think it would be wise to add a few more control analyses. I am not sure how feasible these are, but worth considering nonetheless:

a. Another way of validating the clustering is to do it across animals. From what I understood, the clustering (for e-type) is done using data from both animals. How well would a clustering model fit within animals predict the clustering across animals?

In the revised manuscript we provide additional validation analyses and separately validate the clustering for the individual monkeys.

Please see our reply to comment 1 of the reviewer for details and for how we adjusted the text. For the assessment of the cluster quality for each monkey separately, please see the newly added Suppl. Figure S8E,F.

7. Spike field coherence

a. Can the authors comment on the effect of ERPs?

We assume the reviewer suggests that the onset of the reward (or color cue) triggers a low frequency evoked response that biases how neurons phase synchronize to the LFP also at other frequencies.

To address this concern, we repeated the spike-LFP analysis and the statistical evaluation after removing the average cue- or reward onset- evoked LFP. The main p(Choice) and RPE effects in LPFC and ACC were unchanged.
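The evoked-potential control can be sketched as subtracting the across-trial mean of the event-aligned LFP from every trial before estimating spike-LFP phases. Applying it per channel and per condition, the array layout (trials × samples), and the function name are assumptions here.

```python
import numpy as np

def remove_evoked(lfp_trials):
    """Subtract the evoked potential from event-aligned LFP trials.

    lfp_trials: array of shape (n_trials, n_samples) for one channel and one
    condition, aligned to cue or reward onset.
    Returns the residual LFP with the across-trial mean removed."""
    lfp_trials = np.asarray(lfp_trials, dtype=float)
    evoked = lfp_trials.mean(axis=0, keepdims=True)   # average evoked response
    return lfp_trials - evoked

# Hypothetical two-trial example with three samples per trial.
residual = remove_evoked([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]])
```

The residual LFP would then replace the raw LFP in the subsequent phase estimation, so that phase locking driven purely by a time-locked evoked waveform is reduced.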

We added this new result as a new Suppl. Figure S16, describe the methods, and summarize the result in the revised main text:

“The spike-LFP synchronization results were unchanged when the average reward onset aligned LFP, or the color-cue aligned LFP was subtracted prior to the analysis, which controls for a possible influence of lower frequency evoked potentials (Suppl. Figure S16).”

b. Simply controlling for the number of spikes between conditions is not necessarily sufficient. If you have a cell that responds to one condition but does not respond to another condition, the spikes for condition 1 are going to be much more clustered in time than for condition 2. Therefore the underlying LFP is not sampled in the same way between the two conditions.

We have only a small number of spikes for many cells and cell classes, so controlling for the number of spikes seems to be an appropriate control when quantifying and comparing spike-LFP phase consistency.
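The spike-count control discussed here can be sketched with the pairwise phase consistency (PPC), the mean cosine of the phase difference over all pairs of spikes, computed after randomly subsampling both conditions to the smaller spike count. The function names, the number of subsampling draws, and averaging PPC over draws are assumptions. For independent spikes PPC is, in expectation, not biased by spike count, so matching mainly equalizes the variance of the estimates; the sketch does not address the reviewer's separate point about temporal clustering of spikes.

```python
import numpy as np

def ppc(phases):
    """Pairwise phase consistency: mean cos(theta_j - theta_k) over pairs j < k."""
    phases = np.asarray(phases, dtype=float)
    n = phases.size
    if n < 2:
        return np.nan
    # |sum_j exp(i*theta_j)|^2 = n + 2 * sum_{j<k} cos(theta_j - theta_k)
    resultant_sq = np.abs(np.exp(1j * phases).sum()) ** 2
    return (resultant_sq - n) / (n * (n - 1))

def matched_ppc(phases_a, phases_b, n_draws=100, seed=0):
    """PPC for two conditions after subsampling both to the smaller spike count."""
    rng = np.random.default_rng(seed)
    a = np.asarray(phases_a, dtype=float)
    b = np.asarray(phases_b, dtype=float)
    n = min(a.size, b.size)

    def mean_over_draws(x):
        return float(np.mean([ppc(rng.choice(x, size=n, replace=False))
                              for _ in range(n_draws)]))

    return mean_over_draws(a), mean_over_draws(b)
```

Perfectly aligned phases give a PPC of 1, and two spikes in antiphase give -1.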

c. Is it possible to show that the spike-field coherence results are also anatomically specific? Does the synchrony of cells in the ACC and LPFC mirror the single-unit results, i.e. reward prediction error in ACC but not LPFC and choice probability in LPFC but not ACC?

We agree that this is an important point that we missed in the first submission – thank you for pointing us to this. We performed the analyses and added new supplementary figures (Suppl. Figures S13 and S14) and found that the effects are anatomically specific.

We kindly refer to the reply to comment 2 of the reviewer for details and how we adjusted the text.

[Editors’ note: what follows is the authors’ response to the second round of review.]

Reviewer #1:

This paper has made significant improvements from the previous version. Most importantly, the implementation details of the circuit simulation are clarified. The vast majority of my prior concerns have been addressed. I have only a few suggestions remaining.

1. Given that reward prediction error analysis is critical to the thesis of this paper, I am still of the opinion that it would be important to include the PSTH aligned to the reward, for narrow spiking and broad spiking neurons (as in Figure 2) as well as for important e-types (as in Figure S3).

To address the request we added to the revised manuscript information about the firing of the cell types during the reward period (new figure panels in Figure 3 and Figure5-supplement 1 (formerly Figure S9)).

First, we added the spike densities during the reward period to a revised Figure 3, and we revised the text to report the reward-related information directly by writing:

"First, we analyzed N- and B-type cell responses to the reward. In both LPFC and ACC, N- and B-type cells on average showed activation to the reward onset (p<0.05, randomization test; in LPFC, n=26 of 54 N-type and 18 of 188 B-type cells with increased firing and n=14 of 54 N-type and 5 of 188 B-type cells with decreased firing; in ACC, n=30 of 50 N-type and 13 of 216 B-type cells with increased firing and n=19 of 50 N-type and 8 of 216 B-type cells with decreased firing). However, the N- and B-type responses to the reward were not significantly different in ACC or LPFC (n.s., randomization test, Figure 3A,B)."

We also added two figure panels to Figure5-supplement 1 (formerly Suppl. Figure S9) to show reward onset related activity and write in the revised figure legend:

"(G, H) Reward-activated neurons of different e-types in LPFC and ACC. For LPFC, the average normalized firing of each e-type to the reward onset (G) shows moderately increased firing rates in most e-types. […] In ACC (H) the N2 e-type neurons showed stronger activation to the reward onset compared with other e-types (p<0.05, effect size values for B1, N2 are -0.311, -0.367 in LPFC and ACC respectively)."

We also added the effect sizes of these results to the Cohen's d effect size table which is now included in “Supplementary File 1”.

These results do not change any of the prior conclusions.
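The randomization tests reported in these replies can be illustrated with a generic two-sample permutation test on the difference in mean responses, shuffling group labels. Whether cells, trials, or condition labels were shuffled in the actual analyses is not specified here, so the labeling scheme, the number of permutations, the function name, and the add-one p-value correction are assumptions.

```python
import numpy as np

def randomization_test(x, y, n_perm=5000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns the observed difference mean(x) - mean(y) and a p-value."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = x.mean() - y.mean()
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)          # shuffle group membership
        diff = perm[:x.size].mean() - perm[x.size:].mean()
        count += int(abs(diff) >= abs(observed))
    # Add-one correction so that p is never exactly zero.
    return observed, (count + 1) / (n_perm + 1)

# Hypothetical normalized reward responses of two groups of cells.
obs, p_val = randomization_test([10, 11, 12, 13], [0, 1, 2, 3])
```

With identical groups every shuffle is at least as extreme as the observed difference of zero, so the p-value is 1.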

2. The added classifier analysis of predicting cell classes from their correlations with learning variables is very interesting. However, I am not clear on exactly what was used to train the SVM. The way I currently understand this analysis is that in LPFC, correlation between firing rate and p(choice) was calculated for each neuron – and this one-dimensional vector, the size of which is (Number of neurons)X1, was used to train the SVM. Is this the case? Please clarify.

The reviewer is correct. A vector of correlation values was used, along with a vector of cluster labels (from our clustering results), to train the SVM. The input to the SVM was therefore a 1D vector of correlation values whose length equals the number of neurons. We clarified this in the main text.
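The input described in this reply (one correlation value per neuron, paired with that neuron's cluster label) has to be passed to scikit-learn as a two-dimensional array of shape (n_neurons, 1). Below is a minimal sketch with hypothetical correlation values and labels, assuming a linear-kernel `SVC`; the kernel, the value of `C`, and the binary N3-versus-other labeling are illustrative choices, not the authors' settings.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical per-neuron correlations between firing rate and p(choice).
corr = np.array([-0.05, -0.02, 0.00, 0.01, 0.09, 0.12, 0.15, 0.20])
# Hypothetical cluster labels from the e-type clustering (1 = N3, 0 = other).
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# scikit-learn expects a 2D feature matrix: one row per neuron, one column.
X = corr.reshape(-1, 1)
# A large C keeps this tiny, separable toy set on a hard margin.
clf = SVC(kernel="linear", C=1000.0).fit(X, labels)
predicted = clf.predict([[0.18], [-0.03]])
```

With real data, held-out accuracy (for example via cross-validation) rather than training accuracy would be the relevant measure.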

3. Figure S5 E and F: it is hard to see a trend in these plots. I suggest either making the dots transparent; or plotting the data as a 2D-histogram. This way it would be possible to discern where the data is the densest.

Thank you. We added two 2D histograms for RPE and P(choice) to Figure S5 and now say in the legend:

"(G, H) 2D histograms corresponding to E and F, respectively, showing the distribution of trial-wise P(choice) (G) and RPE (H) against trials since reversal."
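A 2D histogram of this kind can be computed with NumPy's `histogram2d`, counting trials per (trials since reversal, p(choice)) bin. The data and bin edges below are hypothetical; the resulting count matrix can then be shown as an image so that dense regions become visible.

```python
import numpy as np

# Hypothetical trial-wise data: trials since reversal and model p(choice).
trials_since_reversal = np.array([0, 1, 2, 3, 4, 5, 0, 1, 2, 3])
p_choice = np.array([0.5, 0.55, 0.7, 0.85, 0.9, 0.95, 0.45, 0.6, 0.75, 0.9])

counts, x_edges, y_edges = np.histogram2d(
    trials_since_reversal, p_choice,
    # One bin per trial index (0-5) and ten p(choice) bins of width 0.1.
    bins=[np.arange(-0.5, 6.5, 1.0), np.linspace(0.0, 1.0, 11)],
)
# counts[i, j] is the number of trials in reversal-trial bin i and p(choice) bin j.
```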

4. In Methods, the equation numbers are not unique (Equations 2 and 3 each appear twice). Please correct.

Thank you for finding this issue. We corrected it.

5. The following sentences in the Supplementary Online Information need to be corrected as indicated:

"These circuit motifs are provided to provide a proof-of-concept that the observations can follows from biologically plausible motifs. These circuits motifs also provide predictions which can be tested in future studies."

We corrected the sentence. Thank you for pointing us to this issue.

Reviewer #2:

In this work, Boroujeni et al. investigated the role of different cellular subtypes in the lateral prefrontal cortex (LPFC) and anterior cingulate cortex (ACC) of the rhesus macaque as the animals performed an attention-demanding reversal-learning task. The authors use an attention-augmented reinforcement learning model to track the trial-by-trial values of key decision-making variables which were then correlated against the neural activity. The cellular population was separated into broad and narrow spiking neurons using features computed from the extracellularly recorded waveforms. The authors find that the activity of the narrow spiking cells in the LPFC is correlated with the choice probability, whereas the activity of narrow spiking cells in the ACC is correlated with reward prediction errors. Interestingly, the authors find that further splitting the population of broad and narrow spiking cells into subtypes revealed that both the choice probability in LPFC and the reward prediction error in the ACC were encoded by a specific subtype of putative interneuron. The authors show that the spike-field phase synchronization of this putative interneuron subgroup is also modulated by choice probability in the LPFC and reward prediction error in the ACC, mirroring the result from their single-unit correlation analysis. The authors use these results to propose a biologically plausible circuit model of how learning in such a task might be implemented through interneuron-specific synchronization.

The analysis is thorough and the authors present a nice narrative of the results, even though in some cases my interpretation of the data is a little more mixed than what is written in the paper. For example, the authors are eager to point out that their results are "interneuron specific" and yet the data that they show suggests otherwise. Take the spike-LFP synchronization results shown in Figure S15, where it seems that the modulation of pairwise phase consistency with p(choice) could also be present for the B1 cluster of cells in addition to the N3 group (no stats shown). The same could be true for the B2 type in the ACC, which seems to show differential effects for high and low RPE.

Are these real effects or are these anomalies that are biased by a few outliers? In either case, please clarify.

In response to the comment, we calculated (and added) the statistics for the time frequency synchronization results for all the cell classes. Statistically significant synchronization is shown as a black contour in the revised Figure7-supplement 2 (formerly figure S15).

The added statistics show that the synchronization effect in the 35-45 Hz gamma band was significant only for the N3 class. The B1 class in PFC did not show significant synchrony effects, so its apparent modulation likely reflects noisy, outlier-driven synchrony estimates. The B2 class in ACC shows a significant synchronization effect in a higher gamma band. We added this information explicitly to the main results text and write:

“Other e-type classes did not differ in their spike-LFP synchronization in this 35-45 Hz gamma band in low or high RPE trials with the exception of the B1 class in ACC that synchronized in high RPE trials at a higher >50Hz gamma band (Figure 7-supplement 3E-H, see Figure 7-supplement 2B for time-frequency maps for all cell classes around reward onset)”.

Thank you for including the new supplementary figures; I can really appreciate the additional amount of work that must have gone into preparing the new controls for the second submission of the paper. The addition of the example model fittings (Figure S5) and the correlation of the firing rate from the two example cells with the RPE and p(choice) (Figure S10) are very nice. I would recommend that the authors move the two examples in Figure S10 to one of the main figures. In the first submission, the focus of the paper was predominantly on the N3 subtype and its specialized functional properties in ACC and LPFC. The new figures however (specifically Figure S15) show that the story is a little more mixed than originally presented. B1 for example in LPFC shows differential effects for high and low P(choice) and B2 in ACC shows differential effects for high and low RPE. In any case, the new figures provide a much more complete story and I feel they made the paper stronger.

We very much appreciate the positive reception of the revision. We added to the main text that the statistical testing we added for Figure 7-supplement 2 (formerly Figure S15) indicates that the B2 class in ACC also shows a gamma effect at high RPE. The B1 class in PFC showed no significant effect. We opted to leave the example rasters in the supplementary figure because it is difficult to discern a correlation in an example raster plot, and the average results appear more convincing.

https://doi.org/10.7554/eLife.69111.sa2

Article and author information

Author details

  1. Kianoush Banaie Boroujeni

    Department of Psychology, Vanderbilt University, Nashville, United States
    Contribution
    Conceptualization, Resources, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing - original draft, Writing - review and editing
    For correspondence
    kianoush.banaie.boroujeni@vanderbilt.edu
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-2323-0648
  2. Paul Tiesinga

    Donders Institute for Brain, Cognition and Behaviour, Radboud University Nijmegen, Nijmegen, Netherlands
    Contribution
    Conceptualization, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing - original draft, Writing - review and editing
    Competing interests
    No competing interests declared
  3. Thilo Womelsdorf

    1. Department of Psychology, Vanderbilt University, Nashville, United States
    2. Department of Biology, Centre for Vision Research, York University, Toronto, Canada
    Contribution
    Conceptualization, Resources, Data curation, Software, Formal analysis, Supervision, Funding acquisition, Validation, Investigation, Visualization, Methodology, Writing - original draft, Project administration, Writing - review and editing
    For correspondence
    thilo.womelsdorf@vanderbilt.edu
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-6921-4187

Funding

National Institute of Mental Health (MH123687)

  • Thilo Womelsdorf

Canadian Institutes of Health Research (MOP 102482)

  • Thilo Womelsdorf

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

The authors thank Mariann Oemisch and Ali Hassani for help with the study. This work was supported by the National Institute of Mental Health of the National Institutes of Health under Award Number R01MH123687 and a grant from the Canadian Institutes of Health Research (MOP 102482). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or the Canadian Institutes of Health Research.

Ethics

Animal experimentation: All animal care and experimental protocols were approved by the York University Council on Animal Care (ethics protocol 2015-15-R2) and were in accordance with the Canadian Council on Animal Care guidelines.

Senior Editor

  1. Michael J Frank, Brown University, United States

Reviewing Editor

  1. Saskia Haegens, Columbia University College of Physicians and Surgeons, United States

Publication history

  1. Received: April 5, 2021
  2. Accepted: June 17, 2021
  3. Accepted Manuscript published: June 18, 2021 (version 1)
  4. Version of Record published: July 1, 2021 (version 2)

Copyright

© 2021, Banaie Boroujeni et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 1,208
    Page views
  • 164
    Downloads
  • 1
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

