ACC neural ensemble dynamics are structured by strategy prevalence

  1. Mikhail Proskurin
  2. Maxim Manakov
  3. Alla Karpova (corresponding author)
  1. Janelia Research Campus, Howard Hughes Medical Institute, United States
  2. Department of Neuroscience, Johns Hopkins University Medical School, United States

Abstract

Medial frontal cortical areas are thought to play a critical role in the brain’s ability to flexibly deploy strategies that are effective in complex settings, yet the underlying circuit computations remain unclear. Here, by examining neural ensemble activity in male rats that sample different strategies in a self-guided search for latent task structure, we observe robust tracking during strategy execution of a summary statistic for that strategy in recent behavioral history by the anterior cingulate cortex (ACC), especially by an area homologous to primate area 32D. Using the simplest summary statistic – strategy prevalence in the last 20 choices – we find that its encoding in the ACC during strategy execution is wide-scale, independent of reward delivery, and persists through a substantial ensemble reorganization that accompanies changes in global context. We further demonstrate that the tracking of reward by the ACC ensemble is also strategy-specific, but that reward prevalence is insufficient to explain the observed activity modulation during strategy execution. Our findings argue that ACC ensemble dynamics is structured by a summary statistic of recent behavioral choices, raising the possibility that ACC plays a role in estimating – through statistical learning – which actions promote the occurrence of events in the environment.

Editor's evaluation

This manuscript posits a novel role for the anterior cingulate cortex (ACC) in coding for sequential action strategies and the prevalence of each strategy. These findings provide important insight into ACC function and will therefore be of broad interest within the field of cognitive neuroscience. The evidence supporting the primary hypothesis is convincing.

https://doi.org/10.7554/eLife.84897.sa0

Introduction

Flexibility in choosing one’s behavioral strategy is a foundational characteristic of intelligent behavior, enabling rapid detection and adaptation to changes in the environment. In mammals, the brain’s ability to adaptively change behavior is thought to depend on the coordinated action of medial frontal cortical areas that keep track of the information necessary for context-appropriate choices of strategy (Domenech et al., 2020; Donoso et al., 2014). Functional imaging studies in human subjects and non-human primates have suggested that the anterior cingulate cortex (ACC), in particular, is a critical cortical node that translates contextual information into strategy changes (Domenech et al., 2020; Donoso et al., 2014; Hayden et al., 2011; Kolling et al., 2012; Sarafyazd and Jazayeri, 2019; Seo et al., 2014). Less clear is what specific computations are instantiated in ACC neural ensemble dynamics, in part because analyses of the diverse neural responses in ACC have yielded few simplifying principles. Previous interpretations of the diversity of individual neural responses in frontal regions such as the ACC have included a direct role in enabling the separation of distinct contexts (Rigotti et al., 2010), and in tracking the animal’s evolving motor state (Musall et al., 2019; Stringer et al., 2019). However, unexpectedly specific ACC responses associated with seemingly self-directed strategic choices have also been observed in tasks that relaxed some of the experimental control over subjects’ behavioral responses (Schuck et al., 2015; White et al., 2019), raising the possibility that more structured responses in the ACC remain to be discovered.

One hemodynamic study of the frontal cortical engagement in tasks with less external instruction highlighted the possibility that ACC is specifically engaged during self-guided exploration, with both the self-generation of a specific information-sampling strategy and its evaluation contributing to the observed signal (Walton et al., 2004). The advantage of choosing one’s strategy for sampling information when learning in complex settings has long been appreciated (Markant and Gureckis, 2014); indeed, the idea that subjects learn more effectively when they self-direct their learning experience is central to many educational philosophies (Boekaerts, 1997; Bruner, 1961). Implicit in these notions is a key role for the self-guided component of knowledge acquisition. As such, it is notable that the medial frontal lobe, including the supplementary motor cortex (SMC) and the ACC, have been proposed to process information related to self, and to implement flexible, self-guided actions (reviewed in Passingham et al., 2010, but see Schüür and Haggard, 2011). The cingulate region also features prominently in emerging efforts to experimentally expose task-related information seeking and to uncover its neural underpinnings (Wang and Hayden, 2020; White et al., 2019). Thus, evaluating neural activity in behavioral settings that require self-directed discovery of effective strategies may represent a fruitful direction in the search for organizing principles of frontal cortical neural ensemble dynamics.

In this study, we incorporate self-guided search for task structure into the experimental design by requiring rats – in an apparatus that has ‘left’ and ‘right’ nose ports – to discover (without any explicit instruction) a specific rewarded sequence of ‘Left’/‘Right’ choices from a larger set of structured possibilities. During behavioral sessions that incorporate blocks where the latent target sequence remains the same over 200–500 trials, rats infer – in each block – the specific target sequence through self-guided exploration, strongly biasing their choices accordingly within tens of trials following unsignalled block transitions. Nevertheless, throughout each block, rats also continue to periodically sample alternative sequences, favoring those previously experienced as other latent targets. As such, each sequence in the task set of structured possibilities gets executed in several discrete global contexts – which we define as the inferred target sequence – as well as in a constantly changing local context – which we define as the specific set of self-guided sequence choices in the recent past. We demonstrate that under these conditions, ensemble activity in ACC during the execution of a specific sequence tracks both the global and the local context in which that sequence is being executed. In particular, a summary statistic that we capture as the prevalence of that sequence in the last 20 trials can be reliably decoded from the ACC ensemble activity throughout the extent of sequence execution. Remarkably, the tracking during sequence execution of the local sequence prevalence persists even as ACC activity is markedly re-organized between global contexts. We further demonstrate that reward is also tracked in the ACC in relation to specific discovered sequences but that reward tracking alone is insufficient to explain the observed prevalence encoding during sequence execution. 
Our findings demonstrate that the ACC continuously represents a summary statistic reflecting the local prevalence of self-guided behavioral strategies, suggesting that such encoding of recent self-action might be a fundamental part of the animals’ algorithmic approach to discovering adaptive strategies in complex settings.
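
The local prevalence statistic described above can be illustrated with a minimal Python sketch. The function name `sequence_prevalence`, the encoding of choices as a string of ‘L’ and ‘R’ characters, and the decision to count overlapping matches as a raw count are all assumptions made for illustration; the exact definition used in the analyses is given in the Methods and may differ.

```python
def sequence_prevalence(choices, sequence, window=20):
    """Count how often `sequence` occurs within the last `window` choices.

    Assumption for illustration: every start position in the trailing
    window is checked, so overlapping matches are counted, and the raw
    count (not a normalized fraction) is returned.
    """
    recent = choices[-window:]          # trailing window of recent choices
    n = len(sequence)
    starts = len(recent) - n + 1        # number of possible start positions
    if starts <= 0:
        return 0
    return sum(1 for i in range(starts) if recent[i:i + n] == sequence)


# A choice stream dominated by 'RRL' yields a high count for 'RRL'
# and a zero count for 'LLR' over the same 20 choices.
stream = "RRL" * 6 + "RR"
rrl_count = sequence_prevalence(stream, "RRL")
llr_count = sequence_prevalence(stream, "LLR")
```

Because the window slides with every choice, the same sequence can be executed at many different prevalence values within a single block, which is the local contextual variation that the decoding analyses exploit.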

Results

Behavioral task for self-guided search for task structure and strategy encoding in ACC

To examine frontal cortical neural dynamics during self-guided exploration of a complex environment, we developed a behavioral task that requires rats to discover, without any explicit instruction, a specific sequence of ‘Left’/’Right’ choices that is preferentially rewarded. A typical behavioral session in this framework comprises a series (750–2000) of self-initiated trials that involve a choice between two options – left or right nose port – with a reward being delivered at the choice port upon execution of the latent (i.e. not explicitly revealed) target sequence of choices – such as right-right-left, ‘RRL’. Our behavioral paradigm requires center-port entries between each choice of a side port. Hence, execution of the length 3 sequence ‘RRL’ requires the series of 6 port entries: cRcRcL (Figure 1a). The requirement for center port entries between each side port entry was chosen in part to control the movements associated with the execution of a given sequence; with this requirement, the rat’s movements to select each instance of ‘R’ in the sequences ‘RRL’, ‘LLR’, etc., are behaviorally constrained to involve withdrawing from the center port, then moving to and entering the right port. This center port requirement, therefore, facilitates the separation of neural signals specifically associated with the sequence of ‘R’ and ‘L’ choices from the movements associated with selecting them (see below).
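
As an informal illustration of the task logic described above, the following Python sketch expands a choice sequence into its port entries and checks whether the most recent choices match the latent target. The function names are hypothetical, and the sketch deliberately ignores the sporadic omission of reward and the block structure of a session; it is not the task control code.

```python
def port_entries(sequence):
    """Expand a choice sequence into port entries, inserting the required
    center-port entry ('c') before each side-port choice."""
    return "".join("c" + choice for choice in sequence)


def is_reward_eligible(choice_history, target):
    """Return True when the most recent choices spell out the target.

    Assumption for illustration: eligibility depends only on the last
    len(target) choices; reward omission is not modeled.
    """
    if len(choice_history) < len(target):
        return False
    return choice_history[-len(target):] == target


entries = port_entries("RRL")                   # 'cRcRcL', as in Figure 1a
eligible = is_reward_eligible("LRRL", "RRL")    # last three choices are 'RRL'
```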

Figure 1 (with 3 supplements)
Strategy encoding in the ACC.

(a) Left panel: Concept of the behavioral task. After initiating at the center port, the animal is eligible to receive a reward only if his sequence of past choices conforms to a latent target sequence, like ‘Left-Left-Right’. Note that the identity of the latent target is not otherwise cued in any way. Liquid reward was delivered directly at choice ports. Right panel: Schematic of the notation used for behavioral data presentation. Note that nose port entries are omitted from schematics in other panels for simplification. (b) Probability of target sequence concatenations across the behavioral dataset. Shuffle: randomized trial numbers. (c) Sample behavioral trace, in trial time, around two block transitions. (d) Cross-validated performance errors for linear classifiers trained to distinguish components of different strategies based on ACC neural activity in a decoding window anchored on center port initiation entry. n=36 sessions, N=4 animals for all basic sequence task comparisons; n=9 sessions, N=3 animals for competitor sessions; n=11 sessions, N=1 animal for circularly permuted strategies; n=8 sessions, N=3 animals for decoding of the 1st R in ‘RRL’ vs ‘RLL’ (or L in ‘LLR’ vs ‘LRR’). See legend in Figure 1—figure supplement 3 for more details on the non-sequential task. (e) Activity traces for an example ACC neuron in the 2 second window around center port entry on the R1 step of ‘RLL’ and ‘RRL’ sequences, a step matched both in immediate history and distance to reward. Black traces: ‘RLL’ rule; blue traces: ‘RRL’ rule. ***, p<0.001.

To ensure that animals continue to self-direct a search for the relevant task structure throughout each behavioral session, we changed the latent target sequence identity in an unsignalled manner every 250–500 trials (Methods). To collect a sizable amount of reward in this task, an animal needs to discover the target sequence and locally structure its choices to preferentially conform to the target sequence, yet remain flexible enough to efficiently adapt to unsignalled changes in its identity (Figure 1a–b). For most experiments in this study, we restricted the set of possible latent target sequences in any individual session to some, or all, of the four non-trivial three-step sequences (‘LLR’, ‘RRL’, ‘LRR’, and ‘RLL’). Under these conditions, expert animals flexibly and reliably discovered and exploited the locally-relevant latent target: within tens of trials following block transitions, the animals’ choices typically became dominated by the new target sequence, even in the presence of substantial (up to 30%) sporadic omission of reward for the properly executed target sequence (Figures 1b–c and 2b, Figure 1—figure supplement 1).

Figure 2
Activity of ACC neurons associated with a specific sequence of actions changes depending on whether that sequence represents the dominant strategy or is a transiently re-explored alternative.

(a) Left panel: Concept of the behavioral task (same as in Figure 1). Right panel: (Top) Block-wise structure promoted the local pursuit of one dominant sequence (here, ‘Left-Left-Right’, as shown in the thought bubble) at the expense of others. The dominant strategy was occasionally interrupted by explorations of alternative sequences (here, likely ‘Right-Right-Left’, blue shading). (Bottom) Five analysis windows chosen to minimize trajectory confounds were anchored on center- and side-noseport entry events associated with each of the three steps in the sequence, with the exception of the first center port entry. The latter omission was chosen to minimize the contribution from feedback-related activity modulation associated with preceding choices. (b) Top panel: Sample behavioral trace, in trial time, around a block transition. Bottom panel: boxed region of the behavioral trace, in trial time. Note a marked preference at the beginning of the behavioral trace for ‘Right-Right-Left’ and at the end for ‘Left-Left-Right’. Putative exploratory sequences are shaded blue (see Methods for details about how exploratory sequences were identified). *, rejected as an exploratory sequence because it overlapped with the dominant sequence to its left, and did not have the temporal profile consistent with sequence marking (see text) to be rescued from the ‘discard’ group; **, rejected as a putative exploratory sequence because of an overlap with the dominant sequence to its left, and an unusually long break between putative steps 1 and 2. (c) Distribution of local sequence prevalence values for ‘Left-Left-Right’ and ‘Right-Right-Left’ across all dominant and exploratory instances in implanted animals. (d) Activity of an example ACC neuron for three concatenated unrewarded instances of ‘Right-Right-Left’ at the end of the dominant epoch in (b) (left panel) and during an exploratory bout following subsequent dominance of ‘Left-Left-Right’ (right panel). 
Dashed lines indicate beam breaks at port entries. Bars at the top of the panel: raw spike train. (e) Activity of two other ACC neurons from the behavioral session in (b), aligned in the five one-second analysis windows anchored on port entry events. Grey: dominant instances of ‘Right-Right-Left’ that followed an unrewarded ‘Right-Right-Left’. Blue: exploratory instances of ‘Right-Right-Left’. (f) Fraction of all recorded ACC units that displayed a significant modulation between ‘dominant’ and ‘exploratory’ contexts for each of the five analysis windows. Individual points correspond to different sessions. Error bars represent standard deviation. n=35 sessions, N=4 animals.

The observed behavioral flexibility did not result from an inability to commit to the discovered target sequence. Indeed, the target sequence clearly dominated behavioral choice streams during blocks of stability, with long concatenations of the target sequence frequently evident in the behavioral records of expert animals (Figures 1b–c and 2b, Figure 1—figure supplement 1b). Nevertheless, clear deviations from this dominant pattern, with the animals’ choices appearing instead to conform to other possible target sequences, were also present within all blocks (Figures 1c and 2a–c). Although some of the deviations from the currently rewarded target sequence may represent errors of execution, the presence of bouts that often contain direct concatenations of previously reinforced sequences (Figure 1—figure supplement 2a–d) argues that at least some of these deviations represent transient exploratory resurgence of alternative sequences. Furthermore, while such transient deviations away from the dominant sequence were significantly more likely to follow the absence of an expected reward (Figure 1—figure supplement 2i), similar strategy deviations were present when no reward was omitted (Figure 1—figure supplement 2f–h), suggesting that animals continue to sporadically sample other sequences even when not extrinsically prompted. Thus, with only two well-defined individual actions (‘left’ and ‘right’), this framework provides a means to engineer a rich and flexible repertoire of multi-step sequences that animals evaluate in various behavioral contexts. As such, this setting presented us with an opportunity to evaluate whether the richness of contextual variation that accompanies the added agency of self-guided strategy selection is reflected in the ACC ensemble dynamics. 
Indeed, the ACC is thought to play a central role in motivating extended, multi-step behaviors (Holroyd and Yeung, 2012), and our previous work demonstrated the rodent ACC homologue’s involvement in unguided discovery of action sequences (Tervo et al., 2021). Nevertheless, before examining any putative encoding of contextual variation, we first verified that the specific choice of sequence in this more complex sequence task is reflected in the ACC neural dynamics by establishing that individual sequences could be decoded from ensemble activity. For these sequence decoding analyses, we focused on sequence instances that both matched the latent rewarded target and were executed after that sequence had established dominance in the animal’s choices.

A simple linear classifier could reliably distinguish ACC activity between ‘LLR’ and ‘RRL’ sequences – the two main target sequences in our dataset – when provided with firing rates of all active cells in 500 ms windows centered on side port entries for the three steps in each sequence (Figure 1—figure supplement 3; median cross-validated classification error 0.0, IQR 0.006, Methods). Robust classification was also achieved when the decoding window for each of the three sequence steps was instead anchored on the initiation (center) port entry common to all trials, prior to when the overt choice on that trial was made by the animal (Figure 1d; median cross-validated classification error 0.0, IQR 0.007, Methods). Nevertheless, any interpretation of the decodability of the full ‘LLR’ vs ‘RRL’ ACC representations is confounded by a strong difference in the encoding of the individual steps composing the sequences (‘L’ vs ‘R’; Figure 1d, Figure 1—figure supplement 3; median cross-validated classification error 0.008, IQR 0.016 and 0.009, 0.0017 for side port-centered and center port-centered decoding windows, respectively). To address this confound, we tested whether the encoding of the same action, such as ‘R’, differs between ‘LLR’ and ‘RRL’ sequences. We indeed found that even controlling for the specific action in this manner, the corresponding neural activity could be used to accurately decode whether that action was embedded in ‘LLR’ versus ‘RRL’ sequences (Figure 1d, Figure 1—figure supplement 3; median cross-validated error 0.10, IQR 0.14 and 0.08, 0.13 for side port-centered and center port-centered decoding windows, respectively). 
Moreover, when we performed an equivalent analysis for all sets of consecutive ‘Left’, ‘Left’, ‘Right’ and ‘Right’, ‘Right’, ‘Left’ triplets found in the choice streams generated when animals made largely independent choices on individual trials (see a detailed explanation in the legend for Figure 1—figure supplement 3), we found a significantly higher classification error (Figure 1d, Figure 1—figure supplement 3; median cross-validated error 0.42, IQR 0.05 for the non-sequential setting vs 0.10, IQR 0.14 for the sequence task, p<10⁻¹², Wilcoxon rank-sum test), suggesting that the differential encoding of ‘R’ in ‘LLR’ vs ‘RRL’ reflects some aspect of the task structure.
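
The logic of a cross-validated decoding error can be sketched in a few lines of Python. This sketch swaps in a nearest-centroid rule, which for two classes is a linear decision boundary, together with leave-one-out cross-validation; the classifier and cross-validation scheme actually used are described in the Methods and may well differ. The feature layout (one firing-rate vector per sequence instance) and all function names are assumptions made for illustration.

```python
import math


def centroid(rows):
    """Mean vector of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def loo_error(features, labels):
    """Leave-one-out error of a nearest-centroid classifier.

    Each instance is held out in turn, class centroids are computed from
    the remaining instances, and the held-out instance is assigned to the
    class with the closest centroid (Euclidean distance).
    """
    errors = 0
    for i, (x, y) in enumerate(zip(features, labels)):
        train = [(f, lab) for j, (f, lab) in enumerate(zip(features, labels))
                 if j != i]
        centroids = {
            lab: centroid([f for f, l in train if l == lab])
            for lab in {l for _, l in train}
        }
        predicted = min(centroids, key=lambda lab: math.dist(x, centroids[lab]))
        errors += predicted != y
    return errors / len(features)


# Toy firing-rate vectors (two hypothetical units) for six sequence instances.
rates = [[5, 1], [6, 1], [5, 2], [1, 5], [1, 6], [2, 5]]
which = ["LLR", "LLR", "LLR", "RRL", "RRL", "RRL"]
error = loo_error(rates, which)
```

An error near 0.5 for two balanced classes would indicate chance-level decoding, which is the reference point for the 0.42 error reported for the non-sequential setting.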

The observed differences in ACC representations for a given ‘L’ (or ‘R’) action within the sequences ‘LLR’ and ‘RRL’ could reflect the differential encoding of distinct strategies that are currently being pursued. However, the ‘L’s (or ‘R’s) in these different sequences are also associated with differences in the immediate history of other actions (‘L’ in ‘RRL’ follows an ‘R’, while one of the ‘L’s in ‘LLR’ follows an ‘L’) and in their proximity of rewards (no ‘R’ in ‘RRL’ is ever rewarded, whereas most ‘R’s in ‘LLR’ are). We therefore carried out additional analyses to ask whether either of these factors – surrounding actions, or reward proximity – could account for the apparent strategy encoding in ACC, taking advantage of a subset of sessions that tasked our animals with discovering additional three-letter target sequences.

To test for the influence of surrounding actions on sequence encoding, we examined behavioral epochs when distinct latent targets were associated with matching (but permuted) sequences of actions leading rats to produce matched behavioral streams despite pursuing different target sequences. For example, rats that discovered the latent target ‘RRL’ often evidenced this by producing long concatenations of the correct sequence (‘…RRLRRLRRL…’). When these same rats discovered the different target ‘LRR’ and adopted the appropriate and distinct dominant strategy of repeating the ‘LRR’ sequence, the resulting behavioral stream matched the one previously observed with the ‘RRL’ target (‘…(L)RRLRRLRRL(RR)…’). In such cases, we found that the currently dominant target sequence still could be readily decoded from ACC activity during individual actions such as ‘L’ despite their being embedded in indistinguishable choice streams (e.g. ‘…RRLRR…’; Figure 1d, Figure 1—figure supplement 3; median cross-validated classification error 0.10, IQR 0.04 and 0.11, 0.05 for side port-centered and center port-centered decoding windows, respectively).

We adopted a similar approach to examine whether the ability to decode the currently dominant strategy persisted when controlling for both the immediate past choice and for proximity to reward. For example, we assessed how well the activity during the first ‘L’ could be used to decode a dominant strategy of ‘LRR’ versus ‘LLR’. In this case, during concatenations of the currently rewarded sequences, the proximity from the first ‘L’ to reward delivery is matched (as is the immediately preceding action). Here too, despite controlling for reward proximity, the currently dominant strategy could readily be decoded (Figure 1d, Figure 1—figure supplement 3; median cross-validated classification error 0.13, IQR 0.09 and 0.14, 0.11 for side port-centered and center port-centered decoding windows respectively). Combined, these observations argue that individual multi-step sequential strategies have distinct ACC representations, permitting us to next evaluate whether the ACC representation of any specific sequence is further modulated by the specific context in which a particular instance of that sequence is being executed.

Identification of distinct sequence strategies in the behavioral stream

To set the stage for examining contextual modulation of ACC’s sequence representation, we first sought to identify all instances of individual sequence targets in the task set. The robust performance our rats displayed on this task (Figures 1b and 2c, Figure 1—figure supplement 1) suggests that tasks requiring discovery of latent structure through self-guided exploration align particularly well with how brains naturally make sense of complex environments (Gottlieb and Oudeyer, 2018; Tervo et al., 2016; Wang and Hayden, 2021). However, such an experimental framework poses challenges for parsing the behavioral stream to identify specific sequence instances evaluated by the animal. Parsing a continuous stream of left and right choices to identify ‘legitimate’ sequence instances is easy for the target sequence once it has been discovered and has locally established dominance in the animal’s choices. Indeed, a high prevalence of target sequence concatenation under such ‘dominant’ conditions, and a scarcity of choices that conform to other patterns, argue that almost every instance of a pattern conforming to the latent target is likely to be an instance of that sequence actually evaluated by the animal (see Methods for a detailed description of the filter used to select ‘dominant’ sequence instances). In contrast, parsing is much harder for the deviations from the dominant target outside of clear sequence concatenations. This is particularly challenging in cases where the task set contains circularly permuted strategies: does an ‘LRRL’ deviation during an ‘LLR’ block reflect an ‘LRR’ instance, an ‘RRL’ instance, or neither? While the additional uncertainty inherent in parsing circularly permuted sequences can be resolved by focusing on sessions that contained only ‘LLR’ and ‘RRL’ blocks, not even every apposition of ‘Left’, ‘Left’, and ‘Right’ choices in an ‘RRL’ block will reflect an exploratory instance of the ‘LLR’ sequence. 
We therefore next sought to delineate an objective criterion for including any lone putative exploratory sequence instance in the subsequent investigation; with the exception of a few pre-specified control experiments, the remaining analyses are restricted to ‘LLR’/’RRL’ block switches.

Our approach was grounded in the expectation that animals would pause, if only briefly, at the side nose port on the last step of a true exploratory sequence instance, marking sequence completion. Under this assumption, putative exploratory sequence instances for which the duration of the third step exceeds a preset threshold, can be objectively included in the ‘exploratory’ dataset (i.e. the set of instances for the sequence of interest executed in the global context of another inferred latent target). Indeed, clear shifts to longer within-side-port and side-to-side durations for step three – as compared to steps one and two – were present across all ‘dominant’ sequence instances for which the otherwise scheduled reward was omitted (Figure 1—figure supplement 2, Methods), permitting an unbiased selection of the specific threshold. The ‘exploratory’ dataset thus included sequence concatenations and lone sequence instances that passed the temporal selection threshold (see Methods for the full description of the selection procedure). Overall, the rich contextual variation associated with the execution of different instances of any specific target sequence in both ‘dominant’ and ‘exploratory’ conditions provided us with an opportunity to determine whether examining frontal cortical neural activity through the lens of this natural contextual variation might reveal dynamics that would shed additional light on the computations performed by these circuits.
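
The temporal inclusion criterion described above amounts to a simple filter on the duration of the third step. The following Python sketch shows the idea only: the record layout (`step_durations`, in seconds), the function name, and the threshold value are hypothetical, and the full selection procedure in the Methods includes additional conditions not modeled here.

```python
def select_exploratory(instances, threshold_s):
    """Keep putative exploratory instances whose third-step duration
    exceeds the threshold, taken here as a marker of sequence completion."""
    return [inst for inst in instances
            if inst["step_durations"][2] > threshold_s]


# Hypothetical putative instances with per-step durations in seconds.
putative = [
    {"id": 1, "step_durations": [0.3, 0.4, 1.2]},  # long pause on step three
    {"id": 2, "step_durations": [0.3, 0.3, 0.4]},  # no pause on step three
]
kept = select_exploratory(putative, threshold_s=0.8)  # threshold is illustrative
```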

Functional reorganization of the ACC network constrains its representation of a specific strategy in distinct inferred global task contexts

ACC is thought to guide contextually-appropriate strategy selection (reviewed in Heilbronner and Hayden, 2016; Holroyd and Verguts, 2021; Kolling et al., 2016; Monosov et al., 2020; Shenhav et al., 2016), prompting us to begin our analysis of task-related neural responses by evaluating how global behavioral context is reflected in ACC ensemble dynamics; we define the global context by the sequence that locally dominates the animal’s choices – likely a manifestation of the animal’s inference about the currently relevant task structure. Specifically, we first sought to determine if ACC activity associated with a specific behavioral sequence is reorganized depending on whether that sequence represents a dominant behavioral strategy or is sampled as a part of an exploratory bout, without classifying the latter global context further according to the identity of the inferred target sequence. Indeed, while abrupt changes in the activity of ACC neurons coincident with behavioral transitions to exploration have been reported previously (Durstewitz et al., 2010; Emberly and Seamans, 2019; Karlsson et al., 2012; Powell and Redish, 2016), whether these transitions in ACC activity merely mark the behavioral state change, or reflect an actual task-related re-organization of activity whereby representations of individual strategies are also marked with contextual content, remains unclear. Targeted recordings of ACC activity – performed in a wireless configuration that did not impair the animals’ behavioral flexibility – revealed that marked differences in activity associated with the execution of specific sequences in ‘dominant’ versus ‘exploratory’ contexts could indeed be readily observed across many ACC neurons, frequently evident even in non-trial averaged activity traces (Figure 2d). 
To facilitate detailed comparisons of neural activity in the face of inevitable variability in the spatio-temporal profiles of movements between nose ports for different instances of sequence execution, we have focused all analyses on five 500 millisecond windows anchored on center and side noseport entry events associated with the individual steps in the sequence (Figure 2a, see also a more detailed evaluation of potential motor confounds below).

An alignment of sequence-related activity in the five constrained analysis windows across different sequence instances exposed various forms of contextual activity modulation in individual cells (see Figure 2e for alignment of spike rasters, Figure 3b for alignment of heatmap representations). Indeed, decreases, increases, and mixed modulations of activity in the exploratory context, relative to that observed when the sequence represented the dominant strategy, were all observed (see Figures 2e and 3a for examples). At least 76% of all recorded units (787 of 1042 total ACC recorded units, with no separation of potential interneurons) displayed significant modulation in at least one of the five analysis windows – a lower bound given the limitation that the modest size of the exploratory dataset places on the statistical power of these analyses – and a roughly equal fraction of all units displayed modulation in each of the analysis windows (Figure 2f). Thus, activity changes related to the inferred global task context are present at the single-cell level in many ACC neurons.

Figure 3 (with 2 supplements)
Representational transitions in ACC reflect large-scale functional reorganizations of the ACC network between inferred global behavioral contexts.

(a) Schematic of the activity state space for an individual neuron. Three of the five dimensions corresponding to the analysis windows are shown. Two clouds schematize activity of that neuron associated with a specific behavioral sequence for all dominant (grey) and exploratory (blue) instances of the sequence. (b) Heat map representations of the normalized activity of 58 simultaneously recorded ACC neurons during ‘Right-Right-Left’ sequence execution. Different sequence instances are stacked vertically, with two ‘dominant’ blocks separated by a period when the ‘Right-Right-Left’ sequence was occasionally explored in the background of ‘Left-Left-Right’ dominance. ‘exp’, ‘exploratory’ instances. Neurons are arranged according to a ‘transition score’ defined as the distance between the two cloud centroids normalized by root mean of variance within each cloud (see Methods). (c) Similarity matrix for an example session, comparing the ACC ensemble activity across ‘Right-Right-Left’ instance pairs, using Euclidean distance in the network state space. Black lines indicate the boundary between ‘dominant’ and ‘exploratory’ contexts. (d) Euclidean distance (RRL instance-to-instance) in the state space between the relevant ‘dominant’ and ‘exploratory’ clouds for the experimental ACC data, and for the control state space, where the labels of ‘dominant’ and ‘exploratory’ (or of the specific ‘exploratory’ context) were randomly shuffled across the dataset. n=35 sessions, N=4 ACC-implanted animals. (e) Behavior of the ACC ensemble during persistence of the dominant strategy past the unsignalled transition in the rewarded target. (Top panel) Example behavioral transition with a long dominant RRL ‘tail’. (Bottom left panel) Heat-map representation of the activity of 5 ACC cells for consecutive RRL instances before (i.e. when RRL dominance coincided with it being the rewarded target, ‘before’ in panel), during (when RRL continued to dominate the animal’s choices but the target sequence had changed, ‘tail’ in panel) and after the behavioral transition. Note that the later set of examples included several exploratory instances (executed much later, once another sequence had established dominance, ‘exp’ in the panel), as well as several of the subsequent instances of ‘dominant’ RRL later in the session. ‘exp’, ‘exploratory’; ‘dom’, ‘dominant’. (Bottom right panel) RRL instance-to-instance Euclidean distance in the state space across these distinct epochs for all long ‘dominant tail’ examples. n=12 sessions, N=4 animals. (f) Behavior of the ACC ensemble during ‘ON-target’ and ‘OFF-target’ persistence with a dominant ‘LLR’ sequence. (Top panel) Example behavioral trace from two ‘ON-target’ epochs and one ‘OFF-target’ epoch within the same behavioral session. Note that the animal had responded to a block change and adjusted his strategy before settling into an ‘OFF-target’ persistence with ‘LLR’ (middle epoch). (Bottom left panel) Heat-map representation of the activity of nine ACC cells for LLR instances in the three epochs. (Bottom right panel) Instance-to-instance Euclidean distance in the state space between ‘ON’ and ‘OFF’ contexts and between two ‘ON’ contexts. n=4 sessions, N=2 animals. (g) Euclidean distance between the centroids of the ‘dominant’ and ‘exploratory’ clouds for the experimental ACC, M2, and SMC data, and for the control state spaces, where the labels of ‘dominant’ and ‘exploratory’ were randomly shuffled across the dataset. n=37 sessions, N=4 ACC/M2-implanted animals; n=18 sessions, N=3 SMC-implanted animals. Error bars represent standard deviation. n.s., not significant, ***, p<0.001.

To examine these contextual changes in the ACC representation associated with a specific behavioral sequence in more detail, we next evaluated population responses in a neural state space. For these analyses of potential contextual re-configurations, we used a state-space framework that assigns the mean firing rate of an individual neuron in each of the five analysis windows to a single dimension. Consequently, a point in this five-dimensional state space captures the activity of that neuron during one instance of sequence execution (Figure 3a–b, Methods). The ensemble representation, in turn, is captured by further expanding the state space dimensionality to include these five dimensions for each of the simultaneously recorded neurons (Figure 3c). When we used similarity matrices to visualize instance-to-instance distance between ACC representations for a specific sequence in the full state space, a substantial separation of the two clusters formed by the ‘dominant’ and ‘exploratory’ instances of sequence execution was readily evident (Figure 3d). Indeed, the Euclidean distance between the centroids of these experimentally observed ‘dominant’ and ‘exploratory’ groups differed markedly from that observed when the ‘dominant’ and ‘exploratory’ labels were randomly shuffled across the set of all instances of sequence execution (Figure 3d, 0.73+/-0.18 for experimental data vs 0.268+/-0.077 for shuffled data, p<10⁻¹², Wilcoxon rank-sum test, n=35 sessions, N=4 animals; 2.6+/-1.9 block changes per session). Moreover, the mean Euclidean distance in the activity state space between two ‘dominant’ blocks separated in time was significantly smaller than the mean pairwise distances between a dominant and exploratory block, arguing against the possibility that the observed representational transitions in ACC arose from an instability in neural recordings (Figure 3c and d).
Thus, ACC ensemble activity markedly changes its representation of a particular behavioral sequence when that sequence no longer represents the locally dominant behavioral strategy but settles back into a similar state every time the animal returns to its pursuit of that sequence over all others.
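
The centroid comparison against label-shuffled controls described above can be illustrated with a minimal sketch. This is not the authors' analysis code: the data are synthetic, the function names (`centroid_distance`, `shuffled_distances`) are hypothetical, and any normalization of firing rates or distances described in the Methods is omitted.

```python
import numpy as np

def centroid_distance(states, is_dominant):
    """Euclidean distance between the 'dominant' and 'exploratory' cloud centroids."""
    dominant_centroid = states[is_dominant].mean(axis=0)
    exploratory_centroid = states[~is_dominant].mean(axis=0)
    return np.linalg.norm(dominant_centroid - exploratory_centroid)

def shuffled_distances(states, is_dominant, n_shuffles=1000, seed=0):
    """Control distribution: the same distance after randomly permuting the labels."""
    rng = np.random.default_rng(seed)
    return np.array([
        centroid_distance(states, rng.permutation(is_dominant))
        for _ in range(n_shuffles)
    ])

# Synthetic stand-in for one session (all numbers here are made up).
rng = np.random.default_rng(1)
n_instances, n_neurons, n_windows = 60, 58, 5
is_dominant = np.arange(n_instances) < 40      # first 40 instances labelled 'dominant'
rates = rng.normal(size=(n_instances, n_neurons, n_windows))
rates[~is_dominant] += 0.5                     # hypothetical context-dependent shift

# Ensemble state space: five window dimensions per neuron, concatenated.
states = rates.reshape(n_instances, n_neurons * n_windows)

observed = centroid_distance(states, is_dominant)
null = shuffled_distances(states, is_dominant)
print(f"observed {observed:.2f}, shuffled {null.mean():.2f} +/- {null.std():.2f}")
```

In this sketch each sequence instance is one point in a (neurons × windows)-dimensional space, so the comparison asks only whether the labelled clouds sit further apart than randomly relabelled ones.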

The strong similarity of ACC ensemble configurations across distinct, temporally segregated ‘dominant’ contexts (Figure 3e) argues that these representational rearrangements reflect something other than a mere episodic record of distinct behavioral contexts. Given that ACC is thought to encode signals that convey the value of pursuing alternative courses of action as opposed to the current, default action plan (Behrens et al., 2007; Blanchard and Hayden, 2014; Hayden et al., 2009; Karlsson et al., 2012; Kolling et al., 2012; Kolling et al., 2014; Ma et al., 2016; McGuire et al., 2014; O’Reilly et al., 2013; Powell and Redish, 2016; Procyk et al., 2000; Schuck et al., 2015; Tervo et al., 2021), the abrupt transitions in the ACC representation of specific sequences we observed may be designed to tag these representations simply as ‘dominant’/’default’ or ‘exploratory’/ ’alternative’. However, broader contextual signals are also thought to be present in this cortical region (Caracheo et al., 2018; Euston et al., 2012; Seamans and Floresco, 2022; Shenhav et al., 2016; Tomlin et al., 2006), and thus the ACC representational transitions may carry a richer contextual content. To distinguish these possibilities, we compared, in a small subset of sessions, the ACC representation of exploratory ‘RRL’s in two separate contexts: the ‘LLR’ context (i.e. one where the animal otherwise pursued ‘LLR’ as the default strategy) and the ‘LLLR’ context. Despite the limited statistical power of these smaller exploratory datasets, representational changes between these distinct exploratory contexts were clearly evident both at the single neuron and the population level (Figure 3e, Euclidean distance of 0.70+/-0.11 for two sets of exploratory RRL representations vs 0.40+/-0.08 for a control set with scrambled contextual labels, p<0.008, Wilcoxon rank-sum test, n=5 sessions, 2 animals). 
Notably, the marked restructuring of ACC network configuration that underpins the differential encoding of exploratory ‘RRL’s in these two contexts argues against the possibility that these large-scale reconfigurations merely reflect the absence or presence of reward for a given sequence in that task epoch, since none of the ‘RRL’s in either context were ever rewarded. Rather, these observations support the notion that the ACC network functionally reconfigures in distinct behavioral contexts and argue that the inferred global contextual content of representational transitions in ACC is richer than the ‘dominant’/‘exploratory’ dichotomy.

Our parsing of the global behavioral context based on the dominant strategy chosen by the animals rather than based on the experimentally imposed identity of the target sequence is in keeping with the view that the medial prefrontal cortex does not simply track external cues that situate the animal in time and place, but instead reflects the state of the animal’s emotions and beliefs (Caracheo et al., 2018; Euston et al., 2012; Seamans and Floresco, 2022). Nevertheless, the externally imposed task context (i.e. the identity of the latent target sequence at any moment) and its parsing by the animal that presumably shapes the choice of the dominant strategy are strongly correlated in expert animals (Figure 1—figure supplement 1). Thus, to further probe the validity of parsing based on the animal’s behavior, we next examined more closely the ACC neural ensemble activity evolution in two specific cases where the dominant strategy differed from the target sequence. First, at block transitions between the familiar ‘LLR’ and ‘RRL’ target sequences, the dominance of the previously rewarded target often transiently persisted in the face of a latent rule change (Figure 3e). Ensemble states associated with sequence execution during such dominant sequence ‘tails’ did not cluster away from states observed earlier in the ‘dominant’ context, and were equally distant from the exploratory cluster (Figure 3e, Euclidean distance of 0.43+/-0.06 for sequence instances within ‘tails’ to other dominant sequence instances vs 0.80+/-0.15 for sequence instances within ‘tails’ to exploratory sequence instances, p<10⁻⁶, Wilcoxon rank-sum test, n=17 ‘tail’ examples, 12 sessions, N=4 rats; 2.1+/-1.1 block changes per session).
Second, in sessions where the target sequence set included not only the familiar ‘LLR’ and ‘RRL’ but also the recently introduced ‘LLLR’ and ‘RRRL’, animals’ choices in the ‘LLLR’ and ‘RRRL’ blocks were often dominated by the cognate familiar shorter sequence due to a lack of proficiency with the longer targets. In such cases, ensemble states associated with such ‘dominant’ but unrewarded ‘LLR’ sequence instances also did not cluster far away from ones observed earlier in the same session when the dominant ‘LLR’ strategy matched the rewarded target (Figure 3f, Euclidean distance of 0.47+/-0.10 for across ON-target dominant contexts vs 0.53+/-0.12 between OFF-target and ON-target dominant contexts, p=0.68, Wilcoxon rank-sum test, n=4 sessions, N=2 animals). Furthermore, since persistence with an unrewarded sequence in this case occurred after the animals had detected an unsignalled block change and switched away from the previous strategy (Figure 3f), the similarity of ensemble states cannot be explained by a lack of rule switch awareness that potentially confounds the dominant ‘tails’ example. Together with our previous observation that the ACC neural representation associated with a specific probabilistically rewarded action differed markedly depending on the strategy-related context (Tervo et al., 2021), the unambiguous co-segregation of the ensemble representation for dominant sequence instances executed in no reward condition we observed here further argues against the simple ‘reward’/’no reward’ explanation of network reconfigurations. More broadly, these observations support parsing of the behavioral context through the lens of the selected strategy.

We also investigated whether the contextual reorganization of ensemble activity we observed in the ACC reflects an ACC-specific computation or is present in other parts of the frontal cortex. Indeed, neural correlates of decision task parameters have long been observed across many frontal cortical areas (Cisek, 2012; Cisek and Kalaska, 2010; Hunt and Hayden, 2017; Yoo and Hayden, 2018), and recent advances in the ability to simultaneously record from tens of thousands of neurons across many brain areas in the mouse have only further emphasized how widely distributed the task-related encoding can be in the brain (Steinmetz et al., 2019; Stringer et al., 2019). We focused on two distinct regions along the rostro-caudal axis of the medial lobe. At the rostral end, we targeted, within the same set of recording sessions, the part of the premotor cortex immediately dorsal to the ACC (henceforth, M2) that contains, among other areas, the putative rat homologue of the frontal orienting fields recently implicated in a range of sensory-guided decisions (Ebbesen et al., 2018). At the caudal end, we probed, in a separate set of animals, the neural dynamics in a premotor region that shares key afferent/efferent projection patterns with the primate SMC and is functionally required for self-guided action sequencing in our behavioral framework (Figure 3—figure supplement 2, Methods, and Manakov, Proskurin et al., in preparation; henceforth, SMC). Analysis of the M2 and the SMC ensemble representations associated with individual behavioral sequences over the course of long behavioral sessions revealed that although representational changes could also be observed in those parts of the medial lobe, these changes were significantly less pronounced and affected a smaller fraction of the recorded units, especially in the more caudal SMC (Figure 3g, Figure 3—figure supplement 2).
While we cannot rule out the possibility that the inherent sampling bias of the extracellular recordings has led us to miss some less active M2 or SMC neurons that also display a substantial degree of activity reorganization, these observations suggested that large-scale functional network re-arrangements between the inferred global behavioral contexts are particularly prominent in the ACC.

Within each global functional re-configuration of ACC neural ensemble, the representation of a behavioral strategy is further shaped by its local prevalence

We next investigated whether ACC neural dynamics are further shaped by more local adjustments to the animal’s course of action. Indeed, within the otherwise largely stable functional ensemble configurations reflected in darker-colored squares of the similarity matrices (Figure 3d, ‘within-dominant’ and ‘within-exploratory’ squares), some local variation in pixel intensity was consistently observed. Therefore, we sought to determine whether the underlying variability in the ACC neural ensemble dynamics characterizing individual instances of sequence execution may reflect local fluctuations in the animal’s choices. As a natural extension of defining global context (above) based on the roughly block-wise, persistent dominance of a single sequence, we chose the simplest summary statistic – sequence prevalence in recent choices – to capture local fluctuations in how much pursuit of the sequence in question is balanced by exploration of other strategies (Methods).
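
As a rough illustration of such a summary statistic, the sketch below computes, for each sequence instance, the fraction of the preceding choices that were the sequence in question. It is an assumption-laden simplification: the function name `local_prevalence` is hypothetical, the window is a plain boxcar over the previous `window` entries, and the temporal filter actually used is the one specified in the Methods.

```python
def local_prevalence(labels, target, window=20):
    """Fraction of the previous `window` entries equal to `target`.

    `labels` is the ordered list of executed sequences (e.g. 'LLR', 'RRL').
    The current entry is excluded, so the estimate uses past choices only.
    The first entry has no history and gets NaN.
    """
    prevalence = []
    for i in range(len(labels)):
        recent = labels[max(0, i - window):i]
        if recent:
            prevalence.append(sum(lab == target for lab in recent) / len(recent))
        else:
            prevalence.append(float("nan"))
    return prevalence

# Toy history: ten 'RRL' instances followed by ten 'LLR' instances.
history = ["RRL"] * 10 + ["LLR"] * 10
rrl_prevalence = local_prevalence(history, "RRL", window=5)
```

With a block-wise history like the toy one above, the estimate stays at its ceiling within the ‘RRL’ block and decays over the next `window` entries once the animal switches.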

Firing rates indeed tracked local sequence prevalence in a substantial fraction of individual neurons that otherwise displayed a variety of response profiles (see Figure 4a for examples). Although a fraction of ACC neurons tracked local sequence prevalence throughout sequence execution, the majority did so in a subset of the five analysis windows that were aligned to nose port entries, roughly equally distributed across the temporal extent of the multi-step sequence (Figure 4—figure supplement 1c). To examine this modulatory influence on ACC activity in greater detail, we fit the relationship between the firing rates of individual neurons and the local sequence prevalence with a linear model – with or without the global context (‘dominant’/‘exploratory’) as a fixed parameter – and evaluated the model’s performance through cross-validation (Methods). The model’s explanatory power was robust to the precise number of trials in the past used to calculate local sequence prevalence but was diminished if the estimate was shifted to largely comprise future trials (Figure 4—figure supplement 1a–b). The cross-validated performance of the mixed-effects linear model that included global context exceeded the performance of the model that had prevalence as a single parameter (Figure 4—figure supplement 2a), arguing that prevalence tracking and the large-scale functional reorganization of the ACC network between the ‘dominant’ and ‘exploratory’ contexts in our task reflect distinct computations. Strikingly, when we allowed the slope of the linear relationship between the strategy prevalence and the firing rate of individual neurons to vary depending on the global behavioral context, this simple behavioral summary statistic explained up to 80% of neural activity variance in individual ACC neurons during sequence execution (Figure 4b and c, Figure 4—figure supplement 1d; median 28%, IQR 42%, of activity variance explained by prevalence across all recorded cells in the rostral ACC).
The marked modulation of individual cell activity in the ACC by local strategy prevalence stands in contrast with a more modest modulation detectable in other parts of the medial lobe, in particular, the SMC (Figure 4b, Figure 4—figure supplement 2a), pointing to a distinct role played by the rostral medial frontal cortical region in strategy contextualization. Within the ACC, an encoding potential gradient emerged following a detailed spatial reconstruction of the recorded units, with the most substantial modulation observed in the rostral-most portion thought to be homologous to the primate area 32D (Figure 4b and c). Thus, this underexplored, rostral-most portion of the ACC may play a particularly prominent role in keeping a statistical representation of recent behavioral choices.
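
To make the model comparison above concrete, here is a minimal sketch of a cross-validated linear fit of one neuron's firing rate to prevalence, with and without a context-dependent slope. It uses ordinary least squares with an interaction term as a stand-in for the mixed-effects model described in the Methods; the helper names (`design`, `cv_r2`) and the synthetic neuron are assumptions made for illustration only.

```python
import numpy as np

def design(prevalence, context, context_slopes):
    """Design matrix: intercept, prevalence, context, and optionally their interaction."""
    cols = [np.ones_like(prevalence), prevalence, context]
    if context_slopes:
        cols.append(prevalence * context)   # lets the slope differ between contexts
    return np.column_stack(cols)

def cv_r2(X, y, n_folds=5, seed=0):
    """Fraction of variance explained by held-out predictions (k-fold cross-validation)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    pred = np.empty_like(y)
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        beta, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)
        pred[test_idx] = X[test_idx] @ beta
    return 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

# Synthetic neuron whose prevalence slope flips sign between contexts (made-up numbers).
rng = np.random.default_rng(0)
n = 200
prevalence = rng.random(n)
context = (rng.random(n) < 0.5).astype(float)   # 1 = 'dominant', 0 = 'exploratory'
rate = 5 + np.where(context == 1, 8.0, -4.0) * prevalence + rng.normal(size=n)

r2_shared = cv_r2(design(prevalence, context, context_slopes=False), rate)
r2_context = cv_r2(design(prevalence, context, context_slopes=True), rate)
print(f"shared slope: {r2_shared:.2f}, context-specific slope: {r2_context:.2f}")
```

Scoring on held-out instances matters here because the model with context-specific slopes has more parameters and would otherwise fit at least as well in-sample by construction.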

Figure 4
Activity of ACC neurons during sequence execution is markedly shaped by the local prevalence of the executed sequence.

(a) Firing rates as a function of local sequence prevalence for two example ACC neurons. (b) Graphic representation of the explanatory power of the linear model across the spatial extent of the recording locations. (c) Top panel: explanatory power in the rostral portion of the cingulate as a function of location along the anterior-posterior axis. Bottom panel: reference rat brain atlas section. Red arrow in (b, c) points to a cluster of particularly strong model performance in the rostral portion of the cingulate that maps to the region homologous to the primate area 32D. (d) Regression weights for the expanded linear models that relate ACC neuron firing rates to not only sequence prevalence but also general reward prevalence (upper panel) or sequence-specific reward prevalence (lower panel). For all box-and-whisker plots, central blue line indicates the median, the bottom and top edges of the box indicate the 25th and 75th percentiles, respectively, and the whiskers extend to the most extreme data points not considered outliers. n.s., not significant; **, p<0.01; ***, p<0.001.

The comparatively low amount of neural activity variance in the motor regions M2 and SMC explained by strategy prevalence suggests that the observed modulation is unlikely to reflect movement parameters that may co-vary with changes in the animal’s dedication to a particular strategy. To establish this more explicitly, we directly quantified trial-by-trial measures of movement vigor (sequence execution time) and kinematics (the first principal component of movement trajectory) for each instance of sequence execution and then asked how well linear models that incorporate these parameters performed at explaining variance in ACC firing rates (Methods). We first evaluated the performance of an equivalent linear model that used either execution time or the first principal component of movement trajectory as a single parameter; neither model could explain ACC neural activity variance as well as sequence prevalence (Figure 4—figure supplement 3a, 0.025+/–0.021 activity variance explained by movement trajectory, 0.04+/-0.04 by execution time, n=36 sessions, N=4 animals). We then focused in more detail on the dataset recorded in the anterior part of the cingulate cortex that displayed the strongest modulation by strategy prevalence (Figure 4b and c), and asked whether adding a separate movement-related parameter to the linear model would dwarf the explanatory power of the prevalence term (Methods). To ensure that the resulting weights on the two parameters would directly reflect their relative contributions, we z-score normalized all the variables to 0 mean and standard deviation of 1 prior to fitting the model. 
Consistent with the relatively poor performance of the single movement-related parameter models, the expanded models revealed only a minor contribution from those motor parameters to the overall model’s performance (Figure 4—figure supplement 3b, absolute regression weights of 0.29+/-0.10 vs 0.11+/-0.07 for sequence prevalence and execution time; 0.32+/-0.12 and 0.08+/-0.04 for sequence prevalence and movement trajectory; n=36 sessions, N=4 animals). While it remains possible that more nuanced aspects of movement, like the animal’s posture, contribute to the observed activity modulation, these observations suggest that gross movement-related parameters are not behind the explanatory power of strategy prevalence.
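
The logic of comparing weights after z-scoring can be sketched as follows. This is an illustrative example on synthetic data rather than the authors' code; `standardized_weights` is a hypothetical helper and the simulated effect sizes are arbitrary.

```python
import numpy as np

def zscore(x):
    """Normalize to zero mean and unit standard deviation."""
    return (x - x.mean()) / x.std()

def standardized_weights(firing_rate, predictors):
    """Least-squares weights after z-scoring the response and every predictor.

    With all variables z-scored no intercept is needed, and the weights are
    on a common scale, so their magnitudes can be compared directly.
    """
    X = np.column_stack([zscore(p) for p in predictors])
    beta, *_ = np.linalg.lstsq(X, zscore(firing_rate), rcond=None)
    return beta

# Synthetic neuron driven mostly by prevalence, weakly by execution time (made-up numbers).
rng = np.random.default_rng(0)
n = 300
prevalence = rng.random(n)
execution_time = rng.normal(size=n)
rate = 4.0 * prevalence + 0.2 * execution_time + 0.5 * rng.normal(size=n)

w_prevalence, w_time = standardized_weights(rate, [prevalence, execution_time])
print(f"|w_prevalence| = {abs(w_prevalence):.2f}, |w_time| = {abs(w_time):.2f}")
```

Without the normalization step the raw coefficients would depend on the units of each predictor (a fraction versus seconds), which is why the weights are only comparable after z-scoring.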

We also resolved whether the observed activity variation could simply reflect the specifics of choice and reward history that co-vary with prevalence. To accomplish this, we evaluated whether the prevalence-based model trained only on a subset of sequence instances that shared immediate history still had significant explanatory power. Specifically, we capitalized on the previous finding that much of such direct history impact in the ACC is exerted within fewer than five trials (Bernacchia et al., 2011). Since the prevalence estimate is formed over a much longer window, we could sub-select those ‘LLR’ sequence instances that were matched in the immediate history (i.e. ‘LLR’ instances that immediately followed another ‘LLR’), but that otherwise still differed in the associated ‘LLR’ sequence prevalence. The prevalence-based model trained on this reduced subset of sequence instances matched for local history still explained a significant fraction of ACC activity variance (Figure 4—figure supplement 3c), arguing that the observed modulation could not have been solely history-based.

Overall, while our analyses do not rule out potential contributions of other trial-by-trial behavioral parameters to neural encoding in ACC, they demonstrate that the gross motor and history parameters cannot account for the robustness of local strategy prevalence in explaining ACC neural activity variance.

The tracking of reward by the ACC neural ensemble is strategy-specific, but reward prevalence is insufficient to account for activity modulation

Models of decision-making rarely include a summary statistic of the agent’s recent behavioral choices – such as the local strategy prevalence that we examine here. This prompted us to next consider how the observed modulation of ACC ensemble activity by local strategy prevalence might interact with any modulation exerted by the external reward – a more commonly considered parameter directly related to the concept of valuation thought to engage circuit computations in the ACC (Boorman et al., 2009; Kennerley et al., 2009; Kolling et al., 2012; Luk and Wallis, 2013). Indeed, variation in strategy prevalence is necessarily accompanied by variation in detailed reward history and thus understanding the interplay between the two in shaping ACC neural dynamics may shed further light on how the animals parse their environment.

We first established that local strategy prevalence provides at least some modulatory influence independent of external reward by evaluating the performance of the prevalence model trained exclusively on sequence instances executed under ‘no reward’ conditions. Specifically, we evaluated two distinct cases: ‘exploratory’ sequence instances in sessions comprising blocks of the familiar ‘LLR’ and ‘RRL’ targets, and persistent ‘LLR’ and ‘RRL’ sequence instances within the unfamiliar ‘LLLR’ and ‘RRRL’ contexts. The observed model performance in both cases was significantly more robust than what would be expected if most of the observed activity variance during sequence execution arose from the statistics of the associated reward (Figure 4—figure supplement 2a, 0.13 +/- 0.06 of ACC activity variance explained for exploratory sequence instances, n=36 sessions, N=4 animals; Figure 4—figure supplement 2c, 0.35+/–0.23, 0.25+/-0.19 and 0.24+/-0.19 A32D activity variance explained for unrewarded ‘LLR’ instances dominating early ‘LLLR’ acquisition in 3 individual animals). In further support of the conclusion that sequence prevalence shapes ACC neural dynamics independent of the associated reward, we observed little activity modulation for sequence instances within ‘dominant’ tails – a period when sequence prevalence remains high, but reward expectation should rapidly diminish (Figure 3f). Combined, these data argue that a summary statistic of that strategy in past choices exerts a continuous modulation of the strategy representation in ACC and raise the question of how this modulation interacts with any shaping of the ACC activity by the external reward.

ACC is known both to multiplex information about reward and individual actions (Hayden and Platt, 2010) and to track progression to reward over multiple steps of an action sequence (Shidara and Richmond, 2002). To determine whether the unexpected robustness of the simple strategy prevalence model above in explaining a marked fraction of ACC neural activity variance during the execution of a specific sequence derives mostly from the impact of successfully procured reward, we incorporated an additional ‘reward prevalence’ term into the linear model and determined the relative contribution of the ‘sequence prevalence’ and ‘reward prevalence’ terms to the expanded model’s performance (see below). Provided that the two parameters are sufficiently decorrelated in our dataset to avoid collinearity – due to factors like reward omission, exploratory sequence instances, and transient persistence with off-target sequences – their weights in the expanded model would reflect the relative contribution to the model’s performance from the summary behavioral statistic itself and from the recently procured reward. To account for the possibility that reward statistics may be processed by, and exert influence on, the ACC circuitry at either the single-trial or the single-sequence level, we separately considered a model with an ‘overall reward prevalence’ term and one with a ‘sequence-specific reward prevalence’ term. As such, a reward obtained for the target ‘RRL’ sequence before an exploratory, off-target, ‘LLR’ sequence instance would contribute to the ‘overall reward prevalence’ term, but not to the ‘sequence-specific reward prevalence’ term in models aimed at explaining neural activity variance during ‘LLR’ execution.
We computed each ‘reward prevalence’ term by weighting the relevant previous rewards with the same temporal filter used to calculate sequence prevalence and identified a subset of behavioral sessions for which the ‘sequence prevalence’ and each of the ‘reward prevalence’ parameters were sufficiently decorrelated in our dataset (condition index between 1 and 4, Methods).
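
For readers unfamiliar with the collinearity criterion, the sketch below shows one common way to compute a condition index for two predictors: the ratio of the largest to the smallest singular value of the z-scored predictor matrix. Whether this matches the exact definition in the Methods is an assumption; `condition_index` is a hypothetical helper and the data are synthetic.

```python
import numpy as np

def condition_index(*predictors):
    """Ratio of largest to smallest singular value of the z-scored predictor matrix.

    A value of 1 means the predictors are uncorrelated; larger values indicate
    increasing collinearity.
    """
    X = np.column_stack([(p - p.mean()) / p.std() for p in predictors])
    singular_values = np.linalg.svd(X, compute_uv=False)
    return singular_values.max() / singular_values.min()

# Synthetic session in which reward prevalence partly follows sequence prevalence.
rng = np.random.default_rng(0)
sequence_prev = rng.random(300)
reward_prev = 0.5 * sequence_prev + 0.5 * rng.random(300)

ci = condition_index(sequence_prev, reward_prev)
r = np.corrcoef(sequence_prev, reward_prev)[0, 1]
print(f"correlation {r:.2f}, condition index {ci:.2f}")
```

Under this definition, two z-scored predictors with correlation r give a condition index of sqrt((1 + |r|) / (1 − |r|)), so an upper bound of 4 corresponds to |r| of at most 15/17, or roughly 0.88.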

Does the observed marked modulation of ACC neural dynamics during different instances of sequence execution outside of some particularly persistent ‘no reward’ contexts arise largely from variation in recent reward history? To resolve this, we determined whether the model’s weight for the ‘sequence prevalence’ term became negligible once the ‘reward prevalence’ term was added to the model. Contrary to this expectation, neither the ‘overall reward prevalence’ nor the ‘sequence-specific reward prevalence’ terms dwarfed the contribution from the ‘strategy prevalence’ term to the model’s performance, with the ‘overall reward prevalence’ making a particularly small contribution to the model’s explanatory power (Figure 4d). Furthermore, while the contribution from the ‘sequence-specific reward prevalence’ was on par with that of ‘strategy prevalence’, it was clearly insufficient on its own to account for the model’s explanatory power (Figure 4d, regression weights of 0.558+/-0.005 vs 0.122+/-0.003 for sequence prevalence and overall reward prevalence; 0.285+/-0.008 and 0.416+/-0.012 for sequence prevalence and sequence-specific reward prevalence; n=47 sessions, N=9 animals). Combined, these observations argue that the tracking of reward by ACC is strategy-specific, and that both a summary statistic of the specific self-guided behavioral strategy in recent past and a statistic of the associated reward shape the ACC neural dynamics during strategy execution, further highlighting the central role of the animal’s strategy choice rather than the external rule in this process.

Strategy prevalence can be decoded from ACC ensemble trajectories during strategy execution

The unexpectedly strong modulation of ACC neural activity by sequence prevalence (Figure 4) points to the possibility that information about how prevalent the currently sampled strategy has been in the recent past may be decodable from ACC ensemble activity by downstream circuits. Indeed, we found that linear population decoders trained independently in each analysis window achieved robust cross-validated performance. For each of the five analysis windows, we fit the relationship between the local sequence prevalence and the firing rates – in that window – of all simultaneously recorded ACC neurons. Cross-validated performance of such individually tuned linear decoders scaled steeply with the number of neurons (Figure 5—figure supplement 1a). Moreover, ensembles of ACC neurons containing as few as 10 prevalence-modulated units could explain as much as 60% of sequence prevalence variance. Consistent with this finding, visualizing ensemble activity in a reduced dimensional state space defined from regression against prevalence (see Methods) revealed a clear separation of trajectories associated with different instances of sequence execution by the local strategy prevalence (Figure 5a).

Figure 5
Strategy prevalence can be decoded from ACC ensemble activity throughout sequence execution.

(a) Examples from three different animals of ACC ensemble trajectories associated with different instances of RRL sequence execution visualized in an ensemble state subspace chosen to maximize separation by local sequence prevalence. (b) Cross-validated performance of linear models relating ACC ensemble activity during RRL execution to local RRL prevalence. Shown are the performance of the best of five models fit in individual analysis windows (b), the average performance of the model that was constrained to always use the same weight for any given neuron in all five windows (b, c), and that model’s worst predicted performance in an individual analysis window (c). Right panels exclude from all models ACC neurons that show significant modulation by strategy prevalence in all five analysis windows.

The decoding strategy above implicitly assumes an ability to use a temporally varying readout, which may be challenging to implement in circuit dynamics. We therefore next examined the ability of temporally fixed readouts to decode prevalence. Focusing on sessions with at least five simultaneously recorded neurons that displayed modulation by prevalence in at least one analysis window, we next fit a new linear model that was constrained to always use the same readout (i.e. same weight for any given neuron in all five windows). The constrained model retained a large fraction of the overall explanatory power compared to that of the best of five individually tuned decoders (Figure 5b, left panel, variance explained of 0.36+/-0.03 for best of individually tuned models vs 0.23+/-0.02 for the model with weights fixed across the five analysis windows, n=36 sessions, N=4 animals). Furthermore, when we evaluated the prediction of this aggregate-constrained model in each of the five analysis windows, we observed substantial explanatory power even for the worst-performing window (Figure 5c, left panel, n=36 sessions, N=4 animals), suggesting that the fixed decoder would never drop the prevalence signal as the animal executed the sub-steps of a multi-step strategy. The ability of a fixed linear sum of activity across the recorded population to retain a substantial fraction of explanatory power throughout the temporal extent of sequence execution was not simply due to the presence of a small number of neurons with prevalence-related modulation in all five analysis windows. Indeed, the constrained model’s performance dropped only moderately when such neurons were left out of the dataset (Figure 5b–c, right panels). 
Combined, these observations suggest that despite the transient nature of activity modulation by sequence prevalence at the level of individual neurons (Figure 5—figure supplement 1b), the ACC neural ensemble dynamics is structured in such a way as to permit this, or a related, summary statistic of the animal’s recent sequence choices to be stably decoded by downstream circuits throughout sequence execution.
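
The distinction between individually tuned and temporally fixed readouts can be sketched as below. This is a simplified illustration on synthetic data, not the authors' decoder: it uses a single train/test split rather than full cross-validation, a shared intercept for the fixed readout, and hypothetical helper names (`fit_linear`, `r2`, `compare_readouts`).

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares weights (intercept first) mapping ensemble rates to prevalence."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

def r2(X, y, beta):
    """Fraction of variance in y explained by the linear readout beta."""
    pred = np.column_stack([np.ones(len(X)), X]) @ beta
    return 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

def compare_readouts(rates, prevalence, train, test):
    """rates: (instances, neurons, windows). Returns held-out R^2 per window."""
    n_windows = rates.shape[2]
    # One decoder per analysis window (temporally varying readout).
    per_window = [
        r2(rates[test, :, w], prevalence[test],
           fit_linear(rates[train, :, w], prevalence[train]))
        for w in range(n_windows)
    ]
    # One decoder for all windows: stack windows so each neuron gets a single weight.
    X_all = np.concatenate([rates[train, :, w] for w in range(n_windows)])
    y_all = np.tile(prevalence[train], n_windows)
    beta_fixed = fit_linear(X_all, y_all)
    fixed = [r2(rates[test, :, w], prevalence[test], beta_fixed)
             for w in range(n_windows)]
    return per_window, fixed

# Synthetic ensemble: each neuron is prevalence-modulated in a random subset of windows.
rng = np.random.default_rng(0)
n_instances, n_neurons, n_windows = 200, 12, 5
prevalence = rng.random(n_instances)
gain = 3.0 * rng.normal(size=(n_neurons, 1)) * (rng.random((n_neurons, n_windows)) < 0.5)
rates = gain[None] * prevalence[:, None, None] + rng.normal(size=(n_instances, n_neurons, n_windows))

train, test = np.arange(0, 100), np.arange(100, 200)
per_window, fixed = compare_readouts(rates, prevalence, train, test)
print("best window-specific R^2:", round(max(per_window), 2))
print("fixed readout, mean and worst window:", round(np.mean(fixed), 2), round(min(fixed), 2))
```

Stacking the windows before fitting is what enforces the constraint: the fit sees each neuron's activity in all five windows as observations of the same variable, so it cannot assign window-specific weights.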

Discussion

The anterior cingulate cortex is thought to play a central role in dynamic, context-specific strategy arbitration in complex non-stationary environments, yet the organizing principles of the cingulate’s neural activity that underpin contextually appropriate strategy selection and point to specific computations that take place in the ACC remain unresolved. We report here that when animals search a structured task space through self-guided exploration, a substantial fraction of ACC neural activity variance can be explained by the prevalence of individual behavioral strategies. This prevalence encoding – particularly enriched in the most rostral portion of the ACC homologous to the primate area 32D – is preserved through large-scale functional rearrangements of the ACC network between distinct global behavioral contexts and is evident even in the absence of any pairing between strategy execution and reward delivery. We further show that strategy prevalence, or a related summary statistic of the specific self-guided behavioral strategy in the recent past, synergizes with a statistic of the associated reward to shape the ACC neural dynamics during strategy execution. Our findings raise the possibility that ‘attention to self-action’ (Blakemore et al., 2000) may be at the core of rostral cingulate functionality, in essence uniting the long-standing ‘attention to action’ account (Norman and Shallice, 1986; Passingham, 1996) with the proposed role of the cingulate in processing information relating to self (Blakemore et al., 2000; Euston et al., 2012; Passingham et al., 2010; Seamans and Floresco, 2022) and the production of self-generated actions.

Studies to define the precise role ACC plays in value-based decision-making have emphasized its role in tracking task-relevant information to guide appropriate action, but have otherwise often cast the question in terms of identifying the specific step to which it contributes in the pre-decision computation and comparison of action values (Boorman et al., 2009; Kennerley et al., 2009; Kolling et al., 2012; Luk and Wallis, 2013). The resulting lack of a unifying account has prompted a recent suggestion that rather than contributing to pre-decision valuation, ACC encodes post-decision variables related to the subjective value that can be inferred empirically from the animal’s choices (Blanchard and Hayden, 2014; Cai and Padoa-Schioppa, 2012). The notion of a subjective approach by each animal to the encountered task is similarly inherent in our parsing of the behavioral session using the animal’s selected self-guided strategy (rather than the imposed rule) in examining the encoding of behaviorally relevant information. Our observation that strategy prevalence explains a substantial fraction of variance in the activity of ACC neurons supports the interpretation that ACC computes an inherently subjective post-decision variable, and indeed, subjective value and strategy prevalence are closely related in well-trained animals in typical value-based tasks. However, our finding that prevalence contributes strongly to activity modulation even in the absence of reward delivery suggests that prevalence encoding might be a more parsimonious account that accommodates both past observations and our findings. Examining how adding such a summary statistic of the agent’s recent behavioral choices to models of decision-making changes their explanatory power in various experimental settings will be one interesting direction for future study.

The encoding of strategy prevalence appears to be independent of, and be preserved through, large-scale functional rearrangement of ACC networks. These functional rearrangements result in representational remapping, whereby ACC ensemble activity associated with the execution of a specific strategy changes markedly between behavioral contexts. This observation suggests that ‘network resets’ previously observed to accompany transitions to exploration (Durstewitz et al., 2010; Emberly and Seamans, 2019; Karlsson et al., 2012) are not mere bookmarks of a behavioral state change, but rather reflect reorganizations of the ACC network designed to tag action plans with a global context-specific representation. Although the ‘dominant’/‘exploratory’ distinction maps well onto the ‘default’/‘alternative’ dichotomy that has enjoyed prominence in the ACC field due to its computational appeal and explanatory power in many settings (Blanchard and Hayden, 2014; Boorman et al., 2013; Kolling et al., 2012; Procyk et al., 2000), the representational transitions in ACC likely reflect broader contextual tagging of action plans, as suggested by the observed difference in the representations associated with a specific exploratory sequence depending on the nature of the default strategy. It is even possible that the contextual content of the neural representations in ACC may not be limited to the statistics of actions and outcomes. Indeed, given the ACC’s topological centrality within the frontal cortical network and domain-general role in cognition, the content of its neural representations set up by large-scale rearrangements of functional networks might come to reflect, when appropriate, distinct cognitive loads (Shenhav et al., 2016), social settings (Tomlin et al., 2006) or somato-visceral states (Caracheo et al., 2018; Seamans and Floresco, 2022).
As such, ACC would set up – and possibly initially infer – representations of distinct task-relevant contextual information in such a way that tracking of individual strategy prevalence may then take place.

How is the circuit implementation of the abrupt representational re-organizations that define individual behavioral contexts related to prevalence encoding? One possibility is that these representational re-organizations are simply an emergent property produced by the dynamic interactions among the elements of the network that tracks prevalence. Specifically, a gradual, prevalence-tracking change in the activity of individual neurons could eventually reach a threshold that instantiates a sudden phase transition reorganizing ACC into a new functional network. Against this idea, prevalence often changes abruptly rather than slowly at the end of a behavioral block, and furthermore, changes in prevalence inside ‘dominant’ blocks often exceed those at the end of a behavioral block but do not lead to abrupt reorganizations of the network. Alternatively, the ACC network may be built to allow a continuous representation of prevalence through a substantial degree of reorganization and a different computation triggers the abrupt transitions. What constraints such an encoding scheme places on the ACC network architecture, where the transition-triggering computation is performed, and what factors into that computation remain open questions. Furthermore, investigating what constraints an encoding scheme that preserves prevalence computation through large-scale functional rearrangements places on network architecture and biophysics may likewise be an intriguing area of future study.

What might be the computational advantage of encoding prevalence, and how can the encoding of this post-decision variable be reconciled with the recent evidence causally linking ACC to the current decision (Tervo et al., 2021)? Given ACC’s well-documented role in exploration (Blanchard and Hayden, 2014; Fouragnan et al., 2019; Hayden et al., 2011; Kolling et al., 2012; Quilodran et al., 2008; Tervo et al., 2021), one possibility is that keeping track of strategy prevalence may be used to prioritize exploring strategies evaluated less frequently in the recent past (see Wiering and Schmidhuber, 1998 for one implementation of such frequency-based exploration). An alternative view is informed by old accounts of the ACC’s role in recognizing the ‘self’ actions (Blakemore et al., 1998; Espinosa et al., 2006). Indeed, an intriguing possibility is that in complex settings, keeping track of different actions taken and the frequency with which one has performed them might be an effective way of estimating agency – an understanding of which actions bring about the ability to promote or prevent the occurrence of events in the environment – through statistical learning. As such, agents can determine the degree to which their actions exert control over world events without reinforcement or an explicit representation of the temporal relationships between their actions and events. Statistical learning is thought to be central to many aspects of perceptual cognition and motor control; in the causal domain, it provides a complement to associative learning in permitting the agent to make inferences about the world without being enslaved to temporal contiguity or specific action-outcome contingencies. In principle, learning of statistical regularities can happen implicitly – without explicit awareness or hypothesis testing. 
However, one influential view posits that the efficiency with which animals pick up on statistical regularities from sparse data in complex, often only partially observable environments rests on a process of probabilistic inference that entertains multiple candidate models of the underlying statistical regularities, coupled with explicit hypothesis testing designed to evaluate those models (Tenenbaum et al., 2011). And indeed, establishing agency would critically depend on active evaluation of an inferred ability to influence events in the world. We posit that ACC’s central role in establishing one’s agency is what reconciles our finding of its robust prevalence encoding and the causal evidence (Tervo et al., 2020) linking it to the ongoing decision of whether or not to evaluate alternative action plans (and more generally to curiosity [Wang and Hayden, 2020], information seeking [White et al., 2019] and hypothesis testing [Elliott and Dolan, 1998]).

We note that the interpretation that ACC – and likely the medial frontal network into which it is embedded – plays a key role in evaluating the extent to which one has agency in the environment is a refinement of the broader hypothesis that it processes information related to self and implements self-generated actions that not only accommodates the existing experimental observations but is also more falsifiable. Indeed, a major critique of the ‘self-generated actions’ view of the medial frontal lobe is that the distinction between self-generated and externally triggered actions is empirically intractable, thus making this view of the medial frontal lobe unfalsifiable (Schüür and Haggard, 2011). In contrast, it is conceivable to develop experimental designs that manipulate the reward statistics to induce miscalculations of agency: superstitions – a perception of control over the environment in the absence of a causal link between the agents’ actions and the outcome, and learned helplessness – an incorrect perception of a lack of control and a cessation of action.

Methods

Subjects

All experiments were done in male Long Evans rats 6–12 months of age (with weight kept between 400 and 500 g). Animals were kept at 85% of their initial body weight before food restriction and maintained on a 12 hr light/12 hr dark schedule. Experiments were conducted according to National Institutes of Health guidelines for animal research and were approved by the Institutional Animal Care and Use Committee at HHMI’s Janelia Farm Research Campus (IACUC protocol 22–0220.02).

A total of 7 animals were implanted with tetrode drives for collection of neural activity during the task (4 animals targeting the ACC and 3 animals targeting the SMC).

Behavioral apparatus and task

All behavior was confined to a box with 23 cm high plastic walls and stainless-steel floors (Island Motion Corp). The floor of the box was 25 cm by 34 cm, and the nose ports were all arranged on one of the 25 cm walls. All lights, nose ports, and reward deliveries were controlled and monitored with a custom-programmed microcontroller, which in turn communicated via USB to a PC running a Matlab-based control program. Nose port entries were detected with an infrared beam-break detector (IR LED and photodiode pair). The central initiation port contained one white LED that indicated the option to initiate a new trial. The side choice nose ports also each contained an LED that indicated that the initiation port had been successfully triggered (at which point the LED in the center port was extinguished) and side ports were available for selection. Note that in some sessions, only the center LED was changing states, and in some, no lights were used with little impact on behavioral performance. The side ports also delivered liquid rewards (0.1 ml drops of 10% sucrose mixed with black cherry Kool-Aid) with the help of a motorized syringe pump (Harvard Apparatus PHD 2000).

The behavioral task is an elaboration of the basic design reported as a ‘covert pattern’ task in Tervo et al., 2014 and consists of a series of self-initiated trials (several hundred per session), each involving a choice between two options – the left and the right choice port. Reward is delivered on the last step of a target sequence that is not otherwise indicated to an animal and thus has to be discovered through self-guided exploration.

Each session in the dataset described in this manuscript contained one to several unsignalled transitions in the identity of the target sequence. The set of possible sequences typically included some or all of the four non-trivial three-step sequences (‘Left-Left-Right’, ‘Right-Right-Left’, ‘Left-Right-Right’, ‘Right-Left-Left’), but occasionally was expanded to include longer sequences (mainly ‘Left-Left-Left-Right’). Only one sequence was rewarded in any particular block of trials. A subset of sessions incorporated reward omission for 10–30% of correctly executed sequences.

Behavioral training

Food-restricted animals were trained to perform the task with no explicit instruction. Early in training, exploration was encouraged by rewarding a small fraction (at most 10%) of novel patterns – specifically, those that escaped prediction by Competitor 2 (Tervo et al., 2014), which used the history of the animal’s choices and outcomes to predict its next choice. Note that once the animal discovers and concatenates the target sequence at a high rate, little to no background reward is delivered since the behavior becomes fully predictable. Eventually, background reward was fully eliminated.

Animals were considered proficient on the task when they consistently discovered and concatenated more than one sequence in a session, and when the average across-session reward rate was consistently in the 17–22% range (with 33% being the maximal theoretically possible). Over 90% of animals became proficient within one month of training.

Electrophysiological recordings

A microdrive array containing 16 independently movable tetrodes was chronically implanted on the head of the animal. Each tetrode was constructed by twisting and fusing together four insulated 13 μm wires (stablohm 800 A, California Fine Wire). Each tetrode tip was gold-plated to reduce impedance to 200–300 kΩ at 1 kHz. Within the implant, the tetrodes converged to an oval bundle (1 mm x 2 mm), angled at 0° with respect to vertical (pointing towards midline after implantation).

For the drive implantation surgery, trained animals were initially anesthetized with 5% isoflurane gas (1.0 L/min). After 10–15 minutes, isoflurane was reduced to 1.5–2.0% and the flow rate to 0.7 L/min. A local anesthetic (Bupivacaine) was injected under the skin 10 minutes before making an incision. A unilateral craniotomy (1.0 by 2.0 mm) was drilled in the skull above the site of recording. The microdrive array was implanted such that the tetrode bundle was centered 3.0 mm anterior and 0.8 mm lateral to Bregma (right or left hemisphere) when recordings were targeted to the ACC, and 2.0 mm posterior and 1.2 mm lateral to Bregma when recordings were targeted to the putative SMC. Small stainless steel bone screws and dental cement were used to secure the implant to the skull. One of the screws was connected to a wire leading to the system ground. Before the animal woke up, all tetrodes were advanced into the brain ~1.20 mm deep from the brain surface.

Over the two weeks following surgery, the tetrodes were slowly lowered, moving approximately 40 μm/day on average. During this time, animals were re-acclimated to performing the task with the drive. When performance on the task was regained to pre-surgery levels (in terms of motivation and dynamic strategy arbitration behavior), recording sessions began. After each recording session, any tetrodes that did not appear to have any isolatable units were moved down 25 μm. Once a tetrode had been moved a total of 2.5 mm from the surface, which is the approximate border between anterior cingulate and prelimbic cortices, it was no longer advanced.

Each recording session spanned 3–4 hr. Animals were not forced to perform the task and sometimes took breaks (generally around 10 min, but sometimes up to 30 min). Data from all the animals were collected using the wireless headstage and datalogger (Horizontal Headstage 128ch with Datalogger, SpikeGadgets, spikegadgets.com/hardware/hh128.html). An array of LEDs of different colors was attached on top of the animal’s implant and the animal’s position in the environment was recorded with a video camera at 60 frames per second. The animal’s position was reconstructed offline using a semi-automated analysis of digital video of the experiment with custom-written software.

Raw electrophysiology data were sampled at 30 kHz, digitally filtered between 600 Hz and 6 kHz (2 pole Bessel for high and low pass) and threshold-crossing events were selected for further analysis. Individual units on each tetrode were identified by manually classifying spikes using polygons in two-dimensional views of waveform parameters using custom-made Matlab scripts (Karlsson et al., 2012, MatClust, https://www.mathworks.com/matlabcentral/fileexchange/39663-matclust). For each channel of a tetrode, peak waveform amplitude and the waveform’s projection onto the first two principal components were used for clustering. Autocorrelation analyses were done to exclude units with non-physiological single-unit spike trains. Only units where the entire cluster was visible throughout the recording session were included. Thus, a unit was not isolated for further analysis if any part of the cluster vanished into the noise or was cut off by the recording threshold.

The total contribution from each animal was as follows:

  • ACC animals: 4 sessions, 81 neurons (animal 1); 15 sessions, 163 neurons (animal 2); 13 sessions, 324 neurons (animal 3); 5 sessions, 286 neurons (animal 4)

  • SMC animals: 11 sessions, 176 neurons (animal 1); 12 sessions, 101 neurons (animal 2); 8 sessions, 75 neurons (animal 3)

Mapping of the putative SMC homologue

The Supplementary Motor Cortex had not been characterized in the rat, so in a separate set of experiments, we sought to identify a medial premotor region involved in temporally sequencing self-initiated actions in the rat. Specifically, we performed a set of pharmacological inactivation experiments along the rostro-caudal axis of the agranular premotor region M2, evaluating the effect of such a transient inactivation on the animals’ performance during the three-step version of the sequence exploration task. We observed a robust behavioral effect following a bilateral injection of muscimol, but only when muscimol was delivered to the region immediately caudal to mid-cingulate cortex (a location similar to that of primate SMC, Figure 3—figure supplement 2). Following local muscimol delivery, animals continued to complete trials, but no longer performed the target sequence significantly above the value expected for a biased coin. SMC inactivation appeared to specifically affect complex sequencing rather than chaining of actions in general, because the animals still performed the sequential entries into the initiation and reward ports correctly. The technical details associated with this set of experiments are provided below.

Surgery for Cannula Implantation

Locations of bilateral craniotomies and cannula implantation were +2.0 mm AP and ± 1.2 mm ML with respect to Bregma for ACC, –2 mm AP and ± 1.2 mm ML for putative SMC, +1 mm AP and ± 1.2 mm ML for FOF (Erlich et al., 2011), and –0.5 mm AP and ± 1.2 mm ML for another candidate SMC region that didn’t show a muscimol effect. ACC cannula implantation was deeper (2.0–3.0 mm) to account for curvature, whereas candidate SMC regions were 1 mm deep. Injection guide cannulas (Eicom CXG (T) 2 Diameter OD/ID 0.3/0.2 mm) were inserted through a 0.5 mm craniotomy. As a protection measure for the animal in the home cage, an opaque cone was placed around the implant. Cannulas were bonded to the skull with dental cement (C&B Metabond - Parkell). Dummy cannulas (Eicom CXD (T) 2 Diameter 0.15/0.06 mm) were used in between sessions to protect the cannula from debris. Food deprivation and further training were resumed after the recovery period.

Muscimol inactivation

As a control, the beginning of each session remained unperturbed and animals were allowed to perform the task normally. After an animal performed 200 trials and reached reward rate above 0.22, the animal was placed back into its home cage for inactivation. Treats were provided to keep the animal still. Muscimol or saline was administered through the needle inserted into the cannula. Muscimol (Tocris Bioscience) solution was prepared with sterile saline (Hospira) at a final concentration of 0.1 μg/ml. Bilateral infusion was made via a 3 mm-long 31-gauge injector connected to a 5 μl syringe (Hamilton) by a teflon tube (Eicom). Eicom micro syringe pump ESP-64 was programmed to deliver the solution at the rate of 0.25 μl/min for 2 min, for a total volume of 0.50 μl for each hemisphere. The animal was placed back into the operant chamber after the injection. The muscimol effect persisted for about 2 hr after the injection. Trials within one hour of the injection were used for the analysis.

Data analysis

Session selection

Since animals frequently showed greater variability in behavioral performance following drive implantation, a minimal selection filter was applied to determine which sessions to include in the analysis: the session had to contain at least 200 instances of either the ‘LLR’ or the ‘RRL’ sequence, and the across-session average sequence production rate had to exceed 0.15.

Selection of dominant and exploratory sequence instances in behavioral data

The nature of our behavioral framework required extra care when parsing the behavioral record to identify ‘legitimate’ sequence instances. Parsing a continuous stream of left and right choices was straightforward for the ‘dominant’ condition, where a high prevalence of sequence concatenation and a scarcity of choices that conform to other patterns argue that almost every instance of a pattern conforming to the target sequence is likely to be one actually evaluated by the animal. Specifically, starting with all ‘LLR’ (‘RRL’) instances that were performed during the epochs where ‘LLR’ (‘RRL’) was the latent target sequence, we removed all instances that occurred before dominance for the new sequence (which we define as the presence of a successful concatenation of at least three sequence instances) had been established, and appended all sequence instances from the period where dominance had persisted following a block change (i.e. until the previously concatenated sequence was interrupted by more than two trials incompatible with that sequence).

The ‘exploratory’ condition required a closer examination outside of clear concatenations. For example, although some ‘RRL’ patterns within runs of ‘LLRRL’ in a ‘LLR’ block might reflect a true pairing of the locally dominant pursuit of ‘LLR’ with a quick exploratory evaluation of ‘RRL’ tagged on in a manner akin to strategy mixing (Donoso et al., 2014), most of such instances likely reflect a mere apposition of the ‘RRL’ and the ‘LR’ sequences. To select the likely true exploratory sequences conservatively, we therefore excluded all putative lone exploratory sequences that displayed an overlap (to the left or to the right) with the dominant sequence, except when the timing analysis (see below) provided independent evidence in favor of classifying this putative instance as a true exploratory sequence. Here, our approach was grounded in the expectation that animals would pause, if only briefly, at the side nose port on the last step of a true exploratory sequence instance to ascertain that the explored sequence is not rewarded. Under this assumption, we ‘rescued’ the putative exploratory sequence instances for which the duration of the third step exceeded a threshold. The specific threshold was selected on the basis of an unbiased analysis of the distributions of the within-sideport and side-to-center durations across all ‘dominant’ sequence instances for which the otherwise scheduled reward was omitted (Figure 1—figure supplement 2e). Specifically, starting with all putative ‘LLR’ (‘RRL’) instances during periods when dominance had been established for another sequence (see above), we first removed all instances that displayed an overlap (to the left, or to the right) with the dominant latent target sequence. From the removed set, we then ‘rescued’ those instances that were either preceded by another ‘LLR’ (‘RRL’), or for which on step 3, the within-sideport duration exceeded 0.5 s, or side-to-center duration exceeded 1 s.
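The selection rule for exploratory instances can be summarized in a short Python sketch. This is an illustration only, not the analysis code used in the study: the `SequenceInstance` record, its field names and the function name are hypothetical, and only the two duration thresholds (0.5 s and 1 s) come from the text above.

```python
from dataclasses import dataclass

# Thresholds stated in the text; everything else here is illustrative.
WITHIN_SIDEPORT_THRESHOLD_S = 0.5
SIDE_TO_CENTER_THRESHOLD_S = 1.0


@dataclass
class SequenceInstance:
    """Hypothetical record for one putative exploratory sequence instance."""
    overlaps_dominant: bool   # overlaps the dominant sequence (left or right)
    preceded_by_same: bool    # preceded by another instance of the same sequence
    within_sideport_s: float  # step-3 within-sideport duration, in seconds
    side_to_center_s: float   # step-3 side-to-center duration, in seconds


def keep_as_exploratory(inst: SequenceInstance) -> bool:
    """Return True if the instance is retained as a likely exploratory sequence."""
    if not inst.overlaps_dominant:
        return True  # never removed in the first place
    # Removed because of the overlap; 'rescue' it if any criterion is met.
    return (
        inst.preceded_by_same
        or inst.within_sideport_s > WITHIN_SIDEPORT_THRESHOLD_S
        or inst.side_to_center_s > SIDE_TO_CENTER_THRESHOLD_S
    )
```

Under these assumptions, an overlapping instance with a 0.6 s pause in the side port on step 3 would be retained, whereas one with a 0.4 s pause and a 0.9 s side-to-center time would not.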

In addition, we noticed that at times, animals chose to concatenate ‘LLRR’, possibly as a clever way to catch every three-step sequence that might serve as the latent target. Therefore, we removed from all datasets any ‘LLR’ (‘RRL’) instances that appeared within such concatenations. Finally, we also removed from all datasets sequence instances where the animal appeared to take long breaks. Specifically, any sequence instance that displayed a center-to-side time of over three seconds was removed from further analysis.

Analysis of sequence representation in frontal cortical activity

To capture the neural activity associated with a specific three-step sequence, we represented, for each neuron, its activity during a single instance of sequence execution as a vector of the square root of spike counts in five 500 ms windows centered on port entry events (center and side port entries for steps 2 and 3 in the sequence, and side entry for step 1; the window leading to center port entry in step 1 was excluded to avoid a strong contribution of behavioral choices that preceded sequence initiation). The choice of working with the square root of spike counts was guided by the desire to transform the data with Poisson distribution into an approximately Gaussian distribution with a unity variance (Anscombe, 1948). Such transformation brought the variance of the binned data for individual cells to the same level, such that the variability of fast-spiking neurons did not conceal less active cells. The five chosen windows offered substantial coverage of the sequence with minimal temporal overlap, a high degree of stereotypy of spatial trajectories, and the exclusion of outcome-related activity.
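As a minimal sketch of this representation, the Python snippet below computes the square root of the spike count in a 500 ms window centered on each port entry event. The function name and the half-open window convention `[t - 0.25 s, t + 0.25 s)` are assumptions made for illustration; the original analysis was carried out in Matlab.

```python
import math
from bisect import bisect_left


def sequence_activity_vector(spike_times, event_times, window_s=0.5):
    """Square-root spike counts in windows centered on each event (times in s).

    Each window is treated as half-open, [t - window_s/2, t + window_s/2),
    which is an assumption of this sketch.
    """
    spikes = sorted(spike_times)
    half = window_s / 2.0
    vector = []
    for t in event_times:
        lo = bisect_left(spikes, t - half)  # first spike at or after window start
        hi = bisect_left(spikes, t + half)  # first spike at or after window end
        vector.append(math.sqrt(hi - lo))
    return vector
```

With five port entry events per sequence instance, the function returns one five-dimensional vector per neuron and instance.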

For a single neuron, its activity associated with each instance of sequence execution was thus captured as a single point in the corresponding five-dimensional ‘state space’. A single-cell metric of the contextual representational change (‘transition score’, Figure 3) was calculated as the Euclidean distance in this single cell state space between the centroids of the ‘dominant’ and the ‘exploratory’ clusters.
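The transition score can be sketched as follows. The text does not state how the single-cell centroids were computed; this sketch assumes the per-dimension median, as used for the ensemble analysis described later in the Methods, and the function names are illustrative.

```python
import math
from statistics import median


def centroid(points):
    """Per-dimension median of a list of equal-length activity vectors."""
    return [median(dim) for dim in zip(*points)]


def transition_score(dominant_points, exploratory_points):
    """Euclidean distance between the 'dominant' and 'exploratory' centroids."""
    return math.dist(centroid(dominant_points), centroid(exploratory_points))
```

Each input is a list of five-dimensional vectors, one per sequence instance in the corresponding context.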

Decoding analysis

For each session, we trained a linear classifier to predict, on the basis of the ensemble activity, whether a particular trial was a part of the ‘LLR’ or the ‘RRL’ sequence. For each trial, we used firing rates of simultaneously recorded ACC cells as predictors and sequence identity as a predicted variable. All cells with non-zero firing rate were included in the analysis; the firing rate for each was calculated over the 500 ms decoding window centered around entry into the side port. When the entire three-step sequence identity was being decoded, each cell contributed three windows corresponding to the three side entries. When the decoding was done on the context (‘LLR’ vs ‘RRL’) of a specific choice (e.g. ‘Right’), only one window was used. All linear discriminant analyses were performed with a fivefold cross-validation: five separate classifiers were trained on 80% of the data set, and the classification accuracy for each was estimated on the remaining 20% of the dataset, after ensuring that the sample size in each category was the same. The reported values represent the average error rates across the five classification runs.

Characterization of representational transitions

To capture the behavior of the entire neuronal ensemble, the dimensionality of the state space was expanded to 5*N, where N was the number of simultaneously recorded cells. We then calculated pairwise Euclidean distance between the points in the state space to evaluate how the sequence representation evolved over the course of the session. To demonstrate clustering of points according to the behavioral context, we calculated the distance between centroids of the clusters (median values along each dimension) corresponding to the ‘dominant’ and ‘exploratory’ groupings, and compared it with the corresponding distance calculated after context identity was randomly shuffled across the different sequence instances.
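A shuffle control of this kind might look like the Python sketch below. The number of shuffles, the random seed and the function names are assumptions introduced for illustration; the text specifies only that context labels were randomly shuffled across sequence instances and that centroids are per-dimension medians.

```python
import math
import random
from statistics import median


def centroid_distance(points, is_dominant):
    """Distance between median centroids of 'dominant' and 'exploratory' points."""
    groups = {True: [], False: []}
    for point, label in zip(points, is_dominant):
        groups[bool(label)].append(point)
    centroids = {k: [median(dim) for dim in zip(*v)] for k, v in groups.items()}
    return math.dist(centroids[True], centroids[False])


def shuffle_null(points, is_dominant, n_shuffles=1000, seed=0):
    """Centroid distances after shuffling context labels across instances."""
    rng = random.Random(seed)
    shuffled = list(is_dominant)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # preserves the number of instances per context
        null.append(centroid_distance(points, shuffled))
    return null
```

The observed distance can then be compared with the distribution returned by `shuffle_null`, for example by reporting the fraction of shuffled distances that reach or exceed it.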

Characterization of prevalence encoding

To generate a running estimate of the local sequence prevalence in the animal’s behavioral choices, we constructed, for each behavioral session, a vector of binary values of the same length as the session. Trials corresponding to the last step of the target sequence were given the value of ‘1’ (regardless of the context), all other trials were given the value of ‘0’. This binary vector was then convolved with a causal half-Gaussian kernel with zero mean and standard deviation of 20 trials.
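The running prevalence estimate can be illustrated with the following Python sketch. Several details are assumptions not specified in the text: the kernel is truncated at three standard deviations, normalized to sum to one, and no edge correction is applied at the start of the session. The function names are illustrative.

```python
import math


def half_gaussian_kernel(sd_trials=20.0, n_sd=3.0):
    """Causal half-Gaussian weights for lags 0, 1, 2, ... (normalized to sum to 1)."""
    length = int(n_sd * sd_trials) + 1  # truncation point is an assumption
    weights = [math.exp(-0.5 * (lag / sd_trials) ** 2) for lag in range(length)]
    total = sum(weights)
    return [w / total for w in weights]


def running_prevalence(is_sequence_end, sd_trials=20.0):
    """Convolve a binary trial vector with a causal half-Gaussian kernel.

    The value at trial t depends only on trials t, t-1, t-2, ... (never on
    future trials).
    """
    kernel = half_gaussian_kernel(sd_trials)
    prevalence = []
    for t in range(len(is_sequence_end)):
        acc = 0.0
        for lag, w in enumerate(kernel):
            if lag > t:
                break  # no trials before the start of the session
            acc += w * is_sequence_end[t - lag]
        prevalence.append(acc)
    return prevalence
```

Because the kernel is causal, the estimate at a given trial reflects only the current and preceding choices, with the most recent trials weighted most heavily.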

We used fivefold cross-validated performance of a linear model to characterize the extent to which sequence prevalence factored into the activity of ACC neurons, using built-in Matlab functions ‘fitlme’ and ‘crossval’, and separately fitting the data in each behavioral context. Linear models were fit independently for each of the five analysis windows. For reporting fractions of cells with significant explained variance in each window, all five values for each neuron were used. For reporting explained variance of individual neurons, best of five values was used.

We defined explained variance as the coefficient of determination (R-squared) using the equation:

(1) $R^2 = 1 - \frac{\sum (y - y_{\mathrm{fit}})^2}{\sum (y - \bar{y})^2}$

where the top is the sum of squared errors and the bottom is the total variance. The explained variance was calculated for ‘test’ subsets of data for each of the five cross-validation rounds, and then averaged across the five resulting values. In some cases, the explained variance was negative reflecting overfitting (typically occurred in the data sets with a small sample size). These negative values were replaced with zeros for subsequent reporting.
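A minimal Python sketch of this computation is given below. It assumes that the model predictions for each held-out fold are already available, and that negative values are replaced with zero after averaging across folds; the text does not specify whether clipping was applied before or after averaging. The function names are illustrative.

```python
def r_squared(y, y_fit):
    """Coefficient of determination: 1 - SS_residual / SS_total.

    Assumes y is not constant (SS_total > 0).
    """
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_fit))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def cross_validated_explained_variance(folds):
    """Average test-set R-squared across folds, with negative values set to 0.

    `folds` is a list of (y_test, y_fit_test) pairs, one per validation round.
    """
    mean_r2 = sum(r_squared(y, y_fit) for y, y_fit in folds) / len(folds)
    return max(0.0, mean_r2)
```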

Interaction of prevalence and context

We considered several linear mixed-effect models to examine how a potential interaction of prevalence and context might account for the observed variance in sequence representation. The models considered incorporated sequence prevalence as a fixed effect predictor and could include a random intercept that varied by context. We used each of these features alone or together to fit the different models, and similarly estimated performance as the fraction of explained variance with fivefold stratified (with respect to context) cross-validation.

Comparison with models relating movement vigor and trajectory to neural activity

We tested whether the observed modulation of ACC neural activity could be explained by variation in sequence timing or spatial trajectory by evaluating how well linear models using these parameters as regressors performed relative to the models above. For a measure of movement vigor, we calculated the time between the first step ‘center in’ and the last step ‘side in’; for the spatial trajectory, we first took snippets of the X and Y coordinates within the same five analysis windows that were used for all analyses of neural activity and then concatenated those snippets for all windows and both coordinates together in order to get a 1D vector. Then we combined such vectors for all the sequence instances within a session into a matrix and performed PCA to reduce the dimensionality of the raw spatial data but preserve the data about variation in animal motion. The first principal component was then used as a regressor for the corresponding linear model.

Multiple linear regression

To determine whether the unexpected robustness of the simple strategy prevalence model in explaining a marked fraction of ACC neural activity variance during the execution of a specific sequence derives mostly from the impact of successfully procured reward or the associated movement parameters, we incorporated an additional term into the linear model and determined the relative contribution of the ‘sequence prevalence’ and ‘reward prevalence’ terms to the expanded model’s performance. We considered several families of multiple linear regression models (with additional terms such as ‘reward prevalence’, ‘sequence reward prevalence’, ‘sequence execution time’ and ‘trajectory PC1’). We computed ‘reward prevalence’ and ‘sequence reward prevalence’ in a similar way to ‘sequence prevalence’ (see above) by convolving a binary vector of rewards (1’s for rewarded and 0’s for unrewarded trials) with a half-Gaussian kernel with zero mean and standard deviation of 20 trials. The key difference between the two types of reward prevalence is that we assigned ‘1’ to either all rewarded trials (for calculating ‘reward prevalence’) or only to trials rewarded for the specific sequence type (for calculating ‘sequence reward prevalence’). We used the same variables for the movement vigor and spatial trajectory as described above. We whitened both the predictor and the response variables to zero mean and standard deviation of 1. We then fitted a multiple linear regression model using the built-in Matlab function ‘fitlm’ with pairs of variables (i.e. sequence prevalence and another feature) as predictors and the firing rate of individual neurons in a certain window as a response variable. We used absolute values of the resulting predictor variable weights as a measure for the relative contribution to the final fit by each predictor (Figure 4d and Figure 4—figure supplement 3b).
These fits were done for each of the five behavioral windows and the window with the highest total variance explained was selected for the assessment of weights. The following two criteria were used for the sub-selection of sessions and neurons in this analysis:

  1. To ensure that the estimate of weights was not exquisitely sensitive to the specific sample, we only chose sessions where the pair of variables in the model was sufficiently decorrelated. Specifically, we only took sessions that had a condition index – a measure of decorrelation calculated as the square root of the ratio between the largest and the smallest eigenvalue of the covariance matrix – below 4.

  2. For any chosen session, we sub-selected those neurons that were deemed significantly modulated by sequence prevalence in the initial analysis (variance explained >0.2). This sub-selection was done to answer the question of whether any robustness in explaining activity variance derives from another variable.
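
The prevalence predictors, the paired fit and the condition index described in this section can be sketched as follows. This is a minimal Python illustration, not the authors' Matlab code (which used ‘fitlm’). The function names are hypothetical, the kernel is assumed to include the current trial at lag 0 and is left unnormalized (standardization removes the scale), and the closed-form weights hold only for two standardized predictors.

```python
import math

def half_gaussian_prevalence(events, sd=20.0):
    """Smooth a 0/1 trial vector with a causal half-Gaussian (zero mean, SD in trials).

    Assumption: the current trial is included at lag 0 and the kernel is not normalised.
    """
    kernel = [math.exp(-0.5 * (lag / sd) ** 2) for lag in range(len(events))]
    return [sum(kernel[lag] * events[t - lag] for lag in range(t + 1))
            for t in range(len(events))]

def zscore(x):
    """Standardise to zero mean and unit (population) standard deviation."""
    mean = sum(x) / len(x)
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    return [(v - mean) / sd for v in x]

def _corr(za, zb):
    """Pearson correlation of two already standardised vectors."""
    return sum(a * b for a, b in zip(za, zb)) / len(za)

def paired_fit(seq_prev, other, rate):
    """Fit rate ~ seq_prev + other on standardised variables.

    Returns (|w_seq_prev|, |w_other|, condition_index).
    """
    z1, z2, zy = zscore(seq_prev), zscore(other), zscore(rate)
    r12, r1y, r2y = _corr(z1, z2), _corr(z1, zy), _corr(z2, zy)
    # Closed-form least-squares weights for two standardised predictors.
    w1 = (r1y - r12 * r2y) / (1.0 - r12 ** 2)
    w2 = (r2y - r12 * r1y) / (1.0 - r12 ** 2)
    # The covariance matrix [[1, r12], [r12, 1]] has eigenvalues 1 +/- |r12|.
    condition_index = math.sqrt((1.0 + abs(r12)) / (1.0 - abs(r12)))
    return abs(w1), abs(w2), condition_index
```

Under these assumptions, a session would pass the first criterion when `condition_index < 4`, which for a pair of standardized predictors corresponds to an absolute correlation below 15/17 (about 0.88).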

Decoding sequence prevalence from ACC ensemble trajectories

To characterize the extent to which sequence prevalence could be decoded from population activity by a downstream decoder, we evaluated the 5-fold cross-validated performance of several linear models that used the firing rates (more specifically, the spike counts) of individual neurons in each of the five analysis windows as regressors to estimate the local prevalence of the sequence in question during the t-th instance of sequence execution. The most general model (Equation 2) modeled the response variable, the local sequence prevalence at the time when the t-th sequence iteration was executed, as a linear sum over the set of predictor variables, with each neuron contributing 5 spike counts, each with its own regression coefficient (weight; ‘model i, k’).

(2) $\mathrm{Seq\ prev}_t = \mathrm{bias} + \sum_{i,k} w_{ik}\,\mathrm{spikes}_{ikt}$

where t=sequence instance, i=1:number of neurons, k=1:5 analysis windows.

For this most general model, each neuron received a significance score for each of the five analysis windows that showed whether that particular neuron in that particular window contributes to the prediction of sequence prevalence. We used these scores to sort each neuron into classes 0–5, depending on the total number of windows this neuron was significant in. Neurons that were not significant in any of the windows ended up in class 0 and were removed from all further modeling, and only sessions with at least 5 neurons outside of class 0 were retained for further analyses.

We fit models either on all neurons from classes 1 through 5 or only on neurons from classes 1 through 4. The latter was done to guarantee that we had removed those neurons that themselves had significant prevalence encoding in all windows, thus forcing the models to use multiple neurons to output good predictions over the temporal extent of sequence execution.
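
The class assignment and the session and neuron selection described above can be sketched as follows. This is a hedged illustration in Python. `sort_neurons` and its arguments are hypothetical names, and the sketch takes per-window significance as already-computed boolean flags, because the text does not specify how the significance score is thresholded.

```python
def sort_neurons(significant, include_class5=True, min_neurons=5):
    """Assign each neuron a class (0-5) and pick the neurons to model.

    significant[i][k] is True when neuron i is significant in window k
    (how significance is scored is left outside this sketch).
    Returns (classes, kept_indices); kept_indices is empty for excluded sessions.
    """
    # Class = number of analysis windows in which the neuron is significant.
    classes = [sum(1 for flag in row if flag) for row in significant]
    outside_class0 = [i for i, c in enumerate(classes) if c > 0]
    if len(outside_class0) < min_neurons:
        return classes, []  # session not retained
    # Optionally drop class-5 neurons (significant in every window).
    top = 5 if include_class5 else 4
    kept = [i for i in outside_class0 if classes[i] <= top]
    return classes, kept
```

Calling the function with `include_class5=False` corresponds to the classes 1 through 4 variant, which leaves only neurons that carry prevalence information in some windows but not all.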

To estimate the best possible prediction in each of the five analysis windows, we also fit five individual-window-models (model ‘k’):

(3) Model k: $\mathrm{Seq\ prev}_t = \mathrm{bias} + \sum_{i} w_{ik}\,\mathrm{spikes}_{ikt}$

Finally, we fit a linear model that was constrained to always use the same readout (i.e. the same weight for any given neuron in all five windows; Equation 3, model $w_{ik}=w_{il}$). In essence, this single aggregate model treats each of the five windows as extra data points for the same fit. As such, while having the same number of parameters as an individual-window model, this model benefits from the extra data but loses temporal resolution, providing an estimate of how a fixed downstream decoder would perform on average during any instance of sequence execution.
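
The difference between the three model families lies in how the regression design matrix is assembled, which the sketch below tries to make concrete. It is an illustration under stated assumptions and not the authors' implementation. The data layout `spikes[t][i][k]` (instance, neuron, window) and all function names are hypothetical. Plain least squares via the normal equations stands in for whatever fitting routine was used, and cross-validation and significance testing are omitted.

```python
def _solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * pivot for a, pivot in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_linear(X, y):
    """Ordinary least squares with a bias term; returns [bias, w_1, ..., w_p]."""
    Xb = [[1.0] + [float(v) for v in row] for row in X]
    p = len(Xb[0])
    XtX = [[sum(r[a] * r[c] for r in Xb) for c in range(p)] for a in range(p)]
    Xty = [sum(r[a] * yi for r, yi in zip(Xb, y)) for a in range(p)]
    return _solve(XtX, Xty)

def general_design(spikes):
    """General model: one column per (neuron, window) pair."""
    return [[count for neuron in inst for count in neuron] for inst in spikes]

def window_design(spikes, k):
    """Individual-window model k: one column per neuron, window k only."""
    return [[neuron[k] for neuron in inst] for inst in spikes]

def shared_readout_fit(spikes, prevalence, n_windows=5):
    """Shared-readout model: windows are stacked as extra rows of one fit."""
    X, y = [], []
    for k in range(n_windows):
        X += window_design(spikes, k)
        y += list(prevalence)
    return fit_linear(X, y)
```

With N neurons, the general model has 5N weights plus a bias, whereas each individual-window model and the shared-readout model have N weights plus a bias. The shared-readout fit simply sees five times as many rows.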

Data availability

All data can be found here: https://doi.org/10.25378/janelia.21594129.v2. All code can be found here: https://doi.org/10.25378/janelia.21594105.v1. Requests for raw materials should be addressed to AYK (alla@janelia.hhmi.org).

The following data sets were generated:
  1. Proskurin M, Manakov M, Karpova A (2023) Dataset supporting main results of "ACC neural ensemble dynamics are structured by strategy prevalence". Janelia Research Campus. https://doi.org/10.25378/janelia.21594129.v2
  2. Proskurin M, Manakov M, Karpova A (2022) Analysis code supporting main results of "ACC neural ensemble dynamics are structured by strategy prevalence". Janelia Research Campus. https://doi.org/10.25378/janelia.21594105.v1

Decision letter

  1. Timothy E Behrens
    Senior and Reviewing Editor; University of Oxford, United Kingdom

Our editorial process produces two outputs: i) public reviews designed to be posted alongside the preprint for the benefit of readers; ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "ACC neural ensemble dynamics are structured by strategy prevalence" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Timothy Behrens as the Senior Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

The reviewers were mostly positive about the manuscript, but all agreed that the evidence ruling out alternative explanations should be strengthened. The high-level summary of the changes requested is as follows:

– The most significant weakness is insufficient evidence to rule out alternate hypotheses, particularly reward, but also the certainty and other factors that may account for the variance of neural signaling.

– More data on behavior is needed: accuracy on all strategies, how many blocks per session, did rats have individual biases in strategy, etc.

– Analytical methods are hard to follow in some places.

– Too much unnecessary jargon reduces clarity.

– Reinforcement effects not sufficiently integrated/described.

– Results and a more general description of the analytical methods should be in the main text rather than the methods.

– A more incremental and linear exposition would help readers understand the data and analyses.

Reviewer #2 (Recommendations for the authors):

I have some concerns and suggestions to make for this manuscript.

1) The authors claim the performance of the rats to be 'robust.' Is there an objective measurement to test such robustness? The conditioned probabilities in Figure 1b might provide some evidence. But it is not straightforward to comprehend what a probability of 0.4 means relating to behavioral performance. It would be helpful to have a learning curve showing baseline behavioral performance and whether expert rats have reached their behavioral asymptotes.

2) To identify exploratory sequence instances in the behavioral data, the authors try to remove many action sequence instances that could be explained by factors other than 'exploration.' While this effort is appreciated, it remains to be answered whether one could pinpoint the actual 'exploratory' instances or their existence. To do so, we need to figure out if a mistake made by rats is random, based on a false belief, or only caused by disengagement. Even if there are some exploratory sequence instances in the behavior, other causes might contaminate identified instances, which could promote the current findings on the coding of strategy prevalence. For instance, the authors use a criterion combining the within-side duration and side-to-center duration in step 3 to remove overlapped and long-break sequences. It, however, can only partially rule out other possible contamination as many trials display long choice duration even in step 1 (Figure S3), and the variability is also substantial. One suggestion is that the authors try a more stringent criterion and test whether the primary findings hold firmly.

3) The authors try to rule out many alternative explanations of the neural data other than strategy prevalence. But unfortunately, it seems impossible to remove all influences from some factors (such as action steps, action sequence configuration/composition, behavioral confidence, reward delivery, reward expectation, reward rate, expected reward location, etc.) that covary with strategy prevalence. (i) The authors examine neural responses to 'R' in LLR vs. RRL. But apparently, the two Rs are in different steps (step 3 in LLR and step 1 or 2 in RRL) within the two three-step sequences. Steps thus might explain such differential responses to the R in different sequences. Many neurons would likely show selectivity to different steps regardless of specific actions (L or R), which need further testing. (ii) The authors have tried to remove the influence of reward expectation by only looking at different exploratory states since no reward is available in these trials. But reward expectation can still explain differential firing to the dominant vs. exploratory strategies. The dominant tail effect seen in Figure 3f following the change of rules does not help because reward expectation, like strategy prevalence, also follows the subjective belief of the rats but not the actual task rule. (iii) ACC neurons might code an action sequence as a whole or configuration, much like a compound action, which might instead be interpreted as the authors' specific content of strategy prevalence. However, further evidence is needed to rule out the possibility that neural activities are merely related to a complex action per se rather than a higher cognitive cause. While it is understandable that confounds are not easy to control fully, the authors should emphasize these limitations more in the discussions.

4) Recording data show that neural correlates of strategy prevalence in M2 are slightly higher than in SMC, but inactivating SMC but not M2 affects corresponding behavior. How do the authors explain the disagreement between recording and inactivation data? More importantly, behavioral data on the inactivation of ACC is missing, although cannula implantation surgery in ACC is mentioned in the manuscript. More clarifications are needed for these questions.

5) In Figure 3e, data from a small subset of sessions show that neurons show differential neural responses to two exploratory contexts, supporting two conclusions assessed by the authors. (i) Coding of contexts is beyond the dichotomy of 'dominant' vs. 'exploratory' but contains more specific information regarding richer contextual content. However, critical evidence to distinguish between two 'dominant' contexts by ACC neurons needs to be provided. Moreover, whether the coding for exploratory contexts results from other factors, such as overall reward rate, needs to be clarified. (ii) Because ACC neurons differentiate two exploratory contexts in which reward is unavailable, the authors believe that reward cannot explain the observed neural correlates of strategy prevalence, which is an over-generalized statement since the 'dominant' and 'exploratory' contexts are different in reward availability. Besides, the authors' claim, used to support the irrelevance of reward expectation, that expert rats will likely sample the 'exploratory' sequence instances without any immediate reward expectation lacks clear evidence.

6) I find the manuscript often hard to follow primarily because of the usage of many jargons and extended expressions and sentences, which are unnecessary and could be replaced with more plain language to improve readability for a broader audience. To list a few:

i) behavioral framework = behavioral task?

ii) searching a structured space of action sequences = nose poking?

iv) restructuring of ACC network configuration = neural activity change in ACC?

v) persists through a substantial ensemble reorganization = persistent neural code?

vi) tags ACC representations with contextual content = ACC encodes context?

vii) organizing principles for the ACC ensemble dynamics = how ACC encodes information?

viii) representations of individual strategies are also marked with contextual content = encoding of strategy per se?

Besides, the introduction needs to be narrower in scope, especially at the end, to successfully frame and specify the questions the current study would like to answer. The last paragraph in the introduction contains a lengthy summary of the results. But since the task and results are so complicated, it is almost impossible to understand it without going through the following results in detail. Also, throughout the manuscript, many statements are too broad or vague to some extent. There is room to improve the introduction to set a better ground for the current study and other parts of the manuscript for a broader audience.

Reviewer #3 (Recommendations for the authors):

(1) I found the claims of the manuscript difficult to assess because many basic features are not described or shown in graphical form, and the abstracted analytical methodology (and language) is often difficult to understand. For instance, the description of behavior is incomplete relative to the description of the task in the methods. Rats learned many sequences, but the results of only one are shown. It is important to show whether the rats had biases for particular sequences, particularly for the interpretation of neural data. Otherwise, neurons might be revealing preferred vs non-preferred strategies. Showing the entropy of response strategies (i.e. sequences parsed into triplets and measuring the resulting entropy of triplets) may be a compact way to do so. In general, I recommend reorganizing the manuscript in a more incremental and linear manner.

(2) The primary argument provided that neurons encode strategy (Figure 1e) is based on the comparison to L in the RLL rule vs LLR. But a comparison is made between the 2nd element of the former rule and the first of the latter one. There is clearly a potential for confound by reward expectation. Even though the reward was withheld on the analyzed trials, there is likely different reward expectation. Specifically, the L response in the first position follows a negative reward prediction error. This analysis would be much more compelling by controlling for position and reward. Why not analyze the response to the 2nd element in the sequences RLL and RLR? This way, they are both in the same position and have the same local context (follow R). Reward expectancy is much less of a nuisance variable in this scenario.

(3) I am not convinced that it is valid to compare the 3 sequence responses to single responses in the 'competitor task' (Figure 1.d). The reward expectation is much different because there is a non-zero probability of reward after every choice. Furthermore, readers need to know much more information. Are these different animals with distinct training regimens, differences in the apparatus, etc.? Presumably, rats are going to the feeder after responses in the competitor task more often than the sequence task. I did not find in the present manuscript where the feeder was located relative to the response ports.

(4) My primary critique of the paper is that alternate explanations for variance are not sufficiently tested. Presently, this is done very piecemeal and is scattered throughout the results. The authors do address some of the possible confounders, but not to a level of sufficient rigor to support the overall claim that it is a dominant/exploratory strategy that accounts for differences in neural firing. In no particular order, I think the primary confounders are: reward expectancy; response vigor; posture during nose poke (which may depend on prior choice), position in sequence; side chosen, a relative time during the session (motivation and/or typical run-down of neural firing during long operant tasks), and rule certainty.

The potential for reward confound is fairly obvious. The authors should use some model of reward expectation as an estimate of reward expectation.

The authors need to do more to negate the possibility that neural variance reflects features of movement and posture. The authors mention an analysis visually presented in supplementary data. It is not clear that the principal component of trajectory is the best predictor. I recommend searching the feature space in an attempt to account for the most variance. For instance, body posture, head angle, and maximum velocity may provide superior predictive power. I am particularly concerned because the exploratory sequences appear to be often detected because of slowing in at least some phases of the responses. It is not clear whether to expect a linear relationship between motor output and neural signalling. A link function may be necessary.

Neural activity often drifts over the course of long sessions in which animals perform many trials of an operant task. The authors need to dismiss that this is a possibility, particularly in their analysis of dominant vs exploratory strategy. It is not evident from the manuscript how many trials animals are doing and how long the blocks are. Because of the number of possible sequences, it seems likely that the time between each strategy being the dominant versus exploratory may allow for task-independent drift (fatigue or mechanical instability) in neural activity. Authors should show that difference in firing for dominant/exploratory strategy is not influenced by drift in neural firing over time. If it is, this variance must be parsed out.

(5) Figure 1B: it is not clear why we are shown only one response sequence. Why not show performance accuracy over all strategies? The legend indicates that this is 'across the behavioural dataset'. Does this mean all sessions were concatenated? The sampling is unbalanced among animals. It is important to know if they performed similarly, so showing the mean of each animal is useful.

(6) I recommend tempering the language in the abstract, which mentions intelligence and complex settings. The relation of the present work to these topics is better presented in the discussion if the authors think them relevant.

(7) The firing rates shown in Figure 1 and 4 appear quite high for rat pyramidal neurons in mPFC. Are these cells typical of the dataset, or outliers in terms of firing rate? A quantification of the general activity over the population of cells is needed. Moreover, a plot of firing rate versus spike width would provide valuable insight into the distribution of putative pyramidal and interneurons.

https://doi.org/10.7554/eLife.84897.sa1

Author response

Strategy-specific action encoding

Reviewer 1 writes: “The authors provide strong data indicating that a given L or R response is associated with distinct ACC activity depending on which sequence that response is embedded within, a finding reminiscent of other reports in multiple brain regions. While not a criticism per se, I was interested in the center port responses, also embedded within unique sequences, yet never preceding reward. A key difference in the performance of a given R or L response is that it is sometimes the terminal response, and thus the rat knows a given R or L response to be sometimes reinforced in one of the contexts, but not the other, in each of these comparisons. I wonder if there was an opportunity to cleanly demonstrate the context dependence of a given individual action by comparing center port responses across distinct sequences.”

Reviewer 2 writes: “The authors examine neural responses to 'R' in LLR vs. RRL. But apparently, the two Rs are in different steps (step 3 in LLR and step 1 or 2 in RRL) within the two three-step sequences. Steps thus might explain such differential responses to the R in different sequences. Many neurons would likely show selectivity to different steps regardless of specific actions (L or R), which need further testing.”

Reviewer 3 writes: “the authors compare the encoding of one action (L) in two sequences (RLL and LLR). However, the analyzed action occurs in different local contexts. In the first, it is the middle action, and in the second it is the first action following a reward omission. Even though the reward is withheld, the rat presumably has some reward expectation….”

“…The primary argument provided that neurons encode strategy (Figure 1e) is based on the comparison to L in the RLL rule vs LLR. But a comparison is made between the 2nd element of the former rule and the first of the latter one. There is clearly a potential for confound by reward expectation. Even though the reward was withheld on the analyzed trials, there is likely different reward expectation. Specifically, the L response in the first position follows a negative reward prediction error. This analysis would be much more compelling by controlling for position and reward. Why not analyze the response to the 2nd element in the sequences RLL and RLR? This way, they are both in the same position and have the same local context (follow R). Reward expectancy is much less of a nuisance variable in this scenario.”

We have now expanded our argument about strategy encoding to address the Reviewers’ concerns, incorporating their insightful suggestions in two distinct ways:

1) Most importantly, we implemented a variant of the last suggestion from Reviewer 3. We have avoided using the ‘RLR’ sequence in our tasks because it can be hacked quite effectively through basic alternation, an innate bias that many naive rodents display. Nevertheless, we loved the spirit of the suggestion: match the immediate past choice as well as the distance to reward. We therefore took advantage of the sessions where our animals were tasked with finding all four non-trivial three-letter sequences to demonstrate robust decodability of the first ‘R’ in ‘RRL’ vs ‘RLL’ (or the first ‘L’ in ‘LLR’ vs ‘LRR’) (new Figure 1d-e, last condition).

2) We have also established that all conclusions of the series of decoding analyses remain valid even when the decoding window is shifted to the center port entry – a common step at every trial initiation (new Figure 1d-e). Indeed, the original set of decoding analyses, done using a decoding window anchored on choice port entry, has now been moved to the supplementary materials.

‘Dominant’/’exploratory’ vs ‘rewarded’/’unrewarded’

Reviewer 2 writes: “(ii) Because ACC neurons differentiate two exploratory contexts in which reward is unavailable, the authors believe that reward cannot explain the observed neural correlates of strategy prevalence, which is an over-generalized statement since the 'dominant' and 'exploratory' contexts are different in reward availability. Besides, the authors' claim, used to support the irrelevance of reward expectation, that expert rats will likely sample the 'exploratory' sequence instances without any immediate reward expectation lacks clear evidence.”

Reviewer 2 also writes: “(5) In Figure 3e, data from a small subset of sessions show that neurons show differential neural responses to two exploratory contexts, supporting two conclusions assessed by the authors. (i) Coding of contexts is beyond the dichotomy of 'dominant' vs. 'exploratory' but contains more specific information regarding richer contextual content. However, critical evidence to distinguish between two 'dominant' contexts by ACC neurons needs to be provided. Moreover, whether the coding for exploratory contexts results from other factors, such as overall reward rate, needs to be clarified.”

Reviewer 3 writes: “(4) …The authors do address some of the possible confounders, but not to a level of sufficient rigor to support the overall claim that it is a dominant/exploratory strategy that accounts for differences in neural firing. …I think the primary confounders are: reward expectancy…”

We apologize for not spelling out the logic of this argument better. Our reasoning was as follows: in a small subset of sessions, we included both ‘LLR’ AND ‘LLLR’ blocks (for this analysis, we were sure to pick sessions, in which proficiency with ‘LLLR’ was achieved). This permitted us to look at exploratory ‘RRL’ sequence instances (never rewarded) in these two distinct conditions. Note that by our selection filter, for any sequence instance to be even considered for the ‘exploratory’ dataset, the block-specific target (‘LLR’ and ‘LLLR’ respectively in this experiment) has to have been discovered by the animal, and moreover, it has to have come to dominate the animal’s choices. Thus, in the old Figure 3e (new Figure 3d), we were comparing ACC representations of ‘RRLs’ done as deviations from persisting with a previously discovered, rewarded ‘LLR’ or a previously discovered, rewarded ‘LLLR’. Both subsets of RRLs were exploratory, and thus associated with the same reward unavailability, yet as the old Figure 3e (new Figure 3d) shows, the associated global ACC neural states were as different as those between ‘exploratory’ and ‘dominant’ ‘RRL’s. This led us to conclude that (1) ‘rewarded’/’unrewarded’ is insufficient to capture the difference in global states, and (2) ‘exploratory’ is insufficient as well.

To answer Reviewer 2’s comment: “However, critical evidence to distinguish between two 'dominant' contexts by ACC neurons needs to be provided”:

The differential encoding of ‘RRL’ between ‘LLR’ and ‘LLLR’ contexts is in itself an indication that ACC neurons can tell the difference between the two contexts. Nevertheless, below we provide independent evidence to this effect by demonstrating that we can decode whether an ‘R’ at the end of a ‘(L)LLR’ originates in the ‘LLR’ block or in the ‘LLLR’ block.

More generally, it is still a fair point that, while our expert animals likely build on their extensive experience to expect only the target sequence to secure reward, some possibility remains that a transient one- or two-instance exploratory resurgence of ‘RRL’ might come with some non-zero reward expectation (for instance, if animals were thereby pre-emptively evaluating the possibility of a block transition).

It is also fair that at least in some ‘dominant tails’ – stretches of determined commitment to the previously rewarded dominant sequence past unsignalled block transitions – our animals simply hadn’t noticed the transition yet, and thus hadn’t changed their reward expectation. We therefore sought to develop an additional setting aimed at disambiguating ‘reward’/’no-reward’ classification of the global ensemble state from ‘dominant’/’exploratory’ (or some more nuanced variation of the latter).

We built on one parsimonious interpretation of the distinct global ensemble states evident from the differential encoding of exploratory ‘RRL’ in ‘LLR’ vs ‘LLLR’ contexts (and other findings at the level of global contextual encoding). If the global contextual state of the ACC ensemble represents the inferred target sequence, then we should expect that state to be similar BOTH when the dominant sequence in the animal’s behavior is indeed aligned with the latent rewarded target, and when it is not, i.e. when the animal perseverates with a non-rewarded sequence at the expense of others.

In a new set of animals, we have now collected additional data in sessions where the target sequence set included not only the familiar ‘LLR’ and ‘RRL’, but also the recently introduced ‘LLLR’ and ‘RRRL’. We discovered that under such conditions of early ‘LLLR’/’RRRL’ acquisition, animals’ choices in the ‘LLLR’ and ‘RRRL’ blocks were often dominated by the cognate familiar shorter sequence due to a lack of proficiency with the longer targets (see example in Figure 3f).

We now demonstrate that in such cases, ensemble states associated with such dominant but unrewarded ‘LLR’ sequence instances also do not cluster far away from ones observed earlier in the same session when the dominant ‘LLR’ strategy matched the rewarded target (new Figure 3f). Furthermore, since persistence with an unrewarded sequence in this case occurred after the animals had detected an unsignalled block change and switched away from the previous strategy (note the switch in new Figure 3f, middle epoch), the similarity of ensemble states cannot be explained by a lack of rule switch awareness that potentially confounds the dominant ‘tails’ example. Together with the evidence we explored in the original submission, these data argue that the ‘rewarded’/’unrewarded’ dichotomy does not provide a parsimonious classification of the discrete global ACC ensemble states.

Exploratory sequence instances vs errors of execution

Reviewer 2 writes: “(2) To identify exploratory sequence instances in the behavioral data, the authors try to remove many action sequence instances that could be explained by factors other than 'exploration.' While this effort is appreciated, it remains to be answered whether one could pinpoint the actual 'exploratory' instances or their existence. To do so, we need to figure out if a mistake made by rats is random, based on a false belief, or only caused by disengagement. Even if there are some exploratory sequence instances in the behavior, other causes might contaminate identified instances, which could promote the current findings on the coding of strategy prevalence. For instance, the authors use a criterion combining the within-side duration and side-to-center duration in step 3 to remove overlapped and long-break sequences. It, however, can only partially rule out other possible contamination as many trials display long choice duration even in step 1 (Figure S3), and the variability is also substantial. One suggestion is that the authors try a more stringent criterion and test whether the primary findings hold firmly.”

There seem to be two separate, although related questions at the core of this concern:

i) Is there even evidence that any of the deviations from the target sequence represent anything other than errors of execution?

ii) How robust are the described effects to progressive culling of the set of all putative exploratory sequence instances?

i) Two observations argue against deviations from the discovered target sequence being mere errors of execution:

– Many of the deviations contain direct concatenations of several instances of an alternative sequence (see, for example, Figure 2b). Moreover, as can be inferred from Figure 2c, most exploratory sequence instances occur in bouts (otherwise, for any given exploratory sequence instance, the local prevalence of that sequence in recent past history would be close to zero, and the histogram would be much more squished to the left; see an example of such a distribution in the second argument below).

– The type of bout-based putative exploratory resurgence of specific alternative sequences captured in Figure 2c is something we observe in animals that have experienced those sequences as latent targets. Below we give two examples:

– Across-animal comparison. Imagine that all putative instances of, say, ‘LLR’ we observe arose as mere errors of execution, despite the apparent correlation of co-occurrence (Figure 2c) suggesting that the exploratory sequence instances occur in bouts. Then, if we were to examine the dataset from animals trained on a distinct set of sequences – one that never included ‘LLR’ as a latent target – we would expect to encounter ‘LLR’s with a bout frequency that is on par with what we see in the dataset in this manuscript. Contrary to that prediction, the distribution of local ‘LLR’ sequence prevalence looks dramatically different in animals trained on ‘LRLRR’ and ‘LRLRLRR’ sequences (new Figure 1—figure supplement 2a-b).

– Within-animal comparison. For animals that first become proficient with ‘LLR’ and ‘RRL’ sequences, and later learn the longer ‘LLLR’ and ‘RRRL’ sequences to expert level, we see a noticeable increase in the incidence of the longer sequences in deviations from the dominant target after that learning stage (new Figure 1—figure supplement 2c-d).

We now mention this explicitly on p.6 of the manuscript:

“…Nevertheless, clear deviations from this dominant pattern, with the animals’ choices appearing instead to conform to other possible target sequences, were also present within all blocks (Figures 1c, 2a-c). Although some of the deviations from the currently rewarded target sequence may represent errors of execution, the presence of bouts and even direct concatenations of previously reinforced sequences (Figure 1—figure supplement 2a-d) argues that at least some of these deviations represent transient exploratory resurgence of alternative sequences.”

ii) The second question is something we worried about from the beginning, when we first observed the phenomena that are the subject of this study, at that time on the full set of putative exploratory sequence instances. For the paper, we had settled on the harshest set of criteria that still left enough sequence instances to give the dataset statistical power. Nevertheless, it is important to emphasize that we had verified the robustness of the findings to the inclusion of progressively harsher selection criteria at each sub-selection step along the way.

Activity modulation by strategy prevalence vs by reward or certainty

Reviewer 1 writes: “I find it difficult to disentangle prevalence encoding and impacts of reward in the way the data and interpretation are presented in some areas of the text. While neural encoding may not reflect trial-by-trial reward receipt, clearly the rat's decision to repeat a given sequence or initiate a new sequence is impacted by reinforcement parameters and reward expectation. Thus being very exact in the interpretation would be helpful.”

Reviewer 2 writes: “The authors have tried to remove the influence of reward expectation by only looking at different exploratory states since no reward is available in these trials. But reward expectation can still explain differential firing to the dominant vs. exploratory strategies. The dominant tail effect seen in Figure 3f following the change of rules does not help because reward expectation, like strategy prevalence, also follows the subjective belief of the rats but not the actual task rule…”

Reviewer 3 writes: “I think the primary confounders are: reward expectancy… and rule certainty…

The potential for reward confound is fairly obvious. The authors should use some model of reward expectation as an estimate of reward expectation.”

In response to the Reviewers’ requests, we have extended our analysis and narrative related to reward in the following three ways:

1. We have expanded the set of contexts where a specific behavioral sequence was performed under a ‘no-reward’ condition. Specifically, we have now collected new data in a setting where persistent, but unrewarded ‘LLR’ and ‘RRL’ sequence instances dominated within the unfamiliar ‘LLLR’ and ‘RRRL’ contexts. What makes this fundamentally different from the case of ‘dominant tails’ is that the observed persistence with an unrewarded sequence in this case occurred after the animals had detected an unsignalled block change and switched away from the previous strategy (Figure 3f) and thus cannot be explained by a lack of rule switch awareness.

The addition of this extra ‘no reward’ condition permitted us to demonstrate more convincingly (a) that the global ‘dominant’ ACC ensemble state relates to strategy dominance rather than the associated reward (see section ‘Dominant’/‘exploratory’ vs ‘rewarded’/‘unrewarded’ above and new Figure 3f), and (b) that strategy prevalence makes at least some reward-independent contribution to modulating ACC activity during the execution of a specific strategy (Figure 4—figure supplement 2b-c). We also note that since rule certainty captures one’s belief about strategy outcome, observing activity modulation by strategy prevalence under conditions of long-term persistence with an unrewarded strategy argues against that modulation reflecting rule certainty.

2. We took advantage of the fact that in our dataset, the sequence prevalence parameter and the reward prevalence parameter are sufficiently decorrelated to probe – using multiple linear regression – whether the unexpected robustness of the simple strategy prevalence model above in explaining a marked fraction of ACC neural activity variance during the execution of a specific sequence derives mostly from the impact of successfully procured reward. By separately considering an expanded model with an ‘overall reward prevalence’ term and one with a ‘sequence-specific reward prevalence’ term, we demonstrate that the tracking of reward by the ACC ensemble is strategy specific – indirectly tying that to strategy prevalence – but that strategy-specific reward prevalence alone was insufficient to explain the activity modulation during strategy execution (new Figure 4d).

3. We have re-organized the narrative, expanding the discussion of these issues into a separate section (see section The tracking of reward by the ACC neural ensemble is strategy-specific, however reward prevalence is insufficient to account for activity modulation on p.17 of the revision and below):

The tracking of reward by the ACC neural ensemble is strategy-specific, however reward prevalence is insufficient to account for activity modulation

“Models of decision-making rarely include a summary statistic of the agent’s recent behavioral choices – such as the local strategy prevalence that we examine here. This prompted us to next consider how the observed modulation of ACC ensemble activity by local strategy prevalence might interact with any modulation exerted by the external reward – a more commonly considered parameter directly related to the concept of valuation thought to engage circuit computations in the ACC (Boorman et al., 2009; Kennerley et al., 2009; Kolling et al., 2012; Luk and Wallis, 2013). Indeed, variation in strategy prevalence is necessarily accompanied by variation in detailed reward history and thus understanding the interplay between the two in shaping ACC neural dynamics may shed further light on how the animals parse their environment.

We first established that local strategy prevalence provides at least some modulatory influence independent of external reward by evaluating the performance of the prevalence model trained exclusively on sequence instances executed under ‘no reward’ conditions. Specifically, we evaluated two distinct cases: ‘exploratory’ sequence instances in sessions comprising blocks of the familiar ‘LLR’ and ‘RRL’ targets, and persistent ‘LLR’ and ‘RRL’ sequence instances within the unfamiliar ‘LLLR’ and ‘RRRL’ contexts. The observed model performance in both cases was significantly more robust than what would be expected if most of the observed activity variance during sequence execution arose from the statistics of the associated reward (Figure 4 —figure supplement 2). In further support of the conclusion that sequence prevalence shapes ACC neural dynamics independent of the associated reward, we observed little activity modulation for sequence instances within ‘dominant’ tails – a period when sequence prevalence remains high, but reward expectation should rapidly diminish (Figure 3f). Combined, these data argue that a summary statistic of that strategy in past choices exerts a continuous modulation of the strategy representation in ACC and raise the question of how this modulation interacts with any shaping of the ACC activity by the external reward.

ACC is known to both multiplex information about reward and individual actions (Hayden and Platt, 2010), as well as track progression to reward over multiple steps of an action sequence (Shidara and Richmond, 2002). To determine whether the unexpected robustness of the simple strategy prevalence model above in explaining a marked fraction of ACC neural activity variance during the execution of a specific sequence derives mostly from the impact of successfully procured reward, we incorporated an additional ‘reward prevalence’ term into the linear model and determined the relative contribution of the ‘sequence prevalence’ and ‘reward prevalence’ terms to the expanded model’s performance (see below). Provided that the two parameters are sufficiently decorrelated in our dataset to avoid collinearity – due to factors like reward omission, exploratory sequence instances and transient persistence with off-target sequences– their weights in the expanded model would reflect the relative contribution to the model’s performance from the summary behavioral statistic itself and from the recently procured reward. To account for the possibility that reward statistics may be processed by, and exert influence on, the ACC circuitry at either the single trial- or the single sequence level, we separately considered a model with an ‘overall reward prevalence’ term and one with a ‘sequence-specific reward prevalence’ term. As such, a reward obtained for the target ‘RRL’ sequence before an exploratory, off-target, ‘LLR’ sequence instance would contribute to the ‘overall reward prevalence’ term, but not to the ‘sequence-specific reward prevalence’ term in models aimed at explaining neural activity variance during ‘LLR’ execution. 
We computed each ‘reward prevalence’ term by weighing the relevant previous rewards with the same temporal filter used to calculate sequence prevalence and identified a subset of behavioral sessions for which the ‘sequence prevalence’ and each of the ‘reward prevalence’ parameters were sufficiently decorrelated in our dataset (Methods).

Does the observed marked modulation of ACC neural dynamics during different instances of sequence execution outside of some particularly persistent ‘no reward’ contexts arise largely from variation in recent reward history? To resolve this, we determined whether the model’s weight for the ‘sequence prevalence’ term became negligible once the ‘reward prevalence’ term was added to the model. Contrary to this expectation, neither the ‘overall reward prevalence’ nor the ‘sequence-specific reward prevalence’ terms dwarfed the contribution from the ‘strategy prevalence’ term to the model’s performance, with the ‘overall reward prevalence’ making a particularly small contribution to the model’s explanatory power (Figure 4d). Furthermore, while the contribution from the ‘sequence-specific reward prevalence’ was on par with that of ‘strategy prevalence’, it was clearly insufficient on its own to account for the model’s explanatory power (Figure 4d). Combined, these observations argue that the tracking of reward by ACC is strategy-specific, and that both a summary statistic of the specific self-guided behavioral strategy in recent past and a statistic of the associated reward shape the ACC neural dynamics during strategy execution, further highlighting the central role of the animal’s strategy choice rather than the external rule in this process.”
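The distinction between the two reward terms can be illustrated with a minimal sketch. It assumes a simple boxcar window over the preceding sequence instances, which stands in for the temporal filter described in the Methods; the function name `reward_prevalence`, the `(sequence, rewarded)` representation, and the toy history are hypothetical and are not taken from the dataset.

```python
from typing import List, Tuple

# One executed sequence instance: (sequence label, whether it was rewarded).
Instance = Tuple[str, bool]


def reward_prevalence(history: List[Instance], index: int, window: int = 6):
    """Return (overall, sequence_specific) reward prevalence before history[index].

    Assumption: a flat window over the preceding `window` instances; the
    actual temporal filter used in the manuscript may differ.
    """
    seq, _ = history[index]
    past = history[max(0, index - window):index]
    if not past:
        return 0.0, 0.0
    # Every rewarded instance in the window counts toward the overall term.
    overall = sum(r for _, r in past) / len(past)
    # Only rewards earned by the same sequence count toward the specific term.
    specific = sum(r for s, r in past if s == seq) / len(past)
    return overall, specific


# Toy history: rewarded 'RRL' targets followed by one exploratory 'LLR'.
history = [("RRL", True), ("RRL", True), ("RRL", False), ("RRL", True), ("LLR", False)]
overall, specific = reward_prevalence(history, 4)
print(overall, specific)
```

In this toy case the rewards earned for ‘RRL’ raise the overall term for the exploratory ‘LLR’ instance while leaving its sequence-specific term at zero, which is the situation described in the quoted passage.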

Modulation by strategy prevalence vs by movement-related parameters

Reviewer 1 writes: “Would it make sense to focus the control analyses (vigor, reward history, and so on) on those sessions/ensembles with greater variance explained, ie, perhaps there might be greater sensitivity to detecting interactions among variables within ensembles recorded more rostrally?”

Reviewer 3 writes: “The paper would benefit from analyses, such as multiple regression over all possible predictive variables, to evaluate the relative amount of neural signal variance attributable to strategy dominance compared to other information….”

We have now refined our evaluation of the possibility that the observed linear modulation of ACC activity in fact reflects modulation by co-varying movement-related parameters by combining these two insightful suggestions. Specifically, we have expanded the section devoted to this issue (p. 16) where we focused in more detail on the dataset recorded in the more rostral part of the cingulate cortex and asked whether adding a separate movement-related parameter to the linear model would shift the explanatory power away from the sequence prevalence term. We report that the expanded models revealed only a minor contribution of the execution vigor or of the specific trajectory to the overall model’s performance – an observation consistent with the relatively poor performance of the relevant single parameter models included in the original submission. We write:

“The comparatively low amount of neural activity variance in the motor regions M2 and SMC explained by strategy prevalence suggests that the observed modulation is unlikely to reflect movement parameters that may co-vary with changes in the animal’s dedication to a particular strategy. To establish this more explicitly, we directly quantified trial-by-trial measures of movement vigor (sequence execution time) and kinematics (the first principal component of movement trajectory) for each instance of sequence execution and then asked how well linear models that incorporate these parameters performed at explaining variance in ACC firing rates (Methods). We first evaluated the performance of an equivalent linear model that used either execution time or the first principal component of movement trajectory as a single parameter; neither model could explain ACC neural activity variance as well as sequence prevalence (Figure 4 —figure supplement 3a). We then focused in more detail on the dataset recorded in the anterior part of the cingulate cortex that displayed the strongest modulation by strategy prevalence (Figure 4b,c), and asked whether adding a separate movement-related parameter to the linear model would dwarf the explanatory power of the prevalence term (Methods). To ensure that the resulting weights on the two parameters would directly reflect their relative contributions, we z-score normalized all the variables to 0 mean and standard deviation of 1 prior to fitting the model. Consistent with the relatively poor performance of the single movement-related parameter models, the expanded models revealed only a minor contribution from those motor parameters to the overall model’s performance (Figure 4 —figure supplement 3b). 
While it remains possible that more nuanced aspects of movement, like the animal’s posture, contribute to the observed activity modulation, these observations suggest that gross movement-related parameters are not behind the explanatory power of strategy prevalence.”
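The logic of comparing weights after z-scoring can be sketched for the two-predictor case. This is an illustrative sketch only, not the analysis code behind Figure 4—figure supplement 3b: the function names and toy values are hypothetical, and the closed-form expression below is the standard ordinary least squares solution for two standardized predictors.

```python
import math


def zscore(values):
    """Normalize a list of numbers to zero mean and unit standard deviation."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / sd for v in values]


def standardized_weights(y, x1, x2):
    """Least-squares weights for y ~ x1 + x2 after z-scoring all variables.

    With standardized variables the weights depend only on the pairwise
    correlations, so they can be compared directly with one another.
    """
    zy, z1, z2 = zscore(y), zscore(x1), zscore(x2)
    n = len(y)
    r_y1 = sum(a * b for a, b in zip(zy, z1)) / n
    r_y2 = sum(a * b for a, b in zip(zy, z2)) / n
    r_12 = sum(a * b for a, b in zip(z1, z2)) / n
    denom = 1.0 - r_12 ** 2
    if denom < 1e-12:
        raise ValueError("predictors are collinear; weights are not identifiable")
    w1 = (r_y1 - r_12 * r_y2) / denom
    w2 = (r_y2 - r_12 * r_y1) / denom
    return w1, w2


# Hypothetical toy values: firing rate driven by prevalence, not by execution time.
prevalence = [0.1, 0.5, 0.9, 0.3, 0.7, 0.2]
exec_time = [1.2, 0.8, 1.1, 0.9, 1.5, 1.0]
firing_rate = [2.0 * p + 3.0 for p in prevalence]
print(standardized_weights(firing_rate, prevalence, exec_time))
```

The explicit collinearity check reflects the requirement, stated in the quoted text, that the predictors be sufficiently decorrelated for their weights to be interpretable.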

Reviewer 3 also writes: “It is not clear that the principle component of trajectory is the best predictor. I recommend searching the feature space in an attempt to account for the most variance. For instance, body posture, head angle, and maximum velocity may provide superior predictive power. I am particularly concerned because the exploratory sequences appear to be often detected because of slowing in at least some phases of the responses.”

Unfortunately, the specifics of the video data that accompany this dataset preclude us from doing more sophisticated pose tracking. To highlight the limitation of our analysis, we added the following comment to the relevant text section:

“While it remains possible that more nuanced aspects of movement, like the animal’s posture, contribute to the observed activity modulation, these observations suggest that gross movement-related parameters are not behind the explanatory power of strategy prevalence.”

It is worth mentioning, however, that the momentary slowing down that accompanies the demarcation by an animal of a completed sequence happens outside of the temporal window used in all analyses in this manuscript.

Finally, Reviewer 3 notes: “It is not clear whether to expect a linear relationship between motor output and neural signalling. A link function may be necessary.”

We regret that the compact wording in the original submission left the logic behind our analyses unclear. The specific question we aimed to address was whether the linear component of the activity modulation we were attributing to sequence prevalence was instead the result of the underlying variability in motor parameters. As such, we selectively evaluated the explanatory power of a linear relationship. We hope that the expanded explanation, along with the inclusion of the multiple linear regression analysis, clarifies the logic of the argument.

Additional details about behavior are needed: accuracy on all strategies, how many blocks per session, did rats have individual biases in strategy, etc.

Reviewer 1 writes: “The main text provides an intriguing narrative but lacked very many of the quantitative details one would like to evaluate the claims, requiring a lot of back and forth between the results and the methods, and the figure legends. The number of rats, the number of sequence occurrences, the number of neurons simultaneously recorded, etc. For any given analysis of a set of sessions, the reader is not told if the sessions switched among 2, 3, or 4 sequences, nor what the reinforcement parameters were, all variables that potentially impact the behavior and neural encoding, and therefore the decoding analyses.”

We apologize for the confusion that arose from our laconic style in the original submission; the importance of stating this clearly is further underscored by the fact that in the revised version, we now deviate from the target sequence set specifically for the decoding analyses (see ‘Strategy-specific action encoding’ section of the response above).

Initially, we restricted our analysis of neural activity to sessions with ‘LLR’/‘RRL’ transitions (with the exception of a single control for the encoding of ‘exploratory’ sequence instances in two distinct ‘dominant’ contexts). We apologize for not making the rationale clear: this choice removed the additional uncertainty inherent in parsing circularly permuted sequences when identifying putative exploratory sequence instances (e.g. disambiguating a putative ‘LRR’ from a putative ‘RRL’ in ‘LRRL’). In the revision, we outline this argument more clearly. Furthermore, when exceptions to this general rule are made, we not only mention them, but also explain the purpose and the task design.

Specifically, to explain the rationale for largely focusing on ‘LLR’/’RRL’ sessions when analyzing contextual modulation of sequence representation, we now include the following statement in the text (p. 9):

“In contrast, parsing is much harder for the deviations from the dominant target outside of clear sequence concatenations. This is particularly challenging in cases where the task set contains circularly permuted strategies: does a ‘LRRL’ deviation during a ‘LLR’ block reflect a ‘LRR’ instance, a ‘RRL’ instance, or neither? And while the additional uncertainty inherent in parsing circularly permuted sequences can be resolved by focusing on sessions that contained only ‘LLR’ and ‘RRL’ blocks, not even every apposition of ‘Left’, ‘Left’ and ‘Right’ choices in a ‘RRL’ block will reflect an exploratory instance of the ‘LLR’ sequence. We therefore next sought to delineate an objective criterion for including any lone putative exploratory sequence instance in the subsequent investigation; with the exception of a few pre-specified control experiments, the remaining analyses are restricted to ‘LLR’/‘RRL’ block switches.”

One key place where we deviate from focusing on ‘LLR’/‘RRL’ sessions is in the decoding section. Here, in response to Reviewers’ suggestions, we now also take advantage of sessions that tasked our animals to find all four non-trivial 3-step targets (see more on that below in the Strategy-specific action encoding section of our response to the “The most significant weakness” part of Essential Revisions). For these specific analyses, we could avoid the uncertainty of parsing by restricting the decoding analyses to those sequence instances that both matched the latent rewarded target and had established dominance in the animal’s choices. Indeed, analyzing such dominant sequence instances for each target sequence was sufficient to answer the question of whether action representation in ACC is strategy-dependent.

We now write in the opening portion of that part of the manuscript (p.6):

“… we first verified that the specific choice of strategy in this more complex sequence task is reflected in ACC neural dynamics by establishing that individual sequences could be decoded from ensemble activity. For these strategy decoding analyses we focused on those periods when each sequence of interest both matched the latent rewarded target and had established dominance in the animal’s choices.”

We also made sure to specify when the inclusion of these sessions was necessary. We now write (p.7):

“The observed differences in ACC representations for a given ‘L’ (or ‘R’) action within the sequences ‘LLR’ and ‘RRL’ could reflect the differential encoding of distinct strategies that are currently being pursued. However, the ‘L’ (or ‘R’s) in these different sequences are also associated with differences in the immediate history of other actions (‘L’ in ‘RRL’ follows an ‘R’, while one of the ‘L’s in ‘LLR’ follows an ‘L’) and the proximity of rewards (no ‘R’ in ‘RRL’ is ever rewarded, whereas most ‘R’s in ‘LLR’ are). We therefore carried out additional analyses to ask whether either of these factors – surrounding actions, or reward proximity – could account for the apparent strategy encoding in ACC, taking advantage of a subset of sessions that tasked our animals with discovering additional 3 letter target sequences.”

As for the reinforcement parameters, although the fraction of reward omission varied across our dataset, it was always matched across different target sequences within a session. Thus, reinforcement parameters did not come into play for the decoding analyses.

Reviewer 2 writes: “(1) The authors claim the performance of the rats to be 'robust.' Is there an objective measurement to test such robustness? The conditioned probabilities in Figure 1b might provide some evidence. But it is not straightforward to comprehend what a probability of 0.4 means relating to behavioral performance. It would be helpful to have a learning curve showing baseline behavioral performance and whether expert rats have reached their behavioral asymptotes.”

We apologize for not referencing the figures behind this statement. In our minds, three figures play into that claim: Figure 1b, Figure 2c, and new Figure 1—figure supplement 1.

One straightforward way to define performance is the probability (or prevalence) that the animal is doing the target sequence. In early experiments, we found that the animals’ performance indeed asymptotes in a version of the task where only one target sequence is used over the course of many sessions. However, in the dynamic version of the task used in this manuscript – one with repeated switches between blocks characterized by different target sequences – such performance can locally vary within a session even for expert animals because they repeatedly search for a new target sequence or deviate to explore the alternatives. Thus, we feel that the distribution of local sequence prevalence values for dominant blocks (Figure 2c) is more informative than a learning curve. In essence, that analysis characterizes the likelihood for any given dominant ‘LLR’ (or ‘RRL’) that the animal has locally dedicated a certain amount of attention to that target sequence (with ‘1’ signifying that the animal has not deviated from the target sequence over the past 20 trials). What Figure 2c fails to show is that the target is often cleanly concatenated even when transient deviations from that dominant strategy crop up over the course of the 20-trial window; that information is instead presented in Figure 1b.
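To make the notion of local sequence prevalence concrete, the following is a minimal sketch. It assumes a flat 20-choice window ending at the last choice of each instance and a greedy left-to-right parse of the choice string; the function names are hypothetical, and the selection criteria and filter actually used in the manuscript (see Methods) may differ.

```python
def mark_instances(choices, target):
    """Greedy left-to-right parse of non-overlapping target instances.

    Returns a per-choice membership mask and the index of each instance's
    final choice.
    """
    k = len(target)
    marked = [False] * len(choices)
    ends = []
    i = 0
    while i + k <= len(choices):
        if choices[i:i + k] == target:
            for j in range(i, i + k):
                marked[j] = True
            ends.append(i + k - 1)
            i += k
        else:
            i += 1
    return marked, ends


def local_prevalence(choices, target, window=20):
    """Fraction of the last `window` choices that belong to target instances.

    One (end index, prevalence) pair is returned per instance that has a
    full window of history; earlier instances are skipped.
    """
    marked, ends = mark_instances(choices, target)
    result = []
    for end in ends:
        start = end - window + 1
        if start < 0:
            continue
        result.append((end, sum(marked[start:end + 1]) / window))
    return result


# Uninterrupted concatenation of the target gives a prevalence of 1.
print(local_prevalence("LLR" * 10, "LLR"))
# A lone 'LLR' inside a run of 'RRL' gives a prevalence near zero.
print(local_prevalence("RRL" * 7 + "LLR" + "RRL" * 2, "LLR"))
```

Under these assumptions, a value of 1 corresponds to no deviation from the target over the past 20 choices, while an isolated instance yields a value close to zero.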

Finally, the ability of our animals to detect and adjust to block transitions within tens of trials (old Supplementary Figure 1, new Figure 1—figure supplement 1a) is another aspect of robust task performance we would strive to highlight. We have made sure to reference these 3 Figures in the context of the robustness claim.

Reviewer 3 writes: “(1) …the description of behavior is incomplete relative to the description of the task in the methods. Rats learned many sequences, but the results of only one are shown. It is important to show whether the rats had biases for particular sequences, particularly for the interpretation of neural data. Otherwise, neurons might be revealing preferred vs non-preferred strategies. Showing the entropy of response strategies (i.e. sequences parsed into triplets and measuring the resulting entropy of triplets) may be a compact way to do so.”

We apologize for two separate sources of confusion here.

The first source stems from our misguided attempt to simplify axis labels. Our summary plots always show data pooled across ‘LLR’ and ‘RRL’ strategies, which we think of as interchangeable. Thus, for instance, what was labeled in the original Figure 1b as p(LLR|LLR) is, in fact, p(LLR|LLR in ‘LLR’ block) OR p(RRL|RRL in ‘RRL’ block). We have now expanded the relevant labels to reflect the fact that both sequences contributed to the analyses.

Most importantly, however, all analyses about activity modulation by local and global context are always done for different instances of the same strategy (i.e. action sequence), and only during epochs when the target sequence has established dominance. For instance, whereas we compare exploratory ‘RRL’ to dominant ‘RRL’, or exploratory ‘LLR’ to dominant ‘LLR’, we never compare ‘RRL’ to ‘LLR’. Thus, a preference for one sequence cannot account for any difference observed in such analyses.

Consequently, putative preferences for a different strategy would not factor into the interpretation of the neural data. We have now emphasized this point several times throughout the revision to help the reader. For example, we conclude the opening section on strategy encoding on p. 8 by saying:

“Combined, these observations argue that individual multi-step sequential strategies have distinct ACC representations, permitting us to next evaluate whether the ACC representation of any specific sequence is further modulated by the specific context in which a particular instance of that sequence is being executed.”

Reviewer 3 also writes: “(5) Figure 1B: it is not clear why we are shown only one response sequence. Why not show performance accuracy overall strategies? The legend indicates that this is 'across the behavioural dataset'. Does this mean all sessions were concatenated? The sampling is unbalanced among animals. It is important to know if they performed similarly, so showing the mean of each animal is useful.”

We again apologize for the confusion that arose from our attempt to keep the axis labels simple. In all behavioral analyses, the data was always pooled for the ‘LLR’ and ‘RRL’ sequences (they are, in our minds, equivalent). We have now corrected the labels (e.g. replacing p(LLRLLR)|p(LLR) with p(LLRLLR)|p(LLR) or p(RRLRRL)|p(RRL)).

It is again important to emphasize that all analyses about activity modulation by local and global context are always done for different instances of the same strategy (ie action sequence), and only during epochs when the target sequence has established dominance. Thus, a somewhat better performance on one strategy over another – if present – would not factor into the interpretation of neural data.

Reinforcement effects not sufficiently integrated/discussed

Reviewer 1 writes: “Some greater attention to the behavioral parameters could be helpful, especially regarding the impact of reward rate on behavior. For example, looking at some of the figures of individual rat behavior, exploratory sequences seemed triggered by reward omission. Is this just a chance for the examples chosen or is there something systematic here? Upon block switch, how exactly does the switch in sequences emitted by the rat track with reinforcement history? The authors mention that reinforcement probability differed across sessions, and one would thus expect switching behavior would as well. Because of the interesting existence of sometimes quite long 'tails' of performance of the original sequence after a block switch, I am wondering how the length of such tails relates to reinforcement rate parameters.”

Our original choice to give these details only partial attention stems from the fact that any potential differences in the rate at which our animals adjust their behavior at block transitions would not affect the analysis or interpretation of the neural data, for two reasons:

(1) With the exception of the strategy decoding section, all of the neural data analyses in the manuscript compare representations of the same given action sequence, just across different instances of its execution.

(2) Sequence instances selected always come from session epochs where one sequence has established a clear dominance in an animal’s behavior.

Nevertheless, we are happy to add the requested details below.

“…looking at some of the figures of individual rat behavior, exploratory sequences seemed triggered by reward omission. Is this just a chance for the examples chosen or is there something systematic here?”

This is a very insightful question, and one we could certainly devote more attention to in the manuscript. As we mentioned in the original submission, we reliably see exploratory instances of other sequences even in the absence of reward omission, i.e. in animals (like two in the included dataset) that have only ever experienced deterministic reward for every instance of a correctly executed target sequence (new Figure 1—figure supplement 2f-g). Furthermore, even in animals that, through experience, always expect a certain degree of reward omission, we see examples of exploration that are not obviously triggered by reward omission (new Figure 1—figure supplement 2h).

That said, not unexpectedly, when evaluated across the entire dataset for animals trained under the conditions of periodic reward omission, there is indeed a spike in omission rate when aligned relative to the onset of exploratory bouts (new Figure 1—figure supplement 2i). This observation suggests that the onset of an exploratory bout is indeed more likely to follow an omitted reward for the dominant sequence.

We have now included this analysis in the revised version. Specifically, we write (p. 6):

“Furthermore, while such transient deviations away from the dominant sequence were significantly more likely to follow the absence of an expected reward (Figure 1 —figure supplement 2i), similar strategy deviations were present even when no reward was omitted (Figure 1 —figure supplement 2f-h), suggesting that animals continue to sporadically sample other sequences even when not extrinsically prompted.”

“Upon block switch, how exactly does the switch in sequences emitted by the rat track with reinforcement history? The authors mention that reinforcement probability differed across sessions, and one would thus expect switching behavior would as well. Because of the interesting existence of sometimes quite long 'tails' of performance of the original sequence after a block switch, I am wondering how the length of such tails relates to reinforcement rate parameters.”

We find that at block transitions, neither the length of the ‘dominant tails’ nor the number of trials to commit to the new dominant strategy displays a significant dependence on reward omission rate (Figure 3—figure supplement 1). Consistent with this, we see “tails” even in animals that never experience reward omission. One possibility is that these “tails”, with their high sequence prevalence despite an utter lack of reinforcement, represent a concerted effort by the animals to falsify the “this sequence is still valid” hypothesis.

Too much unnecessary jargon reduces clarity

Reviewer 2 writes: “I find the manuscript often hard to follow primarily because of the usage of many jargons and extended expressions and sentences, which are unnecessary and could be replaced with more plain language to improve readability for a broader audience….

…Besides, the introduction needs to be narrower in scope, especially at the end, to successfully frame and specify the questions the current study would like to answer. The last paragraph in the introduction contains a lengthy summary of the results. But since the task and results are so complicated, it is almost impossible to understand it without going through the following results in detail.”

We apologize for the undue influence of run-on sentences more common in our native language. We thank the Reviewer for nudging us to make some changes, and for highlighting the difficulty with parsing the unnecessarily detailed preamble of the original introduction. We hope that the simplified narrative (including the re-written concluding paragraph of the introduction, as well as many parts of the Results) will be easier to follow.

Reviewer 2 also raised a number of specific possibilities for changing the original wording, which we admittedly chose at times with the additional unpublished work from the lab in mind.

(i) behavioral framework = behavioral task?

We have implemented the suggested change.

(ii) searching a structured space of action sequences = nose poking?

In the end, we felt that “nose poking” does not quite capture the essence of our task design. However, we have simplified the wording:

“…requiring rats – in an apparatus that has ‘left’ and ‘right’ nose ports – to discover (without any explicit instruction) a specific rewarded sequence of ‘Left’/‘Right’ choices from a larger set of structured possibilities.”

(iv) restructuring of ACC network configuration = neural activity change in ACC?

Since both the marked, abrupt, ensemble-wide changes in activity that accompany inference of block transitions and the more gradual modulation by local sequence prevalence are neural activity changes, we feel that “restructuring of ACC network configuration” is a more accurate, if somewhat mechanistic, account of the former.

(v) persists through a substantial ensemble reorganization = persistent neural code?

We worry that “persistent neural code” might suggest to a reader a stable neural representation. Instead, we are trying to convey that changes of neural activity in accordance with strategy prevalence are present across distinct global contexts that cause large-scale restructuring of the ensemble activity (see previous point). We therefore favor retaining the original wording.

(vi) tags ACC representations with contextual content = ACC encodes context?

Our goal was to highlight that ACC representation of individual strategies is modified in distinct contexts. Thus, both the identity of the specific strategy and the global behavioral context affect ACC neural activity – something that would not be captured by saying “ACC encodes context”. We do appreciate, however, that this would be somewhat incomprehensible in the introduction; that part is now greatly simplified in the revision.

(vii) organizing principles for the ACC ensemble dynamics = how ACC encodes information?

Our goal was to set up the contrast between the marked structuring of the ACC dynamics we observe in this study (Figure 5) and the emerging picture from past findings that there is seemingly little task-related activity at the level of single units. We suspect it is less about how ACC encodes information and more about what information it encodes. We have reworded the offending bit to state:

“raising the possibility that more structured responses in the ACC remain to be discovered.”

(viii) representations of individual strategies are also marked with contextual content = encoding of strategy per se?

As we detailed in our answer to suggestion (vi), the fact that two separate effects contribute to shaping ACC neural activity leads us to favor our more comprehensive, if somewhat more cumbersome, wording over one that highlights either context encoding or strategy encoding alone.

Results and a more general description of the analytical methods should be in the main text rather than the methods. Analytical methods are hard to follow in some places.

We have expanded the discussion of our approach to establishing strategy encoding in the ACC, as well as elaborated on our approaches to eliminating the major confounds. We hope that the Reviewers will find that the revised version strikes a better balance between providing enough details while still presenting a smooth narrative.

A more incremental and linear exposition would help readers understand the data and analyses.

We have expanded the following arguments:

– Deviations from dominant sequence targets are not mere errors of execution

– Action encoding in the ACC is strategy specific

– Large-scale re-organizations of ACC ensemble do not reflect a mere ‘reward’/‘no reward’ dichotomy

– Motor parameters cannot explain observed activity modulation

We have also introduced a separate, strongly elaborated section on the contribution of reward encoding.

We hope that the Reviewers will deem the new narrative easier to follow for the reader.

Point-by-point Responses to Other Reviewer Comments

Reviewer 1

“In analyzing neural activity accompanying the behavioral persistence of the dominant sequence after a block change, the authors find that the ACC ensemble firing pattern is closer to the original dominant sequence pattern during reinforcement and less like this pattern during exploration… As time, and trials, progress the rat is approaching the point at which it explores another strategy. The authors find strengthened "prevalence" encoding with increasing sequence repetition, but if this parameter is related to behavioral change/flexibility, this was not clear to me. Might there be something unique about the last trials in a tail "predicting" an upcoming switch? Can the authors please expand? Relatedly, if the prediction of upcoming behavioral change is not observed in the neural activity from sequence steps 2-6, it is notable that these are the steps 'within' the sequence, that leaves out the initiation (first center poke) and termination (reward/reward omission). Thus one could imagine this information is "missed" in the current analysis given that both the reward period and the initiation of a trial at the center are not analyzed. This does lead me to suggest a softening of some claims made of identifying "unifying principles" of ACC function, as the authors state, based on the analyses included in the current report, since the neural activity related to the full unit of behavior is not considered. (I appreciate the motivation behind this focus on within-sequence behavior – the wish to compare time periods with similar movement parameters.)”

We apologize for the confusion; while the sequence prevalence itself tends to be high for ‘dominant tails’, we do not claim that the fit of the prevalence model is better at those sequence instances. We do share the interest in linking prevalence encoding to behavioral adaptation as well as the Reviewer’s intuition that block transitions should be among the epochs where strategy prevalence is tracked particularly well. Indeed, we spent a considerable amount of time thinking about whether we can identify and interpret periods during the session where our prevalence model fits better or worse. Two arguments convinced us to abandon that direction: a technical one and a conceptual one. The technical argument is that when the explanatory power of a variable is limited, regression residuals are proportional to the variable itself. Thus, any meaningful comparison of the model’s fit would have had to be done for periods where strategy prevalence is within a similar range. The conceptual argument is even more disarming: imagine we do identify a putative session epoch where the model fits worse. While it is possible that the animal truly tracks less closely how much it has pursued this strategy in the recent past, it is equally possible that we were simply off in selecting the specific window over which the prevalence signal is estimated, the exact behavioral statistic tracked, or the exact form of the dependence between that statistic and neural activity. We certainly do see changes leading up to behavioral switches at block transitions – something we plan to elaborate on in a subsequent paper – but whether those are related to prevalence tracking is something we believe is hard to crack.

We have now made sure that we don’t make the claim that the structuring of activity by prevalence during strategy execution is a ‘unifying principle’ in either the Results or the Discussion sections.

Reviewer 2

(3) … (iii) ACC neurons might code an action sequence as a whole or configuration, much like a compound action, which might instead be interpreted as the authors' specific content of strategy prevalence. However, further evidence is needed to rule out the possibility that neural activities are merely related to a complex action per se rather than a higher cognitive cause. While it is understandable that confounds are not easy to control fully, the authors should emphasize these limitations more in the discussions.

We apologize for not making the foundational premise clearer: all comparisons of activity dynamics in this manuscript are always done for different instances of the same multistep sequence. As such, the modulation by sequence prevalence we describe here is independent of the specific form in which a multi-step strategy is encoded in the first place. It is precisely the type of concerns insightfully expressed here by several Reviewers that convinced us that ‘across-instances-of-same-sequence’ are the only interpretable comparisons that can be performed.

(4) Recording data show that neural correlates of strategy prevalence in M2 are slightly higher than in SMC, but inactivating SMC but not M2 affects corresponding behavior. How do the authors explain the disagreement between recording and inactivation data?

We believe that the role that SMC plays in this behavior does not involve tracking the statistics of chosen strategies in the recent past.

More importantly, behavioral data on the inactivation of ACC is missing, although cannula implantation surgery in ACC is mentioned in the manuscript. More clarifications are needed for these questions.

The behavioral effect of ACC inactivation has been reported in our previous publication (see Figures 3 and S2 in (Tervo et al., 2014)).

Reviewer 3

3) I am not convinced that it is valid to compare the 3 sequence responses to single responses in the 'competitor task' (Figure 1.d). The reward expectation is much different because there is a non-zero probability of reward after every choice. Furthermore, readers need to know much more information. Are these different animals with distinct training regimens, differences in the apparatus, etc.? Presumably, rats are going to the feeder after responses in the competitor task more often than the sequence task. I did not find in the present manuscript where the feeder was located relative to the response ports.

We apologize for the several sources of confusion here! First, we compared a sequence of the same 3 responses in both conditions, i.e. a ‘LLR’ sequence in both the competitor and the sequence tasks. Second, in all our tasks – done in the same behavioral box – liquid reward is delivered directly at the choice ports. We have now included that information in the figure legend. That said, we appreciate that this was not the ideal comparison for strategy encoding. We hope that the Reviewer will find that our newly added analyses (see section ‘Strategy-specific action encoding’ in Essential Revisions above) strengthen the main claim of that section to their satisfaction.

4) My primary critique of the paper is that alternate explanations for variance are not sufficiently tested. Presently, this is done very piecemeal and is scattered throughout the results.

We have now expanded a number of these arguments, and, most importantly, consolidated the discussion of the reward contribution into its own section. Please refer to the section ‘Activity modulation by strategy prevalence vs by reward or certainty’ in Essential Revisions above for more details.

The authors do address some of the possible confounders, but not to a level of sufficient rigor to support the overall claim that it is a dominant/exploratory strategy that accounts for differences in neural firing. In no particular order, I think the primary confounders are:

reward expectancy;

Reward expectancy is a confusing concept; it may mean different things to different readers. At one level, all behavioral choices may be framed as being associated with reward expectancy, even when there are no specific reward expectations. The concept is even further confounding because of the inadequate specification of reward during many discussions—social approbation and social stimuli, relief from anticipated costs, and even knowledge may be as rewarding outcomes as appetitive consumables. Indeed, even hypothesis falsification presumably rarely centers on completely implausible choices and almost always yields clarifying information.

Given this tight conceptual coupling between sought outcomes and an agent’s choices, at one extreme, one could argue that reward expectancy is reflected in the extent to which an agent devotes attention to any specific strategy – an interpretation that also relates to the concept of subjective value. We would argue that this interpretation of reward expectancy is semantically equivalent to what we capture here with the local strategy prevalence. However, reward expectancy can also apply to any single action or specific strategy. We tackle these concepts now in a new section of the manuscript titled “The tracking of reward by the ACC neural ensemble is strategy-specific, however reward prevalence is insufficient to account for activity modulation.” And while we do not claim to have explicit knowledge of the specific updating rule used by each animal for its reward expectation, we use multiple linear regression analysis to determine whether the unexpected robustness of the simple strategy prevalence model in explaining a marked fraction of ACC neural activity variance during the execution of a specific sequence derives mostly from the impact of successfully procured reward. We believe that the proper approach to the latter question requires us to apply the same updating rule to the reward term as we did to the prevalence one. Please refer to the section ‘Activity modulation by strategy prevalence vs by reward or certainty’ in Essential Revisions above for more details.
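
To make “the same updating rule” concrete, here is a minimal sketch that applies one windowed count to both the strategy term and the reward term. It assumes the rule is the simple prevalence over the last 20 choices described in the abstract. The choice to exclude the current instance, the fixed denominator, the use of one entry per sequence instance, and all names and toy data are illustrative assumptions rather than the published implementation.

```python
def windowed_prevalence(events, window=20):
    """Fraction of the preceding `window` entries on which the event occurred.

    events : list of 0/1 flags, one per sequence instance (assumption)
    The current entry is excluded and the denominator is fixed at `window`,
    so early entries are implicitly padded with zeros (both are assumptions).
    """
    out = []
    for i in range(len(events)):
        recent = events[max(0, i - window):i]
        out.append(sum(recent) / window)
    return out


# Toy session (hypothetical): executed sequences and whether each was rewarded.
sequences = ["LLR", "LLR", "RRL", "LLR"]
rewarded = [1, 0, 0, 1]
target = "LLR"

is_target = [1 if s == target else 0 for s in sequences]
is_rewarded_target = [1 if s == target and r else 0
                      for s, r in zip(sequences, rewarded)]

# The same function (the same "updating rule") yields both regressors.
strategy_prev = windowed_prevalence(is_target, window=2)
reward_prev = windowed_prevalence(is_rewarded_target, window=2)
print(strategy_prev, reward_prev)
```

A short window of 2 is used here only so that the toy output is easy to check by hand; the two resulting series could then serve as the two predictors in a multiple linear regression.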


response vigor;

We apologize that our piecemeal presentation in the original submission made this part difficult to find; we used execution time as a way to capture response vigor. The expanded analysis of this parameter, along with movement trajectory, is now detailed in the following section (p 16):

“The comparatively low amount of neural activity variance in the motor regions M2 and SMC explained by strategy prevalence suggests that the observed modulation is unlikely to reflect movement parameters that may co-vary with changes in the animal’s dedication to a particular strategy. To establish this more explicitly, we directly quantified trial-by-trial measures of movement vigor (sequence execution time) and kinematics (the first principal component of movement trajectory) for each instance of sequence execution and then asked how well linear models that incorporate these parameters performed at explaining variance in ACC firing rates (Methods). We first evaluated the performance of an equivalent linear model that used either execution time or the first principal component of movement trajectory as a single parameter; neither model could explain ACC neural activity variance as well as sequence prevalence (Figure 4 —figure supplement 3a). We then focused in more detail on the dataset recorded in the anterior part of the cingulate cortex that displayed the strongest modulation by strategy prevalence (Figure 4b,c), and asked whether adding a separate movement-related parameter to the linear model would dwarf the explanatory power of the prevalence term (Methods). To ensure that the resulting weights on the two parameters would directly reflect their relative contributions, we z-score normalized all the variables to 0 mean and standard deviation of 1 prior to fitting the model. Consistent with the relatively poor performance of the single movement-related parameter models, the expanded models revealed only a minor contribution from those motor parameters to the overall model’s performance (Figure 4 —figure supplement 3b). 
While it remains possible that more nuanced aspects of movement, like the animal’s posture, contribute to the observed activity modulation, these observations suggest that gross movement-related parameters are not behind the explanatory power of strategy prevalence.”
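
The logic of z-scoring before fitting a two-parameter model can be sketched as follows. This is an illustrative example, not the authors’ code: it solves ordinary least squares for two standardized predictors in closed form, and the function names and toy values are hypothetical.

```python
from statistics import mean, pstdev


def zscore(x):
    """Rescale a list of numbers to mean 0 and standard deviation 1."""
    m, s = mean(x), pstdev(x)
    return [(v - m) / s for v in x]


def standardized_weights(x1, x2, y):
    """OLS weights for y ~ b1*x1 + b2*x2 after z-scoring all three variables.

    With z-scored variables the intercept is zero and the normal equations
    reduce to a 2x2 system in the pairwise correlations.
    """
    z1, z2, zy = zscore(x1), zscore(x2), zscore(y)
    n = len(zy)
    r12 = sum(a * b for a, b in zip(z1, z2)) / n
    r1y = sum(a * b for a, b in zip(z1, zy)) / n
    r2y = sum(a * b for a, b in zip(z2, zy)) / n
    det = 1.0 - r12 ** 2  # zero only if the two predictors are collinear
    b1 = (r1y - r12 * r2y) / det
    b2 = (r2y - r12 * r1y) / det
    return b1, b2


# Toy data (hypothetical): firing rate depends only on prevalence.
prevalence = [0.2, 0.4, 0.5, 0.7, 0.9, 1.0]
exec_time = [1.9, 1.4, 1.7, 1.2, 1.6, 1.3]
rate = [2.0 + 10.0 * p for p in prevalence]

b_prev, b_time = standardized_weights(prevalence, exec_time, rate)
print(b_prev, b_time)
```

Because every variable shares the same scale after z-scoring, the two weights can be compared directly; in this toy case the weight on execution time is essentially zero.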

posture during nose poke (which may depend on prior choice), position in sequence; side chosen,

We apologize for the confusion. With the exception of the strategy decoding section, all of the neural data analyses in the manuscript compare representations of the same given action sequence, just across different instances of its execution. Furthermore, the time window around the center port entry on the first step is excluded from the analyses to minimize the contribution from the previous sequence. Thus, while small variations in posture are still possible, those dependent on history, position in sequence or side chosen would be the same across different instances of the same action sequence, and thus would not confound our analyses.

Nevertheless, to highlight the fact that the resolution of our video data precludes us from doing detailed posture tracking, we have concluded the section on motor confounds with the following statement:

“While it remains possible that more nuanced aspects of movement, like the animal’s posture, contribute to the observed activity modulation, these observations suggest that gross movement-related parameters are not behind the explanatory power of strategy prevalence.”

a relative time during the session (motivation and/or typical run-down of neural firing during long operant tasks),

Figure 3c-d presents the easiest way to see that time during the session cannot explain the observed effects. Indeed, the two blocks of ‘dominant’ ‘RRL’s in the example in 3c, and more generally across the dataset, are more distant from each other in time than each pairing of ‘dominant’ vs ‘exploratory’ blocks. Yet, the representations across the two dominant blocks are always more similar than those between ‘dominant’ and ‘exploratory’.

and rule certainty.

We have now expanded the set of contexts where a given behavioral sequence was performed under a ‘no-reward’ condition. Specifically, we have now collected new data in a setting where persistent, but unrewarded ‘LLR’ and ‘RRL’ sequence instances dominated within the unfamiliar ‘LLLR’ and ‘RRRL’ contexts. What makes this fundamentally different from the case of ‘dominant tails’ is that the observed persistence with an unrewarded sequence in this case occurred after the animals had detected an unsignalled block change and switched away from the previous strategy and thus cannot be explained by a lack of rule switch awareness.

The addition of this extra ‘no reward’ condition permitted us to more convincingly demonstrate (a) that the global ‘dominant’ ACC ensemble state relates to strategy dominance rather than the associated reward, and (b) that strategy prevalence makes at least some reward-independent contribution to modulating ACC activity during the execution of a specific strategy. Critically, since rule certainty captures one’s belief about strategy outcome, observing activity modulation by strategy prevalence under conditions of long-term persistence with an unrewarded strategy argues against that modulation reflecting rule certainty.

Neural activity often drifts over the course of long sessions in which animals perform many trials of an operant task. The authors need to dismiss that this is a possibility, particularly in their analysis of dominant vs exploratory strategy. It is not evident from the manuscript how many trials animals are doing and how long the blocks are. Because of the number of possible sequences, it seems likely that the time between each strategy being the dominant versus exploratory may allow for task-independent drift (fatigue or mechanical instability) in neural activity. Authors should show that difference in firing for dominant/exploratory strategy is not influenced by drift in neural firing over time. If it is, this variance must be parsed out.

Figure 3c-d presents the easiest way to see that activity drift during the session cannot explain the observed effects. Indeed, the two blocks of ‘dominant’ ‘RRL’s in the example in 3c, and more generally across the dataset, are more distant from each other in time than each pairing of ‘dominant’ vs ‘exploratory’ blocks. Yet, the representations across the two dominant blocks are always more similar than those between ‘dominant’ and ‘exploratory’.

We write on p. 11:

“Moreover, the mean Euclidean distance in the activity state space between two ‘dominant’ blocks separated in time was significantly smaller than the mean pairwise distances between a dominant and exploratory block, arguing against the possibility that the observed representational transitions in ACC arose from an instability in neural recordings (Figure 3 c,d).”
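
The comparison described in this passage can be illustrated with a small sketch. It assumes that each block is summarized by a single population vector (for example, one mean firing rate per unit); that summary, the function name, and the toy numbers are assumptions made for illustration only.

```python
from itertools import combinations, product
from math import dist


def mean_pairwise_distance(blocks_a, blocks_b=None):
    """Mean Euclidean distance between block-level population vectors.

    With one argument, averages over all pairs within `blocks_a`;
    with two, averages over all cross pairs between the two groups.
    """
    if blocks_b is None:
        pairs = list(combinations(blocks_a, 2))
    else:
        pairs = list(product(blocks_a, blocks_b))
    return sum(dist(u, v) for u, v in pairs) / len(pairs)


# Toy population vectors (hypothetical), three units per block.
dominant_blocks = [(10.0, 2.0, 5.0), (10.0, 2.0, 6.0)]  # separated in time
exploratory_blocks = [(4.0, 6.0, 5.0)]

within = mean_pairwise_distance(dominant_blocks)
across = mean_pairwise_distance(dominant_blocks, exploratory_blocks)
print(within, across)
```

If slow drift were the main source of change, the two temporally distant dominant blocks would be expected to lie farther apart than a dominant and an adjacent exploratory block; the toy numbers are chosen to show the opposite ordering.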

6) I recommend tempering the language in the abstract, which mentions intelligence and complex settings. The relation of the present work to these topics is better presented in the discussion if the authors think them relevant.

We have now removed any reference to intelligence from the abstract.

7) The firing rates shown in Figure 1 and 4 appear quite high for rat pyramidal neurons in mPFC. Are these cells typical of the dataset, or outliers in terms of firing rate? A quantification of the general activity over the population of cells is needed. Moreover, a plot of firing rate versus spike width would provide valuable insight into the distribution of putative pyramidal and interneurons.

It is indeed typical to see peak firing rates of 20-60 Hz in rat mPFC neurons around behaviorally-relevant events (see, for example, Figure 3B in (Tang et al., 2023) for a navigation task, and Figure 3B in (Murakami et al., 2017) for a decision-making task). The average firing rates for those neurons are much lower (see Author response image 1). Unfortunately, unlike for hippocampal neurons, spike width is not considered a robust criterion for cleanly separating cortical pyramidal cells and interneurons. Nevertheless, the preponderance of relatively low-firing (on average) cells in our dataset, and the widespread nature of both the global and the local modulation at the core of this manuscript, suggest that pyramidal cells play a prominent role in the described encoding.

Author response image 1

Boorman, E.D., Behrens, T.E., Woolrich, M.W., and Rushworth, M.F. (2009). How green is the grass on the other side? Frontopolar cortex and the evidence in favor of alternative courses of action. Neuron 62, 733-743.

Hayden, B.Y., and Platt, M.L. (2010). Neurons in anterior cingulate cortex multiplex information about reward and action. J Neurosci 30, 3339-3346.

Kennerley, S.W., Dahmubed, A.F., Lara, A.H., and Wallis, J.D. (2009). Neurons in the frontal lobe encode the value of multiple decision variables. J Cogn Neurosci 21, 1162-1178.

Kolling, N., Behrens, T.E., Mars, R.B., and Rushworth, M.F. (2012). Neural mechanisms of foraging. Science 336, 95-98.

Luk, C.-H., and Wallis, J.D. (2013). Choice coding in frontal cortex during stimulus-guided or action-guided decision-making. Journal of Neuroscience 33, 1864-1871.

Murakami, M., Shteingart, H., Loewenstein, Y., and Mainen, Z.F. (2017). Distinct sources of deterministic and stochastic components of action timing decisions in rodent frontal cortex. Neuron 94, 908-919.e907.

Shidara, M., and Richmond, B.J. (2002). Anterior cingulate: single neuronal signals related to degree of reward expectancy. Science 296, 1709-1711.

Tang, W., Shin, J.D., and Jadhav, S.P. (2023). Geometric transformation of cognitive maps for generalization across hippocampal-prefrontal circuits. Cell Reports 42.

Tervo, D.G., Proskurin, M., Manakov, M., Kabra, M., Vollmer, A., Branson, K., and Karpova, A.Y. (2014). Behavioral variability through stochastic choice and its gating by anterior cingulate cortex. Cell 159, 21-32.

https://doi.org/10.7554/eLife.84897.sa2

Article and author information

Author details

  1. Mikhail Proskurin

    1. Janelia Research Campus, Howard Hughes Medical Institute, Ashburn, United States
    2. Department of Neuroscience, Johns Hopkins University Medical School, Baltimore, United States
    Contribution
    Conceptualization, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing - original draft
    Competing interests
    No competing interests declared
    ORCID: 0000-0003-2548-9722
  2. Maxim Manakov

    1. Janelia Research Campus, Howard Hughes Medical Institute, Ashburn, United States
    2. Department of Neuroscience, Johns Hopkins University Medical School, Baltimore, United States
    Contribution
    Conceptualization, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing - original draft, Writing - review and editing
    Competing interests
    No competing interests declared
  3. Alla Karpova

    Janelia Research Campus, Howard Hughes Medical Institute, Ashburn, United States
    Contribution
    Conceptualization, Resources, Data curation, Supervision, Funding acquisition, Investigation, Methodology, Writing - original draft, Project administration, Writing - review and editing
    For correspondence
    alla@janelia.hhmi.org
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-5869-6336

Funding

Howard Hughes Medical Institute

  • Alla Karpova

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Elena Kuleshova for help with surgeries and histology. We thank Shaul Druckmann, Gowan Tervo, Michael Brainard and Vivek Jayaraman for advice and comments on the manuscript. Funding: This work was supported by the Howard Hughes Medical Institute.

Ethics

All animal experiments were conducted according to National Institutes of Health guidelines for animal research and were approved by the Institutional Animal Care and Use Committee at HHMI's Janelia Research Campus.

Senior and Reviewing Editor

  1. Timothy E Behrens, University of Oxford, United Kingdom

Version history

  1. Received: November 14, 2022
  2. Preprint posted: November 17, 2022 (view preprint)
  3. Accepted: November 20, 2023
  4. Accepted Manuscript published: November 22, 2023 (version 1)
  5. Version of Record published: December 7, 2023 (version 2)

Copyright

© 2023, Proskurin et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Cite this article

Mikhail Proskurin, Maxim Manakov, Alla Karpova (2023) ACC neural ensemble dynamics are structured by strategy prevalence. eLife 12:e84897. https://doi.org/10.7554/eLife.84897
