Control of adaptive action selection by secondary motor cortex during flexible visual categorization

  1. Tian-Yi Wang
  2. Jing Liu
  3. Haishan Yao (corresponding author)
  1. Institute of Neuroscience, State Key Laboratory of Neuroscience, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, China
  2. University of Chinese Academy of Sciences, China
  3. Shanghai Center for Brain Science and Brain-Inspired Intelligence Technology, China

Abstract

Adaptive action selection during stimulus categorization is an important feature of flexible behavior. To examine the neural mechanisms underlying this process, we trained mice to categorize the spatial frequencies of visual stimuli according to a boundary that changed between blocks of trials in a session. Using a model with a dynamic decision criterion, we found that sensory history was important for adaptive action selection after the switch of boundary. Bilateral inactivation of the secondary motor cortex (M2) impaired adaptive action selection by reducing the behavioral influence of sensory history. Electrophysiological recordings showed that M2 neurons carried more information about upcoming choice and previous sensory stimuli when the sensorimotor association was being remapped than when it was stable. Thus, M2 causally contributes to flexible action selection during stimulus categorization, with the representations of upcoming choice and sensory history regulated by the demand to remap the stimulus-action association.

Introduction

Flexible action selection allows an animal to adapt to changes in the relations between stimuli, responses and outcomes (Gold and Stocker, 2017; Ragozzino, 2007; Wise and Murray, 2000). Specifically, during flexible stimulus categorization, changes in the categorical boundary require a remapping between particular sensory stimuli and behavioral responses. For instance, a line segment can be classified as short or long, depending on the criterion length (Grinband et al., 2006). Rodents as well as primates demonstrate the ability to flexibly adjust stimulus-action associations after a switch of the categorical boundary (Ferrera et al., 2009; Grinband et al., 2006; Jaramillo et al., 2014; Mendez et al., 2011). In a flexible sound categorization task, the performance of rodents could be accounted for by a distance-to-boundary model, in which the animal compared the current stimulus with an internal categorical boundary (Jaramillo et al., 2014). Previous work has shown that perceptual decisions are influenced not only by sensory input but also by trial history, which includes past stimuli and choice outcomes (Akrami et al., 2018; Busse et al., 2011; Hwang et al., 2017; Lak et al., 2020; Thompson et al., 2016). During a flexible stimulus categorization task, however, it is unclear how different aspects of trial history influence the internal categorical boundary and action selection after a boundary switch. Applying computational models that incorporate history factors can reveal the strategies that underlie the flexible response (Churchland and Kiani, 2016).

Neural representations of stimulus category have been found in the lateral intraparietal cortex (Freedman and Assad, 2006), prefrontal cortex (Freedman et al., 2001) and medial premotor cortex (Romo et al., 1997). Neural correlates of flexible stimulus categorization have been found in the monkey pre-supplementary motor cortex (Mendoza et al., 2018) and the frontal eye field (Ferrera et al., 2009). Neurons in the monkey prefrontal and anterior cingulate cortex also show dynamic task selectivity during task switching, which requires flexible adjustment of behavior (Johnston et al., 2007). However, the causal role of frontal and premotor regions in the performance of flexible stimulus categorization remains to be investigated.

The secondary motor cortex (M2) in rodents is a homolog of the premotor cortex, supplementary motor area, or frontal eye field in the monkey (Barthas and Kwan, 2017; Reep et al., 1987; Svoboda and Li, 2018; Zingg et al., 2014). M2 plays a critical role in the flexible control of voluntary action (Barthas and Kwan, 2017; Ebbesen et al., 2018). Removal or inactivation of M2 caused deficits in cue-guided actions (Barthas and Kwan, 2017; Erlich et al., 2015; Passingham et al., 1988), and an increase in errors during a behavioral switch from nonconditional responding to cue-guided actions (Siniscalchi et al., 2016). M2 neurons exhibit choice-related activity that emerges earlier than that in other frontal cortical regions (Sul et al., 2011). Neural signals in M2 also convey information about past choices and outcomes (Hattori et al., 2019; Jiang et al., 2019; Scott et al., 2017; Siniscalchi et al., 2019; Sul et al., 2011; Yuan et al., 2015). These findings raise the possibility that M2 may be important for adaptive action selection during flexible stimulus categorization. Furthermore, it remains to be elucidated whether the choice- and history-related signals in M2 are modulated by the task demand to remap the stimulus-action association.

Here, we combined behavioral modeling, chemogenetic manipulation and extracellular recording to explore the mechanisms underlying adaptive action selection. Freely-moving mice categorized the spatial frequencies (SFs) of gratings as low or high according to a boundary that shifted between a lower and a higher frequency across blocks of trials. For the reversing stimulus, whose frequency lay between the two boundaries, the mice had to reverse their choice when the boundary switched. Using a behavioral model in which the decision criterion (DC) was updated according to the history of action outcomes and sensory stimuli, we found that sensory history was important for correct adjustment of the DC during the switching period. Bilateral inactivation of M2 impaired the performance of the reversing response by reducing the mice’s ability to update the DC according to sensory history, without affecting performance in trials when the stimulus-action association was stable. Furthermore, M2 neurons encoded the upcoming choice and the stimulus of the previous trial more accurately during the switching period than during the stable period. Together, these results suggest that during stimulus categorization, M2 is involved in flexible adjustment of the stimulus-action association in the face of boundary changes.

Results

Flexible visual categorization task for freely-moving mice

For the visual system, the perceived size (or SF) of a visual object changes with viewing distance, and categorization of a visual stimulus as low or high SF may need to adapt to the change of viewing distance. In our study, we trained freely-moving mice to categorize visual stimuli as low or high SF using a two-alternative forced-choice paradigm. Similar to the auditory flexible categorization task described in a previous study (Jaramillo et al., 2014), we changed the categorization boundary in different blocks of trials. The mouse poked its nose into the central port of a behavioral chamber to initiate a trial (Long et al., 2015). A static grating was presented on the screen, and the mouse was required to maintain its head in the central port until a “Go” signal, a light-emitting diode, was turned on to indicate that the mouse could choose one of the two side ports (Figure 1A). The rewarded side port was on the left (or right) if the grating SF was lower (or higher) than a categorical boundary, which was not cued but learned by the mouse through trial and error. The visual stimuli consisted of seven SFs that were logarithmically equally spaced (0.03, 0.044, 0.065, 0.095, 0.139, 0.204 and 0.3 cycles/°). Within a session, the categorical boundary shifted between a lower and a higher SF several times without a warning cue (Figure 1B). For the low-boundary block, the optimal decision boundary was located between 0.065 and 0.095 cycles/°, and gratings at 0.03 and 0.095 cycles/° (SF1 and SF4 in Figure 1B) were presented in 90% of trials. For the high-boundary block, the optimal decision boundary was located between 0.095 and 0.139 cycles/°, and gratings at 0.095 and 0.3 cycles/° (SF4 and SF7 in Figure 1B) were presented in 90% of trials. As a result, the stimulus statistics differed between the low-boundary and high-boundary blocks.
In each session, each block consisted of at least 60 trials, and the categorical boundary switched once the performance for the reversing stimulus (0.095 cycles/°, SF4 in Figure 1B) reached 70% over the last 10 reversing-stimulus trials. After the boundary changed, the mouse was required to reverse its choice for the reversing stimulus (Figure 1C and D). Across all sessions from 10 mice (11 sessions/mouse), the number of trials in each block was 61.9 ± 0.33 (mean ± SEM) and the number of switches per session was 6.91 ± 0.29 (mean ± SEM) (Figure 1E and F).
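The seven SFs above form a geometric (logarithmically equally spaced) series from 0.03 to 0.3 cycles/°, which can be reproduced with a one-line sketch:

```python
import numpy as np

# Seven logarithmically equally spaced spatial frequencies between
# 0.03 and 0.3 cycles/degree: each SF is a fixed ratio above the last.
sfs = np.geomspace(0.03, 0.3, num=7)
print(np.round(sfs, 3))  # [0.03  0.044 0.065 0.095 0.139 0.204 0.3  ]
```

Rounded to three decimals, these match the SF values listed in the text, with SF4 (0.095 cycles/°) as the reversing stimulus.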

Flexible visual categorization task.

(A) Schematic of the task and timing of behavioral events. (B) Visual stimuli and categorical boundaries. Each session consisted of alternating low-boundary and high-boundary blocks, the order of which was counterbalanced across sessions. For low-boundary blocks, SF1 and SF4 were presented for 90% of trials; for high-boundary blocks, SF4 and SF7 were presented for 90% of trials. (C and D) Performance of one example mouse for SF1 (0.03 cycles/°), SF4 (0.095 cycles/°) and SF7 (0.3 cycles/°) in two different sessions. For each trial, the probability of right choice was computed over the previous 15 trials. H, high-boundary block; L, low-boundary block. (E) Number of trials per block for each mouse. (F) Number of switches per session for each mouse. (G) Psychometric curves from low-boundary and high-boundary blocks for an example mouse (data from 11 sessions). (H) Comparison of internal decision boundary between blocks for a population of mice. p=2 × 10−3, n = 10 mice, Wilcoxon signed rank test. (I) Performance for the reversing stimulus before and after the boundary switch for an example mouse. The curve is an exponential fit of the data after the boundary switch. (J) Distribution of the number of trials to reverse choice, which was the number of trials for the correct rate of reversing stimulus to reach 50% after the boundary switch. n = 10 mice. Error bar, ± SEM. See Figure 1—source data 1 and 2 for complete statistics.

Figure 1—source data 1

Number of trials per block, number of switches per session and number of trials to reverse choice for a population of mice.

https://cdn.elifesciences.org/articles/54474/elife-54474-fig1-data1-v2.txt
Figure 1—source data 2

Psychometric curves and decision boundary for low-boundary and high-boundary blocks.

https://cdn.elifesciences.org/articles/54474/elife-54474-fig1-data2-v2.txt

Although only the action for the reversing stimulus was required to change, we found that the performance for both the reversing and the non-reversing stimuli was influenced by the change of categorical boundary (Figure 1G), consistent with observations in the auditory flexible categorization task (Jaramillo et al., 2014). The subjective categorical boundary estimated from the psychometric curve was significantly lower in the low-boundary than in the high-boundary block (p=2 × 10−3, n = 10 mice, Wilcoxon signed rank test, Figure 1H), suggesting that the mice adapted their DC to the boundary contingency. To examine how fast the mice adapted to the boundary change, we used all blocks in all sessions to compute the correct rate for the reversing stimulus in each trial. The performance in the first 60 trials after a boundary switch was fitted with an exponential function (Figure 1I). The number of trials to reverse choice, defined as the number of trials needed for the fitted curve to cross the 50% correct rate, was 6.41 ± 0.42 (mean ± SEM, n = 10 mice, Figure 1J). In the following analysis, we defined the last 15 trials before a boundary switch as the stable period, and the first 15 trials after a boundary switch as the switching period.
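The trials-to-reverse measure can be sketched as follows, assuming the exponential fit takes a saturating form p(t) = a − b·exp(−t/τ) (the paper does not state its exact parameterization); the crossing of the 50% correct rate then has a closed form:

```python
import math

def trials_to_reverse(a, b, tau, threshold=0.5):
    """Trial at which a fitted exponential recovery curve
    p(t) = a - b * exp(-t / tau) first crosses `threshold`.
    (Assumed parameterization; the paper only states that an
    exponential was fitted to the first 60 post-switch trials.)"""
    if a <= threshold:            # asymptote never reaches threshold
        return math.inf
    return tau * math.log(b / (a - threshold))

# Example: asymptotic correct rate 0.8, starting near 0% correct
# (b = 0.8), recovery time constant of 5 trials:
print(round(trials_to_reverse(0.8, 0.8, 5.0), 2))  # 4.9
```

Solving a − b·exp(−t/τ) = 0.5 for t gives t = τ·ln(b / (a − 0.5)), which is what the function returns.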

Behavioral strategies revealed by computational modeling

After the boundary switch, the animals may update their stimulus-action association according to the outcome of responses to the reversing stimulus and/or the appearance of the non-reversing stimuli frequently presented in that block (Jaramillo and Zador, 2014). To examine the behavioral strategies of mice in trials before and after the boundary switch, we designed a logistic regression model with a dynamic DC to fit the mouse’s choice on each trial. In this model, the probability p of choosing right in the current trial t is a logistic function of a decision variable z, which is the difference between the stimulus S weighted by γ1 (γ1 > 0) and the subjective DC of the animal (Figure 2A, see Materials and methods). The SF of stimulus S is normalized between −1 and 1, in which negative and positive values indicate SFs lower and higher than the SF of the reversing stimulus, respectively. The DC is updated on a trial-by-trial basis according to choice outcome and sensory history (Figure 2A). Following a trial of the reversing stimulus, the DC is updated according to the outcome of the previous choice, with the effects of rewarded and unrewarded choices modeled by the parameters α1 and α2, respectively. Considering that the DC may exhibit a drift tendency independent of trial history, a parameter β (β < 1) was included to describe how fast the DC drifts toward zero (0 < β < 1) or away from zero (β < 0). For α1 (α2) with a positive value, rewarded (unrewarded) choices would introduce a bias towards (away from) the previously chosen side, mimicking a win-stay (lose-shift) strategy. Similarly, α1 (α2) with a negative value represents a win-shift (lose-stay) strategy. Following a trial of a non-reversing stimulus, the DC is updated according to the sensory history, with the previous stimulus weighted by the parameter γ2. For γ2 with a positive value, a previously experienced low-frequency (high-frequency) stimulus would cause a shift of the DC towards low (high) frequency, leading to a right (left) choice bias.
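The update rules above can be sketched in code. This is a minimal illustration rather than the paper's exact implementation (which is given in Materials and methods): the decay form DC ← (1 − β)·DC and the sign conventions below are assumptions chosen to match the verbal description of α1, α2, γ2 and β:

```python
import math

def p_right(S, DC, gamma1):
    """Probability of a right choice: logistic of z = gamma1*S - DC,
    with S the current SF normalized to [-1, 1]."""
    z = gamma1 * S - DC
    return 1.0 / (1.0 + math.exp(-z))

def update_DC(DC, prev, alpha1, alpha2, gamma2, beta):
    """One-trial update of the decision criterion (assumed form).

    prev: dict describing the previous trial, with keys
      'reversing' (bool), 'choice' (+1 right / -1 left),
      'rewarded' (bool), 'S' (normalized SF).
    The DC first decays toward zero at rate beta, then shifts by
    outcome history (after reversing-stimulus trials) or by sensory
    history (after non-reversing trials)."""
    DC = (1.0 - beta) * DC
    if prev['reversing']:
        if prev['rewarded']:
            # alpha1 > 0: bias toward the previously chosen side
            DC -= alpha1 * prev['choice']
        else:
            # alpha2 > 0: bias away from the previously chosen side
            DC += alpha2 * prev['choice']
    else:
        # gamma2 > 0: a previous low-SF stimulus (S < 0) pulls the DC
        # toward low frequencies, biasing the next choice rightward
        DC += gamma2 * prev['S']
    return DC

# A previous low-frequency non-reversing trial shifts the DC down,
# raising P(right) for the reversing stimulus (S = 0):
DC = update_DC(0.0,
               {'reversing': False, 'choice': 1, 'rewarded': True, 'S': -1.0},
               alpha1=0.0, alpha2=0.0, gamma2=0.5, beta=0.1)
print(DC, round(p_right(0.0, DC, gamma1=2.0), 3))  # -0.5 0.622
```

With this sign convention, a positive γ2 reproduces the stated effect of sensory history: a previous low-frequency stimulus lowers the DC and biases the next choice toward the right.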

Behavioral strategies revealed by computational modelling.

(A) Schematic of the dynamic DC model, in which the DC was updated on a trial-by-trial basis according to choice outcome history (α1 and α2) or sensory history (γ2). (B) Model parameters for the stable period. (C) Model parameters for the switching period. (D) Comparison of parameters important for the update of DC between the stable and the switching periods. *p<0.05, **p<0.01, n = 10 mice, Wilcoxon signed rank test. See Figure 2—source data 1 for complete statistics.

Figure 2—source data 1

Parameters of the dynamic-DC model fitted to choices in the stable and switching periods separately.

https://cdn.elifesciences.org/articles/54474/elife-54474-fig2-data1-v2.txt

We modeled the choices of individual trials for each mouse by combining data across multiple sessions. We first used the model to fit choices in the stable and switching periods separately so that we could compare the strategies between the two periods (Figure 2B and C). For both the stable and switching periods, γ1 was significantly larger than zero (p=2 × 10−3, n = 10, Wilcoxon signed rank test), indicating that current sensory information was used to form decisions and guide action selection in both periods. We next examined the values of α1, α2, γ2 and β, which are relevant to the update of the DC (Figure 2B−D). The reward history parameter α1 was not significantly different from zero in either the stable (p=0.23) or the switching period (p=0.49, Wilcoxon signed rank test). The non-reward history parameter α2 was significantly larger than zero in the stable period (p=9.8 × 10−3, Wilcoxon signed rank test), but not significantly different from zero in the switching period (p=0.7, Wilcoxon signed rank test), suggesting that the mice adopted a lose-shift strategy only in the stable period. The sensory history parameter γ2 was significantly larger than zero in both the stable and the switching periods (p=0.037 and p=2 × 10−3, Wilcoxon signed rank test), suggesting that the mice could appropriately use the sensory history of non-reversing stimuli to guide choice in both periods. The DC-drift parameter β was significantly larger than zero in the switching period (p=2 × 10−3) but not in the stable period (p=0.56, Wilcoxon signed rank test), demonstrating that the DC tended to drift towards zero in the switching period, reflecting the tendency of mice to abandon the previous decision criterion.

To evaluate whether each model parameter was necessary, we further built reduced model variants (model 2: α1 = 0; model 3: α2 = 0; model 4: α1 = 0 and α2 = 0; model 5: β = 0; model 6: γ1 = 1; model 7: γ2 = 0) (Figure 3) to compare with the full model (model 1). For the stable period, only model 6 (γ1 = 1) showed a significantly lower cross-validated (CV) likelihood than the full model (p=2 × 10−3, Wilcoxon signed rank test, Figure 3A), indicating that γ1 (weight of current stimulus) was most important for predicting choices during the steady state of the stimulus-action association. For the switching period, the CV likelihood for model 5 (β = 0), model 6 (γ1 = 1) and model 7 (γ2 = 0) was significantly reduced as compared to that of the full model (p=5.9 × 10−3, 2 × 10−3 and 5.9 × 10−3, Wilcoxon signed rank test, Figure 3B), demonstrating that β, γ1 and γ2 were important parameters for explaining choices in the switching phase. For both the stable and switching periods, the CV likelihood for model 4 (α1 = 0 and α2 = 0) was significantly higher than that of the full model (p=0.037 and 0.02, Wilcoxon signed rank test, Figure 3A and B), suggesting that choice outcome was not useful in guiding the mice’s choices.
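One common definition of cross-validated likelihood, assumed here for illustration (the paper's exact metric is in its Materials and methods), is the geometric mean of the per-trial likelihood of held-out choices, so that a model at chance scores 0.5:

```python
import math

def cv_likelihood(pred_p_right, choices):
    """Geometric-mean per-trial likelihood of held-out choices.

    pred_p_right: model-predicted P(right) for each held-out trial
    choices: 1 for a right choice, 0 for a left choice
    """
    log_l = 0.0
    for p, c in zip(pred_p_right, choices):
        log_l += math.log(p if c == 1 else 1.0 - p)
    return math.exp(log_l / len(choices))

# A model predicting p = 0.5 on every trial scores ~0.5 regardless of
# the actual choices; better-calibrated predictions score higher.
print(cv_likelihood([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 1]))
```

Because the metric is a per-trial average, it can be compared across models and across periods with different trial counts, which is what the model comparison above requires.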

Figure 3 with 7 supplements
Comparison of cross-validated likelihood among different models.

(A) CV likelihood of different variants of the dynamic-DC model for the stable period. For each of the reduced model variants (model 2 – model 7), we compared the CV likelihood with that of the full model. Dashed line, the CV likelihood averaged across 10 mice for the full model. (B) CV likelihood of different variants of the dynamic-DC model for the switching period. Similar to that described in (A). (C) Parameters of the RL model for the stable period. (D) Parameters of the RL model for the switching period. (E) Comparison of CV likelihood between the RL model and model 4 of the dynamic-DC model for the stable period. p=0.32, Wilcoxon signed rank test. (F) Comparison of CV likelihood between the RL model and model 4 of the dynamic-DC model for the switching period. p=2 × 10−3, Wilcoxon signed rank test. (G) Best model frequency (the number of mice for which a model was the best model) for each model in the stable period. (H) Best model frequency for each model in the switching period. (I) Parameters of model 4 in the stable period. (J) Parameters of model 4 in the switching period. (K) Correlation between the sensory history parameter γ2 (model 4) in the switching period and the number of trials to reverse choice. *p<0.05, **p<0.01, n = 10 mice, Wilcoxon signed rank test. Error bar, ± SEM. For more details, see Figure 3—figure supplements 1–7. See Figure 3—source data 1–3 for complete statistics.

Figure 3—source data 1

Cross-validated likelihood for each model.

https://cdn.elifesciences.org/articles/54474/elife-54474-fig3-data1-v2.txt
Figure 3—source data 2

Parameter values for model 4.

https://cdn.elifesciences.org/articles/54474/elife-54474-fig3-data2-v2.txt
Figure 3—source data 3

Parameter values for the RL model.

https://cdn.elifesciences.org/articles/54474/elife-54474-fig3-data3-v2.txt
Figure 3—source data 4

Comparison of cross-validated likelihood among different models that were fitted to data in all trials.

https://cdn.elifesciences.org/articles/54474/elife-54474-fig3-data4-v2.txt
Figure 3—source data 5

Visualization of the simulated behavior for model 1 and model 4.

https://cdn.elifesciences.org/articles/54474/elife-54474-fig3-data5-v2.txt
Figure 3—source data 6

Parameter recovery analysis.

https://cdn.elifesciences.org/articles/54474/elife-54474-fig3-data6-v2.txt
Figure 3—source data 7

Right choice bias for reversing stimulus under two different conditions of sensory history.

https://cdn.elifesciences.org/articles/54474/elife-54474-fig3-data7-v2.txt

An alternative hypothesis is that the mouse might update the value functions of left and right choices separately, rather than comparing the current stimulus to a single decision boundary. We thus built a reinforcement learning (RL) model without an internal boundary. In the RL model, the expected value of the left (right) choice is the learned value of the left (right) choice weighted by the current sensory stimulus, with the parameters α and γ representing the learning rate and stimulus weight, respectively (Materials and methods, Figure 3C and D). We found that the CV likelihood of the RL model was significantly lower than that of the best dynamic-DC model (model 4: α1 = 0 and α2 = 0) in the switching period (p=2 × 10−3, Wilcoxon signed rank test, Figure 3F), indicating that the behavioral strategy in the switching period was better accounted for by the model with a dynamic decision criterion.
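A minimal sketch of such a boundary-free RL scheme is below. The specific combination rule (a delta-rule value update plus a logistic choice rule over stimulus-weighted values) is an assumption for illustration; the paper's exact RL model is defined in its Materials and methods:

```python
import math

def rl_step(Q, choice, reward, S, alpha, gamma):
    """One trial of a minimal RL sketch without a decision boundary
    (assumed form, for illustration only).

    Q: dict of learned values for 'L' and 'R'
    choice: 'L' or 'R'; reward: 0 or 1
    S: current SF normalized to [-1, 1]; gamma: stimulus weight
    Returns the updated Q and P(right) given the current stimulus."""
    # delta-rule value update for the chosen action only
    Q[choice] += alpha * (reward - Q[choice])
    # expected values combine learned value with stimulus evidence
    ev_r = Q['R'] + gamma * S
    ev_l = Q['L'] - gamma * S
    p_right = 1.0 / (1.0 + math.exp(-(ev_r - ev_l)))
    return Q, p_right

# A rewarded right choice raises Q['R'] and biases the next choice
# slightly rightward for the reversing stimulus (S = 0):
Q = {'L': 0.5, 'R': 0.5}
Q, p = rl_step(Q, 'R', 1, S=0.0, alpha=0.2, gamma=2.0)
print(Q['R'], round(p, 3))  # 0.6 0.525
```

Note the key contrast with the dynamic-DC model: here the two action values are tracked separately and there is no single criterion that sensory history can shift, which is the hypothesis the model comparison rules out for the switching period.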

For each mouse, we further determined which of the models (the RL model and the 7 variants of the dynamic-DC model) was the best model, that is, the one yielding the largest CV likelihood. We found that model 4 of the dynamic-DC model (α1 = 0 and α2 = 0) was the best model for the greatest number of mice in both the stable and the switching periods (Figure 3G and H).

The above modeling analysis was based on fitting the models separately to choices in the stable and switching periods. We next used each of these models to fit choices in all trials (Figure 3—figure supplement 1). We found that the CV likelihoods of both model 1 and model 4 were significantly higher than that of the RL model (Figure 3—figure supplement 1). Combining the result of model comparison and the histogram of the number of mice best fit by each model, we found that model 4 of the dynamic-DC model (α1 = 0 and α2 = 0) remained the winning model (Figure 3—figure supplement 1). We also visualized the simulated behavior for model 1 and model 4, using the parameters of the model fitted to data in all trials. As shown by Figure 3—figure supplement 2, both models could capture the dynamic change in performance for the reversing stimulus after the boundary switch. We found that the number of trials to reverse choice estimated from the model 1 simulation tended to be larger than that estimated from the actual performance, whereas that estimated from the model 4 simulation matched the actual data well (Figure 3—figure supplement 2). We also visualized the simulated psychometric curves. As shown by Figure 3—figure supplement 2 for an example mouse, both model 1 and model 4 could capture the performance in the high-boundary block, but with an exaggerated right choice probability for SFs at 0.044 or 0.065 cycles/° in the low-boundary block. For simulations using model parameters for the 10 mice in Figure 1, the simulated decision boundary in the high-boundary block was similar to the actual data (p=0.16 and 0.08 for model 1 and model 4, respectively, Wilcoxon signed rank test), whereas that in the low-boundary block was significantly lower than the actual data (p=0.002 for both model 1 and model 4, Wilcoxon signed rank test, Figure 3—figure supplement 2).
This may be because, in a model using a logistic function, the performance for a specific SF depended on its distance from the boundary SF, so that the performance for low SFs was lower than that for high SFs in the low-boundary block, and the performance for low SFs was higher than that for high SFs in the high-boundary block (Figure 3—figure supplement 3). For the mice’s actual behavior, however, the performance difference between low and high SFs was more evident in the high-boundary block but less so in the low-boundary block (Figure 3—figure supplement 3). Thus, although the dynamic-DC model did not perfectly predict the psychometric curve, it could recapitulate the adaptive action selection for the reversing stimulus.

We further performed a parameter recovery analysis to check whether our fitting procedure could accurately estimate parameters. We simulated model 1 and model 4, respectively, using the parameters of the model fitted to data in all trials for the 10 mice in Figure 1. For both models, the recovered parameters matched the original parameters (Figure 3—figure supplement 4). We also simulated model 1 or model 4 using the parameters of the model fitted separately to choices in the stable and switching periods. This was performed using two sets of parameters: one was the set of model parameters for the 10 mice in Figure 1, and the other was a wider range of parameters (see Materials and methods). In both cases, there was good agreement between the recovered and original parameters (Figure 3—figure supplement 5 and Figure 3—figure supplement 6).

The above analysis established that both model 1 and model 4 of the dynamic-DC model could recover original parameters from simulated data, and that model 4 was the winning model to capture the adaptive behavior. As shown by Figure 3I and J, the sensory history parameter γ2 in model 4 was significantly larger than zero in both the stable and switching periods (p=0.014 and 2 × 10−3, Wilcoxon signed rank test). When we computed a right choice bias for the reversing stimulus under two different conditions of sensory history (previous low-frequency and previous high-frequency) (Figure 3—figure supplement 7), we found that, during the switching period, the right choice bias was significantly larger when the previous stimulus was at a lower than at a higher SF (p=2 × 10−3, Wilcoxon signed rank test), confirming the effect of sensory history as revealed by positive γ2. In addition, a larger value of γ2 in the switching period (model 4) was associated with a smaller number of trials to reverse choice (Pearson’s r = −0.71, p=0.02, Figure 3K). Taken together, the above results suggest that the sensory history of non-reversing stimuli, rather than the choice-outcome history of the reversing stimulus, was important for correct adaptation of the decision boundary in this task.
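The choice-bias analysis can be sketched as follows: reversing-stimulus trials are split by whether the preceding stimulus was below or above the reversing SF, and the right-choice probability is compared between the two groups (a simplified illustration of the analysis in Figure 3—figure supplement 7):

```python
def right_choice_bias(prev_sf, choices):
    """Right-choice probability for reversing-stimulus trials, split
    by whether the preceding stimulus was below or above the
    reversing SF.

    prev_sf: normalized SF of the previous trial (negative = lower
             than the reversing SF)
    choices: 1 = right, 0 = left, aligned with prev_sf"""
    after_low = [c for s, c in zip(prev_sf, choices) if s < 0]
    after_high = [c for s, c in zip(prev_sf, choices) if s > 0]
    mean = lambda xs: sum(xs) / len(xs) if xs else float('nan')
    return mean(after_low), mean(after_high)

# A positive gamma2 predicts more right choices after a low-SF
# stimulus than after a high-SF stimulus (toy data):
low, high = right_choice_bias([-1, -1, 1, 1, -1, 1], [1, 1, 0, 1, 1, 0])
print(low, round(high, 2))  # 1.0 0.33
```

A larger right-choice probability after low-SF stimuli than after high-SF stimuli is the model-free signature of the positive γ2 reported above.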

M2 inactivation impaired the reversing response after boundary switch

We next examined whether M2 activity is necessary for flexible action selection in the categorization task. We bilaterally injected AAV2/9-hSyn-hM4D-mCitrine in M2 to express the Gi-protein-coupled receptor hM4D (Figure 4A). Electrophysiological recordings validated that intraperitoneal injection of clozapine-N-oxide (CNO) could significantly reduce the firing rates of M2 neurons (p<0.001, n = 24 and 14 cells from two mice, respectively, Wilcoxon signed rank test, Figure 4B). To examine the effect of M2 inactivation, we compared the behavioral performance of mice between sessions with CNO and with saline injection (Figure 4—figure supplement 1), which were conducted on different days, ~40 min before the behavioral task. To analyze the performance for the reversing stimulus before and after the boundary switch, we only included sessions that contained ≥2 blocks. We found that chemogenetic inactivation of M2 significantly decreased the correct rate and response latency for the reversing stimulus during the switching period (p=0.022 and 0.019, two-way repeated-measures ANOVA followed by Sidak’s multiple comparisons test), but did not significantly affect them during the stable period (p=0.87 and 0.47, n = 10, two-way repeated-measures ANOVA followed by Sidak’s multiple comparisons test, Figure 4C and D). After the boundary switch, the number of trials to reverse choice for the reversing stimulus was greater in the CNO sessions (7.48 ± 1.19, mean ± SEM) than in the saline sessions (5.45 ± 0.40, mean ± SEM, p=0.049, n = 10, Wilcoxon signed rank test). For control mice with EGFP expressed in M2, data from saline and CNO sessions did not differ in the correct rate for the reversing stimulus during either the stable or the switching period (Figure 4—figure supplement 2).
On the other hand, M2 inactivation did not affect the correct rate for the non-reversing stimuli, the number of trials within a block, the number of switches per session, or the distance between the two subjective categorical boundaries estimated from trials after the switching period (Figure 4—figure supplement 3), indicating that M2 inactivation did not cause a general deficit in motor control or visual perception. Thus, bilateral M2 inactivation impaired adaptive action selection for the reversing stimulus specifically during the switching period, when the mice needed to remap the stimulus-action association.

Figure 4 with 7 supplements
Bilateral inactivation of M2 impairs adaptive action selection in the switching period.

(A) Representative fluorescence images showing the expression of AAV2/9-hSyn-hM4D-mCitrine in M2. (B) Firing rates of M2 neurons before and after CNO injection. Upper row, firing rates for 24 neurons recorded from one mouse (bar plot, mean firing rates 0–20 min before and 40–120 min after CNO injection). Lower row, firing rates for 14 neurons recorded from another mouse (bar plot, mean firing rates 0–20 min before and 40–100 min after CNO injection). The two vertical dashed lines indicate the time points corresponding to the start and end of a behavioral session, respectively. (C) Comparison of performance in the stable period between saline and CNO sessions. Left, correct rate for the reversing stimulus; right, response latency for the reversing stimulus. (D) Comparison of performance in the switching period between saline and CNO sessions, similar to that described in (C). (E) Parameters of the dynamic-DC model in the stable period. Left, model 1; right, model 4 (α1 = 0 and α2 = 0). (F) Parameters of the dynamic-DC model in the switching period. Left, model 1; right, model 4. (G) Comparison of DC curves in the stable period between saline and CNO sessions. Left, model 1; right, model 4. (H) Comparison of DC curves in the switching period between saline and CNO sessions. Left, model 1; right, model 4. *p<0.05, ***p<0.001. Wilcoxon signed rank test for (B). Two-way repeated measures ANOVA followed by Sidak’s multiple comparisons test for (C and D). Two-way repeated measures ANOVA for (G and H). Data in (C−H) were from n = 10 mice. Shading and error bar, ± SEM. For more details of the effect of chemogenetic manipulation, see Figure 4—figure supplements 1–7. See Figure 4—source data 1–5 for complete statistics.

Figure 4—source data 1

Effect of M2 inactivation on behavioral performance and model parameters.

https://cdn.elifesciences.org/articles/54474/elife-54474-fig4-data1-v2.txt
Figure 4—source data 2

Performance and model parameters for control mice with EGFP expressed in M2.

https://cdn.elifesciences.org/articles/54474/elife-54474-fig4-data2-v2.txt
Figure 4—source data 3

Parameter recovery analysis for saline and CNO sessions of M2 inactivation.

https://cdn.elifesciences.org/articles/54474/elife-54474-fig4-data3-v2.txt
Figure 4—source data 4

Effect of mPFC inactivation on behavioral performance and model parameters.

https://cdn.elifesciences.org/articles/54474/elife-54474-fig4-data4-v2.txt
Figure 4—source data 5

Effect of OFC inactivation on behavioral performance and model parameters.

https://cdn.elifesciences.org/articles/54474/elife-54474-fig4-data5-v2.txt

We next used the dynamic-DC model to fit the behavioral data of hM4D-expressing mice in saline sessions and CNO sessions, respectively, and compared the model parameters and the DC values inferred from these parameters between saline and CNO sessions for both model 1 (full model) and model 4 (α1 = 0 and α2 = 0) (Figure 4E−H). We found that M2 inactivation did not cause significant changes in the model parameters (p>0.05, Wilcoxon signed rank test) or in the DC curves (p>0.05, two-way repeated measures ANOVA) during the stable period (Figure 4E and G), and did not significantly affect the value of γ1 (weight of current stimulus) in either the stable or the switching period (p>0.05, Wilcoxon signed rank test). By contrast, M2 inactivation caused a significant decrease in γ2 in both model 1 and model 4 during the switching period (p=0.02 and 0.014, Wilcoxon signed rank test, Figure 4F), suggesting an impairment in utilizing the sensory history of the non-reversing stimuli to update the DC. Analysis of choice bias during the switching period showed that M2 inactivation tended to reduce the choice bias difference between the two conditions of sensory history (Figure 4—figure supplement 4), consistent with the decrease in γ2. M2 inactivation also slowed down the change of the DC during the switching period (Figure 4H). By simulating behavior with model 1 or model 4, we found that the recovered parameter γ2 in the switching period showed a significant decrease in CNO sessions as compared to saline sessions (Figure 4—figure supplement 5), indicating that the parameter change can be recovered from the model. For control mice with EGFP expressed in M2, CNO injection did not cause a significant change in any of the model parameters for the stable or the switching period (Figure 4—figure supplement 2).
Thus, a major effect of bilateral M2 inactivation was an impairment in updating the internal decision criterion according to the sensory history of non-reversing stimuli during the switching period.

Prefrontal cortical regions, including the medial prefrontal cortex (mPFC) and the orbitofrontal cortex (OFC), have been implicated in flexible behavior (Birrell and Brown, 2000; Dias et al., 1996; Duan et al., 2015; Floresco et al., 2006; Izquierdo et al., 2017; Ragozzino, 2007; Ragozzino et al., 1999; Stefani et al., 2003). We thus also examined the role of prefrontal cortex in the flexible visual categorization task. We found that bilateral inactivation of mPFC or OFC did not cause significant changes in the performance for the reversing stimulus, the distance between internal categorical boundaries, or the number of switches per session (p>0.05, Wilcoxon signed rank test, mPFC: n = 13 mice, OFC: n = 9 mice, Figure 4—figure supplement 6 and Figure 4—figure supplement 7), although OFC inactivation tended to increase the number of trials per block (Figure 4—figure supplement 7). Inactivation of mPFC or OFC did not affect the parameters in the dynamic-DC model (Figure 4—figure supplement 6 and Figure 4—figure supplement 7). Thus, the impaired performance for the reversing stimulus during the switching period was specific to M2 inactivation.

Information about choice and sensory history encoded by M2 neurons

We next examined the responses of M2 neurons in behaving mice (see Materials and methods, Figure 5—figure supplement 1). The mice used for electrophysiological recordings adopted behavioral strategies (Figure 5—figure supplement 2) similar to those of the mice in Figure 2 and Figure 3. For each M2 neuron, we analyzed the firing rates around the holding period, during which the mice held their heads in the central port, viewing the stimulus and presumably forming a decision.

As the M2 inactivation-induced impairment in performance for the reversing stimulus during the switching period was largely due to a reduced influence of sensory history (Figure 4), we examined the choice- and sensory history-related responses of M2 neurons during reversing-stimulus trials (Figure 5A−D). To compare neural responses between different choices (left vs right) or between different sensory histories (previous low frequency vs previous high frequency), we used receiver operating characteristic (ROC) analysis (Green and Swets, 1966) to measure the auROC (area under the ROC curve) (Figure 5E−J). Because the right choice bias of the mice was significantly larger when the previous stimulus was at a lower than at a higher frequency (Figure 5—figure supplement 3), we computed choice preference from the responses in left-choice and right-choice trials in order to relate the preference for choice to the preference for the previous stimulus. We quantified the ROC preference as 2×(auROC – 0.5), which ranges from −1 to 1 (Feierstein et al., 2006). For left-right choice preference, negative (positive) values indicate that the firing rates in the current trial were higher when the choice was left (right). To compute the preference for the previous stimulus, we divided the responses in current reversing-stimulus trials into two groups, in which the previous frequency was higher or lower than the frequency of the reversing stimulus. For previous-stimulus preference, negative (positive) values indicate that the firing rates in the current trial were higher when the previous non-reversing stimulus was at a low (high) SF.
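The ROC preference measure can be sketched in a few lines of numpy (a minimal illustration; the function name and example firing rates are our own, not values from the paper):

```python
import numpy as np

def roc_preference(rates_a, rates_b):
    """Compute 2 * (auROC - 0.5) for two firing-rate distributions.

    auROC is estimated with the Mann-Whitney rank statistic: the
    probability that a random draw from rates_b exceeds one from
    rates_a (ties count 0.5). The result lies in [-1, 1]; positive
    values mean higher firing under condition b.
    """
    a = np.asarray(rates_a, dtype=float)
    b = np.asarray(rates_b, dtype=float)
    greater = (b[:, None] > a[None, :]).mean()
    ties = (b[:, None] == a[None, :]).mean()
    auroc = greater + 0.5 * ties
    return 2.0 * (auroc - 0.5)

# A neuron firing more on right-choice trials (toy rates, spikes/s)
left_rates = [2.0, 3.0, 1.5, 2.5]
right_rates = [5.0, 6.0, 4.5, 5.5]
pref = roc_preference(left_rates, right_rates)  # -> 1.0, complete separation
```

With this sign convention, swapping the two conditions flips the sign of the preference, and identical distributions give a preference of 0.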

Figure 5 with 7 supplements see all
Correlation between choice preference and previous-stimulus preference for M2 neurons.

(A) Spike rasters of an example M2 neuron in 20 reversing-stimulus trials, grouped by left or right choice. The two vertical dashed lines indicate the time of stimulus onset and ‘Go’ signal onset, respectively. (B) Spike rasters of another example M2 neuron in 20 reversing-stimulus trials, grouped by previous stimulus (previous low or previous high SF). (C) PSTHs of the M2 neuron in (A) in correct reversing-stimulus trials, grouped by left or right choice. (D) PSTHs of the M2 neuron in (B) in correct reversing-stimulus trials, grouped by previous stimulus (previous low or previous high SF). (E) Frequency histogram of left-choice and right-choice responses from the M2 neuron in (A) in correct trials. (F) Frequency histogram of previous-low-frequency and previous-high-frequency responses from the M2 neuron in (B) in correct trials. (G) Frequency histogram of left-choice and right-choice responses from the M2 neuron in (A) in wrong trials. (H) Frequency histogram of previous-low-frequency and previous-high-frequency responses from the M2 neuron in (B) in wrong trials. (I) ROC curves for the two pairs of response distributions illustrated in (E) and (G), respectively. (J) ROC curves for the two pairs of response distributions illustrated in (F) and (H), respectively. (K) Left-right choice preference in correct and wrong trials was significantly correlated. n = 113 neurons. Upper, stable period; lower, switching period. (L) Left-right choice preference and previous-stimulus preference were significantly correlated in correct trials. n = 72 neurons. Upper, stable period; lower, switching period. (M) Left-right choice preference and previous-stimulus preference were not significantly correlated in wrong trials. n = 72 neurons. Upper, stable period; lower, switching period. r, Pearson’s correlation coefficient. Shading in (C and D), ± SEM. See Figure 5—figure supplements 1–7 for more details of the recording and data analysis.
See Figure 5—source data 1 for complete statistics.

Figure 5—source data 1

Choice preference and previous-stimulus preference for M2 neurons.

https://cdn.elifesciences.org/articles/54474/elife-54474-fig5-data1-v2.txt
Figure 5—source data 2

Model parameters and right choice bias for the mice used for electrophysiological recordings.

https://cdn.elifesciences.org/articles/54474/elife-54474-fig5-data2-v2.txt
Figure 5—source data 3

Choice preference and previous-stimulus preference for M2 neurons recorded from the left and right hemispheres, respectively.

https://cdn.elifesciences.org/articles/54474/elife-54474-fig5-data3-v2.txt
Figure 5—source data 4

Sliding window analysis of the correlation between choice preference and previous-stimulus preference of M2 neurons in correct trials.

https://cdn.elifesciences.org/articles/54474/elife-54474-fig5-data4-v2.txt

Because neither the choice preference nor the previous-stimulus preference differed between neurons recorded from the left and right hemispheres (Figure 5—figure supplement 4), we combined neurons recorded in both hemispheres. We found that the choice preferences in correct and wrong trials were positively correlated (stable period: Pearson’s r = 0.36, p=1.0 × 10−4; switching period: Pearson’s r = 0.24, p=9.3 × 10−3; n = 113, Figure 5K), indicating that M2 activity during the stimulus-viewing period reflects the orienting choice of the mice, consistent with a previous report (Erlich et al., 2011). We next examined the relationship between choice preference and previous-stimulus preference. For correct trials, the preference for left-right choice in the current trial showed a significant negative correlation with the preference for the stimulus in the last trial (stable period: Pearson’s r = −0.87, p=3.2 × 10−23; switching period: Pearson’s r = −0.92, p=9.5 × 10−31; n = 72, Figure 5L), and this effect was observed for M2 neurons recorded in both hemispheres (Figure 5—figure supplement 5). Using a sliding window of 50 ms, we found that the negative correlation was present in all time bins, including those between central port entry and stimulus onset. When the same sliding-window analysis related the preference for left-right choice in the current trial to the preference for the stimulus in the trial before last (Figure 5—figure supplement 6), the statistical significance of the correlation diminished in most time bins. Thus, these results suggest that the choice preference of M2 neurons is strongly influenced by stimulus history, and that the preference for choice in the current trial is coupled to the preference for the stimulus in the last, but not earlier, trial.
For wrong trials, however, the correlation between choice preference and previous-stimulus preference was not significant (stable period: Pearson’s r = −0.15, p=0.2; switching period: Pearson’s r = 0.08, p=0.5; n = 72, Figure 5M). Furthermore, the preference for ipsilateral-contralateral choice in the current trial did not exhibit a significant correlation with the preference for the stimulus in the last trial, for either correct or wrong trials (Figure 5—figure supplement 7). Thus, the left-right choice preference of M2 neurons was intimately related to the preference for the previous stimulus in correct trials, consistent with the finding that the sensory history of previous non-reversing stimuli was important for correct action selection in the task.

We further trained a linear support vector machine (SVM) classifier to decode the upcoming choice or the sensory history from M2 responses to the reversing stimulus (see Materials and methods). The analysis was applied separately to correct (or wrong) trials in the stable and switching periods. For each type of choice (sensory history) under each condition, we randomly sampled 10 trials from each neuron to form a resampled dataset, trained the classifier on 75% of the resampled data, and tested it on the remaining 25%. This cross-validation procedure was repeated 100 times and an average prediction accuracy was computed. The resampling procedure to compute prediction accuracy was repeated 1500 times. As shown in Figure 6A, the accuracy of predicting choice was significantly higher in the switching than in the stable period for correct trials (p<1.0 × 10−4, Wilcoxon rank sum test). For both the stable and switching periods, the accuracy of predicting choice was higher for correct trials than for wrong trials (p<1.0 × 10−4, Wilcoxon rank sum test, Figure 6A). When we examined the accuracy of classifying the previous stimulus as a lower or a higher frequency, we likewise found that the accuracy was significantly higher in the switching than in the stable period for correct trials (p<1.0 × 10−4, Wilcoxon rank sum test), and the accuracy for the switching period was higher in correct than in wrong trials (p<1.0 × 10−4, Wilcoxon rank sum test, Figure 6B). Thus, the results demonstrate that the representations of upcoming choice and sensory history in M2 were stronger during the switching period, when mice faced a demand to switch stimulus-action association, than during the stable period. This parallels the causal contribution of M2 to controlling action selection when the stimulus-action association was remapped.
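The decoding procedure can be sketched as follows, assuming responses are arranged in a trials × neurons pseudo-population matrix; the data here are synthetic, and scikit-learn's LinearSVC stands in for whatever SVM implementation was actually used:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def decode_accuracy(X, y, n_repeats=100, test_frac=0.25):
    """Mean cross-validated accuracy of a linear SVM.

    X: trials x neurons response matrix; y: labels (e.g. left/right
    choice, or previous-stimulus category). Each repeat holds out a
    random 25% of trials for testing, as in the paper's procedure.
    """
    accs = []
    for i in range(n_repeats):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_frac, stratify=y, random_state=i)
        clf = LinearSVC(C=1.0, dual=False).fit(X_tr, y_tr)
        accs.append(clf.score(X_te, y_te))
    return float(np.mean(accs))

# Toy pseudo-population: 40 trials x 20 neurons, two separable classes
X = np.vstack([rng.normal(0.0, 1.0, (20, 20)),
               rng.normal(1.5, 1.0, (20, 20))])
y = np.array([0] * 20 + [1] * 20)
acc = decode_accuracy(X, y)
```

The outer resampling loop (1500 resampled datasets in the paper) would simply call `decode_accuracy` once per resampled pseudo-population.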

M2 neurons encode upcoming choice and previous stimulus more accurately in the switching than in the stable period.

(A) Prediction accuracy for upcoming choice in response to the reversing stimulus. n = 1500 resampled datasets, 113 neurons. (B) Prediction accuracy for stimulus in previous trial. n = 1500 resampled datasets, 72 neurons. (C–H) Temporal dynamics of decoding accuracy. Time 0 is the time of ‘Go’ signal onset, and −700 ms is the time of central port entry. (C−E) Prediction accuracy for upcoming choice in response to the reversing stimulus. (F−H) Prediction accuracy for stimulus in previous trial. For (A and B), ****, p<1.0 × 10−4, Wilcoxon rank sum test. For (C–H), * marks the time bins with p<0.05, two-way ANOVA followed by Sidak’s multiple comparisons test. See Figure 6—source data 1 for complete statistics.

To quantify the dynamics of prediction accuracy, we next trained and tested an SVM classifier on each 50 ms time bin with a 25 ms moving window (Figure 6C−H). For decoding the upcoming choice, the prediction accuracy tended to increase over time during the holding period (Figure 6C−E). For decoding the upcoming choice in correct trials, higher accuracy in the switching than in the stable period was mostly observed within a ±200 ms window around stimulus onset (Figure 6C). For decoding the previous stimulus, the prediction accuracy was highest around the time of central port entry (Figure 6F−H), consistent with a representation of trial history. For decoding the previous stimulus in correct trials, higher accuracy in the switching than in the stable period was evident both around the time of central port entry and after the onset of the ‘Go’ signal (Figure 6F). Although the prediction accuracy for the previous stimulus decreased after stimulus onset, the decrease was more evident in wrong trials, resulting in higher accuracy in correct than in wrong trials (Figure 6G and H). Thus, the representations of upcoming choice and previous stimulus by M2 neurons exhibited different dynamics, and temporal integration of both types of information may be important for the action selection of the mice.
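Binning responses for such an analysis might look like the sketch below (the spike times are invented; only the 50 ms window and 25 ms step come from the text):

```python
import numpy as np

def sliding_window_counts(spike_times, t_start, t_end, win=0.050, step=0.025):
    """Spike counts in 50 ms windows advanced in 25 ms steps.

    spike_times: spike timestamps (s) of one neuron in one trial,
    aligned to the 'Go' signal (time 0). Returns window centers and
    the spike count in each window.
    """
    spike_times = np.asarray(spike_times, dtype=float)
    starts = np.arange(t_start, t_end - win + 1e-9, step)
    counts = np.array([np.sum((spike_times >= s) & (spike_times < s + win))
                       for s in starts])
    return starts + win / 2.0, counts

# Trial aligned to 'Go' at 0 s; central-port entry at -0.7 s
spikes = [-0.62, -0.31, -0.28, -0.06, 0.11]
centers, counts = sliding_window_counts(spikes, -0.7, 0.2)
```

Because the step is half the window width, each spike falls into two overlapping windows, which smooths the resulting accuracy traces.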

Discussion

Using behavioral modeling to analyze choices in a flexible visual categorization task, we found that mice depended on sensory history to correctly change stimulus-action association when categorical boundary switched between blocks of trials. Chemogenetic manipulation showed that M2 activity was specifically required for correct choices during remapping of stimulus-action association, but not when the sensorimotor association was stable. We further found that the representations of upcoming choice and sensory history by M2 neurons were stronger when the sensorimotor association needed to be flexibly adjusted. Thus, the choice- and sensory history-related signals in M2 are adaptive to task requirement, which may account for the important role of M2 in adaptive choice behavior during flexible stimulus categorization.

Understanding behavioral strategy using computational modeling is important for studying the neural basis of behavior (Churchland and Kiani, 2016; Krakauer et al., 2017). In our visual categorization task, the mice were required to switch their choice to the reversing stimulus several times within a single session. We thus used a probabilistic choice model in which the decision variable was determined by the comparison between the current sensory input and a dynamic decision criterion. Previous studies showed that trial history impaired performance in typical perceptual decision tasks (Abrahamyan et al., 2016; Jiang et al., 2019). In the task of the current study, however, because the probability of the non-reversing stimulus at the lowest (or highest) frequency differed between blocks, sensory history could serve as a prominent cue indicating a change in categorization contingency. Our behavioral model and choice bias analysis revealed that sensory history contributed significantly to adaptive choices during the switching period. Although a change in the outcome of the response to the reversing stimulus could also guide adaptive choices, the model analysis showed that action outcome history had a negligible effect on updating the DC in the switching period, particularly for those well-trained mice that had been trained for 3–4 months before data collection (Figure 2). Unlike the mice in Figure 2, the mice in Figure 4 were trained for a shorter period (~40 d) before we measured the effect of chemogenetic manipulation, and those in Figure 5—figure supplement 2 were tested with a narrower range of SFs. The behavior of the mice in Figure 4 (saline sessions) and Figure 5—figure supplement 2 exhibited a win-shift strategy (α1 < 0) in the switching period, suggesting that the effect of choice-outcome history may depend on the length of training or stimulus difficulty.
We also showed that, for the flexible visual categorization task in this study, the model with a dynamic DC accounted for the adaptive choices better than an RL model in which sensory stimuli were used to compute the expected values of left and right choices. Future work may further compare the dynamic-DC model with other forms of RL model. In our study, the change in stimulus statistics associated with the boundary switch likely promoted a strategy that relied on sensory history during adaptive action selection. On the other hand, it is possible that the influence of outcome occurs at a slower timescale, which may not be well captured by our model based on trial-by-trial analysis. In the future, applying computational models that consider the influence of sensory history and outcome at different timescales may allow us to better understand the process of adaptive action selection.

Categorical decision involves action selection that recruits the premotor cortex (Ashby and Maddox, 2005). In primates, the pre-supplementary motor area is involved in the switch of actions (Isoda and Hikosaka, 2007; Rushworth et al., 2002). Rodent M2, which is a putative homolog of primate premotor cortex, supplementary motor regions and frontal eye field, has been shown to play a critical role in action selection guided by sensory stimuli, memory or prior actions (Barthas and Kwan, 2017; Erlich et al., 2011; Gilad et al., 2018; Goard et al., 2016; Guo et al., 2014; Itokazu et al., 2018; Li et al., 2015; Makino et al., 2017; Murakami et al., 2017; Ostlund et al., 2009; Siniscalchi et al., 2016). M2 neurons have been found to encode information about trial history as well as upcoming choice (Hattori et al., 2019; Jiang et al., 2019; Scott et al., 2017; Siniscalchi et al., 2019; Sul et al., 2011; Yuan et al., 2015). In our study, we found that the left-right choice preference of M2 neurons was well correlated with the preference for previous stimulus in correct trials, suggesting that M2 neurons could integrate sensory history to generate choice signal. Previous studies also showed that M2 activity is task dependent. For instance, M2 responses are modulated by effector, task context and action outcome (Erlich et al., 2011; Kargo et al., 2007; Murakami et al., 2014; Siniscalchi et al., 2019). A recent study found that M2 activity patterns differ between conditions of cue-guided and nonconditional actions, and bilateral muscimol inactivation of M2 slowed the transition from action-guided to sound-guided response, without affecting the high performance of sound-guided trials after the transition (Siniscalchi et al., 2016). Consistent with this study, we found that bilateral inactivation of M2 slowed the adaptive adjustment of action selection following the boundary switch, without affecting choice behavior in steady state. 
The effect of M2 inactivation on adaptive action selection is also consistent with our finding that the representations of choice and history information by M2 neurons were stronger during the switching period.

Mouse M2 occupies a wide expanse of cortex, spanning from AP 2.5 mm to AP −1 mm along the rostral-caudal axis (Barthas and Kwan, 2017; Franklin and Paxinos, 2007). In our study, the effect of bilateral inactivation of M2 (AP 2.0 mm, ML 0.75 mm) on adaptive action selection was modest, which may be partly because the virus infected only a limited part of M2. Alternatively, it is possible that other parts of M2 or other regions also contribute to adaptive action selection. We found that inactivation of M2, but not mPFC or OFC, affected the performance for the reversing stimulus in the switching period, suggesting that sensory history-dependent adaptive action selection may specifically require M2 activity. It should be noted that our chemogenetic manipulation did not produce complete inactivation of neuronal activity, and thus the negative result of mPFC/OFC inactivation may need to be confirmed in experiments with more complete inactivation. Another limitation of our chemogenetic manipulation is the lack of precise temporal control of M2 inactivation. Future studies using optogenetics may allow examination of the temporal specificity of the effect of M2 inactivation. It is also of interest to further examine the circuit mechanism underlying the adaptive encoding of choice signals in M2.

Materials and methods

Key resources table
Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information
Strain, strain background (Mus musculus) | C57BL/6 | Slac Laboratory Animal | N/A |
Chemical compound, drug | clozapine-N-oxide | Sigma-Aldrich Corporation | C0832-5MG |
Software, algorithm | Offline Sorter | Plexon | https://plexon.com |
Software, algorithm | Prism | GraphPad | https://www.graphpad.com/scientificsoftware/prism/; RRID:SCR_00279 |
Software, algorithm | MATLAB | Mathworks | https://www.mathworks.com/; RRID:SCR_001622 |

Animals

Animal use procedures were approved by the Animal Care and Use Committee at the Institute of Neuroscience, Chinese Academy of Sciences (approval number NA-013–2019), and were in accordance with the guidelines of the Animal Advisory Committee at the Shanghai Institutes for Biological Sciences. Data were from a total of 68 male adult C57BL/6 mice (3–12 months old), of which 11 were used for electrophysiological recordings during the behavioral task. The mice were housed in groups of four to six per cage (mice for chronic extracellular recordings were housed individually). The mice were deprived of water in the home cage and obtained water reward during daily behavior sessions. On days when mice did not perform the task, restricted water access (~1 ml) was provided. The mice were kept on a 12 hr light/12 hr dark cycle (lights on at 7:00 am). All sessions were performed in the light phase.

Behavior and visual stimuli

Request a detailed protocol

All training and experiments took place in a custom-designed behavioral chamber with three ports (Long et al., 2015), controlled by custom Matlab (Mathworks, Natick, USA) scripts and digital I/O devices (PCI-6503, National Instruments Corporation, Austin, USA). Visual stimuli were presented on a 17-inch LCD monitor (Dell E1714S, max luminance 100 cd/m2) placed 10 cm from the front wall of the chamber. Gamma correction was used to calibrate the monitor, establishing a linear relationship between program-controlled pixel intensity and actual luminance. The monitor subtended 126° × 113° of visual space, assuming that the mouse’s head was at the central port facing the stimulus. A yellow light-emitting diode (LED) at the central port provided the ‘Go’ signal.

The mice initiated a trial by nose-poking into a hole in the central port. A full-field visual stimulus was presented on the monitor 200 ms after trial initiation, and the mice were required to stay in the central port for at least a further 500 ms, until the yellow LED (‘Go’ signal) lit up. The mice were required to compare the current stimulus to an internal decision boundary and report their choice by going to one of the two side ports. The rewarded side port was on the left (right) if the stimulus SF was lower (higher) than the categorical boundary. In each trial, the visual stimulus was presented until the mouse chose one of the two side ports, and the ‘Go’ signal was turned off at the time of stimulus offset. A gray screen of mean luminance was shown between trials.

The mice were trained to perform the flexible visual categorization task in the following steps. In step 1 (2 d), the central port was blocked, and mice could collect water reward by nose-poking into the left and right ports alternately. In step 2 (2 d), one of the side ports was blocked, alternating between daily sessions. The mice learned to initiate a trial by nose-poking into the central port, then go to the unblocked side port to receive water reward. In step 3 (3–10 d), the task was similar to that in step 2 except that the mouse was required to hold its head in the central port. The required holding time was 50 ms initially and increased by 75 ms each time the mouse succeeded in holding in 75% of the previous 20 trials. Training continued until the successful holding time reached 1000 ms in 75% of trials in a session. In step 4 (4–10 d), all ports were open, and the mouse initiated a trial by nose-poking into the central port. After the mouse held its head for 200 ms, a full-field static grating (vertical orientation) at a low SF (0.03 cycles/°) or a high SF (0.3 cycles/°) was presented until the mouse entered one of the two side ports. If the mouse chose the left (right) side port for a low (high) frequency, it was rewarded with 3–4 μl of water. Thus, in this step the mouse learned to discriminate between a low and a high SF. In step 5 (2–15 d), each daily session consisted of two types of blocks. Stimuli in the low-boundary block consisted of gratings at 0.03 and 0.095 cycles/°, and those in the high-boundary block were at 0.095 and 0.3 cycles/°. Each block consisted of at least 60 trials. For both types of blocks, a low (high) frequency stimulus indicated that choosing the left (right) port was correct and would be rewarded.
After the performance for the reversing stimulus (0.095 cycles/°) reached 70% or 80% in the last 10 reversing-stimulus trials, the boundary switched and the mouse was required to reverse its choice for the reversing stimulus. Thus, in this step the mouse learned to switch the stimulus-action association for the reversing stimulus at 0.095 cycles/°. In step 6 (3–7 d for most mice), the task was similar to that in step 5 except that the visual stimuli consisted of 7 SFs (0.03, 0.044, 0.065, 0.095, 0.139, 0.204, and 0.3 cycles/°) in both types of blocks. For the low-boundary block, gratings at 0.03 and 0.095 cycles/° were presented in 90% of trials, gratings at the other frequencies were presented in 10% of trials, and the boundary frequency was between 0.065 and 0.095 cycles/°. For the high-boundary block, gratings at 0.095 and 0.3 cycles/° were presented in 90% of trials, and the boundary frequency was between 0.095 and 0.139 cycles/°. Thus, the stimulus statistics differed between blocks. After step 6, the mouse was used for data collection in behavioral experiments. For the behavioral data of mice in Figure 1 to Figure 4, the visual stimuli were the same as those in step 6.

For behavioral experiments in control mice with EGFP expressed in M2 (Figure 4—figure supplement 2), in mice with chemogenetic inactivation of mPFC or OFC (Figure 4—figure supplement 6 and Figure 4—figure supplement 7) and in mice used for electrophysiological recordings (Figure 5 and Figure 6), the SFs of the gratings were 0.06, 0.073, 0.09, 0.11, 0.134, 0.164, and 0.2 cycles/°. In the low-boundary block, gratings at 0.06 and 0.11 cycles/° were presented in 90% of trials, and the boundary frequency was between 0.09 and 0.11 cycles/°. In the high-boundary block, gratings at 0.11 and 0.2 cycles/° were presented in 90% of trials, and the boundary frequency was between 0.11 and 0.134 cycles/°.

For mice not used for electrophysiological recordings, the behavior of each mouse was measured for 9.41 ± 1.42 (mean ± SD) sessions. For mice used for electrophysiological recordings, the behavior of each mouse was measured for 23 ± 5.2 (mean ± SD) sessions.

Surgery

Request a detailed protocol

For chemogenetic inactivation experiments, mice were injected with virus before behavioral training. Mice were anesthetized with a cocktail of midazolam (5 mg/kg), medetomidine (0.5 mg/kg) and fentanyl (0.05 mg/kg) injected intraperitoneally before surgery, then head-fixed in a stereotaxic apparatus. Body temperature was maintained at 37°C. Eye drops and eye ointment were applied to prevent the eyes from drying. The incision site was treated with lidocaine jelly. Two craniotomies (∼0.8 mm diameter) were performed bilaterally above M2 (AP 2.0 mm, ML ±0.75 mm), mPFC (AP 2.3 mm, ML ±0.3 mm) or OFC (AP 2.6 mm, ML ±1.0 mm). For virus injection, the bones were not cracked, only thinned enough to allow easy penetration by a borosilicate glass pipette with a tip diameter of ~40–50 μm. A total of 300 nl of AAV2/9-hSyn-hM4D-mCitrine (or AAV2/8-CamKIIa-EGFP-3Flag-WPRE-SV40pA for control mice, which were randomly assigned among cagemates) was injected at a depth of 900 μm for M2 (1350 μm for mPFC and 2100 μm for OFC) using a syringe pump (Pump 11 Elite, Harvard Apparatus, Holliston, USA). After the virus injection, the pipette was left in place for 10–15 min before retraction. The mice were given carprofen (5 mg/kg) subcutaneously after the surgery.

Chronic electrodes were implanted after the mice were fully trained. For at least 2 days before the surgery, the mice were given free access to water. The mice were anesthetized and prepared using the same procedures described above. A grounding screw was implanted at a site posterior to lambda. A craniotomy was made above the left or right M2 (AP 2.0 mm, ML 0.75 mm), and a portion of the dura was removed. Artificial cerebrospinal fluid was applied to the cortical surface. A 16-site silicon probe (A1 ×16–3 mm-50-177-CM16LP, NeuroNexus Technologies, Ann Arbor, USA) was lowered into the brain to a depth of 900 μm with a micromanipulator (Siskiyou Corporation, Grants Pass, USA). The craniotomy was covered with a thin layer of the silicone elastomer Kwik-Cast, followed by another layer of the silicone elastomer Kwik-Sil (World Precision Instruments, Sarasota, USA). A headplate was placed on the skull to facilitate later handling of the animal (attaching the headstage to the probe connector before each recording session), and cyanoacrylate tissue adhesive (Vetbond, 3M, Saint Paul, USA) was applied to the skull surface to provide stronger fixation. After the tissue adhesive cured, several layers of dental acrylic cement were applied to fix the whole implant in place. A ferrule of optical fiber was partially embedded in the dental cement (above the grounding screw) to provide guidance to a rotary joint (Doric Lenses, Quebec, Canada). Mice were allowed to recover from the surgery for at least 7 days before water restriction and behavioral training resumed. Mice implanted with electrodes were returned to step 4 of training to become accustomed to moving around with the implants and headstage cable before the recordings started.

Chemogenetic inactivation

Request a detailed protocol

We used designer receptors exclusively activated by designer drugs (DREADDs) (Stachniak et al., 2014) to inactivate M2, mPFC or OFC. About 40 min before each daily session, the mice were briefly anesthetized with isoflurane (4%) and received an intraperitoneal injection of clozapine-N-oxide (CNO) (1.5 mg/kg) or saline. The concentration of the CNO solution was adjusted such that each mouse received ~100 μl of solution per injection. The mice were allowed to recover individually in a cage before behavioral testing. We verified in anesthetized mice that the effect of DREADD inactivation on M2 neuronal spiking lasted for 2 hr. Because some SFs were presented in only 10% of trials and the number of switches per session was limited (<7 for most mice), we measured 8.7 ± 0.95 (mean ± SD) sessions for saline injection (or CNO injection), so that we could obtain enough trials for psychometric curve analysis and modeling.

Electrophysiological recording

Request a detailed protocol

Neural signals were amplified and filtered using the Cerebus 32-channel system (Blackrock Microsystems, Salt Lake City, USA). The spiking signals were sampled at 20 kHz. To detect spike waveforms, we band-pass filtered the signals at 250 Hz to 5 kHz. Signals exceeding 3.5 s.d. of the average noise level (1–5 kHz) were detected as spikes. Task-related behavioral events were digitized as TTL levels and recorded by the Cerebus system.

Histology

Request a detailed protocol

The mice were deeply anesthetized with a cocktail of midazolam (7.5 mg/kg), medetomidine (0.75 mg/kg) and fentanyl (0.075 mg/kg), and perfused with paraformaldehyde (PFA, 4%). Brains were removed, fixed in 4% PFA (4°C) overnight, and then transferred to 30% sucrose in phosphate-buffered saline until equilibration. Brains were sectioned at 50–70 μm. Fluorescence images were taken with a virtual slide microscope (VS120, Olympus, Shinjuku, Japan). The atlas schematics in Figure 4, Figure 4—figure supplement 2, Figure 4—figure supplement 6, Figure 4—figure supplement 7 and Figure 5—figure supplement 1 are modified from Franklin and Paxinos, 2007.

To verify the location of implanted electrode, we made electrolytic lesions by applying 20 μA of current for 20 s through each channel of the electrode. Brain sections were stained with cresyl violet.

Analysis of behavior

Request a detailed protocol

To estimate psychometric curves for both low-boundary and high-boundary blocks, all trials (except those within 30 trials after boundary switch) in all sessions were used to compute the correct rate for each SF. The curve of correct rate was fitted with the psychometric function using psignifit (Wichmann and Hill, 2001):

Ψ(x) = γ + (1 − γ − λ) × 1/(1 + exp(−g(x))), g(x) = (x − α)/β,

in which γ is the lower asymptote and 1 − λ is the upper asymptote, α is the boundary, and β determines the slope of the curve.
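As an illustration, the fitted curve can be evaluated with a short Python sketch (the study itself used psignifit in MATLAB; the function name and parameter values below are hypothetical):

```python
import numpy as np

def psychometric(x, gamma, lam, alpha, beta):
    """Logistic psychometric function: gamma and (1 - lam) set the lower
    and upper asymptotes, alpha is the boundary, beta sets the slope."""
    g = (x - alpha) / beta
    return gamma + (1.0 - gamma - lam) / (1.0 + np.exp(-g))

# At the boundary (x == alpha), performance lies halfway between asymptotes.
p = psychometric(0.095, gamma=0.05, lam=0.05, alpha=0.095, beta=0.02)  # ~0.5
```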

To estimate the number of trials to reverse choice, we averaged the performance for reversing stimulus after boundary switch across all blocks in all sessions. We fitted an exponential function to the data of correct rate for the first 60 reversing-stimulus trials after boundary switch (Jaramillo and Zador, 2014):

f(n) = A × (1 − e^(−n/τ)) + I,

in which n indicates the trial number after a switch, τ is the time constant governing the speed of the change, A represents the asymptotic performance, and 1 − I is the performance for the reversing-stimulus trials just before the switch. The number of trials to reverse choice was computed as the number of trials needed to cross the 50% correct rate of the fitted curve.
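The trials-to-reverse computation can be sketched in Python (illustrative only; the authors fitted the exponential in MATLAB, and the parameter values below are made up):

```python
import numpy as np

def reversal_curve(n, A, tau, I):
    # Correct rate for the reversing stimulus on trial n after a boundary switch.
    return A * (1.0 - np.exp(-n / tau)) + I

def trials_to_reverse(A, tau, I, n_max=60):
    # First trial at which the fitted curve crosses the 50% correct rate.
    n = np.arange(1, n_max + 1)
    above = np.nonzero(reversal_curve(n, A, tau, I) > 0.5)[0]
    return int(n[above[0]]) if above.size else None

# With A = 0.8, tau = 5, I = 0.1, the curve crosses 50% on trial 4.
```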

The 15 trials before and the 15 trials after the switch of categorical boundary were defined as stable and switching periods, respectively. The response latency was defined as the duration between stimulus onset (200 ms after central port entry) and nose-poking the side port. We computed right choice bias to estimate the influence of a previous non-reversing stimulus on the choice for the reversing stimulus. Right choice bias following a low-frequency (high-frequency) stimulus was computed as the probability of right choice subtracted by that averaged over all reversing-stimulus trials in the low-boundary (high-boundary) block. To compute the difference in right choice bias, the right choice bias following a low-frequency stimulus was subtracted by that following a high-frequency stimulus.

Behavioral models

Request a detailed protocol

To estimate the fluctuation of the decision criterion on a trial-by-trial basis, and to understand the contribution of trial history to the adjustment of the decision criterion, we designed a logistic regression model with a dynamic DC and fitted the model to behavioral data combined across multiple sessions. The choice in trial t was modeled with a logistic function: p(t) = 1/(1 + e^(−z(t))), where p is the probability of choosing right and z is a decision variable. For each trial t, z is calculated as: z(t) = γ1 × S(t) − DC(t), where S(t) is the current stimulus, γ1 is the weight for the current stimulus, and DC(t) represents the internal decision criterion in the current trial. The SF of stimulus S was normalized between −1 and 1, in which negative and positive values indicate lower and higher SFs, respectively, and 0 indicates the reversing stimulus. The parameter γ1 was constrained within [0, +∞). We fitted the model to choices in all trials, as well as to choices in the stable period (last 15 trials before switch) and the switching period (first 15 trials after switch) separately. For the latter case, because the categorical boundary switched once the performance for the reversing stimulus reached 70% (mice in Figure 1, Figure 2 and Figure 3) or 80% (mice in other figures) over the last 10 reversing-stimulus trials, the initial value of DC for the first trial in each block of the stable period was set to a value corresponding to a correct rate of 70%, and the initial value of DC for the first trial in each block of the switching period was set to a value corresponding to a correct rate of 30% or 20%.

Intuitively, the mice could adjust their internal DC according to the experienced action-outcome association and the previously experienced visual stimuli (Akrami et al., 2018; Jaramillo and Zador, 2014). These possibilities motivated us to design rules to update DC in a trial-by-trial manner according to specific trial history.

After a trial of reversing stimulus, DC was updated according to the following rules:

  • DC(t) = (1 − β) × DC(t−1) + α1 for a rewarded left choice,

  • DC(t) = (1 − β) × DC(t−1) − α1 for a rewarded right choice,

  • DC(t) = (1 − β) × DC(t−1) − α2 for an unrewarded left choice,

  • DC(t) = (1 − β) × DC(t−1) + α2 for an unrewarded right choice,

where α1 and α2 are parameters modeling the effect of rewarded and unrewarded choices, respectively, and β (constrained within (-∞, 1]) models the tendency of DC to drift towards 0 (0 < β <1) or away from 0 (β <0). For a positive (negative) α1, a rewarded choice introduces a bias towards (away from) the previously chosen side, indicating a win-stay (win-shift) strategy. For a positive (negative) α2, an unrewarded choice causes a bias away from (towards) the previously chosen side, indicating a lose-shift (lose-stay) strategy.

After a trial of non-reversing stimulus, DC was updated according to the following:

DC(t) = (1 − β) × DC(t−1) + γ2 × S(t−1),

where γ2 is the weight for the stimulus in the previous trial. Here we did not implement parameters for reward/non-reward history because the performance for non-reversing stimuli was usually much higher than that for the reversing stimulus. For γ2 with a positive value, a previously experienced low-frequency stimulus (i.e., SF lower than that of the reversing stimulus) introduces a right choice bias, while a previously experienced high-frequency stimulus (i.e., SF higher than that of the reversing stimulus) introduces a left choice bias. Note that when the previous trial is the reversing stimulus, S(t−1) is 0 and DC is updated according to: DC(t) = (1 − β) × DC(t−1).
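A minimal Python sketch of one trial of this dynamic-DC model, combining the logistic choice rule with the two update branches (function names are our own; the original implementation was in MATLAB):

```python
import numpy as np

def choice_prob_right(S, DC, gamma1):
    # Logistic choice rule: z = gamma1 * S - DC, p(right) = 1 / (1 + e^(-z)).
    z = gamma1 * S - DC
    return 1.0 / (1.0 + np.exp(-z))

def update_dc(DC, S_prev, was_reversing, choice, rewarded,
              alpha1, alpha2, beta, gamma2):
    """One trial-by-trial update of the decision criterion. choice is
    'left' or 'right'; S_prev is the previous normalized stimulus."""
    DC_new = (1.0 - beta) * DC  # drift toward (0 < beta < 1) or away from 0
    if was_reversing:
        if rewarded:
            DC_new += alpha1 if choice == 'left' else -alpha1
        else:
            DC_new += -alpha2 if choice == 'left' else alpha2
    else:
        DC_new += gamma2 * S_prev  # sensory-history term
    return DC_new
```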

To evaluate whether each model parameter is necessary, we built reduced model variants, including model 2 (α1 = 0), model 3 (α2 = 0), model 4 (α1 = 0 and α2 = 0), model 5 (β = 0), model 6 (γ1 = 1) and model 7 (γ2 = 0), in addition to the full model (model 1).

Given the possibility that the mice could solve the task by updating separate value functions for different choices rather than comparing the stimulus with a decision criterion, we also designed a reinforcement learning (RL) model (Sutton and Barto, 1998). In this model, we used the sensory stimulus in the computation of the expected values of left and right choices (Ql and Qr) (Lak et al., 2019), and Ql and Qr are mapped onto the mouse’s choice through a softmax function:

Pr(t) = e^(Qr(t)) / (e^(Ql(t)) + e^(Qr(t))),

where Pr represents the probability of right choice. For each trial t, Ql and Qr are calculated as:

Ql(t) = γ × (1 − S(t)) × Vl(t),
Qr(t) = γ × S(t) × Vr(t),

where Vl and Vr are the value functions for left and right choices, respectively, and γ is the weight for current stimulus S(t). The SF of the stimulus was normalized between 0 and 1, in which 0 and 1 correspond to the lowest and highest SFs, respectively. The model updates the value functions according to the following rules:

  • Vl(t) = Vl(t−1) + α × (1 − Ql(t−1)) for a rewarded left choice,

  • Vr(t) = Vr(t−1) + α × (1 − Qr(t−1)) for a rewarded right choice,

  • Vl(t) = Vl(t−1) + α × (0 − Ql(t−1)) for an unrewarded left choice,

  • Vr(t) = Vr(t−1) + α × (0 − Qr(t−1)) for an unrewarded right choice,

where α (constrained within (0, 1)) is the learning rate, and (1 − Ql), (1 − Qr), (0 − Ql) or (0 − Qr) represents the reward prediction error.
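The value computation and update of this RL model can be sketched in Python (an illustrative re-implementation, not the authors' code):

```python
import numpy as np

def q_values(S, Vl, Vr, gamma):
    # Expected values weight the value functions by the normalized stimulus.
    Ql = gamma * (1.0 - S) * Vl
    Qr = gamma * S * Vr
    return Ql, Qr

def p_right(Ql, Qr):
    # Softmax over the two expected values.
    return np.exp(Qr) / (np.exp(Ql) + np.exp(Qr))

def rl_update(choice, rewarded, Ql, Qr, Vl, Vr, alpha):
    # Only the chosen side's value function moves, by alpha times the
    # reward prediction error (outcome of 1 if rewarded, 0 otherwise).
    outcome = 1.0 if rewarded else 0.0
    if choice == 'left':
        Vl = Vl + alpha * (outcome - Ql)
    else:
        Vr = Vr + alpha * (outcome - Qr)
    return Vl, Vr
```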

For each of the models (different variants of the dynamic-DC model and the RL model), we fitted the model to behavioral data in all trials, as well as separately to choices in the stable and switching periods. The model was fitted with the method of maximum likelihood estimation, using sequential quadratic programming algorithm (Nocedal and Wright, 2006) to search for sets of parameters to minimize the average negative log-likelihood of the data (‘fmincon’ function in Matlab 2017b, with ‘sqp’ option). We applied 200 runs of 5-fold cross-validation, with balanced number of low-boundary and high-boundary blocks in each run. The median values of the parameter distribution were reported.
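The fitted objective, the average negative log-likelihood of the observed choices, can be written compactly in Python (a sketch with a hypothetical function name; the paper minimized this quantity with MATLAB's fmincon using the 'sqp' algorithm, for which scipy.optimize.minimize with method='SLSQP' would be a rough Python analog):

```python
import numpy as np

def neg_log_likelihood(p_right, choices):
    """Average negative log-likelihood of observed choices (1 = right,
    0 = left) given the model's per-trial right-choice probabilities;
    this is the quantity minimized over model parameters during fitting."""
    p = np.clip(np.asarray(p_right, dtype=float), 1e-9, 1.0 - 1e-9)  # avoid log(0)
    c = np.asarray(choices, dtype=float)
    return -np.mean(c * np.log(p) + (1.0 - c) * np.log(1.0 - p))
```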

We simulated behavior using the parameters of model 1 or model 4 fitted to the mice’s actual behavior. Each simulation contained 50 low-boundary blocks and 50 high-boundary blocks, with the first block randomly chosen as a low- or high-boundary block. As in the actual experiment, the boundary switched once the block consisted of at least 60 trials and the performance for the reversing stimulus reached 70% in the last 10 reversing-stimulus trials. For model simulation of choices in all trials, the DC in the first trial of the next block inherited the value in the last trial of the previous block. For model simulation of only those trials in the stable and switching periods, the DC in the first trial of the stable (switching) period was set to a value corresponding to a 70% (30%) correct rate for the reversing stimulus.

To perform parameter recovery analysis, we fitted the model to simulated data by applying 200 runs of 5-fold cross-validation, and used the median values of the parameter distribution as the recovered parameters. Parameter recovery analysis was performed for the parameters of the model fitted to data in all trials, and also for those of the model fitted to choices in the stable and switching periods separately. In the latter case, we used two sets of original parameters: one was the model parameters for the 10 mice in Figure 1; the other was a wider range of parameters, with values as follows. For model 1, the ranges of original parameters for α1, α2, β, γ1 and γ2 were set as [−1.5, 0.8], [−0.7, 1.5], [−0.2, 0.9], [1, 13] and [−0.4, 1.5], respectively. For model 4, the ranges of original parameters for β, γ1 and γ2 were set as [−0.2, 0.9], [1, 13] and [−0.4, 1.5], respectively. When we performed parameter recovery for one parameter (e.g., α1 in model 1) within a wide range, the other parameters (e.g., α2, β, γ1 and γ2 in model 1) were each fixed at a value corresponding to the median parameters of all mice: α1, α2, β, γ1 and γ2 in model 1 were fixed at −0.13, 0.13, 0.22, 3.64 and 0.42, respectively, and β, γ1 and γ2 in model 4 were fixed at 0.21, 3.61 and 0.36, respectively.

Analysis of neuronal responses

Request a detailed protocol

Spike sorting was performed offline using commercial software (Offline Sorter V3; Plexon Inc, Dallas, USA). The sorting involved an automated clustering process in 3-D principal component space of waveform features (Shoham et al., 2003) and a final stage of manual verification. Spike clusters were considered single units if the interspike interval was >1 ms and the p value for a multivariate ANOVA test on clusters was less than 0.05.

Except for the analyses in Figure 6C−H and Figure 5—figure supplement 6, we analyzed the spikes within 0–700 ms after trial initiation (including 200 ms before stimulus presentation and 500 ms of stimulus presentation), during which the mouse held its head in the central port. For the units recorded in each session, we binned the spikes at 1 ms and computed correlation coefficients (CCs) between all pairwise combinations of units. We also computed the CC between the spike waveforms of each pair of units. Pairs with spike CC >0.1 and spike waveform CC >0.95 were considered duplicate units that appeared on multiple channels, and the unit with the lower firing rate in the pair was discarded (Zhu et al., 2015). The same units recorded over two consecutive sessions in the same channel were tracked according to a previously reported method based on quantification of the mean waveform shape, the autocorrelation, the average firing rate, and the cross-correlation with other simultaneously recorded neurons (Fraser and Schwartz, 2012). Only units that were tracked for ≥2 sessions were included in the analysis.
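The duplicate-unit screen described above can be sketched in Python (hypothetical function name; thresholds taken from the text; the original analysis was done in MATLAB):

```python
import numpy as np

def duplicate_pairs(binned_spikes, waveforms, cc_thresh=0.1, wf_thresh=0.95):
    """Flag putative duplicate units recorded on multiple channels.
    binned_spikes: (n_units, n_bins) spike counts binned at 1 ms;
    waveforms: (n_units, n_samples) mean spike waveforms."""
    n = binned_spikes.shape[0]
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            cc_spk = np.corrcoef(binned_spikes[i], binned_spikes[j])[0, 1]
            cc_wf = np.corrcoef(waveforms[i], waveforms[j])[0, 1]
            if cc_spk > cc_thresh and cc_wf > wf_thresh:
                # Flagged pair; the lower-firing unit would then be discarded.
                pairs.append((i, j))
    return pairs
```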

To quantify the preference of each neuron for upcoming choice (sensory history), we applied ROC analysis (Green and Swets, 1966) to the distributions of spike counts on trials of different choice (different sensory history). The area under the ROC curve (auROC) indicates the accuracy with which an ideal observer can correctly classify whether a given response is recorded in one of two conditions. The ROC preference was defined as 2×(auROC – 0.5), which ranged from −1 to 1 (Feierstein et al., 2006). For the sliding window analysis of the correlation between left-right choice preference in current trial and stimulus preference in last trial (or in the trial before last), we used the spikes within 0–900 ms after trial initiation (including 200 ms before stimulus presentation, 500 ms of stimulus presentation and 200 ms after the ‘Go’ signal). The correlation coefficient was computed for each 50 ms time bin with a 25 ms moving window.
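The ROC preference measure can be computed directly from two spike-count distributions, using the relation between the Mann-Whitney pair-counting statistic and the area under the ROC curve; a Python sketch (our own implementation, not the authors'):

```python
import numpy as np

def roc_preference(counts_a, counts_b):
    """ROC preference 2*(auROC - 0.5) between spike-count distributions for
    two conditions (e.g., left vs. right upcoming choice). auROC is computed
    as the fraction of (a, b) pairs where b exceeds a, ties counted as 0.5,
    which equals the area under the ROC curve for an ideal observer."""
    a = np.asarray(counts_a, dtype=float)
    b = np.asarray(counts_b, dtype=float)
    greater = (b[:, None] > a[None, :]).mean()
    ties = (b[:, None] == a[None, :]).mean()
    auroc = greater + 0.5 * ties
    return 2.0 * (auroc - 0.5)  # ranges from -1 to 1
```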

Linear decoders based on a support vector machine (Boser et al., 1992) (SVM, fitcsvm function in Matlab 2017b) were trained to decode choice or sensory history from a pseudopopulation of M2 neurons responding to the reversing stimulus. The decoding analysis was applied separately to four conditions: correct trials in the stable and switching periods, and wrong trials in the stable and switching periods. Spike counts of each neuron were z-score standardized. For each type of upcoming choice or sensory history, we randomly sampled 10 trials without replacement from each neuron to form a resampled dataset (Raposo et al., 2014). The resampling procedure was repeated 1500 times. For each resampled dataset, we trained the SVM-based linear classifier on 75% of the resampled data and tested it on the remaining 25%. This 4-fold cross-validation procedure was repeated 100 times and an averaged prediction accuracy was computed. Each condition yielded 1500 data points of prediction accuracy. To quantify the dynamics of prediction accuracy, we used the spikes within 0–900 ms after trial initiation (including 200 ms before stimulus presentation, 500 ms of stimulus presentation and 200 ms after the ‘Go’ signal). The SVM classifier was applied to each 50 ms time bin with a 25 ms moving window, and the resampling procedure was repeated 100 times.
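The cross-validation scaffolding of this decoding analysis can be sketched in Python. Note the substitution: the paper trained a linear SVM (MATLAB's fitcsvm), whereas this dependency-free sketch uses a nearest-centroid linear readout purely to illustrate the fold structure:

```python
import numpy as np

def cv_decoding_accuracy(X, y, n_folds=4, rng=None):
    """Cross-validated decoding accuracy from a pseudopopulation response
    matrix X (trials x neurons) with binary labels y (0/1). A nearest-
    centroid linear readout stands in for the SVM used in the paper."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    order = rng.permutation(len(y))
    folds = np.array_split(order, n_folds)
    accs = []
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        if not (np.any(y[train] == 0) and np.any(y[train] == 1)):
            continue  # skip degenerate splits lacking one class
        mu0 = X[train][y[train] == 0].mean(axis=0)
        mu1 = X[train][y[train] == 1].mean(axis=0)
        # Assign each held-out trial to the nearer class centroid.
        d0 = ((X[test] - mu0) ** 2).sum(axis=1)
        d1 = ((X[test] - mu1) ** 2).sum(axis=1)
        pred = (d1 < d0).astype(int)
        accs.append((pred == y[test]).mean())
    return float(np.mean(accs))
```

In the actual analysis this inner loop would be wrapped in the 1500-fold trial resampling described above, with an SVM (e.g., sklearn's LinearSVC in Python) in place of the centroid readout.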

To be included in the ROC or SVM analysis, we required that a neuron should have at least 10 trials for each type of choice (sensory history) under each condition and firing rate >0.5 spikes/s in at least one of the conditions.

Statistical analysis

Request a detailed protocol

No statistical methods were used to pre-determine sample sizes. Sample sizes are consistent with similar studies in the field. The statistical analysis was performed using MATLAB or GraphPad Prism (GraphPad Software). Wilcoxon two-sided signed rank test, Wilcoxon two-sided rank sum test, two-way ANOVA or two-way repeated measures ANOVA (followed by Sidak’s multiple comparisons test) was used to determine the significance of the effect. Correlation values were computed using Pearson’s correlation. Unless otherwise stated, data were reported as mean ± SEM and p values < 0.05 were considered statistically significant.

Data availability

All data generated or analyzed during this study are available on Dryad (https://doi.org/10.5061/dryad.1c59zw3rs). Source data files have been provided for Figures 1–6.

The following data sets were generated
    1. Wang T-Y
    2. Liu J
    3. Yao H
    (2020) Dryad Digital Repository
    Data from: Control of adaptive action selection by secondary motor cortex during flexible visual categorization.
    https://doi.org/10.5061/dryad.1c59zw3rs

References

  1. Conference
    1. Boser BE
    2. Guyon IM
    3. Vapnik VN
    (1992) A training algorithm for optimal margin classifiers
    Proceedings of the Fifth Annual Workshop on Computational Learning Theory.
    https://doi.org/10.1145/130385.130401
  2. Book
    1. Franklin KBJ
    2. Paxinos G
    (2007)
    The Mouse Brain in Stereotaxic Coordinates
    Amsterdam: Elsevier Academic Press.
  3. Book
    1. Green DM
    2. Swets JA
    (1966)
    Signal Detection Theory and Psychophysics
    New York: Wiley.
  4. Book
    1. Sutton RS
    2. Barto AG
    (1998)
    Reinforcement Learning: An Introduction
    Cambridge: MIT Press.

Decision letter

  1. Naoshige Uchida
    Reviewing Editor; Harvard University, United States
  2. Timothy E Behrens
    Senior Editor; University of Oxford, United Kingdom
  3. Alex C Kwan
    Reviewer; Yale University, United States
  4. Carl CH Petersen
    Reviewer; École Polytechnique Fédérale de Lausanne (EPFL), Switzerland

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

This work demonstrates that chemogenetic inactivation in the secondary motor cortex (M2) impairs the performance of mice in perceptual decision making when a decision boundary is dynamically shifted but not when the task is stable, thus revealing a specific role of M2 in adaptive decision-making. This work will be of great interest to those who study the neural mechanisms of decision making.

Decision letter after peer review:

Thank you for submitting your article "Control of adaptive action selection by secondary motor cortex during flexible visual categorization" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Timothy Behrens as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Alex C Kwan (Reviewer #2); Carl CH Petersen (Reviewer #3).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

In this study, Wang and colleagues examined the role of secondary motor cortex (M2) in flexible decision making. Mice were trained to categorize visual stimuli based on spatial frequency, designed after the task developed by Jaramillo, Zador and their colleagues. The position of decision boundary along the spatial frequency was shifted across blocks of trials. Mice biased their choice depending on the position of decision boundary. The behavior was consistent with a model in which animals' choices were guided by the stimulus history but not by the reward history. The authors then demonstrate that chemogenetic inactivation of M2 impaired flexible changes in the animals' decision boundary although it did not impair the performance without boundary shifts.

All the reviewers thought that this study addresses an important question, uses a good task, and provides important results. However, the reviewers thought that there are various technical and interpretational concerns that need to be addressed before publication of this work in eLife.

Essential revisions:

1) The effect of chemogenetic inactivation is rather small, and the results and the data analysis do not appear to be very robust. Given that DREADD likely results in partial inactivation, it is difficult to interpret negative results for mPFC and OFC. Although the reviewers commend that these experiments were done, the results need to be interpreted more carefully, and tests using more complete inactivation (e.g. muscimol) would be preferable.

2) The authors use model-based analysis and conclude that the animal's choices are guided by stimulus history but not by reward history. Although this is a very important effort, the reviewers identified several issues that need to be addressed.

2a) The manuscript emphasizes changes in the decision boundary (task contingency) but the model analysis indicated that the animals were not reacting to reward history but stimulus history. It seems that this is mainly due to an unusual choice of stimuli (a majority of stimuli were chosen from right next to the decision boundary) used for each block, concurrently with shifting decision boundary. This unusual choice of stimuli might have masked the effect of reward history in behavior or data analysis. This task design needs to be explained more clearly in the Results section and preferably some figures representing it. Furthermore, the motivation of this task design, as opposed to shifting decision boundary without changing the stimulus statistics, needs to be explained.

2b) The validity of model-based analysis depends on whether the model was able to fit the data reasonably well in the first place. Please provide the evidence (quantification and visualization) of goodness of fit.

2c) The authors conclude that the animals' choices were not affected by reward history based on the observation that the model that depends on stimulus history fit the data better than a reinforcement learning (RL) model. The reviewers thought that it is impossible to make such a conclusion just by a comparison with a particular RL model. The authors need to explore more thoroughly what alternative RL models may fit the data well. The current RL model that the authors used computes action values for left versus right choices without considering stimuli. A simple possibility is an RL model that computes action values specifically for each stimulus (corresponding to "states" in RL).

3) The reviewers thought that the electrophysiological recording data are not thoroughly analyzed nor presented in an informative way. The reviewers make various suggestions to improve (see below). One possibility is to remove this part altogether (Reviewer 1) but we would like to see more informative presentations and insightful analyses of the electrophysiology data.

More detailed explanations of the above points from individual reviewers are included in the following. The manuscript will be re-evaluated based on your responses to these concerns and suggestions.

Reviewer #1:

The paper by Wang et al. developed a task which requires mice to indicate whether a visual stimulus was higher or lower in spatial frequency (SF) than a boundary SF value. The boundary SF was altered between two values in two different blocks, requiring mice to adjust an internal decision criterion to obtain maximum reward.

Using a logistic regression model, the paper estimates the dependence of decisions on the stimulus, and on trial history. In doing so, it demonstrates that mouse decisions after a block switch was primarily accounted for by stimulus history (which differed between the block types) rather than the experience of errors on the stimulus condition positioned between the boundary values.

The paper demonstrates that chemogenetic inactivation in M2 impairs choice behaviour during the switching period but not during the stable period. By applying the behavioural model, the paper finds that M2 inactivation during the switch period reduces the behavioural dependence on stimulus history (for non-reversing stimuli), suggesting that M2 plays a causal role in stimulus-action remapping based on stimulus history. Interestingly, the paper shows that M2 doesn't seem to play a role during stable stimulus-choice trials, and it shows that the effect on switching trials is specific to M2 and not nearby frontal regions such as mPFC or OFC. The paper also includes results from electrophysiological recording of M2 during the task.

Overall, the behavioural experiment and the inactivation results are very interesting. Nevertheless, the electrophysiological results are hard to understand, and seem to add little to the paper. The conclusion of the paper, that M2 plays a role in flexible stimulus-choice association based on stimulus history is novel. However we have several questions and concerns:

Major concerns:

Many of the conclusions hinge on the model quality. However, there is no indication anywhere that the behavioural model is actually fitting the behavioural data well. Only comparisons between different models are presented. It is necessary and useful to visualise the model fits using psychometric curves.

The stable period is conceptualised as a period when the decision criterion is stable. Yet the model shows that the DC is affected by stimulus history and lose-shift effects (Figure 2). Thus, the stable period is not so stable by these parameters. Given this, and the fact that blocks are short (60 trials), fitting the models separately on the stable and switch period might be problematic. This is particularly the case because the paper is then performing separate model comparison for the stable and the switching period trials. As such, a better approach might be to fit models on all trials, select the best model accordingly, and fit this best model separately on different sections of the data, if necessary. Or add a new parameter to the model that can indicate stable vs. switch epochs, and fit the model once using all trials.

Related to separate model fitting and in the case of inactivation data, why not fit a model to the CNO and Saline data together, and estimate a δ parameter which estimates how much the α/β/γ parameters are changed by inactivation?

The paper relies on fitting the model separately to saline and CNO sessions, to identify specific parameters that are affected by inactivation. But the model itself could be under-constrained, meaning the parameter estimates are not stable. It would be useful to simulate data with known parameter changes, and then see if it is possible to recover those parameter changes from the model based on the number of trials that were obtained.

The comparison with the RL model: it appears that the RL model performs as well as the best regression model in the stable period but not in the switching period. What was the learning rate of the fitted RL model in the switching period compared to the stable period? Were there any constraints on the learning rate when fitting? More generally, since the paper is considering a learning situation, the comparison with the RL model seems important and should be explained further. The class of RL model tested here can be reformulated to be analogous to the regression with a dynamic decision criterion (prediction error-mediated changes in Q values can be adjusted to be analogous to changes in decision criterion). As such, it is unclear how these two models are testing competing hypotheses.

The model only allows stimulus history effects after trials of non-reversing stimuli. Surely the mouse would be adjusting the DC for stimulus history even if that stimulus was the reversing stimulus. How do the results change if these stimuli are considered?

The data shows that M2 inactivation does not affect the correct rate for non-reversing stimulus. This is surprising and interesting given many of the studies the paper cites do find robust behavioural effect of M2 inactivation across stimulus conditions (Goard et al., 2016 visual detection, Guo et al., 2015 whisker detection). In these studies, the mice are presumably in a “stable” condition regarding stimulus-choice association. Why do you think there's this discrepancy? Does this relate to using chemogenetic (here) vs. trial by trial optogenetic used in those studies?

It would be necessary and informative to see example of psychometric curves or learning curves in the inactivation condition vs. control, rather than only relying on model fits.

The results from electrophysiology experiments are cryptic and hard to follow. It might be easier and more convincing to illustrate example neurons before introducing the other analyses. For instance, Figure 5—figure supplement 2 is an interesting result and should probably be the first one mentioned for the ephys analysis. Overall, the electrophysiological experiments do not seem to add to the paper, and it might be best to be removed from the paper.

Related to electrophysiological data, it is hard to understand the need to use three different analysis methods: regression, ANOVA, and ROC analysis, each doing slightly different things.

There are several instances of statistical tests not correcting for multiple comparisons. For instance, in Figure 4D, the effect of M2 inactivation on the percent correct for reversing stimuli seems to be statistically significant primarily due to data from 4 mice. Does this effect stay after correcting for multiple comparisons?

Reviewer #2:

This paper by Tian-Yi Wang and colleagues describes a series of experiments to study the role of mouse M2 in adaptive action selection. The strength of this paper is the rigor. The experiments were based on a well-designed task involving flexible stimulus categorization (that have been pioneered in rodents by Zador, Jaramillo, et al.). The authors also did a great job putting together a computational model that provides considerable insights into the mouse's behavioral strategy. This led to an intriguing behavioral conclusion that mice are doing the task by using sensory history but not reward-based learning.

In terms of the neural conclusion that M2 is involved in adaptive sensory-motor selection, there are a few other studies now suggesting that M2 is involved in driving sensory cue-guided choices following a switch in contingencies. Nevertheless, there is still substantial value here because the study is excellent and provides arguably the strongest evidence to date. There is also additional conceptual novelty in looking at region differences, comparing M2 with mPFC and OFC.

The manuscript is well-written, and very easy to follow and understand.

Overall, the study is technically sound and conceptually important.

Major comments:

– The neural activity analysis, correlating the ROC selectivity value for previous stimulus preference (non-reversing stimulus trial) and current left-right choice preference (reversing stimulus trial) (Figure 5D), is taken as evidence that M2 neurons use sensory history to influence the current choice. The analyses were done for a particular time window of a trial. What happens if this analysis is applied to a sliding window running from the last trial through the current or even the next trial? When does this sensory-choice coupling emerge and when does it end? This is different from the decoding analysis in Figure 6, because it speaks to the interaction rather than decoding of choice or stimulus alone.

– Again, because Figure 5D is important – currently this analysis was done for cases when current trial was the reversing stimulus and the prior trial was the non-reversing stimulus. What about for other trial conditions? Do we still see the correlation in the sensory and motor related neural signals? In particular, what about the case when the current trial was the reversing stimulus and the prior trial was also a reversing stimulus?

– The comparison between M2 and mPFC and OFC is important. The results were presented as Figure 4—figure supplement 5 for mPFC and figure supplement 6 for OFC. I feel that these are exciting results demonstrating regional differences. At least some parts of each should be moved to be a main figure.

Reviewer #3:

Wang, Liu and Yao study the role of M2 in mice during a visual categorization task. Mice were trained to obtain water reward on left vs. right depending upon the spatial frequency of a visual grating with a variable decision boundary. Through modeling of decision criteria, chemogenetic inactivation and electrophysiological recordings, the authors conclude that M2 contributes to flexible stimulus-action coupling.

I think the behavior is well-designed and mice seem to perform well. I also like the quantitative modeling of the behavior.

1) I find the overall effect of the DREADD inactivation of M2 on behavior to be small. It is not obvious to me that DREADD inactivation is being applied in a useful way here. Given that there is no cell-type-specific manipulation, it would probably have been simpler and better to use pharmacological inactivation (e.g. muscimol). This would likely give a complete inactivation of M2 rather than the reduction to ~30% activity currently shown in Figure 4B. Perhaps larger effects upon behavior might have been observed. The small effect size reported for M2, also means that the negative effects for mPFC and OFC inactivation are less impressive, although it is very good that the authors carried out these further experiments.

2) The electrophysiological data are summarized in Figure 5 as correlations, but the overall description of the data is rather limited. I think the authors could give a more extended analysis of spiking activity across trial time, including showing example neurons. I imagine that similar effects might be found in multiple other brain regions, if they were recorded.

3) I am somewhat concerned by the choice of stimuli presented to the mice. I read that the type of visual stimulus depends upon the boundary frequency. For example: "For the low boundary block, gratings at 0.03 and 0.095 cycles/° were presented for 90% of trials, gratings at the other frequencies were presented for 10% of trials, and the boundary frequency was between 0.065 and 0.095 cycles/°. For the high-boundary block, gratings at 0.095 and 0.3 cycles/° were presented for 90% of trials, and the boundary frequency was between 0.095 and 0.139 cycles/°." I think the statistics of presented stimuli will change perceptual thresholds. Why not use the same stimulus set throughout? This would seem to be fairer.

https://doi.org/10.7554/eLife.54474.sa1

Author response

Essential points:

1) The effect of chemogenetic inactivation is rather small, and the results and the data analysis do not appear to be very robust. Given that DREADD likely results in partial inactivation, it is difficult to interpret negative results for mPFC and OFC. Although the reviewers commend that these experiments were done, the results need to be interpreted more carefully, and tests using more complete inactivation (e.g. muscimol) would be preferable.

We thank the reviewers for the suggestion. We agree with the reviewers that muscimol inactivation of M2 might produce a larger effect. However, for the reasons explained below, we chose to use the DREADD inactivation method.

In our experiment, most mice made fewer than 7 switches per session (3.94 ± 0.83 (mean ± SD) switches/session in CNO sessions and 4.07 ± 0.69 switches/session in saline sessions for the M2 inactivation experiment), and we measured 8.7 ± 0.95 (mean ± SD) saline sessions and 8.7 ± 0.95 CNO sessions for each mouse to obtain a sufficient number of trials for the psychometric curve and modelling analyses. Pharmacological inactivation would have required micropipette injections of muscimol/saline for about 18 sessions per mouse, and such a large number of micropipette penetrations might itself damage brain tissue. Therefore, instead of the pharmacological method, we used the DREADD inactivation method, in which CNO or saline was injected intraperitoneally.

We have added the number of CNO and saline sessions in the Materials and methods of the revised manuscript. We also mentioned in the Discussion that “it should be noted that our chemogenetic manipulation did not produce a complete inactivation of neuronal activity and thus the negative effect of mPFC/OFC inactivation may need to be further confirmed using experiments with more complete inactivation”.

2) The authors use model-based analysis and conclude that the animal's choices are guided by stimulus history but not by reward history. Although this is a very important effort, the reviewers identified several issues that need to be addressed.

2a) The manuscript emphasizes changes in the decision boundary (task contingency) but the model analysis indicated that the animals were not reacting to reward history but stimulus history. It seems that this is mainly due to an unusual choice of stimuli (a majority of stimuli were chosen from right next to the decision boundary) used for each block, concurrently with shifting decision boundary. This unusual choice of stimuli might have masked the effect of reward history in behavior or data analysis. This task design needs to be explained more clearly in the Results section and preferably some figures representing it. Furthermore, the motivation of this task design, as opposed to shifting decision boundary without changing the stimulus statistics, needs to be explained.

We have explained the task and stimuli more clearly in the revised manuscript. Our visual task is similar to the auditory flexible categorization task described in a previous study (Jaramillo et al., 2014). For the visual system, the perceived size (or SF) of a visual object changes with viewing distance, and categorization of a visual stimulus as low or high SF may be adaptive to the change of viewing distance. Consistent with such change, the stimulus statistics differed between the low-boundary and high-boundary blocks in our task. This has been added in the revised manuscript.

2b) The validity of model-based analysis depends on whether the model was able to fit the data reasonably well in the first place. Please provide the evidence (quantification and visualization) of goodness of fit.

In the revised manuscript, we have provided visualization of model simulation and performed parameter recovery analysis.

2c) The authors conclude that the animals' choices were not affected by reward history based on the observation that the model that depends on stimulus history fit the data better than a reinforcement learning (RL) model. The reviewers thought that it is impossible to make such a conclusion just by a comparison with a particular RL model. The authors need to explore more thoroughly what alternative RL models may fit the data well. The current RL model that the authors used computes action values for left versus right choices without considering stimuli. A simple possibility is an RL model that computes action values specifically for each stimulus (corresponding to "states" in RL).

We have used an RL model in which sensory stimuli were used in the computation of action values (Lak et al., 2019). In this RL model, the expected value of the left (right) choice is the learned value of the left (right) choice weighted by the current sensory stimulus, with the parameters α and γ representing the learning rate and stimulus weight, respectively.

Since stimuli at adjacent SFs were similar and some stimuli were presented with low probability, it was natural for us to assume a form in which action values are weighted by the current sensory stimulus, rather than to compute action values separately for each stimulus.

Combining the result of model comparison and the histogram of the number of mice best fit by each model (7 variants of the dynamic-DC model and the RL model), we found that model 4 of the dynamic-DC model (α1 = 0 and α2 = 0) was the winning model (Figure 3 and Figure 3—figure supplement 1).

3) The reviewers thought that the electrophysiological recording data are not thoroughly analyzed nor presented in an informative way. The reviewers make various suggestions to improve (see below). One possibility is to remove this part altogether (Reviewer 1) but we would like to see more informative presentations and insightful analyses of the electrophysiology data.

We have removed the regression and ANOVA analyses of the electrophysiological recording data. However, we believe that the analysis of choice preference and previous-stimulus preference is important, as we found that the preference for left-right choice in the current trial was intimately related to the preference for the stimulus in the last trial, consistent with the finding that sensory history was important for correct action selection in the task. We have also provided more informative presentations of M2 activity and an additional sliding-window analysis of the correlation between choice preference and previous-stimulus preference.

More detailed explanations of the above points from individual reviewers are included in the following. The manuscript will be re-evaluated based on your responses to these concerns and suggestions.

Reviewer #1:

[…]

Major concerns:

Many of the conclusions hinge on the model quality. However, there is no indication anywhere that the behavioural model is actually fitting the behavioural data well. Only comparisons between different models are presented. It is necessary and useful to visualise the model fits using psychometric curves.

We have provided visualization of the simulated behavior for our models and performed parameter recovery analysis, which are described in the responses below.

The stable period is conceptualised as a period when the decision criterion is stable. Yet the model shows that the DC is affected by stimulus history and lose-shift effects (Figure 2). Thus, the stable period is not so stable by these parameters. Given this, and the fact that blocks are short (60 trials), fitting the models separately on the stable and switch period might be problematic. This is particularly the case because the paper is then performing separate model comparison for the stable and the switching period trials. As such, a better approach might be to fit models on all trials, select the best model accordingly, and fit this best model separately on different sections of the data, if necessary. Or add a new parameter to the model that can indicate stable vs. switch epochs, and fit the model once using all trials.

We thank the reviewer for raising this point. In addition to fitting the models separately to choices in the stable and switch periods (Figure 2), we have used each of the models to fit choices in all trials (Figure 3—figure supplement 1). Combining the result of model comparison and the histogram of the number of mice best fit by each model, we found that model 4 of the dynamic-DC model (α1 = 0 and α2 = 0) remained the winning model (Figure 3—figure supplement 1).

We also visualized the simulated behavior for model 1 and model 4, using the parameters of the model fitted to data in all trials. As shown by Figure 3—figure supplement 2, both models could capture the dynamic change in performance for the reversing stimulus after the boundary switch. The number of trials to reverse choice estimated from model 1 simulation tended to be larger than that estimated from the actual performance, whereas that estimated from model 4 simulation matched well with the actual data (Figure 3—figure supplement 2).

Related to separate model fitting and in the case of inactivation data, why not fit a model to the CNO and Saline data together, and estimate a δ parameter which estimates how much the α/β/γ parameters are changed by inactivation?

As described above, we fitted each of the models to choices in all trials and performed model comparison analysis. We found that model 4 of the dynamic-DC model (α1 = 0 and α2 = 0) was the winning model (Figure 3—figure supplement 1).

We also performed parameter recovery analysis to check that our fitting procedure can accurately estimate parameters. We simulated model 1 and model 4, respectively, using the parameters of the model fitted to data in all trials for the 10 mice in Figure 1. For both models, the recovered parameters matched the original parameters (Figure 3—figure supplement 4). We also simulated model 1 or model 4 using the parameters of the model fitted separately to choices in the stable and switching periods. This was performed using two sets of parameters: one was the model parameters for the 10 mice in Figure 1, another was a wider range of parameters (see Materials and methods in the revised manuscript). For both cases, there was good agreement between the recovered and original parameters (Figure 3—figure supplement 5 and Figure 3—figure supplement 6).

The above analysis established that both model 1 and model 4 could recover the original parameters from simulated data; we thus preferred to fit the model separately to CNO and saline data and compare the model parameters between the two conditions.
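The logic of a parameter recovery analysis of this kind can be sketched generically: simulate choices with known parameters, refit the model to the synthetic data, and check that the estimates cluster around the generating values. The function names and the simulate/fit interface below are illustrative assumptions, not the paper's code.

```python
import numpy as np

def parameter_recovery(simulate, fit, true_params, n_runs=20, seed=0):
    """Generic parameter-recovery check (sketch, not the paper's code).

    simulate(params, rng) -> synthetic choice data
    fit(data)             -> estimated parameter vector
    Recovery is considered good if the estimates cluster
    around the generating (true) parameter values.
    """
    rng = np.random.default_rng(seed)
    # Repeatedly simulate and refit, stacking the recovered estimates
    recovered = np.array([fit(simulate(true_params, rng))
                          for _ in range(n_runs)])
    # Compare the mean recovered estimates against the generating parameters
    return recovered.mean(axis=0), np.asarray(true_params)
```

With a model whose fit is unbiased, the two returned vectors should agree closely; in the manuscript this comparison is shown in Figure 3—figure supplements 4–6.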

The paper relies on fitting the model separately to saline and CNO sessions, to identify specific parameters that are affected by inactivation. But the model itself could be under-constrained, meaning the parameter estimates are not stable. It would be useful to simulate data with known parameter changes, and then see if it is possible to recover those parameter changes from the model based on the number of trials that were obtained.

We thank the reviewer for this suggestion. Using the parameter recovery analysis, we found that the recovered sensory history parameter γ2 in the switching period was significantly lower in CNO than in saline sessions, whereas the recovered parameter γ2 in the stable period was not significantly different between CNO and saline sessions, consistent with the actual parameter changes (Figure 4—figure supplement 5).

The comparison with the RL model: it appears that the RL model performs as good as the best regression model in the stable period but not in the switching period. What was the learning rate of the fitted RL in the switching period compared to stable period? Was there any constraints on learning rate when fitting? More generally, since the paper is considering a learning situation, the comparison with the RL model seems important and should be explained further. The class of RL model tested here can be reformulated to be analogous to the regression with dynamic decision criterion (prediction error-mediated changes in Q values can be adjusted to be analogous to changes in decision criterion). As such, it is unclear how these two models are testing competing hypotheses.

We thank the reviewer for raising this point. We agree with the reviewer that prediction error-mediated changes in Q values are somewhat analogous to changes in decision criterion. However, the contributions of outcome history and sensory history can be separately analyzed in the dynamic-DC model, whereas the two factors are less separable in the RL model. We have revised our statement to: “we used the RL model to test an alternative hypothesis that the mouse might be updating the value functions of left and right choices separately, rather than comparing the current stimulus to one single decision boundary”.

In the RL model, the expected value of left (right) choice is the learned value of left (right) choice weighted by current sensory stimulus (Lak et al., 2019). The expected values of left and right choices (Ql and Qr) are mapped into the mice’s choice through a softmax function:

Pr(t) = exp(Qr(t)) / (exp(Ql(t)) + exp(Qr(t))),

where Pr represents the probability of right choice. For each trial t, Ql and Qr are calculated as:

Ql(t) = γ × (1 − S(t)) × Vl(t),
Qr(t) = γ × S(t) × Vr(t),

where Vl and Vr are the value functions for left and right choices, respectively, and γ is the weight for current stimulus S(t). The SF of the stimulus was normalized between 0 and 1, in which 0 and 1 correspond to the lowest and highest SFs, respectively. The model updates the value functions according to the following rules:

Vl(t) = Vl(t−1) + α × (1 − Ql(t−1)) for a rewarded left choice,

Vr(t) = Vr(t−1) + α × (1 − Qr(t−1)) for a rewarded right choice,

Vl(t) = Vl(t−1) + α × (0 − Ql(t−1)) for an unrewarded left choice,

Vr(t) = Vr(t−1) + α × (0 − Qr(t−1)) for an unrewarded right choice,

where α is the learning rate, constrained to 0 < α < 1, and (1 − Ql), (1 − Qr), (0 − Ql) or (0 − Qr) represents the reward prediction error.
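The update rules above can be sketched in code. This is a minimal single-trial illustration under stated assumptions (e.g. that the right port is rewarded when the normalized SF exceeds 0.5, which ignores the block-dependent boundary); all names are ours, not from the paper's code.

```python
import numpy as np

def rl_model_step(S, Vl, Vr, alpha, gamma, rng=None):
    """One trial of the stimulus-weighted RL model (illustrative sketch).

    S      : current stimulus SF, normalized to [0, 1]
    Vl, Vr : value functions for left/right choices
    alpha  : learning rate (0 < alpha < 1), gamma : stimulus weight
    """
    if rng is None:
        rng = np.random.default_rng()

    # Expected values: learned values weighted by the current stimulus
    Ql = gamma * (1.0 - S) * Vl
    Qr = gamma * S * Vr

    # Softmax probability of choosing the right port
    p_right = np.exp(Qr) / (np.exp(Ql) + np.exp(Qr))
    choice = 'right' if rng.random() < p_right else 'left'

    # Assumed reward rule (not the paper's block-dependent boundary):
    # right is correct when S > 0.5
    rewarded = (choice == 'right') == (S > 0.5)

    # Update only the chosen side's value with the reward prediction error
    outcome = 1.0 if rewarded else 0.0
    if choice == 'left':
        Vl = Vl + alpha * (outcome - Ql)
    else:
        Vr = Vr + alpha * (outcome - Qr)
    return choice, rewarded, Vl, Vr
```

Iterating this step over a session, with the boundary switching between blocks, reproduces the kind of value-driven choice adjustment that the RL model was fitted to capture.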

For the RL model fitted to choices in the stable and switching periods separately, the learning rate (α) in the switching period was significantly higher than that in the stable period (p = 0.002, Wilcoxon signed rank test, Figure 3).

We also used each of the models (the RL model and 7 variants of the dynamic-DC model) to fit choices in all trials (Figure 3—figure supplement 1). We found that the CV likelihood of model 1 or model 4 was both significantly higher than that of the RL model (Figure 3—figure supplement 1E and F). Combining the result of model comparison and the histogram of the number of mice best fit by each model, we found that model 4 of the dynamic-DC model (α1 = 0 and α2 = 0) was the winning model (Figure 3—figure supplement 1D−G).

The model only allows stimulus history effects after trials of non-reversing stimuli. Surely the mouse would be adjusting the DC for stimulus history even if that stimulus was the reversing stimulus. How do the results change if considering these stimuli?

In our model, after a trial with a non-reversing stimulus, DC was updated as follows: DC(t) = (1 − β) × DC(t−1) + γ2 × S(t−1), where γ2 is the weight for the stimulus in the previous trial. S was normalized between −1 and 1, in which negative and positive values indicate lower and higher SFs, respectively, and 0 indicates the reversing stimulus. When the previous trial was the reversing stimulus, S(t−1) becomes 0 and DC is updated according to: DC(t) = (1 − β) × DC(t−1). This has been clarified in the revised manuscript.
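The criterion update can be written compactly. The snippet below is a minimal sketch with illustrative names, showing that a reversing-stimulus trial (s_prev = 0) reduces the update to pure decay of the criterion.

```python
def update_dc(dc_prev, s_prev, beta, gamma2):
    """Decision-criterion update of the dynamic-DC model (sketch).

    dc_prev : previous decision criterion DC(t-1)
    s_prev  : previous stimulus, normalized to [-1, 1];
              0 denotes the reversing stimulus
    beta    : leak on the previous criterion
    gamma2  : sensory-history weight
    Names are illustrative, not from the paper's code.
    """
    # When s_prev is the reversing stimulus (0), the sensory-history term
    # vanishes and the criterion simply decays: DC(t) = (1 - beta) * DC(t-1)
    return (1.0 - beta) * dc_prev + gamma2 * s_prev
```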

The data shows that M2 inactivation does not affect the correct rate for non-reversing stimulus. This is surprising and interesting given many of the studies the paper cites do find robust behavioural effect of M2 inactivation across stimulus conditions (Goard et al., 2016 visual detection, Guo et al., 2015 whisker detection). In these studies, the mice are presumably in a “stable” condition regarding stimulus-choice association. Why do you think there's this discrepancy? Does this relate to using chemogenetic (here) vs. trial by trial optogenetic used in those studies?

A recent study found that bilateral muscimol inactivation of M2 slowed the transition from action-guided to sound-guided response, without affecting the high performance of sound-guided trials after the transition (Siniscalchi et al., 2016). Consistent with this study, we found that bilateral inactivation of M2 slowed the adaptive adjustment of action selection following the boundary switch, without affecting choice behavior in steady state.

In our study, the visual stimulus was presented until the mouse chose one of the two side ports, whereas the tasks in Goard et al., 2016 and Guo et al., 2014 both involved a short-term memory component. In the study of Guo et al., 2014, unilateral optogenetic inactivation of ALM caused an ipsilateral bias, with a stronger effect of inactivation during the delay epoch than during the sample epoch. Erlich et al., 2011 found that unilateral inactivation of the rat FOF (equivalent to M2) generated a contralateral impairment, with a stronger effect in memory than in non-memory trials. As our task did not involve a memory component, our result that the performance for the non-reversing stimulus was not affected by M2 inactivation is consistent with the smaller effect of M2 inactivation on non-memory than on memory trials (Erlich et al., 2011; Goard et al., 2016; Guo et al., 2014).

It would be necessary and informative to see example of psychometric curves or learning curves in the inactivation condition vs. control, rather than only relying on model fits.

We have plotted psychometric curves and example performance curves for reversing stimulus in CNO vs. saline conditions (Figure 4—figure supplement 1).

The results from electrophysiology experiments are cryptic and hard to follow. It might be easier and more convincing to illustrate example neurons before introducing the other analyses. For instance, Figure 5—figure supplement 2 is an interesting result and should probably be the first one mentioned for the ephys analysis. Overall, the electrophysiological experiments do not seem to add to the paper, and it might be best to be removed from the paper.

Related to electrophysiological data, it is hard to understand the need to use three different analysis methods: regression, ANOVA, and ROC analysis, each doing slightly different things.

We have removed the regression and ANOVA analysis of the electrophysiological recording data. However, we believe that the analysis on the choice preference and the previous-stimulus preference is important, as we found that the preference for left-right choice in current trial was intimately related to the preference for stimulus in last trial, consistent with the finding that sensory history was important for correct action selection in the task. We also found that the representations of upcoming choice and sensory history in M2 were stronger during the switching than the stable period, which may account for the important role of M2 in adaptive action selection in the switching period.

To facilitate understanding of the analysis, we provided spike rasters and PSTHs of example M2 neurons in different choice conditions and in different sensory-history conditions (Figure 5A-D) before introducing the other analyses.

There are several instances of statistical tests not correcting for multiple comparisons. For instance, in Figure 4D, the effect of M2 inactivation on the percent correct for reversing stimuli seems to be statistically significant primarily due to data from 4 mice. Does this effect stay after correcting for multiple comparisons?

For the effect of M2 inactivation shown in Figure 4C and D, we have performed two-way ANOVA followed by Sidak’s multiple comparisons test, which showed that the reduction in performance for reversing stimulus was significant in the switching period (p = 0.022). This has been clarified in the revised manuscript.

Reviewer #2:

[…]

Major comments:

– The neural activity analysis, correlating ROC selectivity value for previous stimulus preference (non-reversing stimulus trial) and current left-right choice preference (reversing stimulus trial) (Figure 5D), is taken as evidence that M2 neurons use sensory history to influence current choice. The analyses were done for a particular time window of a trial. What happens if this analysis was applied to a sliding window starting from last trial to current or even next trial? When does this sensory-choice coupling emerge and when does it end? This is different from the decoding analysis in Figure 6, because it speaks to the interaction rather than decoding of choice or stimulus alone.

We thank the reviewer for this suggestion. We computed the correlation coefficient between the preference for left-right choice in current trial and the preference for stimulus in the last trial using a sliding window of 50 ms (Figure 5—figure supplement 6A and B). We found significant negative correlation for all time bins, including those time bins between central port entry and stimulus onset. For the correlation between the preference for left-right choice in current trial and the preference for stimulus in the trial before last (Figure 5—figure supplement 6C and D), the statistical significance of the correlation diminished in most time bins. Thus, the result suggests that the choice preference of M2 neurons in current trial is strongly influenced by stimulus history, and the preference for choice in current trial is coupled to the preference for stimulus in the last but not earlier trial.
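The ROC-based preference underlying this analysis can be sketched as follows; the rescaling to [−1, 1] and all names are our illustrative assumptions (the paper's exact convention may differ).

```python
import numpy as np

def roc_preference(rates_a, rates_b):
    """ROC-based preference index (sketch): area under the ROC comparing two
    sets of single-trial firing rates (e.g. left- vs right-choice trials,
    or previous-low-SF vs previous-high-SF trials), rescaled to [-1, 1].
    """
    rates_a = np.asarray(rates_a, dtype=float)
    rates_b = np.asarray(rates_b, dtype=float)
    # AUC via the Mann-Whitney U statistic: probability that a random draw
    # from rates_a exceeds one from rates_b, counting ties as half
    greater = (rates_a[:, None] > rates_b[None, :]).mean()
    ties = (rates_a[:, None] == rates_b[None, :]).mean()
    auc = greater + 0.5 * ties
    return 2.0 * auc - 1.0  # 0 means no preference between conditions
```

Computing this index in each 50 ms window, separately for current-trial choice and previous-trial stimulus, and then correlating the two indices across neurons, gives the sliding-window coupling analysis described above.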

– Again, because Figure 5D is important – currently this analysis was done for cases when current trial was the reversing stimulus and the prior trial was the non-reversing stimulus. What about for other trial conditions? Do we still see the correlation in the sensory and motor related neural signals? In particular, what about the case when the current trial was the reversing stimulus and the prior trial was also a reversing stimulus?

In Figure 5D (now Figure 5L in the revised manuscript), we analyzed the relationship between the preference for choice in the current reversing-stimulus trial and the preference for the stimulus in the previous trial. To compute the preference for the previous stimulus, we divided the responses in the current trial into two groups according to the SF in the previous trial (previous low-frequency vs. previous high-frequency). When the previous trial was also the reversing stimulus, we were not able to divide the responses into two groups to compute the previous-stimulus preference.

For sensory-choice coupling in other trial conditions, we may also consider the case in which the current trial was a non-reversing stimulus. Because the non-reversing stimulus at the lowest SF and the reversing stimulus were presented in 90% of trials in the low-boundary block, and the non-reversing stimulus at the highest SF and the reversing stimulus were presented in 90% of trials in the high-boundary block, we could consider analyzing the responses to the non-reversing stimulus at the lowest or the highest SF. In such a case, however, we were not able to divide the responses in the current trial into two groups according to the SF in the previous trial, because all previous trials would have an SF higher (or lower) than that in the current trial.

Therefore, we analyzed sensory-choice coupling only when the current trial was the reversing stimulus and the previous trial was the non-reversing stimulus.

– The comparison between M2 and mPFC and OFC is important. The results were presented as Figure 4—figure supplement 5 for mPFC and figure supplement 6 for OFC. I feel that these are exciting results demonstrating regional differences. At least some parts of each should be moved to be a main figure.

As reviewer 3 pointed out that “the small effect size reported for M2 means that the negative effects for mPFC and OFC inactivation are less impressive”, we thus chose to keep the results of mPFC and OFC inactivation in figure supplements.

To address the concern of reviewer 3, we also mentioned in the Discussion that “it should be noted that our chemogenetic manipulation did not produce a complete inactivation of neuronal activity and thus the negative effect of mPFC/OFC inactivation may need to be further confirmed using experiments with more complete inactivation”.

Reviewer #3:

Wang, Liu and Yao study the role of M2 in mice during a visual categorization task. Mice were trained to obtain water reward on left vs. right depending upon the spatial frequency of a visual grating with a variable decision boundary. Through modeling of decision criteria, chemogenetic inactivation and electrophysiological recordings, the authors conclude that M2 contributes to flexible stimulus-action coupling.

I think the behavior is well-designed and mice seem to perform well. I also like the quantitative modeling of the behavior.

1) I find the overall effect of the DREADD inactivation of M2 on behavior to be small. It is not obvious to me that DREADD inactivation is being applied in a useful way here. Given that there is no cell-type-specific manipulation, it would probably have been simpler and better to use pharmacological inactivation (e.g. muscimol). This would likely give a complete inactivation of M2 rather than the reduction to ~30% activity currently shown in Figure 4B. Perhaps larger effects upon behavior might have been observed. The small effect size reported for M2, also means that the negative effects for mPFC and OFC inactivation are less impressive, although it is very good that the authors carried out these further experiments.

We thank the reviewer for pointing out this issue. We agree with the reviewer that muscimol inactivation of M2 might produce a larger effect. However, for the reasons explained below, we chose to use the DREADD inactivation method.

In our experiment, most mice made fewer than 7 switches per session (3.94 ± 0.83 (mean ± SD) switches/session in CNO sessions and 4.07 ± 0.69 switches/session in saline sessions for the M2 inactivation experiment), and we measured 8.7 ± 0.95 (mean ± SD) saline sessions and 8.7 ± 0.95 CNO sessions for each mouse to obtain a sufficient number of trials for the psychometric curve and modelling analyses. Pharmacological inactivation would have required micropipette injections of muscimol/saline for about 18 sessions per mouse, and such a large number of micropipette penetrations might itself damage brain tissue. Therefore, instead of the pharmacological method, we used the DREADD inactivation method, in which CNO or saline was injected intraperitoneally.

We have added the number of CNO and saline sessions in the Materials and methods of the revised manuscript. We have mentioned in the Discussion the limitation of DREADD inactivation: “it should be noted that our chemogenetic manipulation did not produce a complete inactivation of neuronal activity and thus the negative effect of mPFC/OFC inactivation may need to be further confirmed using experiments with more complete inactivation”.

2) The electrophysiological data are summarized in Figure 5 as correlations, but the overall description of the data is rather limited. I think the authors could give a more extended analysis of spiking activity across trial time, including showing example neurons. I imagine that similar effects might be found in multiple other brain regions, if they were recorded.

We thank the reviewer for this suggestion. We have added spike rasters and PSTHs of example M2 neurons, with the responses grouped by choice or by previous stimulus (Figure 5A-D).

We also performed a sliding window analysis of correlation between the preference for left-right choice in current trial and the preference for stimulus in last trial (or the trial before last) (Figure 5—figure supplement 6).

As our electrophysiological recordings were performed only in M2, we do not know whether similar effects would be found in other brain regions.

3) I am somewhat concerned by the choice of stimuli presented to the mice. I read that the type of visual stimulus depends upon the boundary frequency. For example: "For the low boundary block, gratings at 0.03 and 0.095 cycles/° were presented for 90% of trials, gratings at the other frequencies were presented for 10% of trials, and the boundary frequency was between 0.065 and 0.095 cycles/°. For the high-boundary block, gratings at 0.095 and 0.3 cycles/° were presented for 90% of trials, and the boundary frequency was between 0.095 and 0.139 cycles/°." I think the statistics of presented stimuli will change perceptual thresholds. Why not use the same stimulus set throughout? This would seem to be fairer.

Our visual task is similar to the auditory flexible categorization task described in a previous study (Jaramillo et al., 2014). For the visual system, the perceived size (or SF) of a visual object changes with viewing distance, and categorization of a visual stimulus as low or high SF may be adaptive to the change of viewing distance. Consistent with such change, the stimulus statistics differed between the low-boundary and high-boundary blocks in our task.

We found that mice depended on sensory history to correctly change stimulus-action association when categorical boundary switched between blocks. As the switch in categorization boundary in our task was accompanied by a change in stimulus statistics, this may promote a strategy that involves sensory history.

We have clarified the above issue in the revised manuscript.

https://doi.org/10.7554/eLife.54474.sa2

Article and author information

Author details

  1. Tian-Yi Wang

    1. Institute of Neuroscience, State Key Laboratory of Neuroscience, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, China
    2. University of Chinese Academy of Sciences, Beijing, China
    Contribution
    Conceptualization, Formal analysis, Investigation, Visualization, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-6488-339X
  2. Jing Liu

    1. Institute of Neuroscience, State Key Laboratory of Neuroscience, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, China
    2. University of Chinese Academy of Sciences, Beijing, China
    Contribution
    Investigation
    Competing interests
    No competing interests declared
  3. Haishan Yao

    1. Institute of Neuroscience, State Key Laboratory of Neuroscience, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, China
    2. Shanghai Center for Brain Science and Brain-Inspired Intelligence Technology, Shanghai, China
    Contribution
    Conceptualization, Supervision, Funding acquisition, Writing - original draft, Project administration, Writing - review and editing
    For correspondence
    haishanyao@ion.ac.cn
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0003-4974-9197

Funding

Chinese Academy of Sciences (XDB32010200)

  • Haishan Yao

Shanghai Municipal Science and Technology Commission (2018SHZDZX05)

  • Haishan Yao

National Natural Science Foundation of China (31571079)

  • Haishan Yao

National Natural Science Foundation of China (31771151)

  • Haishan Yao

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Ning-long Xu and Tianming Yang for advice on the modelling of behavior and data analysis. We thank Dechen Liu for discussion on electrode implantation, Taorong Xie for the initial version of Matlab scripts of data acquisition and Yaping Li for technical assistance. This work was supported by the Strategic Priority Research Program of Chinese Academy of Sciences (grant No. XDB32010200), Shanghai Municipal Science and Technology Major Project (grant No. 2018SHZDZX05) and the National Natural Science Foundation of China (31571079, 31771151).

Ethics

Animal experimentation: Animal use procedures were approved by the Animal Care and Use Committee at the Institute of Neuroscience, Chinese Academy of Sciences (approval number NA-013-2019), and were in accordance with the guidelines of the Animal Advisory Committee at the Shanghai Institutes for Biological Sciences.

Senior Editor

  1. Timothy E Behrens, University of Oxford, United Kingdom

Reviewing Editor

  1. Naoshige Uchida, Harvard University, United States

Reviewers

  1. Alex C Kwan, Yale University, United States
  2. Carl CH Petersen, École Polytechnique Fédérale de Lausanne (EPFL), Switzerland

Version history

  1. Received: December 16, 2019
  2. Accepted: June 24, 2020
  3. Accepted Manuscript published: June 24, 2020 (version 1)
  4. Version of Record published: July 8, 2020 (version 2)

Copyright

© 2020, Wang et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.


Cite this article

  1. Tian-Yi Wang
  2. Jing Liu
  3. Haishan Yao
(2020)
Control of adaptive action selection by secondary motor cortex during flexible visual categorization
eLife 9:e54474.
https://doi.org/10.7554/eLife.54474
