Change point estimation by the mouse medial frontal cortex during probabilistic reward learning

  1. Department of Psychiatry, Yale University School of Medicine, New Haven, USA
  2. Interdepartmental Neuroscience Program, Yale University School of Medicine, New Haven, USA
  3. Department of Neuroscience, Yale University School of Medicine, New Haven, USA
  4. Meinig School of Biomedical Engineering, Cornell University, Ithaca, USA
  5. Department of Psychiatry, Weill Cornell Medicine, New York, USA
  6. Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, UK

Peer review process

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, public reviews, and a provisional response from the authors.


Editors

  • Reviewing Editor
    Jeffrey Erlich
    Sainsbury Wellcome Centre, London, United Kingdom
  • Senior Editor
    Michael Frank
    Brown University, Providence, United States of America

Reviewer #1 (Public review):

Summary:

In this manuscript, the authors train mice on a two-armed bandit task, in which the reward value associated with the arms suddenly switches in a pseudorandom fashion. Their first finding is that the mice are able to anticipate the reward value switch points after long blocks, evident both prior to the switch point with higher rates of switching to the less-rewarded arm, and after the switch point with faster transition to the more-rewarded arm. They next find that unilateral ACAd/MO lesion / optogenetic silencing (surprisingly) causes greater anticipation of reward switch points, both prior to and after the switch point. They use behavioral modeling to argue that the unilateral ACAd/MO lesion effects are due to an increase in the contralateral hazard rate. Finally, they found that bilateral lesions did not have any effect on the hazard rate, suggesting that the unilateral lesion effect is due to balancing between hemispheres. This manuscript employed a clever behavioral design and analysis approach, though the effects were somewhat difficult to interpret and the authors' interpretation relies heavily on the accuracy of their underlying behavioral model.

Strengths:

This paper employs a well-designed task that allows the researchers to detect whether mice have noticed a change in reward value both before and after the change takes place. The use of unilateral and bilateral inactivation experiments allowed the authors to test the role of the ACAd/MO region in change point estimation. They found that unilateral inactivation, but not bilateral inactivation, had a significant effect on behavior. They performed sophisticated behavioral analysis to determine how ACAd/MO perturbations affect decision-making variables. This topic is of interest to the field, and the results are presented clearly and are generally convincing.

Weaknesses:

The observed effects of the lesions are somewhat counterintuitive, with lesions appearing to affect persistence within a block more than change point detection itself: the mice actually adjusted more quickly to changes in reward values. Moreover, they had no issue detecting change points after bilateral inactivation. As a result, I'm not sure if the main framing of the article (including the title) is supported by their findings. Finally, I was unsure how the differences between unilateral and bilateral inactivation could be explained by their behavioral model.

Reviewer #2 (Public review):

Summary:

The manuscript by Murphy et al. titled "Change point estimation by the mouse medial prefrontal cortex during probabilistic reward learning" investigated the role of the mPFC in the exploitation of task structure. Previous work had shown that monkeys and humans exploit predictable task structures (e.g., switching rapidly when heavily trained on a reversal learning task), but whether this was also the case for mice was not known. To test this, Murphy et al. trained head-fixed mice on a two-armed bandit task in which the contingencies reversed when mice met a performance criterion (10 trials choosing the better option) plus an additional random number of trials (referred to as Lrandom). They found that as the length of Lrandom increased, mice began to exhibit pre-emptive switching in their choices, as if they were expecting and/or anticipating the reversal to occur. They report that unilateral lesions of the mPFC (ACC + MO) led to earlier pre-emptive switching (although I found this part of the manuscript the most challenging to understand) and faster post-reversal switching, which they argue reflects an impairment in the proper estimation of the reversal. They also report that this requires inter-hemispheric coordination, because bilateral lesions did not further impair this estimation. Optogenetic inhibition just prior to the mouse making a choice recapitulated some of the behavioral metrics observed in the mPFC-lesioned animals. Finally, the authors developed a novel hybrid belief-choice kernel model to provide a computational approach to quantifying these behavioral differences.

Strengths:

The paper is extremely well written and was an absolute pleasure to read. The results are novel and provide exciting (although not surprising) evidence that mice exploit task structures to earn rewards. Moreover, the experiments were well-designed and included appropriate controls and/or control conditions that support their findings.

Weaknesses:

Some of the results need to be clarified and/or language changed to ensure that readers will understand. Restricting analyses to expert mice that show the predicted effect is problematic.

Reviewer #3 (Public review):

Summary:

The authors examine the role of the medial frontal cortex of mice in exploiting statistical structure in tasks. They claim that mice are "proactive": they predict upcoming changes, rather than responding in a "model-free" way to environmental changes. Further, they speculate that the estimation of future change (i.e., prediction of upcoming events, based on learning temporal regularities) might be "a main ... function of dorsal medial frontal cortex (dmFC)." Unfortunately, the current manuscript contains flaws such that the evidence supporting these claims is inadequate.

Strengths:

Understanding the neural mechanisms by which we learn about statistical structure in the world is an important goal. The authors developed an interesting task and used model-based techniques to try to understand the mechanisms by which perturbation of dmFC influenced behavior. They demonstrate that lesions and optogenetic silencing of dmFC influence behavior, showing that this region has a causal influence on the task.

Weaknesses:

I was concerned that the main behavioral effects shown in Figure 1F were a statistical artifact. By requiring the Geometric block length to be preceded by a performance-based block, the authors introduce a dependence that can generate the phenomena they describe as anticipation.

To demonstrate this, I simulated their task with an agent that does not have any anticipation of the change point (Reviewer image 1). The agent repeats the previous action with probability `p(repeat)` (similar to the choice kernel in the authors' models). If the agent doesn't repeat, then the next choice depends on the previous outcome. If the previous choice was rewarded, it stays with probability `P(WS)` and chooses randomly with probability `1-P(WS)`. If the previous choice was unrewarded, it switches with probability `P(LS)` and chooses randomly with probability `1-P(LS)`.
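The no-anticipation agent just described can be sketched as follows (a minimal reimplementation of the reviewer's stated logic; the function signature and parameter names are illustrative, not taken from the reviewer's actual simulation code):

```python
import random

def simulate_agent(n_trials, reward_prob, p_repeat=0.85, p_ws=0.85, p_ls=0.85, seed=0):
    """Win-stay/lose-switch agent with a repetition (choice-kernel-like) tendency.

    reward_prob: callable mapping (trial, choice) -> probability of reward,
    so any block structure can be plugged in by the caller.
    """
    rng = random.Random(seed)
    choices, rewards = [], []
    prev_choice, prev_reward = rng.randrange(2), False
    for t in range(n_trials):
        if rng.random() < p_repeat:
            # repeat the previous action with probability p_repeat
            choice = prev_choice
        elif prev_reward:
            # previous choice rewarded: win-stay with p_ws, else choose randomly
            choice = prev_choice if rng.random() < p_ws else rng.randrange(2)
        else:
            # previous choice unrewarded: lose-switch with p_ls, else choose randomly
            choice = 1 - prev_choice if rng.random() < p_ls else rng.randrange(2)
        reward = rng.random() < reward_prob(t, choice)
        choices.append(choice)
        rewards.append(reward)
        prev_choice, prev_reward = choice, reward
    return choices, rewards
```

Running such an agent through the task's performance-plus-Geometric block rule, versus a fixed-length-plus-Geometric rule, is what would separate a genuine anticipation effect from the conditioning artifact described above.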

Reviewer image 1.

An agent with `P(WS)=P(LS)=P(repeat)=0.85` shows the same phenomena as the mice: a difference in performance before the block switch and "earlier" crossing of the midpoint after the switch. https://imgdrop.io/image/aHn6y. The phenomena go away in the simulations when a fixed block length of 20 trials is followed by a Geometric block length.

The authors did not rely solely on the phenomena of Figure 1F for their conclusions. They did a model comparison to provide evidence that animals are anticipating the switch. Unfortunately, the authors did not use state-of-the-art methods in this section of the paper. In particular, they failed to show that, under a range of generative parameters for each model class, the model selection process chooses the correct model class (i.e., a confusion matrix). As a more minor point, they used BIC instead of a more robust cross-validated metric for model selection. Finally, instead of comparing their "best" anticipating model to their 2nd-best model (without anticipation), they compared their best to their 4th-best (Supp Fig 3.5). This seems misleading.
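The model-recovery exercise the reviewer asks for can be sketched generically. The two toy model classes below (a repeat-tendency model and an i.i.d. side-bias model) are hypothetical stand-ins, not the manuscript's actual models; the point is the procedure: simulate data from each candidate under sampled generative parameters, fit every candidate by maximum likelihood, and tally which one BIC selects.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_stay(n, p_stay):
    # toy model A: repeat the previous choice with probability p_stay
    c = [rng.integers(2)]
    for _ in range(n - 1):
        c.append(c[-1] if rng.random() < p_stay else 1 - c[-1])
    return np.array(c)

def simulate_biased(n, p_right):
    # toy model B: i.i.d. choices with a fixed side bias
    return (rng.random(n) < p_right).astype(int)

def bic(ll, k, n):
    return -2.0 * ll + k * np.log(n)

def fit_stay(c):
    # MLE of p_stay, scored on trials 2..n (1 free parameter)
    rep = (c[1:] == c[:-1]).astype(float)
    p = np.clip(rep.mean(), 1e-6, 1 - 1e-6)
    ll = np.sum(rep * np.log(p) + (1 - rep) * np.log(1 - p))
    return bic(ll, 1, len(rep))

def fit_biased(c):
    # MLE of p_right, also scored on trials 2..n so both models see the same data
    x = c[1:].astype(float)
    p = np.clip(x.mean(), 1e-6, 1 - 1e-6)
    ll = np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))
    return bic(ll, 1, len(x))

def model_recovery(n_sims=50, n_trials=1000):
    # rows: generative model, columns: model selected by BIC
    conf = np.zeros((2, 2), dtype=int)
    for _ in range(n_sims):
        for gen, data in enumerate([simulate_stay(n_trials, rng.uniform(0.7, 0.95)),
                                    simulate_biased(n_trials, rng.uniform(0.15, 0.35))]):
            best = int(np.argmin([fit_stay(data), fit_biased(data)]))
            conf[gen, best] += 1
    return conf
```

A well-behaved selection procedure concentrates counts on the diagonal of the resulting matrix; off-diagonal mass flags parameter regimes where the candidate models are not identifiable from data of this size.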

Given all of the above issues, it is hard to critically evaluate the model-based analysis of the effects of lesions/optogenetics.

Author response:

We appreciate the reviewers' thoughtful and constructive comments. In this provisional response, we aim to address what we see as the key critiques, with a detailed, point-by-point reply to be provided alongside the revised manuscript. Below, we outline how we intend to address these critiques in the revised manuscript.

(1) We will revise sections of the manuscript to ensure that all results, particularly those concerning the effects of lesions, are described more clearly and with sufficient context. This includes providing additional visualizations and rewording any ambiguous statements.

(2) In this study, we examined a subset of 7,396 blocks in which animals quickly adapted after block switches (reaching LCriterion in 20 or fewer trials), thereby focusing on expert-level performance and avoiding periods that might be affected by low motivation. It is valid to ask whether the same observations would hold if the full dataset were analyzed. To address this, we have added a supplementary figure (Supplementary Figure 1.1) that illustrates the same relationships based on block length (BL) instead of LRandom, both with and without the restriction on LCriterion (n = 9,156 blocks in which the block length is under 100 trials, without any LCriterion restriction), and based on LRandom without any LCriterion restriction as well as with a less stringent restriction (LCriterion ≤ 50 trials). This approach allowed us to include all trials in our dataset. We observed similar effects of block length on choice behavior around switches (Figure 3), confirming the consistency of our findings across these analytical conditions.

(3) We agree that robust validation of model selection is crucial. To address this, we will generate a confusion matrix to assess whether our model selection process accurately identifies the correct model class across a range of generative parameters. We will also include additional model-selection metrics, such as cross-validation, to complement the BIC analysis and provide a more robust comparison of models.
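As a generic illustration of the cross-validation approach, held-out sessions can be scored with parameters fit only on the remaining sessions. The one-parameter bias model below is a hypothetical stand-in for the full hybrid belief-choice kernel model; only the leave-one-session-out scaffolding is the point.

```python
import numpy as np

def fit_bias(choices):
    # MLE of a single side-bias parameter (stand-in for full model fitting)
    return float(np.clip(np.mean(choices), 1e-6, 1 - 1e-6))

def loglik(p, choices):
    # Bernoulli log-likelihood of a choice sequence under bias p
    c = np.asarray(choices, dtype=float)
    return float(np.sum(c * np.log(p) + (1 - c) * np.log(1 - p)))

def session_cv_loglik(sessions):
    # leave-one-session-out: fit on all other sessions, score the held-out one
    total = 0.0
    for i, held_out in enumerate(sessions):
        train = np.concatenate([s for j, s in enumerate(sessions) if j != i])
        total += loglik(fit_bias(train), held_out)
    return total
```

Unlike BIC, the summed held-out log-likelihood penalizes flexibility implicitly, so models of different complexity can be compared on the same scale.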

(4) We acknowledge the concern regarding our comparison of the "best" and the "4th best" models. The "4th best" model was chosen because it is the most widely recognized in the literature. Our intention was to demonstrate the performance of the most commonly used model, but we understand how this may have been misleading. To address this, we will revise our comparison to focus on the "best" and the "2nd best" models, ensuring greater clarity in the manuscript. Additionally, we will include supplementary simulation results and figures to provide a more comprehensive analysis of the models.

  1. Howard Hughes Medical Institute
  2. Wellcome Trust
  3. Max-Planck-Gesellschaft
  4. Knut and Alice Wallenberg Foundation