Peer review process
Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, public reviews, and a provisional response from the authors.
Editors
- Reviewing Editor: Jeffrey Erlich, Sainsbury Wellcome Centre, London, United Kingdom
- Senior Editor: Michael Frank, Brown University, Providence, United States of America
Reviewer #1 (Public review):
Summary:
In this manuscript, the authors train mice on a two-armed bandit task in which the reward values associated with the arms suddenly switch in a pseudorandom fashion. Their first finding is that the mice are able to anticipate the reward value switch points after long blocks. This is evident both prior to the switch point, with higher rates of switching to the less-rewarded arm, and after the switch point, with a faster transition to the more-rewarded arm. They next find that unilateral ACAd/MO lesion or optogenetic silencing (surprisingly) causes greater anticipation of reward switch points, both prior to and after the switch point. They use behavioral modeling to argue that the unilateral ACAd/MO lesion effects are due to an increase in the contralateral hazard rate. Finally, they found that bilateral lesions did not have any effect on the hazard rate, suggesting that the unilateral lesion effect is due to balancing between hemispheres. This manuscript employed a clever behavioral design and analysis approach, though the effects were somewhat difficult to interpret and the authors' interpretation relies heavily on the accuracy of their underlying behavioral model.
Strengths:
This paper employs a well-designed task that allows the researchers to detect whether mice have noticed a change in reward value both before and after the change takes place. The use of unilateral and bilateral inactivation experiments allowed the authors to test the role of the ACAd/MO region in change point estimation. They found that unilateral inactivation, but not bilateral inactivation, had a significant effect on behavior. They performed sophisticated behavioral analysis to determine how ACAd/MO perturbations affect decision-making variables. This topic is of interest to the field, and the results are presented clearly and are generally convincing.
Weaknesses:
The observed effects of the lesions are somewhat counterintuitive, with lesions appearing to affect persistence within a block more than change point detection itself: the mice actually adjusted more quickly to changes in reward values. Moreover, they had no issue detecting change points after bilateral inactivation. As a result, I'm not sure if the main framing of the article (including the title) is supported by their findings. Finally, I was unsure how the differences between unilateral and bilateral inactivation could be explained by their behavioral model.
Reviewer #2 (Public review):
Summary:
The manuscript by Murphy et al. titled "Change point estimation by the mouse medial prefrontal cortex during probabilistic reward learning" investigated the role of the mPFC in the exploitation of task structure. Previous work had shown that monkeys and humans exploit predictable task structures (e.g., switching rapidly when heavily trained on a reversal learning task), but whether this was also the case for mice was not known. To test this, Murphy et al. trained head-fixed mice on a two-armed bandit task in which the contingencies reversed when mice met a performance criterion (10 trials choosing the better option) plus an additional random number of trials (referred to as Lrandom). They found that as the length of Lrandom increased, mice began to exhibit pre-emptive switching in their choices, as if they were expecting and/or anticipating the reversal to occur. They report that unilateral lesions of the mPFC (ACC + MO) led to earlier pre-emptive switching (although I found this part of the manuscript the most challenging to understand) and faster post-reversal switching, which they argue reflects an impairment in the proper estimation of the reversal. They also report that this requires inter-hemispheric coordination because bilateral lesions did not further impair this estimation. Optogenetic inhibition just prior to the mouse making a choice recapitulated some of the behavioral metrics observed in the mPFC-lesioned animals. Finally, the authors developed a novel hybrid belief-choice kernel model to provide a computational approach to quantifying these behavioral differences.
Strengths:
The paper is extremely well written and was an absolute pleasure to read. The results are novel and provide exciting (although not surprising) evidence that mice exploit task structures to earn rewards. Moreover, the experiments were well-designed and included appropriate controls and/or control conditions that support their findings.
Weaknesses:
Some of the results need to be clarified and/or language changed to ensure that readers will understand. Restricting analyses to expert mice that show the predicted effect is problematic.
Reviewer #3 (Public review):
Summary:
The authors examine the role of the medial frontal cortex of mice in exploiting statistical structure in tasks. They claim that mice are "proactive": they predict upcoming changes, rather than responding in a "model-free" way to environmental changes. Further, they speculate that the estimation of future change (i.e., prediction of upcoming events, based on learning temporal regularities) might be "a main ... function of dorsal medial frontal cortex (dmFC)." Unfortunately, the current manuscript contains flaws such that the evidence supporting these claims is inadequate.
Strengths:
Understanding the neural mechanisms by which we learn about statistical structure in the world is an important goal. The authors developed an interesting task and used model-based techniques to try to understand the mechanisms by which perturbation of dmFC influenced behavior. They demonstrate that lesions and optogenetic silencing of dmFC influence behavior, showing that this region has a causal influence on the task.
Weaknesses:
I was concerned that the main behavioral effects shown in Figure 1F were a statistical artifact. By requiring the Geometric block length to be preceded by a performance-based block, the authors introduce a dependence that can generate the phenomena they describe as anticipation.
To demonstrate this, I simulated their task with an agent that does not have any anticipation of the change point (Review image 1). The agent repeats the previous action with probability `p(repeat)` (similar to the choice kernel in the authors' models). If the agent doesn't repeat, then the next choice depends on the previous outcome. If the previous choice was rewarded, it stays with probability `P(WS)` and chooses randomly with probability `1-P(WS)`. If the previous choice was unrewarded, it switches with probability `P(LS)` and chooses randomly with probability `1-P(LS)`.
Review image 1.
An agent with `P(WS)=P(LS)=P(repeat)=0.85` shows the same phenomena as the mice: a difference in performance before the block switch and "earlier" crossing of the midpoint after the switch. https://imgdrop.io/image/aHn6y. The phenomena go away in the simulations when a fixed block length of 20 trials is followed by a Geometric block length.
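The agent and block structure described above can be sketched in a few lines of Python. This is a minimal illustration and not the simulation behind Review image 1: the reward probabilities (`p_high`, `p_low`), the geometric parameter (`p_geom`), the pre-switch `window`, the cumulative reading of the 10-trial criterion, and all function names are assumptions introduced here.

```python
import random


def agent_choice(prev_choice, prev_reward, p_repeat, p_ws, p_ls, rng):
    """Pick arm 0 or 1 using only the previous choice and outcome."""
    if prev_choice is None:              # very first trial: no history yet
        return rng.randrange(2)
    if rng.random() < p_repeat:          # choice-kernel-like repetition
        return prev_choice
    if prev_reward:                      # win: stay, otherwise choose randomly
        return prev_choice if rng.random() < p_ws else rng.randrange(2)
    # loss: switch, otherwise choose randomly
    return 1 - prev_choice if rng.random() < p_ls else rng.randrange(2)


def simulate(n_blocks=2000, p_repeat=0.85, p_ws=0.85, p_ls=0.85,
             p_high=0.8, p_low=0.2, criterion=10, p_geom=0.1,
             window=5, seed=0):
    """Return one (l_random, pre_switch_accuracy) pair per block.

    Assumed block rule: the block ends l_random trials after the agent has
    chosen the better arm `criterion` times (counted cumulatively).
    """
    rng = random.Random(seed)
    better = 0
    prev_choice, prev_reward = None, False
    blocks = []
    for _ in range(n_blocks):
        # Geometric L_random: number of failures before the first success.
        l_random = 0
        while rng.random() >= p_geom:
            l_random += 1
        n_better, remaining, correct = 0, None, []
        while remaining != 0:
            choice = agent_choice(prev_choice, prev_reward,
                                  p_repeat, p_ws, p_ls, rng)
            reward = rng.random() < (p_high if choice == better else p_low)
            correct.append(choice == better)
            prev_choice, prev_reward = choice, reward
            if remaining is None:        # still in the performance phase
                n_better += choice == better
                if n_better >= criterion:
                    remaining = l_random
            else:                        # counting down the random phase
                remaining -= 1
        recent = correct[-window:]       # last trials before the switch
        blocks.append((l_random, sum(recent) / len(recent)))
        better = 1 - better              # reverse the contingencies
    return blocks


def accuracy_by_length(blocks, split=10):
    """Mean pre-switch accuracy for short (< split) and long L_random blocks."""
    short = [acc for l, acc in blocks if l < split]
    long_ = [acc for l, acc in blocks if l >= split]

    def mean(xs):
        return sum(xs) / len(xs) if xs else float("nan")

    return mean(short), mean(long_)


if __name__ == "__main__":
    short_acc, long_acc = accuracy_by_length(simulate())
    print(f"pre-switch accuracy, short blocks: {short_acc:.3f}")
    print(f"pre-switch accuracy, long blocks:  {long_acc:.3f}")
```

Comparing the two returned accuracies gives a rough check of whether pre-switch performance depends on `l_random` for an agent with no knowledge of block length. Replacing the performance phase with a fixed number of trials would be the corresponding control.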
The authors did not completely rely on the phenomena of Figure 1F for their conclusions. They did a model comparison to provide evidence that animals are anticipating the switch. Unfortunately, the authors did not use state-of-the-art methods in this section of the paper. In particular, they failed to show that, under a range of generative parameters for each model class, the model selection process chooses the correct model class (i.e., a confusion matrix). As a more minor point, they used BIC instead of a more robust cross-validated metric for model selection. Finally, instead of comparing their "best" anticipating model to their 2nd best model (without anticipation), they compared their best to their 4th best (Supp Fig 3.5). This seems misleading.
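For readers unfamiliar with the model-recovery check referred to above, the following toy sketch shows its general shape. It does not use the authors' models: the two model classes here are deliberately trivial stand-ins (a coin with fixed p = 0.5 and a coin with one free parameter), and every name and parameter value is an assumption chosen for illustration. Data are simulated from each class, both classes are fit, the class with the lower BIC is selected, and the selections are tabulated.

```python
import math
import random


def log_lik(k, n, p):
    """Bernoulli log-likelihood of k successes in n trials."""
    p = min(max(p, 1e-9), 1 - 1e-9)      # avoid log(0)
    return k * math.log(p) + (n - k) * math.log(1 - p)


def bic(ll, n_params, n_obs):
    """Bayesian information criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_obs) - 2.0 * ll


def select_model(k, n):
    """Fit both toy classes and return the name of the lower-BIC class."""
    bic_fixed = bic(log_lik(k, n, 0.5), 0, n)     # no free parameters
    bic_free = bic(log_lik(k, n, k / n), 1, n)    # maximum-likelihood p = k/n
    return "fixed" if bic_fixed <= bic_free else "free"


def confusion_matrix(n_sims=200, n_trials=200, p_free=0.9, seed=0):
    """counts[generating_class][selected_class] over n_sims datasets each."""
    rng = random.Random(seed)
    generators = {"fixed": 0.5, "free": p_free}
    counts = {g: {"fixed": 0, "free": 0} for g in generators}
    for name, p in generators.items():
        for _ in range(n_sims):
            k = sum(rng.random() < p for _ in range(n_trials))
            counts[name][select_model(k, n_trials)] += 1
    return counts


if __name__ == "__main__":
    for generating, row in confusion_matrix().items():
        print(generating, row)
```

A full check would repeat this over a range of generative parameters (here, several values of `p_free`) and would substitute the actual candidate models and fitting procedure. A selection procedure that recovers the generating class puts most of the counts on the diagonal.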
Given all of the above issues, it is hard to critically evaluate the model-based analysis of the effects of lesions/optogenetics.