The Periaqueductal Gray Selectively Supports Reversal Learning During a Flexible Discrimination Task in Mice

  1. Department of Neuroscience, Rappaport Faculty of Medicine, Technion – Israel Institute of Technology, Haifa, Israel
  2. Department of Neuroscience, Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, United States
  3. Department of Psychology, New York University, New York, United States

Peer review process

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.


Editors

  • Reviewing Editor
    Shelly Flagel
    University of Michigan, Ann Arbor, United States of America
  • Senior Editor
    Kate Wassum
    University of California, Los Angeles, Los Angeles, United States of America

Reviewer #1 (Public review):

Summary:

The authors aimed to determine the neural networks involved in updating behaviour by training mice on a 'go / no go' odour discrimination task, and measuring their brain activity using functional MRI.

Strengths:

The use of the translationally relevant 'go / no go' task is a major strength, as this is a task that can be used as readily in humans as in animals such as mice. The use of fMRI in awake, behaving mice is also a major strength, as this allows the activation of multiple brain regions to be measured while behaviour is ongoing, and also facilitates comparison to human studies. The computational modelling approaches further support these translational aims, again being as readily applied to human data as to animal data.

Weaknesses:

The major weakness of the paper, and one that is potentially addressable, is that its key conclusion, that the periaqueductal gray (PAG) is recruited for reversal learning, is only partially supported by the data as presented. The authors analysed the behavioural data using 'signal detection theory', classifying trials as correct 'go' responses ('hits'), correct 'no go' responses ('correct rejections'), missed 'go' responses ('misses'), and 'go' responses made when the mouse should have withheld a response ('false alarms'). The reported double dissociation, with the nucleus accumbens activated for 'hits' but not 'correct rejections' and the PAG activated for 'correct rejections' but not 'hits', is very interesting. However, it is confounded with the behavioural response itself: the nucleus accumbens may activate whenever the animal makes a response, and the PAG whenever the animal withholds one. If the authors also analysed nucleus accumbens and PAG activation for 'misses' and 'false alarms', they could determine whether the activation of these regions reflects the behavioural response or the expectation of reinforcement from the response.
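To make the confound concrete, the four outcome classes can be written as a 2 x 2 crossing of trial type and response. The sketch below is illustrative only and is not taken from the manuscript: the function names, the log-linear correction, and the use of d' as the sensitivity index are assumptions.

```python
from statistics import NormalDist


def classify_trial(is_go_trial: bool, licked: bool) -> str:
    """Map trial type and response onto the four signal-detection outcomes."""
    if is_go_trial:
        return "hit" if licked else "miss"
    return "false_alarm" if licked else "correct_rejection"


def d_prime(counts: dict) -> float:
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    A log-linear correction (add 0.5 to each count, 1 to each total) is
    assumed here so that rates of exactly 0 or 1 stay finite.
    """
    n_go = counts["hit"] + counts["miss"]
    n_nogo = counts["false_alarm"] + counts["correct_rejection"]
    hit_rate = (counts["hit"] + 0.5) / (n_go + 1)
    fa_rate = (counts["false_alarm"] + 0.5) / (n_nogo + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)
```

In this layout, 'hits' and 'false alarms' share the response (a lick) but differ in trial type, while 'correct rejections' and 'misses' share the withheld response. Comparing activation within each of these pairs is what would separate a response-related signal from a reinforcement-expectation signal.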

Thus, the paper includes very interesting data and is impressive in its approach to analysing behaviour in a manner that is highly translatable between species. The additional analyses would markedly strengthen the paper and would add depth to the finding that the PAG appears to be involved in behavioural flexibility.

Reviewer #2 (Public review):

Summary:

In this manuscript, the authors test the hypothesis that whole-brain functional magnetic resonance imaging in behaving mice, coupled with reinforcement-learning modeling, can dissociate neural substrates of initial cue-reward acquisition versus contingency reversal, and potentially reveal underappreciated contributors to cognitive flexibility. Using a head-fixed go/no-go odor discrimination task with subsequent rule reversal in a subset of mice, they model trial-by-trial state-action values with a model-free Q-learning algorithm (hierarchical Bayesian fit) and use the model-derived decision variable as a parametric regressor in whole-brain analyses. They report that acquisition-related signals prominently involve ventral and dorsal striatal regions, whereas reversal learning additionally recruits the periaqueductal gray (negative correlation with the decision variable) and shows an apparent double dissociation between nucleus accumbens and periaqueductal gray responses for hit versus correct-rejection outcomes during reversal.
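For readers unfamiliar with the modelling approach, a minimal sketch of a model-free Q-learning update and a trial-by-trial decision variable follows. It is an assumption-laden illustration, not the authors' model: the delta-rule form, the definition of the decision variable as Q(go) minus Q(no-go), the logistic choice rule, and the parameter values `alpha` and `beta` are all hypothetical, and the hierarchical Bayesian fitting step is omitted.

```python
import math


def simulate_q_values(trials, alpha=0.2):
    """Run a delta-rule Q-learning update over (odor, action, reward) trials.

    Returns the per-trial decision variable, assumed here to be
    Q(odor, "go") - Q(odor, "nogo") evaluated before the update,
    together with the final Q-table.
    """
    q = {}
    decision_variables = []
    for odor, action, reward in trials:
        q_go = q.get((odor, "go"), 0.0)
        q_nogo = q.get((odor, "nogo"), 0.0)
        decision_variables.append(q_go - q_nogo)
        # Move the chosen action's value toward the obtained reward.
        old = q.get((odor, action), 0.0)
        q[(odor, action)] = old + alpha * (reward - old)
    return decision_variables, q


def p_go(decision_variable, beta=3.0):
    """Logistic (two-action softmax) probability of a 'go' response."""
    return 1.0 / (1.0 + math.exp(-beta * decision_variable))
```

Under these assumptions, the list returned by `simulate_q_values` plays the role of the parametric regressor: one value per trial, which would then be convolved with a hemodynamic response function in the imaging model.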

Strengths:

(1) The reversal manipulation is implemented without explicit punishment, targeting suppression of previously rewarded actions under reward omission - an underexplored regime for midbrain contributions beyond canonical threat/pain framing.

(2) The manuscript provides a credible MR-compatible olfactory/licking platform with synchronized sniff/lick/valve/reward timing and high-field imaging, supporting feasibility and broader utility for mesoscale systems neuroscience in rodents.

(3) Trial-by-trial value estimates from a Q-learning variant are fit via hierarchical Bayesian inference and explicitly integrated into subject-level general linear models with a mouse hemodynamic response function, which is appropriate for leveraging within-subject dynamics in small-N rodent fMRI.

(4) The decision-variable maps during acquisition recover expected basal ganglia involvement (including nucleus accumbens and dorsal striatum), providing face validity; the reversal-stage map yields an interpretable set of cortical/striatal/pallidal regions plus periaqueductal gray/hippocampus.

(5) The finite impulse response analysis stratified by behavioral outcomes (hit, false alarm, correct rejection, miss) adds interpretability beyond the model regressor alone, and the reported crossover interaction between nucleus accumbens and periaqueductal gray is potentially impactful if robust.

Weaknesses:

(1) The core claim regarding selective periaqueductal gray engagement rests on a subset of n = 6 mice for reversal. With permutation-based whole-brain inference and very small cluster sizes, the robustness of the periaqueductal gray effect to reasonable analytic perturbations is not yet convincing. I would suggest providing leave-one-animal-out analyses for the periaqueductal gray cluster/ROI effects and reporting how often the key findings survive.

(2) The authors note that due to temporal resolution and hemodynamics, they cannot separate stimulus, choice, and feedback and therefore model "whole trials." This limitation creates ambiguity about whether periaqueductal gray signals reflect value updating, action inhibition (no-lick), reward omission, autonomic arousal, or motor preparation/withholding, especially given the strong hit versus correct-rejection opponency. I would suggest adding targeted analyses that disambiguate "withholding" from "reversal-related updating".

(3) ROIs are defined from the whole-brain decision-variable maps and then interrogated by outcome types; the manuscript acknowledges non-independence. This can inflate apparent dissociations. It would be better if the authors defined ROIs independently (anatomical periaqueductal gray/nucleus accumbens masks, or split-half ROI definition with held-out data) and confirmed that the key ROI conclusions still hold.

(4) The reversal group is a subset of the acquisition cohort and also experiences a different task phase structure and additional sessions; the paper attempts to address exposure differences descriptively. I would suggest that the authors formally test whether periaqueductal gray effects are explained by session count, time-in-scanner, or learning rate differences (e.g., include these as covariates, or match sessions more strictly).

(5) The platform records sniffing and licking, but the imaging models described include motion, global, and ventricle regressors and do not clearly include trialwise lick/sniff covariates. Given the periaqueductal gray's known autonomic and defensive coordination roles, physiological state confounding is a major concern. Could the authors incorporate sniff and lick metrics (and their derivatives) as nuisance regressors and show whether the periaqueductal gray effects persist?
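The leave-one-animal-out check suggested in point (1) could be sketched as follows. This is a hypothetical illustration: the per-animal effect values in the usage note are made-up placeholders, not data from the manuscript, and the exact sign-flip test on ROI-level effects stands in for whatever permutation scheme the authors actually used.

```python
from itertools import product
from statistics import mean


def sign_flip_p(values):
    """Two-sided exact sign-flip permutation p-value for a mean effect."""
    observed = abs(mean(values))
    as_extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(values)):
        total += 1
        flipped = [s * v for s, v in zip(signs, values)]
        # Small tolerance guards against floating-point ties.
        if abs(mean(flipped)) >= observed - 1e-12:
            as_extreme += 1
    return as_extreme / total


def leave_one_animal_out(effects):
    """Recompute the group mean and p-value with each animal held out."""
    results = {}
    for held_out in effects:
        rest = [v for name, v in effects.items() if name != held_out]
        results[held_out] = (mean(rest), sign_flip_p(rest))
    return results
```

For example, calling `leave_one_animal_out({"m1": -0.4, "m2": -0.3, "m3": -0.5, "m4": -0.2, "m5": -0.6, "m6": -0.35})` with these placeholder values returns one (mean, p) pair per held-out mouse. Note that with five remaining animals the two-sided exact sign-flip p-value cannot fall below 2/32 = 0.0625, so the stability of the effect's sign and magnitude across folds is more informative than any significance threshold.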
