Neural mechanisms of credit assignment for delayed outcomes during contingent learning

  1. Center for Mind and Brain, University of California Davis, Davis, USA
  2. Department of Psychology, University of California Davis, Davis, USA
  3. National Institute on Drug Abuse Intramural Research Program, National Institutes of Health, Baltimore, USA
  4. Max Planck University College London Centre for Computational Psychiatry and Ageing Research, University College London, London, UK
  5. Google DeepMind, London, UK
  6. Faculty of Human Sciences, Julius-Maximilians-Universität Würzburg, Würzburg, Germany
  7. Wellcome Centre for Human Neuroimaging, University College London, London, UK
  8. Wellcome Centre for Integrative Neuroimaging, University of Oxford, John Radcliffe Hospital, Oxford, UK
  9. Sainsbury Wellcome Centre for Neural Circuits and Behaviour, University College London, London, UK

Peer review process

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.

Editors

  • Reviewing Editor
    Xiaosi Gu
    Icahn School of Medicine at Mount Sinai, New York, United States of America
  • Senior Editor
    Michael Frank
    Brown University, Providence, United States of America

Reviewer #1 (Public review):

Summary:

The authors studied one of the fundamental research topics in neuroscience: the neural mechanisms of credit assignment. Building on the original studies of Walton and colleagues and subsequent work on the same topic, the authors extended the research to the delayed credit assignment problem with a clever task design that compares non-delayed (direct) and delayed (indirect) credit assignment processes. Their primary goal was to elucidate the neural basis of these processes in humans, advancing our understanding beyond previous studies.

Strengths:

(1) Innovative task design distinguishing between direct and indirect credit assignment.

(2) Use of sophisticated multivariate pattern analysis to identify neural correlates of pending representations.

(3) Well-executed study with clear presentation of results.

(4) Extension of previous research to human subjects, providing valuable comparative insights.

Considerations for Future Research:

(1) The task design, while clear and effective, might be further developed to capture more real-world complexity in credit assignment.

(2) There's potential for deeper exploration of the role of task structure understanding in credit assignment processes.

(3) The interpretation of lateral orbitofrontal cortex (lOFC) involvement could be expanded to consider its role in both credit assignment and task structure representation.

Achievement of Aims and Support of Conclusions:

The authors successfully achieved their aim of investigating direct and indirect credit assignment processes in humans. Their results provide valuable insights into the neural representations involved in these processes. The study's conclusions are generally well-supported by the data, particularly in identifying neural correlates of pending representations crucial for delayed credit assignment.

Impact on the Field and Utility of Methods:

This study makes a significant contribution to the field of credit assignment research by bridging animal and human studies. The methods, particularly the multivariate pattern analysis approach, provide a robust template for future investigations in this area. The data generated offers valuable insights for researchers comparing human and animal models of credit assignment, as well as those studying the neural basis of decision-making and learning.

The study's focus on the lOFC and its role in credit assignment adds to our understanding of this brain region's function.

Additional Context and Future Directions:

(1) Temporal ambiguity in credit assignment: While the current design provides clear task conditions, future studies could explore more ambiguous scenarios to further reflect real-world complexity.

(2) Role of task structure understanding: The difference in task comprehension between human subjects in this study and animal subjects in previous studies offers an interesting point of comparison.

(3) The authors used a sophisticated multivariate pattern analysis to find the neural correlate of the pending representation of the previous choice, which is used for credit assignment in later trials. The authors tend to describe these representations as being maintained throughout the intervening period. However, the analysis period is specifically the feedback period, which is irrelevant to the credit assignment of the immediately preceding choice. Events in this task period can interfere with the ongoing credit assignment process. Thus, rather than reflecting passive maintenance of information about the previous choice, activity in this specific period may reflect an active process of protecting that information from interfering, irrelevant information. It would be great if the authors could comment on this important interpretational issue.

(4) Broader neural involvement: While the focus on specific regions of interest (ROIs) provided clear results, future studies could benefit from a whole-brain analysis approach to provide a more comprehensive understanding of the neural networks involved in credit assignment.

Reviewer #2 (Public review):

Summary:

The present manuscript addresses a longstanding challenge in neuroscience: how the brain assigns credit for delayed outcomes, especially in real-world learning scenarios where decisions and outcomes are separated by time. The authors focus on the lateral orbitofrontal cortex and hippocampus, key regions involved in contingent learning. By integrating fMRI data and behavioral tasks, the authors examined how neural circuits maintain a causal link between past decisions and delayed outcomes. Their findings offer insights into mechanisms that could have critical implications for understanding human decision-making.

Strengths:

(1) The experimental designs were extremely well thought-out. The authors successfully coupled behavioral data and neural measures (through fMRI) to explore the neural mechanisms of contingent learning. This integration adds robustness to the findings and strengthens their relevance.

(2) The emphasis on the interaction between the lateral orbitofrontal cortex (lOFC) and hippocampus (HC) in this study is very well-targeted. The reported findings regarding their dynamic interactions provide valuable insights into contingent learning in humans.

(3) The use of an advanced modeling framework and analytical techniques allowed the authors to uncover new mechanistic insights into a complex case of the decision-making process. The methods developed will also benefit analyses of future neuroimaging data on a range of decision-making tasks.

Weaknesses:

Given the limited temporal resolution of fMRI, and given that the measured signal is an indirect measure of neural activity, the extent to which the reported causality reflects true interactions between neurons in different regions is unclear.

Reviewer #3 (Public review):

The authors apply multivoxel decoding analyses to fMRI data acquired during reward feedback to decode the previously chosen cues that led to that feedback. They compare two versions of the task: one in which feedback is provided about the current trial, and one in which feedback is provided about the previous trial. Reward probability changes slowly over time, so subjects need to identify which cues are leading to reward at a given time. They find evidence for recall of the cue in the lateral orbitofrontal cortex (lOFC) and hippocampus (HC). They also find that in the second condition, where feedback is for the one-back trial, this representation is mediated by the lateral frontal pole (FPl).

Overall, the analyses are clean and elegant and seem to be complete. I have only a few comments.

(1) They find (not surprisingly) that the one-back task is harder. It would be good to rule out the possibility that the direct HC and lOFC effects were harder to detect in the one-back task simply because that task is harder and therefore produces more learning failures. (I suspect their explanation that it is mediated by FPl is likely to be correct. But it would be nice to do some subsampling of the zero-back task [matched to the success rate of the one-back task] to ensure that they still see the direct HC and lOFC effects there.)
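
To make the suggested control concrete, the subsampling step might be sketched as follows. This is only an illustrative sketch, not the authors' pipeline: the function name `subsample_to_match_rate`, the per-trial boolean `correct` vector, and the definition of a "successful" trial are all assumptions introduced here.

```python
import random

def subsample_to_match_rate(correct, target_rate, seed=0):
    """Return sorted trial indices whose success rate approximates target_rate.

    `correct` is a hypothetical list of booleans, one per zero-back trial.
    As many trials as possible are kept; only the over-represented
    outcome class is randomly thinned.
    """
    if not correct:
        raise ValueError("no trials supplied")
    if not 0 < target_rate <= 1:
        raise ValueError("target_rate must be in (0, 1]")
    rng = random.Random(seed)
    hits = [i for i, c in enumerate(correct) if c]
    misses = [i for i, c in enumerate(correct) if not c]
    current_rate = len(hits) / len(correct)
    if current_rate > target_rate:
        # Too many successes: keep every miss and thin the hits.
        # Solve n_hits / (n_hits + n_misses) = target_rate for n_hits.
        n_hits = round(target_rate * len(misses) / (1 - target_rate))
        keep = rng.sample(hits, min(n_hits, len(hits))) + misses
    else:
        # Too few successes: keep every hit and thin the misses.
        n_misses = round(len(hits) * (1 - target_rate) / target_rate)
        keep = hits + rng.sample(misses, min(n_misses, len(misses)))
    return sorted(keep)

# Example: 80% success on zero-back trials, matched down to 60%.
zero_back_correct = [True] * 80 + [False] * 20
kept = subsample_to_match_rate(zero_back_correct, target_rate=0.6)
```

The decoding analysis would then be rerun on the retained zero-back trials only, ideally over many random seeds, to check whether the direct HC and lOFC effects survive at the matched success rate.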

(2) The evidence that they present in the main text (Figure 3) that the HC and lOFC are mediated by FPl is a correlation. I found the evidence presented in Supplemental Figure 7 to be much more convincing. As I understand it, what they are showing in SF7 is that when FPl decodes the cue, then (and only then) HC and lOFC decode the cue. If my understanding is correct, then this is a much cleaner explanation for what is going on than the secondary correlation analysis. If my understanding here is incorrect, then they should provide a better explanation of what is going on so as to not confuse the reader.
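
Under the reading of Supplemental Figure 7 given above, the comparison amounts to splitting decoding accuracy in one region by whether another region decoded the cue on the same trial. A minimal sketch of that split follows. It reflects this reviewer-style interpretation only and is not the authors' actual analysis; the function name and the per-trial boolean inputs are hypothetical.

```python
def conditional_accuracy(gate_correct, target_correct):
    """Decoding accuracy in a target region, split by a gating region.

    gate_correct[i]   : True if the gating region (e.g. FPl) decoded the cue on trial i.
    target_correct[i] : True if the target region (e.g. HC or lOFC) did.
    Returns (accuracy when gate decoded, accuracy when gate did not).
    """
    if len(gate_correct) != len(target_correct):
        raise ValueError("inputs must have one entry per trial")
    when_gate = [t for g, t in zip(gate_correct, target_correct) if g]
    when_not = [t for g, t in zip(gate_correct, target_correct) if not g]

    def mean(xs):
        # NaN signals that a split contained no trials.
        return sum(xs) / len(xs) if xs else float("nan")

    return mean(when_gate), mean(when_not)

# Toy example with four trials.
fpl = [True, True, False, False]
hc = [True, True, True, False]
acc_with_fpl, acc_without_fpl = conditional_accuracy(fpl, hc)
```

If the reading is right, accuracy in the first split should exceed chance while accuracy in the second should not. Each split would still need to be tested against an appropriate chance level.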

(3) I like the idea of "credit spreading" across trials (Figure 1E). I think that credit spreading in each direction (into the past [lower left] and into the future [upper right]) is not equivalent. This can be seen in Figure 1D, where the two tasks show credit spreading differently. I think a lot more could be studied here. Does credit spreading in each of these directions decode in interesting ways in different places in the brain?

  1. Howard Hughes Medical Institute
  2. Wellcome Trust
  3. Max-Planck-Gesellschaft
  4. Knut and Alice Wallenberg Foundation