A mechanistic theory of planning in prefrontal cortex

  1. Sainsbury Wellcome Centre, University College London, London, United Kingdom
  2. Oxford Centre for Integrative Neuroimaging, University of Oxford, Oxford, United Kingdom
  3. Department of Experimental Psychology, University of Oxford, Oxford, United Kingdom

Peer review process

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.


Editors

  • Reviewing Editor
    Srdjan Ostojic
    École Normale Supérieure - PSL, Paris, France
  • Senior Editor
    Albert Cardona
    University of Cambridge, Cambridge, United Kingdom

Reviewer #1 (Public review):

Summary:

This work builds a theory to implement planning trajectories towards a goal in a known environment, inspired by analyses of prefrontal neural recordings. Unlike standard neural architectures for this task, such as value-based learning and successor representations, their proposed theory is able to adapt to novel goal locations within a trial. The key to the theory is that future times are represented by orthogonal groups of neurons. The recurrent connectivity between groups of neurons selective to specific future times and locations reflects the learned knowledge of the task. Finally, the authors show that standard networks trained on the task approximate their proposed theory.

Strengths:

The structure of the work is clear, and the presentation of the results is very well written, which is particularly notable given the substantial number of results presented. The authors are able to link their theory with experimental findings in neural recordings. The reverse-engineering of trained recurrent neural networks is very thorough, by analyzing both dynamics and connectivity. The assumptions and predictions of their model are clearly stated.

Weaknesses:

It is unclear whether their proposed theory, "space-time attractors", is actually an attractor network. The authors used recurrent neural networks with very few timesteps, and single-neuron time constants that are long with respect to the task time scales. Attractor networks, such as the ones the authors cite, are networks that generate nontrivial patterns of activity through recurrent interactions over long periods of time.

The authors gloss over how the reward inputs are calculated. Computing these reward inputs should be part of the planning process, and the authors are implicitly leaving this problem aside. How does the reward input, which includes future time and location, depend on the actions that have not yet been taken by the agent? It feels like most of the planning computation is already provided by these reward inputs at the beginning of the trial. It could be that the network is only learning to process the planned sequence of actions present in the inputs.
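The reviewer's concern can be made concrete with a toy sketch. Everything below (the array shapes, the idea of tiling reward over (future-time, location) slots, the specific goal index) is an illustrative assumption, not the paper's actual input scheme:

```python
import numpy as np

# Hypothetical reward input defined over (future-time, location) populations.
n_states, horizon = 16, 5      # assumed grid size and planning horizon
goal = 7                       # assumed goal location for this trial

# Case 1: reward tagged only at the final future time step.
reward_input = np.zeros((horizon, n_states))
reward_input[-1, goal] = 1.0

# Case 2: reward tagged at every future time step at which the goal could
# be occupied -- this already resembles a partial plan skeleton, which is
# the crux of the reviewer's worry.
reward_all = np.zeros((horizon, n_states))
reward_all[:, goal] = 1.0
```

Either way, constructing such an input requires knowing which (time, location) populations to target before any action has been taken, which is the computation the review argues is left unexplained.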

Reviewer #2 (Public review):

This well-written manuscript proposes to use attractors in space and time (STA) as a mechanistic explanation for planning in the prefrontal cortex. The main conceptual hypothesis is that planning is implemented as attractor dynamics in a representation that encodes states at each time step jointly. Depending on inputs, the network relaxes to a trajectory that already contains future states that will be visited at each time step, rather than computing a scalar value at each point in time and space like other classical approaches from RL. The authors compare this approach to implementations such as TD learning and successor representation, and further show that trained recurrent neural networks on specific tasks involving planning develop structured subspaces resembling the ones postulated in STA.

The idea of treating attracting trajectories unfolding in time as the computational substrate for planning is very interesting and potentially important. The explicit construction of a state x time representational space and its implementation via recurrent dynamics are appealing and convincing in the idealized tasks considered. I found the manuscript to be refreshingly explicit regarding several of the assumptions and limitations of the models, for example, the fact that certain advantages can be viewed as properties of the state space itself and not necessarily of a fundamentally new planning mechanism.

Overall, the manuscript presents a cool attractor model that extends in time and explores its performance in a subset of illustrative tasks involving planning. My doubts mostly concern the interpretation and scope of the claims made in the manuscript. Here are a few comments where I detail my questions/concerns:

(1) The authors nicely discuss that much of the difference between STA and classical TD or SR agents is "in some sense a property of the state space rather than the decision making algorithm," and that TD and SR could in principle be implemented in a comparable space x time representation. This is fair, but it also suggests that the central contribution of the manuscript lies primarily in the representational factorization (state x time tiling) and its dynamical implementation via attractors, rather than in a fundamentally new planning algorithm or theory, mechanistic or not. I think theory should be distinguished from mechanism, and it would therefore help the reader to describe the conceptual advancement more as a novel mechanism or implementation than a novel (mechanistic) theory for decision/planning.

(2) Related to my previous point, I think it would be helpful to position STA more explicitly relative to the computational/theoretical literature in which attractor networks encode temporally ordered patterns (so effectively including future times). For example, classical extensions of Hopfield networks with asymmetric connectivity implement retrieval of sequences and ordered transitions between patterns (Sompolinsky & Kanter, 1986). More recently, sequential attractors and limit-cycle dynamics have been constructed in structured recurrent networks by the Morrison group (Parmelee et al., 2021). These works do not implement an explicit discretized state x future-time tiling as in STA and do not specifically discuss the usage for planning. However, they do provide concrete precedents for attractor dynamics over temporally structured trajectories in terms of mechanism. It would be useful to discuss this literature and clarify what is new mechanistically in the authors' view.

(3) A central claim of the manuscript is that space-time trajectories are attractors of the STA dynamics. The manuscript does provide empirical evidence consistent with attractor-like behavior. However, it is not explicitly shown whether trajectory representations persist in the absence of sustained external inputs. So it's not clear to me whether the trajectories should be interpreted as intrinsic attractors of the recurrent system, which can be selected by delivering transient inputs, or whether they must be stabilized by a specific continuous external drive. It would be useful if the author could clarify/discuss this point.
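The diagnostic the review asks for can be sketched in a few lines: drive a recurrent network with a transient cue, withdraw all external input, and test whether structured activity persists. The weights, cue, and gain below are generic placeholders, not the paper's trained network:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
# Assumed random recurrent weights; gain > 1 so self-sustained activity
# is at least possible (it is not in a contracting network).
W = 1.5 * rng.standard_normal((n, n)) / np.sqrt(n)
cue = rng.standard_normal(n)   # transient external input

x = np.zeros(n)
for t in range(100):
    inp = cue if t < 10 else 0.0    # input withdrawn after t = 10
    x = np.tanh(W @ x + inp)

# Crude persistence criterion: activity has not collapsed to the origin.
persists = float(np.linalg.norm(x)) > 1e-3
```

If the trajectory representation survives input withdrawal (and, more stringently, stays aligned with the selected plan rather than merely remaining active), that would support the intrinsic-attractor interpretation; if it collapses or requires continuous drive, the trajectories are better described as input-stabilized.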

As far as I understand it, reward information is provided as input to specific populations encoding future time steps, and that's essential for rapid adaptation without rewiring connectivity. How such future-time-specific reward inputs would be generated and routed to distinct neural populations isn't entirely clear to me. Since this seems to be an essential component of the model, I think it would be important to discuss more deeply the source and plausibility of these reward signals related to different timesteps.

(4) The authors note that vanilla STA scales linearly with planning horizon, and discuss potentially hierarchical extensions for longer horizons. They acknowledge that learning abstractions remains an open challenge, yet the examples of planning in the manuscript are restricted to very short temporal horizons and limited branching complexity. It is not obvious to me in what cases the current implementation and interpretation of STA remain viable (for example, in terms of relaxation iterations) as the horizon and branching factor increase. Relatively simple planning can be managed by simpler, less costly models/algorithms, whereas complex planning is far harder to deal with, and is something that a mechanistic "theory" should address. In the context of the claims of the paper in its present form, I think this is possibly the most important conceptual and practical limitation of the manuscript.

(5) The RNN analyses show that trained networks develop structured subspaces aligned with future time indices and exhibit perturbation behavior consistent with attractor-like dynamics. The manuscript also explicitly notes differences between the trained RNN and the handcrafted STA (e.g., long-range couplings between subspaces and differences in behavior of lower-value trajectories under perturbation), which I much appreciated. My doubt is on the specificity of this result, as trained RNNs on fixed-horizon tasks can develop latent dimensions correlated with temporal progress within a trial or time-to-goal. I think it would help the reader to clarify whether the results demonstrate that STA-like computations emerge in RNNs trained on planning tasks, or that RNNs generally develop some kind of structured spacetime representations when tasks involve future timesteps and some degree of flexibility in the decisions.

A few more minor points, mainly concerning clarity:

(1) The main dynamical equation combines a log-domain recurrent term, a floor operation, and a log-sum-exp normalization step, followed by exponentiation. The intuition/logic behind this specific formulation could be clarified for the reader. For example, it would be helpful to explain why the recurrent input appears inside a log, and also whether/how these operations relate to any multiplicative constraint.
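To make the question concrete, here is one plausible reading of the sequence of operations the review describes (log of the recurrent input, floor, log-sum-exp normalization, exponentiation). The exact form in the paper may differ; this is only an illustrative reconstruction:

```python
import numpy as np

def step(r, W, floor=-10.0):
    """One hypothetical update in the log domain."""
    u = np.log(W @ r + 1e-12)            # recurrent input appears inside a log
    u = np.maximum(u, floor)             # floor bounds log-activity from below
    u = u - np.log(np.sum(np.exp(u)))    # log-sum-exp normalization
    return np.exp(u)                     # back to activities

rng = np.random.default_rng(1)
W = rng.random((8, 8))                   # assumed nonnegative weights
r = np.full(8, 1.0 / 8)
r = step(r, W)
```

Under this reading, the log-sum-exp step enforces that the exponentiated activities sum to one, i.e. a multiplicative (normalization) constraint on each population, which is presumably the connection the authors could spell out.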

(2) While the computational cost of successor representation in an expanded NT x NT representation is discussed, the corresponding scaling of STA in terms of number of units and connections (as a function, for example, of the planning horizon) isn't clear to me. Perhaps the authors could compare costs more explicitly.
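A back-of-envelope version of the comparison the review asks for, under assumed scalings rather than figures from the paper, might look like:

```python
# Assumed scalings for illustration only.
N, T = 100, 10                      # number of states, planning horizon

# SR over an expanded N*T state space needs an (N*T) x (N*T) matrix.
sr_entries = (N * T) ** 2

# STA (as described): one population per (state, future-time) pair,
# with couplings assumed local between consecutive future-time blocks.
sta_units = N * T
sta_connections = (T - 1) * N * N
```

The gap between `sr_entries` and `sta_connections` here hinges entirely on the locality assumption for STA couplings, which is exactly why an explicit statement of the scaling from the authors would be useful.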

(3) In the RNN analyses, structured subspaces aligned with future time indices are shown. I couldn't find a quantification of how much variance is captured by the subspaces, relative to other latent dimensions. Adding it would help get a feeling for the strength of the alignment.
