Endotaxis: A neuromorphic algorithm for mapping, goal-learning, navigation, and patrolling
eLife assessment
This valuable work proposes a framework inspired by chemotaxis for understanding how the brain might implement behaviors related to navigating toward a goal. The evidence supporting the conceptual claim is convincing. The article proposes a hypothesis that would be of interest to the broad systems neuroscience community, although it was noted the relationship to existing similar hypotheses could be clarified.
https://doi.org/10.7554/eLife.84141.3.sa0
Valuable: Findings that have theoretical or practical implications for a subfield
Convincing: Appropriate and validated methodology in line with current state-of-the-art
During the peer-review process the editor and reviewers write an eLife Assessment that summarises the significance of the findings reported in the article (on a scale ranging from landmark to useful) and the strength of the evidence (on a scale ranging from exceptional to inadequate).
Abstract
An animal entering a new environment typically faces three challenges: explore the space for resources, memorize their locations, and navigate towards those targets as needed. Here we propose a neural algorithm that can solve all these problems and operates reliably in diverse and complex environments. At its core, the mechanism makes use of a behavioral module common to all motile animals, namely the ability to follow an odor to its source. We show how the brain can learn to generate internal “virtual odors” that guide the animal to any location of interest. This endotaxis algorithm can be implemented with a simple 3-layer neural circuit using only biologically realistic structures and learning rules. Several neural components of this scheme are found in brains from insects to humans. Nature may have evolved a general mechanism for search and navigation on the ancient backbone of chemotaxis.
Introduction
Animals navigate their environment to look for resources – such as shelter, food, or a mate – and exploit such resources once they are found. Efficient navigation requires knowing the structure of the environment: which locations are connected to which others (Tolman, 1948). One would like to understand how the brain acquires that knowledge, what neural representation it adopts for the resulting map, how it tags significant locations in that map, and how that knowledge gets read out for decision-making during navigation.
Experimental work on these topics has mostly focused on simple environments – such as an open arena (Wilson and McNaughton, 1993), a pond (Morris et al., 1982), or a desert (Müller and Wehner, 1988) – and much has been learned about neural signals in diverse brain areas under these conditions (Sosa and Giocomo, 2021; Collett and Collett, 2002). However, many natural environments are highly structured, such as a system of burrows, or of intersecting paths through the underbrush. Similarly, for many cognitive tasks, a sequence of simple actions can give rise to complex solutions.
One algorithm for finding a valuable resource is common to all animals: chemotaxis. Every motile species has a way to track odors through the environment, either to find the source of the odor or to avoid it (Baker et al., 2018). This ability is central to finding food, connecting with a mate, and avoiding predators. It is believed that brains originally evolved to organize the motor response in pursuit of chemical stimuli. Indeed, some of the oldest regions of the mammalian brain, including the hippocampus, seem organized around an axis that processes smells (Jacobs, 2012; Aboitiz and Montiel, 2015).
The specifics of chemotaxis, namely the methods for finding an odor and tracking it, vary by species, but the toolkit always includes a search strategy based on trial-and-error: try various actions that you have available, then settle on the one that makes the odor stronger (Baker et al., 2018). For example, a rodent will weave its head side-to-side, sampling the local odor gradient, then move in the direction where the smell is stronger. Worms and maggots follow the same strategy. Dogs track a ground-borne odor trail by casting across it side-to-side. Flying insects perform similar casting flights. Bacteria randomly change direction every now and then, and continue straight as long as the odor improves (Berg, 1988). We propose that this universal behavioral module for chemotaxis can be harnessed to solve general problems of search and navigation in a complex environment, even when tell-tale odors are not available.
For concreteness, consider a mouse exploring a labyrinth of tunnels (Figure 1A). The maze may contain a source of food that emits an odor (Figure 1A1). That odor will be strongest at the source and decline with distance along the tunnels of the maze. The mouse can navigate to the food location by simply following the odor gradient uphill. Suppose that the mouse discovers some other interesting locations that do not emit a smell, like a source of water, or the exit from the labyrinth (Figures 1A2–3). It would be convenient if the mouse could tag such a location with an odorous material, so it may be found easily on future occasions. Ideally, the mouse would carry with it multiple such odor tags, so it can mark different targets each with its specific recognizable odor.
Here we show that such tagging does not need to be physical. Instead, we propose a mechanism by which the mouse’s brain may compute a ‘virtual odor’ signal that declines with distance from a chosen target. That neural signal can be made available to the chemotaxis module as though it were a real odor, enabling navigation up the gradient toward the target. Because this goal signal is computed in the brain rather than sensed externally, we call this hypothetical process endotaxis.
The developments reported here were inspired by a recent experimental study with mice navigating a complex labyrinth (Rosenberg et al., 2021) that includes 63 three-way junctions. Among other things, we observed that mice could learn the location of a resource in the labyrinth after encountering it just once, and perfect a direct route to that target location after encounters. Furthermore, they could navigate back out of the labyrinth using a direct route they had not traveled before, even on the first attempt. Finally, the animals spent most of their waking time patrolling the labyrinth, even long after they had perfected the routes to rewarding locations. These patrols covered the environment efficiently, avoiding repeat visits to the same location. All this happened within a few hours of the animal’s first encounter with the labyrinth. Our modeling efforts here are aimed at explaining these remarkable phenomena of rapid spatial learning in a new environment: one-shot learning of a goal location, zero-shot learning of a return route, and efficient patrolling of a complex maze. In particular we want to do so with a biologically plausible mechanism that could be built out of neurons.
Results
A neural circuit to implement endotaxis
Figure 1B presents a neural circuit model that implements three goals: mapping the connectivity of the environment; tagging of goal locations with a virtual odor; and navigation toward those goals. The model includes four types of neurons: resource cells, point cells, map cells, and goal cells.
Resource cells
These are sensory neurons that fire when the animal encounters an interesting resource, for example, water or food, that may form a target for future navigation. Each resource cell is selective for a specific kind of stimulus. The circuitry that produces these responses is not part of the model.
Point cells
This layer of cells represents the animal’s location. (We avoid the term ‘place cell’ here because [1] that term has a technical meaning in the rodent hippocampus, whereas the arguments here extend to species that do not have a hippocampus; and [2] all the cells in this network have a place field, but it is smallest for the point cells.) Each neuron in this population has a small response field within the environment. The neuron fires when the animal enters that response field. We assume that these point cells exist from the outset as soon as the animal enters the environment. Each cell’s response field is defined by some conjunction of external and internal sensory signals at that location.
Map cells
This layer of neurons learns the structure of the environment, namely how the various locations are connected in space. The map cells get excitatory input from point cells in a one-to-one fashion. These input synapses are static. The map cells also excite each other with all-to-all connections. These recurrent synapses are modifiable according to a local plasticity rule. After learning, they represent the topology of the environment.
Goal cells
Each goal cell serves to mark the locations of a special resource in the map of the environment. The goal cell receives excitatory input from a resource cell, which gets activated whenever that resource is present. It also receives excitatory synapses from map cells. Such a synapse is strengthened when the presynaptic map cell is active at the same time as the resource cell.
After the map and goal synapses have been learned, each goal cell carries a virtual odor signal for its assigned resource. The signal increases systematically as the animal moves closer to a location with that resource. A mode switch selects one among many possible virtual odors (or real odors) to be routed to the chemotaxis module for odor tracking. (The mode switch effectively determines the animal’s behavioral policy. In this report, we do not consider how or why the animal chooses one mode or another.) The animal then pursues its chemotaxis search strategy to maximize that odor, which leads it to the selected tagged location.
Why does the circuit work?
The key insight is that the output of the goal cell declines systematically with the distance of the animal from the target location. This relationship holds even if the environment is constrained with a complex connectivity graph (Figure 1A4). Here we explain how this comes about, with mathematical details to follow.
In a first phase, the animal explores the environment while the circuit builds a map. When the animal moves from one location to an adjacent one, those two point cells fire in rapid succession. That leads to a Hebbian strengthening of the excitatory synapses between the two corresponding map cells (Figure 2A and B). In this way, the recurrent network of map cells learns the connectivity of the graph that describes the environment. To a first approximation, the matrix of synaptic connections among the map cells will converge to the correlation matrix of their inputs (Dayan and Abbott, 2001; Galtier et al., 2012), which in turn reflects the adjacency matrix of the graph (Equation 1). Now the brain can use this adjacency information to find the shortest path to a target.
After this map learning, the output of the map network is a hump of activity, centered on the current location $x$ of the animal and declining with distance along the various paths in the graph of the environment (Figure 2C). If the animal moves to a different location $y$, the map output will change to another hump of activity, now centered on $y$ (Figure 2D). The overlap of the two hump-shaped profiles will be large if nodes $x$ and $y$ are close on the graph, and small if they are distant. Fundamentally the endotaxis network computes that overlap.
Suppose the animal visits $y$ and finds water there. Then the water resource cell fires, triggering synaptic learning in the goal synapses. That stores the current profile of map activity in the synapses onto the goal cell that responds to water (Figure 2D, Equation 9). When the animal subsequently moves to a different location $x$, the goal cell receives the current map output filtered through the previously stored synaptic template (Figure 2E). This is the desired measure of overlap (Equation 10). Under suitable conditions, this goal signal declines monotonically with the shortest graph distance between $x$ and $y$, as we will demonstrate both analytically and in simulations (sections ‘Theory of endotaxis’ and ‘Acquisition of map and targets during exploration’).
Theory of endotaxis
Here we formalize the processes of Figure 2 in a concrete mathematical model. The model is simple enough to allow some exact predictions for its behavior. The present section develops an analytical understanding of endotaxis that will help guide the numerical simulations in subsequent parts.
The environment is modeled as a graph consisting of $n$ nodes, with adjacency matrix
$$A_{ij} = \begin{cases} 1, & \text{if node } j \text{ can be reached from node } i \text{ in one step} \\ 0, & \text{otherwise.} \end{cases} \tag{1}$$
We suppose the graph is undirected, meaning that every link can be traversed in both directions,
$$A_{ij} = A_{ji}. \tag{2}$$
Movements of the agent are modeled as a sequence of steps along that graph. During exploration, the agent performs a walk that tries to cover the entire environment. In the process, it learns the adjacency matrix $\mathbf{A}$. During navigation, the agent uses that knowledge to travel to a known target.
For an agent navigating a graph, it is very useful to know the shortest graph distance between any two nodes
$$D_{ij} = \text{minimal number of steps needed to reach node } j \text{ from node } i. \tag{3}$$
Given this information, one can navigate the shortest route from $x$ to $y$: for each of the neighbors of $x$, look up its distance to $y$ and step to the neighbor with the shortest distance. Then repeat that process until $y$ is reached. Thus, the shortest route can be navigated one step at a time without any high-level advanced planning. This is the core idea behind endotaxis.
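The step-by-step rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the six-node ring with one shortcut link, and the helper names `graph_distances` and `greedy_route`, are hypothetical and do not come from the article. The distance table is computed by breadth-first search here, whereas the endotaxis circuit obtains an equivalent signal from neural activity.

```python
from collections import deque

def graph_distances(neighbors):
    """All-pairs shortest graph distances, one breadth-first search per node."""
    dist = {}
    for src in neighbors:
        d = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nb in neighbors[node]:
                if nb not in d:
                    d[nb] = d[node] + 1
                    queue.append(nb)
        dist[src] = d
    return dist

def greedy_route(neighbors, dist, start, goal):
    """Repeatedly step to the neighbor with the shortest distance to the goal."""
    route = [start]
    while route[-1] != goal:
        here = route[-1]
        route.append(min(neighbors[here], key=lambda nb: dist[nb][goal]))
    return route

# Hypothetical example graph: a ring of 6 nodes plus one shortcut link 0-3.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
ring[0].append(3)
ring[3].append(0)

dist = graph_distances(ring)
route = greedy_route(ring, dist, start=1, goal=4)
print(route, "steps:", len(route) - 1, "graph distance:", dist[1][4])
```

Each decision uses only the distances of the immediate neighbors, so no route needs to be planned in advance.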
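The network of Figure 1B effectively computes the shortest graph distances. We implement the circuit as a textbook linear rate model (Dayan and Abbott, 2001). Each map unit $i$ has a synaptic input $w_i$ that it converts to an output $v_i$,
$$v_i = \gamma\, w_i, \tag{4}$$
where $\gamma$ is the gain of the units. The input consists of an external signal $u_i$ summed with a recurrent feedback through a connection matrix $\mathbf{M}$,
$$w_i = u_i + \sum_j M_{ij}\, v_j, \tag{5}$$
where $M_{ij}$ is the synaptic strength from unit $j$ to $i$.
The point neurons are one-hot encoders of location. A point neuron fires if the agent is at that location; all the others are silent:
$$u_i = \delta_{ix} \quad \text{with the agent at node } x, \tag{6}$$
where $\delta_{ix}$ is the Kronecker delta.
So the vector of all map outputs is
$$\mathbf{v}(x) = \gamma\,(\mathbf{1} - \gamma \mathbf{M})^{-1}\, \mathbf{u}(x), \tag{7}$$
where $\mathbf{u}(x)$ is the one-hot input from point cells.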
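As a numerical check on the linear rate model, the following NumPy sketch computes the steady-state map output for a small ring and verifies that it satisfies the self-consistency condition "output equals gain times (point-cell input plus recurrent feedback)". The eight-node ring, the gain of 0.3, and the function name `map_output` are assumptions chosen for illustration only.

```python
import numpy as np

def map_output(M, x, gain):
    """Steady-state map output with the agent at node x (linear rate model)."""
    n = M.shape[0]
    u = np.zeros(n)
    u[x] = 1.0                      # one-hot point-cell input
    # Solve (1 - gain*M) v = gain*u rather than forming the inverse explicitly.
    return gain * np.linalg.solve(np.eye(n) - gain * M, u)

# Illustrative map synapses: a perfectly learned ring of 8 nodes.
n = 8
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0

gain = 0.3                          # assumed value, below the stability limit of this ring
u = np.eye(n)[0]
v = map_output(A, x=0, gain=gain)

# Self-consistency of the recurrent network: v = gain * (u + M v).
print(np.allclose(v, gain * (u + A @ v)))
print(np.round(v, 4))               # a hump centered on node 0
```

The printed profile is the "hump of activity" of Figure 2C: largest at the agent's node and falling off symmetrically in both directions around the ring.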
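Now consider goal cell number $k$ that is associated to a particular location $y$ because its resource is present at that node. The goal cell sums input from all the map units $v_i$, weighted by its goal synapses $G_{ki}$. So with the agent at node $x$, the goal signal is
$$E_{kx} = \sum_i G_{ki}\, v_i(x) = \mathbf{g}_k \cdot \mathbf{v}(x), \tag{8}$$
where we write $\mathbf{g}_k$ for the $k$th row vector of the goal synapse matrix $\mathbf{G}$. This is the set of synapses from all map cells onto the specific goal cell in question.
Suppose now that the agent has learned the structure of the environment perfectly, such that the map synapses are a copy of the graph’s adjacency matrix (Equation 1),
$$\mathbf{M} = \mathbf{A}. \tag{9}$$
Similarly, suppose that the agent has acquired the goal synapses perfectly, namely proportional to the map output $\mathbf{v}(y)$ at the goal location $y$ (here with the constant of proportionality set to 1):
$$\mathbf{g}_k = \mathbf{v}(y). \tag{10}$$
Then as the agent moves to another location $x$, the goal cell reports a signal
$$E_{xy} = \mathbf{v}(y) \cdot \mathbf{v}(x) = \left[\mathbf{E}\right]_{xy}, \tag{11}$$
where the matrix
$$\mathbf{E} = \gamma^2\, (\mathbf{1} - \gamma \mathbf{A}^{\top})^{-1} (\mathbf{1} - \gamma \mathbf{A})^{-1}. \tag{12}$$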
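It has been shown (Meister, 2023) that for small values of the gain $\gamma$ the elements of the resolvent matrix
$$\mathbf{Y} = \left(\tfrac{1}{\gamma}\mathbf{1} - \mathbf{A}\right)^{-1} = \gamma\,(\mathbf{1} - \gamma \mathbf{A})^{-1} \tag{13}$$
are monotonically related to the shortest graph distances $D_{ij}$. Specifically, to leading order in $\gamma$ and up to a prefactor that does not depend on $\gamma$,
$$Y_{ij} \sim \gamma^{D_{ij} + 1} \quad \text{as } \gamma \to 0. \tag{14}$$
Building on that, the matrix of goal signals $\mathbf{E} = \mathbf{Y}^{\top}\mathbf{Y}$ becomes
$$E_{xy} = \sum_z Y_{zx}\, Y_{zy} \sim \sum_z \gamma^{D_{zx} + D_{zy} + 2}. \tag{15}$$
The limit is dominated by the term with the smallest exponent, which occurs when $z$ lies on a shortest path from $x$ to $y$
$$D_{zx} + D_{zy} = D_{xz} + D_{zy} = D_{xy}, \tag{16}$$
where we have used the undirected nature of the graph, namely $D_{zx} = D_{xz}$.
Therefore,
$$E_{xy} \sim \gamma^{D_{xy} + 2} \quad \text{as } \gamma \to 0, \tag{17}$$
where $D_{xy}$ is the smallest number of steps needed to get from node $x$ to node $y$.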
Figure 3 illustrates this relationship with numerical results on a binary tree graph. As expected, for small $\gamma$ the goal signal decays exponentially with graph distance (Figure 3B). Therefore, an agent that makes local turning decisions to maximize that goal signal will reach the goal by the shortest possible path.
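The relationship between goal signal and graph distance can be checked numerically. The sketch below is not the simulation behind Figure 3; it assumes a small complete binary tree (15 nodes), a gain of 0.05, perfectly learned map and goal synapses, and hypothetical helper names. It builds the matrix of goal signals as the overlap of map outputs, then tests whether stepping to the neighbor with the highest goal signal always shortens the graph distance by one.

```python
import numpy as np

def binary_tree_adjacency(levels):
    """Adjacency matrix of a complete binary tree with `levels` levels of nodes."""
    n = 2 ** levels - 1
    A = np.zeros((n, n))
    for child in range(1, n):
        parent = (child - 1) // 2
        A[child, parent] = A[parent, child] = 1.0
    return A

def goal_signal_matrix(A, gain):
    """E[x, y] = v(x) . v(y), assuming perfectly learned map and goal synapses."""
    n = A.shape[0]
    V = gain * np.linalg.inv(np.eye(n) - gain * A)   # column x holds v(x)
    return V.T @ V

def graph_distance_matrix(A):
    """D[i, j] = smallest k for which the k-th power of A has a nonzero (i, j) entry."""
    n = A.shape[0]
    D = np.full((n, n), np.inf)
    walks = np.eye(n)
    for k in range(n):
        D[(walks > 0) & np.isinf(D)] = k
        walks = walks @ A
    return D

def ascent_follows_shortest_paths(A, E, D):
    """True if the best-signal neighbor is always one step closer to the goal."""
    n = A.shape[0]
    for y in range(n):
        for x in range(n):
            if x == y:
                continue
            nbrs = np.flatnonzero(A[x])
            best = nbrs[np.argmax(E[nbrs, y])]
            if D[best, y] != D[x, y] - 1:
                return False
    return True

A = binary_tree_adjacency(levels=4)
D = graph_distance_matrix(A)
E = goal_signal_matrix(A, gain=0.05)          # small gain, assumed for illustration

off_diag = ~np.eye(len(A), dtype=bool)
slope = np.polyfit(D[off_diag], np.log(E[off_diag]), 1)[0]
print("ascent is optimal:", ascent_follows_shortest_paths(A, E, D))
print("slope of log E vs distance:", slope, " log(gain):", np.log(0.05))
```

The fitted slope comes out somewhat shallower than the logarithm of the gain because the prefactor of the goal signal also grows slowly with distance.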
The exponential decay of the goal signal represents a challenge for practical implementation with biological circuits. Neurons have a finite signal-to-noise ratio, so detecting minute differences in the firing rate of a goal neuron will be unreliable. Because the goal signal changes by a factor of $\gamma$ across every link in the graph, one wants to set the map neuron gain $\gamma$ as large as possible. However, there is a critical gain value $\gamma_c$ that sets a strict upper limit:
$$\gamma_c = \frac{1}{\lambda_{\max}}, \tag{18}$$
where $\lambda_{\max}$ is the largest eigenvalue of the adjacency matrix $\mathbf{A}$.
For larger $\gamma$, the goal signal no longer represents graph distances (Meister, 2023). The largest eigenvalue of the adjacency matrix in turn is related to the number of edges per node. For graphs with 2–4 edges per node, $\gamma_c$ is typically about 0.3. The graph in Figure 3A has $\gamma_c \approx 0.38$, and indeed the goal signal becomes erratic as $\gamma$ approaches that value (Figure 3C).
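The critical gain is straightforward to evaluate for any given graph. The following sketch assumes a symmetric adjacency matrix and uses two illustrative graphs that are assumptions of this example: a 50-node ring and a complete binary tree with seven levels of nodes. The function names are hypothetical.

```python
import numpy as np

def critical_gain(A):
    """Reciprocal of the largest eigenvalue of a symmetric adjacency matrix."""
    return 1.0 / np.linalg.eigvalsh(A).max()

def ring_adjacency(n):
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
    return A

def binary_tree_adjacency(levels):
    n = 2 ** levels - 1
    A = np.zeros((n, n))
    for child in range(1, n):
        parent = (child - 1) // 2
        A[child, parent] = A[parent, child] = 1.0
    return A

gc_ring = critical_gain(ring_adjacency(50))         # every node has 2 edges
gc_tree = critical_gain(binary_tree_adjacency(7))   # 127 nodes, at most 3 edges per node
print(f"ring: {gc_ring:.3f}   binary tree: {gc_tree:.3f}")   # 0.500 and about 0.383
```

On the ring the largest eigenvalue is exactly 2, which gives a critical gain of 0.5. Adding edges per node raises the largest eigenvalue and therefore lowers the critical gain.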
To implement the finite dynamic range explicitly, we add some noise to the goal signal of Equation 11:
$$\tilde{E}_{xy} = E_{xy} + \eta, \tag{19}$$
where the noise $\eta$ has a Gaussian distribution with full width $\epsilon$:
$$\eta \sim \mathcal{N}(0, \epsilon^2). \tag{20}$$
The scale of this noise is expressed relative to the maximum value of the goal signal. If the agent must decide between two goal signals separated by less than $\epsilon$, the noise will take a toll on the resulting navigation performance.
Of course, neurons everywhere within the network will carry some noise. We lump the cumulative effects of that into the final readout step because that allows for efficient calculations (see section ‘Average navigated distance’). (In the circuit of Figure 1B, one can envision that the readout noise gets added after the mode switch.) What is a reasonable value for this effective readout noise? For reference, humans and animals can routinely discriminate sensory stimuli that differ by only 1%, for example, the pitch of tones or the intensity of a light, especially if they occur in close succession. Clearly the neurons all the way from receptors to perception must represent those small differences. Thus, we will use $\epsilon = 0.01$ as a reference noise value in many of the results presented here.
The process of navigation toward a chosen goal signal is formalized in Algorithm 1. At each node, the agent inspects the goal signal that would be obtained at all the neighboring nodes, corrupted by the readout noise $\eta$. Then it steps to the neighbor with the highest value. Suppose the agent starts at node $x$ and navigates following the goal signal for node $y$. The resulting navigation route has $L_{xy}$ steps. Navigation is perfect if this equals the shortest graph distance, $L_{xy} = D_{xy}$. We will assess deviations from perfect performance by the excess length of the routes.
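Algorithm 1 Navigation.
Parameters: gain $\gamma$, noise $\epsilon$
Input: map synapse matrix $\mathbf{M}$, goal synapse vector $\mathbf{g}$
$s \leftarrow x$  ▷ start navigation at node $x$
while not at goal do  ▷ stop when goal node is found
    for all nodes $n$ that neighbor $s$ do
        $u_i \leftarrow \delta_{in}$ for every point cell $i$  ▷ point cell output with agent at node $n$
        $\mathbf{v} \leftarrow \gamma\,(\mathbf{1} - \gamma \mathbf{M})^{-1}\, \mathbf{u}$  ▷ map output
        $E_n \leftarrow \mathbf{g} \cdot \mathbf{v} + \eta$, with $\eta \sim \mathcal{N}(0, \epsilon^2)$  ▷ noisy goal signal
    end for
    $s \leftarrow \arg\max_n E_n$  ▷ choose the neighbor node with the highest goal signal
end while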
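A minimal Python sketch of the navigation procedure in Algorithm 1 follows. It rests on several assumptions that are not taken from the article: the readout noise is drawn as a Gaussian whose standard deviation is the noise parameter times the maximum of the goal signal, a `max_steps` guard is added so the loop always terminates, and the demonstration uses a 12-node ring with perfectly learned synapses. The function and variable names are hypothetical.

```python
import numpy as np

def ring_adjacency(n):
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
    return A

def map_outputs(M, gain):
    """Matrix whose column s is the map output with the agent at node s."""
    n = M.shape[0]
    return gain * np.linalg.inv(np.eye(n) - gain * M)

def navigate(A, M, g, start, goal, gain, eps, rng, max_steps=1000):
    """Ascend the noisy goal signal; A supplies the moves available in the environment."""
    signal = g @ map_outputs(M, gain)          # noiseless goal signal at every node
    noise_scale = eps * signal.max()           # assumption: noise relative to the peak signal
    s, route = start, [start]
    while s != goal and len(route) <= max_steps:
        nbrs = np.flatnonzero(A[s])
        noisy = signal[nbrs] + noise_scale * rng.standard_normal(len(nbrs))
        s = int(nbrs[np.argmax(noisy)])        # step to the neighbor with the highest signal
        route.append(s)
    return route

# Demonstration with perfectly learned synapses on a ring of 12 nodes.
A = ring_adjacency(12)
gain, goal = 0.3, 5
g = map_outputs(A, gain)[:, goal]              # goal synapses = map output at the goal
rng = np.random.default_rng(0)

route = navigate(A, A, g, start=0, goal=goal, gain=gain, eps=0.0, rng=rng)
print(route)                                   # noise-free route from node 0 to node 5
```

Setting `eps` above zero makes individual decisions stochastic, and the route may then exceed the shortest graph distance, which is the excess length discussed in the text.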
Figure 3D and E illustrate how the navigated path distance $L_{xy}$ depends on the noise level $\epsilon$ and the gain $\gamma$. For small gain or high noise, the goal signal extends only over a graph distance of 5–6 links. Beyond that, the navigated distance $L_{xy}$ begins to exceed the graph distance $D_{xy}$. As the gain increases, the goal signal extends further through the graph and navigation becomes reliable over longer distances (Figure 3D). Eventually, however, the goal signal loses its monotonic distance dependence (Figure 3C). At that stage, navigation across the graph may fail because the agent gets trapped in a local maximum of the goal signal. This can happen even before the critical gain value $\gamma_c$ is reached (Figure 3C). For the example in Figure 3, the highest useful gain lies below $\gamma_c$.
For any given value of the gain, navigation improves with lower noise levels, as expected (Figure 3E). At the reference value of $\epsilon = 0.01$, navigation is perfect even across the 12 links that separate the most distant points on this graph.
In summary, this analysis spells out the challenges that need to be met for endotaxis to work properly. First, during the learning phase, the agent must reliably extract the adjacency matrix of the graph and copy it into its map synapses. Second, during the navigation phase, the agent must evaluate the goal signal with enough resolution to distinguish the values at alternative nodes. The neuronal gain $\gamma$ plays a central role: with $\gamma$ too small, the goal signal decays rapidly with distance and vanishes into the noise just a few steps away from the goal. But at large $\gamma$ the network computation becomes unstable.
Acquisition of map and targets during exploration
As discussed above, the goal of learning during exploration is that the agent acquires a copy of the graph’s adjacency matrix in its map synapses, $\mathbf{M} = \mathbf{A}$, and stores the map output at a goal location in the goal synapses $\mathbf{G}$. Here we explore how the rules for synaptic plasticity in the map and goal networks allow that to happen. Algorithm 2 spells out the procedure we implemented for learning from a random walk through the environment.
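Algorithm 2 Map and goal-learning.
Parameters: gain $\gamma$, threshold $\theta$, learning rate $\alpha$
Input: adjacency matrix $\mathbf{A}$, resource signals $R_k(s)$
$\mathbf{M} \leftarrow 0$  ▷ initiate map synapses at 0
$\mathbf{G} \leftarrow 0$  ▷ initiate goal synapses at 0
$t \leftarrow 0$  ▷ $t$ counts the steps
$s(0) \leftarrow x$  ▷ start random walk at $x$
while learning do
    $t \leftarrow t + 1$
    $s(t) \leftarrow$ a random neighbor of $s(t-1)$  ▷ continue the random walk
    $u_i \leftarrow \delta_{i\,s(t)}$ for every point cell $i$  ▷ point cell output
    $\mathbf{v}(t) \leftarrow \gamma\,(\mathbf{1} - \gamma \mathbf{M})^{-1}\, \mathbf{u}$  ▷ map cell output
    for all map cell pairs $(i, j)$ do
        if $v_i(t) > \theta$ and $v_j(t-1) > \theta$ then  ▷ threshold on pre- and post-synaptic activity
            $M_{ij} \leftarrow 1$
            $M_{ji} \leftarrow 1$  ▷ on a directed graph only increment $M_{ij}$
        end if
    end for
    $\mathbf{E} \leftarrow \mathbf{G}\,\mathbf{v}(t)$  ▷ goal signals
    for every goal neuron $k$ do
        if $R_k(s(t)) > 0$ then  ▷ the agent is at a location that contains resource $k$
            for every map neuron $i$ do
                increment $G_{ki}$ in proportion to $v_i(t)$, with learning rate $\alpha$  ▷ update goal synapses
            end for
        end if
    end for
end while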
The map synapses start out at zero strength. When the agent moves from node $i$ at time $t$ to node $j$, the map cells $i$ and $j$ are excited in close succession. When that happens, the agent potentiates the synapses between those two neurons to $M_{ij} = M_{ji} = 1$. Of course, a map cell can also get activated through the recurrent network, and we must distinguish that from direct input from its point cell. We found that a simple threshold criterion is sufficient. Here $\theta$ is a threshold applied to both the pre- and postsynaptic activity, and the map synapse gets established only if both neurons respond above threshold. The tuning requirements for this threshold are discussed below.
The goal synapses similarly start out at zero strength. Consider a particular goal cell $k$, and suppose its corresponding resource cell has activity $R_k(x)$ when the agent is at location $x$. When a positive resource signal arrives, that means the agent is at a goal location. If the goal signal $E_k$ received from the map output is smaller than the resource signal $R_k$, then the goal synapses get incremented by something proportional to the current map output. Learning at the goal synapses saturates when the goal signal correctly predicts the resource signal. The learning rate $\alpha$ sets how fast that will happen. Note that both the learning rules for map and goal synapses are Hebbian and strictly local: each synapse is modified based only on signals available in the pre- and postsynaptic neurons.
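The two learning rules can be sketched together in Python. The article describes the goal-synapse update only as an increment proportional to the current map output that saturates once the goal signal predicts the resource signal; the particular form used here, learning rate times (resource signal minus goal signal) times map output, is an assumption of this sketch. The parameter values, the 10-node ring, and all function names are likewise illustrative choices.

```python
import numpy as np

def ring_adjacency(n):
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
    return A

def map_output(M, s, gain):
    n = M.shape[0]
    u = np.zeros(n)
    u[s] = 1.0
    return gain * np.linalg.solve(np.eye(n) - gain * M, u)

def learn(A, R, gain, theta, alpha, steps, rng, start=0):
    """Random-walk learning of map synapses M and goal synapses G."""
    n = A.shape[0]
    M = np.zeros((n, n))
    G = np.zeros((R.shape[0], n))
    s = start
    v_prev = map_output(M, s, gain)
    for _ in range(steps):
        s = int(rng.choice(np.flatnonzero(A[s])))      # continue the random walk
        v = map_output(M, s, gain)
        post, pre = v > theta, v_prev > theta          # threshold on both activities
        M[post[:, None] & pre[None, :]] = 1.0          # synapse from previous to current node
        M[pre[:, None] & post[None, :]] = 1.0          # symmetric rule: also the reverse synapse
        E = G @ v                                      # goal signals
        for k in range(G.shape[0]):
            if R[k, s] > E[k]:                         # resource present and under-predicted
                G[k] += alpha * (R[k, s] - E[k]) * v   # assumed form of the increment
        v_prev = v
    return M, G

n, goal = 10, 6
A = ring_adjacency(n)
R = np.zeros((1, n))
R[0, goal] = 1.0                                       # one resource at node 6
gain, theta, alpha = 0.3, 0.2, 1.0                     # illustrative parameter values
M, G = learn(A, R, gain, theta, alpha, steps=2000, rng=np.random.default_rng(0))

V = gain * np.linalg.inv(np.eye(n) - gain * M)
profile = G[0] @ V                                     # learned goal signal at every node
print(np.array_equal(M, A), int(np.argmax(profile)))
```

With these values a directly driven map cell responds at or above the gain of 0.3, whereas a cell driven only through recurrent synapses stays well below the threshold of 0.2, so only links that were actually traveled enter the map.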
To illustrate the process of map and goal-learning, we simulate an agent exploring a simple ring graph by a random walk (Figure 4). At first, there are no targets in the environment that can deliver a resource (Figure 4A). Then we add one target location, and later a second one. Finally, we add a new link to the graph that makes a connection clear across the environment. As the agent explores the graph, we will track how its representations evolve by monitoring the map synapses and the profile of the goal signal.
At the outset, every time the agent steps to a new node, the map synapse corresponding to that link gets potentiated (Figure 4B). After enough steps, the agent has traversed every link on the graph, and the matrix of map synapses resembles the full adjacency matrix of the graph (Figure 4B). At this stage, the agent has learned the connectivity of the environment.
Once a target appears in the environment, it takes the agent a few random steps to encounter it. At that moment, the goal synapses get potentiated for the first time, and suddenly a goal signal appears in the goal cell (Figure 4C). The profile of that goal signal is fully formed and spreads through the entire graph thanks to the pre-established map network. By following this goal signal uphill, the agent can navigate along the shortest path to the target from any node on the graph. Note that the absolute scale of the goal signal grows a little every time the agent visits the goal (Figure 4A) and eventually saturates.
Sometime later, we introduce a second target elsewhere in the environment (Figure 4D). When the agent encounters it along its random walk, the goal synapses get updated, and the new goal signal has two peaks in its profile. Again, this goal signal grows during subsequent visits. By following that signal uphill from any starting point, the agent will be led to a nearby target by the shortest possible path.
When a new link appears, the agent eventually discovers it on its random walk. At that point, the goal signal changes instantaneously to incorporate the new route (Figure 4E). An agent following the new goal signal from node 13 on the ring will now be led to a target location in just three steps, using the shortcut, whereas previously it took five steps.
This simulation illustrates how the structure of the environment is acquired separately from the location of resources. The agent can explore and learn the map of the environment even without any resources present (Figure 4B). This learning takes place among the map synapses in the endotaxis circuit (Figure 1B). When a resource is found, its location gets tagged within that established map through learning by the goal synapses. The resulting goal signal is available immediately without the need for further learning (Figure 4C). If the distribution of resources changes, the knowledge in the map remains unaffected (Figure 4D) but the goal synapses can change quickly to incorporate the new target. Vice versa, if the graph of the environment changes, the map synapses get updated, and that adapts the goal signal to the new situation even without further change in the goal synapses (Figure 4E).
What happens if a previously existing link disappears from the environment, for example, because one corridor of the mouse burrow caves in? Ideally the agent would erase that link from the cognitive map. The learning algorithm (Algorithm 2) is designed for rapid and robust acquisition of a cognitive map starting from zero knowledge and does not contain a provision for forgetting. However, one can add a biologically plausible rule for synaptic depression that gradually erases memory of a link if the agent never travels it. Details are presented in section ‘Forgetting of links and resources’ (Figure 10). For the sake of simplicity, we continue the present analysis of endotaxis based on the simple three-parameter algorithm presented above (Algorithm 2).
Choice of learning rule
The map learning rule in Algorithm 2 produces full-strength synapses $M_{ij}$ and $M_{ji}$ after a single co-activation of the two neurons. A more common approach to synaptic learning uses small incremental updates and stabilizes the update rule with some form of normalization, based on the average pre- or postsynaptic activity over many steps (Gerstner and Kistler, 2002). For example, presynaptic normalization leads the synaptic network to learn a transition probability matrix $\mathbf{T}$ (Fang et al., 2023), in which the synaptic weights from each presynaptic neuron are normalized to sum to one.
Instead, we adopted the instantaneous update model for two reasons: most importantly, this allows the agent to learn a route after the first traversal, which is needed to explain the rapid learning observed in experimental animals. For example, section ‘Navigating a partial map: homing behavior’ models accurate homing after the first excursion into the labyrinth. Furthermore, when we repeated the analysis of Figure 3 using the transition matrix $\mathbf{T}$ instead of the adjacency matrix $\mathbf{A}$, the goal signal correlated more weakly with distance, and even with the optimal gain setting the range of correct navigation was considerably reduced.
This rapid learning rule reflects an implicit assumption that the environment is static, such that the learned transition will always be available. For adaptation to slow changes in the environment, see section ‘Forgetting of links and resources.’ Note also that the above procedure Algorithm 2 updates both synapses between neurons $i$ and $j$. This assumes implicitly that the experienced edge on the graph can also be traversed in the opposite direction, which applies to many navigation problems. To learn a directed environment – such as a city map with one-way streets or a game in which moves cannot be reversed – one may use a directed learning rule that requires the presynaptic neuron to fire before the postsynaptic neuron. This will update only the synapse representing the edge that was actually traveled. For all simulations in this article, we will use the symmetric learning rule.
Navigation using the learned goal signal
We now turn to the ‘exploitation’ component of endotaxis, namely use of the learned information to navigate toward targets. In the simulations of Figure 5, we allow the agent to explore a graph. Every node on the graph drives a separate resource cell, thus the agent simultaneously learns goal signals to every node. After a random walk sufficient to cover the graph several times, we test the agent’s ability to navigate to the goals by ascending on the learned goal signal. For that purpose, we teleport the agent to an arbitrary start node in the graph and ask how many steps it takes to reach the goal node following the policy of Algorithm 1. In these tests, the learning of map and goal synapses was turned off during the navigation phase, so we could separately assess how learning and navigating affect the performance. However, there is no functional requirement for this, and indeed one of the attractive features of this model is that learning and navigation can proceed in parallel at all times.
Figure 5A–C shows results on a ring graph with 50 nodes. With suitable values of the model parameters – more on that later – the agent learns a goal signal that declines monotonically with distance from the target node (Figure 5A). The ability to ascend on that goal signal depends on the noise level $\epsilon$, which determines whether the agent can sense the difference in goal signal at neighboring nodes. At a high noise level, the agent finds the target by the shortest route from up to five links away (Figure 5B); beyond that range, some navigation errors creep in. At a low noise level, navigation is perfect up to 10 links away. Every factor of two increase in noise seems to reduce the range of navigation by about one link.
How does the process of learning the map of the environment affect the ultimate navigation performance? Figure 5C makes that comparison by considering an agent with oracular knowledge of the graph structure and target location (Equations 9 and 10). Interestingly, this barely improves the distance range for perfect navigation. By contrast, an agent performing a random walk with zero knowledge of the environment would take about 40 times longer to reach the target than by using endotaxis (Figure 5C).
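For comparison with the random-walk baseline, the expected number of steps a random walker needs to reach a target can be computed exactly by solving a linear system. The sketch below does this for a 50-node ring. It is not the simulation behind Figure 5C, whose averaging over start nodes is not reproduced here, and the function names are hypothetical.

```python
import numpy as np

def ring_adjacency(n):
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
    return A

def mean_hitting_times(A, target):
    """Expected number of random-walk steps to first reach `target` from each node."""
    n = A.shape[0]
    P = A / A.sum(axis=1, keepdims=True)          # uniform choice among neighbors
    others = np.array([i for i in range(n) if i != target])
    Q = P[np.ix_(others, others)]                 # transitions among non-target nodes
    h = np.zeros(n)
    h[others] = np.linalg.solve(np.eye(n - 1) - Q, np.ones(n - 1))
    return h

n = 50
h = mean_hitting_times(ring_adjacency(n), target=0)
nodes = np.arange(n)
d = np.minimum(nodes, n - nodes)                  # shortest graph distance to node 0
ratio = h[1:] / d[1:]                             # random walk vs. shortest route
print(ratio.min(), ratio.max())
```

On a ring the expected hitting time from a node $i$ steps away in one direction is $i\,(n - i)$, so the random walker takes between 25 and 49 times as many steps as the shortest route, depending on the starting node.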
The ring graph is particularly simple, but how well does endotaxis learn in a more realistic environment? Figure 5D–F shows results on a binary tree graph with six levels: this is the structure of a maze used in a recent study on mouse navigation (Rosenberg et al., 2021). In those experiments, mice learned quickly how to reach the reward location (blue dot in Figure 5D) from anywhere within the maze. Indeed, the endotaxis agent can learn a goal signal that declines monotonically with distance from the reward port (Figure 5D). At a noise level of , navigation is perfect over distances of 9 links and close to perfect over the maximal distance of 12 links that occurs in this maze (Figure 5E). Again, the challenge of having to learn the map affects the performance only slightly (Figure 5F). Finally, comparison with the random agent shows that endotaxis shortens the time to target by a factor of 100 on this graph (Figure 5F).
Figure 5G–I shows results for a more complex graph that represents a cognitive task, namely the game ‘Tower of Hanoi.’ Disks of different sizes are stacked on three pegs, with the constraint that no disk can rest on top of a smaller one. The game is solved by moving the entire pile of disks from the center peg to one of the other pegs. In any state of the game, there are either two or three possible actions, and they form an interesting graph with many loops (Figure 5G). The player starts at the top node (all disks on the center peg) and the two possible solutions correspond to the bottom left and right corners. Again, random exploration leads the endotaxis agent to learn the connectivity of the game and to discover the solutions. The resulting goal signal decays systematically with graph distance from the solution (Figure 5G). At a noise of , navigation is perfect once the agent gets to within nine moves of the target (Figure 5H). This is not quite sufficient for an error-free solution from the starting position, which requires 15 moves. However, compared to an agent executing random moves, endotaxis speeds up the solution by a factor of 10 (Figure 5I). If the game is played with only three disks, the maximal graph distance is 7, and endotaxis solves it perfectly at .
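The structure of the game graph can be checked with a short sketch. The state encoding (one peg index per disk, peg 1 being the center) and the function names are hypothetical choices made for illustration.

```python
from collections import deque
from itertools import product

def hanoi_neighbors(state):
    """States reachable in one legal move.

    state[d] is the peg (0, 1, or 2) holding disk d, with disk 0 the smallest.
    """
    top = {}
    for disk, peg in enumerate(state):
        top.setdefault(peg, disk)       # smallest disk on each occupied peg
    result = []
    for peg, disk in top.items():
        for target in range(3):
            # A disk may move onto an empty peg or onto a larger top disk
            if target != peg and top.get(target, len(state)) > disk:
                result.append(state[:disk] + (target,) + state[disk + 1:])
    return result

def shortest_moves(start, target):
    """Breadth-first search for the minimal number of moves."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        if s == target:
            return dist[s]
        for t in hanoi_neighbors(s):
            if t not in dist:
                dist[t] = dist[s] + 1
                queue.append(t)
    return None

four_disks = shortest_moves((1,) * 4, (0,) * 4)
three_disks = shortest_moves((1,) * 3, (0,) * 3)
degrees = {len(hanoi_neighbors(s)) for s in product(range(3), repeat=4)}
```

The breadth-first search returns 15 moves for four disks and 7 moves for three disks, and every state offers either two or three legal moves, consistent with the description of the graph above.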
These results show that endotaxis functions well in environments with very different structure: linear, tree-shaped, and cyclic. Random exploration in conjunction with synaptic learning can efficiently acquire the connectivity of the environment and the location of targets. With a noise level of 1%, the resulting goal signal allows perfect navigation over distances of ∼9 steps, independent of the nature of the graph. This is a respectable range: personal experience suggests that we rarely learn routes that involve more than nine successive decisions. Chess openings, which are often played in a fast and reflexive fashion, last about 10 moves.
Parameter sensitivity
The endotaxis model has only three parameters: the gain of map units, the threshold for learning at map synapses, and the learning rate at goal synapses. How does performance depend on these parameters? Do they need to be tuned precisely? And does the optimal tuning depend on the spatial environment? There is a natural hierarchy to the parameters if one separates the process of learning from that of navigation. Suppose the circuit has learned the structure of the environment perfectly, such that the map synapses reflect the adjacencies (Equation 9), and the goal synapses reflect the map output at the goal (Equation 10). Then the optimal navigation performance of the endotaxis system depends only on the gain and the noise level . For a given , in turn, the precision of map learning depends only on the threshold (see Algorithm 2). Finally, if the gain is set optimally and the map was learned properly, the identification of targets depends only on the goal-learning rate . Figure 6 explores these relationships in turn.
We simulated the learning phase of endotaxis as in the preceding section (Figure 5B, E and H), using a noise level of , and systematically varying the model parameters . For each parameter set, we measured the graph distance over which at least half of the navigated routes were perfect. We defined this distance as the range of the goal signal.
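The range measurement can be sketched as follows. The noise model is an assumption made for illustration: the readout is scaled so that its maximum is 1, and Gaussian noise is added independently at each neighboring node. The toy goal signal, which halves with every link, is a hypothetical stand-in for a learned signal, and "range" is read here as the largest distance up to which every shorter distance also meets the 50% criterion.

```python
import numpy as np

def navigate(A, signal, start, goal, noise, rng, max_steps=1000):
    """Greedy ascent on a noisy goal signal.

    Returns the number of steps taken to reach the goal, or None on failure.
    """
    scale = signal.max()
    node, steps = start, 0
    while node != goal and steps < max_steps:
        neighbors = np.flatnonzero(A[node])
        readout = signal[neighbors] / scale
        readout = readout + noise * rng.standard_normal(len(neighbors))
        node = int(neighbors[np.argmax(readout)])
        steps += 1
    return steps if node == goal else None

def signal_range(fraction_perfect):
    """Largest distance d such that at least half of the routes are perfect
    at every distance 1..d. fraction_perfect[k] refers to distance k + 1."""
    d = 0
    for frac in fraction_perfect:
        if frac < 0.5:
            break
        d += 1
    return d

# Toy ring graph with a hypothetical goal signal that halves with each link
n, goal = 20, 0
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[i, (i - 1) % n] = 1.0
distance = np.minimum(np.arange(n), n - np.arange(n))
toy_signal = 0.5 ** distance
rng = np.random.default_rng(0)

fraction_perfect = []
for d in range(1, n // 2 + 1):
    perfect = [navigate(A, toy_signal, d, goal, 0.01, rng) == d for _ in range(100)]
    fraction_perfect.append(np.mean(perfect))
toy_range = signal_range(fraction_perfect)
```

A route counts as perfect when the number of steps equals the graph distance from start to goal, so any detour lowers the fraction for that distance.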
For example, on the ring graph (Figure 6A) the signal range improves with gain until performance collapses beyond a maximal gain value. This is just as predicted by the theory (Figure 3), except that the maximal gain is somewhat below the critical value . Clearly the added complications of having to learn the map and goal locations take their toll at high gain. Below the maximal cutoff, the dependence of performance on gain is rather gentle: for example, a 14% change in gain from 0.35 to 0.40 leads to a 26% change in performance. At any given gain value, there is a range of values for the threshold within which the map is learned perfectly. Note that this range is generous and does not require precise adjustment: for example, under a near-maximal gain of 0.38, the threshold can vary freely over a 35% range.
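One way to see where a critical gain can come from is the following sketch. It assumes, hypothetically, that the map output of the linear network is (I - gain * A)^-1, which equals the series of powers of gain * A and converges only while the gain stays below the reciprocal of the largest eigenvalue of A. The theory behind Figure 3 is not reproduced in this section, so this is an illustration of that assumption and not the published derivation.

```python
import numpy as np

def critical_gain(A):
    """Gain at which the series sum_k (gain * A)^k stops converging.

    For a symmetric adjacency matrix this is 1 / (largest eigenvalue of A).
    """
    return 1.0 / np.max(np.linalg.eigvalsh(A))

# Ring graph with 50 nodes: every node has exactly two neighbors
n = 50
ring = np.zeros((n, n))
for i in range(n):
    ring[i, (i + 1) % n] = ring[i, (i - 1) % n] = 1.0

ring_critical = critical_gain(ring)
```

For a ring the largest eigenvalue of the adjacency matrix is 2, so under this assumption the gain values of 0.35 to 0.40 discussed above sit below the divergence point of the linear network.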
Once the gain and synaptic threshold are set so as to acquire the map synapses, the quality of goal-learning depends only on the learning rate . With large , a single visit to the goal fully potentiates the goal synapses so that they do not get updated further. This allows for a fast acquisition of that target, but at the risk of imperfect learning, because the map may not be fully explored yet. A small will update the synapses only partially over many successive visits to the goal. This leads to a poor performance after short exploration, because the weak goal signal competes with noise, but superior performance after long explorations: a tradeoff between speed of learning and accuracy. Precisely this speed-accuracy tradeoff is seen in the simulations (Figure 6A, right): a high learning rate is optimal for short explorations, but for longer ones a small learning rate wins out. An intermediate value of delivers a good compromise performance.
We found qualitatively similar behavior for the other two environments studied here: the binary maze graph (Figure 6B) and the Tower of Hanoi graph (Figure 6C). In each case, the maximal usable gain is slightly below the critical value of that graph. A learning rate of delivers intermediate results. For long explorations, a lower learning rate is best.
In summary, this sensitivity analysis shows that the optimal parameter set for endotaxis does depend on the environment. This is not altogether surprising: every neural network needs to adapt to the distribution of inputs it receives so as to perform optimally. At the same time, the required tuning is rather generous, allowing at least 10–20% slop in the parameters for reasonable performance. Furthermore, a single parameter set of performs quite well on both the binary maze and the Tower of Hanoi graphs, which are dramatically different in character.
A saturating activation function improves navigation
So far, the model of the map network used neurons with a linear activation function (Equation 3), meaning the output is simply proportional to the input, . We also explored nonlinear activation functions and found that the performance of endotaxis improves under certain conditions (Fang et al., 2023). The most important feature is that should saturate for inputs that are larger than the output of the point cells ( in Equation 4). The detailed shape matters little, so for illustration we will use a linear-flat activation curve (Figure 7A):
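A linear-flat curve of this kind can be sketched in a few lines. The saturation level of 1, taken here as the output of an active point cell, and the function names are assumptions for illustration; the exact expression used in the article is not reproduced in this section.

```python
import numpy as np

def linear_activation(x, gain):
    """Output proportional to input."""
    return gain * np.asarray(x, dtype=float)

def linear_flat_activation(x, gain, saturation=1.0):
    """Linear for inputs up to `saturation`, constant for larger inputs."""
    return gain * np.minimum(np.asarray(x, dtype=float), saturation)

inputs = np.array([0.0, 0.5, 1.0, 2.0, 5.0])
linear_out = linear_activation(inputs, gain=0.4)
saturating_out = linear_flat_activation(inputs, gain=0.4)
```

The two curves agree for small inputs and differ only above the saturation level, where the linear-flat version caps the output at the gain times the saturation level.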
Figure 7B–D reports the range of navigation on the three sample graphs, defined and computed from simulations as in the preceding section (Figure 6). The effective range is the largest graph distance over which the median trajectory chooses the shortest route. As observed using linear map neurons (Figure 6), the range increases with the gain until it collapses beyond some maximal value (Figure 7B–D). However, the saturating activation function allowed for higher gain values, which led to considerable increases in the range of navigation: by a factor of 2.2 for the ring graph, and 1.5 for the Tower of Hanoi graph. On the binary maze, the saturating activation function allowed perfect navigation over the maximal distance available of 12 steps.
The enhanced performance was a result of better map learning as well as better navigation. To understand the former, consider Figure 7E: here the agent has begun to learn the ring graph by walking back and forth between a few nodes (2–5), thus establishing all their pairwise map synapses; then it steps to a new node (6). With a linear activation function (Figure 7E, left), the recurrent synapses enhance the map output, so the map signal with the agent in the explored region (2–5) is considerably larger than after stepping to the new node. This interferes with the mechanism for map learning: the learning rule must identify which of the map cells represents the current location of the agent, and does so by setting a threshold on the output signal (Algorithm 2). In the present example, this leads to erroneous synapses because a map cell that receives only recurrent input (4) produces outputs larger than the threshold (arrowhead in Figure 7E). With the saturating activation function (Figure 7E, right), the directly activated map cells always have the largest output signal, so the learning rule can operate without errors.
The saturating activation function also helps after learning is complete. In Figure 7F, the agent is given perfect knowledge of the binary maze map, then asked to use the resulting goal signals to navigate from one end node to another. With a linear activation function, the goal signal has a large local maximum that traps the agent. The nonlinear activation function produces a monotonic goal signal that leads the agent to the target.
Both these aspects of enhanced performance can be traced to the normalizing effect of the nonlinearity that keeps the peak output of the map constant. Such normalization could be performed by other mechanisms as well, for example, a global inhibitory feedback among the map neurons.
In summary, this section shows that altering details of the model can substantially extend its performance. For the remainder of this article, we will return to the linear activation curve because interesting behavioral phenomena can be observed even with the simple linear model.
Navigating a partial map: Homing behavior
We have seen that endotaxis can learn both connections in the environment and the locations of targets after just one visit (Figure 6). This suggests that the agent can navigate well on whatever portion of the environment it has already seen, before covering it exhaustively. To illustrate this, we analyze an ethologically relevant instance.
Consider a mouse that enters an unfamiliar environment for the first time, such as a labyrinth constructed by graduate students (Rosenberg et al., 2021). Given the uncertainties about what lurks inside, the mouse needs to retain the ability to flee back to the entrance as fast as possible. For concreteness, take the mouse trajectory in Figure 8A. The animal has entered the labyrinth (location 1), made its way to one of the end nodes (3), then explored further to another end node (4). Suppose it needs to return to the entrance now. One way would be to retrace all its steps. But the shorter way is to take a left at (2) and cut out the unnecessary branch to (3). Experimentally we found that mice indeed take the short direct route instead of retracing their path (Rosenberg et al., 2021). They can do so even on the very first visit of an unfamiliar labyrinth. Can endotaxis explain this behavior?
We assume that the entrance is a salient location, so the agent dedicates a goal cell to the root node of the binary tree. Figure 8B plots the goal signal after the path in panel A, just as the agent wants to return home. The goal signal is nonzero only at the locations the agent has visited along its path. It clearly increases monotonically toward the entrance (Figure 8C). At a noise level of , the agent can navigate to the entrance by the shortest path without error. Note specifically that the agent does not retrace its steps when arriving at location (2), but instead turns toward (1).
One unusual aspect of homing is that the goal is identified first, before the agent has even entered the environment to explore it. That strengthens the goal synapse from the sole map cell that is active at the entrance. Only subsequently does the agent build up map synapses that allow the goal signal to spread throughout the map network. Still, in this situation, the single synapse onto the goal cell is sufficient to convey a robust signal for homing.
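The homing scenario can be sketched on a small partial map. The graph below is a hypothetical toy and not the labyrinth of Figure 8: node 0 is the entrance, node 2 a junction, nodes 3 and 4 two end nodes, and node 5 a location the agent has not yet visited. The sketch assumes a linear map network with output (I - gain * M)^-1 and a single goal synapse from the map cell at the entrance.

```python
import numpy as np

# Map synapses learned so far: only links the agent has actually traversed
learned_edges = [(0, 1), (1, 2), (2, 3), (2, 4)]
n_nodes, home, gain = 6, 0, 0.3

M = np.zeros((n_nodes, n_nodes))
for i, j in learned_edges:
    M[i, j] = M[j, i] = 1.0

# Assumed steady-state map output with the agent at each node (columns)
Y = np.linalg.inv(np.eye(n_nodes) - gain * M)

# Single goal synapse, from the map cell that was active at the entrance
goal_synapses = np.zeros(n_nodes)
goal_synapses[home] = 1.0
home_signal = goal_synapses @ Y

def walk_home(start):
    """Greedy ascent on the home signal over the learned links."""
    path = [start]
    while path[-1] != home:
        neighbors = np.flatnonzero(M[path[-1]])
        path.append(int(neighbors[np.argmax(home_signal[neighbors])]))
    return path

route = walk_home(4)
```

Starting from end node 4, the greedy walk passes through the junction and heads straight for the entrance without detouring to end node 3, and the unvisited node 5 carries no home signal at all.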
Efficient patrolling
Besides exploring and exploiting, a third mode of navigating the environment is patrolling. At this stage, the animal knows the lay of the land, and has perhaps discovered some special locations, but continues to patrol the environment for new opportunities or threats. In our study of mice freely interacting with a large labyrinth, the animals spent more than 85% of the time patrolling the maze (Rosenberg et al., 2021). This continued for hours after they had perfected the targeting of reward locations and the homing back to the entrance. Presumably, the goal of patrolling is to cover the entire environment frequently and efficiently so as to spot any changes as soon as they develop. So the ideal path in patrolling would visit every node on the graph in the smallest number of steps possible. In the binary tree maze used for our experiments, that optimal patrol path takes 252 steps: it visits every end node of the labyrinth exactly once without any repeats (Figure 9A).
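The step count of the optimal patrol can be reproduced with a depth-first tour. The sketch assumes the maze is a complete binary tree whose end nodes sit six links below the entrance (127 nodes, 126 links); the node numbering and function name are hypothetical.

```python
def patrol_path(levels):
    """Depth-first tour of a complete binary tree, starting and ending at the root.

    Children of node i are 2*i + 1 and 2*i + 2.
    """
    n_internal = 2 ** levels - 1
    path = [0]

    def visit(node):
        if node >= n_internal:          # end node: nothing further to explore
            return
        for child in (2 * node + 1, 2 * node + 2):
            path.append(child)          # step down into the branch
            visit(child)
            path.append(node)           # step back up

    visit(0)
    return path

path = patrol_path(6)
n_steps = len(path) - 1
end_nodes = range(2 ** 6 - 1, 2 ** 7 - 1)
visits_per_end_node = {leaf: path.count(leaf) for leaf in end_nodes}
```

The tour crosses each of the 126 links once in each direction, giving 252 steps, and reaches each of the 64 end nodes exactly once.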
Real mice do not quite execute this optimal path, but their patrolling behavior is much more efficient than random (Figure 9B). They avoid revisiting areas they have seen recently. Could endotaxis implement such an efficient patrol of the environment? The task is to steer the agent to locations that have not been visited recently. One can formalize this by imagining a resource called ‘neglect’ distributed throughout the environment. At each location, neglect increases with time, then resets to zero the moment the agent visits there. To use this in endotaxis, one needs a goal cell that represents neglect.
We add to the core model a goal cell that represents ‘neglect.’ It receives excitation from every map cell via synapses that are equal and constant in strength (see clock symbol in Figure 1B). This produces a goal signal that is approximately constant everywhere in the environment. Now suppose that the point neurons undergo a form of habituation: when a point cell fires because the agent walks through its field, its sensitivity decreases by some habituation factor. That habituation then decays over time until the point cell recovers its original sensitivity. As a result, the most recently visited points on the graph produce a smaller goal signal. Endotaxis based on this goal signal will therefore lead the agent to the areas most in need of a visit.
Figure 9B illustrates that this is a powerful way to implement efficient patrols. Here we modeled endotaxis on the binary tree labyrinth, using the standard parameters useful for exploration, exploitation, and homing in previous sections. To this, we added a habituation in the point cells with exponential recovery dynamics. Formally, the procedure is defined by Algorithm 3. Again, we turned off the learning rules (Algorithm 2) during this simulation to observe the effects of habituation in isolation. A fully functioning agent can keep the learning rules on at all times (Figure 11).
Algorithm 3. Patrolling.
Parameters: gain, noise, habituation, recovery time
Input: map synapses
set the starting sensitivity of the point cell at every node
begin patrolling at a start node
while patrolling do
    habituate the point cell at the current node
    resensitize all point cells
    for all nodes that neighbor the current node do (agent tests available options)
        compute the point cell output with the agent at that node
        compute the map output
        take the sum of the map output with noise, normalized so max = 1
    end for
    move to the neighbor node with the highest patrol signal
end while
With appropriate choices of habituation and recovery time , the agent does in fact execute a perfect patrol path on the binary tree, traversing every edge of the graph exactly once in each direction, and then repeating that sequence indefinitely (Figure 9A). For this to work, some habituation must persist for the time taken to traverse the entire tree; in this simulation, we used steps on a graph that requires 252 steps. As in all applications of endotaxis, the performance also depends on the readout noise . For increasing readout noise, the agent’s behavior transitions gradually from the perfect patrol to a random walk (Figure 9B). The patrolling behavior of real mice is situated about halfway along that range, at an equivalent readout noise of (Figure 9B).
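The patrolling loop can be sketched as follows. Several details are assumptions made for illustration: habituation multiplies the sensitivity of the active point cell by (1 - habituation), recovery relaxes every sensitivity exponentially toward 1, the map output is (I - gain * A)^-1 applied to the point cell output, and noise is added after normalizing the signal. The parameter values and the small ring graph are hypothetical, chosen so that the outcome is easy to verify.

```python
import numpy as np

def patrol(A, gain, habituation, recovery_time, noise, n_steps, start=0, seed=0):
    """Follow the summed map output ('neglect' signal) with habituating point cells."""
    rng = np.random.default_rng(seed)
    n = len(A)
    Y = np.linalg.inv(np.eye(n) - gain * A)   # map output for unit point-cell input
    total_output = Y.sum(axis=0)              # summed map output, agent at each node
    sensitivity = np.ones(n)
    node, path = start, [start]
    for _ in range(n_steps):
        # Habituate the point cell at the current node
        sensitivity[node] *= 1.0 - habituation
        # Exponential recovery of all point cells toward full sensitivity
        sensitivity = 1.0 - (1.0 - sensitivity) * np.exp(-1.0 / recovery_time)
        # Test the available options and pick the most neglected neighbor
        neighbors = np.flatnonzero(A[node])
        signal = sensitivity[neighbors] * total_output[neighbors]
        signal = signal / signal.max() + noise * rng.standard_normal(len(neighbors))
        node = int(neighbors[np.argmax(signal)])
        path.append(node)
    return path

# Ring graph with 10 nodes as a small test environment
n = 10
ring = np.zeros((n, n))
for i in range(n):
    ring[i, (i + 1) % n] = ring[i, (i - 1) % n] = 1.0

path = patrol(ring, gain=0.3, habituation=0.5, recovery_time=20.0,
              noise=0.0, n_steps=30)
```

Without noise, the agent on this ring keeps moving away from the node it has just left and circles the graph repeatedly, because the most recently visited neighbor always carries the lower sensitivity. Raising the noise argument lets the behavior drift toward a random walk.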
Finally, this suggests a unified explanation for exploration and patrolling: in both modes, the agent follows the output of the ‘neglect’ cell, which is just the sum total of the map output. However, in the early exploration phase, when the agent is still assembling the cognitive map, it gives the neglect signal zero or low weight, so the turning decisions are dominated by the readout noise and produce something close to a random walk. Later on, the agent assigns a higher weight to the neglect signal, so it exceeds the readout noise and shifts the behavior toward systematic patrolling. In our simulations, an intrinsic readout noise of is sufficiently low to enable even a perfect patrol path (Figure 9B).
In summary, the core model of endotaxis can be enhanced by adding a basic form of habituation at the input neurons. This allows the agent to implement an effective patrolling policy that steers towards regions which have been neglected for a while. Of course, habituation among point cells will also change the dynamics of map learning during the exploration phase. We found that both map and goal synapses are still learned effectively, and navigation to targets is only minimally affected by habituation (Figure 11).
Discussion
Summary of claims
We have presented a biologically plausible neural mechanism that can support learning, navigation, and problem solving in complex environments. The algorithm, called endotaxis, offers an end-to-end solution for assembling a cognitive map (Figure 4), memorizing interesting targets within that map, navigating to those targets (Figure 5), as well as accessory functions like instant homing (Figure 8) and effective patrolling (Figure 9). Conceptually, it is related to chemotaxis, namely the ability to follow an odor signal to its source, which is shared universally by most or all motile animals. The endotaxis network creates an internal ‘virtual odor’ which the animal can follow to reach any chosen target location (Figure 1). When the agent begins to explore the environment, the network learns both the structure of the space, namely which points are connected, and the location of valuable resources (Figure 4), even after a single experience (Figures 4 and 8). The agent can then navigate back to those target locations efficiently from any point in the environment (Figure 5). Beyond spatial navigation, endotaxis can also learn the solution to purely cognitive tasks (Figure 5) that can be formulated as search on a graph (section ‘Theory of endotaxis’). It takes as given two elementary facts: the existence of place cells that fire when the animal is at a specific location, and a behavioral module that allows the animal to follow an odor gradient uphill. The proposed circuit (Figure 1) provides the interface from the place cells to the virtual odor gradient. In the following sections, we consider how these findings relate to phenomena of animal behavior and neural circuitry, and prior art in the area of theory and modeling.
Theories and models of spatial learning
Broadly speaking, endotaxis can be seen as a form of reinforcement learning (Sutton and Barto, 2018): the agent learns from rewards or punishments in the environment and develops a policy that allows for subsequent navigation to special locations. The goal signal in endotaxis plays the role of a value function in reinforcement learning theory. From experience, the agent learns to compute that value function for every location and control its actions accordingly. Within the broad universe of reinforcement learning algorithms, endotaxis combines some special features as well as limitations that are inspired by empirical phenomena of animal learning, and also make it suitable for a biological implementation.
First, most of the learning happens without any reinforcement. During the exploratory random walk, endotaxis learns the topology of the environment, specifically by updating the synapses in the map network ( in Figure 1B). Rewards are not needed for this map learning, and indeed the goal signal remains zero during this period (Figure 4). Once a reward is encountered, the goal synapses ( in Figure 1B) get set, and the goal signal instantly spreads through the known portion of the environment. Thus, the agent learns how to navigate to the goal location from a single reinforcement (Figure 4). This is possible because the ground has been prepared, as it were, by learning a map. In animal behavior, the acquisition of a cognitive map without rewards is called latent learning. Early debates in animal psychology pitched latent learning and reinforcement learning as alternative explanations (Thistlethwaite, 1951). Instead, in the endotaxis algorithm, neither can function without the other as the goal signal explicitly depends on both the map and goal synapses (Equation 18, Algorithm 1).
More specifically, the neural signals in endotaxis bear some similarity to the so-called successor representation (Dayan, 1993; Corneil and Gerstner, 2015; Stachenfeld et al., 2017; Garvert et al., 2017; Fang et al., 2023). This is a proposal for how the brain might encode the current state of the agent, intended to simplify the mathematics of temporal-difference reinforcement learning. In that representation, there is a neuron for every state of the agent, and the activity of neuron is the time-discounted probability that the agent will find itself at state in the future. Similarly, the output of the endotaxis map network is related to future states of the agent and follows a similar functional dependence on distance (Meister, 2023, Equation 7). However, despite these formal similarities, the underlying logic is quite different. In the successor representation, plays the role of a temporal discount factor for rewards (Dayan, 1993); essentially it is the proportionality factor in the agent’s belief that ‘time is money.’ In this picture, varying allows the agent to make predictions with different time horizons (Fang et al., 2023; Stachenfeld et al., 2017). In endotaxis, there is no time/reward tradeoff. The agent simply wants the shortest path to the goal. The map network reflects the objective connectivity of the environment to the farthest extent possible. Here is the gain of the map neurons that, when properly chosen, allows the neural network to perform that computation. The agent may want to tune to the statistics of the environment, although we showed that a common value of works quite well across environments (Figure 6). (These differences in how the problem is formulated can lead to slightly different mathematical expressions, for example, compare the role of in Equation 7 with Equation 2 of Fang et al., 2023.)
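The formal similarity and the difference can both be made concrete in a sketch. It uses the standard matrix form of the successor representation under a random-walk policy, (I - discount * T)^-1 with transition matrix T, and assumes that the endotaxis map output takes the form (I - gain * A)^-1 with adjacency matrix A. The latter is an assumption about Equation 7, which is not reproduced in this section, and the numerical values are arbitrary.

```python
import numpy as np

def successor_representation(A, discount):
    """Successor representation for a random walk on the graph."""
    T = A / A.sum(axis=1, keepdims=True)      # row-normalized transition matrix
    return np.linalg.inv(np.eye(len(A)) - discount * T)

def map_output(A, gain):
    """Assumed endotaxis map output, built from the adjacency matrix itself."""
    return np.linalg.inv(np.eye(len(A)) - gain * A)

# Ring graph: every node has two neighbors, so T = A / 2
n = 12
ring = np.zeros((n, n))
for i in range(n):
    ring[i, (i + 1) % n] = ring[i, (i - 1) % n] = 1.0
same_on_ring = np.allclose(successor_representation(ring, 0.8),
                           map_output(ring, 0.4))

# Linear track of four nodes: the end nodes have only one neighbor
track = np.zeros((4, 4))
for i in range(3):
    track[i, i + 1] = track[i + 1, i] = 1.0
sr_track = successor_representation(track, 0.8)
map_track = map_output(track, 0.4)
sr_is_symmetric = np.allclose(sr_track, sr_track.T)
map_is_symmetric = np.allclose(map_track, map_track.T)
```

On a graph where every node has the same number of neighbors the two expressions coincide after rescaling the parameter. On the linear track they do not: the successor representation depends on the direction of travel through the policy, whereas the adjacency-based output is symmetric and reflects only the connectivity.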
Second, endotaxis does not tabulate the list of available actions at each state. That information remains externalized in the environment: the agent simply tries whatever actions are available at the moment, then picks the best one. This is a characteristically biological mode of action and most organisms have a behavioral routine that executes such trial-and-error. This ‘externalized cognition’ simplifies the learning task: for any given navigation policy, the agent needs to learn only one scalar function of location, namely the goal signal. By comparison, many machine learning algorithms develop a value function for state–action pairs, which then allows more sophisticated planning (Sutton and Barto, 2018; Moerland et al., 2023). The relative simplicity of the endotaxis circuit depends on the limitation to learning only state functions.
Some key elements of the endotaxis model have appeared in prior work, starting with the notion of ascending a scalar goal signal during navigation (Schmajuk and Thieme, 1992; Voicu and Schmajuk, 2000; Samsonovich and Ascoli, 2005). Several models assume the existence of a map layer, in which individual neurons stand for specific places, and the excitatory synapses between neurons represent the connections between those places (Gaussier et al., 2002; Schölkopf and Mallot, 1995; Voicu and Schmajuk, 2000; Trullier and Meyer, 2000; Martinet et al., 2011; Ponulak and Hopfield, 2013; Khajeh-Alijani et al., 2015). Then the agent somehow reads out those connections in order to find the shortest path between its current location (the start node) and a desired target (the end node).
Very different schemes have been proposed for this readout of the map. The most popular scheme is to somehow inject a signal into the desired end node, let it propagate backward through the network, and read out the magnitude or gradient of the signal near the start node (Glasius et al., 1996; Gaussier et al., 2002; Gorchetchnikov and Hasselmo, 2005; Martinet et al., 2011; Ponulak and Hopfield, 2013; Khajeh-Alijani et al., 2015). In general, this requires some accessory system that can look up which neuron in the map corresponds to the desired end node, and which neuron to the agent’s current location or its neighbors; often these accessory functions remain unspecified (Schölkopf and Mallot, 1995; Voicu and Schmajuk, 2000; Khajeh-Alijani et al., 2015). By contrast, in the endotaxis model the signal is propagated in the forward direction starting with the activity of the place cell at the agent’s current location. The signal strength is read out at the goal location: The goal neuron is the same neuron that also responds directly to the rewarding feature at the goal location. For example, the proximity to water is read out by a neuron that is also excited when the animal drinks water. In this way, the brain does not need to maintain a separate lookup table for goal neurons. If the agent wants to find water, it should simply follow the same neuron that fires when it drinks.
Another distinguishing feature of endotaxis is that it operates continuously. Many models for navigation have to separate the phase of spatial learning from the phase of goal-directed navigation. Sometimes plasticity needs to be turned off or reset during one phase or the other (Samsonovich and Ascoli, 2005; Ponulak and Hopfield, 2013). Sometimes a special signal must be injected during goal-seeking (Voicu and Schmajuk, 2000). Sometimes the rules change depending on whether the agent approaches or leaves a target (Blum and Abbott, 1996). Again this requires additional supervisory systems that often go unexplained. By contrast, endotaxis is ‘always on.’ Whether the animal explores a new environment, navigates to a target, or patrols a well-known graph, the synaptic learning rules are always the same. The animal chooses its policy by setting the mode switch that selects one of the available goal signals for the taxis module (Figure 1). Nothing has to change under the hood in the operation of the circuit. All the same signals are used for map learning, target learning, and navigation.
In summary, various components of the endotaxis model have appeared in other proposed schemes for spatial learning and navigation. The present model stands out in that all the essential functions are covered in a feed-forward and neuromorphically plausible manner, without invoking unexplained control schemes.
Animal behavior
The millions of animal species no doubt use a wide range of mechanisms to get around their environment, and it is worth specifying which types of navigation endotaxis might solve. First, the learning mechanism proposed here applies to complex environments, namely those in which discrete paths form sparse connections between points. For a rodent and many other terrestrial animals, the paths they may follow are usually constrained by obstacles or by the need to remain under cover. In those conditions, the brain cannot assume that the distance between points is given by Euclidean geometry, or that beacons for a goal will be visible in a straight line from far away, or that a target can be reached by following a known heading. As a concrete example, a mouse wishing to exit from deep inside a labyrinth (Figure 8A, Rosenberg et al., 2021) can draw little benefit from knowing the distance and heading of the entrance.
Second, we are focusing on the early experience with a new environment. Endotaxis can get an animal from zero knowledge to a cognitive map that allows reliable navigation toward goals discovered on a previous foray. It explains how an animal can return home from inside a complex environment on the first attempt (Rosenberg et al., 2021) or navigate to a special location after encountering it just once (Figures 6 and 8). But it does not implement more advanced routines of spatial reasoning, such as stringing a habitual sequence of actions together into one, or deliberating internally to plan entire routes. Clearly, given enough time in an environment, animals may develop algorithms other than the beginner’s choice proposed here.
A key characteristic of endotaxis, distinct from other forms of navigation, is the reliance on trial-and-error. The agent does not deliberate to plan the shortest path to the goal. Instead, it finds the shortest path by locally sampling the real-world actions available at its current point, and choosing the one that maximizes the virtual odor signal. In fact, there is strong evidence that animals navigate by real-world trial-and-error, at least in the early phase of learning (Redish, 2016). Lashley (1912), in his first scientific paper on visual discrimination in the rat, reported that rats at a decision point often hesitate ‘with a swaying back and forth between the passages.’ These actions – called ‘vicarious trial and error’ – look eerily like sniffing out an odor gradient, but they occur even in the absence of any olfactory cues. Similar behaviors occur in arthropods (Tarsitano, 2006) and humans (Santos-Pata and Verschure, 2018) when poised at a decision point. We suggest that the animal does indeed sample a gradient, not of an odor, but of an internally generated virtual odor that reflects the proximity to the goal. The animal seems to use the same policy of spatial sampling that it would apply to a real odor signal.
Frequently, a rodent stopped at a maze junction merely turns its head side-to-side, rather than walking down a corridor to sample the gradient. Within the endotaxis model, this could be explained if some of the point cells in the lowest layer (Figure 1B) are selective for head direction or for the view down a specific corridor. During navigation, activation of that ‘direction cell’ systematically precedes activation of point cells further down that corridor. Therefore, the direction cell gets integrated into the map network. From then on, when the animal turns in that direction, this action takes a step along the graph of the environment without requiring a walk in ultimately fruitless directions. In this way, the agent can sample the goal gradient while minimizing energy expenditure.
Once the animal gains familiarity with the environment, it performs fewer of the vicarious trial-and-error movements, and instead moves smoothly through multiple intersections in a row (Redish, 2016). This may reflect a transition between different modes of navigation, from the early endotaxis, where every action gets evaluated on its real-world merit, to a mode where many actions are strung together into behavioral motifs. Eventually the animal may also develop an internal forward model for the effects of its own actions, which would allow for prospective planning of an entire route (Kay et al., 2020; Nyberg et al., 2022). An interesting direction for future research is to seek a neuromorphic circuit model for such action planning; perhaps it can be built naturally on top of the endotaxis circuit.
Brain circuits
The key elements in the proposed circuitry (Figure 1) are a large population of neurons with sparsely selective responses; massive convergence from that population onto a smaller set of output neurons; and synaptic plasticity at the output neurons gated by signals from the animal’s experience. A prominent instance of this motif is found in the mushroom body of the arthropod brain (Heisenberg, 2003; Strausfeld et al., 2009). Here the Kenyon cells, with their sparse odor responses (Stopfer, 2014), play the role of both point and map cells. They are strongly recurrently connected; in fact, most of the Kenyon cell output synapses are onto other Kenyon cells (Eichler et al., 2017; Takemura et al., 2017). Kenyon cells converge onto a much smaller set of mushroom body output neurons (Aso et al., 2014), which play the role of goal cells. Plasticity at the synapse between Kenyon cells and output neurons is gated by neuromodulators that encode rewards or punishments (Cohn et al., 2015). Mushroom body output neurons are known to guide the turning decisions of the insect (Aso et al., 2014), perhaps through their projections to the central complex (Li et al., 2020), an area critical to the animal’s turning behavior (Honkanen et al., 2019). Conceivably, this is where the insect’s basic chemotaxis module is implemented.
In the conventional view, the mushroom body helps with odor discrimination and forms memories of discrete odors that are associated with salient experience (Heisenberg, 2003). Subsequently, the animal can seek or avoid those odors. But the endotaxis model suggests a different interpretation: insects can also use odors as landmarks in the environment. In this more general form of navigation, the odor is not a goal in itself, but serves to mark a route toward some entirely different goal (Knaden and Graham, 2016; Steck et al., 2009). A Kenyon cell, through its sparse odor selectivity, may be active at only one place in the environment, and thus provide the required location-selective input to the endotaxis circuit. Recurrent synapses among Kenyon cells will learn the connectivity among these odor-defined locations, and the output neurons will learn to produce a goal signal that leads the insect to a rewarding location, which itself may not even have a defined odor.
Bees and certain ants rely strongly on vision for their navigation. Here the insect uses discrete panoramic views of the landscape as markers for its location (Webb and Wystrach, 2016; Buehlmann et al., 2020; Sun et al., 2020). In those species, the mushroom body receives massive input from visual areas of the brain. If the Kenyon cells respond sparsely to the landscape views, like the point cells in Figure 1, then the mushroom body can tie together these discrete vistas into a cognitive map that supports navigation toward arbitrary goal locations.
The same circuit motifs are commonly found in other brain areas, including the mammalian neocortex and hippocampus. While the synaptic circuitry there is less understood than in the insect brain, one can record from neurons more conveniently. Much of that work on neuronal signals during navigation has focused on the rodent hippocampal formation (Moser et al., 2015), and it is instructive to compare these recordings to the expectations from the endotaxis model. The three cell types in the model – point cells, map cells, and goal cells – all have place fields, in that they fire preferentially in certain regions within the graph of the environment. However, they differ in important respects.
The place field is smallest for a point cell; somewhat larger for a map cell, owing to recurrent connections in the map network; and larger still for goal cells, owing to additional pooling in the goal network. Such a wide range of place field sizes has indeed been observed in surveys of the rodent hippocampus, spanning at least a factor of 10 in diameter (Wilson and McNaughton, 1993; Kjelstrup et al., 2008). Some place cells show a graded firing profile that fills the available environment. Furthermore, one finds more place fields near the goal location of a navigation task, even when that location has no overt markers (Hollup et al., 2001). Both of those characteristics are expected of the goal cells in the endotaxis model.
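The ordering of place field sizes can be illustrated with a small numerical sketch. It assumes a linear map network on a ring graph whose output is v = u + gamma * A v, and a goal cell whose input synapses copy the map output at the goal node. Both are simplifying assumptions made for this illustration, and the ring size, gain, and node index are hypothetical.

```python
import numpy as np

def ring_adjacency(n):
    """Adjacency matrix of a ring graph with n nodes."""
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = 1.0
        A[i, (i - 1) % n] = 1.0
    return A

def field_width(profile, fraction=0.5):
    """Number of nodes where the response exceeds a fraction of its peak."""
    return int(np.sum(profile > fraction * profile.max()))

n, gamma, home = 21, 0.45, 10        # hypothetical ring size, gain, and node index
A = ring_adjacency(n)

# Assumed linear map network: v = u + gamma * A @ v, so v = Y @ u
Y = np.linalg.inv(np.eye(n) - gamma * A)

point_field = np.eye(n)[home]        # point cell fires only at node `home`
map_field = Y[home]                  # map cell response versus agent position
goal_field = Y[home] @ Y             # goal cell pooling the map output (assumed)

widths = [field_width(f) for f in (point_field, map_field, goal_field)]
print(widths)                        # widths increase from point to map to goal cell
```

Under these assumptions the recurrent connections broaden the map cell's field relative to the point cell, and the additional pooling broadens the goal cell's field further.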
The endotaxis model assumes that point cells exist from the very outset in any environment. Indeed, many place cells in the rodent hippocampus appear within minutes of the animal’s entry into an arena (Wilson and McNaughton, 1993; Frank et al., 2004). Furthermore, any given environment activates only a small fraction of these neurons. Most of the ‘potential place cells’ remain silent, presumably because their sensory trigger feature does not match any of the locations in the current environment (Alme et al., 2014; Epsztein et al., 2011). In the endotaxis model, each of these sets of point cells is tied into a different map network, which would allow the circuit to maintain multiple cognitive maps in memory (Muller et al., 1991).
Goal cells, on the other hand, are expected to have large place fields, centered on a goal location, but extending over much of the environment, so the animal can follow the gradient of their activity (Burgess and O’Keefe, 1996). Indeed, such cells have been reported in rat cortex (Hok et al., 2005). In the endotaxis model, a goal cell appears suddenly when the animal first arrives at a memorable location, the input synapses from the map network are potentiated, and the neuron immediately develops a place field (Figure 4). This prediction is reminiscent of a startling experimental observation in recordings from hippocampal area CA1: a neuron can suddenly start firing with a fully formed place field that may be located anywhere in the environment (Bittner et al., 2017). This event appears to be triggered by a calcium plateau potential in the dendrites of the place cell, which potentiates the excitatory synaptic inputs the cell receives. A surprising aspect of this discovery was the large extent of the resulting place field, which requires the animal several seconds to cover. Subsequent cellular measurements indeed revealed a plasticity mechanism that extends over several seconds (Magee and Grienberger, 2020). The endotaxis model relies on just such a plasticity rule for map learning (Algorithm 2) that can correlate events at subsequent nodes on the agent’s trajectory.
Outlook
Endotaxis is a hypothetical neural circuit solution to the problems of spatial exploration, learning, and navigation. Its compact circuit structure and all-in-one functionality suggest that it would fit in even the smallest brains. Effectively, endotaxis represents a brain module that could be interposed between a spatial-sensing module, which produces place cells, and a taxis module, which delivers the movements to ascend a goal signal. It further relies on some high-level policy that sets the ‘mode switch’ by which the animal chooses what goal to pursue. Future research might get at this behavioral control mechanism through a program of anatomical module tracing: first find the neural circuit that controls chemotaxis behavior. Then test if that module receives a convergence of goal signals from other circuits with non-olfactory information. If so, the mechanism of arbitrage that routes one or another goal signal to the taxis module should reveal the high-level coordination of the animal’s behavior. Given the recent technical developments in mapping the connectome (Dorkenwald et al., 2023), we believe that such a program of module tracing is within reach, probably first for the insect brain.
Materials and methods
Simulations
Numerical simulations were performed as described (see Algorithms 1–4). Parameter settings are listed in the text and figure captions. The sensitivity to parameters is reported in Figure 6. Code that produced all the results is available in a public repository.
Average navigated distance
In the text, we often assess the performance of an endotaxis agent by considering point-to-point navigation between all pairs of points on a graph. Given the readout noise that affects the goal signal, navigation is a stochastic process with many random decisions along the route. Different random instantiations of the process will produce routes of different lengths. Fortunately, there is a way to calculate the expectation value of the route length without any Monte Carlo simulation.
Consider navigation to a particular goal node. From the state of the network (the map and goal synapses), we compute the goal signal at every node. When the agent is at a given node, it chooses among the neighbor nodes the one with the highest sum of goal signal and noise (Algorithm 1). Based on the goal signals at those neighbors and the amplitude of the noise, one can compute the probability for each possible step from the current node. This leads to a transition matrix for the random walk, whose entries are these step probabilities.
Subsequent decisions along the route are independent of each other. Hence, the process is a Markov chain. We then make use of a well-known result for first-capture times on a Markov chain to compute the expected number of steps to arrival at the goal starting from any node.
Note that the method assumes the process is a stationary Markov chain, such that the goal signal does not change in the course of navigation. In our analysis of patrolling (Figures 9 and 11), this assumption is violated because the habituation state of the point cells depends on what path the agent took to the current node. In those cases, we resorted to Monte Carlo simulations to estimate the distribution of route lengths.
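The calculation described above can be sketched as follows. This is a minimal illustration, not the code from the repository: it assumes independent Gaussian readout noise, and the function names, the example graph, and the goal signal values are all hypothetical.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def step_probabilities(signals, noise_sd, grid_size=2001):
    """Probability that each neighbor has the highest (goal signal + noise).

    Assumes independent Gaussian noise with standard deviation noise_sd > 0.
    """
    s = np.asarray(signals, dtype=float)
    x = np.linspace(s.min() - 8 * noise_sd, s.max() + 8 * noise_sd, grid_size)
    z = (x[None, :] - s[:, None]) / noise_sd
    pdf = np.exp(-0.5 * z**2) / (noise_sd * math.sqrt(2 * math.pi))
    cdf = 0.5 * (1.0 + _erf(z / math.sqrt(2)))
    probs = np.empty(len(s))
    for k in range(len(s)):
        rivals_below = np.prod(np.delete(cdf, k, axis=0), axis=0)
        probs[k] = np.sum(pdf[k] * rivals_below) * (x[1] - x[0])  # numerical integral
    return probs / probs.sum()

def transition_matrix(adjacency, goal_signal, noise_sd):
    """T[i, k] = probability of stepping from node i to its neighbor k."""
    n = len(goal_signal)
    T = np.zeros((n, n))
    for i in range(n):
        nbrs = np.flatnonzero(adjacency[i])
        T[i, nbrs] = step_probabilities(goal_signal[nbrs], noise_sd)
    return T

def expected_steps(T, goal):
    """Expected number of steps to first arrival at `goal` from every node."""
    n = T.shape[0]
    keep = [i for i in range(n) if i != goal]
    Q = T[np.ix_(keep, keep)]                  # transitions among non-goal nodes
    t = np.linalg.solve(np.eye(len(keep)) - Q, np.ones(len(keep)))
    out = np.zeros(n)
    out[keep] = t
    return out

# Example: path graph 0-1-2-3 with the goal at node 3 (all values hypothetical)
adjacency = np.array([[0, 1, 0, 0],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [0, 0, 1, 0]])
goal_signal = np.array([0.125, 0.25, 0.5, 1.0])
T = transition_matrix(adjacency, goal_signal, noise_sd=0.1)
print(expected_steps(T, goal=3))
```

The last function solves the standard linear system (I - Q) t = 1 for the expected first-capture times, where Q is the transition matrix restricted to the non-goal nodes. With low noise the expected route length approaches the shortest graph distance.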
Nonlinear activation function
The activation function of a map neuron is the relationship between its input and its output, where the input to the map neuron is given in Equation 4. Most of the report assumes a linear activation function (Equation 3). For Figure 7, we used a saturating function instead (Equation 20).
The recurrent network equation was solved numerically using the fsolve routine from Python’s SciPy library.
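A minimal sketch of such a numerical solution is given below. The functional forms are assumptions made for illustration: the map output is taken to satisfy v = f(u + M v), the linear activation is f(w) = gamma * w, and the saturating activation is a hypothetical tanh-shaped function. The gain, saturation level, and graph are likewise hypothetical.

```python
import numpy as np
from scipy.optimize import fsolve

def map_output(u, M, activation):
    """Solve the recurrent equation v = activation(u + M @ v) for v."""
    residual = lambda v: v - activation(u + M @ v)
    return fsolve(residual, x0=np.zeros_like(u))

gamma, alpha = 0.3, 1.0                  # hypothetical gain and saturation level

linear = lambda w: gamma * w
saturating = lambda w: gamma * alpha * np.tanh(w / alpha)   # assumed form

# Small ring of 5 nodes with the agent at node 0
n = 5
M = np.zeros((n, n))
for i in range(n):
    M[i, (i + 1) % n] = M[i, (i - 1) % n] = 1.0
u = np.zeros(n)
u[0] = 1.0

v_lin = map_output(u, M, linear)
v_sat = map_output(u, M, saturating)
print(v_lin)
print(v_sat)
```

In the linear case the result can be checked against the closed-form solution gamma * (I - gamma * M)^-1 u. With the saturating function no closed form is available, which is why a root finder is used.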
Forgetting of links and resources
In section ‘Acquisition of map and targets during exploration,’ we discuss the learning algorithm that acquires the connectivity of the environment and the locations of resources. It reacts rapidly to the appearance of new links in the environment: as soon as the agent travels from one point to another, the synapse between the corresponding map cells gets established. Suppose now that a previously existing link becomes blocked: how can one remove the corresponding synapse from the map? A simple solution would be to let all synapses decay over time, balanced by strengthening whenever a link gets traveled. In that case, however, the entire map would be forgotten when the animal goes to sleep for a few hours, whereas it is clear that animals retain such maps over many days. Instead, one wants a mode of active forgetting: memory of the link from one node to another should be weakened only if the agent finds itself at the first node and repeatedly chooses not to go to the second. We formalize this in Algorithm 4, which differs only slightly from Algorithm 2.
Algorithm 4. Learning and forgetting.
Parameters: gain, threshold, goal-learning rate, forgetting rate
Input: adjacency matrix, resource signals
initiate map synapses at 0
initiate goal synapses at 0
initiate the step counter at 0
start the random walk at an initial node
while learning do
    move to a random neighbor of the current node (continue the random walk)
    for every point cell: compute the point cell output
    compute the map cell output
    for all map cell pairs do
        if pre-synaptic activity is high then
            if post-synaptic activity is also high then
                potentiate the synapse
            else (post-synaptic activity is low)
                depress the synapse
            end if
        end if
    end for
    compute the goal signals
    for every goal neuron do
        compute the difference between resource signal and prediction from the map
        if the resource signal exceeds the prediction from the map then
            for every map neuron do
                potentiate goal synapses
            end for
        else (resource signal less than prediction)
            for every map neuron do
                depress goal synapses
            end for
        end if
    end for
end while
Here the added parameter, the forgetting rate, determines how much a map synapse gets depressed each time the corresponding link is not chosen. Similarly, goal synapses decay if their prediction for a resource exceeds the resource signal received by the goal cell. The synaptic learning rule resembles the BCM rule (Bienenstock et al., 1982): synaptic modification is conditional on presynaptic activity and leads to either potentiation or depression depending on the level of postsynaptic activity.
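The two update rules can be sketched in code as follows. This is an illustrative reading of the verbal description, not the repository implementation: the indexing convention, the exclusion of self-synapses, the saturation of potentiated map synapses at 1, and the exact form of the goal synapse update (error times map activity) are assumptions, as are all function and parameter names.

```python
import numpy as np

def update_map_synapses(M, pre, post, theta, delta):
    """Return updated map synapses; M[i, j] is the synapse from cell j onto cell i."""
    pre_high = np.asarray(pre) > theta
    post_high = np.asarray(post) > theta
    both_high = post_high[:, None] & pre_high[None, :]
    pre_only = ~post_high[:, None] & pre_high[None, :]
    np.fill_diagonal(both_high, False)     # no self-synapses in this sketch
    np.fill_diagonal(pre_only, False)
    M = M.copy()
    M[both_high] = 1.0                     # pre and post high: potentiate
    M[pre_only] *= 1.0 - delta             # pre high, post low: depress
    return M

def update_goal_synapses(G, v, resource, alpha, delta):
    """Return updated goal synapses; G[k, i] connects map cell i to goal cell k."""
    G = G.copy()
    error = resource - G @ v               # resource signal minus map prediction
    for k, err in enumerate(error):
        rate = alpha if err > 0 else delta # learn at rate alpha, forget at rate delta
        G[k] += rate * err * v
    return G

# Example with three map cells and one goal cell (values hypothetical)
M = update_map_synapses(np.full((3, 3), 0.5), pre=[1, 0, 0], post=[0, 1, 0],
                        theta=0.5, delta=0.1)
G = update_goal_synapses(np.zeros((1, 3)), np.array([1.0, 0.5, 0.0]),
                         np.array([1.0]), alpha=1.0, delta=0.1)
print(M)
print(G)
```

In both rules the change is gated by presynaptic activity, and its sign depends on the postsynaptic side, which is the sense in which the rule resembles BCM.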
Figure 10 illustrates this process with a simulation analogous to Figure 4. The agent explores a ring graph by a random walk. At some point, a new link appears clear across the ring. Later on, that link disappears again. Acquisition of the link happens very quickly, within a single time step (Figure 10A and C). Forgetting that link takes longer, on the order of several hundred steps (Figure 10A, D and E). In this simulation, the forgetting rate is set such that the map synapses decay by about 10% whenever a link is not traveled. One could, of course, accelerate forgetting with a higher rate, but at the cost of destabilizing the entire map. Even the synapses for intact links get depressed frequently (Figure 10E) because the random choices of the agent lead it to take any given link only a fraction of the time.
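As a rough check on the time scale of forgetting, one can count how many depression events a synapse needs before it becomes negligible. The sketch below assumes purely multiplicative decay from an initial weight of 1; the cutoff value and the function name are hypothetical.

```python
import math

def depressions_to_forget(delta, cutoff):
    """Depression events needed for a weight of 1 to fall to `cutoff` or below."""
    return math.ceil(math.log(cutoff) / math.log(1.0 - delta))

# With a 10% decay per event, reaching 1% of the original weight takes 44 events
print(depressions_to_forget(delta=0.1, cutoff=0.01))
```

Each such event requires the agent to be at the relevant node and to choose a different link, so the number of random-walk steps needed is considerably larger than the number of depression events.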
One limitation of the endotaxis agent is that it does not keep a record of what actions are available at each node. Instead, it leaves that information in the environment (see ‘Discussion’) and simply tries all the actions that are available. When faced with a blocked tunnel, the endotaxis agent does not know that this was previously available. Clearly, a more advanced model of the world that includes a state–action table would allow more effective editing of the cognitive map.
Habituation in point cells
In section ‘Efficient patrolling,’ we discuss an extension of the core endotaxis model in which a point neuron undergoes habituation after the agent passes through its node. With every visit, the neuron’s sensitivity declines by a fixed factor. Between visits, the sensitivity gradually returns toward 1 with an exponential recovery time measured in steps (see Algorithm 3).
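The habituation dynamics described above can be sketched as a single update per step. The parameter names `beta` (decline factor per visit) and `tau` (recovery time in steps) are hypothetical labels, and the order of recovery and decline within a step is an assumption.

```python
import math

def habituate(sensitivity, visited, beta, tau):
    """Advance a point cell's sensitivity by one step of the walk."""
    # Exponential recovery toward 1 with time constant tau (in steps)
    sensitivity = 1.0 - (1.0 - sensitivity) * math.exp(-1.0 / tau)
    if visited:
        sensitivity *= beta          # each visit scales the sensitivity down
    return sensitivity

# One visit followed by a long interval without visits (values hypothetical)
s = habituate(1.0, visited=True, beta=0.5, tau=100.0)
trace = [s]
for _ in range(1000):
    s = habituate(s, visited=False, beta=0.5, tau=100.0)
    trace.append(s)
print(trace[0], trace[-1])
```

Setting `beta` to 1 recovers the basic model with no habituation.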
This addition to the model changes the dynamics of the network input throughout the phases of exploration, navigation, and patrolling. We explored how the resulting performance is affected by applying a strong habituation that decays slowly and comparing to the basic model with no habituation. During the learning phase, when the map and goal synapses are established via a random walk, the main change is that it takes somewhat longer to learn the map. This is because synaptic updates happen only when both pre- and postsynaptic map cells exceed a threshold (see Algorithm 2), and that requires that both of the respective point neurons be in a high-sensitivity state. Remarkably, all the parameter settings that support learning and navigating under standard conditions (Figure 6) also work well when habituation takes place.
To illustrate the overall effect that habituation has on performance, we simulated learning and navigation on the binary tree graph of Figure 9. For every pair of start and end nodes, we asked how the actual navigated distance compared to the shortest graph distance. Figure 11 shows that performance is affected only slightly. At the standard noise value used in other simulations, the range of navigation extends over 10 or more steps under both conditions.
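The reference quantity in this comparison, the shortest graph distance between every pair of nodes, can be obtained by breadth-first search. The sketch below uses a complete binary tree with a hypothetical node numbering; the graph of Figure 9 may differ in its details.

```python
from collections import deque

def binary_tree(depth):
    """Adjacency list of a complete binary tree; node 0 is the root."""
    n = 2 ** (depth + 1) - 1
    adj = {i: [] for i in range(n)}
    for child in range(1, n):
        parent = (child - 1) // 2
        adj[parent].append(child)
        adj[child].append(parent)
    return adj

def shortest_distances(adj, start):
    """Breadth-first search: graph distance from `start` to every node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

adj = binary_tree(depth=2)
all_pairs = {start: shortest_distances(adj, start) for start in adj}
print(all_pairs[3][6])   # two leaves on opposite sides of the root
```

Dividing the navigated distance for each pair by the corresponding entry of `all_pairs` gives the kind of performance ratio discussed above.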
Data availability
Data and code to reproduce the reported results are openly available at https://github.com/markusmeister/Endotaxis-2023 (copy archived at Meister, 2024).
References
- Olfaction, navigation, and the origin of isocortex. Frontiers in Neuroscience 9:402. https://doi.org/10.3389/fnins.2015.00402
- Algorithms for olfactory search across species. The Journal of Neuroscience 38:9383–9389. https://doi.org/10.1523/JNEUROSCI.1668-18.2018
- Book: A physicist looks at bacterial Chemotaxis. In: Berg HC, editors. Cold Spring Harbor Symposia on Quantitative Biology. Elsevier. pp. 1–9. https://doi.org/10.1101/sqb.1988.053.01.003
- A model of spatial map formation in the hippocampus of the rat. Neural Computation 8:85–93. https://doi.org/10.1162/neco.1996.8.1.85
- Memory use in insect visual navigation. Nature Reviews. Neuroscience 3:542–552. https://doi.org/10.1038/nrn872
- Conference: Attractor network Dynamics enable Preplay and rapid path planning in maze–like environments. Advances in Neural Information Processing Systems.
- Book: Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems. Computational Neuroscience. Cambridge, Mass: MIT Press.
- Hippocampal plasticity across multiple days of exposure to novel environments. The Journal of Neuroscience 24:7681–7689. https://doi.org/10.1523/JNEUROSCI.1958-04.2004
- Hebbian learning of recurrent connections: A geometrical perspective. Neural Computation 24:2346–2383. https://doi.org/10.1162/NECO_a_00322
- Mathematical formulations of Hebbian learning. Biological Cybernetics 87:404–415. https://doi.org/10.1007/s00422-002-0353-y
- A biologically inspired neural net for trajectory formation and obstacle avoidance. Biological Cybernetics 74:511–520. https://doi.org/10.1007/BF00209422
- Mushroom body memoir: from maps to models. Nature Reviews. Neuroscience 4:266–275. https://doi.org/10.1038/nrn1074
- Accumulation of hippocampal place fields at the goal location in an annular watermaze task. The Journal of Neuroscience 21:1635–1644. https://doi.org/10.1523/JNEUROSCI.21-05-01635.2001
- The insect central complex and the neural basis of navigational strategies. The Journal of Experimental Biology 222:Suppl. https://doi.org/10.1242/jeb.188854
- The sensory ecology of ant navigation: from natural environments to neural mechanisms. Annual Review of Entomology 61:63–76. https://doi.org/10.1146/annurev-ento-010715-023703
- Visual discrimination of size and form in the albino rat. Journal of Animal Behavior 2:310–331. https://doi.org/10.1037/h0071033
- Synaptic plasticity forms and functions. Annual Review of Neuroscience 43:95–117. https://doi.org/10.1146/annurev-neuro-090919-022842
- Spatial learning and action planning in a prefrontal cortical network model. PLOS Computational Biology 7:e1002045. https://doi.org/10.1371/journal.pcbi.1002045
- Software: Endotaxis-2023, version swh:1:rev:7c97e345063101f15c59ab9d321a3eea9809fa8b. Software Heritage.
- Place cells, grid cells, and memory. Cold Spring Harbor Perspectives in Biology 7:a021808. https://doi.org/10.1101/cshperspect.a021808
- Rapid, parallel path planning by propagating wavefronts of spiking neural activity. Frontiers in Computational Neuroscience 7:98. https://doi.org/10.3389/fncom.2013.00098
- Vicarious trial and error. Nature Reviews. Neuroscience 17:147–159. https://doi.org/10.1038/nrn.2015.30
- Human vicarious trial and error is predictive of spatial navigation performance. Frontiers in Behavioral Neuroscience 12:237. https://doi.org/10.3389/fnbeh.2018.00237
- Purposive behavior and cognitive mapping: A neural network model. Biological Cybernetics 67:165–174. https://doi.org/10.1007/BF00201023
- View-Based cognitive mapping and path planning. Adaptive Behavior 3:311–348. https://doi.org/10.1177/105971239500300303
- Navigating for reward. Nature Reviews. Neuroscience 22:472–487. https://doi.org/10.1038/s41583-021-00479-z
- The hippocampus as a predictive map. Nature Neuroscience 20:1643–1653. https://doi.org/10.1038/nn.4650
- Central processing in the mushroom bodies. Current Opinion in Insect Science 6:99–103. https://doi.org/10.1016/j.cois.2014.10.009
- Ground plan of the insect mushroom body: functional and evolutionary implications. The Journal of Comparative Neurology 513:265–291. https://doi.org/10.1002/cne.21948
- A critical review of latent learning and related experiments. Psychological Bulletin 48:97–129. https://doi.org/10.1037/h0055171
- Animat navigation using a cognitive graph. Biological Cybernetics 83:271–285. https://doi.org/10.1007/s004220000170
- Exploration, navigation and cognitive mapping. Adaptive Behavior 8:207–223. https://doi.org/10.1177/105971230000800301
- Neural mechanisms of insect navigation. Current Opinion in Insect Science 15:27–39. https://doi.org/10.1016/j.cois.2016.02.011
Article and author information
Funding
Simons Foundation (543015)
- Markus Meister
Simons Foundation (543025)
- Pietro Perona
National Science Foundation (1564330)
- Pietro Perona
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
This work was supported by the Simons Collaboration on the Global Brain (grant 543015 to MM and 543025 to PP), NSF award 1564330 to PP, and a gift from Google to PP.
Cite all versions
You can cite all versions using the DOI https://doi.org/10.7554/eLife.84141. This DOI represents all versions, and will always resolve to the latest one.
Copyright
© 2023, Zhang et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.