Learning accurate path integration in ring attractor models of the head direction system
Abstract
Ring attractor models for angular path integration have received strong experimental support. To function as integrators, head direction circuits require precisely tuned connectivity, but it is currently unknown how such tuning could be achieved. Here, we propose a network model in which a local, biologically plausible learning rule adjusts synaptic efficacies during development, guided by supervisory allothetic cues. Applied to the Drosophila head direction system, the model learns to path-integrate accurately and develops a connectivity strikingly similar to the one reported in experiments. The mature network is a quasi-continuous attractor and reproduces key experiments in which optogenetic stimulation controls the internal representation of heading in flies, and where the network remaps to integrate with different gains in rodents. Our model predicts that path integration requires self-supervised learning during a developmental phase, and proposes a general framework to learn to path-integrate with gain-1 even in architectures that lack the physical topography of a ring.
Editor's evaluation
This paper will be of interest to neuroscientists studying the navigation system, and in particular, those who study the ability of animals to path integrate. This study proposes an elegant synaptic plasticity rule that maintains the connectivity required for path integration by integrating visual and self-motion input arriving at different dendritic locations in a neuron. This idea is applied to the central complex of Drosophila, a well-characterized experimental system.
https://doi.org/10.7554/eLife.69841.sa0
Introduction
Spatial navigation is crucial for the survival of animals in the wild and has been studied in many model organisms (Tolman, 1948; O’Keefe and Nadel, 1978; Gallistel, 1993; Eichenbaum, 2017). To orient themselves in an environment, animals rely on external sensory cues (e.g. visual, tactile, or auditory), but such allothetic cues are often ambiguous or absent. In these cases, animals have been found to update internal representations of their current location based on idiothetic cues, a process that is termed path integration (PI, Darwin, 1873; Mittelstaedt and Mittelstaedt, 1980; McNaughton et al., 1996; Etienne et al., 1996; Neuser et al., 2008; Burak and Fiete, 2009). The head direction (HD) system partakes in PI by performing one of the computations required: estimating the current HD by integrating angular velocities; namely angular integration. Furthermore, head direction cells in rodents and flies provide an internal representation of orientation that can persist in darkness (Ranck, 1984; Mizumori and Williams, 1993; Seelig and Jayaraman, 2015).
In rodents, the internal representation of heading takes the form of a localized "bump" of activity in the high-dimensional neural manifold of HD cells (Chaudhuri et al., 2019). It has been proposed that such a localized activity bump could be sustained by a ring attractor network with local excitatory connections (Skaggs et al., 1995; Redish et al., 1996; Hahnloser, 2003; Samsonovich and McNaughton, 1997; Song and Wang, 2005; Stringer et al., 2002; Xie et al., 2002), resembling reverberation mechanisms proposed for working memory (Wang, 2001). Ring attractor networks used to model HD cells fall in the theoretical framework of continuous attractor networks (Amari, 1977; Ben-Yishai et al., 1995; Seung, 1996). In this setting, HD cells can update the heading representation in darkness by smoothly moving the bump around the ring obeying idiothetic angular-velocity cues.
Interestingly, a physical ring-like attractor network of HD cells was observed in the Drosophila central complex (CX, Seelig and Jayaraman, 2015; Green et al., 2017; Green et al., 2019; Franconville et al., 2018; Kim et al., 2019; Fisher et al., 2019; Turner-Evans et al., 2020). Notably, in Drosophila (from here on simply referred to as ‘fly’), HD cells (named EPG neurons, also referred to as ‘compass’ neurons) are physically arranged in a ring, and an activity bump is readily observable from a small number of cells (Seelig and Jayaraman, 2015). Moreover, as predicted by some computational models (Skaggs et al., 1995; Samsonovich and McNaughton, 1997; Stringer et al., 2002; Song and Wang, 2005), the fly HD system also includes cells (named PEN1 neurons) that are conjunctively tuned to head direction and head angular velocity. We refer to these neurons as head rotation (HR) cells because of their putative role in shifting the HD bump across the network according to the head’s angular velocity (Turner-Evans et al., 2017; Turner-Evans et al., 2020).
A model for PI needs to both sustain a bump of activity and move it with the right speed and direction around the ring. The latter presents a great challenge, since the bump has to be ‘pushed’ by the right amount starting from any location and for all angular velocities. Therefore, ring attractor models that act as path integrators require that synaptic connections are precisely tuned (Hahnloser, 2003). If the circuit were completely hardwired, the amount of information that an organism would need to genetically encode connection strengths would be exceedingly high. Additionally, it would be unclear how these networks could cope with variable sensory experiences. In fact, remarkable experimental studies in rodents have shown that when animals are placed in an augmented reality environment where visual and self-motion information can be manipulated independently, PI capabilities adapt accordingly (Jayakumar et al., 2019). These findings suggest that PI networks are able to self-organize and to constantly recalibrate. Notably, in mature flies there is no evidence for such plasticity (Seelig and Jayaraman, 2015); however, the presence of plasticity has not been tested in young animals.
Here, we propose that a simple local learning rule could support the emergence of a PI circuit during development and its recalibration once the circuit has formed. Specifically, we suggest that accurate PI is achieved by associating allothetic and idiothetic inputs at the cellular level. When available, the allothetic sensory input (here chosen to be visual) acts as a ‘teacher’ to guide learning. The learning rule is an example of self-supervised multimodal learning, where one sense acts as a teaching signal for the other and the need for an external teacher is obviated. It exploits the relation between the allothetic heading of the animal (given by the visual input) and the idiothetic self-motion cues (which are always available), to learn how to integrate the latter.
The learning rule is inspired by previous experimental and computational work on mammalian cortical pyramidal neurons, which are believed to associate inputs to different compartments through an inbuilt cellular mechanism (Larkum, 2013; Urbanczik and Senn, 2014; Brea et al., 2016). In fact, it was shown that in layer 5 pyramidal cells internal and external information about the world arrive at distinct anatomical locations, and active dendritic gating controls learning between the two (Doron et al., 2020). In a similar fashion, we propose that learning PI in the HD system occurs by associating inputs at opposite poles of compartmentalized HD neurons, which we call ‘associative neurons’ (Urbanczik and Senn, 2014; Brea et al., 2016). Therefore, to accomplish PI the learning rule relies on structural inductive biases in terms of the morphology and arborization of HD cells.
In summary, here we show for the first time how a biologically plausible synaptic plasticity rule enables learning and maintaining the complex circuitry required for PI. We apply our framework to the fly HD system because it is well characterized; yet our model setting is general and can be used to learn PI in other animal models once more details about the HD circuit there are known (Abbott et al., 2020). We find that the learned network is a ring attractor with a connectivity that is strikingly similar to the one found in the fly CX (Turner-Evans et al., 2020) and that it can accurately path-integrate in darkness for the entire range of angular velocities that the fly displays. Crucially, the learned network accounts for several key findings in the experimental literature, and it generates predictions, including the presence of plasticity in young animals, that could be tested experimentally.
Results
To illustrate basic principles of how PI could be achieved, we study a computational model of the HD system and show that synaptic plasticity could shape its circuitry through visual experience. In particular, we simulate the development of a network that, after learning, provides a stable internal representation of head direction and uses only angular-velocity inputs to update the representation in darkness. The internal representation of heading (after learning) takes the form of a localized bump of activity in the ring of HD cells. All neurons in our model are rate-based, i.e., spiking activity is not modeled explicitly.
Model setup
The gross model architecture closely resembles the one found in the fly CX (Figure 1A). It comprises HD cells organized in a ring, and HR cells organized in two wings. One wing is responsible for leftward and the other for rightward movement of the internal heading representation. HD cells receive visual input from the so-called ‘ring’ neurons; this input takes the form of a disinhibitory bump centered at the current HD (Figure 1B, Omoto et al., 2017; Fisher et al., 2019). The location of this visual bump in the network is controlled by the current head direction. We simulate head movements by sampling head-turning velocities from an Ornstein-Uhlenbeck process (Materials and methods), and we provide the corresponding velocity input to the HR cells (Figure 1C). HR cells provide direct input to HD cells, and HR cells also receive input from HD cells (Figure 1A). Both HR and HD cells receive global inhibition, which is in line with a putative ‘local’ model of HD network organization (Kim et al., 2017). The connections from HR to HD cells (${W}^{\mathrm{HR}}$) and the recurrent connections among HD cells (${W}^{\text{rec}}$) are assumed to be plastic. The goal of learning is to tune these plastic connections so that the network can achieve PI in the absence of visual input.
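As an illustration of how such head-movement input could be generated, the following minimal sketch samples angular velocities from a zero-mean Ornstein-Uhlenbeck process and integrates them into a heading trace. The time step `dt`, the correlation time `tau`, and the stationary standard deviation `sigma` are illustrative assumptions and are not the values given in Materials and methods; the function names are hypothetical.

```python
import math
import random

def sample_ou_velocity(n_steps, dt=0.01, tau=0.5, sigma=150.0, seed=0):
    """Sample head-turning velocities (deg/s) from a zero-mean OU process.

    tau (s) and sigma (deg/s, stationary standard deviation) are assumed values.
    """
    rng = random.Random(seed)
    v = 0.0
    velocities = []
    for _ in range(n_steps):
        # Euler-Maruyama step: relaxation toward zero plus scaled white noise
        noise = rng.gauss(0.0, 1.0)
        v += -v / tau * dt + sigma * math.sqrt(2.0 * dt / tau) * noise
        velocities.append(v)
    return velocities

def integrate_heading(velocities, dt=0.01, theta0=0.0):
    """Integrate angular velocities into a heading trace (deg), wrapped around the ring."""
    theta = theta0
    headings = []
    for v in velocities:
        theta = (theta + v * dt) % 360.0
        headings.append(theta)
    return headings

velocities = sample_ou_velocity(5000)     # velocity input for the HR cells
headings = integrate_heading(velocities)  # sets the location of the visual bump
```

In this sketch, `velocities` would drive the HR cells, while `headings` would determine where the visual bump is centered in the HD ring.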
The unit that controls plasticity in our network is an ‘associative neuron’. It is inspired by pyramidal neurons of the mammalian cortex whose dendrites act, via backpropagating action potentials, as coincidence detectors for signals arriving from different layers of the cortex and targeting different compartments of the neuron (Larkum et al., 1999). Paired with synaptic plasticity, coincidence detection can lead to long-lasting associations between these signals (Larkum, 2013). To map the morphology of a cortical pyramidal cell to the one of a HD cell in the fly, we first point out that all relevant inputs arrive at the dendrites of HD cells within the ellipsoid body (EB) of the fly (Xu, 2020); moreover, the soma itself is externalized in the fly brain, and it is unlikely to contribute considerably to computations (Gouwens and Wilson, 2009; Tuthill, 2009). We thus link the dendrites of the pyramidal associative neuron to the axon-distal dendritic compartment of the associative HD neuron in the fly, and we link the soma of the pyramidal associative neuron to the axon-proximal dendritic compartment of the associative HD neuron in the fly. Furthermore, we assume that the axon-proximal compartment is electrotonically closer to the axon initial segment, and therefore, similarly to the somatic compartment in pyramidal neurons, inputs there can more readily initiate action potentials. Note that our model does not require active backpropagation of action potentials; passive spread of voltage to the axon-distal compartment would be sufficient (for details, see Materials and methods and Discussion). We also assume that associative HD cells receive visual input (${I}^{\text{vis}}$) in the axon-proximal compartment, and both recurrent input (${W}^{\text{rec}}$) and HR input (${W}^{\text{HR}}$) in the axon-distal compartment; accordingly, we model HD neurons as two-compartment units (Figure 1D).
The associative neuron can learn the synaptic weights of the incoming connections in the axon-distal compartment; therefore, as mentioned, we let ${W}^{\text{rec}}$ and ${W}^{\text{HR}}$ be plastic.
We find that the assumption of spatial segregation of postsynapses of HD cells is consistent with our analysis of EM data from the fly (Xu, 2020). For an example HD (EPG) neuron, Figure 1E shows that head rotation and recurrent inputs (mediated by PEN1 and PEN2 cells, respectively [Turner-Evans et al., 2020]) contact the EPG cell in locations within the EB that are distinct from those of visually responsive neurons R2 and R4d (Omoto et al., 2017; Fisher et al., 2019), as hypothesized. The same pattern was observed for a total of 16 EPG neurons (one for each ‘wedge’ of the EB) that we analyzed (Figure 1—figure supplement 1A). To further support the assumption that visual inputs are separated from recurrent and HR-to-HD inputs, we perform binary classification between the two classes, using SVMs (for details, see Materials and methods). Figure 1—figure supplement 1B shows that prediction of class identity from spatial location alone is excellent on held-out test data (test accuracy >0.95 across neurons and model runs).
The connections from HD to HR cells (${W}^{\text{HD}}$) are assumed to be fixed, and HR cells are modeled as single-compartment units. Projections are organized such that each wing neuron receives input from only one specific HD neuron for every HD (Figure 1A). This simple initial wiring makes HR cells conjunctively tuned to HR and HD, and we assume that it has already been formed, for example, during circuit assembly. We note that the conditions for 1-to-1 wiring and constant amplitude of the HD-to-HR connections can be relaxed, because the learning rule can balance asymmetries in the initial architecture (see Appendix 3). In addition, the connections carrying the visual and angular velocity inputs are also assumed to be fixed. Although plasticity in the visual inputs has been shown to exist (Fisher et al., 2019; Kim et al., 2019), here we focus on how the path-integrating circuit itself originally self-organizes. Therefore, to simplify the setting and without loss of generality, we assume a fixed anchoring to environmental cues as the animal moves in the same environment (for details, see Discussion).
In our model, the visual input acts as a supervisory signal during learning (as in D’Albis and Kempter, 2020), which is used to change weights of synapses onto the axon-distal compartment of HD cells. We utilize the learning rule proposed by Urbanczik and Senn, 2014 (for details, see Materials and methods), which tunes the incoming synaptic connections in the axon-distal compartment in order to minimize the discrepancy between the firing rate of the neuron $f({V}^{a})$ (where ${V}^{a}$ is the axon-proximal voltage, primarily controlled by the visual input) and the prediction of the firing rate by the axon-distal compartment from axon-distal inputs alone, $f(p{V}^{d})$ (where $p$ is a constant and ${V}^{d}$ is the axon-distal voltage, which depends on head rotation velocity). From now on, we refer to this discrepancy as ‘learning error’, or simply ‘error’ (Equation 18; in units of firing rate). The synaptic weight change $\mathrm{\Delta}{W}_{\mathit{\text{pre,post}}}$ from a presynaptic (HD or HR) neuron to a postsynaptic HD neuron is then given by:
$$\mathrm{\Delta}{W}_{\mathit{\text{pre,post}}}=\eta \left[f({V}^{a})-f(p{V}^{d})\right]{\text{P}}_{\text{pre}}$$
where $\eta $ is the constant learning rate and ${\text{P}}_{\text{pre}}$ is the postsynaptic potential from the presynaptic neuron. When implementing this learning rule, we low-pass filter the prospective weight change $\mathrm{\Delta}{W}_{\mathit{\text{pre,post}}}$ to ensure smoothness of learning.
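To make the structure of this update concrete, the sketch below implements one plasticity step for a single synapse: the learning error (actual rate minus the rate predicted from the axon-distal voltage) is multiplied by the learning rate and the presynaptic postsynaptic potential, and the result is low-pass filtered before it is applied. The sigmoidal form of $f$, all parameter values, the first-order filter, and the function names are assumptions made for illustration; the exact implementation is described in Materials and methods.

```python
import math

def firing_rate(v, f_max=150.0, beta=2.5, v_half=1.0):
    """Sigmoidal rate function f(V); shape and parameters are assumed for illustration."""
    return f_max / (1.0 + math.exp(-beta * (v - v_half)))

def plasticity_step(w, dw_filtered, v_a, v_d, psp_pre,
                    eta=0.05, p=1.0, dt=0.001, tau_filter=0.1):
    """One update of a single synapse onto the axon-distal compartment.

    v_a: axon-proximal voltage, v_d: axon-distal voltage,
    psp_pre: postsynaptic potential evoked by the presynaptic neuron.
    """
    # Learning error: actual firing rate minus the axon-distal prediction
    error = firing_rate(v_a) - firing_rate(p * v_d)
    # Prospective weight change: learning rate x error x presynaptic PSP
    dw_raw = eta * error * psp_pre
    # First-order low-pass filter of the prospective change (assumed form)
    dw_filtered += (dt / tau_filter) * (dw_raw - dw_filtered)
    # Apply the filtered change over one time step
    w += dw_filtered * dt
    return w, dw_filtered, error

# The axon-distal input under-predicts the rate, so the synapse is strengthened
w_new, dw_new, err = plasticity_step(0.0, 0.0, v_a=1.5, v_d=0.5, psp_pre=1.0)
```

When the axon-distal prediction matches the firing rate, the error vanishes and the weight stops changing, which is the fixed point the rule drives toward.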
Importantly, this learning rule is biologically plausible because the firing rate of an associative neuron $f({V}^{a})$ is locally available at every synapse in the axon-distal compartment due to the (passive or active) backpropagation of axonal activity to the axon-distal dendrites. The other two signals that enter the learning rule are the voltage of the axon-distal compartment ${V}^{d}$ and the postsynaptic potential P, which are also available locally at the synapse; for details, see Materials and methods. Furthermore, recent behavioral experiments show that conditioning in Drosophila (Zhao et al., 2021) is not well explained by classical correlation-based plasticity, but it can be well accounted for by predictive synaptic plasticity. The latter is in line with the learning rule utilized here.
Mature network can path-integrate in darkness
Figure 2A shows an example of the performance of a trained network, for the light condition (i.e. when visual input is available; yellow overbars) and for PI in darkness (purple overbars); the performance is quantified by the PI error (in units of degrees) over time. PI error refers to the accumulated difference between the internal representation of heading and the true heading, and it is different from the learning error introduced previously.
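Because headings live on a circle, the difference between internal and true heading has to be wrapped before it is interpreted as a PI error. A minimal sketch of this computation, with hypothetical function names and the assumption that both traces are given in degrees, is:

```python
def wrap_deg(angle):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def pi_error(internal_deg, true_deg):
    """Signed PI error (deg) at each time point: internal minus true heading."""
    return [wrap_deg(a - b) for a, b in zip(internal_deg, true_deg)]

# An internal estimate of 350 deg with a true heading of 10 deg is off by -20 deg
example_error = pi_error([350.0, 10.0], [10.0, 350.0])
```

With this convention, an internal estimate of 350 deg for a true heading of 10 deg counts as an error of 20 deg in magnitude rather than 340 deg.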
A unique bump of activity is clearly present at all times in the HD network (Figure 2A, top), in both light and darkness conditions, and this bump moves smoothly across the network for a variable angular velocity (Figure 2A, bottom). The position of the bump is defined as the population vector average (PVA) of the neural activity in the HD network. The HD bump also leads to the emergence of bumps in the HR network, separately for LHR and RHR cells (Figure 2A, second and third panel from top). In light conditions (0–20 s in Figure 2A), the PVA closely tracks the head direction of the animal in HD, LHR, and RHR cells alike, which is expected because the visual input guides the network activity. Importantly, however, in darkness (20–50 s in Figure 2A), the self-motion input alone is enough to track the animal’s heading, leading to a small PI error between the internal representation of heading and the ground truth. This error is corrected after the visual input reappears (at 50 s in Figure 2A). Such PI errors in darkness are qualitatively consistent with data reported in the experimental literature (Seelig and Jayaraman, 2015). The correction of the PI error also reproduces in silico the experimental finding that the visual input (whenever available) exerts stronger control on the bump location than the self-motion input (Seelig and Jayaraman, 2015), which suggests that even the mature network does not rely on PI when visual cues are available.
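For readers unfamiliar with the PVA, the following sketch decodes the bump position as the angle of the rate-weighted sum of unit vectors pointing in each neuron's preferred direction. The tuning layout (30 preferred directions spaced 12 deg apart) and the bump shape are hypothetical and chosen only to make the example self-contained.

```python
import cmath
import math

def population_vector_average(rates, preferred_deg):
    """Decode bump position (deg in [0, 360)) as the angle of the rate-weighted vector sum."""
    z = sum(r * cmath.exp(1j * math.radians(a)) for r, a in zip(rates, preferred_deg))
    return math.degrees(cmath.phase(z)) % 360.0

# Hypothetical tuning: 30 preferred directions, 12 deg apart
preferred = [12.0 * k for k in range(30)]
# Synthetic activity bump centered at 96 deg (von Mises-like profile)
rates = [math.exp(4.0 * (math.cos(math.radians(a - 96.0)) - 1.0)) for a in preferred]
bump_position = population_vector_average(rates, preferred)
```

Because the synthetic bump is symmetric around 96 deg, the decoded position coincides with the bump center up to floating-point error.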
To quantify the accuracy of PI in our model, we draw 1,000 trials, each 60 s long, for constant synaptic weights and in the absence of visual input. We also limit the angular velocities in these trials to retain only velocities that flies realistically display (see dashed green lines in Figure 2C and Methods). We then plot the distribution of PI errors every 10 s (Figure 2B). We find that average absolute PI errors (widths of distributions) increase with time in darkness, but most of the PI errors at 60 s are within 60 deg of the true heading. This vastly exceeds the PI performance of flies (Seelig and Jayaraman, 2015). In flies, the correlation between the PVA estimate and the true heading in darkness varied widely across animals in the range [0.3, 0.95] (Seelig and Jayaraman, 2015), whereas for the model it is close to 1. However, it should be noted that the model here corresponds to an ideal scenario that serves as a proof of principle. We will later incorporate irregularities owing to biological factors (asymmetry in the weights, biological noise) that bring the network’s performance closer to the fly’s behavior.
To further assess the network’s ability to integrate different angular velocities, we simulate the system both with and without visual input in 5 s intervals during which the angular velocity is constant. We then compute the average movement velocity of the bump across the network, that is, the neural velocity, and compare it to the real velocity provided as input. Figure 2C shows that the network achieves a PI gain (defined as the ratio between neural and real velocity) close to 1 both with and without supervisory visual input, meaning that the neural velocity matches very well the angular velocity of the animal, for all angular velocities that are observed in experiments ($v<500$ deg/s for walking and flying) (Geurten et al., 2014; Stowers et al., 2017). Although expected in light conditions, the fact that gain 1 is achieved in darkness shows that the network predicts the missing visual input from the velocity input, that is, the network path-integrates accurately. Note that PI is impaired in our model for very small angular velocities (Figure 2C, flat purple line for $v<30$ deg/s), similarly to previous hand-tuned theoretical models (Turner-Evans et al., 2017). This is a direct consequence of the fact that maintaining a stable activity bump and moving it across the network at very small angular velocities are competing goals. Crucially, it has been reported that such an impairment of PI for small angular velocities exists in flies (Seelig and Jayaraman, 2015). Note that if we increase the number of HD neurons from 60 (∼50 were reported in the fly by Turner-Evans et al., 2020; Xu, 2020) to 120 or 240, this flat region is no longer observed (data not shown).
The network is a quasi-continuous attractor
A continuous attractor network (CAN) should be able to maintain a localized bump of activity in virtually a continuum of locations around the ring of HD cells. To show that the learned network approximates this property, we seek to reproduce in silico experimental findings in Kim et al., 2017. There it was shown that local optogenetic stimulation of HD cells in the ring can cause the activity bump to jump to a new position and persist in that location, supported by internal dynamics alone.
To reproduce the experiments by Kim et al., 2017, we simulate optogenetic stimulation of HD cells in our network as visual input of increased strength and extent (for details, see Materials and methods). We find that the strength and extent of the stimulation need to be increased relative to those of the visual input; only in this case can a bump at some other location in the network be suppressed, and a new bump emerge at the stimulated location. The stimuli are assumed to appear instantaneously at random locations, but we restrict our set of stimulation locations to the discrete angles represented by the finite number of HD neurons. Furthermore, the velocity input is set to zero for the entire simulation, signaling lack of head movement.
Figure 2D shows network activity in response to several stimuli, when the stimulation location changes abruptly every 5 s. During stimulation (2 s long, red overbars), the bump is larger than normal due to the use of a stronger than usual visual-like input to mimic optogenetic stimulation. The way in which the network responds to a stimulation depends on how far away from the ‘current’ location it is stimulated: for shorter distances, the bump activity shifts to the new location, as evidenced by the transient dynamics at the edges of the bump resembling a decay from an initial to a new location (see Figure 2D at {5,15,20} s). However, for larger phase shifts $\mathrm{\Delta}\theta $ the bump first emerges in the new location and subsequently disappears at the initial location, a mechanism akin to a ‘jump’ (Figure 2D, all other transitions). Similar effects have been observed in the experimental literature (Seelig and Jayaraman, 2015; Kim et al., 2017). The way the network responds to stimulation indicates that it operates in a CAN manner, and not as a winner-takes-all network where changes in bump location would always be instantaneous (Carpenter and Grossberg, 1987; Itti et al., 1998; Wang, 2002). That is to say, the network operates as expected from a quasi-continuous attractor. Furthermore, we find that the transition strategy in our model changes from predominantly smooth transitions to jumps at $\mathrm{\Delta}\theta \approx 90\ \text{deg}$, which matches experiments well (Kim et al., 2017).
Following a 2 s stimulation, the network activity has converged to the new cued location. After the stimulation has been turned off, the bump remains at the new location (within the angular resolution $\mathrm{\Delta}\varphi $ of the network), supported by internal network dynamics alone (Figure 2D). We confirmed in additional simulations that the bump does not drift away from the stimulated location for extended periods of time (3 min duration tested, only 3 s shown), and for all discrete locations in the HD network (only six locations shown). Therefore, we conclude that the HD network is a quasi-continuous attractor that can reliably sustain a heading representation over time in all HD locations. Note that for the network size used (${N}^{\text{HD}}=60$) we still obtain discrete attractors with separated basins of attraction; however, it is expected that with increasing ${N}^{\text{HD}}$ adjacent attractors will merge when the intrinsic noise overcomes the barrier separating them. Indeed, we find that for ${N}^{\text{HD}}={N}^{\text{HR}}=120$ it is easier to diffuse to adjacent attractors in the presence of synaptic input noise; for the impact of noise, see Appendix 1—figure 1C. In reality, the bump may drift away due to asymmetries in the connectivity of the biological circuit as well as intrinsic noise (Burak and Fiete, 2012); see also Appendix 1. In flies, for instance, the bump can stay put only for several seconds (Kim et al., 2017).
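The gain measurement described above can be sketched as follows: the decoded bump position is first unwrapped so that crossings of the 0/360 deg boundary do not appear as jumps, the neural velocity is taken as the average displacement per unit time, and the gain is its ratio to the input velocity. The sampling step `dt` and the function names are illustrative assumptions, and the sketch assumes the bump moves less than 180 deg between consecutive samples.

```python
def unwrap_deg(angles):
    """Remove 360-deg wrap-arounds from a trace of bump positions (deg)."""
    out = [angles[0]]
    for a in angles[1:]:
        # Shortest signed step from the previous (unwrapped) position
        step = (a - out[-1] + 180.0) % 360.0 - 180.0
        out.append(out[-1] + step)
    return out

def neural_velocity(bump_deg, dt):
    """Average bump velocity (deg/s) over the whole interval."""
    unwrapped = unwrap_deg(bump_deg)
    return (unwrapped[-1] - unwrapped[0]) / (dt * (len(unwrapped) - 1))

def pi_gain(bump_deg, dt, real_velocity):
    """PI gain: ratio between neural velocity and real angular velocity."""
    return neural_velocity(bump_deg, dt) / real_velocity

# Synthetic 5 s interval in which the bump tracks a constant 200 deg/s rotation
dt = 0.01
bump_trace = [(200.0 * dt * k) % 360.0 for k in range(501)]
gain = pi_gain(bump_trace, dt, real_velocity=200.0)
```

For this synthetic trace, in which the bump follows the input perfectly, the computed gain is 1 up to floating-point error.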
Learning results in synaptic connectivity that matches the one in the fly
To gain more insight into how the network achieves PI and attains CAN properties, we show how the synaptic weights of the network are tuned during a developmental period (Figure 3). Figure 3A and B show the learned recurrent synaptic weights among the HD cells, ${W}^{\text{rec}}$, and the learned synaptic weights from HR to HD cells, ${W}^{\text{HR}}$, respectively. Circular symmetry is apparent in both matrices, a crucial property for a symmetric ring attractor. Therefore, we also plot the profiles of the learned weights as a function of receptive field difference in Figure 3C. Note that the pixelated appearance in these plots is due to the fact that two adjacent HD neurons are tuned for the same HD, and develop identical synaptic strengths.
First, we discuss the properties of the learned weights. Local excitatory connections have developed along the main diagonal of ${W}^{\text{rec}}$, similar to what is observed in the CX (Turner-Evans et al., 2020). This local excitation can be readily seen in the weight profile of ${W}^{\text{rec}}$ in Figure 3C, and it is the substrate that allows the network to support stable activity bumps in virtually any location. In addition, we observe inhibition surrounding the local excitatory profile in both directions. This inhibition emerges despite the fact that we provide global inhibition to all HD cells (${I}_{\text{inh}}^{\text{HD}}$ parameter, Materials and methods), in line with suggestions from previous work (Kim et al., 2017). Surrounding inhibition was a feature we observed consistently in learned networks of different sizes and for different global inhibition levels. Finally, the angular offset of the two negative sidelobes in the connectivity depends on the size and shape of the entrained HD bump (for details, see Appendix 5).
Furthermore, we find a consistent pattern of both LHR and RHR populations to excite the direction for which they are selective (Figure 3C), which is also similar to what is observed in the CX (Turner-Evans et al., 2020). Excitation in one direction is accompanied by inhibition in the reverse direction in the learned network. As a result of the symmetry in our learning paradigm, the connectivity profiles of LHR and RHR cells are mirrored versions of each other, which is also clearly visible in Figure 3C. The inhibition of the reverse direction has a width comparable to the bump size and acts as a ‘brake’ to prevent the bump from moving in this direction. The excitation in the selective direction, on the other hand, has a wider profile, which allows the network to path-integrate for a wide range of angular velocities; that is, for high angular velocities, neurons further downstream can be ‘primed’ and activated in rapid succession. Indeed, when we remove the wide projections from the excitatory connectivity, PI performance is impaired for the higher angular velocities exclusively (Figure 3—figure supplement 1). The even weight profile in ${W}^{\text{rec}}$ and the mirror symmetry for LHR vs. RHR profiles in ${W}^{\text{HR}}$, together with the circular symmetry of the weights throughout the ring, guarantee that there is no side bias (i.e. tendency of the bump to favor one direction of movement versus the other) during PI. Indeed, the PI error distribution in Figure 2B remains symmetric throughout the 60 s simulations.
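One way to obtain such a profile from a circularly symmetric weight matrix is to average the weights over all neuron pairs that share the same index offset around the ring. The sketch below does this under two assumptions that are ours: the matrix is indexed as `w[post][pre]`, and neurons are ordered by preferred direction with uniform spacing. The function name and the toy kernel are hypothetical.

```python
def weight_profile(w):
    """Average weight as a function of the circular index offset (pre - post)."""
    n = len(w)
    profile = [0.0] * n
    for post in range(n):
        for pre in range(n):
            # Accumulate the mean over all pairs sharing the same offset
            profile[(pre - post) % n] += w[post][pre] / n
    return profile

# Toy circulant matrix built from a known kernel: local excitation, flanking inhibition
kernel = [1.0, 0.5, -0.2, 0.0, -0.2, 0.5]
n = len(kernel)
w_toy = [[kernel[(pre - post) % n] for pre in range(n)] for post in range(n)]
profile = weight_profile(w_toy)
```

For a perfectly circulant matrix the profile recovers the generating kernel; for a learned matrix, deviations of individual rows from this average indicate departures from circular symmetry.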
Next, we focus our attention on the dynamics of learning. For training times larger than a few hours, the absolute learning error drops and settles to a low value, indicating that learning has converged after ∼20 hr (or 4000 cycles, each cycle lasting $1/\eta $) of training time (Figure 3D). The nonzero value of the final error is only due to errors occurring at the edges of the bump (Figure 3—figure supplement 2A, top panel). An intuitive explanation of why these errors persist is that the velocity pathway is learning to predict the visual input; as a result, when the visual input is present, the velocity pathway creates errors that are consistent with PI velocity biases in darkness.
Figure 3E and F shows the weight development history for the entire simulation. The first structure that emerges during learning is the local excitatory recurrent connections in ${W}^{\text{rec}}$. For these early stages of learning, the initial connectivity is controlled by the autocorrelation of the visual input, which gets imprinted in the recurrent connections by means of Hebbian coactivation of adjacent HD neurons. As a result, the width of the local excitatory profile mirrors the width of the visual input. Once a clear bump is established in the HD ring, the HR connections are learned to support bump movement, and negative sidelobes in ${W}^{\text{rec}}$ emerge. To understand the shape of the learned connectivity profiles and the dynamics of their development, we study a reduced version of the full model, which follows learning in bumpcentric coordinates (see Appendix 5). The reduced model produces a connectivity strikingly similar to the full model, and highlights the important role of nonlinearities in the system.
So far, we have shown results in which our model far outperforms flies in terms of PI accuracy. To bridge this gap, we add noise to the weight connectivity in Figure 3A and B and obtain the connectivity matrices in Figure 3—figure supplement 3A,B, respectively. This perturbation of the weights could account for irregularities in the fly HD system owning to biological factors such as uneven synaptic densities. The resulting neural velocity gain curve in Figure 3—figure supplement 3E is impaired mainly for small angular velocities (Figure 2C). Interestingly, it now bears greater similarity to the one observed in flies, because the previously flat area for small angular velocities is wider (flat for $v<60$ deg/s, cf. extended data fig. 7G,J in Seelig and Jayaraman, 2015). This happens because the noisy connectivity is less effective in initiating bump movement. Finally, the PI errors in the network with noisy connectivity grow much faster and display a strong side bias (Figure 3—figure supplement 3D, Figure 2B). The latter can be attributed to the fact that the noise in the connectivity generates local minima that are easier to transverse from one direction vs. the other. Side bias can also emerge if the learning rate $\eta $ in Equation 16 is increased, effectively forcing learning to converge faster to a local minimum, which results in slight deviations from circularly symmetric connectivity (data not shown). It is therefore expected that different animals will display different degrees and directions of side bias during PI, owning either to fast learning or asymmetries in the underlying neurobiology. Since the exact behavior of the network with noise in the connectivity depends on the specific realization, we also generate multiple such networks and estimate the diffusion coefficient during path integration, which quantifies how fast the width of the PI error distribution in Figure 3—figure supplement 3D increases. 
We find the grand average to be $82.3\pm 15.7{\text{deg}}^{2}/\text{s}$, which is considerably larger (Student’s t-test, 95% confidence intervals for a total of 12 networks) than the diffusion coefficient for networks without a perturbation in the weights ($24.5{\text{deg}}^{2}/\text{s}$ in Figure 2B). Finally, in Appendix 1 we also add random Gaussian noise to all inputs, which can account for noisy percepts or stochasticity of spiking, and show that learning is not disrupted even for high noise levels.
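The diffusion-coefficient estimate described above can be sketched as follows. This is a minimal illustration, not the analysis code of the study: it assumes the convention that the variance of the PI error grows as $2Dt$, and the function name `diffusion_coefficient`, the synthetic random-walk data, and all numerical values are hypothetical.

```python
import numpy as np

def diffusion_coefficient(errors_deg, dt):
    """Estimate D (deg^2/s) from PI errors of shape (n_trials, n_steps).

    Assumes var(error)(t) ~ 2 * D * t, so D is half the slope of a
    straight-line fit of the across-trial variance against time.
    """
    errors_deg = np.asarray(errors_deg, dtype=float)
    t = np.arange(errors_deg.shape[1]) * dt
    variance = errors_deg.var(axis=0)        # variance across trials
    slope = np.polyfit(t, variance, 1)[0]    # deg^2 per second
    return slope / 2.0

# Synthetic check: random walks with a known (hypothetical) D.
rng = np.random.default_rng(0)
true_D, dt = 25.0, 0.01
steps = rng.normal(0.0, np.sqrt(2.0 * true_D * dt), size=(2000, 500))
errors = np.cumsum(steps, axis=1)
D_hat = diffusion_coefficient(errors, dt)
```

If a different convention is intended (for example, variance growing as $Dt$), the factor of 2 in the last line of the function would be dropped.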
Fast adaptation of neural velocity gain
Having shown how PI and CAN properties are learned in our model, we now turn our attention to the flexibility that our learning setup affords. Motivated by augmented-reality experiments in rodents where the relative gain of visual and self-motion inputs is manipulated (Jayakumar et al., 2019), we test whether our network can rewire to learn an arbitrary gain between the two. In other words, we attempt to learn an arbitrary gain $g$ between the idiothetic angular velocity $v$ sensed by the HR cells and the neural velocity $g\cdot v$ dictated by the allothetic visual input. This simulates the conditions in an augmented-reality environment, where the speed at which the world around the animal rotates is determined by the experimenter, but the proprioceptive sense of head angular velocity remains the same.
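The gain manipulation can be illustrated with a small sketch. It only shows how a visual landmark trajectory moving at $g\cdot v$ relates to the idiothetic velocity $v$; the function name `landmark_trajectory`, the time step, and the constant-velocity example are assumptions made for illustration, not the simulation protocol of the study.

```python
import numpy as np

def landmark_trajectory(v_deg_s, dt, gain, theta0=0.0):
    """Heading (deg, wrapped to [0, 360)) implied by the visual input
    when the scene rotates at gain * v while the HR cells sense v."""
    return (theta0 + gain * np.cumsum(v_deg_s) * dt) % 360.0

dt = 0.01                     # hypothetical time step (s)
v = np.full(100, 90.0)        # constant 90 deg/s for 1 s
theta_g1 = landmark_trajectory(v, dt, gain=1.0)   # natural gain
theta_g2 = landmark_trajectory(v, dt, gain=2.0)   # scene moves twice as fast
```

With `gain=1.0` the visual and self-motion signals agree; any other value creates the mismatch that the network has to absorb by rewiring.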
Starting with the learned network shown in Figure 3, which displayed gain $g=1$, we suddenly switch to a different gain, that is, we learn weights for $g\in \{0.25,0.5,1.5,2\}$. In all cases, we observe that the network readily rewires to achieve the new gain. The mean learning error after the gain switch is initially high, but reaches a lower, constant level after at most 3 hr of training (Figure 4A). We note that convergence is much faster compared to the time it takes for the gain-1 network to emerge from scratch (compare to Figure 3D), especially for the smaller gain changes. Importantly, Figure 4B shows that PI performance in the resulting networks is excellent for the new gains, with some degradation only for very low and very high angular velocities. There are two reasons why high angular velocities are not learned that well: limited training of these velocities, and saturation of HR cell activity. Both reasons are by design and do not reflect a fundamental limit of the network.
In Appendix 2, we show that without the aforementioned limitations the network learns to path-integrate up to an angular velocity limit set by synaptic delays, and that the bump width sets a trade-off between location and velocity-integration accuracy in the HD system.
Figure 4C and D compare the weight profiles of the circularly symmetric matrices ${W}^{\text{rec}}$ and ${W}^{\text{HR}}$ resulting from the initial gain $g=1$, with the weight profiles resulting from adaptation to the most extreme gains shown in Figure 4, that is, $g\in \{0.25,2\}$. An increase in gain slightly suppresses the recurrent connections and slightly amplifies the HR-to-HD connections, while a decrease in gain substantially amplifies the recurrent connections and slightly suppresses the HR-to-HD connections. The latter explains why the flat region for small angular velocities in Figure 4B has been extended for $g\in \{0.25,0.5\}$: it is now harder for small angular velocities to overcome the attractor formed by stronger recurrent weights and move the bump.
Finally, we address the limits of the ability of the network to rewire to new gains (Figure 4—figure supplement 1). We find that after rewiring the performance is excellent for gains between 0.25 and 4.5. The network can even reverse its gain to $g=-1$, that is, when allothetic and idiothetic inputs are signaling movement in opposite directions. However, for larger gain changes, learning takes longer.
Discussion
The ability of animals to navigate in the absence of external cues is crucial for their survival. Head direction, place, and grid cells provide internal representations of space (Ranck, 1984; Moser et al., 2008) that can persist in darkness and possibly support path integration (PI) (Mizumori and Williams, 1993; Quirk et al., 1990; Hafting et al., 2005). Extensive theoretical work has focused on how the spatial navigation system might rely on continuous attractor networks (CANs) to maintain and update a neural representation of the animal’s current location. Special attention was devoted to models representing orientation, with the ring attractor network being one of the most famous of these models (Amari, 1977; Ben-Yishai et al., 1995; Skaggs et al., 1995; Seung, 1996). So far, modelling of the HD system has relied either on hand-tuned synaptic connectivity (Zhang, 1996; Xie et al., 2002; Turner-Evans et al., 2017; Page et al., 2019) without reference to its origin, or on synaptic plasticity rules that either did not achieve gain-1 PI (Stringer et al., 2002) or were not biologically plausible (Hahnloser, 2003).
Summary of findings
Inspired by the recent discovery of a ring attractor network for HD in Drosophila (Seelig and Jayaraman, 2015), we show how a biologically plausible learning rule leads to the emergence of a circuit that achieves gain-1 PI in darkness. The learned network features striking similarities in terms of connectivity to the one experimentally observed in the fly (Turner-Evans et al., 2020), and reproduces experiments on CAN dynamics (Kim et al., 2017) and gain changes between external and self-motion cues in rodents (Jayakumar et al., 2019). Furthermore, an impairment of PI for small angular velocities is observed in the mature network, which is a feature that has been reported in experiments (Seelig and Jayaraman, 2015). Finally, the proposed learning rule can serve to compensate for deviations from circular symmetry in the synaptic weight profiles; such deviations are expected in biological systems and, if not compensated, could lead to large PI errors.
The mature circuit displays two properties characteristic of CANs: (1) it can support and actively maintain a local bump of activity at a virtual continuum of locations, and (2) it can move the bump across the network by integrating self-motion cues. Note that we did not explicitly train the network to achieve these CAN properties; rather, they emerged in a self-organized manner.
To achieve gain-1 PI performance, our network must attribute learning errors to the appropriate weights. The learning rule we adopt in Equation 1 is a ‘delta-like’ rule, with a learning error that gates learning in the network, and a Hebbian component that comes in the form of the postsynaptic potential and assigns credit to synapses that are active when errors are large. The learning rule leads to the emergence of both symmetric local connectivity between HD cells (which is required for bump maintenance and stability), and asymmetric connectivity from HR to HD cells (which is required for bump movement in darkness). The first happens because adjacent neurons are coactive due to correlated visual input; the second because only one HR population is predominantly active during rotation: the population that corresponds to the current rotation direction. Crucial to the understanding of the learning dynamics of the model was the development of a reduced model, which follows learning in bump-centric coordinates and is analytically tractable (see Appendix 5). The reduced model can be extended to higher-dimensional manifolds (Gardner et al., 2022), and therefore it offers a general framework to study how activity-dependent synaptic plasticity shapes CANs.
Relation to experimental literature
Our work comes at a time at which the fly HD system receives a lot of attention (Seelig and Jayaraman, 2015; Turner-Evans et al., 2020; Kim et al., 2017; Kim et al., 2019; Fisher et al., 2019), and suggests a mechanism of how this circuit could self-organize during development. Synaptic plasticity has been shown to be important in this circuit for anchoring the visual input to the HD neurons when the animal is exposed to a new environment (Kim et al., 2019; Fisher et al., 2019). This has also been demonstrated in models of the mammalian HD system (Skaggs et al., 1995; Zhang, 1996; Song and Wang, 2005). Here, we assume that an initial anchoring of the topographic visual input to the HD neurons with arbitrary offset with respect to external landmarks already exists prior to the development of the PI circuit; such an anchoring could even be prewired. In our model, it is sufficient that the visual input tuning is local and topographically arranged. Once the PI circuit has developed, visual connections could be anchored to different environments, as shown by Kim et al., 2019 and Fisher et al., 2019. Alternatively, the HD system itself could come prewired with an initial gross connectivity, sufficient to anchor the visual input; in this case, our learning rule would enable fine-tuning of this connectivity for gain-1 PI. In either case, for the sake of simplicity and without loss of generality, we study the development of the path-integrating circuit while the animal moves in the same environment, and keep the visual input tuning fixed. Therefore, the present work addresses the important question of how the PI circuit itself could be formed, and it is complementary to the problem of how allothetic inputs to the PI circuit are wired (Fisher et al., 2019; Kim et al., 2019). The interplay of the two forms of plasticity during development would be of particular future interest.
A requirement for the learning rule we use is that information about the firing rate of HD neurons is available at the axon-distal compartment. There is no evidence for active backpropagation of APs in EPG neurons in the fly, but passive backpropagation would suffice in this setting. In fact, passive spread of activity has been shown to attenuate weakly in central fly neurons (Gouwens and Wilson, 2009). In HD neurons, the axon-proximal and axon-distal compartments belong to the same dendritic tuft (Figure 1E), and since we assume that the axon initial segment is close to the axon-proximal compartment, the generated AP would need to propagate only a short distance compared to the effective electrotonic length. This means that APs would not be attenuated much on their way from the axon initial segment to the axon-distal compartment, and thus would maintain some of their high-frequency component, which could be used at synapses to differentiate them from slower postsynaptic potentials.
In Figure 4, we show that our network can adapt to altered gains much faster than the time required to learn the network from scratch. Our simulations are akin to experiments where rodents are placed in a VR environment and the relative gain between visual and proprioceptive signals is altered by the experimenter (Jayakumar et al., 2019). In this scenario, Jayakumar et al., 2019 found that the PI gain of place cells can be recalibrated rapidly. In contrast, Seelig and Jayaraman, 2015 found that PI gain in darkness is not significantly affected when flies are exposed to different gains in light conditions. We note, however, that Seelig and Jayaraman, 2015 tested mature animals (8–11 days old), whereas plasticity in the main HD network is presumably stronger in younger animals. Also note that the manipulation we use to address adaptation of PI to different gains differs from the one in Kim et al., 2019 who used optogenetic stimulation of the HD network combined with rotation of the visual scene to trigger a remapping of the visual input to the HD cells in a Hebbian manner. The findings in Jayakumar et al., 2019 can only be reconciled by plasticity in the PI circuit, and not in the sensory inputs to the circuit.
In order to address the core mechanisms that underlie the emergence of a path-integrating network, we use a model that is a simplified version of the biological circuit. For example, we did not model inhibitory neurons explicitly and omitted some of the recurrent connectivity in the circuit, whose functional role is uncertain (Turner-Evans et al., 2020). We also choose to separate PI from other complex processes that occur in the CX (Raccuglia et al., 2019). Finally, we do not force the network to obey Dale’s law and do not model spiking explicitly.
Nevertheless, after learning, we obtain a network connectivity that is strikingly similar to the one of the fly HD system. Indeed, the mature model exhibits local excitatory connectivity in the HD neurons (Figure 3A and C), which in the fly is mediated by the excitatory loop from EPG to PEG to PEN2 and back to EPG (Turner-Evans et al., 2020), a feature that hand-tuned models of the fly HD system did not include (Turner-Evans et al., 2017). Furthermore, the HR neurons have excitatory projections towards the directions they are selective for (Figure 3B and C), similar to PEN1 neurons in the fly. Interestingly, these key features that we uncover from learning have been utilized in other hand-tuned models of the system (Turner-Evans et al., 2017; Kim et al., 2017; Kim et al., 2019). Future work could endeavor to come closer to the architecture of the fly HD system and benefit from the incorporation of more neuron types and the richness of recurrent connectivity that has been discovered in the fly (Turner-Evans et al., 2020).
Compared to the fly, our network achieved better PI performance. As a simple way to match the performances, we added noise to the learned connectivity in the model; however, this does not explain why the fly performs worse. Indeed, there could be multiple reasons why PI performance is worse in the biological circuit. For instance, a confounder that would affect performance but not necessarily learning could be the presence of inputs that are unrelated to path integration, for example, inputs related to circadian cycles and sleep (Raccuglia et al., 2019). In the presence of such confounders, a precise tuning of the weights might be crucial in order to reach the performance of the fly. In other words, only if the model outperforms the biological circuit in a simplified setting does it have a chance to perform as well in a realistic setting, with all the additional complexities the latter comes with.
Relation to theoretical literature
A common problem with CANs is that they require fine-tuning: even a slight deviation from the optimal synaptic weight tuning leads to catastrophic drifting (Goldman et al., 2009). A way around this problem is to sacrifice the continuity of the attractor states in favor of a discrete number of stable states that are much more robust to noise or weight perturbations (Kilpatrick et al., 2013). In our network, the small number of HD neurons enables a coarse-grained representation of heading; the network is a CAN only in a quasi-continuous manner, and the number of discrete attractors corresponds to the number of HD neurons. This makes it harder to transition to adjacent attractors, since a ‘barrier’ has to be overcome in the quasi-continuous case (Kilpatrick et al., 2013). The somewhat counterintuitive conclusion follows that a CAN with more neurons and, as a result, finer angular resolution, will not be as potent in maintaining activity, and diffusion to nearby attractors will be easier since the barrier will be lower. Indeed, we found that doubling the number of neurons produces a CAN that is less robust to noise. Overall, the quasi-continuous and coarse nature of the attractor shields the internal representation of heading against the ever-present biological noise, which would otherwise lead to diffusion of the bump with time. The fact that the network can still path-integrate accurately with this coarse-grained representation of heading is remarkable.
Seminal theoretical work on ring attractors has proven that in order to achieve gain-1 PI, the asymmetric component of the network connectivity (corresponding here to ${W}^{\text{HR}}$) needs to be proportional to the derivative of the symmetric component (corresponding to ${W}^{\text{rec}}$) (Zhang, 1996). However, this result rests on the assumption that asymmetric and symmetric weight profiles are mediated by the same neuronal population, as in the double-ring architecture proposed by Xie et al., 2002 and Hahnloser, 2003, but does not readily apply to the architecture of the fly HD system where HD and HR cells are separate. In our learned network, we find that the HR weight profile is not proportional to the derivative of the recurrent weight profile; therefore this requirement is not necessary for gain-1 PI in our setting. Note that our learning setup can also learn gain-1 PI for a double-ring architecture, which additionally obeys Dale’s law (Vafidis, 2019). Finally, we emphasize that circular symmetry is not a necessary condition for a ring attractor (Darshan and Rivkind, 2021). Rather, symmetry in our model results from the symmetry in the architecture, the symmetrically prewired weights, and the symmetric stimulus space. If any of those were to be relaxed, the resulting network would not be circularly symmetric; then, the reduced model analysis that we perform in Appendix 5 would also not be feasible, because local asymmetries in the setup would result in nonlocal deviations from circular symmetry of the learned weights, which was our main assumption there. Nevertheless, we demonstrated that the full model can handle such asymmetries in the setup and learn accurate PI (see Appendix 3).
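One way to check the derivative relation of Zhang, 1996 on a pair of circular weight profiles is sketched below. This is an illustrative test on toy profiles, not the analysis performed in the study: the helper names `circular_derivative` and `derivative_similarity`, the cosine-similarity measure, and the sinusoidal example profiles are all assumptions.

```python
import numpy as np

def circular_derivative(profile, dphi):
    """Central finite-difference derivative on a ring (periodic boundary)."""
    return (np.roll(profile, -1) - np.roll(profile, 1)) / (2.0 * dphi)

def derivative_similarity(w_asym, w_sym, dphi):
    """Cosine similarity between an asymmetric profile and the derivative
    of a symmetric one; +1 or -1 indicates exact proportionality."""
    d = circular_derivative(w_sym, dphi)
    return float(np.dot(w_asym, d) / (np.linalg.norm(w_asym) * np.linalg.norm(d)))

# Toy profiles sampled every 12 deg (hypothetical shapes).
phi = np.deg2rad(np.arange(0.0, 360.0, 12.0))
dphi = np.deg2rad(12.0)
w_sym = np.cos(phi)                      # symmetric profile
w_asym_prop = -0.5 * np.sin(phi)         # proportional to d(w_sym)/d(phi)
w_asym_other = np.sin(2.0 * phi)         # not proportional
sim_prop = derivative_similarity(w_asym_prop, w_sym, dphi)
sim_other = derivative_similarity(w_asym_other, w_sym, dphi)
```

A similarity well below 1 in magnitude for learned profiles would be in line with the statement that the HR weight profile is not proportional to the derivative of the recurrent profile.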
Our learning setup, inspired by Urbanczik and Senn, 2014, is similar to the one in Guerguiev et al., 2017 in the sense that both involve compartmentalized neurons that receive ‘target’ signals in a distinct compartment. It differs, however, in the algorithm and learning rule used. Guerguiev et al., 2017 use local gradient descent during a ‘target’ phase, which is separate from a forward propagation phase, akin to forward/backward propagation stages in conventional deep learning. In contrast, we use a modified Hebbian rule, and in our model ‘forward’ computation and learning happen at the same time; time multiplexing, whose origin in the brain is unclear, is not required. Our setting would be more akin to the one in Guerguiev et al., 2017 if an episode of PI in darkness were required before an episode of learning in light conditions, which does not seem in line with the way animals naturally learn.
Previous theoretical work showed that head direction cells, head rotation cells, and grid cells emerge in neural networks trained for PI (Banino et al., 2018; Cueva and Wei, 2018). These networks were trained with backpropagation, and achieving gain-1 PI was not their primary focus; rather, this work elegantly demonstrated that the aforementioned cell types are efficient representations for spatial navigation that could be learned from experience.
Testable predictions
We devote this section to discussing predictions of our model, and we suggest future experiments in flies and, potentially, other animal models. An obvious prediction of our model is that synaptic plasticity is critical for the development of the PI network for heading, and the lack of a supervisory allothetic sensory input (e.g. visual) during development should disrupt the formation of the PI system. Previous experimental work showed that head direction cells in rat pups displayed mature properties already in their first exploration of the environment outside their nest (Langston et al., 2010), which may seem to contradict our assumption that the PI circuit wires during development; however, directional selectivity of HD cells in the absence of allothetic inputs and PI performance were not tested in this study. In addition, it has been shown that visually impaired flies were not able to learn to accurately estimate the size of their body. This type of learning also requires visual inputs and, upon consolidation, remains stable (Krause et al., 2019).
We also predict that HD neurons have a compartmental structure where idiothetic inputs are separated from allothetic sensory inputs, which initiate action potentials more readily due to being electrotonically closer to the axon initial segment. While we already demonstrate the separation of allothetic and idiothetic inputs to EPG neurons in the fly EB (Figure 1E, Figure 1—figure supplement 1), our prediction can only be tested with electrophysiological experiments. Another model prediction that can be tested only with electrophysiology is that APs backpropagate from the axon-proximal compartment (at least passively but with little attenuation) to the axon-distal compartment. Then spikes could be separated from postsynaptic potentials locally at the synapse by cellular mechanisms sensitive to the spectral density of the voltage.
Finally, similarly to place cell studies in rodents (Jayakumar et al., 2019), we predict that during development the PI system can adapt to experimenter-defined gain manipulations, and that it can do so faster than the time required for the system to develop from scratch. Therefore, a suggestion from this study would be to repeat in young flies the adaptation experiments by Seelig and Jayaraman, 2015.
Outlook
The present study adds to the growing literature on potential computational abilities of compartmentalized neurons (Poirazi et al., 2003; Gidon et al., 2020; Payeur et al., 2021). The associative HD neuron used in this study is a coincidence detector, which serves to associate external and internal inputs arriving at different compartments of the cell. Coupled with memory-specific gating of internally generated inputs, coincidence detection has been suggested to be the fundamental mechanism that allows the mammalian cortex to form and update internal knowledge about external contingencies (Doron et al., 2020; Shin et al., 2021). This structured form of learning does not require engineered ‘hints’ during training, and it might be the reason why neural circuits evolved to be so efficient at reasoning about the world, with the mammalian cortex being the pinnacle of this achievement. Here, we demonstrate that learning at the cellular level can predict external inputs (visual information) by associating firing activity with internally generated signals (velocity inputs) during training. This effect is due to the anti-Hebbian component of the learning rule in Equation 12, where the product of postsynaptic axon-distal and presynaptic activity comes with a negative sign. Specifically, it has previously been demonstrated that anti-Hebbian synaptic plasticity can stabilize persistent activity (Xie and Seung, 2000) and perform predictive coding (Bell et al., 1997; Hahnloser, 2003). At the population level, this provides a powerful mechanism to internally produce activity patterns that are identical to the ones induced by an external stimulus. This mechanism can serve as a way to anticipate external events or, as in our case, as a way of ‘filling in’ missing information in the absence of external inputs.
Local, Hebb-like learning rules are considered a weak form of learning, due to their inability to utilize error information in a sophisticated manner. Despite that, we show that local associative learning can be particularly successful in learning appropriate fine-tuned synaptic connectivity, when operating within a cell structured for coincidence detection. Therefore, in learning and reasoning about the environment, our study highlights the importance of inductive biases with developmental origin (e.g. allothetic and idiothetic inputs arrive in different compartments of associative neurons) (Lake et al., 2017).
In conclusion, the present work addresses the age-old question of how to develop a CAN that performs accurate, gain-1 PI in the absence of external sensory cues. We show that this feat can be achieved in a network model of the HD system by means of a biologically plausible learning rule at the cellular level. Even though our network architecture is tailored to the one of the fly CX, the learning setup where idiothetic and allothetic cues are associated at the cellular level is general and can be applied to other PI circuits. Of particular interest is the rodent HD system: despite the lack of evidence for a topographically organized recurrent HD network in rodents, a one-dimensional HD manifold has been extracted in an unsupervised way (Chaudhuri et al., 2019). Therefore, our work lays the path to study the development of ring-like neural manifolds in mammals. Finally, it has recently been shown that grid cells in mammals form a continuous attractor manifold with toroidal topology (Gardner et al., 2022). It would be interesting to see if a similar mechanism underlies the emergence of PI in place and grid cells. Our model can be extended to higher-dimensional CAN manifolds and provides a framework to interrogate this assumption.
Materials and methods
In what follows, we describe our computational model for learning a ring attractor network that accomplishes accurate angular PI. The model described here focuses on the HD system of the fly; however, the proposed computational setup is general and could be applied to other systems. Unless otherwise stated, the simulation parameter values are the ones summarized in Table 1. Simulation results for a given choice of parameters are very consistent across runs, hence most figures are generated from a single simulation run, unless otherwise stated.
Network architecture
We model a recurrent neural network comprising ${N}^{\text{HD}}=60$ head-direction (HD) and ${N}^{\text{HR}}=60$ head-rotation (HR) cells, numbers that are close to those of EPG and PEN1 cells in the fly central complex (CX), respectively (Turner-Evans et al., 2020; Xu, 2020). A scaled-down version of the network for ${N}^{\text{HR}}={N}^{\text{HD}}=12$ is shown in Figure 1A. The average spiking activity of HD and HR cells is modelled by firing-rate neurons. HD cells are organized in a ring and receive visual input, which encodes the angular position of the animal’s head with respect to external landmarks. We use a discrete representation of angles and we model two HD cells for each head direction, as observed in the biological system (Turner-Evans et al., 2017). Therefore the network can represent head direction with an angular resolution $\mathrm{\Delta}\varphi =12\text{deg}$.
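The angular resolution follows directly from the cell counts: 60 HD cells with two cells per head direction give 30 distinct directions, and 360 deg / 30 = 12 deg. A minimal sketch of the resulting preferred directions is given below; the variable names and the ordering of the two cells that share a direction (adjacent indices) are assumptions for illustration.

```python
import numpy as np

N_HD = 60                   # number of HD cells (from the text)
cells_per_direction = 2     # two HD cells per head direction (from the text)

n_directions = N_HD // cells_per_direction        # 30 distinct directions
delta_phi = 360.0 / n_directions                  # angular resolution in deg

# Preferred direction of each HD cell; cells 2k and 2k+1 share direction k
# (this index layout is an assumption, not taken from the source).
theta_pref = np.repeat(np.arange(n_directions) * delta_phi, cells_per_direction)
```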
Motivated by the anatomy of the fly CX (Green et al., 2017; Turner-Evans et al., 2020), HR cells are divided into two populations (Figure 1A): a ‘leftward’ (L-HR) population (with increased velocity input when the head turns leftwards) and a ‘rightward’ (R-HR) population (with increased velocity input when the head turns rightwards). After learning, these two HR populations are responsible for moving the HD bump in the anticlockwise and clockwise directions, respectively.
The recurrent connections among HD cells and the connections from HR to HD cells are assumed to be plastic. In contrast, connections from HD to HR cells are assumed to be fixed and are determined as follows: for every head direction, one HD neuron projects to a cell in the L-HR population, and the other to a cell in the R-HR population. Because HD cells project to HR cells in a 1-to-1 manner, each HR neuron is simultaneously tuned to a particular head direction and a particular head rotation direction. The synaptic strength of the HD-to-HR projections is the same for all projections (these restrictions on the HD-to-HR connections are relaxed in Appendix 3). Finally, HR cells do not form recurrent connections.
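The fixed 1-to-1 wiring from HD to HR cells can be sketched as a scaled permutation matrix. The index layout (cells `2k` and `2k+1` share head direction `k`; the first half of the HR cells forms the leftward population), the matrix name `W_HD_to_HR`, and the weight value `w` are assumptions chosen for illustration.

```python
import numpy as np

N_HD = N_HR = 60
n_dir = N_HD // 2        # 30 head directions, two HD cells each
w = 1.0                  # common synaptic strength (hypothetical value)

# Rows index HR cells, columns index HD cells.
W_HD_to_HR = np.zeros((N_HR, N_HD))
for k in range(n_dir):
    W_HD_to_HR[k, 2 * k] = w               # first HD cell of direction k -> leftward HR cell k
    W_HD_to_HR[n_dir + k, 2 * k + 1] = w   # second HD cell of direction k -> rightward HR cell k
```

Every HR cell receives exactly one HD input and every HD cell drives exactly one HR cell, which is why each HR cell inherits a head-direction tuning in addition to its rotation-direction tuning.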
Neuronal model
We assume that each HD neuron is a rate-based associative neuron (Figure 1D), that is, a two-compartmental neuron comprising an axon-proximal and an axon-distal dendritic compartment (Urbanczik and Senn, 2014; Brea et al., 2016). The two compartments model the dendrites of that neuron that are closer to or further away from the axon initial segment, respectively. Note that here the axon-proximal compartment replaces the somatic compartment in the original model by Urbanczik and Senn, 2014. This is because the somata of fly neurons are typically electrotonically segregated from the rest of the cell and they are assumed to contribute little to computation (Gouwens and Wilson, 2009; Tuthill, 2009). We also note that to fully capture the input/output transformations that HD neurons in the fly perform, more than two compartments might be needed (Xu, 2020). Finally, only HD cells are associative neurons, whereas HR cells are simple rate-based point neurons.
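HD cells receive an input current ${\mathit{I}}^{d}$ to the axon-distal dendrites, which obeys
$${\tau}_{s}\frac{d{\mathit{I}}^{d}}{dt}=-{\mathit{I}}^{d}+{W}^{\text{rec}}{\mathit{r}}^{\text{HD}}+{W}^{\text{HR}}{\mathit{r}}^{\text{HR}}+{I}_{inh}^{\text{HD}}+{\sigma}_{n}{\mathit{n}}^{d},\tag{2}$$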
where ${\mathit{I}}^{d}$ is a vector of length ${N}^{\text{HD}}$ with each entry corresponding to one HD cell. In Equation 2, ${\tau}_{s}$ is the synaptic time constant, ${W}^{\text{rec}}$ is a ${N}^{\text{HD}}\times {N}^{\text{HD}}$ matrix of the recurrent synaptic weights among HD cells, ${W}^{\text{HR}}$ is a ${N}^{\text{HD}}\times {N}^{\text{HR}}$ matrix of the synaptic weights from HR to HD cells, ${\mathit{r}}^{\text{HR}}$ and ${\mathit{r}}^{\text{HD}}$ are vectors of the firing rates of HR and HD cells respectively, ${I}_{inh}^{\text{HD}}$ is a constant inhibitory input common to all HD cells, and ${\mathit{n}}^{d}$ is a random noise input to the axon-distal compartment. ${\mathit{n}}^{\mathit{d}}$ is drawn IID from $N(0,1)$, and its variance is scaled by ${\sigma}_{n}^{2}$. Note that in the main text we set ${\sigma}_{n}$ to zero, but we explore different values for this parameter in Appendix 1. The constant current ${I}_{inh}^{\text{HD}}$ is in line with a global-inhibition model with local recurrent connectivity, as opposed to having long-range inhibitory recurrent connectivity (Kim et al., 2017). The inhibitory current ${I}_{inh}^{\text{HD}}$ suppresses HD bumps in general; however, the exact strength of this inhibition is not important in our model.
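A forward-Euler update of the axon-distal current can be sketched as follows. The sketch assumes first-order synaptic dynamics of the form $\tau_s\,d\mathit{I}^d/dt=-\mathit{I}^d+W^{\text{rec}}\mathit{r}^{\text{HD}}+W^{\text{HR}}\mathit{r}^{\text{HR}}+I_{inh}^{\text{HD}}+\sigma_n\mathit{n}^d$, which matches the symbol definitions above; the function name `step_distal_current`, the per-step treatment of the noise, and all numerical values are hypothetical.

```python
import numpy as np

def step_distal_current(I_d, r_HD, r_HR, W_rec, W_HR, I_inh, tau_s, dt,
                        sigma_n=0.0, rng=None):
    """One forward-Euler step of the (assumed) axon-distal current dynamics."""
    noise = 0.0
    if sigma_n > 0.0:
        if rng is None:
            rng = np.random.default_rng()
        # One N(0, 1) sample per cell and step, scaled by sigma_n (assumption).
        noise = sigma_n * rng.standard_normal(I_d.shape)
    drive = W_rec @ r_HD + W_HR @ r_HR + I_inh + noise
    return I_d + dt / tau_s * (-I_d + drive)

# With silent presynaptic cells, the current relaxes to the inhibitory baseline.
N_HD, N_HR = 60, 60
I_d = np.zeros(N_HD)
for _ in range(2000):
    I_d = step_distal_current(I_d, np.zeros(N_HD), np.zeros(N_HR),
                              np.zeros((N_HD, N_HD)), np.zeros((N_HD, N_HR)),
                              I_inh=-1.0, tau_s=0.065, dt=0.001)
```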
Since several electrophysiological parameters of the fly neurons modeled here are unknown, we use dimensionless conductance values. Therefore, in Equation 2, which describes the dynamics of the axon-distal input of HD cells, currents (e.g. ${\mathit{I}}^{d}$, ${I}_{inh}^{\text{HD}}$, and ${\mathit{n}}^{d}$) are dimensionless. Membrane voltages are also chosen to be dimensionless, and because we measure firing rates in units of 1/s, all synaptic weights (e.g. ${W}^{\text{rec}}$ and ${W}^{\text{HR}}$) then have, strictly speaking, the unit ‘seconds’ (s), even though we mostly suppress this unit in the text. Importantly, all time constants (e.g. ${\tau}_{s}$), which define the time scale of dynamics, are measured in units of time (in seconds).
Our model incorporates several time scales, whose interplay is not obvious. To facilitate understanding, we summarize the parameters that define the time scales in Appendix 4—table 1, and discuss their relation in Appendix 4.
The axon-distal voltage ${\mathit{V}}^{d}$ of HD cells is a low-pass filtered version of the input current ${\mathit{I}}^{d}$, that is,
$${\tau}_{l}\frac{d{\mathit{V}}^{d}}{dt}=-{\mathit{V}}^{d}+{\mathit{I}}^{d},\tag{3}$$
where ${\tau}_{l}$ is the leak time constant of the axon-distal compartment. The voltage ${\mathit{V}}^{d}$ and the current ${\mathit{I}}^{d}$ have the same unit (both dimensionless), which means that the leak resistance of the axon-distal compartment is also dimensionless, and we assume that it is unity for simplicity. We choose values of ${\tau}_{l}$ and ${\tau}_{s}$ (for specific values, see Table 1) so that their sum matches the phenomenological time constant of HD neurons (EPG in the fly), while ${\tau}_{s}$ equals the phenomenological time constant of HR neurons (PEN1 in the fly, Turner-Evans et al., 2017). Note that ${\mathit{V}}^{d}$ is the low-frequency component of the axon-distal voltage originating from postsynaptic potentials, that is, excluding occasional high-frequency contributions from backpropagating action potentials.
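The axon-proximal voltage ${\mathit{V}}^{\mathit{a}}$ of HD cells is then given by
$$C\frac{d{\mathit{V}}^{\mathit{a}}}{dt}=-{g}_{L}{\mathit{V}}^{\mathit{a}}+{g}_{D}\left({\mathit{V}}^{d}-{\mathit{V}}^{\mathit{a}}\right)+{\mathit{I}}^{vis}+{\mathit{I}}_{exc}^{\text{HD}}+{\sigma}_{n}{\mathit{n}}^{\mathit{a}},\tag{4}$$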
where $C$ is the capacitance of the membrane of the axon-proximal compartment, ${g}_{L}$ is the leak conductance, ${g}_{D}$ is the conductance of the coupling from axon-distal to axon-proximal dendrites, ${\mathit{I}}^{vis}$ is a vector of visual input currents to the axon-proximal compartment of HD cells, ${\mathit{I}}_{exc}^{\text{HD}}$ is an excitatory input to the axon-proximal compartment, and ${\mathit{n}}^{\mathit{a}}$ is a random noise vector injected into the axon-proximal compartment, drawn IID from $N(0,1)$. The excitatory current ${\mathit{I}}_{exc}^{\text{HD}}$ is assumed to be present only in light conditions. The values of $C$, ${g}_{L}$, and ${g}_{D}$ in the fly HD (EPG) neurons are unknown, thus we keep these parameters unitless, and set their values to the ones in Urbanczik and Senn, 2014. Note that since conductances are dimensionless here, $C$ is effectively a time constant.
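As a worked consequence of this two-compartment description, a steady-state relation can be sketched. It assumes that the axon-proximal dynamics take the standard form of Urbanczik and Senn, 2014, $C\,d{\mathit{V}}^{\mathit{a}}/dt=-{g}_{L}{\mathit{V}}^{\mathit{a}}+{g}_{D}({\mathit{V}}^{d}-{\mathit{V}}^{\mathit{a}})+\mathit{I}$, where $\mathit{I}$ lumps together all currents injected into the axon-proximal compartment; the symbols $\mathit{I}$, $p$, and ${\tau}_{\text{eff}}$ are shorthand introduced here for illustration only. Setting the time derivative to zero gives

$${\mathit{V}}_{\infty}^{\mathit{a}}=\frac{{g}_{D}{\mathit{V}}^{d}+\mathit{I}}{{g}_{L}+{g}_{D}},\qquad p=\frac{{g}_{D}}{{g}_{L}+{g}_{D}},\qquad {\tau}_{\text{eff}}=\frac{C}{{g}_{L}+{g}_{D}}.$$

Under this assumption, when the injected currents vanish ($\mathit{I}=0$) the axon-proximal voltage settles to an attenuated copy $p\,{\mathit{V}}^{d}$ of the axon-distal voltage, and it approaches that value with the effective time constant ${\tau}_{\text{eff}}$, which is consistent with the remark that $C$ acts as a time constant when conductances are dimensionless.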
Following Hahnloser, 2003, the visual input to the ith HD cell is a localized bump of activity at angular location ${\theta}_{i}$:
where $M$ scales the bump’s amplitude, $\sigma $ controls the width of the bump, ${\theta}_{i}$ is the preferred orientation of the ith HD neuron, ${\theta}_{0}(t)$ is the position of a visual landmark at time $t$ in head-centered coordinates, and ${I}_{o}^{vis}<0$ is a constant inhibitory current that acts as the baseline for the visual input. We choose $M$ so that the visual input can induce a weak bump in the network at the beginning of learning, and we choose $\sigma $ so that the resulting bump after learning is ∼60 deg wide. Note that the bump in the mature network has a square shape (Figure 3—figure supplement 2B); therefore we elect to make it slightly narrower than the average full width at half maximum of the experimentally observed bump (∼80 deg; Seelig and Jayaraman, 2015; Kim et al., 2017; Turner-Evans et al., 2017). In addition, the current ${I}_{o}^{vis}$ is negative enough to make the visual input purely inhibitory, as reported (Fisher et al., 2019). The visual input is more inhibitory in the surround to suppress activity outside of the HD receptive field. Therefore, the mechanism by which the visual input acts on the HD neurons is disinhibition.
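The firing rate of HD cells, which is set by the voltage in the axon-proximal compartment, is given by

$${\mathit{r}}^{\text{HD}}=f({\mathit{V}}^{\mathit{a}}),\tag{6}$$

where

$$f(x)=\frac{{f}_{max}}{1+\mathrm{exp}(-\beta (x-{x}_{1/2}))}\tag{7}$$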
is a sigmoidal activation function applied element-wise to the vector ${\mathit{V}}^{\mathit{a}}$. The variable ${f}_{max}$ sets the maximum firing rate of the neuron, $\beta $ is the slope of the activation function, and ${x}_{1/2}$ is the input level at which half of the maximum firing rate is attained. The value of ${f}_{max}$ is arbitrary, while $\beta $ is chosen such that the activation function has sufficient dynamic range, and ${x}_{1/2}$ is chosen such that for small negative inputs the activation function is nonzero.
We note that the saturation of the activation function $f$ in Equation 7 is an essential feature of our model, especially for the convergence of the plasticity rule in Equation 12; see also the section ‘Synaptic Plasticity Rule’. Even though, to the best of our knowledge, it is currently not known whether EPG neurons actually reach saturation, other Drosophila neurons are known to reach saturation with increasing inputs, instead of some sort of depolarization block (Wilson, 2013; Brandão et al., 2021). Saturation with increasing inputs may be due to, for instance, short-term synaptic depression: beyond a certain frequency of incoming action potentials, the synaptic input current is almost independent of that frequency (Tsodyks and Markram, 1997; Tsodyks et al., 1998).
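The saturating nonlinearity can be sketched in a few lines of Python. This is a minimal illustration that assumes a logistic form for $f$; the parameter values `F_MAX`, `BETA`, and `X_HALF` are placeholders chosen for illustration, not the values used in the model.

```python
import math

# Placeholder parameters (assumptions for illustration only).
F_MAX = 1.0    # maximum firing rate
BETA = 2.5     # slope of the activation function
X_HALF = 1.0   # input level at which half of F_MAX is reached

def activation(x, f_max=F_MAX, beta=BETA, x_half=X_HALF):
    """Logistic activation that saturates at f_max for large inputs."""
    return f_max / (1.0 + math.exp(-beta * (x - x_half)))

half = activation(X_HALF)      # half of the maximum rate
saturated = activation(10.0)   # large input: close to F_MAX
silent = activation(-10.0)     # strongly negative input: close to zero
```

Because the output is bounded above by `F_MAX`, arbitrarily large inputs cannot drive the rate further, which is the property the plasticity rule relies on.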
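The firing rates of the HR cells are given by

$${\mathit{r}}^{\text{HR}}=f({W}^{\text{HD}}{\mathit{r}}_{\text{LP}}^{\text{HD}}+{\mathit{I}}^{vel}+{I}_{inh}^{\text{HR}}+{\sigma}_{n}{\mathit{n}}^{\text{HR}}),\tag{8}$$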
where ${\mathit{r}}^{\text{HR}}$ is the vector of length ${N}^{\text{HR}}$ of firing rates of HR cells, the ${N}^{\text{HR}}\times {N}^{\text{HD}}$ matrix ${W}^{\text{HD}}$ encodes the fixed connections from the HD to the HR cells, ${\mathit{r}}_{\text{LP}}^{\text{HD}}$ is a low-pass filtered version of the firing rate of the HD cells where the filter accounts for delays due to synaptic transmission in the incoming synapses from HD cells, ${\mathit{I}}^{vel}$ is the angular velocity input, ${I}_{inh}^{\text{HR}}$ is a constant inhibitory input common to all HR cells, and ${\mathit{n}}^{\text{HR}}$ is a random noise input to the HR cells drawn IID from $N(0,1)$. We set ${I}_{inh}^{\text{HR}}$ to a value that still allows sufficient activity in the HR cell bump, even when the animal does not move. The low-pass filtered firing-rate vector ${\mathit{r}}_{\text{LP}}^{\text{HD}}$ is given by

$${\tau}_{s}\frac{d{\mathit{r}}_{\text{LP}}^{\text{HD}}}{dt}=-{\mathit{r}}_{\text{LP}}^{\text{HD}}+{\mathit{r}}^{\text{HD}},\tag{9}$$
and the angular-velocity input to the ith HR neuron is given by

$${I}_{i}^{vel}=q\,k\,v(t),\tag{10}$$
where $k$ is the proportionality constant between head angular velocity and velocity input to the network, $v(t)$ is the head angular velocity at time $t$ in units of deg/s, and the factor $q$ is chosen such that the left (right) half of the HR cells are primarily active during leftward (rightward) head rotation. Note that the same ${\tau}_{s}$ is in both Equation 2 and Equation 9. Finally, as mentioned earlier, the matrix ${W}^{\text{HD}}$ encodes the hardwired 1-to-1 HD-to-HR connections, i.e., ${W}_{ij}^{\text{HD}}={w}^{\text{HD}}$ if the jth HD neuron projects to the ith HR neuron, and ${W}_{ij}^{\text{HD}}=0$ otherwise. Specifically, for $j$ odd, HD neuron $j$ projects to L-HR neuron $i=\frac{j+1}{2}$, whereas for $j$ even, HD neuron $j$ projects to R-HR neuron $i=30+\frac{j}{2}$. The synaptic strength ${w}^{\text{HD}}$ is chosen such that the range of the firing rates of the HD cells is mapped to the entire range of firing rates of the HR cells. Specifically, we set ${w}^{\text{HD}}=\frac{{A}_{active}}{{f}_{max}}$, where ${A}_{active}$ is the range of inputs for which $f$ has not saturated, i.e., the input values for which $f$ remains between about 7% and 93% of its maximum firing rate ${f}_{max}$ (see Equation 7). Finally, the proportionality constant $k$ is set so that the firing rate of HR neurons does not reach saturation for the range of velocities relevant for the fly (approx. [−500, 500] deg/s), given all other inputs they receive.
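The 1-to-1 wiring rule above can be written out explicitly. The sketch below assumes 60 HD neurons and 60 HR neurons (30 L-HR followed by 30 R-HR, as implied by the offset of 30 in the index rule); the function name `build_w_hd` and the weight value passed to it are hypothetical.

```python
N_HD = 60   # number of HD neurons (assumed)
N_HR = 60   # number of HR neurons: 30 L-HR followed by 30 R-HR (assumed)

def build_w_hd(w_hd):
    """Return the N_HR x N_HD matrix of 1-to-1 HD-to-HR weights as nested lists."""
    W = [[0.0] * N_HD for _ in range(N_HR)]
    for j in range(1, N_HD + 1):      # 1-based HD index, as in the text
        if j % 2 == 1:
            i = (j + 1) // 2          # odd j  -> L-HR neuron i
        else:
            i = 30 + j // 2           # even j -> R-HR neuron i
        W[i - 1][j - 1] = w_hd        # convert to 0-based storage
    return W

W_HD = build_w_hd(w_hd=1.0)           # placeholder weight value
```

Under this rule every HD neuron drives exactly one HR neuron and every HR neuron receives input from exactly one HD neuron.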
Synaptic plasticity rule
In our network, the associative HD neurons receive direct visual input in the axon-proximal compartment and indirect angular velocity input in the axon-distal compartment through the HR-to-HD connections (Figure 1D). We hypothesize that the visual input acts as a supervisory signal that controls the axon-proximal voltage ${\mathit{V}}^{\mathit{a}}$ directly, and the latter initiates spikes. Therefore, the goal of learning is for the axon-distal voltage ${\mathit{V}}^{d}$ to predict the axon-proximal voltage by changing the synaptic weights ${W}^{\text{rec}}$ and ${W}^{\text{HR}}$. This change is achieved by minimizing the difference between the firing rate $f({\mathit{V}}^{\mathit{a}})$ in the presence of visual input and the axon-distal prediction $f({\mathit{V}}^{ss})$ of the firing rate in the absence of visual input. In the latter case and at steady state, the voltage ${V}_{i}^{ss}$ for the ith HD neuron is an attenuated version of the axon-distal voltage,

$${V}_{i}^{ss}=\frac{{g}_{D}}{{g}_{D}+{g}_{L}}{V}_{i}^{d},\tag{11}$$
with conductance ${g}_{D}$ of the coupling from the axon-distal to axon-proximal dendrites and leak conductance ${g}_{L}$ of the axon-proximal compartment, as explained in Equation 4, and $p=\frac{{g}_{D}}{{g}_{D}+{g}_{L}}$ in Equation 1. Therefore, following Urbanczik and Senn, 2014, we define the plasticity-induction variable ${\text{PI}}_{ij}$ for the connection between the jth presynaptic neuron and ith postsynaptic neuron as

$${\text{PI}}_{ij}=[f({V}_{i}^{a})-f({V}_{i}^{ss})]\,{\text{P}}_{j},\tag{12}$$
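where ${\text{P}}_{j}$ is the postsynaptic potential of neuron $j$, which is a low-pass filtered version of the presynaptic firing rate ${r}_{j}$. That is,

$${\text{P}}_{j}(t)=H(t)\ast {r}_{j}(t),\tag{13}$$

where $\ast$ denotes convolution. The transfer function

$$H(t)=\frac{1}{{\tau}_{s}-{\tau}_{l}}\left({e}^{-t/{\tau}_{s}}-{e}^{-t/{\tau}_{l}}\right)u(t)\tag{14}$$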
is derived from the filtering dynamics in Equation 2 and Equation 3 and accounts for the delays introduced by the synaptic time constant ${\tau}_{s}$ and the leak time constant ${\tau}_{l}$. In Equation 14, $u(t)$ denotes the Heaviside step function, that is, $u(t)=1$ for $t>0$ and $u(t)=0$ otherwise. The plasticity-induction variable is then low-pass filtered to account for slow learning dynamics,

$${\tau}_{\delta}\frac{d{\mathrm{\Delta}}_{ij}}{dt}=-{\mathrm{\Delta}}_{ij}+{\text{PI}}_{ij},\tag{15}$$

and the final weight change is given by

$$\frac{d{W}_{ij}}{dt}=\eta \,{\mathrm{\Delta}}_{ij},\tag{16}$$
where $\eta $ is the learning rate and ${W}_{ij}$ is the connection weight from the jth presynaptic neuron to the ith postsynaptic neuron. Note that the synaptic weight ${W}_{ij}$ is an element of either the matrix ${W}^{\text{rec}}$ or the matrix ${W}^{\text{HR}}$ depending on whether the presynaptic neuron $j$ is an HD or an HR neuron, respectively. The value of the plasticity time constant ${\tau}_{\delta}$ is not known; we therefore adopt the value suggested by Urbanczik and Senn, 2014.
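The postsynaptic-potential kernel can be checked numerically. The sketch below assumes that the kernel is the impulse response of two cascaded first-order low-pass filters with time constants ${\tau}_{s}$ and ${\tau}_{l}$, and compares the closed-form expression for that response with a forward Euler simulation of the cascade. The values of `TAU_S`, `TAU_L`, and `DT` are illustrative assumptions, not the ones in Table 1.

```python
import math

TAU_S = 0.065   # synaptic time constant in s (illustrative value)
TAU_L = 0.010   # leak time constant in s (illustrative value)
DT = 1e-5       # forward Euler step in s

def kernel(t):
    """Closed-form impulse response of two cascaded first-order low-pass filters."""
    if t <= 0.0:
        return 0.0
    return (math.exp(-t / TAU_S) - math.exp(-t / TAU_L)) / (TAU_S - TAU_L)

# A unit impulse into the synaptic filter sets the current to 1 / TAU_S.
current, voltage = 1.0 / TAU_S, 0.0
area = 0.0
max_abs_err = 0.0
for step in range(1, 50001):                      # 0.5 s of simulated time
    voltage += DT / TAU_L * (current - voltage)   # leak filter (TAU_L)
    current += DT / TAU_S * (-current)            # synaptic filter (TAU_S)
    area += voltage * DT                          # running integral of the response
    max_abs_err = max(max_abs_err, abs(voltage - kernel(step * DT)))
```

The simulated response tracks the closed-form kernel closely, and its integral is close to one, so filtering with this kernel delays and smooths the presynaptic rate without rescaling it.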
Equation 12 is a ‘delta-like’ rule that can be interpreted as an extension of the Hebbian rule; compared to a generic Hebbian rule, we have replaced the postsynaptic firing rate $f({V}_{i}^{a})$ by the difference between $f({V}_{i}^{a})$ and the predicted firing rate $f({V}_{i}^{ss})$ of the axon-distal compartment of the postsynaptic neuron. This difference drives plasticity in the model. We note that $f({V}_{i}^{a})$ is a continuous approximation of the spike train of the postsynaptic neuron, which could be available at the axon-distal compartment via backpropagating action potentials (Larkum, 2013). Furthermore, the axon-distal voltage ${V}_{i}^{d}$ and postsynaptic potentials are by definition available at the synapses arriving at the axon-distal compartment. Note that even though $f({V}_{i}^{ss})$ is the firing rate in the absence of visual input, it can still be computed at the axon-distal compartment when the visual input is available; ${V}_{i}^{d}$ is the local voltage and therefore only a constant multiplicative factor (Equation 11) and the static nonlinearity $f$ need to be computed to obtain $f({V}_{i}^{ss})$. Therefore, the learning rule is biologically plausible because all information is locally available at the synapse.
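For a single synapse, one update of the delta-like rule can be sketched as follows. The sketch assumes the structure described above: the induction signal is the difference between the actual and the predicted postsynaptic rate multiplied by the presynaptic postsynaptic potential, this signal is low-pass filtered with time constant ${\tau}_{\delta}$, and the weight changes at rate $\eta $ times the filtered signal. The function name `plasticity_step` and all numerical values are hypothetical.

```python
def plasticity_step(weight, filtered, rate_target, rate_pred, psp,
                    eta, tau_delta, dt):
    """One forward Euler step of the filtered delta-like rule for one synapse."""
    induction = (rate_target - rate_pred) * psp           # plasticity-induction signal
    filtered += dt / tau_delta * (induction - filtered)   # slow low-pass filter
    weight += dt * eta * filtered                         # weight update
    return weight, filtered

# Case 1: the axon-distal prediction underestimates the rate -> potentiation.
w, d = 0.0, 0.0
for _ in range(1000):   # 1 s of simulated time with dt = 1 ms
    w, d = plasticity_step(w, d, rate_target=0.8, rate_pred=0.5, psp=1.0,
                           eta=0.05, tau_delta=0.1, dt=0.001)

# Case 2: the prediction already matches the rate -> no change.
w_same, d_same = plasticity_step(0.3, 0.0, rate_target=0.5, rate_pred=0.5,
                                 psp=1.0, eta=0.05, tau_delta=0.1, dt=0.001)
```

The second case illustrates the fixed point of the rule: once the prediction error vanishes, the weights stop changing regardless of presynaptic activity.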
The learning rule used here differs from the one in the original work of Urbanczik and Senn, 2014 because we utilize a rate-based version instead of the original spike-based version. Even though spike trains can introduce Poisson noise to $f({V}_{i}^{a})$, Urbanczik and Senn, 2014 show that once learning has converged, asymmetries in the weights due to the spiking noise are on average canceled out.
Another difference in our learning setup is that, unlike in Urbanczik and Senn, 2014, the input to the axon-proximal compartment does not reach zero in equilibrium (see, e.g. Figure 3D, and Appendix 5). Therefore, an activation function with a saturating nonlinearity, as in Equation 7, is crucial for convergence, which could not be achieved with a less biologically plausible threshold-linear activation function. This lack of strict convergence in our setup is responsible for the square form of the bump (Figure 3—figure supplement 2B and Appendix 5).
Training protocol
We train the network with synthetically generated angular velocities, simulating head turns of the animal. ${W}^{\text{rec}}$ and ${W}^{\text{HR}}$ are both initialized with random connectivity drawn from a normal distribution with mean 0 and standard deviation $1/\sqrt{{N}^{\text{HD}}}$, as is common practice in the modeling literature. In further simulations with various other initial conditions (e.g. in the simulations with gain changes in Figure 4 or in simulations in which we randomly shuffled weights after learning, not shown), we confirmed that the final PI performance is virtually independent of the initial distribution of weights ${W}^{\text{rec}}$ and ${W}^{\text{HR}}$.
The network dynamics are updated in discrete time steps $\mathrm{\Delta}t$ using forward Euler integration. The entrained angular velocities cover the range of angular velocities exhibited by the fly, which are at maximum ∼500 deg/s during walking or flying (Geurten et al., 2014; Stowers et al., 2017). The angular velocity $v(t)$ is modeled as an Ornstein-Uhlenbeck process given by
where $\alpha =\mathrm{\Delta}t/{\tau}_{v}$ and ${\tau}_{v}$ is the time constant with which $v(t)$ decays to zero, $n(t)$ is noise drawn from a normal distribution with mean 0 and standard deviation 1 at each time step, and ${\sigma}_{v}$ scales the noise strength.
We pick ${\sigma}_{v}$ and ${\tau}_{v}$ so that the resulting angular velocity distribution in Figure 3—figure supplement 2C and its time course, for example in Figure 2A, are similar to what has been reported in flies during walking or flying (Geurten et al., 2014; Stowers et al., 2017). Finally, note that we train the network for angular velocities a little larger than what flies typically display (up to ±720 deg/s).
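A discrete-time Ornstein-Uhlenbeck velocity trace of this kind can be generated as in the sketch below. The decay factor $1-\alpha $ with $\alpha =\mathrm{\Delta}t/{\tau}_{v}$ follows the description above, whereas the scaling of the noise term by $\sqrt{\alpha}$ is an assumption made for this illustration; the function name `simulate_velocity`, the optional starting value `v0`, and the numerical values in the example call are likewise hypothetical.

```python
import math
import random

def simulate_velocity(n_steps, dt, tau_v, sigma_v, v0=0.0, seed=0):
    """Discrete Ornstein-Uhlenbeck process for the head angular velocity (deg/s)."""
    rng = random.Random(seed)
    alpha = dt / tau_v
    v, trace = v0, []
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)   # unit-variance Gaussian sample
        # Decay toward zero plus noise; the sqrt(alpha) scaling is assumed.
        v = (1.0 - alpha) * v + sigma_v * math.sqrt(alpha) * noise
        trace.append(v)
    return trace

# Without noise, the velocity decays geometrically toward zero.
decay = simulate_velocity(n_steps=3, dt=0.1, tau_v=0.5, sigma_v=0.0, v0=100.0)

# With noise, the trace fluctuates around zero (illustrative parameter values).
trace = simulate_velocity(n_steps=10000, dt=0.001, tau_v=0.5, sigma_v=450.0)
```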
Quantification of the mean learning error
In Equation 12 we have used the learning error

$${E}_{i}=f({V}_{i}^{a})-f({V}_{i}^{ss}),\tag{18}$$
which controls learning in the ith associative HD neuron. To quantify the mean learning error $\text{err}(t)$ in the whole network at time $t$, we average ${E}_{i}$ across all HD neurons and across a small time interval $[t,t+{t}_{w}]$, that is,
with ${t}_{w}=10$ s. In Figure 3D, we plot this mean error at every 1% of the simulation, for 12 simulations, and averaged across the ensemble of the simulations. Note that individual simulations occasionally display ‘spikes’ in the error. Large errors occur if the network happens to be driven by very high velocities that the network does not learn very well because they are rare; larger errors also occur for very small velocities, that is, when the velocity input is not strong enough to overcome the local attractor dynamics, as seen, for example, in Figure 2C. On average, though, we can clearly see that the mean learning error decreases with increasing time and settles to a small value (e.g. Figure 3D and Figure 4A).
Population vector average
To decode from the activity of HD neurons an average HD encoded by the network, we use the population vector average (PVA). We thus first convert the tuning direction ${\theta}_{i}$ of the ith HD neuron to the corresponding complex number ${e}^{j{\theta}_{i}}$ on the unit circle, where $j$ is the imaginary unit. This complex number is multiplied by the firing rate ${r}_{i}^{\text{HD}}$ of the ith HD neuron, and then averaged across neurons to yield the PVA

$$\text{PVA}=\frac{1}{{N}^{\text{HD}}}\sum _{i=1}^{{N}^{\text{HD}}}{r}_{i}^{\text{HD}}{e}^{j{\theta}_{i}}.\tag{20}$$
The PVA is a vector in the 2D complex plane and points to the center of mass of activity in the HD network. Finally, we take the angle $\theta $ of the PVA as a measure for the current heading direction represented by the network.
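The decoding step can be illustrated with a short sketch. The Gaussian-shaped bump, its width, and the number of neurons are assumptions made for the example; `pva_angle` and `circ_dist` are hypothetical helper names.

```python
import cmath
import math

def pva_angle(rates, thetas):
    """Angle (rad) of the population vector average of the HD firing rates."""
    pva = sum(r * cmath.exp(1j * th) for r, th in zip(rates, thetas)) / len(rates)
    return cmath.phase(pva)   # value in (-pi, pi]

def circ_dist(a, b):
    """Signed circular distance between two angles (rad)."""
    return cmath.phase(cmath.exp(1j * (a - b)))

N = 60                                              # number of HD neurons (assumed)
thetas = [2.0 * math.pi * i / N for i in range(N)]  # evenly spaced tuning directions
center = thetas[15]                                 # bump centered at 90 deg

# Symmetric bump of activity around the center (illustrative shape and width).
rates = [math.exp(-circ_dist(th, center) ** 2 / (2 * 0.3 ** 2)) for th in thetas]
decoded = pva_angle(rates, thetas)
```

For a bump that is symmetric around one tuning direction, the decoded angle coincides with that direction.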
Diffusion coefficient
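To quantify the variability of heading direction in the trained networks, we define the diffusion coefficient $D$ as:

$$D=\frac{\langle {(\mathrm{\Delta}\theta )}^{2}\rangle -{\langle \mathrm{\Delta}\theta \rangle}^{2}}{{t}_{\text{sim}}},\tag{21}$$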
where $\mathrm{\Delta}\theta $ is the change in heading direction in a time interval ${t}_{\text{sim}}$. Therefore, $D$ is given by the variance of the distribution of displacements in a given time interval, divided by the time interval.
In the main text, we estimate $D$ during PI, i.e. with velocity inputs only. In this setting, $D$ is the rate at which the variance of the PI errors increases (see e.g. Figure 2B). Deviations from gain-1 PI contribute to this estimate; hence, to single out the effects of noise during training on the stability of the learned attractor in Appendix 1, we also estimate $D$ in the presence of test noise when no inputs are received at all.
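The estimate described above, the variance of the displacements divided by the duration of the interval, can be sketched as follows. The use of the population variance (rather than the sample variance) and the example numbers are assumptions for illustration.

```python
def diffusion_coefficient(displacements_deg, t_sim):
    """Variance of heading displacements (deg^2) divided by the interval length (s)."""
    n = len(displacements_deg)
    mean = sum(displacements_deg) / n
    variance = sum((d - mean) ** 2 for d in displacements_deg) / n
    return variance / t_sim

# Four hypothetical trials of 10 s each, with displacements of +/- 10 deg.
D = diffusion_coefficient([-10.0, 10.0, -10.0, 10.0], t_sim=10.0)
```

In this toy example the variance is 100 deg² over 10 s, so `D` evaluates to 10 deg²/s.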
Fly connectome analysis
Our model assumes the segregation of visual inputs to HD (EPG) cells from head rotation and recurrent inputs to the same cells. To test this hypothesis, we leverage the fly hemibrain connectome (Xu, 2020; Clements et al., 2020). First, we randomly choose one EPG neuron per wedge of the EB, for a total of 16 EPG neurons. We reasoned this sample would be sufficient because the way EPG neurons in the same wedge are innervated is expected to be similar. We then find all incoming connections to these neurons from visually responsive ring neurons R2 and R4d (Omoto et al., 2017; Fisher et al., 2019). These are the connections that arrive at the axon-proximal compartment in our model. We then find all incoming connections from PEN1 cells, which correspond to the HR neurons, and from PEN2 cells, which are involved in a recurrent excitatory loop from EPG to PEG to PEN2 and back to EPG (Turner-Evans et al., 2020). These are the connections that arrive at the axon-distal compartment in our model.
To further support the assumption that visual inputs are separated from recurrent and HR-to-HD inputs in the Drosophila EB, we perform binary classification between the two classes (R2 and R4d vs. PEN1 and PEN2). We use SVMs with Gaussian kernel, and perform nested 5-fold cross-validation, for a total of 30 model runs for every neuron tested (Figure 1—figure supplement 1).
Quantification of PI performance
To quantify PI performance of the network and compare to fly performance, we use the measure defined by Seelig and Jayaraman, 2015 and estimate the correlation coefficient between the unwrapped PVA and true heading in darkness. We estimate the correlation in 140 s long trials and report the point estimate and 95% confidence intervals (Student’s t-test, $N=100$).
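The performance measure can be sketched in two steps: unwrap the decoded heading so that it no longer jumps at the 0/360 deg boundary, then compute the Pearson correlation with the true heading. The helper names `unwrap_deg` and `pearson` and the short example traces are hypothetical; real trials would contain many more samples.

```python
import math

def unwrap_deg(angles):
    """Remove 360-deg jumps from a sequence of angles given in degrees."""
    unwrapped = [angles[0]]
    for a in angles[1:]:
        # Smallest signed step from the previous unwrapped value.
        step = (a - unwrapped[-1] + 180.0) % 360.0 - 180.0
        unwrapped.append(unwrapped[-1] + step)
    return unwrapped

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

true_heading = [340.0, 350.0, 360.0, 370.0, 380.0]   # already unwrapped
decoded_wrapped = [341.0, 349.0, 1.0, 9.0, 21.0]     # wraps at 360 deg
r = pearson(unwrap_deg(decoded_wrapped), true_heading)
```

Without the unwrapping step, the jump from 349 deg to 1 deg would appear as a large negative excursion and would artificially lower the correlation.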
Resource availability
All code used in this work is available at https://github.com/panvaf/LearnPI (copy archived at swh:1:rev:c6e354f80bf435114e577af70892db41c3ce5315, Vafidis, 2022). The files required to reproduce the figures can be found at https://gin.gnode.org/pavaf/LearnPI.
Appendix 1
Robustness to noise
In the main text, the only source of stochasticity in the network came from the angular velocity noise in the Ornstein-Uhlenbeck process (Materials and methods, Equation 17). Biological HD systems, however, are subject to other forms of biological noise, such as the randomness of ion channels. To address that, we include Gaussian IID synaptic current noise at every location in the network where inputs arrive: the axon-proximal and axon-distal compartments of HD cells and the HR cells (see Materials and methods, parameterized by ${\sigma}_{n}$ in Equation 2, Equation 4, and Equation 8). We then ask how robustly the network can learn in the presence of such additional stochasticity.
To quantify the network’s robustness to noise, we need to define a comparative measure of useful signals vs. noise in the network. By ‘signals’ we refer to the velocity/visual inputs and any network activity resulting from them, whereas ‘noise’ refers to the aforementioned Gaussian IID variables. We thus define the signal-to-noise ratio (SNR) to be the squared ratio of the active range ${A}_{active}$ of the activation function $f$ (defined in Equation 7) over two times the standard deviation of the Gaussian noise, ${\sigma}_{n}$, i.e.

$$\text{SNR}={\left(\frac{{A}_{active}}{2{\sigma}_{n}}\right)}^{2}.\tag{22}$$
This definition is motivated by the fact that ${A}_{active}$ determines the useful range that signals in the network can have. If any of the signals exceed this range, they cannot impact the network in any meaningful way because the neuronal firing rate has saturated, unless they are counterbalanced by other signals reliably present. The factor 2 in the denominator is due to the fact that the noise can extend to both positive and negative values, whereas ${A}_{active}$ denotes the entire range of useful inputs.
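As a small worked example of this definition, the sketch below evaluates the squared ratio of the active range to twice the noise standard deviation. The value `A_ACTIVE = 2.0` is an illustrative assumption, chosen because it is consistent with a train noise of ${\sigma}_{n}=0.7$ corresponding to $\text{SNR}\approx 2$ in this Appendix; it is not quoted from the parameter table.

```python
A_ACTIVE = 2.0   # active input range of the activation function (assumed value)

def snr(a_active, sigma_n):
    """Squared ratio of the active range to twice the noise standard deviation."""
    return (a_active / (2.0 * sigma_n)) ** 2

snr_train = snr(A_ACTIVE, 0.7)   # roughly 2
snr_limit = snr(A_ACTIVE, 1.0)   # equals 1 when 2 * sigma_n spans the active range
```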
Here, we vary the SNR and observe its impact on learning and network performance. Appendix 1—figure 1A–D shows the performance of a network that has been trained with $\text{SNR}\approx 2$. The resulting network connectivity remains circularly symmetric and maintains the required asymmetry in the HR-to-HD connections for L- and R-HR cells (data not shown). Therefore we plot only the profiles in Appendix 1—figure 1B, which look very similar to the ones in Figure 3C trained with $\text{SNR}=\mathrm{\infty}$. The peak of the local excitatory connectivity in ${W}^{\text{rec}}$ is not as pronounced. This happens because the noise corrupts autocorrelations of firing during learning.
The network activity still displays a clear bump that smoothly follows the ground truth in the absence of visual input (Appendix 1—figure 1A). There are only minor differences compared to the network without noise (Figure 2A). The presence of the noise is most obvious in the HR cells, since HD cells that do not participate in the bump are deep into inhibition, and therefore synaptic input noise does not affect their activity as much. We note that the network can no longer sustain a bump in darkness when $\text{SNR}=1$, that is, when the standard deviation of the noise covers the full active range of inputs (data not shown).
Finally, the neural velocity slightly overestimates the head angular velocity (Appendix 1—figure 1C compared to Figure 2C), and the PI errors diffuse faster in the network with noise ($88.1{\text{deg}}^{2}/\text{s}$ in Appendix 1—figure 1D compared to $24.5{\text{deg}}^{2}/\text{s}$ in Figure 2B); these values are also indicated in Appendix 1—figure 1E (triangles). Importantly, we find that the diffusion assumption holds, because the estimation of the diffusion coefficient when varying simulation time (between 10 and 60 s) is consistent.
So far we have addressed the diffusivity of PI in networks that receive velocity input, and we have compared the performance of networks that were trained with and without synaptic input noise. However, in mature networks, that is, during testing, it is unclear how large the impact of synaptic input noise is, compared to noise that originates from imperfect PI. To disentangle these two noise contributions during testing, we study diffusivity in networks that do not receive velocity inputs at all (i.e. without PI). In the absence of velocity inputs, we vary the level of synaptic input noise during testing (called ‘test noise’), and estimate diffusion coefficients. Specifically, for each test noise magnitude ${\sigma}_{n}$ we run 1000 simulations of ${t}_{\text{sim}}=10$ s, each time randomly initializing the network at one of the angular locations ${\theta}_{i}$ to which HD neurons are tuned. For ${\sigma}_{n}=0$, we run one simulation per ${\theta}_{i}$, since the simulation is deterministic.
Our results in Appendix 1—figure 1E show that the diffusion coefficients obtained in networks without velocity inputs (dots) are always much smaller than those with velocity inputs (triangles). Thus, imperfect integration of velocity inputs is by far the dominating source of noise in trained, mature networks. Omitting the velocity inputs, we detected small differences between networks that were trained without synaptic input noise (blue dots) and with such noise (orange dots, train noise ${\sigma}_{n}=0.7$). The network trained with noise is slightly more diffusive up to the level of test noise at which it was trained (${\sigma}_{n}=0$ – 0.7), and it is slightly less diffusive beyond that level (${\sigma}_{n}>0.7$).
We conclude that learning PI is robust to synaptic input noise during learning, and that synaptic input noise during testing degrades performance much less than errors due to deviations from perfect gain-1 PI, which are already quite small.
Appendix 2
Synaptic delays set a neural velocity limit during path integration
In the main text we trained networks for a set of angular velocities that cover the full range exhibited by the fly ($|v|<500$ deg/s), and we showed that the mature network can account for several key experimental findings. However, the ability of any continuous attractor network to path-integrate is naturally limited for high angular velocities, due to the synaptic delays inherent in any such network (Zhong et al., 2020). To evaluate the ability of our network to integrate angular velocities, we sought to identify a limit of what velocities could be learned.
The width of the HD bump in our network is here termed BW, and it is largely determined by the width $\sigma $ of the visual receptive field. This is because during training we force the network to produce a bump with a width matching that of the visual input, and this width is then maintained when the latter is not present. The reason for this behavior is that the width of the learned local excitatory connectivity profile in ${W}^{\text{rec}}$ that guarantees such stable bumps of activity will be similar to the width of the bump, because recurrent connections during learning are only drawn from active neurons (nonzero ${\text{P}}_{j}$ in Equation 12). As mentioned in the main text, this emphasizes the Hebbian component of our learning rule (fire together — wire together). As a result, the width of local excitatory recurrent connections should be approximately BW.
In Figure 3—figure supplement 1 we show that the higher angular velocities are served by the long-range excitatory connections in ${W}^{\text{HR}}$. However, these connections might not be strong enough to move the bump by themselves; a contribution from HD cells might still be needed to move the bump at such high angular velocities. In that case, the width of the connectivity bump in ${W}^{\text{rec}}$ might limit how far away from the current location the bump can be moved. In addition, there is a limitation in how quickly the bump can be moved: the learning rule in Equation 1 tries to predict the next state of the network from the current state; but to activate the next HD neurons in line, current HD and HR cell activity must go through the synaptic delay ${\tau}_{s}$. Therefore, the maximum velocity that the network can achieve without external guidance (i.e. without visual input) should be inversely proportional to ${\tau}_{s}$, i.e.

$${v}_{max}=\frac{b}{{\tau}_{s}},\tag{23}$$
where we assume that $b$ might reflect an effective HD connectivity bump width, which depends on $\sigma $ but also on the angular resolution of the HD network $\mathrm{\Delta}\varphi $, due to discretization effects. In reality, the HR-to-HD connectivity profiles in ${W}^{\text{HR}}$ likely also have a bearing on $b$.
We then systematically vary ${\tau}_{s}$ and test what velocities the network can learn. We indeed find that networks can path-integrate all angular velocities up to a limit, but not higher than that. As predicted, this limit is inversely proportional to ${\tau}_{s}$, for a wide range of delays (Appendix 2—figure 1A). Furthermore, $b$ matches BW reasonably well. Fitting Equation 23 to the data we obtain $b(0.25,6\text{ deg})\approx \text{BW}=96\text{ deg}$ for ${N}^{\text{HD}}={N}^{\text{HR}}=120$ and $b(0.15,12\text{ deg})=75\text{ deg}$, $\text{BW}=60\text{ deg}$ for ${N}^{\text{HD}}={N}^{\text{HR}}=60$.
As mentioned in the main text, there are two limitations other than synaptic delays that could prevent the network from learning high angular velocities: limited training of these velocities, and saturation of HR cell activity. These limitations kick in for ${\tau}_{s}<150\text{ ms}$, for which ${v}_{max}$ matches the maximum velocity the fly displays (500 deg/s). Therefore, to create Appendix 2—figure 1A for these delays, we increased the standard deviation of the velocity noise in the Ornstein-Uhlenbeck process to ${\sigma}_{v}=800\text{ deg/s}$ to address the first limitation, and we increased the dynamic range of angular velocity inputs by decreasing the proportionality constant in Equation 10 to $k=1/540\text{ s/deg}$ to address the second.
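As a quick arithmetic check of the inverse relation between the velocity limit and the synaptic delay, the sketch below assumes that the limit is the ratio $b/{\tau}_{s}$ and uses the fitted value $b=75$ deg reported above for ${N}^{\text{HD}}={N}^{\text{HR}}=60$. The function name `v_max` is hypothetical.

```python
def v_max(b_deg, tau_s):
    """Velocity limit (deg/s) for effective bump width b_deg and delay tau_s (s)."""
    return b_deg / tau_s

B_FIT = 75.0                      # fitted effective width in deg for N = 60

limit_150ms = v_max(B_FIT, 0.150)   # about 500 deg/s
limit_190ms = v_max(B_FIT, 0.190)   # about 395 deg/s
```

With a delay of 150 ms the predicted limit coincides with the maximum velocity displayed by the fly (500 deg/s), and a longer delay of 190 ms places the limit inside the behaviorally relevant range.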
The velocity gain plot for an example network with high synaptic delays (${\tau}_{s}=190\text{ ms}$) is shown in Appendix 2—figure 1B. Interestingly, we notice that the performance drop at the velocity limit is not gradual; instead, the neural velocity abruptly drops to a near-zero value once past the velocity limit. Further investigation reveals that for velocities higher than this limit, the network can no longer sustain a bump (Appendix 2—figure 1C). This happens because the HD network cannot activate neurons downstream fast enough to keep the bump propagating, and therefore the bump disappears and the velocity gain plot becomes flat.
Equation 23 is similar to a relationship reported in Turner-Evans et al., 2017 (their page 35, 1st paragraph), where it was demonstrated that the phase shift of the HR population bump compared to the HD bump limits angular velocity. Our result hence generalizes this finding to the case where recurrent connections between HD neurons are also allowed.
Finally, we note that so far we only tested the limits of network performance when increasing ${\tau}_{s}$. To demonstrate that smaller delays also work, as an extreme example we show PI performance in a network where ${\tau}_{s}=1\text{ ms}$ in Appendix 2—figure 1D and E. A potential issue with such small synaptic delays is that the network would not be able to distinguish rightward from leftward rotation for small angular velocities, because the motion direction offset of the HR bumps would be small, and the activity in the two HR populations comparable. In such a setting it is harder to learn the asymmetries in the HR-to-HD connections required to differentiate leftward from rightward movement. Indeed, this effect is visible in Appendix 2—figure 1E where the amplitude of HR-to-HD connections has been suppressed, and in Appendix 2—figure 1D where the flat region for small angular velocities has been extended compared to Figure 2C.
Overall, these results indicate that the network learns to path-integrate angular velocities up to a fundamental limit imposed by the architecture of the HD system in the fly. Furthermore, we conclude that the phenomenological delays observed in the fly HD system in Turner-Evans et al., 2017 are not fundamentally limiting the system’s performance, since they can support PI for angular velocities much higher than the ones normally displayed by the fly.
Appendix 3
Robustness to architectural asymmetries
The networks we have trained in the Results had a circularly symmetric initial architecture, including the hardwired HD-to-HR connections ${W}^{\text{HD}}$. However, such symmetry is unrealistic for any biological system that is assembled by imperfect processes; deviations from symmetry should be expected. Therefore, in this Appendix we let ${W}^{\text{HD}}$ vary randomly, and observe how PI performance is affected.
First, we remind the reader that the magnitude ${w}^{\text{HD}}$ of the HD-to-HR connections is chosen so that we take advantage of the full dynamic range of HR neurons; however, the exact magnitude should not be critical for our model. Homeostatic plasticity could adjust the magnitude, but for simplicity we have not incorporated such plasticity rules in our model. Instead, to see whether the exact values of synaptic weights, their circular symmetry, and the 1-to-1 nature of the HD-to-HR connections are crucial for our model, we draw connection strengths randomly.
In a first approach, we let HD neurons project also to adjacent HR neurons. Specifically, if $U(a,b)$ denotes the uniform distribution in the interval $(a,b)$, we sample the magnitude of weights from $U(\frac{{w}^{\text{HD}}}{2},{w}^{\text{HD}})$ for the main diagonal and $U(0,\frac{3{w}^{\text{HD}}}{4})$ for the side diagonals of ${W}^{\text{HD}}$ (Appendix 3—figure 1A). We then adjust the network connectivity (${W}^{\text{rec}}$ and ${W}^{\text{HR}}$) in a learning phase, similar to the one illustrated in Figure 3E and F. After learning, as shown in Appendix 3—figure 1C–E, PI is still excellent because the learning rule can balance out any deviations from circular symmetry in ${W}^{\text{HD}}$. It does so by introducing deviations from circular symmetry in the learned weights, mainly in ${W}^{\text{HR}}$ (Appendix 3—figure 1B). Thus, small deviations from circular symmetry in the learned weights are essential for PI.
We illustrate the necessity to counterbalance small deviations from circular symmetry again in a second example that is based on the symmetric network studied in the Results. Here, we use the connectivity of the network illustrated in Figure 3A–C and also preserve the 1-to-1 nature of HD-to-HR connections, but now we randomly vary their magnitude, while maintaining the same average connection strength; specifically, we sample the magnitude of weights from $U(\frac{{w}^{\text{HD}}}{2},\frac{3{w}^{\text{HD}}}{2})$. PI performance in this network is considerably impaired compared to the original network (compare Appendix 3—figure 2 to Figure 2A and C). This is a further argument in favor of synaptic plasticity operating to fine-tune connectivity, because as mentioned we expect that such anatomical asymmetries are indeed present in the biological circuit. Therefore, even if the circularly symmetric synaptic weights were passed down genetically with great accuracy, PI performance in flies should be considerably degraded for a biological circuit with anatomical asymmetries when no learning is involved.
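The second perturbation, which resamples the magnitude of each existing 1-to-1 connection while leaving the connectivity pattern untouched, can be sketched as follows. The identity-like placeholder pattern, the matrix size, the value of `w_hd`, and the function name `perturb_magnitudes` are assumptions for illustration only.

```python
import random

def perturb_magnitudes(W, w_hd, rng):
    """Resample every nonzero weight from U(w_hd / 2, 3 * w_hd / 2)."""
    return [[rng.uniform(0.5 * w_hd, 1.5 * w_hd) if w != 0.0 else 0.0
             for w in row] for row in W]

rng = random.Random(0)   # fixed seed for reproducibility
n = 60                   # number of neurons per population (assumed)
w_hd = 1.0               # placeholder connection strength

# Placeholder 1-to-1 pattern: one nonzero entry per row and per column.
W = [[w_hd if i == j else 0.0 for j in range(n)] for i in range(n)]
W_pert = perturb_magnitudes(W, w_hd, rng)

nonzero = [w for row in W_pert for w in row if w != 0.0]
mean_strength = sum(nonzero) / len(nonzero)
```

Because the sampling interval is centered on `w_hd`, the average connection strength is preserved in expectation while individual connections deviate by up to 50%.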
To better quantify the effect of anatomical asymmetries, we incorporate both noise in the learned connectivity as in Figure 3—figure supplement 3A,B and noise in the HD-to-HR connections as in Appendix 3—figure 1A. We tune the noise independently for each weight matrix: for ${W}^{\text{rec}}$ and ${W}^{\text{HR}}$ we set the variance of the Gaussian noise to $p$ times the variance of the individual weight matrices, while for ${W}^{\text{HD}}$ we draw the connections from $U((1-p){w}^{\text{HD}},(1+p){w}^{\text{HD}})$ for the main diagonal and $U(0,p{w}^{\text{HD}})$ for the side diagonals. We find that for $p=0.3$ the correlation between the PVA and true heading in darkness drops to $0.27\pm 0.09$, which is below reported fly PI performance (mean correlation across animals $\sim 0.5$ in Seelig and Jayaraman, 2015), while the structure of the weights is preserved. We observed a steep decline in the correlation coefficient between $p=0.25$ (for which the correlation is $0.92\pm 0.04$) and $p=0.3$. Furthermore, we study the effect of perturbing individual weight matrices, and find that perturbing only ${W}^{\text{HD}}$ with $p=0.3$ considerably affects performance (correlation $0.39\pm 0.09$) while perturbation of ${W}^{\text{rec}}$ and ${W}^{\text{HR}}$ together has a much smaller effect on performance. This again argues in favor of learning ${W}^{\text{rec}}$ and ${W}^{\text{HR}}$ to counterbalance asymmetries in ${W}^{\text{HD}}$ (Appendix 3—figure 1). Furthermore, note that confounders other than imperfect weights might be responsible for the degradation of PI performance in the fly, which further argues in favor of learning.
As a final test of the capability of the learning rule to balance anatomical asymmetries, in Appendix 3—figure 3 we use a completely random connectivity for HD-to-HR connections, drawing weights from a folded Gaussian distribution. We find that even then, PI performance of the converged network is very good, albeit for a smaller range of velocities. In addition, bumps are no longer clearly visible in the HR populations; in the main network, HR bumps were inherited from the HD bump due to the sparseness of the HD-to-HR connections. However, when HD-to-HR connections are random, HR cells are no longer mapped to a topographic state space.
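For concreteness, a folded Gaussian draw is simply the absolute value of a zero-mean Gaussian sample. The sketch below assumes a dense matrix with hypothetical size and scale `sigma`; the values used in the paper are not stated in this section.

```python
import random

def folded_gaussian_matrix(n_rows, n_cols, sigma, rng):
    """Dense random weights |x| with x ~ N(0, sigma^2), so every entry is non-negative."""
    return [[abs(rng.gauss(0.0, sigma)) for _ in range(n_cols)] for _ in range(n_rows)]

rng = random.Random(0)
# Matrix size and sigma are hypothetical, chosen only for illustration.
W_random = folded_gaussian_matrix(n_rows=60, n_cols=60, sigma=1.0, rng=rng)
```

Such a matrix has no diagonal structure, which is why HR cells in this setting no longer inherit a topographic bump from the HD population.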
Appendix 4
Requirements on time scales
We devote this Appendix to discussing the requirements on the time scales involved in our model (see Appendix 4—table 1). Several of these time scales are well constrained by biology, and thus we chose to keep them constant. These include the membrane time constants of the axon-proximal and axon-distal compartments, $C/{g}_{L}$ and ${\tau}_{l}$, respectively, which should be on the order of milliseconds; and the velocity decay time constant ${\tau}_{v}$, for which we choose a value of the same order of magnitude (0.5 s) as experimentally reported (Turner-Evans et al., 2017).
In general, the learning time scale given by $1/\eta $ should be the slowest one in our network model. It should be large enough for the network to sample the input statistics for a sufficiently long time. Varying $1/\eta $ from 2 s to 20 s to 200 s, we find that the final learned weights are virtually identical (not shown). However, $1/\eta $ should not be too large, so that learning remains fast enough.
The synaptic time constant ${\tau}_{s}$ is determined from phenomenological delays in the network (Turner-Evans et al., 2017). Nevertheless, we also addressed the impact of varying ${\tau}_{s}$ in Appendix 2 and found that learning PI was robust over a wide range of values.
In additional simulations, we varied the weight update filtering time constant ${\tau}_{\delta}$ from 0 (effectively removing filtering) to 1000 s (which is four orders of magnitude larger than the default value). We observed almost no effect on learning dynamics, and the performance of the final networks was almost identical (not shown). Since the specific value of ${\tau}_{\delta}$ is of little consequence in our network (if $1/\eta $ is large enough), there are hardly any limitations on its value compared to other time scales. This justifies ignoring ${\tau}_{\delta}$ in the derivation of a reduced model without noise in Appendix 5.
Appendix 5
Reduced model for a circular symmetric learned network
In this section, we derive a reduced model for the dynamics of the synaptic weights during learning. The goal is to gain an intuitive understanding of the structure obtained in the full model (Figure 3 of the main text). Such a model reduction is obtained by (1) exploiting the circular symmetry in the system; (2) averaging weight changes across different speeds and moving directions; (3) writing dynamical equations in terms of convolutions and cross-correlations. With these methods, we derive a nonlinear dynamical system for the weight changes as a function of head direction. Finally, we simulate this dynamical system and inspect how the different variables interact to obtain the final weights. We find that the reduced model results in nearly identical connectivity and learning dynamics to the full network in the main text, and explains how the latter assigns learning errors to the correct weights. Furthermore, it reduces simulation times by two orders of magnitude.
Note that in this section we use slightly different notation compared to the main text. Notably, we refer to the recurrent head direction weight matrix simply as $W$ (omitting the superscript rec), and use capital letters for functions of time and small letters for functions of heading direction.
We study the learning equation (see Equations 12–16 in the main text), where the low-pass filtering with time constant ${\tau}_{\delta}$ has been ignored because we find that the value of ${\tau}_{\delta}$ is not important for learning (see Appendix 4)
where
is the presynaptic error at the ith cell and
is the postsynaptic potential at HD cell $j$, and $H$ is a temporal filter (with time constants ${\tau}_{s}$ and ${\tau}_{l}$, see Equation 14 of the main text).
Clockwise movement
Assuming that the head turns clockwise (which corresponds to rightward rotation, i.e. rotation towards decreasing angles) and anticlockwise (leftward, i.e. towards increasing angles) with equal probability, we can approximate the weight dynamics by summing the average weight change ${W}_{ij}^{+}$ for clockwise movement and the average weight change ${W}_{ij}^{-}$ for anticlockwise movement:
We start by assuming head movement at constant speed and we later generalize the results for multiple speeds. We compute the expected weight change $\frac{\mathrm{d}}{\mathrm{d}t}{W}_{ij}^{+}$ for one lap in the clockwise direction at speed ${v}^{+}>0$:
where
is the postsynaptic potential for clockwise movement, and
is the error for a clockwise movement. Assuming that the axon-proximal voltage is at steady state (Equation 4 of the main text with the l.h.s. set to zero and ${I}_{exc}^{\text{HD}}$ absorbed into ${I}_{vis}$), the clockwise axon-proximal voltage reads
where (see Equation 11 of the main text)
From Equation 2 and 3 of the main text, we can write the axon-distal voltage ${V}_{i}^{d+}$ as a low-pass filtered version of the total axon-distal current ${D}_{i}^{+}$ for clockwise movement (see also Equation 14 of the main text):
which yields
Importantly, the visual input ${I}^{vis}$ is translation invariant:
where ${\theta}_{j}$ and ${\theta}_{i}$ are the preferred head directions of the jth and ith cell, respectively. As a result of this translation invariance, the recurrent weight matrix $W$ develops circular symmetry:
where ${N}_{\text{HD}}$ is the number of HD cells in the system. Consequently, the postsynaptic potential ${P}_{j}^{+}$ is also translation invariant:
In this case, without loss of generality, we can rewrite Equation 28 for a single row of the matrix $\frac{\mathrm{d}}{\mathrm{d}t}{W}_{ij}^{+}$ as a function of the angle difference $\theta :={\theta}_{j}-{\theta}_{0}$:
where we defined ${\epsilon}^{+}(\phi ):={E}_{0}^{+}(\phi /{v}^{+})$ and ${p}^{+}(\phi ):={P}_{0}^{+}(\phi /{v}^{+})$, and $\star $ denotes circular cross-correlation.
From Equation 29, we derive
The approximation in Equation 45 holds when a bump exists in the network and moves with a velocity below the velocity limit, and it is valid if the temporal filter $H$ is shorter than $2\pi /{v}^{+}$, that is for $H(t)\ll 1$ for $t>2\pi /{v}^{+}$, which holds for the filtering time constants and velocity distribution we assumed (Appendix 5—figure 1). Therefore, plugging Equation 46 into Equation 43, we obtain:
By using the definition of ${\epsilon}^{+}(\phi )$ we derive
with (Equation 31)
and (Equation 34)
The approximation in Equation 52 is valid if the temporal filter $H$ is shorter than $2\pi /{v}^{+}$, which again holds true for our parameter choices (Appendix 5—figure 1).
Calculation of the axondistal input
Let us compute the axon-distal current ${D}_{i}^{+}$ to the ith neuron for clockwise movement. From Equation 2 of the main text, setting the l.h.s. to zero, and splitting the rotation-cell activities in the two populations (L-HR and R-HR), we derive
where ${W}_{ij}^{\text{R}}$ (${W}_{ij}^{\text{L}}$) are the weights from the right (left) rotation cells, and ${V}_{j}^{R+}$ (${V}_{j}^{L+}$) are the voltages of the right (left) rotation cells (see Equation 8–10 of the main text):
The function ${H}_{s}(t):=\mathrm{exp}(-t/{\tau}_{s})/{\tau}_{s}$ is a temporal low-pass filter with time constant ${\tau}_{s}$ and the velocity input reads (Equation 10 of the main text)
Equation 54 and Equation 55 show that the rotation-cell voltages are rescaled and filtered versions of the corresponding HD-cell firing rates with a baseline shift ${\overline{I}}_{vel}$ that is differentially applied to right and left rotation cells.
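The effect of the exponential low-pass filter with time constant ${\tau}_{s}$ can be illustrated with a discretized causal convolution. The sketch below is not the authors' implementation; the values of `tau_s` and `dt` are hypothetical and chosen only to make the example fast to run.

```python
import math

def exp_kernel(tau, dt, n_steps):
    """Samples of H(t) = exp(-t / tau) / tau for t = 0, dt, 2*dt, ..."""
    return [math.exp(-k * dt / tau) / tau for k in range(n_steps)]

def causal_filter(signal, kernel, dt):
    """Discrete causal convolution: out[n] = dt * sum_k kernel[k] * signal[n - k]."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k in range(min(n + 1, len(kernel))):
            acc += kernel[k] * signal[n - k]
        out.append(acc * dt)
    return out

tau_s = 0.1   # hypothetical synaptic time constant (s)
dt = 0.001    # hypothetical integration step (s)
kernel = exp_kernel(tau_s, dt, n_steps=2000)   # kernel spans 20 time constants
kernel_area = sum(kernel) * dt                 # close to 1: constant inputs pass through unchanged
step_response = causal_filter([1.0] * 1000, kernel, dt)  # rises from near 0 toward 1
```

Because the kernel has unit area, filtering preserves the amplitude of slowly varying inputs and only delays and smooths them, which is why a moving bump appears shifted after filtering.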
From Equation 53, we derive
Assuming a large number ${N}_{\text{HD}}$ of HD cells evenly spaced around the circle, the recurrent axon-distal input reads
where ${\rho}_{\text{HD}}={N}_{\text{HD}}/2\pi $ is the density of the HD neurons around the circle and we used the fact that the axon-proximal voltage is translation invariant (see also Equation 37):
Following a similar procedure for ${D}_{0}^{R+}$ and ${D}_{0}^{L+}$, we obtain:
where ${\rho}_{\text{HR}}={N}_{\text{HR}}/2\pi $ is the density of the HR neurons for one particular turning direction (note that we assumed ${\rho}_{\text{HR}}=2{\rho}_{\text{HD}}$ in the main text). In deriving Equation 62 we defined
where we defined the filter ${h}_{s}^{+}(\phi ):=\frac{1}{{v}^{+}}{H}_{s}(\phi /{v}^{+})$, and the approximations are valid if ${H}_{s}(t)\ll 1$ for $t>2\pi /{v}^{+}$, which holds true for the time constant and velocity distribution assumed in the main text.
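The continuum approximation used above, in which a sum over evenly spaced cells is replaced by the cell density times an integral over the circle, can be checked numerically. The profiles `weight` and `rate` below are hypothetical smooth functions chosen for illustration, not the learned weights or firing rates of the model.

```python
import math

def discrete_input(n_cells, weight, rate):
    """Sum over evenly spaced cells: sum_j w(theta_j) * f(theta_j)."""
    thetas = [2.0 * math.pi * j / n_cells for j in range(n_cells)]
    return sum(weight(t) * rate(t) for t in thetas)

def continuum_input(n_cells, weight, rate, n_quad=10000):
    """Density rho = N / (2*pi) times the integral of w * f over the circle (midpoint rule)."""
    rho = n_cells / (2.0 * math.pi)
    d_theta = 2.0 * math.pi / n_quad
    integral = sum(weight((k + 0.5) * d_theta) * rate((k + 0.5) * d_theta)
                   for k in range(n_quad)) * d_theta
    return rho * integral

# Hypothetical smooth profiles, chosen only for illustration.
weight = lambda theta: 1.0 + math.cos(theta)
rate = lambda theta: 1.0 + math.cos(theta)

n_hd = 60  # hypothetical number of HD cells
total_discrete = discrete_input(n_hd, weight, rate)
total_continuum = continuum_input(n_hd, weight, rate)
```

For smooth, periodic profiles the two quantities agree closely even for moderate cell numbers, which is the regime in which the reduced model is expected to track the full network.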
Finally, we compute the weight changes of the rotation cells. For these weights, the learning rule is the same as the one for the recurrent connections, except that the postsynaptic HD input is replaced by the postsynaptic HR input. Therefore, following the same procedure as in Equation 38–Equation 46, the rotation weight changes are given by:
In summary, for clockwise movement, we obtain the following system of equations:
Anticlockwise movement
We now consider anticlockwise movements with speed ${v}^{-}={v}^{+}$. First we note that the temporal filter
is a mirrored version about the origin of its clockwise counterpart ${h}^{+}$, whereas the visual input is unchanged because it is symmetric around the origin (see Equation 5 of the main text)
Let us first assume that
we shall verify the validity of this assumption self-consistently at the end of this section. From Equation 68–Equation 70 it follows that $f({v}^{a-})=f\left[\frac{{g}_{D}}{{g}_{D}+{g}_{L}}({h}^{-}\ast {d}^{-})+{\overline{I}}^{vis}\right]$ is a mirrored version of $f({v}^{a+})$, that is,
and, as a result,
We now compute the anticlockwise weight change for the recurrent weights
The r.h.s. of Equation 73, without the $\eta /(2\pi )$ prefactor reads:
where from Equation 75 to Equation 76 we used variable substitution. Therefore, the weight change for clockwise movement is the mirrored version around the origin of the weight change for anticlockwise movement:
meaning that, with learning, the recurrent weights develop into an even function:
Let us now study the anticlockwise weight change for the rotation weights. The rotation-cell voltages during anticlockwise movement read:
Using Equation 71 in Equation 80 and Equation 81 we find
Therefore, applying the same procedure outlined in Equation 73–Equation 77, to the anticlockwise change in the rotation weights yields
meaning that, during learning, the right and left rotation weights develop mirror symmetry:
To verify that our original assumption in Equation 70 holds, we compute the axon-distal input for anticlockwise movement:
Using Equation 71, 79, 80, 81, 86 in Equation 87, yields
Finally, using Equation 78, 84, and 85, the total synaptic weight changes for both clockwise and anticlockwise movement read
Averaging across speeds
So far, we have only considered head turnings at a fixed speed ${v}^{+}$ (clockwise) and ${v}^{-}={v}^{+}$ (anticlockwise). However, in the full model described in the main text, velocities are sampled stochastically from an OU process. This random process generates a half-normal distribution of speeds with spread ${\sigma}_{v}/2$ (Appendix 5—figure 1, left, see also Table 1 in the main text). We thus compute the expected weight changes with respect to this speed distribution:
where ${w}_{v}$ is the weight change for speed ${v}^{+}={v}^{-}=v$ and $p(v)$ is a half-normal distribution with spread ${\sigma}_{v}/2$.
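The expectation over speeds can be approximated on a discrete speed grid. The sketch below is an illustration under stated assumptions: `toy_change` is a hypothetical stand-in for the per-speed weight change, and the grid size and `spread` value are arbitrary.

```python
import math

def half_normal_weights(speeds, spread):
    """Normalized quadrature weights proportional to the half-normal density at each speed."""
    raw = [math.exp(-0.5 * (v / spread) ** 2) for v in speeds]
    total = sum(raw)
    return [r / total for r in raw]

def expected_weight_change(weight_change_at_speed, speeds, spread):
    """Average the per-speed weight-change profiles under the half-normal speed distribution."""
    probs = half_normal_weights(speeds, spread)
    profiles = [weight_change_at_speed(v) for v in speeds]
    n_angles = len(profiles[0])
    return [sum(p * prof[k] for p, prof in zip(probs, profiles)) for k in range(n_angles)]

spread = 1.0  # hypothetical spread of the speed distribution
speeds = [(i + 0.5) * 6.0 * spread / 200 for i in range(200)]  # midpoint grid on [0, 6*spread]
# Hypothetical stand-in for the per-speed weight change: a profile that scales linearly with speed.
toy_change = lambda v: [v, 2.0 * v]
mean_change = expected_weight_change(toy_change, speeds, spread)
```

Normalizing the weights on the grid avoids having to track the density's prefactor and guarantees that a speed-independent weight change is returned unchanged.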
Simulation of the reduced model
In this section, we show the dynamics of the reduced model numerically simulated according to Equation 67, 89, and 90. Weight changes are computed at discrete time steps and integrated using the forward Euler method. At each time step we compute the weight changes for each speed $v$ (Equation 67 and Equation 89) and we estimate the expected weight change according to Equation 90. We then update the weights and proceed to the next step of the simulation. Note that Equation 67 requires the firing rates of HD and HR cells at the previous time step (recurrent input, first line of Equation 67). Therefore, at each time step, we save the HD and HR firing rates for every speed value $v$ and provide them as input to the next iteration of the simulation.
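The simulation loop described above (per-speed weight changes, averaging over speeds, a forward Euler update, and firing rates carried over to the next step) can be sketched as follows. The function `toy_weight_change` is a hypothetical placeholder for the reduced-model equations, and all numerical values are illustrative assumptions.

```python
def simulate_reduced_model(weight_change, speeds, probs, w0, rates0, eta_dt, n_steps):
    """Forward Euler: w <- w + eta_dt * E_v[dw_v], carrying per-speed rates across steps."""
    w = list(w0)
    rates = {v: list(rates0) for v in speeds}   # saved firing rates, one copy per speed
    for _ in range(n_steps):
        mean_dw = [0.0] * len(w)
        for v, p in zip(speeds, probs):
            # The weight change at this step uses the rates saved at the previous step.
            dw, rates[v] = weight_change(w, rates[v], v)
            for k in range(len(w)):
                mean_dw[k] += p * dw[k]
        w = [wk + eta_dt * dk for wk, dk in zip(w, mean_dw)]
    return w

# Hypothetical toy dynamics standing in for the reduced-model equations:
# rates relax toward the current weights, and weights grow until the rates reach 1.
def toy_weight_change(w, rates, v):
    new_rates = [0.5 * (r + wk) for r, wk in zip(rates, w)]
    dw = [v * (1.0 - r) for r in rates]
    return dw, new_rates

speeds = [0.5, 1.0, 1.5]
probs = [0.5, 0.3, 0.2]   # hypothetical speed probabilities, summing to 1
w_final = simulate_reduced_model(toy_weight_change, speeds, probs,
                                 w0=[0.0, 0.0], rates0=[0.0, 0.0],
                                 eta_dt=0.05, n_steps=400)
```

Keeping one saved rate vector per speed mirrors the bookkeeping described in the text; in this toy system the weights converge to 1, but the actual reduced model has its own fixed points.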
Appendix 5—figure 2 shows the evolution of the reduced system for 400 time steps, starting from an initial condition where all weights are zero. One can see that from time steps 75–100 the system switches from a linear regime (HD firing rates below saturation, see top panel) to a nonlinear regime (saturated HD rates). Such a switch is accompanied by peaks in the average absolute error (third panel from the top). Notably, the rotation weights start developing a structure only after such a switch has occurred (see two bottom panels), a feature that has also been observed in the full model (Figure 3E of the main text).
Development of the recurrent weights
Appendix 5—figure 3 provides an intuition for the shape of the recurrent-weight profiles that emerge during learning. The first column shows the evolution of the recurrent weights in the linear regime ($t=25$), that is, before the HD rates reach saturation. In this regime, both recurrent and rotation weights are small, and the steady-state axon-distal rate
is flat and close to zero. Therefore, the HD output rate $f({v}^{a})$ is dominated by the visual input ${\overline{I}}_{vis}$ (Equation 67, third line), which has the shape of a localized bump (panel A1). Thus the error $\epsilon $ also has the shape of a bump (B1). Additionally, the postsynaptic inputs ${p}^{+}$ and ${p}^{-}$ are shifted and filtered versions of this bump (Equation 67, seventh line). The recurrent weight changes $\mathrm{d}{w}^{+}$ and $\mathrm{d}{w}^{-}$ for clockwise and anticlockwise movement are given by the cross-correlation of the errors ${\epsilon}^{+}$ and ${\epsilon}^{-}$ with the postsynaptic inputs ${p}^{+}$ and ${p}^{-}$ (panel C1; see Equation 67 seventh line and Equation 73). Note that because $a(x)\star b(x)=a(-x)\ast b(x)$, the operation of cross-correlation can be understood graphically as a convolution between the mirrored first function $a$ and the second function $b$. Such a mirroring is irrelevant in C1 (linear regime) because the error is an even function, but becomes important in C2 (nonlinear regime). As a result of this cross-correlation, the recurrent-weight changes $\mathrm{d}{w}^{+}$ and $\mathrm{d}{w}^{-}$ are shifted bumps (colored lines in C1), which merge into a single central bump after summing clockwise and anticlockwise contributions (black line in C1). Therefore, in the linear regime, the recurrent weights develop a single central peak at the origin (panel D1).
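The identity relating cross-correlation to convolution with a mirrored first argument can be verified numerically on a ring. The sketch assumes one common discrete convention for circular cross-correlation (the second function is shifted); the sign convention of the paper's own equations is not reproduced in this section, so this choice is an assumption.

```python
import random

def circ_conv(a, b):
    """Circular convolution: c[n] = sum_m a[m] * b[(n - m) mod N]."""
    N = len(a)
    return [sum(a[m] * b[(n - m) % N] for m in range(N)) for n in range(N)]

def circ_xcorr(a, b):
    """Circular cross-correlation: c[n] = sum_m a[m] * b[(m + n) mod N]."""
    N = len(a)
    return [sum(a[m] * b[(m + n) % N] for m in range(N)) for n in range(N)]

def mirror(a):
    """Mirror about the origin on the ring: a_mirrored[k] = a[(-k) mod N]."""
    N = len(a)
    return [a[(-k) % N] for k in range(N)]

rng = random.Random(0)
a = [rng.random() for _ in range(16)]
b = [rng.random() for _ in range(16)]
lhs = circ_xcorr(a, b)             # cross-correlation of a with b
rhs = circ_conv(mirror(a), b)      # convolution of mirrored a with b
```

When `a` is even, `mirror(a)` equals `a` and the two operations coincide, which is the situation described for the linear regime.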
The second column shows the development of the recurrent weights in the nonlinear regime (time step 350). Panel A2 shows that in this scenario the HD firing-rate bumps are broader and approach saturation due to the strong recurrent input. The coupling between the axon-distal and axon-proximal compartments acts as a self-amplifying signal during learning, which drives the activity of all neurons participating in the bump to saturation. Additionally, because the recurrent input is filtered in time (Equation 67, second line), such bumps are also shifted towards the direction of movement. Importantly, due to the lack of visual input, within the receptive field the steady-state axon-distal rates are always smaller than the firing rates. As a result, the errors ${\epsilon}^{+}$ and ${\epsilon}^{-}$ show small negative bumps in the direction of movement, and small positive bumps in the opposite direction (panel B2). Additionally, the postsynaptic inputs ${p}^{+}$ and ${p}^{-}$ shift further apart from the origin. Consequently, the total weight change $\mathrm{d}w$ develops negative peaks around 60 deg (black line in C2, contrast to panel C1), and these peaks get imprinted in the final recurrent-weight profiles (panel D2).
Development of the rotation weights
Appendix 5—figure 4 provides an intuitive explanation for the shape of the rotation-weight profiles ${w}^{R}$ and ${w}^{L}$ that emerge during learning. The first column shows the evolution of the rotation weights in the linear regime ($t=25$), i.e., before the HD rates reach saturation. In this regime, the rotation-cell firing rates are filtered versions of the HD bumps but rescaled by a factor ${A}_{\text{active}}/{f}_{\text{max}}\approx 0.013$ and baseline-shifted by an amount $\pm {\overline{I}}_{vel}+{I}_{inhib}^{\text{HR}}$ (Equation 67 lines 4 and 5; panel A1, compare to panel A1 of Appendix 5—figure 3). This baseline shift acts as a switch that determines, depending on the direction of motion, from which rotation-cell population connections are mainly drawn. Panel B1 shows that the errors ${\epsilon}^{+}$ and ${\epsilon}^{-}$ overlap and have the shape of a bump centered at the origin (same curves as in panel B1 of Appendix 5—figure 3). Additionally, the postsynaptic potentials ${p}^{R\pm}$ and ${p}^{L\pm}$ in B1 are filtered versions of the curves in A1 (Equation 67, lines 7 and 8). As a result, the weight changes $\mathrm{d}{w}^{R\pm}$ and $\mathrm{d}{w}^{L\pm}$, that is, the errors cross-correlated with the postsynaptic potentials, appear similar to the bumps in A1, but they are smoother and further apart from the origin (panel C1). Finally, such weight changes get imprinted in the rotation weights (panel D1).
The second column shows the evolution of the rotation weights in the nonlinear regime ($t=350$), that is, after the HD rates reach saturation. In this case, the large recurrent input gives rise to larger rotation rates (A2, compare to A1) and larger postsynaptic potentials (B2, compare to B1). In panel B2, we can see that the errors ${\epsilon}^{+}$ and ${\epsilon}^{-}$ show positive and negative peaks shifted from the origin (same curves as in panel B2 of Appendix 5—figure 3), which generate weight changes with both positive and negative lobes (panel C2). Such weight changes finally get imprinted in the rotation weights (panel D2).
Data availability
All code used in this work is available at https://github.com/panvaf/LearnPI (copy archived at swh:1:rev:c6e354f80bf435114e577af70892db41c3ce5315). The files required to reproduce the figures can be found at https://gin.gnode.org/pavaf/LearnPI.

neuPrintID 57443. A Connectome of the Adult Drosophila Central Brain.
References

Dynamics of pattern formation in lateral-inhibition type neural fields. Biological Cybernetics 27:77–87. https://doi.org/10.1007/BF00337259

Adaptive temporal processing of odor stimuli. Cell and Tissue Research 383:125–141. https://doi.org/10.1007/s00441020034009

Prospective coding by spiking neurons. PLOS Computational Biology 12:e1005003. https://doi.org/10.1371/journal.pcbi.1005003

Accurate path integration in continuous attractor network models of grid cells. PLOS Computational Biology 5:e1000291. https://doi.org/10.1371/journal.pcbi.1000291

A massively parallel architecture for a self-organizing neural pattern recognition machine. Computer Vision, Graphics, and Image Processing 37:54–115. https://doi.org/10.1016/S0734189X(87)800142

Recurrent amplification of grid-cell activity. Hippocampus 30:1268–1297. https://doi.org/10.1002/hipo.23254

The role of the hippocampus in navigation is memory. Journal of Neurophysiology 117:1785–1796. https://doi.org/10.1152/jn.00005.2017

Path integration in mammals and its interaction with visual landmarks. The Journal of Experimental Biology 199:201–209. https://doi.org/10.1242/jeb.199.1.201

Saccadic body turns in walking Drosophila. Frontiers in Behavioral Neuroscience 8:365. https://doi.org/10.3389/fnbeh.2014.00365

Book: Neural integrator models. In: Goldman M, editors. In Encyclopedia of Neuroscience. Elsevier. pp. 165–178.

Signal propagation in Drosophila central neurons. The Journal of Neuroscience 29:6239–6249. https://doi.org/10.1523/JNEUROSCI.076409.2009

A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 20:1254–1259. https://doi.org/10.1109/34.730558

Optimizing working memory with heterogeneity of recurrent cortical excitation. The Journal of Neuroscience 33:18999–19011. https://doi.org/10.1523/JNEUROSCI.164113.2013

Building machines that learn and think like people. The Behavioral and Brain Sciences 40:e253. https://doi.org/10.1017/S0140525X16001837

Deciphering the hippocampal polyglot: the hippocampus as a path integration system. The Journal of Experimental Biology 199:173–185. https://doi.org/10.1242/jeb.199.1.173

Homing by path integration in a mammal. Die Naturwissenschaften 67:566–567. https://doi.org/10.1007/BF00450672

Directionally selective mnemonic properties of neurons in the lateral dorsal nucleus of the thalamus of rats. The Journal of Neuroscience 13:4015–4028. https://doi.org/10.1523/JNEUROSCI.130904015.1993

Place cells, grid cells, and the brain’s spatial representation system. Annual Review of Neuroscience 31:69–89. https://doi.org/10.1146/annurev.neuro.31.061307.090723

Burst-dependent synaptic plasticity can coordinate learning in hierarchical circuits. Nature Neuroscience 24:1010–1019. https://doi.org/10.1038/s4159302100857x

The firing of hippocampal place cells in the dark depends on the rat’s recent experience. The Journal of Neuroscience 10:2008–2017.

Head direction cells in the deep layer of dorsal presubiculum in freely moving rats. Society of Neuroscience Abstract 10:599.

Path integration and cognitive mapping in a continuous attractor neural network model. The Journal of Neuroscience 17:5900–5920. https://doi.org/10.1523/JNEUROSCI.171505900.1997

Conference: A model of the neural basis of the rat’s sense of direction. Advances in Neural Information Processing Systems. pp. 173–180.

Virtual reality for freely moving animals. Nature Methods 14:995–1002. https://doi.org/10.1038/nmeth.4399

Self-organizing continuous attractor networks and path integration: one-dimensional models of head direction cells. Network 13:217–242.

Neural networks with dynamic synapses. Neural Computation 10:821–835. https://doi.org/10.1162/089976698300017502

Lessons from a compartmental model of a Drosophila neuron. The Journal of Neuroscience 29:12033–12034. https://doi.org/10.1523/JNEUROSCI.334809.2009

Synaptic reverberation underlying mnemonic persistent activity. Trends in Neurosciences 24:455–463. https://doi.org/10.1016/s01662236(00)018683

Early olfactory processing in Drosophila: mechanisms and principles. Annual Review of Neuroscience 36:217–241. https://doi.org/10.1146/annurevneuro062111150533

Conference: Spike-based learning rules and stabilization of persistent neural activity. Advances in Neural Information Processing Systems. pp. 199–208.

Double-ring network model of the head-direction system. Physical Review. E, Statistical, Nonlinear, and Soft Matter Physics 66:041902. https://doi.org/10.1103/PhysRevE.66.041902

Representation of spatial orientation by the intrinsic dynamics of the head-direction cell ensemble: a theory. The Journal of Neuroscience 16:2112–2126.

Predictive olfactory learning in Drosophila. Scientific Reports 11:6795. https://doi.org/10.1038/s4159802185841y

Nonequilibrium statistical mechanics of continuous attractors. Neural Computation 32:1033–1068. https://doi.org/10.1162/neco_a_01280
Decision letter

Srdjan Ostojic, Reviewing Editor; Ecole Normale Superieure Paris, France

Ronald L Calabrese, Senior Editor; Emory University, United States

Hervé Rouault, Reviewer; Inserm, France
Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.
Decision letter after peer review:
Thank you for submitting your article "Learning accurate path integration in ring attractor models of the head direction system" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Ronald Calabrese as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Hervé Rouault (Reviewer #4).
The reviewers have discussed their reviews with one another, and believe the manuscript is potentially suitable for publication, provided a number of comments are addressed or discussed. The Reviewing Editor has drafted a merged feedback to help you prepare a revised submission.
Essential revisions:
Plasticity:
– A main question is whether the type of synaptic plasticity proposed by the authors is necessary and operative in the CX. One motivation for the plasticity rule is that, as the authors state, ring attractors "require that synaptic connections are precisely tuned (Hahnloser, 2003). Therefore, if the circuit was completely hardwired, the amount of information that an organism would need to genetically encode connection strengths would be exceedingly high." However, in Figure 3 and S3, it is shown that the learned PI "far outperforms" flies, so the authors add random noise to the synaptic weights in order to make a more realistic model. But this means that the synaptic weights don't actually need to be that precise. Doesn't this mean that the connectivity may in fact be genetically encoded?
– Another proposed function of the plasticity rule is in adjusting to changes in relative gain between proprioceptive signals. The authors cite Jayakumar et al. (2019) in the introduction as motivation for this, without noting until later that the study used rodents rather than flies. In the discussion, it is mentioned that Seelig and Jayaraman (2015) actually did not find evidence for gain adjustments in flies. This seems to go against the plasticity mechanism being proposed as being relevant for flies. Is there any evidence for this in flies?
– Biologically plausible learning rule: One of the main assumptions that allows the authors to claim a biologically plausible learning rule is that the 'firing rate' of the neuron is available for all synapses at the axon-distal compartment due to the assumed backpropagation (line 167). As this seems to be crucial, it will be good to give references showing that there are backpropagating APs in EPG neurons, or to explain what the alternatives are.
– Writing style: The paper would be easier to understand if some of the key relevant equations, for example those that describe the synaptic plasticity rule, were included in the main text.
– Analysis of connectome data. The connectome data is a nice support for using a learning rule that relies on a two-compartment model. However, some points could be clarified. What exactly is the X,Y position? Can the authors plot the shape of the neuron on the same plot, and maybe also give access to a 3D video of this point cloud? Also, why show only one example, instead of analyzing all the EPG neurons and showing all of them in a supplementary figure? Then, the authors should do a statistical analysis to support their claim, such as a clustering analysis. You want to at least convince the reader that you can reject the null hypothesis that inputs are not segregated.
– Predictions: It would be good if the authors could try and give more concrete predictions, or better, suggest testable predictions. While a theory doesn't have to give predictions, as mentioned in the public review, in this specific case it would be good if the authors could spend one or two paragraphs on concrete predictions and maybe even suggest new experiments to the community. In other words, while not necessary for publishing the paper, the authors could go beyond saying that PI requires supervised learning during development, which seems more like a hypothesis of the model than a real prediction.
– Network initial architecture: It is not clear how the results depend on the specific initialization of the network and how sensitive the main conclusions of the paper are to these choices. More specifically:
– It is not completely clear what the initial recurrent connectivity in the network is, and how the results depend on this choice. For example, does the symmetry, which the authors rely upon to derive their reduced model, persist if the initial connectivity is random?
– Would the results and conclusions of the paper change if the authors use a different choice of the HD-to-HR connections? Could they be random? Could they be connected to more than one neuron? Could they be plastic as well?
Related to this issue, in line 266 the authors claim that 'circular symmetry is a crucial property for any ring attractor'. While this might be a common knowledge in the field, please see Darshan and Rivkind 21 that argue against it. Following this, it is unclear if symmetry is needed for the authors to derive the reduced model, or if it is a principle that they think the system must have. The former is rather technical, while the latter is an important principle. It would be good if the authors could address this in their discussion.
– Saturation: It seems that saturation is crucial in this work. Is it known that EPG neurons reach saturation levels? For example, in cortex this rarely happens, where neurons are usually active around an expansive nonlinearity regime and are far from saturation. It will be beneficial if the authors could show that similar phenomena hold with different types of nonlinearities, in which neurons are not saturated. Alternatively, the authors can explain in the manuscript why they think that EPG neurons in reality are saturated (maybe give references?).
– Line 221 (and 389) "…our work reproduces this feature of the fly HD as an emergent property of learning". It is not clear what the authors mean by this. Is the impairment of PI in small angular velocities a result of the finite number of neurons in the network, or an emergent property of the learning? For example, what would happen if the authors would take 2000 neurons, will they still see this 'emergent property'? This also goes against what is written in Line 465 of the discussion.
– Timescales: there are many timescales in the model. Although it is addressed, to some extent, in the supplementary material, it would be good to add a paragraph stating what are the requirements on these timescales. For example, does the model fail if one of the timescales is smaller\larger than the others? Did the authors assume in the derivation that one timescale is slower than the rest? Why tau_{δ} can be ignored in the derivation? Can the authors show the failure modes of the network\derivation when changing the timescale?
– Randomness in network connectivity Figure S3. In case of the random connectivity, as the network is so small, the behavior of the network depends on the specific network realization and, therefore, it is hard to assess how representative FigS3DE are. it will be good to show some statistical analysis of the diffusion across these networks.
– Noise in the dynamics Figure S4. It is unclear if the diffusion is larger in this case just because the authors added noise, which stays after training, or it is something deeper than that. Could the authors compare the diffusivity in networks that were trained with and without noise? Also, in FigS4 panel E it seems that in contrast to the finitesize network effects, in this case there is no problem to integrate at very small velocities. Why is that? Does the noise contribute to smoothing the barriers and helps to go from a discrete to a more continuous representation when doing PI? Does this have to do with the small number of involved neurons in the fly central complex? One of the reviewers disagrees with the remark about a feature buildin by hand in previous models l220222. This is not something the learning rule per se allows to understand.
– What is the limitation of the adaptation in the network? For example, the authors claim that 'we test whether our network can rewire to learn an arbitrary gain between the two', but in fact only showed examples for a 50% increase or decrease in gain and only mentioned a negative gain without showing it. It will be useful to have a figure, showing how the performance varies with changing the gain. Is there a maximal\minimal gain that above\below it the network always fail to perform? Which of the network parameters are important for these gain changes?
– The authors seem to assimilate path integration with angular integration (l. 31-40). The way insects manage to perform path integration (i.e. integrate distances) is still undetermined to our knowledge. Clearly, the angular integration performed in the central complex is not enough. The authors should try to disambiguate this point.
– The authors mention the existence of two PEN populations (PEN1 and PEN2). Could the proposed model shed light on the presence of these two populations? Would they account for the inhibitory and excitatory part of the HR curves on Figure 3C?
– On l 625, the authors state that the constant inhibitory drive contributes to the uniqueness of the bump. One of the reviewers was not convinced of this statement as this constant inhibition does not play a role in the competition between several bumps. Only the inhibitory recurrent connectivity within the HD population or going through the HR populations would play such a role.
– In Figure S5B, the bump velocity goes to zero around 500º/s. Could this come from a defect in the network that would, for instance, abolish the presence of a bump in left and right HR populations? In a more robust behavior, only one of the left or right bumps would disappear, and the HD bump would go at a max constant velocity.
[Editors' note: further revisions were suggested prior to acceptance, as described below.]
Thank you for resubmitting your work entitled "Learning accurate path integration in ring attractor models of the head direction system" for further consideration by eLife. Your revised article has been evaluated by Ronald Calabrese (Senior Editor) and a Reviewing Editor (Srdjan Ostojic).
The manuscript has been much improved but there are some remaining issues related to the lack of evidence for plasticity in CX. After consultation, the reviewers agreed that the current lack of evidence for plasticity should be pointed out earlier in the manuscript, and suggested presenting the need for plasticity in younger flies as a prediction of the model. One of the reviewers provided additional suggestions listed below.
Reviewer #2 (Recommendations for the authors):
The authors have made a number of improvements to the manuscript. My central concern, which is that there isn't any evidence for this plasticity being present in the CX, still stands. Obviously, a purely modeling study is unable to address this concern. The authors have argued in Appendix 3 that asymmetries in the architecture can severely disrupt the performance of the model and that this supports the need for plasticity. Here are a few suggestions related to this central issue.
1) The authors refer to the performance of flies from Seelig and Jayaraman (2015), stating at various points whether their models outperform the actual performance or not. This should actually be quantified in the figures and/or text. Specifically, both Figure 2 and Appendix 3 would benefit from a quantitative comparison to real fly performance.
2) It would be useful to show how much imprecision must be added to the synaptic connections before the model can account for the performance of actual flies. This would answer the question of "how much precision must be genetically encoded to account for fly behavior."
Reviewer #3 (Recommendations for the authors):
I find the paper to be timely and very interesting. The paper was well presented already in the first version, but I did have a few concerns. The authors did a great job in addressing all of these concerns and comments in the new version of their manuscript.
I strongly recommend it for publication in eLife.
https://doi.org/10.7554/eLife.69841.sa1
Author response
Essential revisions:
Plasticity:
– A main question is whether the type of synaptic plasticity proposed by the authors is necessary and operative in the CX. One motivation for the plasticity rule is that, as the authors state, ring attractors "require that synaptic connections are precisely tuned (Hahnloser, 2003). Therefore, if the circuit was completely hardwired, the amount of information that an organism would need to genetically encode connection strengths would be exceedingly high." However, in Figure 3 and S3, it is shown that the learned PI "far outperforms" flies, so the authors add random noise to the synaptic weights in order to make a more realistic model. But this means that the synaptic weights don't actually need to be that precise. Doesn't this mean that the connectivity may in fact be genetically encoded?
This is indeed an important point to expand on, and we thank the reviewer for bringing it up. A main finding of the present work is that a local synaptic plasticity rule operating in compartmentalized neurons can learn the precise synaptic connectivity required for accurate PI. Because simulation conditions are idealized, PI performance of the model can exceed that of the fly. However, there are many unknown factors that could limit the performance in the biological system, factors that, for simplicity, we do not model explicitly. Adding noise to the connectivity (as in Figure 3 – figure supplement 3; was ‘S3’ in the first submission) is a simple way to bring the model’s performance closer to that of the fly; it is not an explanation of why the fly’s performance is worse. Figure 3 – figure supplement 3 thus shows how a perturbation of the learned network results in more realistic PI performance, deviating from the idealized simulation conditions.
In reality, there could be multiple reasons why PI performance is worse in the fly than in the model. For instance, a confounder that would affect performance but not necessarily learning could be the presence of inputs that are unrelated to path integration, e.g., inputs related to circadian cycles (Raccuglia et al., 2019). In the presence of such confounders, a precise tuning of the weights might be crucial in order to reach the performance of the fly. In other words, only if the model outperforms the biological circuit in a simplified setting does it have a chance to perform as well in a realistic setting, with all the additional complexities the latter comes with. We have added these remarks in the final paragraph of “Relation to experimental literature” in the Discussion.
A further reason why synaptic plasticity might be necessary to achieve a PI performance comparable to the fly is to counteract asymmetries in the initial architecture, such as noisy hardwired connectivity. In the main text we assume that the architecture and the hardwired HD to HR connections are completely circularly symmetric. However, as the reviewers have also pointed out in other comments below, this is unrealistic for a biological system. In the new Appendix 3 we show that if we add variability to these connections while maintaining the same profiles for HR to HD and recurrent connections, PI in the model is much worse than in the fly (Appendix 3 – Figure 2). This implies that even if the optimal weight connectivity were to be passed down genetically, it would still be impossible to path integrate as well as the fly, because minimal biological asymmetries in the circuit can have a big impact on PI. Instead, as we show in Appendix 3 – Figure 1, our learning rule can counteract asymmetries in the initial architecture by producing asymmetries in the learned connectivity, and PI remains excellent. Therefore the PI performance of the fly can be matched only if there is plasticity during development.
Finally, we note that since this is a modeling study, we cannot prove that plasticity is operative in the CX; instead, this is a prediction of our model that can only be tested with experiments.
– Another proposed function of the plasticity rule is in adjusting to changes in relative gain between proprioceptive signals. The authors cite Jayakumar et al. (2019) in the introduction as motivation for this, without noting until later that the study used rodents rather than flies. In the discussion, it is mentioned that Seelig and Jayaraman (2015) actually did not find evidence for gain adjustments in flies. This seems to go against the plasticity mechanism being proposed as being relevant for flies. Is there any evidence for this in flies?
To the best of our knowledge, there is no evidence of plasticity being used for gain adjustments in flies. However, it has not been tested in young animals, and this is why we propose it as a testable prediction. As noted in the Discussion (section “Relation to experimental literature”, third paragraph), in the Seelig and Jayaraman (2015) study, the experimenters used mature flies, whereas we propose that, in flies, this form of plasticity is strong only during development. In rodents, on the other hand, there is evidence that such plasticity ensues past the developmental stage.
To better distinguish properties of PI in rodents and flies, we mention now explicitly in the Introduction that Jayakumar et al. (2019) used rodents. Furthermore, we suggest (in the last paragraph of the Introduction and last paragraph of section “Testable predictions” in the Discussion) that our prediction could be tested in young flies.
– Biologically plausible learning rule: One of the main assumptions that allows the authors to claim for a biologically plausible learning rule is that the 'firing rate' of the neuron is available for all synapses at the axon-distal compartment due to the assumed backpropagation (line 167). As this seems to be crucial, it will be good to give references showing that there is a backpropagating AP in EPG neurons, or to explain what the alternatives are.
Backpropagation of APs could occur with either active or passive mechanisms. In our setting, passive backpropagation would suffice, and passive spread of activity has been shown to not attenuate too fast in fly neurons (Gouwens and Wilson, 2009). The axon-proximal and axon-distal compartments belong to the same dendritic tuft, and we assume that the axon initial segment is close to the axon-proximal compartment. Thus, the generated AP would only need to travel a short distance compared to the electrotonic length, hence it would not be attenuated considerably (e.g. see figure 5 in Gouwens and Wilson, 2009). We have added these remarks in the second paragraph of “Relation to experimental literature” in the Discussion. Finally, we note that to the best of our knowledge, there is no evidence for active backpropagation of APs in EPG neurons.
– Writing style: The paper would be easier to understand if some of the key relevant equations, for example those that describe the synaptic plasticity rule, were included in the main text.
The learning rule is the key equation, as it contains the most important variables of the model. In this revised manuscript, we have added a simplified version (i.e. without low-pass filtering the weight changes) of the learning rule reported in section “Learning rule” in Methods (see second to last paragraph of the section “Model setup” in Results).
– Analysis of connectome data. The connectome data is a nice support for why using a learning rule that relies on a two-compartment model. However, some points could be clarified. What exactly is the X,Y position? Can the authors plot the shape of the neuron on the same plot, and maybe also give access to a 3D video of this point cloud? Also, why show only 1 example, instead of analyzing all the EPG neurons and showing all of them in a supplementary figure? Then, the authors should do a statistical analysis to support their claim, such as doing some clustering analysis. You want to at least convince the reader that you can reject the null hypothesis that inputs are not segregated.
To clarify EPG connectivity at the EB, we provide new figures and analysis in accordance with the reviewers’ remarks. First, we include the skeleton plot of the example EPG neuron in Figure 1E, using coordinates Y and Z in line with the fly connectome. The orientation of the skeleton plot is the same as in the synapse location plot, whose placement is indicated by the zoom box. To avoid congestion, we decided to not plot the shape of the neuron on top of the synapse location plot; the shape of the neuron can be readily seen in the skeleton plot. In addition, we provide a 3D rotating video of the point cloud for this example neuron (Figure 1 – video 1).
Following the reviewer's suggestions, we plot the synapse locations for all 16 neurons we analyzed in the new Figure 1 – figure supplement 1A. Furthermore, to support the claim that visual inputs are separated from recurrent and HR to HD inputs, we perform binary classification between the two classes (R2 and R4d vs. PEN1 and PEN2), using SVMs and 5-fold cross validation (for details, see the last paragraph in “Fly Connectome Analysis” in Methods). We find (Figure 1 – figure supplement 1B) that, in held-out test data, the model is excellent (test accuracy > 0.95 across neurons and model runs) at predicting class identity from location alone. We report this finding in paragraph 3 of section “Model Setup” of the Results.
– Predictions: It would be good if the authors could try and give more concrete predictions, or better, suggest testable predictions. While a theory doesn't have to give predictions, as mentioned in the public review, in this specific case it would be good if the authors could spend one or two paragraphs on concrete predictions and maybe even suggest new experiments to the community. In other words, while not necessary for publishing the paper, the authors could go beyond saying that PI requires supervised learning during development, which seems more like a hypothesis of the model than a real prediction.
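The held-out evaluation described above can be illustrated with a minimal sketch of 5-fold cross-validation. Everything here is an assumption for illustration: the two point clouds are synthetic Gaussian blobs rather than connectome data, and a nearest-centroid classifier stands in for the SVM used in the analysis, so that the sketch needs only the Python standard library. The function names are hypothetical.

```python
import random

def make_synthetic_synapses(n_per_class, seed=0):
    """Two hypothetical 3D point clouds standing in for synapse locations."""
    rng = random.Random(seed)
    points = []
    for label, center in ((0, (0.0, 0.0, 0.0)), (1, (3.0, 3.0, 3.0))):
        for _ in range(n_per_class):
            points.append(([rng.gauss(c, 1.0) for c in center], label))
    rng.shuffle(points)
    return points

def centroid(vectors):
    """Component-wise mean of a list of 3D vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(3)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cross_validated_accuracy(points, k=5):
    """Mean held-out accuracy of a nearest-centroid classifier over k folds."""
    folds = [points[i::k] for i in range(k)]
    accuracies = []
    for i in range(k):
        test = folds[i]
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        # Fit: one centroid per class, computed from training points only.
        centroids = {
            label: centroid([x for x, y in train if y == label])
            for label in (0, 1)
        }
        # Predict: assign each held-out point to the closest centroid.
        correct = sum(
            min(centroids, key=lambda c: sq_dist(x, centroids[c])) == y
            for x, y in test
        )
        accuracies.append(correct / len(test))
    return sum(accuracies) / k

accuracy = cross_validated_accuracy(make_synthetic_synapses(100))
print(f"mean held-out accuracy: {accuracy:.3f}")
```

The point of the design is that each fold's classifier never sees the points it is scored on, so a high accuracy reflects spatial segregation of the two classes rather than memorization.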
We agree with the reviewers that it is very important to suggest testable predictions. Therefore we have included a new section “Testable predictions” in the Discussion, where we delineate our model predictions and put forth ways to test them. Briefly, these predictions are:
Synaptic plasticity during development is crucial in setting up the HD system
HD neurons have a compartmentalized structure where idiothetic inputs are separated from allothetic inputs
Allothetic inputs more readily control the firing rate of the HD neurons
Passive or active backpropagation of action potentials makes the output of the neuron available to the compartment that receives idiothetic inputs
The HD system can readily adapt to manipulations of gain during development
– Network initial architecture: It is not clear how the results depend on the specific initialization of the network and how sensitive the main conclusions of the paper are to these choices. More specifically:
– It is not completely clear what the initial recurrent connectivity in the network is, and how the results depend on this choice. For example, does the symmetry, which the authors rely upon to derive their reduced model, persist if the initial connectivity is random?
The reviewers correctly point out that this information was missing from the manuscript. We now describe network initialization in the first paragraph of “Training protocol” in Methods. Briefly, trainable weights for all networks are initialized with random connectivity drawn from a normal distribution with mean 0 and standard deviation 1/ sqrt(N^{HD}), where N^{HD} is the number of HD neurons in the network, as is common practice in the modeling literature.
Since this initialization results in weights that are much smaller than the final weights, to better address the question on the dependence of the results on initial conditions, we randomly shuffle the learned weights in Figure 3A,B to completely eradicate any structure and initialize the network with the shuffled weights. Author response image 1 shows that even in this scenario, learning converges to a low learning error, and PI performance is excellent, virtually indistinguishable from the one of the main text network (see Figure 2). We note however that learning is not able to completely eradicate the noise in the initial weights. Instead, it settles to a noisier connectivity and weights span a larger range (compare Figure 3A,B to Author response image 1A,B). Despite that, PI remains accurate (see Author response image 1F), therefore the deviations caused by individual weights should be balanced out when network activity as a whole is considered.
We now mention this also in the first paragraph of “Training protocol” in Methods (together with a hint on simulations in Figure 4 that illustrate gain changes), concluding that the final PI performance is virtually independent of the initial distribution of weights.
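The two initialization schemes discussed above can be sketched as follows. This is a minimal illustration under stated assumptions: the network size `N_HD = 60` is chosen for illustration only, the helper names are hypothetical, and because no learned weights are available here, the shuffle is applied to the freshly drawn matrix simply to show that shuffling keeps the set of weight values while destroying their arrangement.

```python
import math
import random

N_HD = 60  # assumed number of HD neurons; illustrative only

def init_weights(n, rng):
    """Draw an n-by-n weight matrix from a normal with mean 0, sd 1/sqrt(n)."""
    sigma = 1.0 / math.sqrt(n)
    return [[rng.gauss(0.0, sigma) for _ in range(n)] for _ in range(n)]

def shuffle_weights(weights, rng):
    """Randomly permute all entries: same values, no remaining structure."""
    n = len(weights)
    flat = [w for row in weights for w in row]
    rng.shuffle(flat)
    return [flat[i * n:(i + 1) * n] for i in range(n)]

rng = random.Random(1)
w_init = init_weights(N_HD, rng)
w_shuffled = shuffle_weights(w_init, rng)

# Empirical mean and standard deviation of the drawn weights.
flat = [w for row in w_init for w in row]
mean = sum(flat) / len(flat)
std = math.sqrt(sum((w - mean) ** 2 for w in flat) / len(flat))
print(f"mean = {mean:.4f}, std = {std:.4f}, target sd = {1 / math.sqrt(N_HD):.4f}")
```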
– Would the results and conclusions of the paper change if the authors use a different choice of the HD>HR connections? Could they be random? Could they be connected to more than one neuron? Could they be plastic as well?
To address these very relevant questions, we include the new Appendix 3; furthermore, we added in the section “Model Setup” (fourth paragraph) in the Results a brief statement that our assumption of “1-to-1 wiring and constant amplitude of the HD to HR connections” is uncritical. Briefly, in new simulations we drop the 1-to-1 and constant-magnitude assumptions of the HD to HR connections, and we find that PI performance is not impaired (Appendix 3 – Figure 1). The network can even path integrate when these connections are completely random, albeit for a smaller angular velocity range (Appendix 3 – Figure 3). Finally, we demonstrate that since circular symmetry in the hardwired HD-to-HR connections is unreasonable for a biological system, the learning rule is crucial in balancing out any deviations from circular symmetry in the initial architecture; otherwise, PI performance would be considerably impaired (Appendix 3 – Figure 2, also see our answer to the 1st reviewer comment above about the necessity of synaptic plasticity).
Regarding the possibility of the HD-to-HR connections being plastic, we note that our model does not incorporate plasticity for connections other than the incoming connections to associative neurons; therefore the HD-to-HR connections cannot be learned with our model. We assume that they could be prewired, e.g. during prenatal circuit assembly. If some other form of plasticity operated in these synapses, the network should still learn accurate PI, provided that HD-to-HR connections settle to final values after some time. This is supported by the fact that, as mentioned in the previous paragraph, the network learns accurate PI even for completely random HD-to-HR connectivity.
Related to this issue, in line 266 the authors claim that 'circular symmetry is a crucial property for any ring attractor'. While this might be a common knowledge in the field, please see Darshan and Rivkind 21 that argue against it. Following this, it is unclear if symmetry is needed for the authors to derive the reduced model, or if it is a principle that they think the system must have. The former is rather technical, while the latter is an important principle. It would be good if the authors could address this in their discussion.
The reviewers are correct to point out that circular symmetry is not required for building a ring attractor in general. Therefore we have corrected the statement (first paragraph of section “Learning results in synaptic connectivity that matches the one in the fly” in the Results) from “any ring attractor” to “a symmetric ring attractor”, and included a discussion of the suggested paper in the section “Relation to theoretical literature” in the Discussion. It follows that the results in our Mathematical Appendix only hold for the case where the symmetries in the initial learning setup result in a circularly symmetric connectivity. Therefore, anatomical symmetry is an assumption of our reduced model, but not a requirement for all ring attractors.
– Saturation: It seems that saturation is crucial in this work. Is it known that EPG neurons reach saturation levels? For example, in cortex this rarely happens, where neurons are usually active around an expansive nonlinearity regime and are far from saturation. It will be beneficial if the authors could show that similar phenomena hold with different types of nonlinearities, in which neurons are not saturated. Alternatively, the authors can explain in the manuscript why they think that EPG neurons in reality are saturated (maybe give references?).
From calcium imaging videos in the original study of Seelig and Jayaraman (2015) and later ones, it seems that EPG neurons within the bump fire vigorously at similar levels, whereas neurons outside the bump fire very sparsely. Therefore the shape of the bump in the Seelig and Jayaraman (2015) experiments seems to not be very far from square, which might hint at saturation. Even though, to the best of our knowledge, it is currently not known whether EPG neurons actually reach saturation, other Drosophila neurons are known to reach saturation with increasing inputs, instead of some sort of depolarization block (Wilson, 2013; Brandao et al., 2021). Saturation with increasing inputs may come about due to, for instance, short-term synaptic depression: it has been reported that beyond a certain frequency of incoming action potentials, the synaptic input current is almost independent of that frequency (Tsodyks and Markram, 1997; Tsodyks et al., 1998). We now better explain our assumptions underlying saturation in the section “Neuronal model” in the Methods.
– Line 221 (and 389) "…our work reproduces this feature of the fly HD as an emergent property of learning". It is not clear what the authors mean by this. Is the impairment of PI in small angular velocities a result of the finite number of neurons in the network, or an emergent property of the learning? For example, what would happen if the authors would take 2000 neurons, will they still see this 'emergent property'? This also goes against what is written in Line 465 of the discussion.
The reviewers are right to point this out. Following their suggestion, in Author response image 2 we increase the number of neurons twofold and fourfold and we provide the velocity-gain plots for these simulations. In both cases, the flat segment at small velocities has completely disappeared. We further confirm this by only looking at small velocities of ±20 deg/s, and reducing the velocity interval to 0.5 deg/s. Therefore, as suggested by the reviewer, the impairment is indeed caused by the limited number of neurons that the fly has at its disposal, and it is not an emergent property from learning. Hence we have corrected all statements regarding the emergent property. Additionally, we now mention that the flat region disappears if we increase the number of neurons (see final paragraph of the section “Mature network can path-integrate in darkness”).
– Timescales: there are many timescales in the model. Although it is addressed, to some extent, in the supplementary material, it would be good to add a paragraph stating what are the requirements on these timescales. For example, does the model fail if one of the timescales is smaller\larger than the others? Did the authors assume in the derivation that one timescale is slower than the rest? Why tau_{δ} can be ignored in the derivation? Can the authors show the failure modes of the network\derivation when changing the timescale?
There are indeed many time scales in the model, which are briefly summarized here:
synaptic time constant τ_{s} = 65 ms
membrane time constant of axon-distal compartment τ_{l} = 10 ms
membrane time constant of axon-proximal compartment, C/g_{L} = 1 ms
weight update filtering time constant τ_{δ} = 100 ms
time constant of velocity decay τ_{v} = 0.5 s
learning time scale (set by 1/η = 20 s)
These time constants are now summarized in Table 1 in the new Appendix 4. Several of these parameters are well constrained by biology, and thus we chose to keep them constant, while others have quite liberal requirements. We discuss all these in detail in the new Appendix 4.
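For orientation, the time constants listed above can be collected into a small configuration and sorted, which makes the spread between the fastest membrane dynamics and the learning time scale explicit. This is only a bookkeeping sketch: the dictionary keys are hypothetical labels, and the printed ratios are plain arithmetic on the listed values, not a statement of which separations the model requires (that question is treated in Appendix 4).

```python
# Time constants as listed in the response, converted to seconds.
TIME_CONSTANTS_S = {
    "axon_proximal_membrane (C/g_L)": 0.001,
    "axon_distal_membrane (tau_l)": 0.010,
    "synaptic (tau_s)": 0.065,
    "weight_update_filter (tau_delta)": 0.100,
    "velocity_decay (tau_v)": 0.5,
    "learning (1/eta)": 20.0,
}

learning = TIME_CONSTANTS_S["learning (1/eta)"]

# Sort from fastest to slowest and report each one relative to learning.
ordered = sorted(TIME_CONSTANTS_S.items(), key=lambda kv: kv[1])
for name, tau in ordered:
    print(f"{name:35s} {tau:8.3f} s   learning/tau = {learning / tau:8.1f}")
```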
– Randomness in network connectivity Figure S3. In case of the random connectivity, as the network is so small, the behavior of the network depends on the specific network realization and, therefore, it is hard to assess how representative Fig. S3D,E are. It would be good to show some statistical analysis of the diffusion across these networks.
The reviewers are correct to point out that variability exists between different realizations of networks with added noise in the learned weights (now shown in Figure 3 – figure supplement 3, which is identical to the old Figure S3). A useful quantity to characterize the performance of a network is the diffusion coefficient during path integration, which quantifies how fast the width of the PI error distribution in e.g. Figure 3 – figure supplement 3D increases (see new section “Diffusion Coefficient” in Methods). Therefore, we created multiple networks with perturbed weights, and estimated the diffusion coefficient for all of them. We report the point estimate and 95% confidence interval (Student's t-test) of the grand average to be 82.3 ± 15.7 deg^2/s, which is considerably larger than the diffusion coefficient for networks without a perturbation in the weights (24.5 deg^2/s, see Appendix 1 – Figure 1E). We report these results in the section “Learning results in synaptic connectivity that matches the one in the fly” (last paragraph).
– Noise in the dynamics Figure S4. It is unclear if the diffusion is larger in this case just because the authors added noise, which stays after training, or it is something deeper than that. Could the authors compare the diffusivity in networks that were trained with and without noise? Also, in Fig. S4 panel E it seems that in contrast to the finite-size network effects, in this case there is no problem to integrate at very small velocities. Why is that? Does the noise contribute to smoothing the barriers and helps to go from a discrete to a more continuous representation when doing PI? Does this have to do with the small number of involved neurons in the fly central complex? One of the reviewers disagrees with the remark about a feature built in by hand in previous models (l. 220-222). This is not something the learning rule per se allows to understand.
In the previous comment we addressed the diffusivity of path integration in networks that receive velocity input. To evaluate whether synaptic input noise during training affects learning, we now systematically explore the impact of this “training noise” on diffusivity for various levels of synaptic input noise during testing (called “test noise”). However, we find that the contribution of PI errors to the diffusion coefficient is always greater than the one of test noise, even though the PI errors themselves are quite small. Hence, to assess robustness to noise, we compute the diffusion coefficient in networks initialized at various locations and left to diffuse without any velocity or visual input (see the new Appendix 1). We then estimate and plot in panel E of a revised Appendix 1 – Figure 1 (was labeled Figure S4 in the previous version) the diffusion coefficient for networks that have been trained without input noise (σ_{n} = 0 or SNR = 0, blue dots) and with input noise (σ_{n} = 0.7 or SNR = 2; orange dots), for various test noise levels. Briefly, we find that the network trained with noise is less diffusive for test noise levels above σ_{n} = 0.7 (which is equal to the amount of train noise it has been trained on). We report these findings in the last two paragraphs of Appendix 1.
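To make the notion of a diffusion coefficient concrete, the following sketch estimates it from synthetic error traces. It assumes the common one-dimensional convention that the variance of the error grows as Var(t) = 2·D·t; the exact definition used in the manuscript is given in its Methods and may differ. The traces are unbiased random walks generated here with a known coefficient `D_TRUE`, not simulation output from the model, and all function names are hypothetical.

```python
import math
import random

def simulate_errors(n_trials, n_steps, dt, d_true, rng):
    """Unbiased random walks standing in for PI error traces (degrees)."""
    step_sd = math.sqrt(2.0 * d_true * dt)
    traces = []
    for _ in range(n_trials):
        x, trace = 0.0, []
        for _ in range(n_steps):
            x += rng.gauss(0.0, step_sd)
            trace.append(x)
        traces.append(trace)
    return traces

def estimate_diffusion(traces, dt):
    """Half the least-squares slope (through the origin) of variance vs. time."""
    n_steps = len(traces[0])
    num = den = 0.0
    for k in range(n_steps):
        t = (k + 1) * dt
        values = [trace[k] for trace in traces]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
        num += t * var
        den += t * t
    return 0.5 * num / den

rng = random.Random(0)
D_TRUE = 25.0  # deg^2/s, arbitrary value used only to generate synthetic traces
traces = simulate_errors(n_trials=1000, n_steps=100, dt=0.1, d_true=D_TRUE, rng=rng)
d_hat = estimate_diffusion(traces, dt=0.1)
print(f"estimated D = {d_hat:.1f} deg^2/s (generated with D = {D_TRUE})")
```

Repeating such an estimate over several independently perturbed networks would yield the sample from which a grand average and a confidence interval can be computed.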
The lack of “stickiness” at small angular velocities is indeed an effect that can be attributed to noise applied during testing, as suggested by the reviewer. This noise allows random transitions to adjacent attractor states. Furthermore, these random transitions are biased towards the side of the drift velocity input, even for small velocity inputs. Therefore, the flat region for small angular velocities seen before disappears, and the bump moves with the drift speed on average. Indeed the problem with integration of small velocities is related to the small number of neurons, as previously noted, and should not be considered a property that results from learning.
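The mechanism described above, in which noise enables biased hopping between neighbouring attractor states, can be illustrated with a deliberately simple toy model. This is not the network model: the bump position is reduced to a single coordinate `x`, measured in units of the spacing between adjacent attractor states, moving in a tilted periodic landscape under the assumed dynamics dx/dt = drive − barrier·sin(2πx) + noise. All names and parameter values are hypothetical.

```python
import math
import random

def mean_bump_velocity(drive, noise_d, rng, barrier=1.0, dt=0.01,
                       t_total=200.0, n_trials=10):
    """Average velocity of x under dx/dt = drive - barrier*sin(2*pi*x) + noise."""
    n_steps = round(t_total / dt)
    noise_sd = math.sqrt(2.0 * noise_d * dt)  # Euler-Maruyama noise increment
    total = 0.0
    for _ in range(n_trials):
        x = 0.0
        for _ in range(n_steps):
            drift = drive - barrier * math.sin(2.0 * math.pi * x)
            x += drift * dt + rng.gauss(0.0, noise_sd)
        total += x
    return total / (n_trials * t_total)

rng = random.Random(0)
# A drive weaker than the barrier: without noise the state gets stuck.
v_quiet = mean_bump_velocity(drive=0.3, noise_d=0.0, rng=rng)
# The same drive with noise: hops occur and are biased along the drive.
v_noisy = mean_bump_velocity(drive=0.3, noise_d=0.3, rng=rng)
print(f"mean velocity without noise: {v_quiet:.4f}")
print(f"mean velocity with noise:    {v_noisy:.4f}")
```

In this toy setting the noiseless state settles into the nearest well and its mean velocity stays near zero, which corresponds to the flat region at small velocities, whereas the noisy state drifts in the direction of the drive.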
– What is the limitation of the adaptation in the network? For example, the authors claim that 'we test whether our network can rewire to learn an arbitrary gain between the two', but in fact only showed examples for a 50% increase or decrease in gain and only mentioned a negative gain without showing it. It will be useful to have a figure, showing how the performance varies with changing the gain. Is there a maximal\minimal gain that above\below it the network always fail to perform? Which of the network parameters are important for these gain changes?
To address the reviewer’s comment, we now test the ability of the network to rewire for a larger set of gain values. Specifically, we include 2 more gains in Figure 4, and include the new Figure 4 – figure supplement 1, where we show an even broader range of tested gains from ⅛ to 4.5 (Figure 4 – figure supplement 1A), along with extreme examples (Figure 4 – figure supplement 1B,C) to show the limits of the system. In Figure 4 – figure supplement 1D–F we also include simulation results where the network was instructed to reverse its gain. We discuss all of these new results in the last paragraph of “Fast adaptation of neural velocity gain” in the Results. A metric to assess deviation from an instructed gain, as well as the relevant quantities setting the limits for gain adaptation, are described in the caption of Figure 4 – figure supplement 1.
– The authors seem to assimilate path integration with angular integration (l. 31-40). The way insects manage to perform path integration (i.e. integrate distances) is still undetermined to our knowledge. Clearly, the angular integration performed in the central complex is not enough. The authors should try to disambiguate this point.
We thank you for pointing out this imprecision. We now disambiguate between path integration and angular integration in the first paragraph of the Introduction.
– The authors mention the existence of two PEN populations (PEN1 and PEN2). Could the proposed model shed light on the presence of these two populations? Would they account for the inhibitory and excitatory part of the HR curves on Figure 3C?
As indicated in Figure 1E and also noted in the first paragraph of “Fly Connectome Analysis” in Methods, PEN1 neurons correspond to HR cells, and PEN2 cells take part in the excitatory recurrent loop of HD cells. Furthermore, as noted in the next-to-last paragraph in section “Relation to experimental literature” in the Discussion, the PEN1 neurons account for the excitatory part of the learned HR-to-HD connectivity, whereas the negative part could be mediated by other neurons, or interactions between different neurons altogether.
– On l 625, the authors state that the constant inhibitory drive contributes to the uniqueness of the bump. One of the reviewers was not convinced of this statement as this constant inhibition does not play a role in the competition between several bumps. Only the inhibitory recurrent connectivity within the HD population or going through the HR populations would play such a role.
We agree with the reviewer on this point, and the statement has been corrected (see end of second paragraph of section “Neuronal Model” in Methods). Global inhibition indeed suppresses bumps in general. Only sufficient local activity combined with local excitation can overcome the inhibition. Instead, it is the slightly negative recurrent profile for large offsets that contributes to the competition between bumps.
– In Figure S5B, the bump velocity goes to zero around 500°/s. Could this come from a defect in the network that would, for instance, abolish the presence of a bump in left and right HR populations? In a more robust behavior, only one of the left or right bumps would disappear, and the HD bump would go at a max constant velocity.
The reviewers are correct to point out that the behavior of the network would be more robust if it path-integrated at the maximum velocity whenever the instructed velocity exceeds it. The observed behavior, however, is not the outcome of a defect; rather, as we detail in Appendix 2, the velocity of bump movement is limited by how far from its current location the bump can be propagated and how fast this can happen. In Appendix 2 we argue that contributions from HD neurons might be required to move the bump. For velocities larger than the velocity limit, there are no direct excitatory connections between the currently active HD neurons and the HD neurons that should be activated at the next instant (the phase shift required for a given maximum velocity can be obtained by solving eq. 23 for b). Instead, connections are inhibitory for such large offsets, because bump phase shifts larger than the one corresponding to the velocity limit move the bump to the negative sidelobes of the recurrent connectivity. Therefore, without enough excitation to overcome the global inhibition and keep the bump moving, the bump disappears.
[Editors' note: further revisions were suggested prior to acceptance, as described below.]
The manuscript has been much improved but there are some remaining issues related to the lack of evidence for plasticity in CX. After consultation, the reviewers agreed that the current lack of evidence for plasticity should be pointed out earlier in the manuscript, and suggested presenting the need for plasticity in younger flies as a prediction of the model. One of the reviewers provided additional suggestions listed below.
Reviewer #2 (Recommendations for the authors):
The authors have made a number of improvements to the manuscript. My central concern, which is that there isn't any evidence for this plasticity being present in the CX, still stands. Obviously, a purely modeling study is unable to address this concern. The authors have argued in Appendix 3 that asymmetries in the architecture can severely disrupt the performance of the model and that this supports the need for plasticity. Here are a few suggestions related to this central issue.
We thank the reviewer for their recommendations, and address them subsequently.
1) The authors refer to the performance of flies from Seelig and Jayaraman (2015), stating at various points whether their models outperform the actual performance or not. This should actually be quantified in the figures and/or text. Specifically, both Figure 2 and Appendix 3 would benefit from a quantitative comparison to real fly performance.
We now also quantify PI performance of the network by using the same measure as Seelig and Jayaraman (2015): the correlation coefficient between the PVA and true heading in darkness (see new section “Quantification of PI performance” in Methods). We report this correlation for the main text network in the Results on Figure 2 (3rd paragraph of “Mature network can path-integrate in darkness”) and extensively in Appendix 3 (see answer to next suggestion). These correlation values allowed us to compare the performance of the network model to that of the fly.
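As a rough illustration of the measure described above, the following Python sketch computes a population vector average (PVA) from HD-cell activity and correlates it with the true heading. It is a minimal sketch under stated assumptions, not the procedure used in the manuscript: the function names, the evenly spaced preferred directions, the unwrapping of both angles before correlating, and the synthetic von Mises bump are all hypothetical choices made for this example.

```python
import numpy as np

def population_vector_average(activity, preferred_dirs):
    """PVA angle in radians for each time step.

    activity: array of shape (T, N) with non-negative firing rates.
    preferred_dirs: array of shape (N,) with preferred headings in radians.
    """
    # Sum unit vectors at the preferred directions, weighted by activity.
    resultant = activity @ np.exp(1j * preferred_dirs)
    return np.angle(resultant)

def pva_heading_correlation(activity, preferred_dirs, heading):
    """Pearson correlation between unwrapped PVA and unwrapped heading.

    Unwrapping both angles is an assumption of this sketch; it avoids
    spurious jumps at the +/- pi boundary.
    """
    pva = np.unwrap(population_vector_average(activity, preferred_dirs))
    return np.corrcoef(pva, np.unwrap(heading))[0, 1]

# Synthetic example (hypothetical data): a bump that tracks a random-walk heading.
rng = np.random.default_rng(0)
n_cells, n_steps = 60, 500
preferred = np.linspace(0.0, 2.0 * np.pi, n_cells, endpoint=False)
heading = np.cumsum(0.05 * rng.standard_normal(n_steps))
bump = np.exp(4.0 * (np.cos(preferred[None, :] - heading[:, None]) - 1.0))
print(pva_heading_correlation(bump, preferred, heading))
```

In this toy setting the bump follows the heading exactly, so the correlation is close to 1; for a network that path-integrates imperfectly in darkness the value would be lower.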
2) It would be useful to show how much imprecision must be added to the synaptic connections before the model can account for the performance of actual flies. This would answer the question of "how much precision must be genetically encoded to account for fly behavior."
We now add noise to all connections. In the new paragraph 5 of Appendix 3, we specify the amount of noise required to drop below fly performance. Furthermore, we comment on the sharpness of performance drop and which weights are more susceptible to noise.
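The kind of perturbation discussed above can be sketched as follows. This is only an illustrative sketch: the additive Gaussian noise model, its scaling by the standard deviation of the weights, the function name `perturb_weights`, and the cosine-shaped toy weight matrix are assumptions made for this example; the noise model actually used in Appendix 3 may differ.

```python
import numpy as np

def perturb_weights(weights, rel_sigma, rng):
    """Return a copy of `weights` with additive Gaussian noise.

    The noise standard deviation is `rel_sigma` times the standard
    deviation of the weights (an assumption of this sketch).
    """
    scale = rel_sigma * np.std(weights)
    return weights + rng.normal(0.0, scale, size=weights.shape)

# Toy ring connectivity (hypothetical): weights depend only on angular offset.
rng = np.random.default_rng(0)
theta = np.linspace(0.0, 2.0 * np.pi, 60, endpoint=False)
ring_weights = np.cos(theta[:, None] - theta[None, :])

# Sweep the noise level; in a full simulation one would rerun the network
# for each level and record the PVA-heading correlation in darkness.
for rel_sigma in (0.0, 0.05, 0.1, 0.2):
    noisy = perturb_weights(ring_weights, rel_sigma, rng)
    print(rel_sigma, np.std(noisy - ring_weights))
```

Sweeping `rel_sigma` and measuring PI performance at each level is one way to locate the noise amplitude at which a model drops below a reference performance.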
Reviewer #3 (Recommendations for the authors):
I find the paper to be timely and very interesting. The paper was well presented already in the first version, but I did have a few concerns. The authors did a great job in addressing all of these concerns and comments in the new version of their manuscript.
I strongly recommend it for publication in eLife.
We thank the reviewer for their remarks.
https://doi.org/10.7554/eLife.69841.sa2
Article and author information
Author details
Funding
German Research Foundation (SFB 1315, project-ID 327654276)
 David Owald
 Richard Kempter
Emmy Noether Programme (282979116)
 David Owald
Federal Ministry of Education and Research (01GQ1705)
 Richard Kempter
Onassis Foundation
 Pantelis Vafidis
Charité – Universitätsmedizin Berlin
 David Owald
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We thank Raquel Suárez-Grimalt and Marcel Heim for helpful discussions and Louis Kang for comments on the manuscript. This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation; SFB 1315 – project-ID 327654276 to RK and DO; the Emmy Noether Programme 282979116 to DO; and Germany's Excellence Strategy – EXC-2049 – 390688087 to DO), the German Federal Ministry for Education and Research (BMBF; Grant 01GQ1705 to RK), and the Onassis Foundation (PV). The funding sources were not involved in study design, data collection and interpretation, or the decision to submit the work for publication.
Senior Editor
 Ronald L Calabrese, Emory University, United States
Reviewing Editor
 Srdjan Ostojic, Ecole Normale Superieure Paris, France
Reviewer
 Hervé Rouault, Inserm, France
Version history
 Preprint posted: March 12, 2021
 Received: April 28, 2021
 Accepted: June 17, 2022
 Accepted Manuscript published: June 20, 2022 (version 1)
 Version of Record published: July 15, 2022 (version 2)
Copyright
© 2022, Vafidis et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.