An incentive circuit for memory dynamics in the mushroom body of Drosophila melanogaster

  1. Evripidis Gkanias (corresponding author)
  2. Li Yan McCurdy
  3. Michael N Nitabach
  4. Barbara Webb (corresponding author)
  1. Institute of Perception Action and Behaviour, School of Informatics, University of Edinburgh, United Kingdom
  2. Department of Cellular and Molecular Physiology, Yale University, United States
  3. Department of Genetics, Yale University, United States
  4. Department of Neuroscience, Yale University, United States

Abstract

Insects adapt their response to stimuli, such as odours, according to their pairing with positive or negative reinforcements, such as sugar or shock. Recent electrophysiological and imaging findings in Drosophila melanogaster allow detailed examination of the neural mechanisms supporting the acquisition, forgetting, and assimilation of memories. We propose that these data can be explained by the combination of a dopaminergic plasticity rule that supports a variety of synaptic strength change phenomena, and a circuit structure (derived from neuroanatomy) between dopaminergic and output neurons that creates different roles for specific neurons. Computational modelling shows that this circuit allows for rapid memory acquisition, transfer from short-term to long-term memory, and an exploration/exploitation trade-off. The model can reproduce the observed changes in the activity of each of the identified neurons in conditioning paradigms and can be used for flexible behavioural control.

Editor's evaluation

This ambitious study goes from signalling mechanisms to fly behavior through a model of a memory circuit in the fly brain. The authors call this the incentive circuit. The model draws extensively from anatomical and physiological measurements. The study makes a wide range of predictions about how this circuit mediates behaviour and learning through attractive and repulsive cues.

https://doi.org/10.7554/eLife.75611.sa0

Introduction

Animals deal with a complicated and changing world, and they need to adapt their behaviour according to their recent experience. Rapid changes in behaviour towards stimuli that are accompanied by intense reinforcement require memories in the brain that are readily susceptible to alteration. Yet associations experienced consistently should form long-term memories (LTMs) that are hard to change. Memories that are no longer valid should be forgotten. No single neuron can have all of these properties; instead, neurons must be connected in a circuit in which they play different roles, such as supporting short-term memory (STM) or LTM, and enabling the processes that form, retain, and erase memories. This complex interaction of memory processes is familiar in principle, but its implementation at the single-neuron level is still largely a mystery.

The fruit fly Drosophila melanogaster is able to form, retain, and forget olfactory associations with reinforcements, for example, electric shock. The key neural substrate is known to lie in the neuropils of their brain called the mushroom bodies (MBs) (Davis, 1993; Heisenberg, 2003; Busto et al., 2010). There are two MBs in the insect brain, one in each hemisphere, composed of intrinsic and extrinsic neurons. Extrinsic projection neurons (PNs) deliver sensory input to the only intrinsic neurons of the MBs, the Kenyon cells (KCs), whose long parallel axons travel through the pedunculus and then split, forming the vertical (α/α′) and medial (β/β′ and γ) MB lobes (see Figure 1). The extrinsic mushroom body output neurons (MBONs) extend their dendrites in different regions of the lobes, receiving input from the KCs and forming 15 distinct compartments (Turner et al., 2008; Tanaka et al., 2008; Campbell et al., 2013; Aso et al., 2014a). Their activity is thought to provide motivational output that modulates the default behaviour of the animal (Aso et al., 2014b). Different groups of extrinsic dopaminergic neurons (DANs) terminate their axons in specific compartments of the MB and modulate the connections between KCs and MBONs (Aso et al., 2014a). Many of the DANs respond to a variety of reinforcement signals (Mao and Davis, 2009; Schwaerzel et al., 2003; Claridge-Chang et al., 2009; Liu et al., 2012; Lin et al., 2014), and they are therefore considered the main source of reinforcement signals in the MB. Finally, many of the MBON axons and DAN dendrites meet in the convergence zones (CZs), where they create interconnections, such that the motivational output can also influence the activity of the reinforcement neurons (Li et al., 2020).

Overview of the mushroom body circuit.

Left: the main anatomical pathways. In the illustration, the presented odour activates the Kenyon cells (KCs) through the projection neurons (PNs). The parallel axons of KCs propagate this signal to the lobes of the mushroom body. The mushroom body output neurons (MBONs) extend their dendrites in the mushroom body lobes, receiving input from the KCs. Electric shock creates a punishing signal that excites some dopaminergic neurons (DANs), whose axons terminate in the lobes and modulate the synaptic weights between KCs and MBONs. Right: schematic of potential connections between punishment/reward DANs and approach/avoidance MBONs. Note that although DANs transferring punishing signals modulate the KC activation of MBONs that encode positive motivations (decreasing attraction to the presented odour and increasing attraction to odours not present [the dopaminergic plasticity rule]), MBONs that encode negative motivations will also gain higher responses due to release of inhibition between MBONs, and the feedback connections from MBONs to other DANs. In our model, we further decompose these functions using three DANs and three MBONs for each motivation (positive or negative) and map these units to specific identified neurons and microcircuits in the brain of Drosophila. These circuits include some direct (but not mutual) MBON-MBON connections (dashed inhibitory connections).

Several computational models have tried to capture the structure and function of the MBs, usually abstracting the common features of this network across various insect species. Modellers have treated the MBs as performing odour discrimination (Huerta et al., 2004), olfactory conditioning (Balkenius et al., 2006; Smith et al., 2008; Finelli et al., 2008; Young et al., 2011; Wessnitzer et al., 2012; Peng and Chittka, 2017; Faghihi et al., 2017; Zhao et al., 2021; Springer and Nawrot, 2021; Eschbach et al., 2020; Bennett et al., 2021), or assessing scene familiarity (Wu and Guo, 2011; Baddeley et al., 2012; Arena et al., 2013; Bazhenov et al., 2013; Ardin et al., 2016). However, the MBs appear able to subserve all of these functions, depending on context (or experience), that is, on what is driving the activity of the KCs (Cohn et al., 2015). This suggests that the output neurons of the MB do not just inform the animal whether an odour is known or attractive, or whether a scene is familiar, but actually motivate the animal to take an action such as approach, avoid, escape, or forage. There is emerging evidence supporting this idea of the MBONs driving non-binary but antagonistic motivations (Schwaerzel et al., 2003; Krashes et al., 2009; Gerber et al., 2009; Waddell, 2010; Lin et al., 2014; Perisse et al., 2016; Senapati et al., 2019), which has started to be explored in recent models.

In addition to the structural and functional depiction of the MBs, a variety of plasticity rules have been used to explain the effect of dopamine emitted by the DANs on the KC→MBON synapses. Although the best-supported biological mechanism is that coincidence of DAN and KC activity depresses the output of KCs to MBONs, most of the models mentioned above use variations of the Hebbian rule (Hebb, 2005), where the coincidence of input (KC) and output (MBON) activation strengthens the synaptic weight (or weakens it, in the anti-Hebbian case), gated by the reinforcement (DANs). More recent approaches that try to model the activity of DANs and MBONs in the brain have used plasticity rules (Zhao et al., 2021) or circuit structures (Springer and Nawrot, 2021; Bennett et al., 2021; Eschbach et al., 2020) that implement a reward prediction error (RPE) (Rescorla and Wagner, 1972), which is the most widely accepted psychological account of associative learning, with strong validation evidence in the vertebrate brain (Niv, 2009). For the MB, this plasticity rule is interpreted as the output (MBON) being the prediction of the reinforcement (DAN), so their difference (gated by the activity of the input, the KC) drives the synaptic plasticity. However, details of neuronal dynamics in fruit flies (Hige et al., 2015; Dylla et al., 2017; Berry et al., 2018) suggest that neither Hebbian nor RPE plasticity rules capture the plasticity dynamics in the MBs (also in larva: Schleyer et al., 2018; Schleyer et al., 2020), as both rules require the conditioned stimulus (CS) to be present (i.e., the KCs to be active) for any synaptic weight change. This highlights the importance of investigating new plasticity rules that more closely approximate the actual dopaminergic function.
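To make this shared limitation concrete, the two conventional rule families can be written down in a minimal sketch (function names and the learning rate are illustrative, not taken from any published model); both updates vanish whenever the KC is inactive, regardless of the reinforcement delivered:

```python
def hebbian_update(k, m, dan, lr=0.5):
    """Reinforcement-gated Hebbian rule: coincident KC (input, k) and
    MBON (output, m) activity changes the weight, gated by the DAN."""
    return lr * dan * k * m

def rpe_update(k, dan, mbon, lr=0.5):
    """Reward prediction error: the MBON output is treated as a
    prediction of the DAN reinforcement; their difference, gated by
    KC activity, drives plasticity."""
    return lr * k * (dan - mbon)

# With the odour absent (k = 0), neither rule changes the weight,
# even though the reinforcement (dan = 1) is delivered.
assert hebbian_update(k=0.0, m=0.8, dan=1.0) == 0.0
assert rpe_update(k=0.0, dan=1.0, mbon=0.2) == 0.0
# With the odour present, both rules produce a weight change.
assert hebbian_update(k=1.0, m=0.8, dan=1.0) != 0.0
assert rpe_update(k=1.0, dan=1.0, mbon=0.2) != 0.0
```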

In this work, we propose such a novel plasticity rule, named the dopaminergic plasticity rule (DPR), which reflects a recent understanding of the role of dopamine in depression and potentiation of synapses. Based on the evidence of existing MBON→DAN connections, we build a 12-neuron computational model, which we call the incentive circuit (IC), that uses the proposed plasticity rule. In this model, we name three types of DANs (‘discharging’, ‘charging’, and ‘forgetting’) and three types of MBONs (‘susceptible’, ‘restrained’, and ‘LTM’) for each of the two opposing motivational states. We demonstrate that the neural responses generated by this model during an aversive olfactory learning paradigm replicate those observed in the animal, and that simulated flies equipped with the IC generate learned odour preferences comparable to those of real flies. Finally, we suggest that such a model could work as a motif that extends the set of motivations from binary (e.g., avoidance vs. attraction) to a spectrum of motivations whose capabilities are equivalent to ‘decision-making’ in mammals.

Results

The dopaminergic plasticity rule

We implement a novel dopaminergic plasticity rule (DPR) to update the KC→MBON synaptic weights proportionally to the dopamine level, KC activity, and current state of the synaptic weight with respect to its default (rest) value. Our DPR is based on recent findings regarding the role of dopamine (and co-transmitters) in altering synaptic efficacy in the fruit fly MB (see methods section ‘Derivation of the dopaminergic plasticity rule’). Instead of calculating the error between the reinforcement and its prediction, DPR uses the reinforcement to maximise the separation between the synaptic weights of reinforced inputs, which is functionally closer to the information maximisation theory (Bell and Sejnowski, 1995; Lee et al., 1999; Lulham et al., 2011) than the RPE principle. While this rule, in combination with some specific types of circuits, can result in the prediction of reinforcements, it can also support a more flexible range of responses to stimulus-reinforcement contingencies, as we will show in what follows.

The dopaminergic learning rule is written formally as

(1) ΔWk2mij(t) = δj(t) [ki(t) + Wk2mij(t) − wrest]

where ΔWk2mij is the change in the synaptic weight connecting a KC, i, to an MBON, j. The KC→MBON synaptic weight, Wk2mij(t) ≥ 0, and the KC response, ki(t) ≥ 0, have a lower bound of 0, while the resting weight, wrest = 1, is a fixed parameter. The rule alters the connection weight between each KC and MBON on each time-step depending on the dopaminergic factor, δj(t), which is determined by the responses of the DANs targeting this MBON. The dopaminergic factor can be positive [δj(t) > 0] or negative [δj(t) < 0], which we motivate from recent observations of the differential roles in synaptic plasticity of DopR1 and DopR2 receptors (Handler et al., 2019), as detailed in ‘Materials and methods’. When combined with the two possible states of KC activity (active or inactive), this results in four different plasticity effects: depression, potentiation, recovery, and saturation.

These effects can be inferred directly from Equation 1. If the dopaminergic factor is zero (contributing DANs are inactive or mutually cancelling), no learning occurs. If the dopaminergic factor is negative and the KC is active, the KC→MBON synaptic weight decreases (the depression effect of the plasticity rule, see Figure 2A); however, once the synaptic weight is fully depressed, it cannot decrease further. The recovery effect takes place when the dopaminergic factor is negative and the KC is inactive (ki(t) = 0), in which case the synaptic weight tends to reset to the resting weight (see Figure 2C). When the dopaminergic factor is positive and the KC is active, we have the potentiation effect, which causes an increase in the synaptic weight (see Figure 2B); in contrast to the depression effect, a stronger synaptic weight further enhances this potentiation. If the KC is inactive and the dopaminergic factor is positive, we have the saturation effect: if the current synaptic weight is higher than its resting weight, it continues to increase, while if it is lower, it continues to decrease (see Figure 2D). This effect enhances diversity in the responses of the MBON to different past and current CS experiences, which is essential for memory consolidation (i.e., continued strengthening of a memory) and the formation of long-term memories (i.e., slower acquisition and resistance to further change).
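These four cases can be traced numerically in a minimal sketch of Equation 1 (a single hypothetical synapse with illustrative values; note that when ki(t) = 1 and the weight is fully discharged, the update is exactly zero, which is why depression is self-limiting):

```python
W_REST = 1.0  # resting weight (w_rest in Equation 1)

def dpr_update(w, k, delta):
    """One discrete step of the dopaminergic plasticity rule:
    dW = delta * (k + W - w_rest)."""
    return w + delta * (k + w - W_REST)

# Depression: delta < 0, KC active -> the weight decreases...
assert dpr_update(1.0, k=1.0, delta=-0.5) < 1.0
# ...but a fully discharged weight cannot decrease further.
assert dpr_update(0.0, k=1.0, delta=-0.5) == 0.0
# Potentiation: delta > 0, KC active -> the weight increases.
assert dpr_update(1.0, k=1.0, delta=+0.5) > 1.0
# Recovery: delta < 0, KC inactive -> the weight drifts back to w_rest.
assert 0.2 < dpr_update(0.2, k=0.0, delta=-0.5) < W_REST
# Saturation: delta > 0, KC inactive -> the weight moves away from
# w_rest in whichever direction it already deviates.
assert dpr_update(1.5, k=0.0, delta=+0.5) > 1.5
assert dpr_update(0.5, k=0.0, delta=+0.5) < 0.5
```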

The different effects of the dopaminergic plasticity rule, depending on the activity of the Kenyon cell (KC) (orange indicates active) and the sign of the dopaminergic factor (white arrowheads in dots).

The dopaminergic plasticity rule (DPR) can cause four different effects that work in harmony or discord to maximise the information captured in each experience and allow different types of memories to be formed for each KC→MBON synapse. In each box, time-step t=tpre shows the initial KC→MBON synaptic weights (thickness of the arrows); electric shock activates the DAN in time-step t=tlearn causing modulation of the synaptic weights (red: increase; blue: decrease), while time-step t=tpost shows the synaptic weights after the shock delivery. (A) Example of the depression effect – the synaptic weight decreases when δj(t)<0 and the KC is active. (B) Example of the potentiation effect – the synaptic weight increases when δj(t)>0 and the KC is active. (C) Example of the recovery effect – the synaptic weight increases when δj(t)<0 and the KC is inactive. (D) Example of the saturation effect – the synaptic weight increases further (when Wk2mij(t)>wrest) or decreases further (when Wk2mij(t)<wrest) when δj(t)>0 and the KC is inactive. MBON: mushroom body output neuron.

The different effects described above can work together in single KC→MBON synapses (i.e., through the influence of multiple DANs), leading to more complicated phenomena such as the formation of short-term memories (e.g., combining the depression/potentiation and recovery effects) or long-term memories (e.g., combining the potentiation and saturation effects). However, we will see that, by adding MBON→DAN feedback connections, a very wide range of circuit properties can be implemented. We next introduce a set of microcircuits that have been found in the fruit fly MBs and describe how they could interlock and interact in one IC to control the motivation and hence the behaviour of the animal.

The incentive circuit

What we call the IC is a circuit in the MB of the fruit fly D. melanogaster that allows complicated memory dynamics through self-motivation. We have identified and modelled this circuit (shown in Figure 3), which consists of six MBONs that receive KC input and six DANs that modulate the KC→MBON connections. The circuit includes some MBON→MBON connections and some feedback connections from MBONs to DANs. All the neurons and connections in this circuit are mapped to identified connectivity in the MB, as summarised in Table 1. We will describe each of the microcircuits and the biological justification for their assumed function in detail below, but here we provide an initial overview of the IC function.

Figure 3 with 1 supplement
The incentive circuit (IC) integrates the different microcircuits of the mushroom body into a unified model allowing the expression of more complicated behaviours and memory dynamics.

It combines the susceptible, restrained, reciprocal short- and long-term memories and the memory assimilation mechanism microcircuits in one circuit that is able to form, consolidate, and forget different types of memories that motivate the animal to take actions. dav and dat: avoidance- and attraction-driving discharging dopaminergic neurons (DANs); cav and cat: avoidance- and attraction-driving charging DANs; fav and fat: avoidance- and attraction-driving forgetting DANs; sav and sat: avoidance- and attraction-driving susceptible mushroom body output neurons (MBONs); rav and rat: avoidance- and attraction-driving restrained MBONs; mav and mat: avoidance- and attraction-driving long-term memory MBONs.

Figure 3—source data 1

Experimental data from Bennett et al., 2021 modified to include the predicted neuron types.

https://cdn.elifesciences.org/articles/75611/elife-75611-fig3-data1-v1.xlsx
Table 1
Connections among neurons in the Drosophila mushroom body mapped to the connections of the incentive circuit.

Connection types: ‘⊸’, modulates the synaptic weights of the KC→MBON connections terminating in that MBON; ‘→’, excitatory connection; ‘⊣’, inhibitory connection. Microcircuit – SM: susceptible memory; RM: restrained memory; RSM: reciprocal short-term memories; LTM: long-term memory; RLM: reciprocal long-term memories; MAM: memory assimilation mechanism. Evidence – A: anatomical connection is known (i.e., using light or electron microscopy); F: functional connection is known (i.e., whether activating the presynaptic neuron leads to an excitatory or inhibitory effect on the postsynaptic neuron and/or the neurotransmitter released by the presynaptic neuron); KC: Kenyon cell; MBON: mushroom body output neuron; IC: incentive circuit.

Connection in the MB | Connection in the IC | Microcircuit | Evidence | References
PPL1-γ1ped ⊸ MBON-γ1ped | dav ⊸ sat | SM | A, F | Aso et al., 2014a; Pavlowsky et al., 2018
MBON-γ1ped ⊣ PPL1-γ1ped | sat ⊣ dav | SM | A, F | Aso et al., 2014a; Pavlowsky et al., 2018
PAM-γ4<γ1γ2 ⊸ MBON-γ4>γ1γ2 | dat ⊸ sav | SM | A | Aso et al., 2014a
MBON-γ4>γ1γ2 ⊣ PAM-γ4<γ1γ2 | sav ⊣ dat | SM | A, F | Aso et al., 2014a; Cohn et al., 2015
MBON-γ1ped ⊣ MBON-γ5β′2a | sat ⊣ rav | RM | A, F | Aso et al., 2014a; Felsenberg et al., 2018
MBON-γ4>γ1γ2 ⊣ MBON-γ2α′1 | sav ⊣ rat | RM | A | Aso et al., 2014a
PPL1-γ2α′12 ⊸ MBON-γ2α′1 | cav ⊸ rat | RSM | A, F | Aso et al., 2014a; McCurdy et al., 2021
MBON-γ2α′1 → PAM-β′2a | rat → cat | RSM | A, F | Aso et al., 2014a; McCurdy et al., 2021
PAM-β′2a ⊸ MBON-γ5β′2a | cat ⊸ rav | RSM | A, F | Aso et al., 2014a; McCurdy et al., 2021
MBON-γ5β′2a → PPL1-γ2α′12 | rav → cav | RSM | A | Li et al., 2020
PPL1-γ2α′12 ⊸ MBON-α′1 | cav ⊸ mav | LTM | A | Aso et al., 2014a
MBON-α′1 → PPL1-γ2α′12 | mav → cav | LTM | A | Aso et al., 2014a; Li et al., 2020
PAM-β′2a ⊸ MBON-β2β′2a | cat ⊸ mat | LTM | A | Aso et al., 2014a
MBON-β2β′2a → PAM-β′2a | mat → cat | LTM | A | Aso et al., 2014a; Li et al., 2020
PPL1-γ2α′11 ⊸ MBON-α′1 | fat ⊸ mav | RLM | A | Aso et al., 2014a
MBON-α′1 → PAM-β2β′2a | mav → fav | RLM | A | Li et al., 2020
PAM-β2β′2a ⊸ MBON-β2β′2a | fav ⊸ mat | RLM | A | Aso et al., 2014a
MBON-β2β′2a → PPL1-γ2α′11 | mat → fat | RLM | A | Li et al., 2020
PPL1-γ2α′11 ⊸ MBON-γ2α′1 | fat ⊸ rat | MAM | A | Aso et al., 2014a
PAM-β2β′2a ⊸ MBON-γ5β′2a | fav ⊸ rav | MAM | A | Aso et al., 2014a

As presented in Figure 3, for each motivation (attraction or avoidance), the IC has three types of MBON (susceptible, restrained, and LTM) and three types of DAN (discharging, charging, and forgetting). More specifically, working from the outer edges of the model, we have ‘discharging’ DANs that respond to punishment (left side) or reward (right side) and influence the ‘susceptible’ MBONs, which by default respond to all KC inputs (not shown). These in turn inhibit the responses of the ‘restrained’ MBONs of opposite valence. When the discharging DANs depress the response of the susceptible MBONs of opposite valence, they release the restrained MBONs of the same valence, and also decrease the inhibitory feedback to the discharging DANs from the susceptible MBONs. The restrained MBONs activate their respective ‘charging’ DANs, which start to potentiate the LTM MBONs of the same valence, while also depressing the response (to KC input) of the restrained MBON of opposite valence. Similarly, the LTM MBONs enhance the activity of the charging DANs, increasing the momentum of the LTM, while simultaneously activating their respective ‘forgetting’ DANs, which decrease the momentum of the opposite-valence LTM. The forgetting DANs also depress the restrained MBONs, which makes space for the acquisition of new memories while preserving old ones.
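The wiring described above (and listed in Table 1) can be transcribed into a compact edge list. The following sketch uses the abbreviations of Figure 3 in a flattened form (e.g., d_av for dav), with 'mod' standing for the dopaminergic modulation ('⊸') of the target MBON's KC inputs, 'exc' for excitation, and 'inh' for inhibition, and it checks one structural property stated in the text: the circuit is mirror-symmetric across the two valences.

```python
# Transcription of Table 1 into an edge list (names follow Figure 3).
IC_EDGES = [
    # (pre, post, type, microcircuit)
    ("d_av", "s_at", "mod", "SM"),  ("s_at", "d_av", "inh", "SM"),
    ("d_at", "s_av", "mod", "SM"),  ("s_av", "d_at", "inh", "SM"),
    ("s_at", "r_av", "inh", "RM"),  ("s_av", "r_at", "inh", "RM"),
    ("c_av", "r_at", "mod", "RSM"), ("r_at", "c_at", "exc", "RSM"),
    ("c_at", "r_av", "mod", "RSM"), ("r_av", "c_av", "exc", "RSM"),
    ("c_av", "m_av", "mod", "LTM"), ("m_av", "c_av", "exc", "LTM"),
    ("c_at", "m_at", "mod", "LTM"), ("m_at", "c_at", "exc", "LTM"),
    ("f_at", "m_av", "mod", "RLM"), ("m_av", "f_av", "exc", "RLM"),
    ("f_av", "m_at", "mod", "RLM"), ("m_at", "f_at", "exc", "RLM"),
    ("f_at", "r_at", "mod", "MAM"), ("f_av", "r_av", "mod", "MAM"),
]

def swap_valence(name):
    """Map a neuron name to its opposite-valence counterpart."""
    return name[:-2] + ("at" if name.endswith("av") else "av")

# Every connection has a counterpart with both the pre- and
# postsynaptic valences swapped (the left/right mirror in Figure 3).
edges = {(pre, post, kind) for pre, post, kind, _ in IC_EDGES}
assert all((swap_valence(a), swap_valence(b), t) in edges for a, b, t in edges)
```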

In the following sections, we show in detail how each simulated neuron of this circuit responds during acquisition and forgetting in the aversive olfactory conditioning paradigm shown in Figure 4, and compare this to observed responses in the corresponding identified neurons in the fly from calcium imaging under the same paradigm. We then describe the behaviour of simulated flies under the control of this circuit and learning rule in a naturalistic setting with two odour gradients, paired singly or jointly with punishment or reward. By using more abstracted behavioural modelling, following the approach of Bennett et al., 2021, we are also able to create closely matching results for 92 different olfactory conditioning intervention experiments, that is, the observed effects on fly learning of silencing or activating specific neurons (Figure 3—figure supplement 1; the Δf values of the model and experiments are correlated, with correlation coefficient r = 0.76, p = 2.2 × 10⁻¹⁸).

Figure 4 with 1 supplement
Description of the experimental setup and the aversive olfactory conditioning paradigms.

(A) Setup for visualising neural activity via Ca2+ imaging during aversive olfactory memory acquisition and reversal. Flies are head-fixed and cuticle dissected for ratiometric imaging of Ca2+-sensitive GCaMP6f and Ca2+-insensitive tdTomato. (B) The aversive olfactory conditioning experimental paradigm. 5 s presentations of odours A (3-octanol [OCT]; coloured pink) and B (4-methylcyclohexanol [MCH]; coloured yellow) continuously alternate, separated by fresh air, while the shock input (100 ms of 120 V) forms the different phases: one repeat of pre-training, where no shock is delivered; five repeats of acquisition, where shock (thin red line) is delivered in the last second of odour B; two repeats of reversal where shock is paired with odour A. (C) Abstract representation of the computational model as an electronic chip. The model receives the conditional (odour) and unconditional stimuli (electric shock) and produces the dopaminergic neuron (DAN) and mushroom body output neuron (MBON) responses using the incentive circuit and the dopaminergic plasticity rule. (D) The aversive olfactory conditioning experimental paradigm modified for testing the model. Odours A (coloured pink) and B (coloured yellow) are presented for two time-steps each, in alternation, separated by one time-step fresh air, while the shock input forms the different phases and forgetting conditions: one repeat of pre-training, where no shock is delivered; five repeats of acquisition, where shock is delivered in the second time-step of odour B; five repeats of forgetting that can be either extinction (lightest shade of odour colour) where shock is not presented, unpaired (mid shade of odour colour) where shock (thin red line) is paired with the fresh air ‘break’, or reversal (dark shade of odour colour) where shock is paired with odour A.

Figure 4—source data 1

Imaging data of all the recorded neurons in the Drosophila melanogaster mushroom body.

https://cdn.elifesciences.org/articles/75611/elife-75611-fig4-data1-v1.zip

Microcircuits of the mushroom body

Susceptible and restrained memories

Pavlowsky et al., 2018 identified a microcircuit in the MB, where a punishment-encoding DAN (PPL1-γ1pedc) depresses the KC synapses onto an attraction-driving MBON (MBON-γ1pedc>α/β), which in turn inhibits the same DAN. They argue that this is a memory consolidation mechanism as the drop in the MBON response will reduce its inhibition of the DAN, enhancing the formation of the memory if the same odour-punishment pairing is continued. Felsenberg et al., 2018 further showed that the same MBON directly inhibits an avoidance-driving MBON (MBON-γ5β′2a), such that its activity increases (driving avoidance) after punishment as the inhibition is released. Figure 5A shows these neurons in the MB and Figure 5B a schematic representation of their interconnections. Note that the MBON⊣MBON inhibition is not reciprocal, rather we assume (see Figure 3 and below) that there is a different microcircuit in which an avoidance-driving MBON inhibits an attraction-driving MBON. Figure 5C–E shows the responses of these neurons from experimental data (left) and from our model (right) during aversive conditioning (the paradigm shown in Figure 4), which follow a similar pattern.

Figure 5 with 3 supplements
The susceptible and restrained microcircuits of the mushroom body.

(A) Image of the attraction-driving susceptible and avoidance-driving restrained memory microcircuits made of the PPL1-γ1pedc, MBON-γ1pedc, and MBON-γ5β′2a neurons – created using the Virtual Fly Brain software (Milyaev et al., 2012). (B) Schematic representation of the susceptible and restrained memories microcircuits connected via the susceptible mushroom body output neuron (MBON). The responses of (C) the punishment-encoding discharging dopaminergic neuron (DAN), dav, (D) the attraction-driving susceptible MBON, sat, and (E) the avoidance-driving restrained MBON, rav, generated by experimental data (left) and the model (right) during the olfactory conditioning paradigms of Figure 4D. Lightest shades denote the extinction, mid shades the unpaired, and dark shades the reversal phase. For each trial, we report two consecutive time-steps: the off-shock (i.e., odour only) followed by the on-shock (i.e., paired odour and shock) when available (i.e., odour B in acquisition and odour A in reversal phase); otherwise, a second off-shock time-step (i.e., all the other phases).

Learning in this circuit is shown by the sharp drop (in both experimental data and model) of the response of MBON-γ1pedc>α/β (Figure 5D) to odour B already from the second trial of the acquisition phase. There is a similar drop in the response to odour A in the reversal phase. This rapid decrease is due to the depressing effect of the DAN on the KC→MBON synaptic weight. Note that we name this a ‘discharging’ DAN as the target synaptic strengths are high or ‘charged’ by default. However, due to our plasticity rule, if the unconditioned stimulus (US) subsequently occurs without the CS (see the unpaired phase in the model, for which we do not have experimental data), the MBON synaptic weights reset due to the recovery effect (see Figure 5—figure supplement 3A, odour B). This is consistent with the high learning rate and low retention time observed in Aso and Rubin, 2016, and it results in a memory that is easily created and erased: a ‘susceptible memory’ (SM). The response of MBON-γ5β′2a (Figure 5E) can be observed to have the opposite pattern, that is, it starts to respond to odour B from the second trial of acquisition as it is no longer ‘restrained’. Note, however, that the response it expresses, when the restraint is removed, also depends on its own synaptic weights for KC input, which, as we will see, may be affected by other elements in the IC. In Figure 5C, the experimental data show a slight drop in the shock response (first paired with odour B, then with odour A) of the DAN, PPL1-γ1pedc, during the whole experiment, although it remains active throughout. We assume that this drop may reflect sensory adaptation to shock but have not included it in our model. Consequently, the model data show a positive feedback effect: the DAN causes depression of the MBON response to odour, reducing inhibition of the DAN, which increases its response, causing even further depression in the MBON. Note that this is opposite to the expected effects of reward prediction error.
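This positive feedback can be sketched in a few lines (a minimal sketch with illustrative gains; the DAN response is modelled as the shock input minus the MBON's inhibition, rectified at zero):

```python
W_REST = 1.0  # resting weight (w_rest in Equation 1)

def dpr_update(w, k, delta):
    """One step of the dopaminergic plasticity rule (Equation 1)."""
    return w + delta * (k + w - W_REST)

w = 1.0            # KC -> susceptible MBON weight ('charged' by default)
k = 1.0            # odour-driven KC activity during each pairing
shock = 1.0        # punishment input to the discharging DAN
dan_trace = []
for _ in range(5):
    mbon = k * w                          # susceptible MBON response
    dan = max(0.0, shock - 0.3 * mbon)    # DAN inhibited by the MBON
    dan_trace.append(dan)
    w = dpr_update(w, k, delta=-dan)      # punishment depresses the weight

# As the MBON discharges, its inhibition of the DAN weakens, so the
# DAN response grows trial by trial (opposite to an RPE, which would
# shrink as the 'prediction' improves).
assert all(b > a for a, b in zip(dan_trace, dan_trace[1:]))
assert w < 0.1  # the susceptible memory is rapidly depressed
```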

Similar microcircuits in the MB can be extracted from the connectome described in Aso et al., 2014a and Li et al., 2020 (also identified in larvae; Eichler et al., 2017). This leads us to the assumption that there are exactly corresponding susceptible and restrained memory microcircuits with opposite valence, that is, a reward-encoding DAN that discharges the response to odour of an avoidance-driving MBON, which in turn releases its restraint on an attraction-driving MBON (see Figure 5—figure supplement 1B and right side of the IC in Figure 3, which mirrors the left side, with opposite valence). We further suggest specific identities for the neurons forming this circuit: PAM-γ4<γ1γ2 as the reward-encoding discharging DAN; MBON-γ4>γ1γ2 as the avoidance-driving susceptible MBON; and MBON-γ2α’1 as the attraction-driving restrained MBON (see Figure 5—figure supplement 1A). The latter identification is based on the possibility of inhibiting connections from MBONs in the γ4 compartment to the ones in the γ2 compartment suggested by Aso et al., 2019 and Cohn et al., 2015. Although MBON-γ4>γ1γ2 is characterised by the glutamate neurotransmitter, it is possible that it can inhibit MBON-γ2α′1 through glutamate-gated chloride channels (Cleland, 1996; Liu and Wilson, 2013; McCarthy et al., 2011).

Reciprocal short-term memories

McCurdy et al., 2021 suggest that the attraction-driving restrained MBON in the previous circuit (MBON-γ2α′1) indirectly decreases the synaptic weights from KCs to the avoidance-driving restrained MBON (MBON-γ5β′2a) via an attraction-encoding DAN (PAM-β′2a). This microcircuit is also supported by Felsenberg et al., 2018 and Berry et al., 2018. Cohn et al., 2015 and Li et al., 2020 suggest that the corresponding avoidance-driving restrained MBON (MBON-γ5β′2a) excites an avoidance-encoding DAN (PPL1-γ2α′1), which closes the loop by affecting the KC connections to the attraction-driving restrained MBON, forming what we call the ‘reciprocal short-term memories’ microcircuit as shown in Figure 6A (actual neurons in the MBs) and Figure 6B (schematic representation of the described connections).

Figure 6 with 2 supplements
The reciprocal short-term memories microcircuit of the mushroom body.

(A) Image of the reciprocal short-term memories microcircuit made of the MBON-γ5β′2a, PAM-β′2a, PPL1-γ2α′1, and MBON-γ2α′1 neurons – created using the Virtual Fly Brain software (Milyaev et al., 2012). (B) Schematic representation of the reciprocal short-term memories microcircuit (coloured) connected to the susceptible memories via the restrained mushroom body output neurons (MBONs). The responses of (C) the punishment-encoding charging dopaminergic neuron (DAN), cav, the (D) attraction-driving restrained MBON, rat, and (E) the reward-encoding charging DAN, cat, generated by experimental data (left) and the model (right) during the olfactory conditioning paradigms of Figure 4D. Lightest shades denote the extinction, mid shades the unpaired, and dark shades the reversal phase. For each trial, we report two consecutive time-steps: the off-shock (i.e., odour only) followed by the on-shock (i.e., paired odour and shock) when available (i.e., odour B in acquisition and odour A in reversal phase); otherwise, a second off-shock time-step (i.e., all the other phases).

The ‘charging’ DANs, PAM-β′2a and PPL1-γ2α′1 (named after their long-term memory charging property, i.e., their potentiation effect on another KC→MBON synapse, as we describe in the long-term memory microcircuit section), should be activated directly by reinforcement, as well as by the restrained MBONs. This allows memories to be affected directly by the reinforcement, but also by the expression of memories of the opposite valence. The latter feature keeps the balance between the memories, automatically erasing a memory when a memory of the opposite valence starts building up, and results in the balanced learning rate and retention time observed in Aso and Rubin, 2016. Because the memories in this pair of restrained MBONs are very fragile, we predict that these MBONs store short-term memories.

The effects of this circuit, as shown in Figure 6C–E, are relatively subtle. During acquisition, the shock activates the punishment-encoding charging DAN (see Figure 6C), which decreases the synaptic weights of the KCs onto the attraction-driving restrained MBON (see Figure 6—figure supplement 2C), but this cannot be seen in Figure 6D because this MBON is already strongly inhibited (i.e., by the avoidance-driving susceptible MBON). This low response means that the opposing reward-encoding charging DAN (Figure 6E) is largely unaffected in this conditioning paradigm. In our model, the non-zero activity level of this DAN is a consequence of input from the LTM microcircuit, which we describe next, and the activation is similar for both odours because our network starts in a balanced state (no preference for either odour). The different response to the two odours seen in the experimental data might therefore reflect an unbalanced starting state of the fly’s LTM for these odours due to its previous experiences.

Long-term memory

Ichinose et al., 2015 describe a microcircuit where a reward-encoding DAN (PAM-α1) potentiates the KC→MBON synapses of MBON-α1, and MBON-α1 in turn excites PAM-α1. Using data from Li et al., 2020, we find numerous similar microcircuits, and in particular, MBONs that appear to have this recurrent arrangement of connectivity with the ‘charging’ DANs we introduced to the circuit in the previous section. Specifically, we assume that the reward-encoding charging DAN (PAM-β′2a) can potentiate the response of the attraction-driving MBON-β2β′2a; and similarly, the punishment-encoding charging DAN (PPL1-γ2α′1) potentiates the avoidance-driving MBON-α′1 (see Figure 7A and C; Figure 7B shows these connections schematically, with the KCs omitted for convenience). Crucially, these connections form positive feedback circuits: the DAN potentiates the response of the MBON to the odour, which increases its excitation of the DAN. As a consequence, even when the reinforcement ceases, the learning momentum can continue; this is the saturation effect of the learning rule (see Figure 2D), and it results in long-term memory consolidation and enhancement.
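The positive feedback loop described above can be sketched in a few lines. This is a toy discrete-time abstraction, not the fitted model: the helper `ltm_step`, the update form, and the constants are all illustrative assumptions patterned on the described loop (the charging DAN's dopaminergic factor combines reinforcement with MBON feedback, and the weight change scales with the deviation from a resting weight).

```python
# A toy discrete-time sketch of the LTM positive-feedback loop. All names
# and constants are illustrative, not taken from the fitted model.

def ltm_step(w, k, reinforcement, w_rest=1.0, eta=0.05):
    m = max(0.0, k * w)           # LTM MBON response to the odour
    delta = reinforcement + m     # charging DAN: reinforcement + MBON feedback
    return w + eta * delta * (k + w - w_rest)

w = 1.0
for _ in range(5):                # odour paired with reward
    w = ltm_step(w, k=1.0, reinforcement=1.0)
w_trained = w

for _ in range(5):                # reward ceases, odour still presented
    w = ltm_step(w, k=1.0, reinforcement=0.0)

# The learning momentum continues without reinforcement (saturation).
assert w > w_trained > 1.0
```

In this sketch the MBON's feedback keeps the dopaminergic factor positive after the reward stops, so the weight keeps rising, which is the consolidation behaviour the text attributes to the loop.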

Figure 7 with 2 supplements see all
The long-term memory microcircuits of the mushroom body.

(A) Image of the avoidance-encoding long-term memory microcircuit made of the MBON-α′1 and PPL1-γ2α′1 – created using the Virtual Fly Brain software (Milyaev et al., 2012). (B) Schematic representation of the long-term memory microcircuits (coloured) connected to the reciprocal short-term memory (RSM) via the charging dopaminergic neurons (DANs). (C) Image of the attraction-encoding long-term memory microcircuit made of the MBON-β2β′2a and PAM-β′2a – created using the Virtual Fly Brain software (Milyaev et al., 2012). The responses of (D) the avoidance-driving long-term memory mushroom body output neuron (MBON), m_av, and (E) the attraction-driving long-term memory MBON, m_at, generated by experimental data (left) and the model (right) during the olfactory conditioning paradigms of Figure 4D. Lightest shades denote the extinction, mid shades the unpaired, and dark shades the reversal phase. For each trial, we report two consecutive time-steps: the off-shock (i.e., odour only) followed by the on-shock (i.e., paired odour and shock) when available (i.e., odour B in acquisition and odour A in reversal phase); otherwise, a second off-shock time-step (i.e., all the other phases).

Figure 7D (right) demonstrates the charging of the avoidance-driving LTM MBON during the acquisition (for odour B) and its continued increase during the forgetting phases. However, these trends are not evident in the experimental data as illustrated in Figure 7D (left). We suggest this is because responses of LTM neurons depend on the overall experience of the animal and are thus hard to predict during one experiment. For example, it could be the case that the animal has already built some long-term avoidance memory for odour A, such that its presentation without reinforcement in our experiment continues its learning momentum, leading to the observed increasing response. Note that the decreasing response to odour A during acquisition in the model, as well as the observed effects in Figure 7E for the attraction-driving LTM MBON, is due to influence from additional microcircuits to be described in the next section. Figure 7—figure supplement 1 shows the responses of these neurons using only the microcircuits that have been introduced so far. In this case, the responses of both LTM MBONs saturate instantly, which shows that another mechanism must exist to regulate them if they are to be useful for the behaviour of the animal.

Reciprocal long-term memories

As described so far, once the LTM microcircuit begins to charge, it will have a self-sustaining increase in the weights during odour delivery, preventing any subsequent adaptation to altered reward contingencies. To allow these weights to decrease, specifically, to decrease in response to charging of the LTM of opposite valence, we connect the two LTM MBONs via respective ‘forgetting’ DANs (see Figure 8B). Note that these forgetting DANs do not receive any direct reinforcement signals. Instead, as long as an LTM MBON is active, its respective forgetting DAN is also active and causes synaptic depression for the opposite LTM MBON (forgetting the learnt memory; see Figure 8C and D). This counteracts any potentiation effect due to the LTM MBON’s respective charging DAN (see Figure 8—figure supplement 1E and F). As a consequence, sustained reinforcement of one valence can gradually overcome the positive feedback of the LTM circuit of opposite valence, causing the charging momentum to drop and eventually negate. The LTMs are thus in long-term competition.

Figure 8 with 2 supplements see all
The reciprocal long-term memories microcircuit of the mushroom body.

(A) Image of the reciprocal long-term memory microcircuit in the mushroom body made of the MBON-α′1, PAM-β2β′2a, MBON-β2β′2a, and PPL1-γ2α′1 – created using the Virtual Fly Brain software (Milyaev et al., 2012). (B) Schematic representation of the reciprocal long-term memories microcircuit (coloured). The responses of (C) the punishment-encoding forgetting dopaminergic neuron (DAN), f_av, and (D) the reward-encoding forgetting DAN, f_at, generated by experimental data (left) and the model (right) during the olfactory conditioning paradigms of Figure 4D. Lightest shades denote the extinction, mid shades the unpaired, and dark shades the reversal phase. For each trial, we report two consecutive time-steps: the off-shock (i.e., odour only) followed by the on-shock (i.e., paired odour and shock) when available (i.e., odour B in acquisition and odour A in reversal phase); otherwise, a second off-shock time-step (i.e., all the other phases).

We have identified the reciprocal LTMs microcircuit of Figure 8B in the descriptions of Aso et al., 2014a and Li et al., 2020, where MBON-α′1 is the avoidance-driving LTM MBON, MBON-β2β′2a is the attraction-driving LTM MBON, PAM-β2β′2a is the avoidance-driving forgetting DAN, and PPL1-γ2α′1 is the attraction-driving forgetting DAN, as shown in Figure 8A. One problem with this identification is that there is only one PPL1-γ2α′1 per hemisphere, and we have already suggested that it should be identified as the punishment-encoding charging DAN in our model. However, there are multiple axon terminals of this neuron in the MB (e.g., MB296B1 and MB296B2) and each one of them seems to communicate a different response (see Figure 4—figure supplement 1, row 5, columns 6 and 7). Interestingly, the responses communicated by the MB296B1 terminal are close to the ones produced by the punishment-encoding charging DAN (see Figure 6C), and the ones of the MB296B2 are close to the ones produced by the attraction-driving forgetting DAN (see Figure 8D). This implies that different axons of the same DA neuron might create responses that depend on where the axon terminates and actually work as separate processing units. Figure 8C and D shows that the reconstructed responses of these neurons from our model are surprisingly similar to the ones observed in the data.

Memory assimilation mechanism

The forgetting DANs allow the developing LTM of one valence to induce forgetting of the LTM of the opposite valence. However, the forgetting DANs can also serve another critical function that maintains flexibility for future learning: erasing the memory of the same valence from their respective restrained MBONs. We thus predict that the forgetting DANs also suppress the KC synaptic weights of their respective restrained MBONs, forming the ‘memory assimilation mechanism’ (MAM) microcircuit (see Figure 9C). This effectively allows memory transfer between the restrained and the LTM MBONs, enhancing both the adaptability and the capacity of the circuit. This effect can be observed in the difference of the responses of the same neurons in Figures 7 and 8 and Figure 8—figure supplement 1, where the restrained memory becomes weaker as the LTM becomes stronger, driven by the respective forgetting and charging DANs.
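The assimilation can be illustrated with a toy discrete-time sketch in which the charging DAN (driven here, as a simplification, only by the restrained MBON) builds the LTM weight while the forgetting DAN (driven by the LTM MBON) erodes the restrained weight. The helper name and all constants are illustrative assumptions.

```python
# A toy sketch of STM-to-LTM memory transfer via the MAM microcircuit.
# Constants and the exact drive of each DAN are illustrative simplifications.

def mam_step(w_stm, w_ltm, k=1.0, w_rest=1.0, eta=0.05):
    m_stm = k * w_stm                               # restrained MBON response
    m_ltm = k * w_ltm                               # same-valence LTM MBON response
    w_ltm += eta * m_stm * (k + w_ltm - w_rest)     # charging DAN builds the LTM
    w_stm += eta * (-m_ltm) * (k + w_stm - w_rest)  # forgetting DAN erodes the STM
    return w_stm, w_ltm

w_stm, w_ltm = 1.5, 1.0      # a freshly acquired restrained memory
for _ in range(30):
    w_stm, w_ltm = mam_step(w_stm, w_ltm)

assert w_ltm > w_stm                          # the memory has moved to the LTM
assert abs((w_stm + w_ltm) - 2.5) < 1e-6      # in this toy, total memory is conserved
```

With these particular constants the two weight changes are equal and opposite, so the sketch behaves as a pure transfer: the restrained memory drains into the LTM, freeing the restrained MBON for new associations.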

Figure 9 with 3 supplements see all
The memory assimilation mechanism (MAM) microcircuit of the mushroom body.

(A) Image of the avoidance-specific MAM microcircuit in the mushroom body made of the MBON-γ5β′2a, PPL1-γ2α′1, MBON-α′1, and PAM-β2β′2a – created using the Virtual Fly Brain software (Milyaev et al., 2012). (B) Image of the attraction-specific MAM microcircuit in the mushroom body made of the MBON-γ2α′1, PAM-β′2a, MBON-β2β′2a, and PPL1-γ2α′1 – created using the Virtual Fly Brain software (Milyaev et al., 2012). (C) Schematic representation of the MAM microcircuits (coloured). The forgetting dopaminergic neurons (DANs) connect to the restrained mushroom body output neurons (MBONs) of the same valence, hence increasing long-term memory (LTM) strength reduces (assimilates) the restrained memory, constituting the MAM microcircuits.

The depression effect of the forgetting DANs on the KC→restrained MBON synapses of the same valence is supported by Aso et al., 2014a. More specifically, the avoidance-driving forgetting DAN, which we have identified as PAM-β2β′2a, modulates the KC→MBON-γ5β′2a synapses, while the attraction-driving forgetting DAN, PPL1-γ2α′1, modulates the KC→MBON-γ2α′1 synapses, as shown in Figure 9A and B, respectively.

Modelling the behaviour

In the IC, three MBON types drive attraction and three drive avoidance, resulting in six driving forces for each available odour (see Figure 10). A simple ‘behavioural’ readout (used in many previous models) would be to take the sum of all attractive and aversive forces at some time point as a measure of the probability of animals ‘choosing’ odour A or B, and compare this to the standard two-arm maze choice assay used in many Drosophila studies. Following this approach and using the summarised data collected by Bennett et al., 2021, we have tested the performance of our model in 92 olfactory classical conditioning intervention experiments from 14 studies (Felsenberg et al., 2017; Perisse et al., 2016; Aso and Rubin, 2016; Yamagata et al., 2016; Ichinose et al., 2015; Huetteroth et al., 2015; Owald et al., 2015; Aso et al., 2014b; Lin et al., 2014; Plaçais et al., 2013; Burke et al., 2012; Liu et al., 2012; Aso et al., 2010; Claridge-Chang et al., 2009), that is, the observed effects on fly learning of silencing or activating specific neurons, including positive and negative reinforcements. The Δf values predicted by the IC correlated with those reported from the actual experiments, with correlation coefficient r = 0.76, p = 2.2×10⁻¹⁸ (Figure 3—figure supplement 1).

The activity of the six mushroom body output neurons (MBONs) is translated into forces that drive a simulated fly towards or away from odour sources.

For naive flies, the forces are balanced. When electric shock is paired with an odour, the balance changes towards the avoidance-driving MBONs, which drives the fly directly away from that odour. When sugar is paired with an odour, the balance changes to attraction, driving the fly towards that odour. Combining all attractive and repulsive forces for each odour source currently experienced by the fly produces an overall driving force, v, which determines the fly’s behaviour.

However, classical conditioning does not allow us to explore the full dynamics of the circuit, as animals simultaneously explore, learn, express learning, and forget while moving through a world with odours. Therefore, we further simulate the behaviour produced by the IC with simulated flies placed in a virtual arena, where they are exposed to two odour gradients, of different strengths, variously paired with reinforcements. As we have full access to the neural responses, the synaptic weights, and the position of the simulated flies at every time-step, we can identify different aspects of the produced behaviour and motivation, including the effect of the LTM on the behaviour and whether ‘choice’ occurs because the animal is attracted by one odour or repulsed by the other. We can then derive a behavioural preference index (PI) based on the time the simulated flies spend exposed to each odour during relevant time periods. Figure 11 summarises our experimental set-up and results, while details of how we generate the presented behaviours are given in the methods section ‘Modelling the behaviour’.
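For concreteness, such a preference index can be sketched as a normalised difference of exposure times; we assume the common two-choice definition here, and the exact normalisation used in the methods may differ.

```python
# A hypothetical PI readout: normalised difference of odour exposure times.

def preference_index(time_in_a, time_in_b):
    """PI in [-1, 1]; positive means the fly prefers odour A."""
    total = time_in_a + time_in_b
    if total == 0:
        return 0.0            # never visited either odour: neutral
    return (time_in_a - time_in_b) / total

assert preference_index(30.0, 10.0) == 0.5     # mostly in odour A
assert preference_index(10.0, 30.0) == -0.5    # mostly in odour B
assert preference_index(0.0, 0.0) == 0.0       # no visits
```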

Figure 11 with 7 supplements see all
The behaviour of the animal controlled by its neurons during the freely moving flies simulation.

The n=100 simulated flies are exposed to a mixture of two odours, whose relative intensity depends on the position of the simulated flies in space. (A) Each experiment lasts 100 s: the flies are placed at the centre of the arena at time-step t = −20 s. During the first 20 s (pre-training phase, t ∈ [−20, 0]), the flies explore the arena without any reinforcement (blue tracks). In the next 30 s (training phase, t ∈ [0, 30]), they conditionally receive reinforcement under one of the six training cases shown on the right: using sugar (green) or shock (red); and reinforcing around odour A (shock + odour A), odour B (shock + odour B), or both odours (shock + odour A/B). During the last 50 s (post-training phase, t ∈ [30, 80]), they continue being exposed to the odours without receiving a reinforcement (black tracks). We repeat this experiment (including all its phases) 10 times in order to show the effects of the long-term memory on the behaviour. (B) Behavioural summary of a subset of simulated flies that visited both odours at any time during the 10 repeats. Columns show the different conditions and the population that was recorded visiting both odours. Top row: the normalised cumulative time spent exposed to odour A (pink lines) or odour B (yellow lines; note that this line is reversed). For each repeat, we present three values (averaged over all the pre-training, training, and post-training time-steps, respectively), where the values associated with the training phase are marked with red or green dots when punishment or reward has been delivered to that odour, respectively. Thin lines show three representative samples of individual flies. Thick lines show the median over the simulated flies that visited both odours. Bottom row: the preference index (PI) for each odour extracted from the above cumulative times.

In Figure 11—figure supplement 4, we can see that during the first repeats most simulated flies do not visit any of the regions where an odour can be detected; therefore, in Figure 11B, we start seeing an effect in the averaged statistics only after the second repeat of the experiment. However, even in the first couple of repeats, the individual paths already show a small tendency towards the expected behaviour: avoiding the punished region and approaching the rewarded one. Because of the unpredictable behaviour of individual flies, in Figure 11B we summarise only times from simulated flies that visited both odours for at least 1 s. In later repeats of the experiment, the PI shows that (on average) flies prefer the non-punished and rewarded odours. When both odours are punished or rewarded, flies equally prefer neither or both, respectively. Note that this does not mean that each fly spends equal time in both odours; more probably, some flies choose to spend more time with one odour and some with the other (as shown by the individual cumulative durations in Figure 11B), in equal proportions of the population. Interestingly, in almost every repeat the flies are neutral about the odours during pre-training (the time-step before the reinforced one, marked with red or green), showing a relatively small effect during training and a bigger effect during post-training. This might be because in every repeat of the experiment the flies are initialised at the centre, so they spend some time randomly exploring before they detect an odour.

Looking at the PIs of Figure 11B, we see a strong effect when electric shock is paired with odour A or B, but a weaker one otherwise. We also see a smaller PI for flies experiencing sugar than for those experiencing electric shock, which is in line with experimental data (Krashes and Waddell, 2011). When shock is paired with both odours, we expect the simulated flies to minimise the time spent exposed to either of them, which is precisely what we see in the coloured lines. In contrast, simulated flies seem to increase the time spent in both odours when sugar is delivered, with a slight preference towards the reinforced odour. In general, our results show that (in time) the simulated flies seem to develop some prior knowledge about both odours when experiencing at least one of them with reinforcement (see Figure 11B and Figure 11—figure supplement 2A), which we suggest is because of the overlapping KCs associated with both odours. We believe that this leads to self-reinforcement: when the animal experiences the non-reinforced odour, it automatically extends the reinforcement associated with the overlapping KCs to all the KCs associated with this odour, which is effectively a form of second-order conditioning.
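The overlap-driven generalisation can be illustrated with a minimal sketch: odours as sparse binary KC patterns that share one active cell, and a single depression step applied to the trained odour's KCs. The patterns and the depression magnitude are purely illustrative.

```python
# A minimal sketch of generalisation through overlapping KC codes.
# Training odour A modifies the weights of A's active KCs, and the shared
# KC carries part of the association over to odour B. Values are illustrative.

odour_a = [1, 1, 1, 0, 0, 0]       # KCs active for odour A
odour_b = [0, 0, 1, 1, 1, 0]       # odour B shares the third KC with odour A

w = [1.0] * 6                      # KC->MBON weights, all at rest

for i, k in enumerate(odour_a):    # pair odour A with reinforcement:
    w[i] -= 0.5 * k                # depress the weights of A's active KCs

def response(odour):
    """MBON response to an odour: weighted sum over its active KCs."""
    return sum(wi * ki for wi, ki in zip(w, odour))

assert response(odour_a) == 1.5    # strong change for the trained odour
assert response(odour_b) == 2.5    # partial carry-over (untrained baseline: 3.0)
```

The untrained response to either odour would be 3.0; after training on odour A alone, odour B's response has already shifted through the shared cell, which is the seed of the self-reinforcement described above.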

From the summarised synaptic weights shown in Figure 11—figure supplement 1, we can see that the susceptible MBONs immediately block the simulated flies from approaching the punished odours, while allowing them to approach the rewarded ones; this explains the smaller PI in sugar-related experiments compared to the shock-related ones, as discussed before. This is partially because of the lack of reciprocal connections between the opposing susceptible MBONs, and it can be verified through appetitive conditioning, where the synaptic weights change as the simulated flies now prefer the reinforced odour site. Susceptible MBONs abruptly break the balance between attraction and avoidance created by the restrained and LTM MBONs, also affecting their responses and allowing STM, and as a result LTM, formation even without the presence of reinforcement. Figure 11—figure supplement 1 also shows that the restrained MBONs seem to play an important role during the first repeats (up to five), but then reduce their influence, ceding control to the LTM MBONs, whose influence increases with time. This is partially an effect of the MAM microcircuit, and it verifies its function and the role of the restrained MBONs as storing STMs. Figure 11—figure supplement 3 shows that each type of MBON alone is also capable of controlling the behaviour. However, they seem to work better when combined, as they complement one another at different stages, for example, during early or late repeats and at crucial times.

Dopaminergic plasticity rule vs. reward prediction error

We have already shown that our novel dopaminergic plasticity rule and the connectome of the incentive circuit build a powerful model for memory dynamics and behavioural control. In order to verify the importance of our DPR in the model, we run the same experiments, replacing it with the reward prediction error (RPE) plasticity rule (Rescorla and Wagner, 1972).

The idea behind RPE is that the fly learns to predict how rewarding or punishing a stimulus is, altering its prediction when it does not match the actual reward or punishment experienced (Zhao et al., 2021). This can be adapted to the mushroom body circuit by assuming that, for a given stimulus represented by KC activation, the MBON output is the prediction, and the KC→MBON synaptic weights should be altered (for the active KCs) proportionally to the difference between the MBON output and the actual reinforcement signalled by the DAN. In Equation 30, we show how our DPR can be replaced with the RPE (as described above) in our model. Note that this rule allows updates to happen only when the involved KC is active, implying synaptic plasticity even without DAN activation but not without KC activation, which is in contrast with our DPR and recent findings (Berry et al., 2018; Hige et al., 2015) (also in larva; Schleyer et al., 2018; Schleyer et al., 2020).
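The contrast drawn here can be sketched with a minimal RPE-style update gated by KC activity; the helper name, the form of the prediction, and the constants are illustrative assumptions, not Equation 30 itself.

```python
# A hypothetical RPE-style update for the MB, gated by KC activity:
#   w <- w + eta * k * (r - m),  with prediction m = k * w,
# where r is the reinforcement signalled by the DAN. Values are illustrative.

def rpe_update(w, k, r, eta=0.5):
    m = k * w                        # MBON output as the prediction
    return w + eta * k * (r - m)     # gated by the KC, not by dopamine

w = 0.0
for _ in range(10):                  # odour paired with reinforcement
    w = rpe_update(w, k=1.0, r=1.0)
assert w > 0.9                       # the prediction approaches the reinforcement

# Odour alone (r = 0): the weight decays, i.e. plasticity happens without
# any DAN activation, as long as the KC is active.
assert rpe_update(w, k=1.0, r=0.0) < w

# With the KC silent, nothing changes, regardless of reinforcement.
assert rpe_update(0.7, k=0.0, r=1.0) == 0.7
```

The last two assertions capture exactly the property criticised in the text: the rule updates whenever the KC is active, even with no dopamine, and never updates when the KC is silent.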

This effect, that is, learning when the KC is active even without DAN activation, is visible in Figure 9—figure supplement 2 and Figure 9—figure supplement 3, where we can see that, for the susceptible MBONs, the synaptic weights recover every time the odour is presented alone before the shock delivery, resulting in no meaningful learning and cancelling their susceptible property. Restrained MBONs look less affected (at least in this experimental set-up), while the LTM MBONs lose the charging momentum provided by the saturation effect, resulting in more fragile memories. Furthermore, due to the KC (instead of dopamine) gating of this plasticity rule, the responses during the unpaired and extinction conditions look identical in all neurons, while the reversal makes a difference only in the responses to odour A. In general, the responses reproduced using the RPE plasticity rule have none of the properties of our model shown earlier, and they cannot explain the dynamics of the responses recorded from animals.

In contrast to the responses, the behaviour of the simulated flies (as shown in Figure 11—figure supplement 5 and Figure 11—figure supplement 6) is less affected by the plasticity rule: we still see a preference for the non-punished or rewarded odours. However, some details of the behaviour differ, and some properties of the model need to be mentioned. First, we see that the simulated flies now spend more time in the punished odours (compared to the non-punished ones), which might look like adaptation (at the PI level) but is actually forgetting about the odour. Figure 11—figure supplement 7 shows that the synaptic weights targeting the restrained and LTM MBONs are dramatically depressed during the first three repeats and are unable to recover, which means that this part of the circuit is knocked out by then. Hence, the behaviour is controlled solely by the susceptible MBONs, which now look more like LTM MBONs that are not reciprocally connected. Furthermore, the synaptic weights associating the odours with both motivations seem to decrease constantly, which leads us to believe that both susceptible MBONs will share the fate of the restrained and LTM ones, only over a longer time. Therefore, although the RPE predicts reasonable behaviour for inexperienced (or minimally experienced) simulated flies, it could gradually result in meaningless behaviour for experienced flies.

Discussion

We have shown that the combination of our novel dopaminergic plasticity rule (DPR) with the incentive circuit (IC) of the mushroom body is able to generate neural responses and behaviours similar to those of flies in associative learning and forgetting paradigms. Regarding our model, we provide evidence for the existence of all hypothesised connections and suggest that at least three types of MB output (susceptible, restrained, and LTM) and three types of DA neurons (discharging, charging, and forgetting) exist in the fruit fly brain, discriminated by their functionality. As we show, this forms a unified system for rapid memory acquisition and transfer from STM to LTM, which could underlie the ability to make exploration/exploitation trade-offs. Box 1 summarises a number of predictions yielded by this computational modelling study.

Box 1

Summary of predictions.

The model yields predictions that can be tested using established experimental protocols:

  1. MBON-γ2α’1 and MBON-γ5β′2a should exhibit short-term memories (STMs), while MBON-α′1 and MBON-β2β′2a long-term memories (LTMs). MBON-γ1pedc>α/β and MBON-γ4>γ1γ2 should exhibit susceptible memories. Restrained and susceptible mushroom body output neurons (MBONs) should show more consistent responses across flies. LTM MBONs should have more variable responses because they encode all previous experiences of the animal.

  2. Activating MBON-γ2α′1 or MBON-β2β′2a should increase the response rate of PAM-β′2a, and similarly, activating MBON-γ5β′2a or MBON-α′1 should excite PPL1-γ2α′1. This would verify the excitatory STM reciprocal and LTM feedback connections of the circuit. Activating the LTM MBONs (e.g., MBON-α′1 and MBON-β2β′2a) should also excite the forgetting dopaminergic neurons (DANs) (e.g., PAM-β2β′2a and PPL1-γ2α′1, respectively). This would verify the excitatory LTM reciprocal connections of the circuit.

  3. Consistently activating one of the LTM MBONs while delivering a specific odour should increase that MBON’s response to the odour, even without the use of a reinforcement. This would verify the saturation effect of the DPR and the charging momentum hypothesis. On the other hand, if we observe a reduced response rate, this would show that the MBON-DAN feedback connection is inhibitory and that RPE is implemented by the circuit.

  4. Blocking the output of charging DANs (i.e., PPL1-γ2α′1 and PAM-β′2a) could reduce the acquisition rate of LTM MBONs, while blocking the output of LTM MBONs would prevent memory consolidation. Blocking the reciprocal connections of the circuit should prevent generalising amongst opposing motivations (unable to make short- or long-term alteration of responses to odours once memories have formed). Blocking the output of forgetting DANs would additionally lead to hypersaturation of LTMs, which could cause inflexible behaviour.

  5. Activation of the forgetting DANs should depress the Kenyon cell (KC)-MBON synaptic weights of the restrained and LTM MBONs of the same and opposite valence, respectively, and as a result suppress their response to KC activation. Activation of the same DANs should cause increased activity of these MBONs for KCs that are silenced at the time.

  6. Unpaired conditioning should involve the LTM circuit (or at least some microcircuit within the MB where the MBON excites a DAN). Second-order conditioning should involve the LTM circuit and might not require the susceptible and restrained memory circuits. Backward conditioning might not occur in all compartments as in our model it is required that the target MBON inhibits its respective DAN (susceptible memory microcircuit) and to date has only been demonstrated for microcircuits with this property.

  7. DANs that innervate more than one compartment may have different functional roles in each compartment.

Advantages of the dopaminergic plasticity rule

The proposed DPR, while remaining very simple, allows the animal to express a variety of behaviours depending on its experience. The rule remains local to the synapse, that is, it depends only on information that is plausibly available in the presynaptic area of the KC axon (Equation 1): the activity of the KC, the level of DA, and the deviation of the current ‘weight’ from a set-point ‘resting weight’. We note that it was not possible to obtain good results without this third component of the rule, although the underlying biophysical mechanism is unknown; we speculate that it could involve synapsin, as it has a direct role in regulating the balance of reserve and release vesicle pools and is required in the MB for associative learning (Michels et al., 2011). The rule also introduces a bidirectional ‘dopaminergic factor’ based on the results of Handler et al., 2019, who showed that the combination of DopR1 and DopR2 receptor activity can result in depression or potentiation of the synapse. In our plasticity rule, a positive or negative dopaminergic factor combined with active or inactive KCs leads to four possible effects on the synapse: depression, potentiation, recovery, and saturation. This allows substantial flexibility in the dynamics of learning in different MB compartments.
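The four effects can be sketched with a minimal discrete-time form of the rule, built only from the three components named above (KC activity, dopaminergic factor, and deviation from the resting weight); the exact form and constants of Equation 1 may differ, so treat this as a hedged illustration.

```python
# A minimal discrete-time sketch of the DPR, hypothetical form:
#   w <- w + eta * delta * (k + w - w_rest)
# delta: bidirectional dopaminergic factor; k: KC activity;
# w_rest: set-point resting weight. Constants are illustrative.

def dpr_update(w, k, delta, w_rest=1.0, eta=0.1):
    """One plasticity step; returns the updated KC->MBON synaptic weight."""
    return w + eta * delta * (k + w - w_rest)

# Four qualitative regimes from the sign of delta and the KC activity:
w0 = 1.0                                               # start at the resting weight
assert dpr_update(w0, k=1.0, delta=-1.0) < w0          # depression: DA-, KC active
assert dpr_update(w0, k=1.0, delta=+1.0) > w0          # potentiation: DA+, KC active

w_low = 0.5                                            # a previously depressed weight
assert dpr_update(w_low, k=0.0, delta=-1.0) > w_low    # recovery towards w_rest
w_high = 1.5                                           # a previously potentiated weight
assert dpr_update(w_high, k=0.0, delta=+1.0) > w_high  # saturation: deviation grows
```

Note how, with the KC inactive, a negative dopaminergic factor pulls the weight back towards its resting value, while a positive one amplifies whatever deviation already exists, which is the saturation effect exploited by the LTM microcircuit.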

In particular, the saturation allows LTM MBONs to consolidate their memories and makes them very hard to forget. This only occurs for consistently experienced associations, which then become strongly embedded. Only a persistent change in the valence of reinforcement experienced with a given stimulus can reset the activity of LTM MBONs, through the reciprocal LTMs microcircuit, which equips the circuit with flexibility even in the LTMs. Further, the fact that the DPR allows STMs (restrained) and LTMs to interact through the memory assimilation mechanism (MAM) increases the capacity of the circuit. Whatever the restrained MBONs learn is eventually assimilated by the LTM MBONs, opening up space for the formation of new memories in the restrained MBONs. When combined with sparse coding of odours in a large number of KCs, the LTM MBONs can store multiple memories for different odours. Short-term experience might occasionally affect the behaviour, when the susceptible and restrained MBONs learn something new and hence mask the LTM output, but eventually this will be smoothly integrated with the previous experience in the LTM MBONs. The DPR plays an important role in this mechanism, as we saw earlier; the connectivity alone is not enough for it to work properly.

By contrast, the RPE plasticity rule lacks this flexibility and fails to maintain useful LTMs when applied to the same circuit architecture. A literal interpretation of RPE for the MB would require that the difference (error) between the postsynaptic MBON activity and the DA level is somehow calculated in the presynaptic KC axon. This seems inconsistent with the observation that learning is generally unaffected by silencing the MBONs during acquisition (Hige et al., 2015; Krashes et al., 2007; Dubnau et al., 2001; McGuire et al., 2001). Alternatively (and not directly requiring MBON activity in the KC plasticity rule), the RPE could be implemented by circuits (Bennett et al., 2021; Springer and Nawrot, 2021; Eschbach et al., 2020) in which DANs transmit an error signal computed from their input reinforcement plus the opposing feedback from MBONs (i.e., MBONs inhibit DANs that increase the KC→MBON synaptic weights, or they excite those that suppress the synaptic weights). However, although the evidence for MBON-DAN feedback connections is well grounded, it is less clear that they are consistently opposing. For example, in the microcircuits we have described, based on neurophysiological evidence, some DANs that depress synaptic weights receive inhibitory feedback from MBONs (Pavlowsky et al., 2018) and some DANs that potentiate synaptic weights receive excitatory feedback from MBONs (Ichinose et al., 2015). As we have shown, the DPR is able to operate with this variety of MBON-DAN connections. Note that, with the appropriate circuit, that is, positive MBON-DAN feedback to depressing DANs, our DPR could also produce an RPE effect. Although the proposed IC does not include such connections, it is still possible that they exist.

The conditioning effects of the model

During the past decades, a variety of learning effects have been investigated in flies, including forward and backward (relief) conditioning, first- and second-order conditioning, and blocking, which we could potentially use to challenge our model. In the methods section ‘Derivation of the dopaminergic plasticity rule’, we demonstrate that our model supports the backward (or relief) conditioning results presented in Handler et al., 2019. In backward conditioning, the reinforcement is delivered just before the odour presentation, and the outcome depends on the relative timing of the two stimuli. Handler et al., 2019 suggest that backward conditioning is driven by ER-Ca2+ and cAMP in a KC→MBON synapse, when a single DAN releases DA on it. In our model, we assume that different time courses in the response of DopR1 and DopR2 receptors cause the different patterns of ER-Ca2+ and cAMP, resulting in the formation of opposite associations for forward and backward conditioning. We note however that in our model the effect also requires that the target MBON inhibits the respective DAN (as in our susceptible memory microcircuits), altering the time course of neurotransmitter release. This may suggest that backward conditioning does not occur in all MB compartments. We believe that this mechanism for backward conditioning is better supported than the hypothesised mechanism of post-inhibitory rebound in opposing valence DANs presented in Adel and Griffith, 2021, although some role for both mechanisms is possible.

Backward conditioning can be distinguished from the unpaired conditioning effect; the latter involves the presentation of reinforcement and a specific odour in alternation with less temporal proximity. It has been observed (Jacob and Waddell, 2020; Schleyer et al., 2018) that this procedure will produce a change in response to the odour that is opposite in valence to the reinforcement, for example, approach to an odour that is ‘unpaired’ with shock. Note that this effect can be observed both in standard two-odour CS+/CS- training paradigms (where an altered response to CS-, in the opposite direction to CS+, is often observed) but also in single-odour unpaired paradigms. Surprisingly, our model also produces unpaired conditioning, notably through a different mechanism than backward conditioning. When DANs are activated by a reinforcement without KC activation, the weights of all KCs are potentially altered, for example, restored towards their resting weight or slightly potentiated. This alteration means that subsequent presentation of odour alone can be accompanied by MBON-driven activation of DANs, resulting in specific alteration of the weights for the presented odour. In the example of Figure 12, odour A starts to self-reinforce its attractive LTM when presented in alternation with shock and will be preferred to an alternative odour B in subsequent testing. However, repeated presentation of other odours during testing, without further shock, might lead to generalisation (equal preference to all experienced odours).

The preference index (PI) of the agent during the classic unpaired conditioning paradigm.

During the training phase, we deliver electric shock or odour A alternately. During the test phase, we deliver odours A and B alternately. The PI is calculated by using the mushroom body output neuron (MBON) responses for each odour.

The self-reinforcing property of the positive feedback in the LTM microcircuit can also account for second-order conditioning. If a motivation has been associated with an odour, MBONs related to that motivation will have increased activity when the odour is delivered, even in the absence of reinforcement. In the LTM microcircuit, the positive MBON-DAN connection will consequently activate the charging DAN, so any additional cue (or KC activity) presented alongside the learned odour will also experience an increase in the respective KC→MBON weights, creating a similar charging momentum and resulting in a second-order association. Perhaps surprisingly, this predicts that second-order conditioning might happen directly in the LTM microcircuit without being filtered by the susceptible and restrained memories first. This would be consistent with the observation that second-order conditioning in flies requires strong induction of the first-order memory and that first-order memory does not appear to be extinguished by the absence of reinforcement during second-order training (Tabone and de Belle, 2011).

Finally, although we have not tested it explicitly here, it is clear that our plasticity rule (unlike RPE) would not produce blocking. The blocking effect, as described by Kamin, 1967, is when prior conditioning to one stimulus blocks any subsequent conditioning to other elements of a mixture that includes that stimulus. Under RPE learning, this is explained by the first stimulus already correctly predicting the reinforcer, so there is no error to drive a change in the weights. Using the DPR, the updates are local to the synapse and do not depend on a calculation of errors summarised across different odour identities, so blocking does not happen, which is consistent with the observed behaviour of fruit flies (Young et al., 2011; Brembs and Heisenberg, 2001). Although the presentation of a learned odour along with a novel odour might, through feedback from the MBONs, alter the DAN responses to the reinforcement, in our circuit this is not generally an opponent feedback, so it will not cancel the reinforcing effects for the novel odour. This also highlights the difference between our susceptible, restrained, and long-term memory microcircuits and the RPE circuits described in Bennett et al., 2021, Springer and Nawrot, 2021, Eschbach et al., 2020, and Zhao et al., 2021. Nevertheless, as Wessnitzer et al., 2012 and later Bennett et al., 2021 suggest, the fact that blocking has not been observed in fruit flies could also be explained by the way that the mixture of odours is represented by the KCs, that is, that it might not be simply the superposition of the activity patterns of the individual odours.

Additional mushroom body connections

Our model suggests that only KC→MBON, MBON⊣DAN, MBON→DAN, and DAN⊸MBON connections are essential for successful learning in the MBs. However, there are a number of additional known connections in the MBs, such as KC→APL, APL⊣KC, DAN→MBON, axoaxonic KC→KC and KC→DAN connections that have been neglected in this model, and need further consideration.

In the larval brain, there are two anterior paired lateral (APL) neurons, one for each MB. They extend their dendrites to the lobes of the MBs and terminate their axons in the calyxes, releasing the inhibitory GABA neurotransmitter (Tanaka et al., 2008). In the adult brain there are still two of them, but both their dendrites and axons innervate the calyx and the lobes (Wu et al., 2013), suggesting that they function as both global and local inhibitory circuits. Moreover, DAN⊣APL (Liu and Davis, 2009) and APL⊣DAN (Wu et al., 2012) connections have been proposed, but there is no clear description of their function. Several previous models (Peng and Chittka, 2017; Delahunt et al., 2018) have demonstrated that a potential function of this global/local inhibition network is gain control, such that the total number of KCs firing to different stimuli remains similar; the same effect can also be implemented using a flexible threshold for KC firing (Saumweber et al., 2018; Zhu et al., 2020; Zhao et al., 2020). In our model, we have simplified the KC input, representing just two odours as different patterns across a small number of KCs with a fixed number of them being active at all times, so the hypothesised gain control function of the APL is not useful here. However, it remains an interesting question whether there is learning between the KCs and the APL in the lobes (Zhou et al., 2019), or between the APL and the KCs in the calyx, and what role this might play in the overall dynamics of memory acquisition.

In addition, Eichler et al., 2017 suggest that most of the KC input, that is, 60%, comes from other KCs. We suggest that these connections (together with the ones from the APL) might create local winner-takes-all (WTA) networks that force a limited number of KCs per compartment to be active at any one time. This predicts that it is possible for the same KC axon to be active in one compartment but inactive in another (consistent with recent data from Bilz et al., 2020), and that an almost fixed number of KCs might be active at all times, even when no odour is delivered (e.g., fresh air only), enabling acquisition and forgetting at all times. Ito et al., 2008 show that KCs can be active even in the absence of odours but with no consistent spiking, which is a characteristic of WTA networks when the underlying distribution of spikes across the neurons is almost uniform.

Eichler et al., 2017 also observed (from electron microscopy reconstruction in larva) that within a compartment, in a ‘canonical microcircuit’, KCs make direct synapses onto the axons of DANs, and that DAN pre-synapses often simultaneously contact KCs and MBONs. The same connections have been observed in adult Drosophila by Takemura et al., 2017. The extent to which KCs (and thus odour inputs) might be directly exciting DANs remains unclear. Cervantes-Sandoval et al., 2017 show that stimulating KCs results in increased DAN responses and that DANs are activated through the ACh neurotransmitter. However, we note that in our model such an effect could be explained without assuming a direct connection. For example, in the LTM microcircuit, activating the KCs results in increased activity of the LTM MBON, which excites the respective charging DAN. The DAN for which Cervantes-Sandoval et al., 2017 provide evidence is PPL1-α2α′2, which is excited by the MBON-α2α′2 neurons, as these are characterised by the ACh neurotransmitter (Aso et al., 2014a). In our terms, this could be an LTM MBON that excites its respective charging DAN, PPL1-α2α′2 (Li et al., 2020), and provides the source of ACh detected on it. More generally, the altered activity of DANs in response to odours that has been observed during learning can also be observed in our model, without requiring direct KC→DAN connections or their modification. Nevertheless, such connections may possibly play a role in enhancing the specificity of dopamine-induced changes in KC→MBON connectivity. Interestingly, the depression of KC→DAN synapses, in parallel with KC→MBON synapses, could provide an alternative mechanism for implementing RPE learning (Takemura et al., 2017).

Takemura et al., 2017 demonstrate that the direct synapses observed from DANs to MBONs are functional in altering the MBON postsynaptic current to DAN activation, independently of KCs. This could be a mechanism by which learnt responses to reinforcements are coordinated with the current presence or absence of the reinforcement (Schleyer et al., 2020; Schleyer et al., 2011; Gerber and Hendel, 2006). Another possibility is that postsynaptic as well as presynaptic changes might be involved in learning at the KC→MBON synapse (Pribbenow et al., 2021).

Beyond attraction and aversion

The IC consists of six MBONs and six DANs that link a pair of antagonistic motivations, attraction, and avoidance. However, there are ∼34 MBONs and ∼130 DANs in the MB of the adult fruit fly brain, within which the IC is an identifiable motif. We suggest the possibility that this motif could be repeated, representing additional opposing motivations, with some neurons having multiple roles depending on the motivational context as proposed by Cohn et al., 2015, working either as restrained MBONs and discharging DANs, or as LTM MBONs and forgetting DANs depending on the reinforcer identity. We have illustrated this concept of a unified system of motivations as the ‘incentive wheel’ (see Appendix 1—figure 1). This could explain how PAM-β2β′2a (i.e., MB301B; May et al., 2020) is a sugar-encoding discharging DAN in the appetitive olfactory conditioning context, but it also is an avoidance-driving forgetting DAN in a different context (e.g., aversive olfactory conditioning). In addition, two MBONs of the IC do not interact with the α′/β′ KCs of the MB. MBON-γ4>γ1γ2 and MBON-γ1pedc>α/β are part of two autonomous microcircuits, that is, the SMs, and are working under the context provided by the ∼675 γ-KCs relative to the task. This makes it possible that the KCs from the γ lobe connect to all the SMs of the flies for the approximately eight available motivations illustrated in Appendix 1—figure 1.

From a functional point of view, the MBs seem to be involved in the motivation and behaviour of the animal, especially when it comes to behaviours essential for survival. In the mammalian brain, this function is subserved by the limbic system, which is composed of a set of complicated structures, such as the thalamus, hypothalamus, hippocampus, and amygdala (Dalgleish, 2004; Roxo et al., 2011). According to Papez, 1937, sensory (and mostly olfactory) input enters the limbic system through the thalamus, which connects to both the cingulate cortex (through the sensory cortex) and the hypothalamus (Roxo et al., 2011; Dalgleish, 2004). Responses in the cingulate cortex guide the emotions, while those in the hypothalamus guide the behaviour (bodily responses). Finally, the hypothalamus connects with the cingulate cortex through the anterior thalamus (forward) and the hippocampus (backward stream). Maclean, 1949 augmented this model by adding the amygdala and PFC structures that encode primitive emotions (e.g., anger and fear) and connect to the hypothalamus (Roxo et al., 2011; Dalgleish, 2004). We suggest that some of the functions we have identified in the MB IC could be mapped to limbic system structures (see Figure 13).

The mammalian limbic system as described by Papez, 1937 and Maclean, 1949 and the suggested parallels in the proposed incentive circuit.

On the left, we show the mushroom body microcircuits that correspond to the different structures in the mammalian limbic system. In the centre, we have the connections among the different structures of the limbic system. On the right, we show the groups of mushroom body neurons we suggest that have a similar function to the ones in the limbic system.

More specifically, the α′/β′-KCs could have a similar role to the neurons in the thalamus, the α/β-KCs represent a higher abstraction of the input stimuli and have a similar role to the ones in the sensory cortex, while the γ-KCs represent relatively unprocessed stimuli. This would make the susceptible MBONs parallel to neurons in the amygdala, creating responses related to primitive motivations and connecting to (inhibiting) the restrained MBONs, which we would compare to the hypothalamus as providing the main control of behaviour. As we suggest that the same MBONs could fulfil a role as LTM or restrained in different circuits (see Appendix 1—figure 1), the LTM would also correspond to the hypothalamus, with input from the α′/β′-KCs, and thus the RSM, RLM, LTM, and MAM microcircuits are assumed to correspond to functions of the hypothalamus. Following this analogy, we predict that the function of the cingulate cortex is then represented by the α/β MBONs, encoding the ‘emotions’ of the animal towards reinforced stimuli, potentially controlling more sophisticated decision-making. This mapping would suggest that the connections amongst the restrained/LTM (α′/β′) MBONs and the ‘emotional’ (α/β) MBONs are similar to the hippocampus and anterior thalamus pathways.

While it might seem startling to suggest that a compact circuit of single identified neurons in the insect MB mimics in miniature these far larger and more complex structures in the mammalian brain, the justification comes from the similarity in the behavioural demands common to all animals: surviving and adapting in a changing world.

Materials and methods

Implementation of the incentive circuit

We represent the connections between neurons by using synaptic weight matrices and non-linearly transform the information passing from one neuron to another by using an activation function. Next, we define these parameters and some properties of our computational model, which are not a result of unconstrained optimisation and are consistent throughout all our experiments.

Parameters of the model


We assume that the odour identity passes through the projection neurons (PNs) into the mushroom body and its Kenyon cells (KCs). It is not in the scope of this work to create a realistic encoding of the odour in the PNs, so we assume that the odour signal is represented by np=2 PNs, one for each odour, and that these project to form distinct activations in a set of nk=10 KCs in the MB, that is, a subset of KCs that respond to the specific odours used in the experiments. Therefore, the vector pA=[1,0] represents the activity of the PNs when odour A is detected, pB=[0,1] when odour B, pAB=[1,1] when both odours, and p=[0,0] when none of them is detected. The responses of the KCs are calculated by

(2) k(t) = WTA0.5[pᵀ(t) Wp2k + η], where η ∼ N(0, 0.001)

η is some Gaussian noise, Wp2k ∈ ℝ+^(2×10) is the weight matrix that transforms the two-dimensional odour signal into the ten-dimensional KC responses, and t is the current time-step. WTA0.5[x] is an activation function that keeps the top 50% of KCs active, based on the strength of their activity. Note that the exact numbers of PNs and KCs are not very important, and we could use any combination of PN and KC populations. However, the bigger the KC population, the smaller the percentage of them that should be active. The PN→KC synaptic weights used are shown as

(3) Wp2k = [ 0.8  0.8  0.8  0.8  0.8  0.8  0.8  0    0    0
             0    0    0    0    0.8  0.8  0.8  0.8  0.8  0.8 ].

The odours are represented by different firing patterns across 10 KCs: 4 fire only for A, and 3 fire only for B, while the remaining 3 fire to either odour. This is to show the effects of the DPR when we have overlap in the KCs that respond to the two odours used in the conditioning paradigm. This assumption also created the best fit with the data, suggesting that there might be overlapping KCs encoding the real odours tested in the fly experiments.
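The KC encoding of Equations 2 and 3 can be sketched in a few lines of numpy. The function names and the WTA implementation are ours; we keep the strongest 50% of the KC responses at their analogue values rather than binarising them, which is one possible reading of the text:

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# PN->KC weights (Equation 3): KCs 1-4 respond only to odour A,
# KCs 5-7 to both odours, and KCs 8-10 only to odour B.
W_p2k = np.array([[0.8] * 7 + [0.0] * 3,
                  [0.0] * 4 + [0.8] * 6])

def wta(x, proportion=0.5):
    """Keep the strongest `proportion` of the KCs active; silence the rest."""
    n_active = int(round(len(x) * proportion))
    out = np.zeros_like(x)
    strongest = np.argsort(x)[-n_active:]   # indices of the top responses
    out[strongest] = x[strongest]
    return out

def kc_response(p):
    """Equation 2: KC activity for the PN (odour identity) pattern p."""
    eta = rng.normal(0.0, 0.001, size=W_p2k.shape[1])
    return wta(p @ W_p2k + eta)

k_A = kc_response(np.array([1.0, 0.0]))   # odour A drives 7 KCs; WTA keeps 5
```

With nk = 10 and a 50% threshold, exactly five KCs remain active for either odour, so the two active sets can overlap in at most the three shared KCs.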

We transform the delivery of the reinforcement (US), u(t) ∈ {0, 1}², into an input for the DANs by using the weight matrix Wu2d ∈ ℝ+^(2×nd). We represent the activity of the DANs, d(t) ∈ ℝ⁶, with a six-dimensional vector, where each dimension represents a different neuron in our model. Specifically,

(4) d(t) = [d_at(t), d_av(t), c_at(t), c_av(t), f_at(t), f_av(t)].

The US is represented by a two-dimensional vector where the first dimension denotes rewarding signal and the second dimension denotes punishment: usugar=[1,0] and ushock=[0,1]; and the contribution of this vector to the responses of the DANs is given by

(5) Wu2d = [ 2  0  2  0  0  0
             0  2  0  2  0  0 ].

In line with the DANs vector representation, we have a similar vector for the MBONs, m(t) ∈ ℝ⁶, where each dimension represents the response of a specific neuron at time t, as shown in the following equation:

(6) m(t) = [s_at(t), s_av(t), r_at(t), r_av(t), m_at(t), m_av(t)].

The weight matrix that encodes the contribution of KCs to the MBON responses, Wk2m(t) ∈ ℝ^(10×6), is initialised as

(7) Wk2m(t=0)=1(nk,nm)=1(10,6),

which effectively is a 10×6 matrix of ones. In other words, all KCs connect to all MBONs, and their initial weight is positive and the same for all connections. As these are plastic weights, their value depends on the time-step, and therefore we provide time, t, as a parameter. Note also that wrest = 1, which initially results in the absence of memory, Wk2m_ij(t=0) − wrest = 0. Thus, any deviation of the synaptic weights from their resting value represents a stored memory with strength vmem_ij(t) = ||Wk2m_ij(t) − wrest||.

There are also MBON→DAN, Wm2d ∈ ℝ^(6×6), and MBON→MBON connections, Wm2m ∈ ℝ^(6×6), which are given by

(8) Wm2d = [  0    −0.3   0     0     0     0
             −0.3   0     0     0     0     0
              0     0     0.5   0     0     0
              0     0     0     0.5   0     0
              0     0     0.3   0     0.5   0
              0     0     0     0.3   0     0.5 ]

and

(9) Wm2m = [ 0  0  0 −1  0  0
             0  0 −1  0  0  0
             0  0  0  0  0  0
             0  0  0  0  0  0
             0  0  0  0  0  0
             0  0  0  0  0  0 ].

The above matrices summarise the excitatory (positive) and inhibitory (negative) connections between MBONs and DANs or other MBONs as defined in the IC (Figure 3, see also Figure 14); in both matrices, rows correspond to the presynaptic MBONs in the order of Equation 6, and columns to the postsynaptic DANs (Equation 4) or MBONs (Equation 6). The sign of the weights was fixed, but the magnitude of the weights was hand-tuned in order to get the desired result, given the constraint that equivalent types of connections should have the same weight (e.g., in the reciprocal microcircuits). The magnitude of the synaptic weights specifies the effective strength of each of the described microcircuits in the overall circuit. We also add some bias to the responses of DANs, bd, and MBONs, bm, which is fixed as

(10) bd = [−0.5, −0.5, −0.15, −0.15, −0.15, −0.15]
(11) bm = [−2, −2, −0.5, −0.5, −0.5, −0.5].
Figure 14 with 3 supplements.
The synaptic weights and connections among the neurons of the incentive circuit (IC).

Each panel corresponds to a different synaptic weights matrix of the circuit. The size of the circles when the presynaptic axon crosses the postsynaptic dendrite shows how strong a connection is, and the colour shows the sign (blue for inhibition, red for excitation). Light-green stars show where the synaptic plasticity takes place and how the dopaminergic neurons (DANs) modulate the synaptic weights between Kenyon cells (KCs) and mushroom body output neurons (MBONs).

This bias can be interpreted as the resting value of the neurons or some external input from other neurons that are not included in our model.

Finally, we define the DAN function matrix, Wd2km ∈ ℝ^(nd×nm), which transforms the responses of the DANs into the dopamine factor that modulates the Wk2m(t) synaptic weights (rows correspond to the DANs in the order of Equation 4, and columns to the KC→MBON synapses of the MBONs in the order of Equation 6), and it is given as

(12) Wd2km = [  0   −1    0    0    0    0
               −1    0    0    0    0    0
                0    0    0   −1   0.3   0
                0    0   −1    0    0   0.3
                0    0  −0.3   0    0   −1
                0    0    0  −0.3  −1    0 ].

All the parameters described above are illustrated in Figure 14. Figure 14—figure supplement 1, Figure 14—figure supplement 2, and Figure 14—figure supplement 3 show how each of these parameters affects the responses of the neurons in the IC. The last thing left to describe is the activation function, which is used in order to generate the DAN and MBON responses. This is

(13) ϱ(x) = { 2, if x ≥ 2;  x, if 0 < x < 2;  0, if x ≤ 0 },

which is the rectified linear unit (ReLU) function, bounded in ϱ(x) ∈ [0, 2]. We bound the activity in order to avoid extremely high values that explode during the charging of the LTM.

Forward propagation


For each time-step, t, we read the environment and propagate the information through the model in order to update the responses of the neurons and the synaptic weights. This process is called forward propagation, and we repeat it as long as the experiment runs.

First, we read the CS, p(t), and US, u(t), from the environment and calculate the KC responses by using Equation 2. In order to calculate the DANs and MBONs update, we define the differential equations as follows:

(14) τ dd/dt = −d + uᵀ(t) Wu2d + mᵀ(t) Wm2d + bd
(15) τ dm/dt = −m + kᵀ(t) Wk2m(t) + mᵀ(t) Wm2m + bm

where τ = 3 is a time-constant that is defined by the number of time-steps associated with each trial, ᵀ denotes the transpose operation of the matrix or vector, and m = m(t) and d = d(t) are functions of time. Using the above differential equations, we calculate the updated responses (i.e., responses in the next time-step, t) as

(16) d(t) = ϱ[(1/τ) d + dd/dt]
(17) m(t) = ϱ[(1/τ) m + dm/dt].

Finally, we calculate the dopaminergic factor, δ(t)6, and update the KC→MBON synaptic weights as

(18) δ(t) = dᵀ(t) Wd2km
(19) τ dWk2m/dt = δ(t) ∗ [kᵀ(t) + Wk2m − wrest]
(20) Wk2m(t) = max[Wk2m + dWk2m/dt, 0]

where ‘∗’ denotes element-wise multiplication, wrest = 1 is the resting value of the weights, and Wk2m = Wk2m(t) is a function of time. Note that element-wise multiplication means that each element of the δ(t) vector is multiplied with the corresponding column of the Wk2m(t) matrix. Similarly, the element-wise addition of the transposed vector, kᵀ(t), to the Wk2m(t) matrix means that we add each element of k(t) to the corresponding row of Wk2m(t). We repeat the above procedure as many times as required to complete the experimental paradigm routine.
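A minimal numpy sketch of one forward-propagation update, with the matrices of Equations 5 to 12 copied from the text; all variable and function names are ours, and it follows one reading of the update equations (16 and 17), so it should be taken as an illustration rather than the authors' exact implementation:

```python
import numpy as np

tau, w_rest = 3.0, 1.0

W_u2d = np.array([[2., 0., 2., 0., 0., 0.],      # sugar US -> DANs (Eq. 5)
                  [0., 2., 0., 2., 0., 0.]])     # shock US -> DANs
W_m2d = np.array([[0., -.3, 0., 0., 0., 0.],     # MBON -> DAN (Eq. 8)
                  [-.3, 0., 0., 0., 0., 0.],
                  [0., 0., .5, 0., 0., 0.],
                  [0., 0., 0., .5, 0., 0.],
                  [0., 0., .3, 0., .5, 0.],
                  [0., 0., 0., .3, 0., .5]])
W_m2m = np.zeros((6, 6))                         # MBON -> MBON (Eq. 9)
W_m2m[0, 3] = W_m2m[1, 2] = -1.
W_d2km = np.array([[0., -1., 0., 0., 0., 0.],    # DAN modulation (Eq. 12)
                   [-1., 0., 0., 0., 0., 0.],
                   [0., 0., 0., -1., .3, 0.],
                   [0., 0., -1., 0., 0., .3],
                   [0., 0., -.3, 0., 0., -1.],
                   [0., 0., 0., -.3, -1., 0.]])
b_d = np.array([-.5, -.5, -.15, -.15, -.15, -.15])
b_m = np.array([-2., -2., -.5, -.5, -.5, -.5])

def rho(x):
    """Bounded ReLU (Equation 13)."""
    return np.clip(x, 0., 2.)

def forward_step(k, u, d, m, W_k2m):
    """One update of Equations 14-20 for KC activity k and US input u."""
    dd = (-d + u @ W_u2d + m @ W_m2d + b_d) / tau      # Eq. 14
    dm = (-m + k @ W_k2m + m @ W_m2m + b_m) / tau      # Eq. 15
    d = rho(d / tau + dd)                               # Eq. 16
    m = rho(m / tau + dm)                               # Eq. 17
    delta = d @ W_d2km                                  # Eq. 18
    dW = delta * (k[:, None] + W_k2m - w_rest) / tau    # Eq. 19
    return d, m, np.maximum(W_k2m + dW, 0.)             # Eq. 20

# Pair shock with an odour for a few steps.
k = np.array([0.8] * 5 + [0.0] * 5)    # five active KCs
u_shock = np.array([0., 1.])
d, m, W = np.zeros(6), np.zeros(6), np.ones((10, 6))
for _ in range(3):
    d, m, W = forward_step(k, u_shock, d, m, W)
```

After a few paired steps, the active KCs' synapses onto the susceptible attraction MBON (first column) fall below their resting value (depression via the avoidance discharging DAN), those onto the avoidance LTM MBON (last column) rise above it (potentiation via the avoidance charging DAN), and synapses of inactive KCs stay at wrest.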

Modelling the neural responses


To emulate the acquisition and forgetting paradigms used for flies, we run the simulated circuit in an experiment that consists of T=73 time-steps. Each time-step actually comprises four repeats of the forward propagation update described above to smooth out any bias due to the order of computations (value vs. weights update). After the initialisation time-step at t=0, there are 24 trials where each trial consists of 3 in-trial time-steps.

Within each trial, the first time-step has no odour, and in the second and third time-steps, odour is presented: odour A on even trials and odour B on odd trials. A trial can have no shock (Figure 15A), unpaired shock presented in the first time-step (Figure 15B), or paired shock presented in the third time-step (Figure 15C). The first two trials compose the ‘pre-training phase’, where we expose the model to the two odours alternately (i.e., odour A in trial 1 and odour B in trial 2) without shock delivery. Then we have the acquisition phase, where we deliver shock paired with odour B for 10 trials (five trials per odour; Figure 15D). Before we proceed to the forgetting phases, we leave two empty trials (one per odour), which we call the resting trials. The forgetting phases last for another 10 trials (five trials per odour; Figure 15E–G). During the extinction phase, no shock is delivered while we continue alternating the odours (see Figure 15E); during the unpaired phase, shock is delivered unpaired from odour A (see Figure 15F); while at the reversal phase shock is paired with odour A (Figure 15G).

Description of the simulation process from our experiments.

A single trial is composed of three in-trial time-steps and each time-step by four repeats. Odour is provided only during the second and third in-trial time-steps, while shock delivery is optional. (A) In an extinction trial, only odour (CS) is delivered. (B) During an unpaired trial, shock is delivered during in-trial time-step 1. (C) During a paired trial, shock is delivered along with odour delivery and during in-trial time-step 3. (D) The acquisition phase has five odour A-only trials and five paired odour B trials alternating. (E) The extinction phase has five odour A-only trials and five odour B-only trials alternating. (F) The unpaired phase has five odour A unpaired trials and five odour B-only trials alternating. (G) The reversal phase has five odour A paired trials and five odour B-only trials alternating. The colours used in this figure match the ones in Figure 4.
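The 73-step structure described above can be generated programmatically. This is an illustrative sketch (the trial indexing and the naming are ours, with odour A on odd-numbered trials as in the pre-training description):

```python
def schedule(forgetting="extinction"):
    """Trial list for the 24-trial paradigm as (trial, odour, shock) tuples."""
    trials = []
    for i in range(1, 25):
        odour = "A" if i % 2 == 1 else "B"        # odours alternate, A first
        shock = "none"
        if 3 <= i <= 12 and odour == "B":         # acquisition: shock paired with B
            shock = "paired"
        elif 15 <= i <= 24 and odour == "A":      # forgetting phase on odour A
            if forgetting == "reversal":
                shock = "paired"
            elif forgetting == "unpaired":
                shock = "unpaired"
        trials.append((i, odour, shock))
    return trials

trials = schedule("reversal")
# 1 initialisation step + 24 trials x 3 in-trial time-steps = 73 time-steps
assert 1 + 3 * len(trials) == 73
```

Trials 1 and 2 are the pre-training phase, 13 and 14 the resting trials, and the `forgetting` argument selects between the extinction, unpaired, and reversal conditions.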

The classic unpaired conditioning paradigm

In this case, during the acquisition phase we deliver only electric shock in odd trials (omission of odour B), followed by an extinction phase as described above.

Modelling the behaviour


The experiments last for 100s each, and they are split into three phases as shown in Figure 11A. In pre-training, the flies are placed in the centre of the arena and explore freely for 20s. In training, either shock or sugar is associated with the region 30cm around odour A, odour B, or around both sources for 30s. In post-training, we remove the reinforcement and let the flies express their learnt behaviour for another 50s, creating an extinction forgetting condition. Figure 11B shows the normalised cumulative time spent experiencing each odour and the odour preference of the flies during the different phases for each of the six training conditions, and for 10 repeats of the experiment, when their behaviour is controlled by a combination of the attractive and repulsive forces on the two odours. The actual paths of the flies for all the 10 repeats are illustrated in Figure 11—figure supplement 4.

In practice, in order to create the experiences of nfly = 100 flies, we have created another routine that embeds the simulation of their motion and environment. We represent the position of each fly, a(t), and the sources of the odours in the arena, μA and μB for odours A and B, respectively, in the 2D space as complex numbers of the form x + iy. Therefore, the flies are initialised at a(t=0) = 0, and the sources of the odours are placed at μA = −0.6 and μB = 0.6. The standard deviation of the odour distributions is σA = σB = 0.3.

We get the odour intensity in each time-step by using the Gaussian density functions of the two odours and the position of the fly in the arena

(21) p(t) = { pAB, if N(a(t)|μA, σA) > θCS and N(a(t)|μB, σB) > θCS;
              pA,  if N(a(t)|μA, σA) > θCS;
              pB,  if N(a(t)|μB, σB) > θCS;
              p,   otherwise }

where pA, pB, pAB, and p are the identities of odours A, B, ‘A and B’, and none of them, respectively, in the PNs as described in the ‘Parameters of the model’, and θCS = 0.2 is the detection threshold for the odour. Note that the PN responses depend only on whether an odour has been detected; they are not proportional to the detected intensity. The reinforcement is applied to the simulated fly when the position of the agent is inside a predefined area around the odour, that is, ||μCS − a(t)|| < ρUS, where ρUS = 0.3 is the radius of the reinforced area. Note that the radius of the area where the odour is detectable is roughly ρCS ≈ 0.58, which is larger than the reinforced area. Then we run a forward propagation using the above inputs.
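Equation 21 and the quoted radii can be checked in a few lines. We assume the one-dimensional Gaussian normalisation 1/(σ√(2π)) evaluated at the 2D distance, which reproduces the stated detection radius ρCS ≈ 0.58 for θCS = 0.2; the function names are ours:

```python
import math

SIGMA, THETA_CS = 0.3, 0.2
MU_A, MU_B = -0.6 + 0j, 0.6 + 0j   # odour sources as complex arena positions

def density(pos, mu, sigma=SIGMA):
    """Gaussian odour intensity at complex arena position pos."""
    return math.exp(-abs(pos - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def detected(pos):
    """Equation 21: which odour identity the PNs receive at pos."""
    a, b = density(pos, MU_A) > THETA_CS, density(pos, MU_B) > THETA_CS
    return {(1, 1): "AB", (1, 0): "A", (0, 1): "B", (0, 0): "none"}[(a, b)]

# Detection radius: solve density(r) = theta_CS for the distance r.
peak = 1 / (SIGMA * math.sqrt(2 * math.pi))
rho_cs = SIGMA * math.sqrt(2 * math.log(peak / THETA_CS))
```

With these assumptions the fly at the centre of the arena detects neither odour, while a fly sitting on a source detects only that odour.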

From the updated responses of the MBONs, we calculate the attraction force, v(t), for the mixture of odours which modulates the velocity of the fly. This force is calculated by taking the difference between the responses of the MBONs that drive the behaviour:

(22) v_at(CS|t) = (1/3)[s_at(t) + r_at(t) + m_at(t)] (μCS − a(t)) / ||μCS − a(t)||, where CS ∈ {odour A, odour B}
(23) v_av(CS|t) = (1/3)[s_av(t) + r_av(t) + m_av(t)] (μCS − a(t)) / ||μCS − a(t)||, where CS ∈ {odour A, odour B}
(24) v(t) = Σ_{CS∈{A,B}} P_CS(t) v_at(CS|t) − Σ_{CS∈{A,B}} P_CS(t) v_av(CS|t)

where μCS is the position of the odour source and P_CS(t) is the probability of being closer to the specific CS source, calculated using the Gaussian distribution function and Bayes' theorem. For example, given that the prior probability of being closer to odours A and B is equal at any time, that is, P(A) = P(B) = 0.5, the probability of being closer to odour A is given by

(25) PA(t) = N(a(t)|μA, σA) / [N(a(t)|μA, σA) + N(a(t)|μB, σB)].

The velocity of the simulated fly is updated as follows

(26) v(t) = v(t−1) + v(t) + εx + iεy, where εx, εy ∼ N(μ=0, σ=0.1) and the v(t) on the right-hand side is the attraction force of Equation 24
(27) v̂(t) = 0.05 v(t) / ||v(t)||.

We normalise the velocity so that we keep the direction but replace the step size with 0.05 m/s. The noise added to the velocity is introduced in order to enable the flies to move in two dimensions and not just between the two odour sources. Also, when the attraction force is v(t) = 0, the noise along with the previous velocity is what drives the flies.
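A sketch of the steering computation of Equations 22 to 27; the MBON names and the dictionary layout are ours, and the Gaussian normalisation follows the same assumption as for the odour detection above:

```python
import math, random

SIGMA = 0.3
MU = {"A": -0.6 + 0j, "B": 0.6 + 0j}   # odour sources as complex positions

def density(pos, mu):
    return math.exp(-abs(pos - mu) ** 2 / (2 * SIGMA ** 2)) / (SIGMA * math.sqrt(2 * math.pi))

def attraction_force(pos, mbon):
    """Equations 22-24: net force from the six MBON rates (dict keys are ours)."""
    att = (mbon["s_at"] + mbon["r_at"] + mbon["m_at"]) / 3.0
    avo = (mbon["s_av"] + mbon["r_av"] + mbon["m_av"]) / 3.0
    dens = {cs: density(pos, mu) for cs, mu in MU.items()}
    total = sum(dens.values())
    force = 0j
    for cs, mu in MU.items():
        p_cs = dens[cs] / total                 # Eq. 25: Bayes with equal priors
        unit = (mu - pos) / abs(mu - pos)       # unit vector towards the source
        force += p_cs * (att - avo) * unit      # Eq. 24 combines Eqs. 22-23
    return force

def update_velocity(v_prev, force, rng=random.Random(0)):
    """Equations 26-27: add Gaussian noise, then normalise to a 0.05 step."""
    v = v_prev + force + complex(rng.gauss(0, 0.1), rng.gauss(0, 0.1))
    return 0.05 * v / abs(v)
```

For a fly between the centre and odour A whose attraction MBONs outweigh its avoidance MBONs, the resulting force points towards the odour A source, and the normalised step always has magnitude 0.05.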

We repeat the above process for T=100 time-steps with 1Hz (one time-step per second), and we provide shock or sugar (when appropriate) between time-steps 20 and 50, otherwise we use a zero-vector as US input to the DANs.

Calculating the normalised cumulative exposure and the preference Index


In Figure 11B, for each phase (i.e., pre-training, training, and post-training), we report the normalised cumulative exposure of the flies to each odour and their PI between them. The normalised cumulative exposure is calculated by

(28) C^R_{CS,phase} = Σ_{i=1}^{R} t^i_{CS,phase} / T_phase

where R is the repeat of the experiment, i is the iterating repeat, T_phase is the number of time-steps in the specific phase, and t^i_{CS,phase} is the number of time-steps spent exposed to the specific CS ∈ {A, B}, phase, and repeat.

The preference index for every repeat is calculated using the above quantities

(29) PI^R_phase = (C^R_{A,phase} − C^R_{B,phase}) / (C^R_{A,phase} + C^R_{B,phase}).
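Equations 28 and 29 amount to a few lines of code; a sketch with hypothetical exposure times (the function names are ours):

```python
def cumulative_exposure(times, T_phase):
    """Equation 28: normalised cumulative exposure over repeats of one phase."""
    return sum(t / T_phase for t in times)

def preference_index(times_A, times_B, T_phase):
    """Equation 29: preference for odour A over odour B."""
    C_A = cumulative_exposure(times_A, T_phase)
    C_B = cumulative_exposure(times_B, T_phase)
    return (C_A - C_B) / (C_A + C_B)

# e.g. 10 repeats of a 50-step post-training phase (exposure times made up)
pi = preference_index([30] * 10, [10] * 10, T_phase=50)   # -> 0.5
```

A PI of +1 means exclusive exposure to odour A, −1 exclusive exposure to odour B, and 0 no preference.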

The reward prediction error plasticity rule


In Figure 9—figure supplement 2, Figure 9—figure supplement 3, Figure 11—figure supplement 5, and Figure 11—figure supplement 6, we present the responses and synaptic weights of the IC neurons, and the behaviour of the simulated flies using the RPE plasticity rule. This was done by replacing our plasticity rule in Equation 19 with the one below:

(30) τ · dW_{k2m}/dt = k(t) [δ(t) − m(t) + w_rest].
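As a minimal illustration (our own sketch, not the published implementation), a single discrete Euler step of this RPE rule, in which the MBON activity m(t) acts as the prediction subtracted from the dopaminergic factor, can be written as:

```python
def rpe_update(w, k, m, delta, w_rest, tau=1.0):
    """One Euler step of the reward prediction error (RPE) rule
    (Equation 30): the KC activity k gates the prediction error
    between the dopaminergic factor delta and the MBON activity m."""
    return w + k * (delta - m + w_rest) / tau
```

When the MBON response already matches the dopaminergic factor (m = δ, with w_rest = 0), the weight stops changing, which is the defining property of an RPE rule.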

Derivation of the dopaminergic plasticity rule

Handler et al., 2019 suggest that ER-Ca2+ and cAMP play a decisive role in the dynamics of forward and backward conditioning. More specifically, they suggest that the KC→MBON synaptic change, ΔWk2mij, is proportional to the combined ER-Ca2+ and cAMP levels, which can be written formally as

(31) ΔW^{ij}_{k2m}(t) ∝ −(ER-Ca²⁺)^{ij}(t) + (cAMP)^{ij}(t).

We assume that ER-Ca²⁺ and cAMP levels are determined by information available in the local area of the target KC axon (presynaptic terminal): the dopamine (DA) level emitted by the DANs to the KC synapses of the respective (jth) MBON, D_j(t) ≥ 0; the activity of the (ith) presynaptic KC, k_i(t) ≥ 0; the respective KC→MBON synaptic weight, W^{ij}_{k2m}(t) ≥ 0 (assumed always positive, exciting the MBON); and the resting synaptic weight, w_rest, which we assume is a constant parameter of the synapse. Tuning the above quantities in order to reproduce the ER-Ca²⁺ and cAMP levels, we postulate a mathematical formulation of the latter as a function of the available information

(32) (ER-Ca²⁺)^{ij} ∝ D↓_j(t) [k_i(t) − w_rest] + [D↓_j(t) − D↑_j(t)] W^{ij}_{k2m}(t)
(33) (cAMP)^{ij} ∝ D↑_j(t) [k_i(t) − w_rest]

where D↓_j(t) and D↑_j(t) are the depression and potentiation components of the DA, respectively (assumed to correspond to DopR1 and DopR2 receptors [Handler et al., 2019], or potentially to involve co-transmitters released by the DAN, such as nitric oxide [Aso et al., 2019]). We assume two types of DAN terminals: depressing and potentiating. In depressing terminals, D↓_j(t) reaches a higher peak in its activity, followed by faster diffusion, than D↑_j(t), which appears to be the key to backward conditioning. The opposite happens in potentiating DAN terminals. Figure 16 shows the ER-Ca²⁺ and cAMP levels during forward and backward conditioning for a depressing DAN [see Figure 16—figure supplement 1 for the responses of all the terms used, including D↓_j(t) and D↑_j(t)], which are comparable to the data shown in Handler et al., 2019 (also Figure 16, shown in grey). Note that here we are more interested in the overall effects of learning shown in Figure 16A than in the detailed responses of Figure 16B.

Figure 16 with 2 supplements see all
The effect of the ER-Ca2+ and cAMP based on the order of the conditional stimuli (CS) and unconditional stimuli (US).

(A) Normalised mean change of the synaptic weight plotted as a function of Δs (US start − CS start), similar to Handler et al., 2019, Figure 5F (blue line). For ease of comparison, the predicted mean values are drawn on top of the data (mean ± SEM) from the original paper (Handler et al., 2019); grey lines and error bars. (B) Detailed ER-Ca²⁺ and cAMP responses reproduced for the different Δs, and their resulting synaptic weight change. Black arrowhead marks the time of the CS (duration 0.5 s); red arrowhead marks the time of the US (duration 0.6 s), similar to Handler et al., 2019, Figure 5D. For ease of comparison, the predicted responses are drawn on top of the data from the original paper (Handler et al., 2019); grey lines.

By replacing Equation 32 and Equation 33 in Equation 31, we can rewrite the update rule as a function of known quantities, forming our DPR of Equation 1, which we rewrite for convenience

ΔW^{ij}_{k2m}(t) = δ_j(t) [k_i(t) + W^{ij}_{k2m}(t) − w_rest],  where  δ_j(t) = D↑_j(t) − D↓_j(t)

The dopaminergic factor, δ_j(t), is the difference between the D↑_j(t) and D↓_j(t) levels, and it can be positive [D↑_j(t) > D↓_j(t)] or negative [D↑_j(t) < D↓_j(t)]. Combined with the state of the KC activity, this results in the four different weight modulation effects: depression, potentiation, recovery, and saturation.

In Figure 16B (where we assume a depressing DAN terminal), all four effects occur in four out of the six cases, creating complicated dynamics that allow forward and backward learning. Similarly, a potentiating terminal might trigger all the effects in a row, but in a different order and with different durations. Note that in the simulations run for the results of this paper, we simplify the dopaminergic factor to have a net positive or negative value for the time-step in which it influences the synaptic weight change, as the time-steps used are long enough (e.g., around 5 s; see 'Implementation of the incentive circuit' section), and we assume a less complicated interchange among the effects.

In Figure 16A, we report the normalised mean change of the synaptic weight calculated using the computed ER-Ca2+ and cAMP levels and the following formula:

(34) ΔW^{ij}_{k2m} ∝ (1/T) Σ_{t=0}^{T−1} [−(ER-Ca²⁺)^{ij}(t) + (cAMP)^{ij}(t)].

Decomposing the dopaminergic factor

Request a detailed protocol

In Equation 18, the dopaminergic factor, δ(t), is derived from the matrix Equation 12, which captures in abstracted and time-independent form the effects of dopamine release. To model these more explicitly, as described in the 'Derivation of the dopaminergic plasticity rule' section, the dopaminergic factor can be decomposed as δ(t) = D↑(t) − D↓(t), where each component has a time-dependent form given by the differential equations

(35) dD↓/dt = (1/τ_long) · d^T(t) W⁺_{d2km} − (1/τ_short) · d^T(t) W⁻_{d2km} − (2 − 1/τ_short − 1/τ_long) · D↓
(36) dD↑/dt = (1/τ_short) · d^T(t) W⁺_{d2km} − (1/τ_long) · d^T(t) W⁻_{d2km} − (2 − 1/τ_short − 1/τ_long) · D↑

where W⁺_{d2km} and W⁻_{d2km} represent the positive-only (potentiation/saturation) and negative-only (depression/recovery) dopaminergic effects; d(t) is a vector of the responses of all DANs as a function of time, t; D↓ = D↓(t) and D↑ = D↑(t) are the depression and potentiation components of the DA as functions of time, t; and τ_short and τ_long are the exponential decay time constants that define the short (main) and long (secondary) durations of the dopamine effect. The longer the time constant, the slower the diffusion but also the lower the peak of the effect. Note that the two time constants must satisfy the constraint 0 < 1/τ_short + 1/τ_long ≤ 2 in order for the above differential equations to work properly.
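A discrete Euler sketch of these dynamics follows. It is our own reading of the sign conventions (the W⁻ terms enter with a negative sign so that both components remain non-negative, since W⁻ holds negative-only values), and all names are ours rather than from the published code:

```python
import numpy as np

def dopamine_step(d_fall, d_rise, dan, w_pos, w_neg,
                  tau_short, tau_long, dt=1.0):
    """One Euler step of Equations 35-36. `dan` is the DAN response
    vector d(t); `w_pos`/`w_neg` are the positive-only and negative-only
    dopaminergic effect matrices; `d_fall`/`d_rise` are the depression
    and potentiation components of the dopamine."""
    decay = 2.0 - 1.0 / tau_short - 1.0 / tau_long
    d_fall_dot = ((dan @ w_pos) / tau_long
                  - (dan @ w_neg) / tau_short - decay * d_fall)
    d_rise_dot = ((dan @ w_pos) / tau_short
                  - (dan @ w_neg) / tau_long - decay * d_rise)
    return d_fall + dt * d_fall_dot, d_rise + dt * d_rise_dot
```

In the special case τ_short = 1 and τ_long = +∞ with unit steps, one step from rest gives δ = D↑ − D↓ = d^T(W⁺ + W⁻), recovering the simplified dopaminergic factor of Equation 18.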

In Figure 16, where we are interested in more detailed dynamics of the plasticity rule and the sampling frequency is high, that is, 100 Hz, we use τ_short = 60 and τ_long = 10⁴, which we chose after a parameter exploration available in Figure 16—figure supplement 2. This essentially means that D↓(t) and D↑(t) are expressed as time-varying functions following DAN spike activity. Note that for the specific (susceptible) type of MBON examined there, the DAN causes depression of the synapse, so there is no positive dopaminergic effect, that is, W⁺_{d2km} = 0. By setting W⁺_{d2km} = 0 in Equation 35 and Equation 36, we have the fast update with the high peak for D↓(t) (0.5 s for a full update) and a slower update with a lower peak for D↑(t) (1 s for a full update), as described in the 'Derivation of the dopaminergic plasticity rule' section.

For the experiments in Figures 4 and 11, we use τ_short = 1 and τ_long = +∞, which removes the dynamics induced by the relation between D↓(t) and D↑(t), and Equation 18 emerges from:

(37) δ(t) = D↑(t) − D↓(t) = d^T(t) W⁺_{d2km} + d^T(t) W⁻_{d2km} = d^T(t) W_{d2km},  for τ_short = 1 and τ_long = +∞

This essentially means that each update represents a time-step that is longer than the effective period of backward conditioning. For the responses of the 'Microcircuits of the mushroom body' and 'Modelling the behaviour' sections (where the sampling frequency is low, i.e., ≤0.5 Hz and 1 Hz, respectively), we therefore use the same time constants, which result in the simplified Equation 18.

Data collection

Request a detailed protocol

In order to verify the plausibility of the IC, we recorded neural activity in genetically targeted neurons during aversive olfactory conditioning, as described in more detail in McCurdy et al., 2021. We simultaneously expressed the green Ca²⁺ indicator GCaMP6f and the red Ca²⁺-insensitive tdTomato in neurons of interest to visualise the Ca²⁺ changes that reflect neural activity. We collected data from 357 five- to eight-day-old female flies (2–14 per neuron; eight flies on average), covering 43 neurons, which can be found in Figure 4—source data 1 (also illustrated in Figure 4—figure supplement 1).

Each fly was head-fixed for simultaneous delivery of odours and electric shock while its neural activity was recorded. The proboscis was also glued, while the body and legs were free to move (see Figure 4A). The flies were allowed to recover from the gluing process for 15 min before being placed under the microscope. We used green (555 nm) and blue (470 nm) lights to record the GCaMP and tdTomato signals. We used 0.1% 3-octanol (OCT) and 0.1% 4-methylcyclohexanol (MCH) for odours A and B, respectively, and the flow rate was kept constant at 500 mL/min for each odour. The flies were allowed to acclimate to the airflow for at least 1 min before the start of the experiment.

During the experiments, we alternate trials in which 5 s of each odour is presented 5 s after the (green or red) light is on. We start with two pre-training trials (one per odour), followed by five acquisition trials per odour. During acquisition, flies receive alternating 5 s pulses of OCT (odour A) without shock and MCH (odour B) paired with electric shock, repeated for five trials. During reversal, OCT is presented with shock and MCH without, repeated for two trials. On trials where electric shock was delivered, it was presented 4 s after odour onset, for 100 ms at 120 V.

Calculating off- and on-shock values

Request a detailed protocol

From the data collection process described above, we get trials of 100 time-steps and at 5Hz (20s each). Odour is delivered between time-steps 25 and 50 (between 5s and 10s), and shock is delivered during time-step 45 (at 9s). In this work, we report two values for each trial: the off-shock and on-shock values, which represent the average response to the odour before and during the period in which shock delivery could have occurred (even if shock is not delivered).

For the off-shock value, from each datastream of activity from the target neuron, we collect the values from time-steps between 28 (5.6s) and 42 (8.4s). This gives us a matrix of nfly×15 values, whose average and standard deviation are the reported off-shock values. Similarly, for the on-shock values, we collect the values in time-steps between 44 (8.6s) and 48 (9.6s), which gives a matrix of nfly×5 values, whose average and standard deviation are the on-shock values. We define ‘on-shock’ as the time window from 8.6s to 9.6s, where shock onset occurs at t=9s.
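These two windows can be sketched in Python as follows (our own simplification, assuming the array index directly matches the reported time-step numbering):

```python
import numpy as np

def off_on_shock_values(trials):
    """Compute off- and on-shock values from trials shaped (n_fly, 100),
    sampled at 5 Hz. Off-shock window: time-steps 28-42 (5.6-8.4 s),
    15 values per fly; on-shock window: time-steps 44-48 (8.6-9.6 s),
    5 values per fly. Returns (mean, std) for each window."""
    off = trials[:, 28:43]
    on = trials[:, 44:49]
    return (off.mean(), off.std()), (on.mean(), on.std())
```

The mean and standard deviation are taken over the whole n_fly × window matrix, as described above.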

Appendix 1

The incentive wheel

We have shown that the incentive circuit is able to explain classical conditioning experiments performed with adult fruit flies, and that its neurons can replicate the responses of mushroom body neurons in the fly brain. We have also seen that three types of memories are stored in this model (i.e., susceptible, restrained, and long-term) for each of the two represented motivations of the animal (i.e., attraction or avoidance), guided by reinforcements (i.e., reward or punishment). Although this model is sufficient to explain the behaviour of the animals in the laboratory, where the animal is exposed to controlled portions of chemicals and the results are translated into simple attraction to or avoidance of a source, in the wild there are more than two motivations that modulate the behaviour of the animal, either synergistically or in opposition.

Real-life experiences are complicated and rich in information. This could produce a whole spectrum of reinforcements and motivations that guide the behaviour of animals. Data show that animals respond differently to different reinforcements, which cannot be represented merely by the magnitude of a single variable, for example, more/less rewarding/punishing. For example, different concentrations of salt (Zhang et al., 2013) or sugar (Colomb et al., 2009) might combine with the satiation state of the animal to activate different subsets of DANs and trigger different behaviours, such as feeding or escaping. When the male fruit fly is exposed to female pheromones, courtship behaviour is triggered through P1 neurons (Kallman et al., 2015; Sten et al., 2020), which can be translated to attraction but has nothing to do with the appetite of the animal. On the other hand, other male pheromones trigger avoidance, suggesting that a circuit similar to the IC could explain this behaviour. The MB has been shown to contribute to many behaviours other than olfactory classical conditioning, including visual navigation, and its output neurons encode richer information that comes very close to human decision-making (Heisenberg, 2003).

It is reasonable to think that the 34 output and 130 dopaminergic neurons interacting with the MBs in the brain of fruit flies are not all used merely to discriminate odours and assign positive or negative reinforcement to them, driving attraction and avoidance. For this reason, we believe that the different MBONs do not represent different odours, as has been proposed before (Huerta et al., 2004), nor are they split into two groups (e.g., attraction or avoidance; Schwaerzel et al., 2003; Schroll et al., 2006; Waddell, 2010), but rather represent different motivations of the animal that together guide its overall behaviour (Heisenberg, 2003; Krashes et al., 2009). These motivations are associated with different contexts, which are represented by the responses of the KCs, as proposed by Cohn et al., 2015, who showed that the same output neurons in the γ compartment respond differently when a different context is given. This context enables or disables different microcircuits of the MB (similar to the ones described in the 'Microcircuits of the mushroom body' section), resulting in the activation of a subset of MBONs that represent different motivations, while the overlapping microcircuits result in what we sometimes call 'noisy' or 'insignificant' changes in the behaviour.

Appendix 1—figure 1 illustrates such a model, which we call the 'incentive wheel' (IW). In this model, we use four identical incentive memories (C0/4, C1/5, C2/6, and C3/7), where the reciprocal short-term memories (RSM) microcircuit of one is the reciprocal long-term memories (RLM) microcircuit of another. As the structure of the RLM microcircuit is identical to the RSM one, we assume that the RLM of circuit C0/4 is the RSM of circuit C1/5, the RLM of circuit C1/5 is the RSM of circuit C2/6, and so on. In this way, we weave the different circuits into an incentive wheel of opposing motivations. The reinforcements that trigger the DANs in this model are drawn from a spectrum, and the output of the MBONs of the model triggers different motivations. The LTMs and restrained memories can both exist in the same neurons at the core of the model, representing different motivations in different contexts. This might cause changes in the behaviour of the circuits that are irrelevant to the associated reinforcement but relevant to a neighbouring reinforcement on the spectrum.

Appendix 1—figure 1
The 'incentive wheel' model.

This model proposes that reinforcement is not binary but draws its values from a spectrum. Different types of reinforcement trigger different dopaminergic neurons (DANs) that enable learning in different parts of the mushroom body. Colours show the variety of motivations that the model can encode, associated with the human 'wheel of emotions' (Plutchik, 2001); for example, light green: trust; green: fear; light blue: surprise; blue: sadness; pink: disgust; red: anger; orange: anticipation; yellow: joy. Neurons of more than one colour are part of multiple circuits that contribute to different motivations.

The incentive wheel is an example of how the IC can be part of a bigger circuit that provides a variety of motivations to the animal. An extension of it could have susceptible MBONs, s_i, connecting to other susceptible MBONs from a parallel IW model with higher-order motivations. In Appendix 1—figure 1, we have associated the different motivations with the primary human emotions from the 'wheel of emotions' (Plutchik, 2001). Higher-order motivations could emerge by combining primary motivations as if they were emotions, resulting in more complicated behaviours for the animal.

Appendix 2

Appendix 2—table 1
Additional information on the mushroom body neurons used for the incentive circuit, sorted by their short name.

Information about the neurons has been collected from McCurdy et al., 2021 and Aso et al., 2014a.

Driver  | Cluster | Neuron name      | # cells | Short name | Alternative names
--------|---------|------------------|---------|------------|-------------------
MB011B  | M4/M6   | MBON-γ5β′2a      | 1       | MBON-01    | MB-M6
MB399B  | M4/M6   | MBON-β2β′2a      | 1       | MBON-02    |
MB298B  | MV2     | MBON-γ4>γ1γ2     | 1       | MBON-05    |
MB112C  |         | MBON-γ1pedc>α/β  | 1       | MBON-11    | MB-MVP2
MB077B  | V3/V4   | MBON-γ2α′1       | 2       | MBON-12    |
MB050B  | V2      | MBON-α′1         | 3       | MBON-15    |
MB109B  | PAM     | PAM-β′2a         | 6–9     | PAM-02     |
MB301B  | PAM     | PAM-β2β′2a       | >3      | PAM-04     | Subset of MB-M8
MB312B  | PAM     | PAM-γ4<γ1γ2      | 13–17   | PAM-07     | Subset of MB-AIM?
MB320C  | PPL1    | PPL1-γ1ped       | 1       | PPL1-01    | MB-MP1, MP
MB296B1 | PPL1    | PPL1-γ2α′1       | 1       | PPL1-03    | MB-MV1
MB296B2 | PPL1    | PPL1-γ2α′1       | 1       | PPL1-03    | MB-MV1

Data availability

All data generated or analysed during this study are included in the manuscript and supporting files. Figure 3—source data 1 contains the numerical data used to generate Figure 3—figure supplement 1, and Figure 4—source data 1 contains the numerical data used to generate Figure 4—figure supplement 1 and parts of Figures 5–8. The scripts for producing the data and generating Figures 5 (C, D & E), 6 (C, D & E), 7 (D & E), 8 (C & D), 11B, 12, 16 and all figure supplements are located at https://github.com/InsectRobotics/IncentiveCircuit (copy archived at swh:1:rev:98a8f85745a1426e8e5b787ceedd3f680a2b66c6). Figures 5A, 6A, 7A, 7C, 8A, 9A, 9B and Figure 5—figure supplement 1A were generated using the Fly Brain software. All figures were edited using the Inkscape software.

References

  1. Conference
    1. Balkenius A
    2. Kelber A
    3. Balkenius C
    (2006)
    From Animals to Animats 9, 9th International Conference on Simulation of Adaptive Behavior, SAB 2006
    Proceedings. Lecture Notes in Computer Science. pp. 422–433.
  2. Book
    1. Kamin LJ
    (1967)
    Predictability, surprise, attention and conditioning
    In: Campbell BA, Church RM, editors. Punishment Aversive Behavior. Appleton- Century-Crofts. pp. 279–296.
  3. Book
    1. Rescorla RA
    2. Wagner AR
    (1972)
    A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement
In: Black AH, Prokasy WF, editors. Classical Conditioning II: Current Research and Theory. New York: Appleton-Century-Crofts. pp. 64–99.
  4.
    1. Schwaerzel M
    2. Monastirioti M
    3. Scholz H
    4. Friggi-Grelin F
    5. Birman S
    6. Heisenberg M
    (2003)
    Dopamine and Octopamine Differentiate between Aversive and Appetitive Olfactory Memories in Drosophila
    The Journal of Neuroscience 23:10495–10502.

Decision letter

  1. Upinder Singh Bhalla
    Reviewing Editor; Tata Institute of Fundamental Research, India
  2. Ronald L Calabrese
    Senior Editor; Emory University, United States
  3. Mani Ramaswami
    Reviewer; Trinity College Dublin, Ireland

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Decision letter after peer review:

[Editors’ note: the authors resubmitted a revised version of the paper for consideration. What follows is the decision letter after the first round of review.]

Thank you for submitting the paper "The incentive circuit: memory dynamics in the mushroom body of Drosophila melanogaster" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by a Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Mani Ramaswami (Reviewer #3).

We are sorry to say that, after consultation with the reviewers, we have decided that this work will not be considered further for publication by eLife.

While all the reviewers appreciate the ambition and value in integrating diverse sources of data to developing a model of learning, they had some substantial concerns. These are elaborated in their detailed comments, and I provide a distillation of the discussion that the reviewers and I had about the paper. Since it will take considerable further work to address these points, the reviewers and I felt that the paper should be rejected. If the authors wish to resubmit after completely addressing the concerns this would be fine.

1. The reviewers found the paper a difficult read. Could the authors rewrite to make it accessible to a wide range of readership?

2. The formulation of the DLR seems to be a variant of RPE (Reward Prediction Error) learning rules, and hence the conclusions need to be re-evaluated.

Can the authors re-think the basic formulation of DLR starting with Equations (2) and (3)? There should be some experimental tests if the DLR is indeed determined to be different from regular RPE.

3. The microcircuits should be better based on experimental data. From our understanding, the data shown in Figures 4H/G, 5B/C/E/F and 6B/C seems to have been obtained by simulations. Would Ca recordings for these figures be feasible? Can there be stronger justification for the connectivity of the proposed incentive circuit?

3b. The proposed circuit connectivity of the 'incentive circuit' needs to be defined for each MBON because most contemporary work shows that different kinds of memory involved plasticity in different subsets of MBONs. Can the model make specific testable predictions for each subset of MBON?

4. Further experimental predictions should be made, based on well-parameterized models of the underlying neurons. Can the authors provide considerably more clarity on which sets of behavioral or physiological data are selected by the authors as targets or tests for specific parts of their model?

5. Can the model account for existing data showing overlapping conflicting engrams? Additional experiments and simulations may be needed to ascertain this.

Reviewer #1 (Recommendations for the authors):

This ambitious study builds a model of a proposed key circuit motif in fly behaviour and learning, the Incentive Circuit. The authors examine its implications for a variety of behaviours, and perform a thorough circuit-level mapping of model neuronal activity to recordings. The model uses abstracted model neurons and synaptic signaling, but with careful attention to experimental data at many steps. The mapping to experiments is good, and the model makes far-reaching predictions for animal behaviour.

The development of the model is generally well presented. The learning rule is derived from earlier work (Handler et al) and then the authors transform the terms for ER-ca2+ and for cAMP to terms emerging from DA inputs. The model development is especially systematic, building up to the final version step by step with reference to experiments. Importantly, these are mapped to specific sets of experimental observations on the circuit level.

I have mostly comments to clarify or strengthen the presentation.

1. I had a little trouble to envision the two components of D2 and D1. Are they time-varying? Seems to be, see Equation 4, where they are presented as D1(t) and D2(t). In other words, do they express D2 and D1 as distinct α functions following spike activity in the DAN?

However, in the text and figures it is frequently presented in terms such as D2 > D1 (eg., Figure 3), which looks like a static effect. This was confusing.

Also in Figure 2A, are we seeing the peak values of ER-Ca or area under curve?

Around line 128 there is a hint that it is area under the curve, but I am not sure.

2. I would have liked to have seen some more mapping to functional experiments in the figures up to Figure 7, where the components of the model are being built up. The authors mention several in the text. Even a qualitative look at the experimental responses would help to strengthen the motivation of the model design.

3. The authors then utilize this circuit in an aversive olfactory conditioning paradigm, for which they provide experimental data corresponding to the various neuron types. They then simulate this. This is an outstanding way to validate/test their model. It would be helpful to have the experimental and simulated responses interleaved on the same figure so as to better compare.

4. I appreciate that it is quite challenging for a simulation to simultaneously replicate properties of several intermediate stages of circuit activity, even more so when the stimulus is not one that the model has been trained on. Could the authors confirm that this is indeed the case, i.e. that the model outcome for figure 9 was obtained only from the parameter tuning earlier in the paper up to Figure 7?

5. It would be useful to perform a statistical evaluation of the fidelity of the model as compared to experiment.

6. The authors then place their model flies in a virtual arena and explore a number of behaviours. Here they contrast their model behaviour with the predictions from a different learning, reward prediction error. I would have liked to have seen in figure 11 an illustration of the correspondence to experimental observations from the literature.

Reviewer #2 (Recommendations for the authors):

The manuscript in its current form is built around two main threads. In the first thread, the authors review several results in the literature on associative learning in the mushroom body of the adult fruit fly, and construct an Incentive Circuit (IC) consisting of 6 dopaminergic and 6 mushroom body neurons with specific memory dynamics. They provide a coherent functional view of some of the disparate recent results in associative learning of the mushroom body.

The second thread incorporates a Dopaminergic Learning Rule (DLR) into the IC computational model, providing a computational system for evaluating the learning mechanisms involved.

A weakness here is that the acquisition, forgetting and assimilation of memories qualitatively described in the first thread are not strongly linked with the quantitative IC model described in the second thread.

Conversely, the validation of the IC model circuit, given the noisy data that the authors provide, is only possible in terms of trends, i.e., simple visual inspection. Interpreting the data then is difficult as it does not provide enough constraints for the computational model.

Given the limitations inherent in the validation of the IC from their recorded data, the authors proceed to explore the DLR using behavioral experiments purely based on simulations. This is an effective methodology widely employed in, e.g., robotics. The authors extensively compare the 'learning/navigation' performance of DLR with a variant of reward prediction error (RPE) learning rule and demonstrate a better learning performance. While the comparison may be compelling, we found that underlying the DLR is the computation of a prediction error, i.e., DLR is a variant of RPE. This calls for a re-evaluation, positioning and clarification of some of the key conclusions regarding why the DLR is effective in associative learning tasks.

l.128 The section 'Mushroom Body Microcircuits' makes good first reading. However, most of the key statements could further benefit from more extensive quantitative backing as hinted at in Figures 4, 5 and 6 (see also my comment below). Since these microcircuits are simpler than the IC, my expectation is that they could provide better intuition regarding their function.

Figures 4F and 4G are rather difficult to understand/parse. More caption details, choice of different colors, would help.

Same comment regarding Figures 5B, 5C, 5E and 5F, and 6B, 6C.

While Figure 8 is to be commended, the data is rather noisy and, in my view, despite the best intentions, rather difficult to understand/evaluate. As the authors argue in l.312, 'we computationally modelled the incentive circuit in order to demonstrate all the properties we described before and compare the reconstructed activities to the ones of Figure 8C'. However, a comparison by simple visual inspection is rather unconvincing. The need for introducing a distance measure is in order.

I found 'modeling behavior', as presented in the current version of the manuscript, to be quite effective. However, I'd like to note that in the process, the authors changed the underlying PN activity model. This requires, given that the rest of the paper is based on a binary odor model of the PN activity (see the discussion preceding Equation (6)), some careful/detailed assessment of its implications. Finally, the authors propose to compare their DLR with a variant of RPE. Here a major conceptual problem arises.

The authors argue that DLR is a fundamentally different learning rule from RPE. They state in l.462 that 'The idea behind RPE is that the fly learns to predict how rewarding or punishing a stimulus is by altering its prediction when this does not match the actual reward or punishment experience'.

This can be adapted to the mushroom body circuit by assuming that the MBON output provides a prediction of DAN activity. But this is exactly what Equation (18) states. The differential equation (18) describing the gradient of the DAN activity is equal to the sum of the weighted shock delivery ('transform' in l.750) and the weighted MBON activity (l.755).

The sum is just the prediction error between the two terms. Consequently, since the DLR is, in view of this reviewer, a variant of RPE, a comparison with another RPE is of little interest. A substantial re-write of the paper starting with the section on the Incentive Circuit (l. 257) is in order.

l.765: "The above matrices summarise the excitatory (positive) and inhibitory (negative) connections between MBONs and DANs or other MBONs. The magnitude of the weights was hand-tuned in order to get the desired result." This 'hand-tuning" appears, to me, to be a 'construction' of the prediction error on the right hand side of Equation (18). Some details might help clarify to what extent the hand-tuning is based on the assumptions of the binary model of the 2 odors at the PN level. I presume that the generality of the model alluded to in l.743 stating that 'that the number of neurons we are using for PNs and KCs is not very important and we could use any combination of PN and KC populations' breaks down and the hand-tuning needs to be repeated every time the number of neurons is changed.

Reviewer #3 (Recommendations for the authors):

The authors propose an original dopaminergic learning rule, which, when implemented in simple neural circuit motifs shown to exist within the Drosophila mushroom body (MB) , can potentially account for a very large number of independent, poorly integrated physiological and behavioural phenomena associated with the mushroom body. It considers multiple behavioural roles of MB output neurons beyond attraction and aversion and offers new insight to the how the MB functions in acquisition, consolidation and forgetting of memories. The manuscript further attempts to show how similar principles could potentially be useful in the mammalian brain. An ambitious and integrative analysis of this sort is sorely needed in the field.

The paper has obviously involved very broad and deep consideration of the MB connectome as well as genetic, physiological and behavioural studies of the roles of the different classes of Kenyon cells, MBONs and DANs that innervate the mushroom body. It is original and ambitious and potentially very valuable to the field.

My major reservation is that the manuscript is very difficult to read and evaluate by anyone who is not a Drosophila mushroom body aficionado. I consider myself an interested reader and one who keeps broad track of the field, but found the need to read and evaluate far too many papers cited by the authors to decide how well phenomena the authors attempt to model have been demonstrated and how well assumptions made by the authors are justified by data. E.g. I was stymied even at figure 1, where mutual inhibition between MBONs is indicated and it took me considerable (and eventually futile) effort to look into where and how well this has been established.

To make the work more accessible at least to this moderately educated reviewer, I fear that a major rewrite will be required. I would suggest that, for each section, exactly what has been shown be clearly enumerated, with enough detail provided for the reader to judge the strength of these data. The justification and support for the three types of MBONs and their incentive roles should also be particularly clearly indicated.

Moreover, while the authors are correct to point out the limitations of current models based on dopamine prediction-error, I do wonder if there is room for prediction error to also contribute meaningfully within the framework proposed in this paper.

I apologise for not having a list of specific issues for the authors to address, because I found the basis to be so difficult to explore, but here is some general feedback.

1. It is nice that the dynamics of neural responses obtained with the model correspond closely with ones reported in previous studies (although there are exceptions, some nicely highlighted by the authors).

2. There should be deeper engagement with signalling mechanisms that differentiate the two types of dopamine receptors. I found the assumptions regarding their differences to be useful for the modelling of different effects of reinforcement before or after sensory experience (Ruta Cell 2019), but quite superficial in terms of providing hypotheses for how the receptors may differ in their mechanism of action.

3. On the same note, specific experimental predictions of the model could also be clearly indicated at the end of each section.

4. While the authors admittedly designed informative and clear figures, and their Table 1 points the reader to papers that report relevant neural connections and neuronal functions, this is not enough. Data in support of each assumption should be clearly and specifically mentioned, and hypothesised connections also clearly stated. After considerable effort, I still could find no evidence for the existence of inhibitory connections between MBONγ4 and MBONγ2 (which is not to say that none exist – but surely it is the authors' job to clarify this).

5. The authors should also try to account for the discovery of parallel, independent memory traces (like appetitive LTM formation towards the CS- in classic LTM aversive training paradigms).

6. Does the dopaminergic learning rule explain the differences in dynamics and memory strength between appetitive and aversive memories? These two types of memory involve different molecular components and display different learning rules (stronger short-term aversive memories, and longer-lasting appetitive memories requiring less training). This should perhaps be clarified, particularly since KC output appears dispensable for aversive learning (acquisition) but potentially necessary for the acquisition of appetitive memories (Pribbenow et al., 2021).

7. I found the easy assumption that forgetting involves erasure to be troubling. Perhaps this happens sometimes. But many apparently "forgotten" memories are never erased, simply not reactivated for multiple reasons. Intellectually this point needs to be acknowledged.

[Editors’ note: what follows is the authors’ response to the second round of review.]

Thank you for resubmitting your work entitled "The incentive circuit: memory dynamics in the mushroom body of Drosophila melanogaster" for further consideration by eLife. Your revised article has been evaluated by Ronald Calabrese (Senior Editor) and a Reviewing Editor.

The manuscript has been improved but there are some remaining issues that need to be addressed, as outlined below:

1. Could the authors compare their simulated/predicted behavior with some quantitative or semi-quantitative measures of experimental behavior?

2. Can the authors elaborate on their mapping of ER-CA and cAMP in the model with the cited data? This relates to point 4 from Reviewer 1.

3. Can the authors do some parameter sensitivity analysis as suggested by the reviewers?

In addition, the reviewers had a few points for the authors to expand upon in the revision, and a number of useful suggestions to improve clarity.

Reviewer #1 (Recommendations for the authors):

This is an ambitious but also highly complicated modelling study that seeks to account for a wide range of fly learning behaviour in terms of underlying learning rules and circuitry.

The strengths of the study are its ambition, detail and substantial attention to experimental inputs. In principle it builds up a large and testable conceptual framework for understanding many aspects of learning. Its weaknesses, which are readily fixed, are (1) that the study misses opportunities to better compare model to experiment, and (2) that it does not carry out a systematic parameter and model exploration to see how robust the properties are.

With these additions the study would be strong and of value to the field in laying out a template for further investigation. The authors posit that this framework could also apply to other organisms.

1. This is an ambitious but also highly complicated modelling study that seeks to account for a wide range of fly learning behaviour in terms of underlying learning rules and circuitry. The authors have made substantial improvements to the clarity of the presentation, particularly with regards to comparison of experimental and simulated data.

I would have liked to see similar comparison for two more features: the behaviour, and the crucial learning rule section, as I comment below. I note that a similar request was made in an earlier review.

2. The other big thing I would have liked to see is an exploration of parameter sensitivity. This is needed both because of model complexity and because of the not-perfect match between model and input data. No model is perfect, but the confidence in a model is much improved if one can see that it still ‘works’ even when the numbers (and other assumptions) shift around a bit.

3. Behaviour: The authors have made the interesting and potentially powerful step of linking their model to measurable behaviour. But they miss the opportunity to put the outcomes (experiment and model) side by side. Even a semi-quantitative distillation to some common metric for displaying and comparing the experimental and model properties would have been valuable.

4. Figure 16: Details of ER-Ca²⁺ and cAMP in the model don't match data. The form of the pairing for ER-Ca²⁺ is inconsistent with the data of Handler et al., particularly when CS precedes US by a large interval. Handler et al. show no response for forward pairing even several seconds after the last stimulus. Also, the time-course of the ER response for the backward pairing case is inconsistent. In the Handler data (Figure 6) the ER signal remains low (i.e., very different from baseline) well past 5 seconds, whereas in Figure 16 the signal returns to baseline within 5 s. I am also concerned that there doesn't seem to be experimental support for the reduced cAMP signal at very small overlap intervals. Indeed, the Handler data suggest that there is a large signal at the 0.5 s and -1.2 s points. Figure 16 shows that the model assumes a low and brief signal at -1.2 s.

I would have appreciated having the experimental data from Handler and others illustrated here in the same figure, just to see how well the model forms behave. It would save the reader the step of going to look up another paper and tracking down appropriate figure panels.

5. As one example of a useful parameter sensitivity analysis: The form of the deltaWij seems rather crucial to the model, so I'm homing in on this. It is a difference of two values which are themselves clearly the difference of opposing signals. It would therefore be valuable to show that relaxation of these tight timing requirements does not upset the learning rule and subsequent behaviour.

It would be useful to see similar sensitivity analyses for other key parts of the model.

Clarifications:

6. pg 28: 3 lines from bottom.

Do the authors mean "activity of the ith presynaptic KC"? 'Target' sounds like it is postsynaptic.

7. Equation 30 onward.

w_rest: Is this a global parameter for all synapses?

w_rest: The way it is used in the equation looks more like a_rest, the resting activity of the synapse. Sorry to be pedantic, but the units of weight and rate don't match.

This gets further mixed in the equation between lines 853 and 854, where the authors add k_i and W_ij. Maybe k_i is scaled somehow to weights?

8. Figure 5 and later: The responses, both experimental and model, are shown as an up-down oscillation. I assume that the up states are measurements during the training, and down is measurement half a day later. But this is hard to see from the text or legends, and I had to go down to the last section in the methods to see that this seems to be described as on-shock and off-shock values. It is confusing and should be mentioned in the figure legends and accompanying text.

Reviewer #2 (Recommendations for the authors):

The authors propose an original dopaminergic learning rule, which, when implemented in simple neural circuit motifs shown to exist within the Drosophila mushroom body (MB), can potentially account for a very large number of independent, poorly integrated physiological and behavioural phenomena associated with the mushroom body. It considers multiple behavioural roles of MB output neurons beyond attraction and aversion and offers new insight into how the MB functions in acquisition, consolidation and forgetting of short and long-term memories. They discuss how the motifs and computations discussed would be relevant to other MB functions and altered by known connections not yet included in the simplified model. The manuscript further attempts to show how similar principles could potentially be useful in the mammalian brain. An ambitious and integrative analysis of this sort is sorely needed in the field.

I thank the authors for a very constructive, clear and insightful response to the prior criticism and queries. The manuscript is now hugely improved and can be accepted with no further changes. I think it represents a major contribution to the field. This is a wonderful piece of work that I, at any rate, would recommend to anyone interested in the mushroom body.

Reviewer #3 (Recommendations for the authors):

First, I'd like to thank the authors for responding to my concerns/suggestions. At this point, it reads, in my assessment, much better as a result of the many changes. In particular, the newer figures are of high quality and their stated goals much easier to grasp. Also, shifting most of the discussion of the "formal" model in the (old) Results section to the (new) Methods section makes reading flow more intuitively.

Second, the disagreement we had now appears to be more a matter of the naming/labelling of Equations (18) and (30), thus clarifying the rationale for the naming of the 2 learning rules (DPR and RPE). However, the "RPE" naming for (30) is, in my view, a bit of a stretch, but I am not raising an objection. Just a friendly note to the authors.

I'd like to make a final suggestion that future readers might benefit from. Reviewer 1 raised this issue already and the authors addressed the question. However, in my view, the presentation starting with "we postulate a mathematical formulation …" just above Equation (32) seems a bit circular. While the authors answered the question in terms of intuitive modelling (Equation (34)), the presentation thread I am referring to is rather formal. The D's in Equations (32), (33) are not explicitly defined; the equations, when added up, are consistent with the equation above line 854. While Equation (34) provides the intuition of the decomposition of the weights into 2 terms, this decomposition is by no means unique. Having said that, we are then confronted with Equations (35) and (36). There is little justification given for the rationale of choosing/postulating these two differential equations. I presume that the solutions of these equations are the D's. A careful reading seems to suggest that these are delay differential equations. In mathematical terms, a single delay differential equation is infinite-dimensional, and essentially intractable. The following Equations (37)-(39), while consistent with the discussion above, do not help clarify the matter. Which brings one back to Equations (32), (33). Finally, the Methods section has a sizable number of matrices that have seemingly arbitrary entries.

https://doi.org/10.7554/eLife.75611.sa1

Author response

[Editors’ note: the authors resubmitted a revised version of the paper for consideration. What follows is the authors’ response to the first round of review.]

While all the reviewers appreciate the ambition and value in integrating diverse sources of data to developing a model of learning, they had some substantial concerns. These are elaborated in their detailed comments, and I provide a distillation of the discussion that the reviewers and I had about the paper. Since it will take considerable further work to address these points, the reviewers and I felt that the paper should be rejected. If the authors wish to resubmit after completely addressing the concerns this would be fine.

We are grateful to the editor and all three reviewers for their enthusiasm for our model, and appreciate the detailed suggestions for how to improve this manuscript. We have taken seriously these comments and have performed substantial revisions of this manuscript to address all concerns raised.

1. The reviewers found the paper a difficult read. Could the authors rewrite to make it accessible to a wide range of readership?

We recognize this as a serious issue and have thoroughly rewritten this manuscript. For example, we have moved the derivation of the dopaminergic plasticity rule to the Methods section, and instead now provide a more intuitive description of the model in the Results section. We also updated the text to name the specific mushroom body neurons that we believe form the incentive circuit as we describe each of its components, for ease of comparison. We have updated the behavioural results and methods in order to improve clarity. Several excerpts of this major rewrite are presented below in our responses to each reviewer.

To further improve the accessibility of our manuscript, we have also edited the figures to provide experimental data side-by-side with simulated data to make it easier to compare. Finally, we have changed the way that we illustrate the behaviour of the simulated flies to better show the effects of the plasticity rule and the long-term memory.

2. The formulation of the DLR seems to be a variant of RPE (Reward Prediction Error) learning rules, and hence the conclusions need to be re-evaluated.

Can the authors re-think the basic formulation of DLR starting with Equations (2) and (3)? There should be some experimental tests if the DLR is indeed determined to be different from regular RPE.

We politely disagree that the DLR (now DPR) is a variant of RPE, and have made our explanation of this more clear in the text. For example, “Instead of calculating the error between the reinforcement and its prediction, DPR uses the reinforcement as a driving force to maximise the diversity in the synaptic weights of reinforced experience while reducing it for experiences irrelevant to the reinforcement, which is functionally closer to the information maximisation theory (Bell and Sejnowski, 1995; Lee et al., 1999; Lulham et al., 2011) than the RPE principle.” - lines 103-107.

“Note that this rule [i.e., RPE] allows updates to happen only when the involved KC is active, implying synaptic plasticity even without DAN activation but not without KC activation, which is in contrast with our DPR and recent findings (Berry et al., 2018; Hige et al., 2015) [also in larva (Schleyer et al., 2018, 2020)].” - lines 400-403.

However, we do not exclude the possibility that RPE could be implemented via other mushroom body neurons or connection which are featured in our incentive circuit model.

“… by using the appropriate circuit, i.e., positive MBON-DAN feedback to depressing DANs, our DPR could also have an RPE effect. Although the proposed incentive circuit does not include such connections, it is still possible that they exist.” - lines 484-486.

Finally, we provide a list of testable predictions in Box 1 that includes an experiment to distinguish RPE from DPR, suggested by the combination of the DPR with the incentive circuit.

“By consistently activating one of the LTM MBONs while delivering a specific odour, the LTM MBON should show increased response to that odour even without the use of a reinforcement. This would verify the saturation effect of the DPR and the charging momentum hypothesis. On the other hand, if we observe reduced response rate, this would show that MBON-DAN feedback connection is inhibitory and that RPE is implemented by the circuit.” - lines 644-649.
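The functional contrast we draw between the two rules can be illustrated with a minimal sketch; the scalar form, learning rate, and w_rest value below are illustrative simplifications for exposition, not the exact implementation in the paper.

```python
def dpr_update(w, kc, delta, w_rest=1.0, lr=0.5):
    """Dopaminergic plasticity rule (DPR) sketch: the dopaminergic factor
    `delta` drives the KC>MBON weight towards (kc + w_rest), so plasticity
    requires DAN activity but not KC activity."""
    return w + lr * delta * (kc + w_rest - w)

def rpe_update(w, kc, reinforcement, prediction, lr=0.5):
    """Reward prediction error (RPE) sketch: the update is gated by KC
    activity, so silent KCs are never updated."""
    return w + lr * kc * (reinforcement - prediction)

w0 = 0.2
# Silent KC but active DAN: DPR still changes the weight (recovery towards
# w_rest), whereas RPE leaves it untouched.
assert dpr_update(w0, kc=0.0, delta=1.0) != w0
assert rpe_update(w0, kc=0.0, reinforcement=1.0, prediction=0.0) == w0
```

This captures the distinction quoted above: under DPR, plasticity can occur without KC activation but not without DAN activation, which an RPE rule gated by KC activity forbids.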

3. The microcircuits should be better based on experimental data. From our understanding, the data shown in Figures 4H/G, 5B/C/E/F and 6B/C seems to have been obtained by simulations. Would Ca recordings for these figures be feasible? Can there be stronger justification for the connectivity of the proposed incentive circuit?

Those data shown were indeed generated by simulations that highlighted the effects of the different microcircuits combined with the plasticity rule, and Ca recordings for a subset of these experiments already exists and is reproduced in this manuscript. In order to avoid further confusion, experimental and simulated data are now plotted next to each other in the same figure, and we explicitly state in the figure legends whether each subfigure is simulated or experimental data.

We agree that it is critical that our proposed circuit is strongly grounded in experimental data. Our model was designed through close inspection of a comprehensive dataset of DAN and MBON activity during reversal learning in McCurdy et al. (2021), and an examination of the literature regarding anatomical connections between proposed neurons. Hence there is a strong functional and anatomical basis for our circuit. Additionally, our model allows us to make concrete predictions about how neurons would respond in other learning tasks, such as extinction learning and unpaired shock presentation. We found other papers [e.g., Felsenberg et al. (2018), Berry et al. (2018), Ichinoise et al. (2015)] which provide experimental data for some neurons during some of these tasks, which largely aligns with our model predictions. We believe that the few remaining ‘gaps’ in experimental data can be performed by other labs in the future.

3b. The proposed circuit connectivity of the 'incentive circuit' needs to be defined for each MBON because most contemporary work shows that different kinds of memory involve plasticity in different subsets of MBONs. Can the model make specific testable predictions for each subset of MBONs?

We include in Table 1 a complete list of known neurons that we propose comprise our circuit connectivity (e.g., MBON-γ1pedc>α/β), how we define it in terms of our model (e.g., sat), and the microcircuit in the incentive circuit we propose it is in (e.g., SM). We now also include more discussion of similarities between the proposed properties of our neurons and known properties of memory and plasticity derived from experimental data. For example, we identify MBON-γ1pedc>α/β as an MBON that encodes susceptible memories, which is consistent with how its corresponding DAN, PPL1-γ1pedc, induces a relatively high learning rate and low retention time (Aso and Rubin, 2016).

“Figure 5C-E show the responses of these neurons from experimental data (left) and from our model (right) during aversive conditioning […], which follow a similar pattern. Learning in this circuit is shown by the sharp drop (in both experimental data and model) of the response of MBON-γ1pedc>α/β (Figure 5D) to odour B already from the second trial of the acquisition phase. […] due to our plasticity rule, if the US subsequently occurs without the CS […], the MBON synaptic weights reset due to the recovery effect […]. This is consistent with the high learning rate and low retention time observed in Aso and Rubin (2016), and it results in a memory that is easily created and erased: a ‘susceptible memory’.” - lines 186-197.

On the other hand, PPL1-γ2α’1 and PAM-β'2a keep the balance between the attraction and avoidance STM, which is also consistent with the more balanced learning rate and retention time in PPL1-γ2α’1 found by Aso and Rubin (2016).

“The ‘charging’ DANs, PAM-β'2a and PPL1-γ2α'1, should be activated directly by reinforcement as well as by the restrained MBONs. This allows for memories to be affected directly by the reinforcement, but also by the expression of the opposite valence memories. The latter feature keeps the balance between the memories by automatically erasing a memory when a memory of the opposite valence starts building up and results in the balanced learning rate and retention time as observed in Aso and Rubin (2016).” - lines 233-238.

We have also provided general predictions regarding specific neurons of the incentive circuit in Box 1.

“MBON-γ2α'1 and MBON-γ5β'2a should exhibit short-term memories, while MBON-α'1 and MBON-β2β'2a long-term memories. MBON-γ1pedc>α/β and MBON-γ4>γ1γ2 should exhibit susceptible memories. Restrained and susceptible MBONs should show more consistent responses across flies. LTM MBONs should have more variable responses because they encode all previous experiences of the animal.” - lines 633-637.

“Blocking the output of charging DANs (i.e., PPL1-γ2α'1 and PAM-β'2a) could reduce the acquisition rate of LTM MBONs, while blocking the output of LTM MBONs would prevent memory consolidation. Blocking the reciprocal connections of the circuit should prevent generalising amongst opposing motivations (unable to make short- or long-term alteration of responses to odours once memories have formed). Blocking the output of forgetting DANs would additionally lead to hyper-saturation of LTMs, which could cause inflexible behaviour.” - lines 650-656.

“Activation of the forgetting DANs should depress the KC-MBON synaptic weights of the restrained and LTM MBONs of the same and opposite valence respectively, and as a result suppress their response to KC activation. Activation of the same DANs should cause increased activity of these MBONs for silenced KCs at the time.” - lines 657-660.

4. Further experimental predictions should be made, based on well-parameterized models of the underlying neurons. Can the authors provide considerably more clarity on which sets of behavioral or physiological data are selected by the authors as targets or tests for specific parts of their model?

We have increased the clarity of the predictions resulting from our model by adding a floating box in the discussion (see Box 1), where we summarise a number of specific predictions (some of them mentioned in the previous comment). We also add a column in Table 1 clarifying whether physiological/anatomical (i.e., using light or electron microscopy) or behavioural/functional (i.e., looking at the responses of postsynaptic neurons while manipulating the pre-synaptic ones) data were used in order to validate the connections of the model.

5. Can the model account for existing data showing overlapping conflicting engrams? Additional experiments and simulations may be needed to ascertain this.

This is an interesting example of sophisticated memory mechanisms in the brain, and both experimental data and our model support this phenomenon. Very recent work (Felsenberg et al., 2018; McCurdy et al., 2021) found that conflicting memories can coexist in the fly brain. For example, MBON-γ1pedc>α/β stores the original aversive memory (odour A = avoidance), and does not change its response to odour A despite multiple subsequent presentations of odour A in the absence of shock. Other MBONs, e.g., MBON-γ5β’2a and MBON-γ2α’1 do in fact change their responses to odour A during extinction/reversal. This phenomenon in part formed the basis for our model, thus our model accounts for these phenomena. While we do not have a complete dataset of all relevant neurons and all learning tasks, our model provides predictions of how these neurons would respond, and this can be verified by experimental labs in the future. We now include this in our results:

“From the summarised synaptic weights shown in Figure 11 —figure supplement 1 [equivalent to engrams], we can see that the susceptible MBONs immediately block the simulated flies from approaching the punishing odours [i.e., original aversive memory], while they allow them to approach the rewarding ones, […]. Susceptible MBONs [i.e., MBON-γ1pedc>α/β and MBON-γ4>γ1γ2] convulsively break the balance between attraction and avoidance created by the restrained and LTM MBONs, also affecting their responses, and allowing STM and as a result LTM formation even without the presence of reinforcement. Figure 11 —figure supplement 1 also show that the restrained MBONs [i.e., MBON-γ5β’2a and MBONγ2α’1] seem to play an important role during the first repeats (up to 5), but then they seem to reduce their influence giving up the control to the LTM MBONs [i.e., MBON-α’1 and MBON-β2β’2a], which seem to increase their influence with time. […] [the different types of MBONs / conflicting engrams] seem to better work when combined, as they complement one another in different stages, e.g., during early or late repeats [of the experiment] and crucial times.” - lines 371-387.

Reviewer #1 (Recommendations for the authors):

This ambitious study builds a model of a proposed key circuit motif in fly behaviour and learning, the Incentive Circuit. The authors examine its implications for a variety of behaviours, and perform a thorough circuit-level mapping of model neuronal activity to recordings. The model uses abstracted model neurons and synaptic signaling, but with careful attention to experimental data at many steps. The mapping to experiments is good, and the model makes far-reaching predictions for animal behaviour.

The development of the model is generally well presented. The learning rule is derived from earlier work (Handler et al) and then the authors transform the terms for ER-ca2+ and for cAMP to terms emerging from DA inputs. The model development is especially systematic, building up to the final version step by step with reference to experiments. Importantly, these are mapped to specific sets of experimental observations on the circuit level.

I have mostly comments to clarify or strengthen the presentation.

1. I had a little trouble to envision the two components of D2 and D1. Are they time-varying? Seems to be, see equation 4, where they are presented as D1(t) and D2(t). In other words, do they express D2 and D1 as distinct α functions following spike activity in the DAN? However, in the text and figures it is frequently presented in terms such as D2 > D1 (eg., Figure 3), which looks like a static effect. This was confusing.

We thank the reviewer for this opportunity to clarify our model. D1 (now D) and D2 (now D) are indeed time-varying, i.e., work as a function with time as a parameter, and we do express them as distinct α functions. We agree with the reviewer that there was potential for confusion, because in the main modelling results for the incentive circuit we use a low time resolution such that these effects can be abstracted to be ‘static’ properties of the influence of specific DANs, even though this is ultimately based on the evidence for two components to the response to DA. The influence (positive or negative ‘dopaminergic factor’) on a particular KC-MBON synapse can still be time-varying as it depends on the activity of all DANs targeting this synapse. We have moved the explanation of the two-component DA response to the methods and now focus on the abstracted concept in the main text. We also updated our description that now makes clear the two terms are time-varying.

“This essentially means that D(t) and D(t) are expressed as time-varying functions following DAN spike activity. […] we have the fast update with the high peak for D(t) (0.5 sec for a full update) and a slower update with lower peak for D(t) (1 sec for a full update), …” - lines 873-878.
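The two time-varying components described here can be sketched as α functions; the time constants and peak amplitudes below are illustrative values chosen to match the quoted description (a fast, high-peak component versus a slower, lower-peak one), not the paper's fitted parameters.

```python
import numpy as np

def alpha_fn(t, tau, peak):
    """Alpha function: zero at t = 0, rises to `peak` at t = tau, then decays."""
    t = np.asarray(t, dtype=float)
    return peak * (t / tau) * np.exp(1.0 - t / tau)

t = np.linspace(0.0, 2.0, 401)              # seconds after a DAN spike
d_fast = alpha_fn(t, tau=0.25, peak=1.0)    # fast component: high, brief peak
d_slow = alpha_fn(t, tau=0.50, peak=0.6)    # slow component: lower, later peak

assert d_fast.max() > d_slow.max()                # fast component peaks higher...
assert t[d_fast.argmax()] < t[d_slow.argmax()]    # ...and earlier
```

The dopaminergic factor acting on a given KC>MBON synapse would then be a combination of such traces from all DANs targeting that synapse, which is why it remains time-varying even though the fast/slow distinction can be treated as a static property of each DAN at low time resolution.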

2. Also in Figure 2A, are we seeing the peak values of ER-Ca or the area under the curve? Around line 128 there is a hint that it is the area under the curve, but I am not sure.

It is an approximation of the normalized area under the curve; we refer to it as “the normalised mean change of the synaptic weight” in the manuscript. We have added the paragraph below in order to clarify this now.

“In Figure 16A, we report the normalised mean change of the synaptic weight calculated using the computed ER-Ca²⁺ and cAMP levels and the formula below

ΔW^{k2m}_{ij} ≃ (1/T) Σ_{t=0}^{T−1} [(ER-Ca²⁺)_{ij}(t) − (cAMP)_{ij}(t)]” - lines 864-865 and Equation 34.
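A sketch of how the Equation 34 time average behaves, assuming (per Handler et al.) that cAMP accompanies forward-pairing depression and ER-Ca²⁺ backward-pairing potentiation; the traces below are made-up toy values, not measured signals.

```python
import numpy as np

def mean_weight_change(er_ca, camp):
    """Normalised mean synaptic weight change for one KC>MBON synapse:
    the time average of the ER-Ca2+ trace minus the cAMP trace."""
    return float(np.mean(np.asarray(er_ca) - np.asarray(camp)))

# Forward pairing: cAMP dominates, so the net change is depression (< 0).
assert mean_weight_change([0.1, 0.2, 0.1], [0.8, 0.9, 0.7]) < 0
# Backward pairing: ER-Ca2+ dominates, so the net change is potentiation (> 0).
assert mean_weight_change([0.9, 0.8, 0.9], [0.1, 0.2, 0.1]) > 0
```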

3. I would have liked to have seen some more mapping to functional experiments in the figures up to Figure 7, where the components of the model are being built up. The authors mention several in the text. Even a qualitative look at the experimental responses would help to strengthen the motivation of the model design.

We agree, and have addressed this point by plotting known experimental data side by side with simulated data for ease of comparison in Figures 5, 6, 7, and 8. We also explicitly state in the text the results of experimental studies and how that data corresponds with what is predicted by our model. E.g.,

“Learning in this circuit is shown by the sharp drop (in both experimental data and model) of the response of MBON-γ1pedc>α/β (Figure 5D) to odour B already from the second trial of the acquisition phase. […] due to our plasticity rule, if the US subsequently occurs without the CS […], the MBON synaptic weights reset due to the recovery effect […]. This is consistent with the high learning rate and low retention time observed in Aso and Rubin (2016), and it results in a memory that is easily created and erased: a ‘susceptible memory’.” - lines 189-197.

“the experimental data shows a slight drop in the shock response (first paired with odour B, then with odour A) of the DAN, PPL1-γ1pedc, during the whole experiment, although it remains active throughout. We assume this drop may reflect a sensory adaptation to shock but have not included it in our model.” - lines 201-204.

“Interestingly, the [neural] responses communicated by the MB296B1 terminal are close to the ones produced by the punishment-encoding charging DAN (see Figure 6C) and the ones of the MB296B2 are close to the ones produced by the attraction-driving forgetting DAN (see Figure 8D).” - lines 300-303.

4. The authors then utilize this circuit in an aversive olfactory conditioning paradigm, for which they provide experimental data corresponding to the various neuron types. They then simulate this. This is an outstanding way to validate/test their model. It would be helpful to have the experimental and simulated responses interleaved on the same figure so as to better compare.

We agree and we now plot the experimental and simulated responses side-by-side on the same figure for ease of comparison.

5. I appreciate that it is quite challenging for a simulation to simultaneously replicate properties of several intermediate stages of circuit activity, even more so when the stimulus is not one that the model has been trained on. Could the authors confirm that this is indeed the case, i.e., that the model outcome for Figure 9 was obtained only from the parameter tuning earlier in the paper, up to Figure 7?

That is correct: the parameters of the model are the same for the whole manuscript. The only parameter that was different was the LTM changing synaptic weight (c->m) in the microcircuits description, and this was just to exaggerate the long-term memory effect and make it more obvious to the reader. As these figures are omitted in our new version of the manuscript, now the parameters are the same for all the results, and we explicitly confirm this in the Methods.

6. It would be useful to perform a statistical evaluation of the fidelity of the model as compared to experiment.

That’s an excellent idea that we now address with Figure 3 —figure supplement 1. We plotted the correlation between behavioural data predicted by our model and experimentally derived behavioural data from 92 experiments extracted from different studies by Bennett et al. (2021), and found a very strong positive correlation: r = 0.76, p = 2.2 × 10⁻¹⁸ (selected neurons) and r = 0.77, p = 2.2 × 10⁻¹⁹ (best-fit neurons).
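For transparency, the metric used here is the standard Pearson correlation coefficient. The following is a minimal, self-contained sketch of that computation; the example values are illustrative only (not the 92-experiment dataset from Bennett et al., 2021), and in practice a library routine such as scipy.stats.pearsonr would be used:

```python
import math

def pearson_r(xs, ys):
    # Pearson correlation between model-predicted and observed values.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Illustrative values only, not the actual experimental data.
predicted = [0.9, -0.4, 0.1, 0.7, -0.8]
observed = [0.8, -0.5, 0.2, 0.6, -0.7]
assert pearson_r(predicted, observed) > 0.9  # strong positive correlation
```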

7. The authors then place their model flies in a virtual arena and explore a number of behaviours. Here they contrast their model behaviour with the predictions from a different learning rule, reward prediction error (RPE). I would have liked to have seen in Figure 11 an illustration of the correspondence to experimental observations from the literature.

This comment inspired us to perform the additional analysis summarized in Figure 3 —figure supplement 1. In this figure, we compare the correlation between our model and experimental data with the correlation between RPE and experimental data, and find that our model performs better than the other models. Pearson’s correlations and p values for DPR, RPE and the model presented in Bennett et al. (2021), respectively: r_DPR = 0.77, p_DPR = 1.65 × 10⁻¹⁹; r_RPE = 0.58, p_RPE = 2.32 × 10⁻⁹; and r_Bennett = 0.68, p_Bennett < 10⁻⁴.

In addition, we provide qualitative evidence of correspondence between our simulated behavior and experimentally-derived behavioural data. For example,

“By looking at the PIs of Figure 11B, we see a strong effect when electric shock is paired with odour A or B, but not very strong otherwise. We also see a smaller PI for flies experiencing sugar than the ones that experience electric shock, which is in line with experimental data (Krashes and Waddell, 2011a, b).” - lines 357-360.

Reviewer #2 (Recommendations for the authors):

The manuscript in its current form is built around two main threads. In the first thread, the authors review several results in the literature on associative learning in the mushroom body of the adult fruit fly, and construct an Incentive Circuit (IC) consisting of 6 dopaminergic and 6 mushroom body neurons with specific memory dynamics. They provide a coherent functional view of some of the disparate recent results in associative learning of the mushroom body.

The second thread incorporates a Dopaminergic Learning Rule (DLR) into the IC computational model, providing a computational system for evaluating the learning mechanisms involved.

A weakness here is that the acquisition, forgetting and assimilation of memories qualitatively described in the first thread are not strongly linked with the quantitative IC model described in the second thread.

Conversely, the validation of the IC model circuit, given the noisy data that the authors provide, is only possible in terms of trends, i.e., simple visual inspection. Interpreting the data then is difficult as it does not provide enough constraints for the computational model.

Given the limitations inherent in the validation of the IC from their recorded data, the authors proceed to explore the DLR using behavioural experiments purely based on simulations. This is an effective methodology widely employed in, e.g., robotics. The authors extensively compare the 'learning/navigation' performance of the DLR with a variant of the reward prediction error (RPE) learning rule and demonstrate a better learning performance. While the comparison may be compelling, we found that underlying the DLR is the computation of a prediction error, i.e., the DLR is a variant of RPE. This calls for a re-evaluation, repositioning and clarification of some of the key conclusions regarding why the DLR is effective in associative learning tasks.

Substantive concerns

1. l.128 The section 'Mushroom Body Microcircuits' makes good first reading. However, most of the key statements could further benefit from more extensive quantitative backing as hinted at in Figures 4, 5 and 6 (see also my comment below). Since these microcircuits are simpler than the IC, my expectation is that they could provide better intuition regarding their function.

This is an excellent point. We now plot the quantitative experimental data side-by-side with the simulated data for ease of comparison. We also include the corresponding neuron name alongside the model neuron’s name, for better intuition, as the reviewer suggested.

“Learning in this circuit is shown by the sharp drop (in both experimental data and model) of the response of MBON-γ1pedc>α/β (Figure 5D) to odour B already from the second trial of the acquisition phase. […] due to our plasticity rule, if the US subsequently occurs without the CS […], the MBON synaptic weights reset due to the recovery effect […]. This is consistent with the high learning rate and low retention time observed in Aso and Rubin. (2016), and it results in a memory that is easily created and erased: a ‘susceptible memory’.” - lines 189-197.

“the experimental data shows a slight drop in the shock response (first paired with odour B, then with odour A) of the DAN, PPL1-γ1pedc, during the whole experiment, although it remains active throughout. We assume this drop may reflect a sensory adaptation to shock but have not included it in our model.” - lines 201-204.

“Interestingly, the [neural] responses communicated by the MB296B1 terminal are close to the ones produced by the punishment-encoding charging DAN (see Figure 6C) and the ones of the MB296B2 are close to the ones produced by the attraction-driving forgetting DAN (see Figure 8D).” - lines 300-303.

2. Figures 4F and 4G are rather difficult to understand/parse. More caption details, choice of different colors, would help.

Same comment regarding Figures 5B, 5C, 5E and 5F, and 6B, 6C.

Based on this comment and similar sentiments expressed by other reviewers, we have now removed these figures. Instead, we made new figures (i.e., Figures 5, 6, 7, and 8) which plot known experimental data side by side with simulated data for ease of comparison.

3. While Figure 8 is to be commended, the data is rather noisy and, in my view, despite the best intentions, rather difficult to understand/evaluate. As the authors argue in l.312, 'we computationally modelled the incentive circuit in order to demonstrate all the properties we described before and compare the reconstructed activities to the ones of Figure 8C'. However, a comparison by simple visual inspection is rather unconvincing. Introducing a distance measure is in order.

Although we do not incorporate a distance measure, we have made two major changes to address this issue: First, we plot the experimental data next to the simulated data so that readers can perform visual inspection more easily. Second, we now provide explicit descriptions of the level of similarity between recorded and simulated data for each neuron. Overall, there is a large degree of overlap for the majority of neurons, e.g., PPL1-γ1pedc, both PPL1-γ2α’1, PAM-β’2, MBON-γ1pedc>α/β, MBON-γ5β’2a, MBON-γ2α’1 and MBON-β2β’2a. However, it is interesting that some neurons, e.g., MBON-α’1, fit less well. We discuss this in the text and provide possible reasons why these neurons in particular do not fit as well.

“Figure 5C-E show the responses of these neurons from experimental data (left) and from our model (right) during aversive conditioning (the paradigm shown in Figure 4), which seem to follow similar patterns.” - lines 186-188.

“However, these trends are not evident in the experimental data as illustrated in Figure 7D (left). We suggest this is because responses of long-term memory neurons depend on the overall experience of the animal and are thus hard to predict during one experiment. For example, it could be the case that the animal has already built some long-term avoidance memory for odour A, such that its presentation without reinforcement in our experiment continues its learning momentum leading to the observed increasing response.” - lines 266-272.

4. I found 'modeling behavior', as presented in the current version of the manuscript, to be quite effective. However, I'd like to note that in the process, the authors changed the underlying PN activity model. This requires, given that the rest of the paper is based on a binary odor model of the PN activity (see the discussion preceding Equation (6)), some careful/detailed assessment of its implications.

This is a valid point. For consistency we reran our simulations in Figure 11 using the same PN activity parameters (binary, using a threshold on odour intensity) as those used in the earlier figures. These new results are comparable to the previous version, and still support our conclusions. We have now included the specifics of this model here:

“Note that PN responses depend only on whether an odour has been detected or not, and are not proportional to the detected intensity.” - lines 812-813.
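In code terms, the binary PN model quoted above amounts to a simple threshold on odour intensity (the threshold value below is illustrative, not a fitted parameter of the model):

```python
def pn_response(intensity, threshold=0.5):
    # A PN is either active (1) or silent (0): the response signals that an
    # odour was detected, not how intense it is.
    return 1.0 if intensity >= threshold else 0.0

# A weak and a strong supra-threshold odour evoke the same PN response.
assert pn_response(0.6) == pn_response(10.0) == 1.0
assert pn_response(0.1) == 0.0
```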

5. Finally, the authors propose to compare their DLR with a variant of RPE. Here a major conceptual problem arises. The authors argue that DLR is a fundamentally different learning rule from RPE. They state in l.462 that 'The idea behind RPE is that the fly learns to predict how rewarding or punishing a stimulus is by altering its prediction when this does not match the actual reward or punishment experience'. This can be adapted to the mushroom body circuit by assuming that the MBON output provides a prediction of DAN activity. But this is exactly what Equation (18) states. The differential equation (18) describing the gradient of the DAN activity is equal to the sum of the weighted shock delivery ('transform' in l.750) and the weighted MBON activity (l.755). The sum is just the prediction error between the two terms. Consequently, since the DLR is, in the view of this reviewer, a variant of RPE, a comparison with another RPE is of little interest. A substantial re-write of the paper starting with the section on the Incentive Circuit (l. 257) is in order.

We believe we all agree that our DPR (with the simplest circuit implementation as shown in Figure 2) is definitely not a variant of the RPE.

“Instead of calculating the error between the reinforcement and its prediction, DPR uses the reinforcement as a driving force to maximise the diversity in the synaptic weights of reinforced experience while reducing it for experiences irrelevant to the reinforcement, which is functionally closer to the information maximisation theory (Bell et al., 1995; Lee et al., 1999; Lulham et al., 2011) than the RPE principle.” - lines 103-107.

The reviewer is right that (in the incentive circuit) DAN responses are indeed calculated based on the weighted US plus the weighted MBON activity. However, as the DAN and US responses are always positive numbers, and the synaptic weights are also positive, this is not (in general) the calculation of an error. An exception is the s->d connection, which is inhibitory. However, even in this case, although the term could be interpreted as an error calculation, only the positive part of the DAN activity is used in the learning rule (see Equation 22), which means that the MBON activity (indirectly passed through the DAN activity) only controls the magnitude and not the ‘direction’ of change for the synaptic weight. It is thus clearly different from RPE methods, which control both the magnitude and direction of change based on the error computed between the reinforcement and its ‘prediction’.

“Consequently, the model data shows a positive feedback effect: the DAN causes depression of the MBON response to odour, reducing inhibition of the DAN, which increases its response, causing even further depression in the MBON. Note this is opposite to the expected effects of reward prediction error.” - lines 205-208.
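To make this argument concrete, here is a minimal numerical sketch (with hypothetical weights and learning rate, not the fitted model parameters): because the DAN activity is rectified before entering the learning rule, inhibitory MBON feedback can shrink or silence the weight update, but it can never reverse its sign as an RPE rule would.

```python
W_US2DAN = 1.0     # shock -> DAN weight (excitatory); hypothetical value
W_MBON2DAN = -0.5  # MBON -> DAN feedback (inhibitory, as for s->d); hypothetical

def dan_response(us, mbon):
    # DAN activity: weighted US input plus weighted MBON feedback (cf. Equation 18).
    return W_US2DAN * us + W_MBON2DAN * mbon

def weight_update(dan, dopaminergic_factor=-1.0, lr=0.1):
    # Only the rectified (positive) part of the DAN activity drives plasticity
    # (cf. Equation 22): the MBON feedback scales the magnitude of the update,
    # while its direction is fixed by the dopaminergic factor.
    return lr * max(dan, 0.0) * dopaminergic_factor

strong = weight_update(dan_response(us=1.0, mbon=1.0))  # strong inhibitory feedback
weak = weight_update(dan_response(us=1.0, mbon=0.2))    # weak inhibitory feedback
assert strong < 0 and weak < 0  # both updates are depressing
assert abs(strong) < abs(weak)  # feedback changes the magnitude only
# Even when the feedback exceeds the reinforcement, the update is silenced,
# not reversed -- unlike a reward prediction error rule.
assert weight_update(dan_response(us=1.0, mbon=3.0)) == 0.0
```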

That said, we mention in our Discussion the possibility that RPE could be implemented by other neurons or connections of the mushroom body (not included in the incentive circuit):

“However, although the evidence for MBON-DAN feedback connections is well-grounded, it is less clear that they are consistently opposing. For example, in the microcircuits we have described, based on neurophysiological evidence, some DANs that depress synaptic weights receive inhibitory feedback from MBONs (Pavlowsky et al., 2018) and some DANs that potentiate synaptic weights receive excitatory feedback from MBONs (Ichinose et al., 2015). As we have shown, the DPR is able to operate with this variety of MBON-DAN connections. Note that, by using the appropriate circuit, i.e., positive MBON-DAN feedback to depressing DANs, our DPR could also have an RPE effect. Although the proposed incentive circuit does not include such connections, it is still possible that they exist.” - lines 478-486.

6. l.765: "The above matrices summarise the excitatory (positive) and inhibitory (negative) connections between MBONs and DANs or other MBONs. The magnitude of the weights was hand-tuned in order to get the desired result." This 'hand-tuning' appears, to me, to be a 'construction' of the prediction error on the right hand side of Equation (18). Some details might help clarify to what extent the hand-tuning is based on the assumptions of the binary model of the 2 odors at the PN level. I presume that the generality of the model alluded to in l.743, stating that 'the number of neurons we are using for PNs and KCs is not very important and we could use any combination of PN and KC populations', breaks down and the hand-tuning needs to be repeated every time the number of neurons is changed.

We have now made clearer in the methods that the ‘hand-tuning’ of weight magnitude does not permit alteration of the sign of the weights, and it does not result in an effective construction of a prediction error, as detailed in our previous answer. The tuning is used to create a better match between the recorded and reconstructed responses in Figures 5-9, and to keep the balance of memories in the circuit, e.g., the MAM (memory assimilation mechanism) forgetting should be weighted equally to the LTM charging, so that we erase from the STM the same amount as we store in the LTM; it is independent of the PN activity pattern. The weights are not further changed for the remainder of the results. Finally, by ‘hand-tuning’ we want to emphasise that we have not used any automatic, unconstrained method to calculate the weights in order to fit the data better. We have now edited the text to reflect this:

“… we define these parameters and some properties of our computational model, which are not a result of unconstrained optimisation and are consistent throughout all our experiments.” - lines 705-706.

“The sign of the weights was fixed but the magnitude of the weights was hand-tuned in order to get the desired result, given the constraint that equivalent types of connections should have the same weight (e.g., in the reciprocal microcircuits). The magnitudes of the synaptic weights specify the effective strength of each of the described microcircuits in the overall circuit.” - lines 743-746.

Reviewer #3 (Recommendations for the authors):

The authors propose an original dopaminergic learning rule which, when implemented in simple neural circuit motifs shown to exist within the Drosophila mushroom body (MB), can potentially account for a very large number of independent, poorly integrated physiological and behavioural phenomena associated with the mushroom body. It considers multiple behavioural roles of MB output neurons beyond attraction and aversion and offers new insight into how the MB functions in acquisition, consolidation and forgetting of memories. The manuscript further attempts to show how similar principles could potentially be useful in the mammalian brain. An ambitious and integrative analysis of this sort is sorely needed in the field.

The paper has obviously involved very broad and deep consideration of the MB connectome as well as genetic, physiological and behavioural studies of the roles of the different classes of Kenyon cells, MBONs and DANs that innervate the mushroom body. It is original and ambitious and potentially very valuable to the field.

My major reservation is that the manuscript is very difficult to read and evaluate by anyone who is not a Drosophila mushroom body aficionado. I consider myself an interested reader and one who keeps broad track of the field, but I found I needed to read and evaluate far too many of the papers cited by the authors to decide how well the phenomena the authors attempt to model have been demonstrated and how well the assumptions made by the authors are justified by data.

1. E.g. I was stymied even at figure 1, where mutual inhibition between MBONs is indicated and it took me considerable (and eventually futile) effort to look into where and how well this has been established.

In Figure 1 we meant to demonstrate that MBON-to-MBON connections exist in the mushroom bodies, and it was not our intention to suggest mutual inhibitory connections. We have changed the lines of these connections to dashed, so that they look different from the rest of the connections. We have also updated the caption of Figure 1 to make this clear.

“These circuits include some direct (but not mutual) MBON-MBON connections (dashed inhibitory connections).” – Figure 1.

2. To make the work more accessible at least to this moderately educated reviewer, I fear that a major re-write will be required. I would suggest that for each section, exactly what has been shown be clearly enumerated, with enough detail provided for the reader to judge the strength of these data. The justification and support for the three types of MBONs and their incentive roles should also be particularly clearly indicated.

We have undertaken a major rewriting and we hope the manuscript is now easier to process, including for those less familiar with the Drosophila mushroom body. This includes more explicit connection of each part of the circuit construction to the relevant data.

3. Moreover, while the authors are correct to point out the limitations of current models based on dopamine prediction-error, I do wonder if there is room for prediction error to also contribute meaningfully within the framework proposed in this paper.

Indeed, we believe that there is still room for RPE in the fly brain, that could also be implemented by our DPR given specific circuitry as we discuss in our text:

“… this rule [i.e., DPR], in combination with some specific types of circuits, can result in prediction of reinforcements, …” - lines 107-108.

“… by using the appropriate circuit, i.e., positive MBON-DAN feedback to depressing DANs, our DPR could also have an RPE effect. Although the proposed incentive circuit does not include such connections, it is still possible that they exist.” - lines 484-486.

However, we also believe it is not a general (or necessary) property of plasticity in the mushroom body, as we illustrate in our incentive circuit.

I apologise for not having a list of specific issues for the authors to address, because I found the underlying basis so difficult to explore, but here is some general feedback.

4. It is nice that the dynamics of neural responses obtained with the model correspond closely with ones reported in previous studies (although there are exceptions, some nicely highlighted by the authors).

We thank the reviewer and we are happy that they see the value of our work.

5. There should be deeper engagement with signalling mechanisms that differentiate the two types of dopamine receptors. I found the assumptions regarding their differences to be useful for the modelling of different effects of reinforcement before or after sensory experience (Ruta Cell 2019), but quite superficial in terms of providing a hypothesis for how the receptors may differ in terms of mechanism of action.

D1 and D2 (now the depression and potentiation components D⁻ and D⁺, respectively) are not necessarily meant to be DopR1 and DopR2 responses.

These are 2 abstract terms/components of the dopaminergic signal that interact in the synapse and might be related to DopR1 and DopR2, but they are not the same. We hope that this is clearer now in our text:

“… where D_j⁻(t) and D_j⁺(t) are the depression and potentiation components of the DA respectively [assumed to correspond to DopR1 and DopR2 receptors (Handler et al., 2019), or potentially to involve cotransmitters released by the DAN such as nitric oxide (Aso et al., 2019)].” - lines 845-847.

6. On the same note, specific experimental predictions of the model could also be clearly indicated at the end of each section.

We have now added a floating box (Box 1) with specific experimental predictions of the model. Some examples of these predictions include: (a) the roles of the different DANs and MBONs in the memory dynamics of fruit flies, (b) how the activity of specific neurons would be affected by manipulating the activity of other neurons in the mushroom body, and (c) the effects of manipulating these neurons in different conditioning paradigms (e.g., first-order, second-order and unpaired conditioning).

7. While the authors admittedly designed informative and clear figures, and their Table 1 points the reader to papers that report relevant neural connections and neuronal functions, this is not enough. Data in support of each assumption should be clearly and specifically mentioned, and hypothesised connections also clearly stated. After considerable effort, I could still find no evidence for the existence of inhibitory connections between MBONγ4 and MBONγ2 (which is not to say that none exist – but surely it is the authors’ job to clarify this).

We have updated Table 1 to make more clear what information about neural connections is known versus hypothesized. For each connection, we denote whether its anatomical connection (using light microscopy or electron microscopy) or functional connection (i.e. whether activating the presynaptic neuron leads to an excitatory or inhibitory response in the postsynaptic neuron, and/or the neurotransmitter released by the presynaptic neuron) is known.

Regarding the inhibitory connection between MBON-γ4 and MBON-γ2, we assume that the reviewer refers to the depressing dopaminergic effect of PAM-04 (i.e., PAM-β2β’2a, f_av) on the KC-MBON synapses of MBON-02 (i.e., MBON-β2β’2a, m_at) in the reciprocal LTM microcircuit. We based our assumption that this effect exists on Aso et al. (2014) and Li et al. (2020), who show that specific MBONs that extend their dendrites into compartments where specific DANs terminate their axons are affected by the dopamine these DANs emit. However, in most cases it is unclear whether the effect of this dopamine is potentiating or depressing, which we try to infer by using the data from McCurdy et al. (2021). Exceptions are the microcircuits described by Pavlowsky et al. (2018), Felsenberg et al. (2018), McCurdy et al. (2021) and Ichinose et al. (2015), who experimentally show the sign of the dopamine effect on the target synapses of the specific MBONs, which we take into account and use as-is in the model. The rest of the effects are postulated either from the symmetry of the circuit or from the logic of the desired function.

8. The authors should also try to account for the discovery of parallel, independent memory traces (like appetitive LTM formation towards the CS- in classic LTM aversive training paradigms).

We agree with the reviewer that this is an important phenomenon and it should be addressed. There are multiple ways that parallel memories are built in our model: first, for memories of the same odour (transmitted by the same KC population), and second, for individual odours (transmitted by different KC populations). We think it is now clear in our manuscript that independent memory traces are formed in the susceptible MBONs (as the activity of one does not depend on the other; they are not connected in any way), while STM and LTM MBONs store dependent memories (as they are connected reciprocally and build dependencies). We mention this here:

“The restrained MBONs activate their respective ‘charging’ DANs, which start to potentiate the ‘LTM’ MBONs of same valence, while also depressing the response (to KC input) of the restrained MBON of opposite valence.” - lines 159-161.

“… the susceptible MBONs immediately block the simulated flies from approaching the punishing odours, while they allow them to approach the rewarding ones, […]. This is partially because of the lack of reciprocal connections between the opposing susceptible MBONs, and it can be verified through the appetitive conditioning, […]. Susceptible MBONs convulsively break the balance between attraction and avoidance created by the restrained and LTM MBONs, …” - lines 372-378.

On the other hand, memories associated with different odour identities are formed in parallel through the different populations of KCs (i.e., their connections to MBONs). Although these memories are in principle independent, they can become dependent if the KC populations of two odours overlap.

“… our results show that (in time) the simulated flies seem to develop some prior knowledge about both odours when they have experienced at least one of them with reinforcement (see Figure 11B and Figure 11 —figure supplement 2A), which we suggest is because of the overlapping KCs associated with both odours.” - lines 363-366.

9. Does the dopaminergic learning rule explain the differences in dynamics and memory strength between appetitive and aversive memories? These two types of memory involved different molecular components and display different learning rules (stronger short-term aversive memories and longer-lasting appetitive memories requiring less training)? This should perhaps be clarified, particularly since KC output appears dispensable for aversive learning (acquisition) but potentially necessary for the acquisition of appetitive memories (Pribbenow et al., 2021).

That’s an excellent question! Indeed, the DPR produced similar findings in our simulations of behavioural experiments, in terms of dynamics and memory strength between appetitive and aversive memories. Our (simulated) behavioural experiments show that this difference in dynamics and memory strength between appetitive and aversive memories is a result of the behaviour itself and has nothing to do with the plasticity rule or the circuit. Specifically, although the DPR and IC are characterised by complete symmetry, the fact that flies attracted to an odour tend to spend more time experiencing it, while flies avoiding an odour tend to spend less time experiencing it, produces this difference in the learning outcome. So we predict that the mechanism that handles both cases is exactly the same, but a more naturalistic condition is needed in order to see this effect. This is now highlighted in our manuscript:

“We […] see a smaller PI for flies experiencing sugar than the ones that experience electric shock, which is in line with experimental data (Krashes and Waddell, 2011a,b). When shock is paired with both odours we expect that the simulated flies will try to minimise the time spent exposed to any of them […]. In contrast, simulated flies seem to increase the time spent in both odours when paired with sugar, with a slight preference towards the reinforced odour.” - lines 358-363.

10. I found the easy assumption that forgetting involves erasure to be troubling. Perhaps this happens sometimes. But many apparently "forgotten" memories are never erased, simply not reactivated for multiple reasons. Intellectually this point needs to be acknowledged.

We thank the reviewer for the opportunity to refine our wording. As the reviewer points out, there are multiple neural mechanisms that could lead to the behavioral manifestation of a “forgotten” memory, e.g., that the fly no longer avoids an odour previously paired with aversive stimuli. In some instances, the original aversive memory undergoes decay over time (e.g., susceptible MBONs during extinction and unpaired learning).

“… due to our plasticity rule, if the US subsequently occurs without the CS (see unpaired phase in the model, for which we do not have fly data), the MBON synaptic weights reset due to the recovery effect…” - lines 193-195.

In some cases (e.g., restrained and LTM MBONs), it remains intact but competes with a new parallel memory formed when the odour is presented without electric shock, as in extinction or reversal learning.

“The response of MBON-γ5β'2a (Figure 5E) can be observed to have the opposite pattern [to the MBONγ1pedc>α/β], i.e., it starts to respond to odour B from the second trial of acquisition as it is no longer ‘restrained’. Note however that the response it expresses, when the restraint is removed, also depends on its own synaptic weights for KC input, which as we will see, may be affected by other elements in the incentive circuit.” - lines 197-201.

In our model, although memories in the susceptible and restrained MBONs are constantly updated, LTM MBONs integrate these memories and save them for a long time through saturation.

“Figure 7D (right) demonstrates the charging of the avoidance-driving LTM MBON during the acquisition (for odour B) and its continued increase during the forgetting phases.” - lines 265-266.

However, even when the memories in the LTM MBONs are weakened (e.g., due to the reciprocal LTM connections), we suggest that they are further assimilated by higher level LTMs in the vertical lobes of the MB, but this is not part of our circuit and needs further investigation.

“… we predict that the function of the cingulate cortex is represented by the α/β MBONs, encoding the ‘emotions’ of the animal towards reinforced stimuli, potentially controlling more sophisticated decision making.” - lines 691-694.

[Editors’ note: what follows is the authors’ response to the second round of review.]

The manuscript has been improved but there are some remaining issues that need to be addressed, as outlined below:

1. Could the authors compare their simulated/predicted behavior with some quantitative or semi-quantitative measures of experimental behavior?

We recognise the need for quantitative comparison between our results and the literature. For this reason, in our behavioural Results section, we highlight more clearly the results reported in our Figure 3 —figure supplement 1, showing (using distillation to a common metric) a high correlation between the behaviour produced by our model and data from 92 classical conditioning experiments.

“Following this approach and using the summarised data collected by Bennett et al. (2021), we have tested the performance of our model in 92 olfactory classical conditioning intervention experiments from 14 studies (Felsenberg et al., 2017; Perisse et al., 2016; Aso and Rubin, 2016; Yamagata et al., 2016; Ichinose et al., 2015; Huetteroth et al., 2015; Owald et al., 2015; Aso et al., 2014b; Lin et al., 2014; Plaçais et al., 2013; Burke et al., 2012; Liu et al., 2012; Aso et al., 2010; Claridge-Chang et al., 2009), i.e., the observed effects on fly learning of silencing or activating specific neurons, including positive and negative reinforcements. The Δf predicted from the incentive circuit correlated with the one reported from the actual experiments with correlation coefficient r = 0.76, p = 2.2 × 10⁻¹⁸ (Figure 3 —figure supplement 1).” – lines 333-342.

2. Can the authors elaborate on their mapping of ER-Ca²⁺ and cAMP in the model with the cited data? This relates to point 4 from Reviewer 1.

We also noticed that the individual traces of ER-Ca²⁺ and cAMP (Figure 16B) do not match exactly the data from Handler et al. (2019). However, Figure 16A (and B – ΔW, i.e., the black line) shows that their combined effect is very similar to the one presented in the original paper (Pearson correlation: r = 0.98, p = 3.9 × 10⁻⁴). To allow direct comparison, we now plot the modelled ER-Ca²⁺ and cAMP on top of the data from Handler et al. (2019) in Figures 16A and B (grey lines). Note that we do not claim to model the exact ER-Ca²⁺ and cAMP levels, and we hope that this is now clear in the text.

“Figure 16 shows the ER-Ca2+ and cAMP levels during forward and backward conditioning for a depressing DAN […], which are comparable to the data shown in Handler et al., (2019) (also Figure 16 – shown in grey). Note that here we are more interested in the overall effects of learning shown in Figure 16A rather than the detailed responses of Figure 16B.” – lines 869-873.

3. Can the authors do some parameter sensitivity analysis as suggested by the reviewers?

We added an extensive search/analysis of the timing parameters (i.e., τshort and τlong) of the plasticity rule, in which we evaluate the correlation between the data and the effect of our equation (i.e., Figure 16A). We have now added Figure 16 —figure supplement 2 showing the results of this analysis.

“In Figure 16, where we are interested in more detailed dynamics of the plasticity rule, and the sampling frequency is high, i.e., 100 Hz, we use τshort = 60 and τlong = 10⁴, which we choose after a parameter exploration available in Figure 16 —figure supplement 2” – lines 893-895.

Regarding the circuit, the synaptic strengths are hand-tuned in order to make the plots in Figures 5-8 (at least visually) match. A parameter analysis (in the same way that we did it for the plasticity rule, i.e., comparing the reproduced responses to the data using a standard measure, e.g., the Pearson correlation coefficient) is less effective and harder to perform for these parameters, as each connection affects the responses of many neurons in the circuit. Instead, we have created Figure 14 —figure supplements 1, 2 and 3, which show how the responses of the neurons in the circuit change as one parameter is varied at a time.

“Figure 14 —figure supplement 1, Figure 14 —figure supplement 2 and Figure 14 —figure supplement 3 show how each of these parameters affect the responses of the neurons in the incentive circuit.” – lines 769-771.

In addition, the reviewers had a few points for the authors to expand upon in the revision, and a number of useful suggestions to improve clarity.

Reviewer #1 (Recommendations for the authors):

This is an ambitious but also highly complicated modeling study that seeks to account for a wide range of fly learning behaviour in terms of underlying learning rules and circuitry.

The strengths of the study are its ambition, detail, and substantial attention to experimental inputs. In principle it builds up a large and testable conceptual framework for understanding many aspects of learning. Its weaknesses, which are readily fixed, are (1) that the study misses opportunities to better compare the model to experiments, and (2) that the study doesn't do a systematic parameter and model exploration to see how robust its properties are.

With these additions the study would be strong and of value to the field in laying out a template for further investigation. The authors posit that this framework could also apply to other organisms.

General points:

1. This is an ambitious but also highly complicated modeling study that seeks to account for a wide range of fly learning behaviour in terms of underlying learning rules and circuitry. The authors have made substantial improvements to the clarity of the presentation, particularly with regards to comparison of experimental and simulated data.

I would have liked to see similar comparison for two more features: the behaviour, and the crucial learning rule section, as I comment below. I note that a similar request was made in an earlier review.

The reviewer has expanded on this issue in points 3 and 4 and we respond there.

2. The other big thing I would have liked to see is an exploration of parameter sensitivity. This is needed both because of model complexity and because of the not-perfect match between model and input data. No model is perfect, but the confidence in a model is much improved if one can see that it still 'works' even when the numbers (and other assumptions) shift around a bit.

The reviewer has expanded on this issue in point 5 and we respond there.

3. Behaviour: The authors have made the interesting and potentially powerful step of linking their model to measurable behaviour. But they miss the opportunity to put the outcomes (experiment and model) side by side. Even a semi-quantitative distillation to some common metric for displaying and comparing the experimental and model properties would have been valuable.

We recognise the need for quantitative comparison between our results and the literature. For this reason, in our behavioural Results section, we highlight more clearly the results reported in our Figure 3 —figure supplement 1, which shows (using distillation to a common metric) a high correlation between the behaviour produced by our model and data from 92 classical conditioning experiments.

“Following this approach and using the summarised data collected by Bennett et al. (2021), we have tested the performance of our model in 92 olfactory classical conditioning intervention experiments from 14 studies (Felsenberg et al., 2017; Perisse et al., 2016; Aso and Rubin, 2016; Yamagata et al., 2016; Ichinose et al., 2015; Huetteroth et al., 2015; Owald et al., 2015; Aso et al., 2014b; Lin et al., 2014; Plaçais et al., 2013; Burke et al., 2012; Liu et al., 2012; Aso et al., 2010; Claridge-Chang et al., 2009), i.e., the observed effects on fly learning of silencing or activating specific neurons, including positive and negative reinforcements. The Δf predicted from the incentive circuit correlated with the one reported from the actual experiments with correlation coefficient r = 0.76, p = 2.2 × 10⁻¹⁸ (Figure 3 —figure supplement 1).” – lines 333-342.
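As an aside for readers who want to reproduce this kind of comparison: the metric reduces to a Pearson correlation between model-predicted and experimentally observed Δf values across interventions. A minimal sketch of that computation (the numbers here are made up for illustration; they are not the actual 92-experiment dataset):

```python
# Sketch of the comparison metric: Pearson correlation between the
# model-predicted and experimentally observed changes in preference
# index (Δf). The values below are hypothetical placeholders.
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical predicted vs observed Δf for a handful of interventions.
predicted = [0.5, -0.3, 0.8, 0.1, -0.6]
observed = [0.4, -0.2, 0.9, 0.0, -0.5]

r = pearson_r(predicted, observed)  # close to 1 when the model tracks the data
```

In the actual analysis the two lists would hold one entry per intervention experiment, and the reported r = 0.76 comes from the full set of 92 such pairs.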

4. Figure 16: Details of ER-CA and cAMP in the model don't match data. The form of the pairing for ER Ca is inconsistent with the data of Handler et al., particularly when CS precedes US by a large interval. Handler et al. show no response for forward pairing even several seconds after the last stimulus. Also, the time-course of ER response for the backward pairing case is inconsistent. In the Handler data (Figure 6) the ER signal remains low (i.e, very different from baseline) well past 5 seconds, whereas in Figure 16 the signal returns to baseline within 5s. I am also concerned that there doesn't seem to be experimental support for the reduced cAMP signal at very small overlap intervals. Indeed, the Handler data suggests that there is a large signal at the 0.5s and -1.2s points. Figure 16 shows that the model assumes a low and brief signal at -1.2s. I would have appreciated having the experimental data from Handler and others illustrated here in the same figure, just to see how well the model forms behave. It would save the reader the step of going to look up another paper and tracking down appropriate figure panels.

We agree with the reviewer that the individual traces of ER-Ca and cAMP do not match exactly the data from Handler et al. (2019). On the other hand, Figure 16A (and B – ΔW, i.e., black line) shows that their combined effect is very similar to the one presented in the original paper (r = 0.98, p = 3.9 × 10⁻⁴). The authors of Handler et al. (2019) have kindly provided the data from their figures, which allows us to report the Pearson correlation coefficient and also to explore the timing parameters (requested in a different point). Thus, we now plot the modelled ER-Ca and cAMP on top of the data from Handler et al. (2019) in Figures 16A and B (grey lines), as the reviewer suggested. Note that we do not claim to model the exact ER-Ca and cAMP levels, and we hope that this is now clear in the text.

“Figure 16 shows the ER-Ca2+ and cAMP levels during forward and backward conditioning for a depressing DAN […], which are comparable to the data shown in Handler et al., (2019) (also Figure 16 – shown in grey). Note that here we are more interested in the overall effects of learning shown in Figure 16A rather than the detailed responses of Figure 16B.” – lines 869-873.

5. As one example of a useful parameter sensitivity analysis: The form of the deltaWij seems rather crucial to the model, so I'm homing in on this. It is a difference of two values which are themselves clearly the difference of opposing signals. It would therefore be valuable to show that relaxation of these tight timing requirements does not upset the learning rule and subsequent behaviour. It would be useful to see similar sensitivity analyses for other key parts of the model.

We agree with the reviewer that it would be very interesting to explore all the parameters of the model. This way we could show how sensitive the predictions of the model are to the selection of its parameters. Given that we now have the Handler et al. (2019) data, we added an extensive search/analysis of the timing parameters (i.e., τshort and τlong) of the plasticity rule, in which we evaluate the correlation between the data and the effect of our equation (i.e., Figure 16A). We have now added Figure 16 —figure supplement 2 showing the results of this analysis.

“In Figure 16, where we are interested in more detailed dynamics of the plasticity rule, and the sampling frequency is high, i.e., 100 Hz, we use τshort = 60 and τlong = 10⁴, which we choose after a parameter exploration available in Figure 16 —figure supplement 2” – lines 893-895.
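The parameter exploration described here amounts to a grid search over the two time constants, scoring each pair by how well the rule's output matches the reference data. A runnable sketch of that scaffold (the objective below is a stand-in with a known optimum; the real score is the correlation against the Handler et al. (2019) data):

```python
# Sketch of the grid search used to pick the plasticity-rule time
# constants (tau_short, tau_long). The score function is a placeholder
# with its maximum placed at (60, 1e4) purely to make the sketch
# runnable; the real objective simulates the rule and correlates its
# predicted weight changes with the experimental data.
import itertools

def score(tau_short, tau_long):
    # Placeholder objective (higher is better), maximised at (60, 1e4).
    return -((tau_short - 60) ** 2 / 60 ** 2 + (tau_long - 1e4) ** 2 / 1e8)

tau_short_grid = [1, 10, 60, 100, 500]
tau_long_grid = [1e2, 1e3, 1e4, 1e5]

# Evaluate every (tau_short, tau_long) pair and keep the best-scoring one.
best = max(itertools.product(tau_short_grid, tau_long_grid),
           key=lambda p: score(*p))
```

Swapping the placeholder for the actual model evaluation turns this into the analysis reported in Figure 16 —figure supplement 2.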

Regarding the circuit, an extensive parameter search is much harder to perform. The parameters of the circuit include the strength of each connection between MBONs and their post-synaptic targets (e.g., other MBONs or DANs), the modulatory strength of the different types of DANs onto their target KC>MBON synapses, and the biases (i.e., resting activity) of the neurons. In our approach, the synaptic strengths are hand-tuned in order to make the plots in Figures 5-8 (at least visually) match. A parameter analysis (in the same way that we did it for the plasticity rule, i.e., comparing the reproduced responses to the data using a standard measure, e.g., the Pearson correlation coefficient) is less effective and harder to perform for these parameters, as each connection affects the responses of many neurons in the circuit. Instead, we have created Figure 14 —figure supplements 1, 2 and 3, which show how the responses of the neurons in the circuit change as one parameter is varied at a time.

“Figure 14 —figure supplement 1, Figure 14 —figure supplement 2 and Figure 14 —figure supplement 3 show how each of these parameters affect the responses of the neurons in the incentive circuit.” – lines 769-771.
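The one-parameter-at-a-time approach described above has a simple generic shape: hold all hand-tuned values fixed, perturb each parameter individually, and record how the output shifts. A sketch of that loop (the parameter names and the `run_model` stand-in are hypothetical, not the actual circuit simulation):

```python
# Sketch of one-at-a-time sensitivity analysis: perturb each parameter
# individually while holding the rest at their hand-tuned values, and
# record the change in some response of interest. `run_model` is a
# stand-in for simulating the incentive circuit; the parameter names
# here are invented for illustration.
def run_model(params):
    # Placeholder for the circuit simulation.
    return params["w_m2m"] * 2.0 + params["w_d2m"] * -1.0 + params["bias"]

baseline = {"w_m2m": 0.5, "w_d2m": 0.3, "bias": 0.1}
base_out = run_model(baseline)

sensitivity = {}
for name in baseline:
    perturbed = dict(baseline)
    perturbed[name] *= 1.1  # +10% perturbation of this parameter only
    sensitivity[name] = run_model(perturbed) - base_out
```

In the actual analysis the "output" is the full set of reconstructed neural responses, which is why the results are reported as figure supplements rather than a single sensitivity number per parameter.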

Clarifications:

6. pg 28: 3 lines from bottom.

Do the authors mean "activity of the ith presynaptic KC? 'Target' sounds like it is postsynaptic.

The reviewer is right and we have changed this as suggested.

“the activity of the (ith) pre-synaptic KC” – page 29, 3 lines from bottom.

7. Equation 30 onward.

w_rest: Is this a global parameter for all synapses?

w_rest: The way it is used in the equation looks more like a_rest, the resting activity of the synapse. Sorry to be pedantic, but the units of weight and rate don't match.

This gets further mixed in the equation between lines 853 and 854 where the authors add ki and Wij. Maybe ki is scaled somehow to weights?

w_rest is a global parameter that corresponds to the default weight of a variable synapse and the weight to which it tends to return. It could in principle differ between synapses, but for simplicity we here assume that all the KC>MBON synapses (the only variable synapses in our model) have the same resting weight. We appreciate the issue that ‘synapse weight’ and ‘neuron activity (firing rate)’ are not intrinsically the same units, but we have indeed implicitly scaled the latter (ki) to allow it to be compared to the current weight in the relevant weight-change equation. In general, it is common to all neural plasticity rules to assume there is some direct conversion from activity levels to weight changes.
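To make the role of w_rest concrete, here is a small sketch of a single step of the weight-change equation as we understand it from the Methods, with the KC activity k already scaled to weight units as discussed above. The update form dW/dt = δ·(k + W − w_rest) and all numeric values are illustrative, not the tuned model values.

```python
# Sketch of one Euler step of the dopaminergic plasticity rule,
# assuming the form dW/dt = delta * (k + W - w_rest). Here k is the
# (implicitly weight-scaled) KC activity, delta the dopaminergic
# factor, and w_rest the shared resting weight of all KC>MBON synapses.
W_REST = 1.0  # global resting weight (illustrative value)

def update_weight(w, k, delta, w_rest=W_REST, dt=1.0):
    """One Euler step of dW/dt = delta * (k + w - w_rest)."""
    return w + dt * delta * (k + w - w_rest)

# No dopaminergic factor (delta = 0): the weight does not change.
assert update_weight(1.0, k=0.5, delta=0.0) == 1.0
# Negative delta paired with KC activity depresses the synapse.
w = update_weight(1.0, k=0.5, delta=-0.5)  # 0.75
# With no KC activity and the weight at rest, delta alone has no effect.
assert update_weight(W_REST, k=0.0, delta=-0.5) == W_REST
```

The last case illustrates why w_rest acts as the value the synapse tends to return to: at rest and without presynaptic activity, the drive term (k + W − w_rest) vanishes.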

8. Figure 5 and later: The responses, both experimental and model, are shown as an up-down oscillation. I assume that the up states are measurements during the training, and down is measurement half a day later. But this is hard to see from the text or legends, and I had to go down to the last section in the methods to see that this seems to be described as on-shock and off-shock values. It is confusing and should be mentioned in the figure legends and accompanying text.

We thank the reviewer for noticing this. The oscillations in the responses are due to the on- and off-shock values, not of different days but of consecutive time-windows (the animal is exposed to the odour alone – i.e., off-shock – before the shock is introduced along with the odour – i.e., on-shock). We have now updated the captions of Figures 5-8 (and figure supplements) to clarify this.

“For each trial we report two consecutive time-steps: the off-shock (i.e., odour only) followed by the on-shock (i.e., paired odour and shock) when available (i.e., odour B in acquisition and odour A in reversal phase), otherwise a second off-shock time-step (i.e., all the other phases).” – Figures 5-9 and the respective figure supplements.

Reviewer #3 (Recommendations for the authors):

First, I'd like to thank the authors for responding to my concerns/suggestions. At this point, it reads, in my assessment, much better as a result of the many changes. In particular, the newer figures are of high quality and their stated goals much easier to grasp. Also, shifting most of the discussion of the "formal" model in the (old) Results section to the (new) Methods section makes reading flow more intuitively.

Second, the disagreement we had appears now to be more in terms of the naming/labeling of Equations (18) and (30), thus clarifying the rationale for the naming of the 2 learning rules (DPR and RPE). However, the "RPE" naming for (30) is, in my view, a bit of a stretch, but I am not raising an objection. Just a friendly note to the authors.

I'd like to make a final suggestion that future readers might benefit from. Reviewer 1 raised this issue already and the authors addressed the question. However, in my view, the presentation starting with "we postulate a mathematical formulation …" just above Equation (32) seems a bit circular. While the authors answered the question in terms of intuitive modeling (Equation (34)), the presentation thread I am referring to is rather formal. The D's in Equations (32), (33) are not explicitly defined; the equations, when added up, are consistent with the equation above line 854.

We do define D’s as the depression and potentiation components of the DA, assumed to correspond to DopR1 and DopR2 receptors or potentially to involve co-transmitters released by the DAN such as Nitric Oxide.

“where D_j⁻(t) and D_j⁺(t) are the depression and potentiation components of the DA respectively [assumed to correspond to DopR1 and DopR2 receptors (Handler et al., 2019), or potentially to involve co-transmitters released by the DAN such as Nitric Oxide (Aso et al., 2019)].” – lines 863-865.

While Equation (34) provides the intuition of the decomposition of the weights into 2 terms, this decomposition is by no means unique. Having said that, we are then confronted with Equations (35) and (36). There is little justification given for the rationale of choosing/postulating these two differential equations. I presume that the solutions of these equations are the D's. A careful reading seems to suggest that these are delayed differential equations. In math terms, a single delayed differential equation is infinite dimensional, and essentially intractable. The following Equations (37)-(39), while consistent with the discussion above, do not help clarify the matter. Which brings one back to Equations (32), (33).

We are grateful that the reviewer had such a close look at our equations, which led us to take a closer look as well. We came up with Equations (35) and (36) as a simple model of the shape of rise and decay responses to DA release. However, thanks to the reviewer's comments, we realised that there was a mistake in the differential equations and the τshort and τlong parameters in our equation. The correct values are τshort = 1 and τlong = +∞. Equations 35 and 36 have been amended accordingly.

Note that we have now removed time as an explicit parameter in these equations, which makes them less confusing and closer to standard notation. The above differential equations and parameters can be used to generate the plasticity rule as in Equations (37)-(39):

$$\frac{dD}{dt} + D = \mathbf{d}^\top W^{-}_{d2km} \quad\Longrightarrow\quad D(t) = \mathbf{d}^\top(t)\, W^{-}_{d2km}$$

and

$$\frac{dD^{\Delta}}{dt} + D^{\Delta} = \mathbf{d}^\top W^{+}_{d2km} \quad\Longrightarrow\quad D^{\Delta}(t) = \mathbf{d}^\top(t)\, W^{+}_{d2km}$$

Which results in

$$\delta(t) = D^{\Delta}(t) - D(t) = \mathbf{d}^\top(t)\, W^{+}_{d2km} - \mathbf{d}^\top(t)\, W^{-}_{d2km} = \mathbf{d}^\top(t)\, W_{d2km}$$

We have now corrected the differential equations and selected parameters in our methods as well – Equations (35) and (36).
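The differential equations discussed here are first-order low-pass filters of the dopaminergic input, so the role of the time constants can be illustrated with a small Euler-integration sketch. The input trace and values below are hypothetical, not taken from the model:

```python
# Euler-integration sketch of a first-order low-pass filter
# tau * dD/dt + D = u(t), the general form of Equations (35) and (36).
# With a small tau the state tracks the input quickly; as tau grows
# very large the state barely moves over the same window, which is how
# the tau_short / tau_long limits behave.
def low_pass(u_trace, tau, dt=0.01, d0=0.0):
    d = d0
    out = []
    for u in u_trace:
        d += dt * (u - d) / tau  # dD/dt = (u - D) / tau
        out.append(d)
    return out

step = [1.0] * 1000  # hypothetical 10 s step input at dt = 0.01

fast = low_pass(step, tau=1.0)   # approaches 1 within a few seconds
slow = low_pass(step, tau=1e4)   # stays near 0 over the same window
```

In the τ → +∞ limit the filtered state is effectively frozen, while τ = 1 (in the units used here) gives a rapidly tracking component.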

Finally, the Methods section has a sizable number of matrices that have seemingly arbitrary entries.

The entries are not arbitrary: which entries are non-zero, and their signs, are determined by connectivity considerations. It is true, however, that the magnitudes of the non-zero values are somewhat arbitrary, having been chosen through hand-tuning in order to (at least visually) match the recorded responses. Following another suggestion by reviewer #1 (point 5), we now provide Figure 14 —figure supplements 1, 2 and 3, which show how the reconstructed responses are affected by modifying these parameters.

https://doi.org/10.7554/eLife.75611.sa2

Article and author information

Author details

  1. Evripidis Gkanias

    Institute of Perception Action and Behaviour, School of Informatics, University of Edinburgh, Edinburgh, United Kingdom
    Contribution
    Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Software, Validation, Visualization, Writing – original draft, Writing – review and editing
    For correspondence
    ev.gkanias@ed.ac.uk
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0003-3343-9039
  2. Li Yan McCurdy

    Department of Cellular and Molecular Physiology, Yale University, New Haven, United States
    Contribution
    Conceptualization, Data curation, Investigation, Project administration, Validation, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0002-8862-6715
  3. Michael N Nitabach

    1. Department of Cellular and Molecular Physiology, Yale University, New Haven, United States
    2. Department of Genetics, Yale University, New Haven, United States
    3. Department of Neuroscience, Yale University, New Haven, United States
    Contribution
    Conceptualization, Funding acquisition, Project administration, Resources, Supervision, Validation, Writing – review and editing
    Competing interests
    No competing interests declared
  4. Barbara Webb

    Institute of Perception Action and Behaviour, School of Informatics, University of Edinburgh, Edinburgh, United Kingdom
    Contribution
    Conceptualization, Funding acquisition, Methodology, Project administration, Resources, Supervision, Validation, Visualization, Writing – original draft, Writing – review and editing
    For correspondence
    B.Webb@ed.ac.uk
    Competing interests
    No competing interests declared

Funding

Engineering and Physical Sciences Research Council (EP/L016834/1)

  • Evripidis Gkanias
  • Barbara Webb

National Institute of Neurological Disorders and Stroke (R01NS091070)

  • Michael N Nitabach

National Institutes of Health (R01NS091070)

  • Michael N Nitabach

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We are grateful to Bertram Gerber for his useful comments on the earlier drafts of the manuscript. We also thank James Bennett for discussion on their data and experiments and Vanessa Ruta for kindly providing their data for validating the dopaminergic plasticity rule. We also thank the Insect Robotics group for helpful critique on the figures and the reviewers on earlier revisions for their fruitful comments.

Senior Editor

  1. Ronald L Calabrese, Emory University, United States

Reviewing Editor

  1. Upinder Singh Bhalla, Tata Institute of Fundamental Research, India

Reviewer

  1. Mani Ramaswami, Trinity College Dublin, Ireland

Publication history

  1. Preprint posted: June 11, 2021 (view preprint)
  2. Received: November 16, 2021
  3. Accepted: March 7, 2022
  4. Version of Record published: April 1, 2022 (version 1)

Copyright

© 2022, Gkanias et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.


Cite this article

  1. Evripidis Gkanias
  2. Li Yan McCurdy
  3. Michael N Nitabach
  4. Barbara Webb
(2022)
An incentive circuit for memory dynamics in the mushroom body of Drosophila melanogaster
eLife 11:e75611.
https://doi.org/10.7554/eLife.75611