A functional model of adult dentate gyrus neurogenesis
Abstract
In adult dentate gyrus neurogenesis, the link between maturation of newborn neurons and their function, such as behavioral pattern separation, has remained puzzling. By analyzing a theoretical model, we show that the switch of the GABAergic input onto maturing newborn cells from excitatory to inhibitory is crucial for their proper functional integration. When the GABAergic input is excitatory, cooperativity drives the growth of synapses such that newborn cells become sensitive to stimuli similar to those that activate mature cells. When the GABAergic input switches to inhibitory, competition pushes the configuration of synapses onto newborn cells toward stimuli that differ from previously stored ones. This enables the maturing newborn cells to code for concepts that are novel, yet similar to familiar ones. Our theory of newborn cell maturation explains both how adult-born dentate granule cells integrate into the preexisting network and why they promote separation of similar, but not distinct, patterns.
Introduction
In the adult mammalian brain, neurogenesis, the production of new neurons, is restricted to a few brain areas, such as the olfactory bulb and the dentate gyrus (Deng et al., 2010). The dentate gyrus is a major entry point of input from cortex, primarily entorhinal cortex (EC), to the hippocampus (Amaral et al., 2007), which is believed to be a substrate of learning and memory (Jarrard, 1993). Adult-born cells in the dentate gyrus mostly develop into dentate granule cells (DGCs), the main excitatory cells that project to area CA3 of hippocampus (Deng et al., 2010).
The properties of rodent adult-born DGCs change as a function of their maturation stage, until they become indistinguishable from other mature DGCs at approximately 8 weeks (Deng et al., 2010; Johnston et al., 2016; Figure 1a). Many of them die before they fully mature (Dayer et al., 2003). Their survival is experience dependent and relies on NMDA receptor activation (Tashiro et al., 2006). Initially, newborn DGCs have enhanced excitability (Schmidt-Hieber et al., 2004; Li et al., 2017) and stronger synaptic plasticity than mature DGCs, reflected by a larger long-term potentiation (LTP) amplitude and a lower threshold for induction of LTP (Wang et al., 2000; Schmidt-Hieber et al., 2004; Ge et al., 2007). Furthermore, after 4 weeks of maturation, adult-born DGCs have only weak connections to interneurons, whereas at 7 weeks of age their activity causes indirect inhibition of mature DGCs (Temprana et al., 2015).
Newborn DGCs receive no direct connections from mature DGCs (Deshpande et al., 2013; Alvarez et al., 2016) (but see Vivar et al., 2012), and are instead indirectly activated via interneurons (Alvarez et al., 2016; Heigele et al., 2016). At about 3 weeks after birth, the γ-aminobutyric acid (GABAergic) input from interneurons to adult-born DGCs switches from excitatory in the early phase to inhibitory in the late phase of maturation (Ge et al., 2006; Deng et al., 2010) (‘GABA-switch’, Figure 1a). Analogous to a similar transition during embryonic and early postnatal stages (Wang and Kriegstein, 2011), the GABA-switch is caused by a change in the expression profile of chloride cotransporters. In the early phase of maturation, newborn cells express the $\text{Na}^{+}$-$\text{K}^{+}$-$2\text{Cl}^{-}$ cotransporter NKCC1, which leads to a high intracellular chloride concentration. Hence, the GABA reversal potential is higher than the resting potential (Ge et al., 2006; Heigele et al., 2016), and GABAergic inputs lead to an outflow of $\text{Cl}^{-}$ ions through ${\text{GABA}}_{A}$ receptor channels, which depolarizes the newborn cell (Ben-Ari, 2002; Owens and Kriegstein, 2002). In the late phase of maturation, expression of the $\text{K}^{+}$-$\text{Cl}^{-}$ cotransporter KCC2 sets in, which lowers the intracellular chloride concentration of the newborn cell to levels similar to those of mature cells, so that GABAergic stimulation causes an inflow of $\text{Cl}^{-}$ ions and hyperpolarizes the cell membrane (Ben-Ari, 2002; Owens and Kriegstein, 2002). The transition from depolarizing (excitatory) to hyperpolarizing (inhibitory) effects of GABA is referred to as the ‘GABA-switch’. GABAergic inputs have been shown to be crucial for the integration of newborn DGCs into the preexisting circuit (Ge et al., 2006; Chancey et al., 2013; Alvarez et al., 2016; Heigele et al., 2016).
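The direction of the chloride driving force follows from the Nernst equation. The sketch below uses generic textbook concentration values (illustrative assumptions, not measurements from the cited studies) to show how a high versus low intracellular chloride concentration places the GABA reversal potential above or below the resting potential:

```python
import math

# Illustrative constants (assumed, not from the cited studies)
R_T_OVER_F = 26.7   # RT/F in mV at ~37 degrees C
CL_OUT = 120.0      # extracellular chloride concentration, mM
V_REST = -70.0      # resting membrane potential, mV

def e_cl(cl_in):
    """Nernst potential for Cl- (valence -1): E = -(RT/F) * ln([Cl]out / [Cl]in)."""
    return -R_T_OVER_F * math.log(CL_OUT / cl_in)

e_early = e_cl(30.0)  # high [Cl-]_in while NKCC1 dominates: E_Cl above rest
e_late = e_cl(7.0)    # low [Cl-]_in once KCC2 dominates: E_Cl below rest
```

With the NKCC1-like concentration the reversal potential lies above the resting potential, so ${\text{GABA}}_{A}$ activation depolarizes the cell; with the KCC2-like concentration it lies below rest and hyperpolarizes it.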
The mammalian dentate gyrus contains – just like the hippocampus in general – a myriad of inhibitory cell types (Freund and Buzsáki, 1996; Somogyi and Klausberger, 2005; Klausberger and Somogyi, 2008), including basket cells, chandelier cells, and hilar cells (Figure 1—figure supplement 1). Basket cells can be subdivided into two categories: some express cholecystokinin (CCK) and vasoactive intestinal polypeptide (VIP), while the others express parvalbumin (PV) and are fast-spiking (Freund and Buzsáki, 1996; Amaral et al., 2007). Chandelier cells also express PV (Freund and Buzsáki, 1996). Overall, it has been estimated that PV is expressed in 15–21% of all dentate GABAergic cells (Freund and Buzsáki, 1996) and in 20–25% of the GABAergic neurons in the granule cell layer (Houser, 2007). Amongst the GABAergic hilar cells, 55% express somatostatin (SST) (Houser, 2007), and somatostatin-positive interneurons (SST-INs) represent about 16% of the GABAergic neurons in the dentate gyrus as a whole (Freund and Buzsáki, 1996). While axons of hilar interneurons stay in the hilus and provide perisomatic inhibition onto dentate GABAergic cells, axons of hilar-perforant-path-associated (HIPP) interneurons extend to the molecular layer and provide dendritic inhibition onto both DGCs and interneurons (Yuan et al., 2017). HIPP axons form numerous synaptic terminals and extend as far as 3.5 mm along the septotemporal axis of the dentate gyrus (Amaral et al., 2007). PV-expressing interneurons (PV-INs) and SST-INs both target adult-born DGCs early (after 2–3 weeks) in their maturation (Groisman et al., 2020). PV-INs provide both feedforward and feedback inhibition (the latter also called lateral inhibition) to the DGCs (Groisman et al., 2020). In general, SST-INs provide lateral, but not feedforward, inhibition onto DGCs (Stefanelli et al., 2016; Groisman et al., 2020; Figure 1—figure supplement 1).
Adult-born DGCs are preferentially reactivated by stimuli similar to the ones they experienced during their early phase of maturation, up to 3 weeks after cell birth (Tashiro et al., 2007). Even though the number of newly generated cells per month is rather low (3–6% of the total DGC population [van Praag et al., 1999; Cameron and McKay, 2001]), adult-born DGCs are critical for behavioral pattern separation (Clelland et al., 2009; Sahay et al., 2011a; Jessberger et al., 2009), in particular in tasks where similar stimuli or contexts have to be discriminated (Clelland et al., 2009; Sahay et al., 2011a). However, the functional role of adult-born DGCs is controversial (Sahay et al., 2011b; Aimone et al., 2011). One view is that newborn DGCs contribute to pattern separation through a modulatory role (Sahay et al., 2011b). Another view suggests that newborn DGCs act as encoding units that become sensitive to features of the environment which they encounter during a critical window of maturation (Kee et al., 2007; Tashiro et al., 2007). Some authors have even challenged the role of newborn DGCs in pattern separation in the classical sense and have proposed a pattern integration effect instead (Aimone et al., 2011), while others suggest a dynamical (Aljadeff et al., 2015; Shani-Narkiss et al., 2020) or forgetting (Akers et al., 2014) role for newborn DGCs. Within this broader controversy, we ask two specific questions. First, why are GABAergic inputs crucial for the integration of newborn DGCs into the preexisting circuit? And second, why are newborn DGCs particularly important in tasks where similar stimuli or contexts have to be discriminated?
To address these questions, we present a model of how newborn DGCs integrate into the preexisting circuit. In contrast to earlier models, in which synaptic input connections onto newborn cells were assumed to be strong enough to drive them (Chambers et al., 2004; Becker, 2005; Crick and Miranker, 2006; Wiskott et al., 2006; Chambers and Conroy, 2007; Aimone et al., 2009; Appleby and Wiskott, 2009; Weisz and Argibay, 2009; Temprana et al., 2015; Finnegan and Becker, 2015; DeCostanzo et al., 2018), our model uses an unsupervised, biologically plausible Hebbian learning rule that makes synaptic connections between EC and newborn DGCs either disappear or grow from small values at birth to values that eventually enable feedforward input from EC to drive DGCs. Contrary to previous modeling studies, our plasticity model does not require an artificial renormalization of synaptic connection weights, since model weights are naturally bounded by the synaptic plasticity rule. We show that learning with a biologically plausible plasticity rule is possible thanks to the GABA-switch, which has been overlooked in previous modeling studies. Specifically, the growth of synaptic weights from small values is supported in our model by the excitatory action of GABA, whereas, after the switch, specialization of newborn cells arises from competition between DGCs, triggered by the inhibitory action of GABA. Furthermore, our theory of adult-born DGC integration yields a transparent explanation of why newborn cells favor pattern separation of similar stimuli, but do not impact pattern separation of distinct stimuli.
Results
We model a small patch of cells within the dentate gyrus as a recurrent network of 100 DGCs and 25 GABAergic interneurons, omitting the mossy cells for the sake of simplicity (Figure 1b). The modeled interneurons correspond to SST-INs of the HIPP category, as they are the providers of feedback inhibition to DGCs through dendritic projections (Stefanelli et al., 2016; Yuan et al., 2017; Groisman et al., 2020; Figure 1—figure supplement 1). The activity of a DGC with index $i$ and an interneuron with index $k$ is described by their continuous firing rates ${\nu}_{i}$ and ${\nu}_{k}^{I}$, respectively. Firing rates are modeled by neuronal frequency–current curves that vanish for weak input and increase if the total input into a neuron is larger than a firing threshold. Since newborn DGCs exhibit enhanced excitability early in maturation (Schmidt-Hieber et al., 2004; Li et al., 2017), the firing threshold of model neurons increases during maturation from a lower to a higher value (Materials and methods). Connectivity in a localized patch of dentate neurons is high: DGCs densely project to GABAergic interneurons (Acsády et al., 1998), and SST-INs heavily project to cells in their neighborhood (Amaral et al., 2007). Hence, in the recurrent network model, each model DGC projects to, and receives input from, a given interneuron with probability 0.9. The exact percentage of GABAergic neurons (or SST-INs) in the dentate gyrus as a whole is not known, but it has been estimated at about 10%, and only a fraction of these are SST-INs (Freund and Buzsáki, 1996). The number of inhibitory neurons in our model network might therefore seem too high. However, our results are robust to substantial changes in the number of inhibitory neurons (Supplementary file 2).
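The rate dynamics described above can be illustrated with a toy sketch (not the full model; the sizes, weights, and thresholds below are arbitrary stand-ins): a rectified-linear frequency–current curve combined with pooled feedback inhibition already produces the competitive behavior in which only the most strongly driven cells stay active.

```python
def relu(x):
    """Rectified-linear frequency-current curve: zero below threshold."""
    return max(0.0, x)

# Hypothetical toy sizes (the paper's network has 100 DGCs and 25 interneurons)
ff_input = [1.5, 0.4, 1.2]   # feedforward drive to each of three toy DGCs
w_inh = 0.5                  # DGC <- interneuron weight (assumed)
theta = 1.0                  # firing threshold (assumed)

# First pass: DGC rates without inhibition
rates = [relu(I - theta) for I in ff_input]
# Interneuron pools DGC activity, then feeds inhibition back to all DGCs
inh_rate = relu(sum(rates))
rates = [relu(I - w_inh * inh_rate - theta) for I in ff_input]
```

After the feedback step, only the most strongly driven cell remains above threshold, which is the competition mechanism exploited later in the late phase of maturation.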
Each of the 100 model DGCs receives input from a set of 144 model EC cells (Figure 1b). In the rat, the number of DGCs has been estimated to be about $10^6$, while the number of EC input cells is estimated to be about $2\times 10^5$ (Andersen et al., 2007), yielding an expansion factor from EC to dentate gyrus of about 5. Theoretical analysis suggests that the expansion of the number of neurons enhances decorrelation of the representation of input patterns (Marr, 1969; Albus, 1971; Marr, 1971; Rolls and Treves, 1998) and promotes pattern separation (Babadi and Sompolinsky, 2014). Our standard network model does not reflect this expansion because we want to highlight the particular ability of adult neurogenesis, in combination with the GABA-switch, to decorrelate input patterns independently of specific choices of the network architecture. However, we show later that an enlarged network with an expansion from 144 model EC cells to 700 model DGCs (similar to the anatomical expansion factor) yields similar results.
At birth, a DGC with index $i$ does not receive synaptic glutamatergic input yet. Hence, the connection from any model EC cell with index $j$ is initialized at ${w}_{ij}=0$. The growth or decay of the synaptic strength ${w}_{ij}$ of the connection from $j$ to $i$ is controlled by a Hebbian plasticity rule (Figure 1c):
$$\frac{dw_{ij}}{dt} \;=\; \eta\left[\,\alpha\, x_j\, \nu_i\left[\nu_i-\theta\right]_{+} \;-\; \beta\, x_j\, \nu_i\left[\theta-\nu_i\right]_{+} w_{ij} \;-\; \gamma\left[\nu_i-\theta\right]_{+} w_{ij}\,\right] \qquad \text{(1)}$$

where $[\cdot]_{+}$ denotes rectification ($[a]_{+}=a$ for $a>0$ and zero otherwise), ${x}_{j}$ is the firing rate of the presynaptic EC neuron, $\eta$ (‘learning rate’) is the susceptibility of a cell to synaptic plasticity, and $\alpha ,\beta ,\gamma $ are positive parameters (Materials and methods, Table 1). The first term on the right-hand side of Equation (1) describes LTP whenever the presynaptic neuron is active (${x}_{j}>0$) and the postsynaptic firing rate ${\nu}_{i}$ is above a threshold $\theta$; the second term describes long-term depression (LTD) whenever the presynaptic neuron is active and the postsynaptic firing rate is positive but below the threshold $\theta$; LTD stops once the synaptic weight reaches zero. Such a combination of LTP and LTD is consistent with experimental data (Artola et al., 1990; Sjöström et al., 2001), as shown in earlier rate-based (Bienenstock et al., 1982) or spike-based (Pfister and Gerstner, 2006) plasticity models. The third term on the right-hand side of Equation (1) implements heterosynaptic plasticity (Chistiakova et al., 2014; Zenke and Gerstner, 2017): whenever strong presynaptic input arriving at synapses $k\ne j$ drives the firing of the postsynaptic neuron $i$ at a rate above $\theta$, the weight of a synapse $j$ is downregulated if synapse $j$ does not receive any input, while the weights of synapses $k\ne j$ are simultaneously increased by the first term (Lynch et al., 1977). Importantly, the threshold condition for the third term (postsynaptic rate above $\theta$) is the same as that for induction of LTP in the first term, so that whenever some synapses are potentiated, silent synapses are depressed. In the model, heterosynaptic interaction between synapses arises because information about postsynaptic activity is shared across synapses. This could be achieved in biological neurons via backpropagating action potentials or a comparable depolarization of the postsynaptic membrane potential at several synaptic locations; alternatively, heterosynaptic crosstalk could be implemented by signaling molecules.
Note that since our neuron model is a point neuron, all synapses are neighbors of each other. In our model, the ‘heterosynaptic’ term has a negative sign which ensures that the weights cannot grow without bounds (Materials and methods). In this sense, the third term has a ‘homeostatic’ function (Zenke and Gerstner, 2017), yet acts on a time scale faster than experimentally observed homeostatic synaptic plasticity (Turrigiano et al., 1998).
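The three-term rule can be sketched in a few lines of code. The rectified-linear form and the parameter values below are illustrative choices consistent with the verbal description of Equation (1), not the exact implementation from Materials and methods:

```python
def relu(x):
    """Rectification [.]_+ used in all three terms of the rule."""
    return max(0.0, x)

def dw(x_j, nu_i, w_ij, eta=0.1, alpha=1.0, beta=1.0, gamma=0.5, theta=1.0):
    """Weight change for one synapse (parameter values are arbitrary stand-ins)."""
    ltp = alpha * x_j * nu_i * relu(nu_i - theta)          # term 1: Hebbian LTP
    ltd = beta * x_j * nu_i * relu(theta - nu_i) * w_ij    # term 2: LTD, stops at w = 0
    het = gamma * relu(nu_i - theta) * w_ij                # term 3: heterosynaptic decay
    return eta * (ltp - ltd - het)
```

For an active synapse of a strongly driven neuron, the LTP term outweighs the heterosynaptic term; for a silent synapse of the same neuron, only the heterosynaptic term acts and the weight shrinks, which is the behavior described in the text.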
We ask whether such a biologically plausible plasticity rule enables adult-born DGCs to be integrated into an existing network of mature cells. To address this question, we exploit two observations (Figure 1a): first, the effect of interneurons onto newborn DGCs exhibits a GABA-switch from excitatory to inhibitory after about 3 weeks of maturation (Ge et al., 2006; Deng et al., 2010) and, second, newborn DGCs receive input from interneurons early in their maturation (before the third week), but project back to interneurons only later (Temprana et al., 2015). For simplicity, no plasticity rule was implemented within the dentate gyrus: connections between newborn DGCs and inhibitory cells are either absent or present with a fixed value (see below). However, before the integration of adult-born DGCs can be addressed, an adult-stage network in which mature cells already store some memories has to be constructed.
Mature neurons represent prototypical input patterns
In an adult-stage network, some mature cells already have a functional role. Hence, we start with a network that already has strong random EC-to-DGC connection weights (Materials and methods). We then pretrain our network of 100 DGCs using the same learning rule (Equation (1), with identical learning rate η for all DGCs) that we will use later for the integration of newborn cells. For the stimulation of EC cells, we apply patterns representing thousands of handwritten digits in different writing styles from MNIST, a standard data set in artificial intelligence (LeCun et al., 1998). Even though we do not expect EC neurons to show a two-dimensional arrangement, the use of two-dimensional patterns provides a simple way to visualize the activity of all 144 EC neurons in our model (Figure 1d). We implicitly model feedforward inhibition from PV-INs (Groisman et al., 2020; Figure 1—figure supplement 1) by normalizing input patterns so that all inputs have the same amplitude (Materials and methods). Below, we present results for a representative combination of three digits (digits 3, 4, and 5), but other combinations of digits have also been tested (Supplementary file 1).
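One simple way to give all input patterns the same amplitude is to scale each pattern to unit L2-norm. This is an assumption for illustration; the exact normalization scheme used in the model is specified in Materials and methods:

```python
import math

def normalize(pattern):
    """Scale a pattern to unit L2-norm (an implicit model of feedforward inhibition)."""
    norm = math.sqrt(sum(p * p for p in pattern))
    return [p / norm for p in pattern]

# A toy two-pixel "pattern"; MNIST patterns in the model have 144 components
unit = normalize([3.0, 4.0])
```

After normalization, every pattern drives the DGC layer with the same total amplitude, so differences in DGC responses reflect the shape of the pattern rather than its overall intensity.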
After pretraining with patterns from digits 3 and 4 in a variety of writing styles, we examine the receptive field of each DGC. Each receptive field, consisting of the connections from all 144 EC neurons onto one DGC, is characterized by its spatial structure (i.e. the pattern of connection weights) and its total strength (i.e. the efficiency of the optimal stimulus to drive the cell). We observe that, out of the 100 DGCs, some have developed spatial receptive fields that correspond to different writing styles of digit 3, while others have developed receptive fields that correspond to variants of digit 4 (Figure 1e).
Behavioral discrimination has been shown to be correlated with classification accuracy based on DGC population activity (Woods et al., 2020). Hence, to quantify the representation quality, we compute classification performance by a linear classifier that is driven by the activity of our 100 DGC model cells (Materials and methods). At the end of pretraining, the classification performance for patterns of digits 3 and 4 from a distinct test set not used during pretraining is high: 99.25% (classification performance on digit 3: 98.71%; digit 4: 99.80%), indicating that nearly all input patterns of the two digits are well represented by the network of mature DGCs. The median classification performance for 10 random combinations of two groups of pretrained digits is 98.54%, the 25th percentile 97.26%, and the 75th percentile 99.5% (Supplementary file 1).
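To make the idea of a linear readout of DGC activity concrete, the following sketch trains a perceptron on toy two-dimensional ‘activity vectors’. This is an illustrative stand-in, not the classifier used in the paper, and the activity vectors and labels are made up:

```python
def perceptron_train(xs, ys, epochs=100):
    """Train weights w and bias b so that sign(w . x + b) matches labels in {-1, +1}."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(xs, ys):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:               # misclassified (or on the boundary)
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:                     # converged: all points correct
            break
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Toy activity of two "DGCs" for patterns of two digit classes (assumed values)
acts = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [-1, -1, 1, 1]
w, b = perceptron_train(acts, labels)
```

Because the toy classes are linearly separable, the perceptron converges and classifies all training activity vectors correctly; in the paper, classification performance of such a linear readout serves as a surrogate for behavioral discrimination.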
A detailed mathematical analysis (Materials and methods) shows that heterosynaptic plasticity in Equation (1) ensures that the total strength of the receptive field of each selective DGC converges to a stable value which is similar for all selective DGCs, confirming the homeostatic function of heterosynaptic plasticity (Zenke and Gerstner, 2017). As a consequence, synaptic weights are intrinsically bounded without the need to impose hard bounds on the weight dynamics. Moreover, we find that the spatial structure of the receptive field represents the weighted average of all those input patterns for which that DGC is responsive. The mathematical analysis also shows that those DGCs that do not develop selectivity have weak synaptic connections and a very low total strength of the receptive field.
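The stabilization of the receptive field can be made explicit with a fixed-point argument. Assuming a rectified-linear LTP term $\alpha\, x_j\, \nu_i\, [\nu_i-\theta]_+$ and a heterosynaptic term $-\gamma\, [\nu_i-\theta]_+\, w_{ij}$, which is one plausible reading of the rule described in the text (not the exact form from Materials and methods), an active synapse of a strongly driven cell satisfies:

```latex
% Fixed point of an active synapse (x_j > 0, \nu_i > \theta):
\frac{dw_{ij}}{dt} = 0
\;\Longrightarrow\;
\alpha\, x_j\, \nu_i\, (\nu_i - \theta) = \gamma\, (\nu_i - \theta)\, w_{ij}
\;\Longrightarrow\;
w_{ij}^{*} = \frac{\alpha}{\gamma}\, x_j\, \nu_i .
```

Each weight thus relaxes toward a value proportional to the presynaptic rate, averaged over the patterns that drive the cell above threshold, so the total receptive-field strength is bounded by the ratio $\alpha/\gamma$ without hard weight limits.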
After convergence of synaptic weights during pretraining, selective DGCs are considered mature cells. Mature cells are less plastic than newborn cells (Schmidt-Hieber et al., 2004; Ge et al., 2007). Therefore, in the following, unless specified otherwise, we set $\eta =0$ in Equation (1) for mature cells (feedforward connection weights from EC to mature cells therefore remain fixed). A scenario where mature cells retain synaptic plasticity is also investigated (see Robustness of the model and Supplementary file 4). Some DGCs did not develop any strong weight patterns during pretraining and exhibit unselective receptive fields (highlighted in red in Figure 1e). We classify these as unresponsive units.
Newborn neurons become selective for novel patterns during maturation
In our main neurogenesis model, we replace unresponsive model units by plastic newborn DGCs ($\eta >0$ in Equation (1)), which receive lateral GABAergic input but do not receive feedforward input yet (all weights from EC are set to zero). The replacement of unresponsive neurons reflects the fact that unresponsive units have weak synaptic connections and, experimentally, a lack of NMDA receptor activation has been shown to be deleterious for the survival of newborn DGCs (Tashiro et al., 2006). To mimic exposure of an animal to a novel set of stimuli, we now add input patterns from digit 5 to the set of presented stimuli, which was previously limited to patterns of digits 3 and 4. The novel patterns from digit 5 are randomly interspersed into the sequence of patterns from digits 3 and 4; in other words, the presentation sequence was not optimized with a specific goal in mind.
We postulate that functional integration of newborn DGCs requires the two-step maturation process caused by the GABA-switch from excitation to inhibition. Since excitatory GABAergic input potentially increases correlated activity within the dentate gyrus network, we predict that newborn DGCs respond to familiar stimuli during the early phase of maturation, but not during the late phase, when inhibitory GABAergic input leads to competition.
To test this hypothesis, our model newborn DGCs go through two maturation phases (Materials and methods). The early phase of maturation is cooperative because, for each pattern presentation, activated mature DGCs indirectly excite the newborn DGCs via GABAergic interneurons. We assume that in natural settings, the activation of ${\text{GABA}}_{A}$ receptors is low enough that the mean membrane potential remains below the chloride reversal potential at which shunting inhibition would be induced (Heigele et al., 2016). In this regime, the net effect of synaptic activity is hence excitatory. This lateral activation of newborn DGCs drives the growth of their receptive fields in a direction similar to those of the currently active mature DGCs. Consistent with our hypothesis, we find that, at the end of the early phase of maturation, newborn DGCs show a receptive field corresponding to a mixture of several input patterns (Figure 2a).
In the late phase of maturation, model newborn DGCs receive inhibitory GABAergic input from interneurons, similar to the input received by mature DGCs. Given that at the end of the early phase, newborn DGCs have receptive fields similar to those of mature DGCs, lateral inhibition induces competition with mature DGCs for activation during presentation of patterns from the novel digit. Because model newborn DGCs start their late phase of maturation with a higher excitability (lower threshold) compared to mature DGCs, consistent with the observed enhanced excitability of newborn cells (Schmidt-Hieber et al., 2004; Li et al., 2017), the activation of newborn DGCs is facilitated for those input patterns for which no mature DGC has preexisting selectivity. Therefore, in the late phase of maturation, competition drives the synaptic weights of most newborn DGCs toward receptive fields corresponding to different subcategories of the ensemble of input patterns of the novel digit 5 (Figure 2b).
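The effect of the GABA-switch on a newborn cell can be condensed into a sign flip of the lateral GABAergic term. In this toy sketch (arbitrary gains and thresholds, not model parameters), the same newborn cell without any feedforward drive of its own is activated by mature-cell activity before the switch and silenced by it afterwards:

```python
def relu(x):
    """Rectified-linear frequency-current curve."""
    return max(0.0, x)

def newborn_rate(ff_drive, mature_drive, gaba_sign, g=0.8, theta=0.5):
    """Newborn DGC rate given its feedforward drive and lateral GABAergic input.

    gaba_sign = +1 before the GABA-switch (depolarizing), -1 after (hyperpolarizing).
    """
    inh_rate = relu(mature_drive)                 # interneuron driven by mature DGCs
    return relu(ff_drive + gaba_sign * g * inh_rate - theta)

early = newborn_rate(0.0, 1.0, +1)   # excitatory GABA: mature activity recruits the cell
late = newborn_rate(0.0, 1.0, -1)    # inhibitory GABA: mature activity suppresses it
```

Before the switch, mature-cell activity alone recruits the newborn cell (cooperation); after the switch, the newborn cell only fires for inputs that mature cells do not already cover (competition).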
The total strength of the receptive field of a given DGC can be characterized by the sum of the squared synaptic weights of all feedforward projections onto the cell (i.e. the square of the L2-norm). During maturation, the L2-norm of the feedforward weights onto newborn DGCs increases (Figure 2e), indicating an increase in total glutamatergic innervation, e.g., through an increase in the number and size of spines (Zhao et al., 2006). Nevertheless, the distribution of firing rates of newborn DGCs is shifted to lower values at the end of the late phase compared to the end of the early phase of maturation (Figure 2c,d), consistent with in vivo calcium imaging recordings showing that newborn DGCs are more active than mature DGCs (Danielson et al., 2016).
We emphasize that, upon presentation of a pattern of a given digit, only those DGCs with a receptive field similar to the specific writing style of the presented pattern become strongly active; others fire at a medium rate, and yet others at a low rate (Figure 2g). As a consequence, the firing rate of a particular newborn DGC at the end of its maturation in response to a pattern from digit 5 is strongly modulated by the specific choice of stimulation pattern within the class of ‘5’s. Analogous results are obtained for patterns from the pretrained digits 3 and 4 (Figure 2—figure supplement 1). Hence, the ensemble of DGCs is effectively performing pattern separation within each digit class, as opposed to a simple ternary classification task. The selectivity of newborn DGCs develops during maturation. Indeed, during the late, competitive phase, the percentage of active newborn DGCs decreases, both upon presentation of familiar patterns (digits 3 and 4) and upon presentation of novel patterns (digit 5) (Figure 2f). This reflects the development of the selectivity of our model newborn DGCs from broad to narrow tuning, consistent with experimental observations (Marín-Burgin et al., 2012; Danielson et al., 2016).
If two novel ensembles of digits (instead of a single one) are introduced during maturation of newborn DGCs, some newborn DGCs become selective for one of the novel digits, while others become selective for the other novel digit (Figure 2—figure supplement 2). This is expected, since we found earlier that DGCs become selective for different prototype writing styles even within a digit category; introducing several additional digit categories of novel patterns simply increases the prototype diversity. Therefore, newborn DGCs can ultimately promote separation of several novel overarching categories of patterns, whether they are learned simultaneously or sequentially (Figure 2—figure supplement 2).
Adult-born neurons promote better discrimination
As above, we compute classification performance of our model network as a surrogate for behavioral discrimination (Woods et al., 2020). At the end of the late phase of maturation of newborn DGCs, we obtain an overall classification performance of 94.56% for the three ensembles of digits (classification performance for digit 3: 90.50%; digit 4: 98.17%; digit 5: 95.18%). Confusion matrices show that although novel patterns are not well classified at the end of the early phase of maturation (Figure 3e), they are as well classified as pretrained patterns at the end of the late phase of maturation (Figure 3f).
We compare this performance with that of a network in which all three digit ensembles are pretrained simultaneously, starting from random weights (Figure 3a, control 1). In this case, the overall classification performance is 92.09% (classification performance for digit 3: 86.83%; digit 4: 98.78%; digit 5: 90.70%). The confusion matrix shows that all three digits are reasonably well classified, but with an overall lower performance (Figure 3d). Across 10 simulation experiments, classification performance is significantly higher when a novel ensemble of patterns is learned sequentially by newborn DGCs (${P}_{2}$; Supplementary file 1) than when all patterns are learned simultaneously (${P}_{1}$; Supplementary file 1). Indeed, the distribution of ${P}_{2}-{P}_{1}$ over the 10 simulation experiments has a mean that is significantly different from zero (Wilcoxon signed-rank test: p = 0.0020, W = 55; one-sample t-test: p = 0.0269, t = 2.6401, df = 9; Supplementary file 1).
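For paired comparisons of this kind, the paper uses a Wilcoxon signed-rank test. As a coarser stand-in that needs no statistics library, an exact two-sided sign test on the paired performance differences can be implemented from scratch (the all-positive toy differences below are made up, not the paper's data):

```python
import math

def sign_test_p(diffs):
    """Exact two-sided sign test on paired differences (zeros are dropped)."""
    d = [x for x in diffs if x != 0]
    n = len(d)
    k = sum(1 for x in d if x > 0)       # number of positive differences
    tail = min(k, n - k)                 # size of the smaller tail
    p_one = sum(math.comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * p_one)

# Ten hypothetical all-positive performance differences P2 - P1
p = sign_test_p([1.0] * 10)
```

Ten out of ten positive differences give an exact two-sided p of 2/1024 ≈ 0.002, illustrating how a consistent sign across experiments yields significance even with few samples.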
The GABA-switch guides learning of novel representations
To assess whether maturation of newborn DGCs promotes learning of a novel ensemble of digit patterns, we compare our results with two control models without neurogenesis (controls 2 and 3).
In control 2, similar to the neurogenesis case, the feedforward weights and thresholds of mature DGCs are fixed (learning rate $\eta =0$) after pretraining with patterns from digits 3 and 4, while the thresholds and weights of all unresponsive neurons remain plastic ($\eta >0$) upon introduction of patterns from the novel digit 5. The only differences from the model with neurogenesis are that unresponsive neurons: (1) keep their feedforward weights (i.e. no reinitialization to zero values) and (2) keep the same connections from and to inhibitory neurons. In this case, we find that the previously unresponsive DGCs do not become selective for the novel digit 5, regardless of how many epochs of pattern presentation are used (we tested up to 100 epochs) (Figure 3b, control 2). Therefore, if patterns from digit 5 are presented to the network, the model fails to discriminate them from the previously learned digits 3 and 4: the overall classification performance is 81.69% (classification performance for digit 3: 85.94%; digit 4: 97.56%; digit 5: 59.42%). This result suggests that the integration of newborn DGCs is beneficial for sequential learning of novel patterns.
In control 3, all DGCs keep plastic feedforward weights (learning rate $\eta >0$) after pretraining and introduction of the novel digit 5, regardless of whether they became selective for the pretrained digits 3 and 4. We observe that, when all neurons are plastic, learning of the novel digit induces a change in the selectivity of mature neurons. Several DGCs switch their selectivity to become sensitive to the novel digit (Figure 3c), while none of the previously unresponsive units becomes selective for the presented patterns (compare with Figure 1e). In contrast to the model with neurogenesis, we observe a drop in classification performance to 90.92% (classification performance for digit 3: 85.45%; digit 4: 98.37%; digit 5: 88.90%). The classification performance for digit 3 decreases the most, because many DGCs previously selective for digit 3 modified their weights to become selective for digit 5. Importantly, the more novel patterns are introduced, the more overwriting of previously stored memories occurs. Hence, if all DGCs remain plastic, discrimination between a novel pattern and a familiar pattern stored long ago is impaired.
Maturation of newborn neurons shapes the representation of novel patterns
Since each input pattern stimulates slightly different, yet overlapping, subsets of the 100 model DGCs in a sparse code, such that about 20 DGCs respond to each pattern (Figure 2g), there is no simple one-to-one assignment between neurons and patterns. In order to visualize the activity patterns of the ensemble of DGCs, we perform dimensionality reduction. We construct a two-dimensional space using the activity patterns of the network at the end of the late phase of maturation of newborn DGCs trained with ‘3’s, ‘4’s, and ‘5’s. One axis connects the center of mass (in the 100-dimensional activity space) of all DGC responses to ‘3’s with that of all responses to ‘5’s (arbitrarily called ‘axis 1’), and the other axis connects the centers of mass of responses to ‘4’s and to ‘5’s (arbitrarily called ‘axis 2’). We then project the activity of the 100 model DGCs upon presentation of MNIST test patterns onto those two axes, both at the end of the early and of the late phase of maturation of newborn DGCs (Materials and methods). Each two-dimensional projection is illustrated by a dot whose color corresponds to the digit class of the presented input pattern (blue for digit 3, green for digit 4, red for digit 5). Different input patterns within the same digit class cause different activation patterns of the DGCs, as depicted by extended clouds of dots of the same color (Figure 4a,b). Interestingly, an example pattern of a ‘5’ that is visually similar to a ‘4’ (characterized by the green cross) yields a DGC representation that lies closer to other ‘4’s (green cloud of dots) than to typical ‘5’s (red cloud of dots) (Figure 4b). Notably, the separation of the representation of ‘5’s from ‘3’s and ‘4’s is better at the end of the late phase (Figure 4b) than at the end of the early phase of maturation (Figure 4a).
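The two-axis projection can be sketched compactly. The class centers of mass and activity vectors below are made-up three-dimensional stand-ins for the 100-dimensional DGC activity space:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project(activity, axis):
    """Scalar coordinate of an activity vector along a (non-normalized) axis."""
    length = dot(axis, axis) ** 0.5
    return dot(activity, axis) / length

# Hypothetical class centers of mass in a 3-cell activity space (for brevity)
c3, c4, c5 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
axis1 = [b - a for a, b in zip(c3, c5)]   # axis 1: from '3' responses to '5' responses
axis2 = [b - a for a, b in zip(c4, c5)]   # axis 2: from '4' responses to '5' responses
```

Projecting every response onto these two axes collapses the high-dimensional population activity into a plane in which the separation between digit classes can be inspected visually, as in Figure 4.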
For instance, even though the pattern ‘5’ corresponding to the orange cross is represented close to representations of ‘4’s at the end of the early phase of maturation (green cloud of dots, Figure 4a), it is represented far from any ‘3’s and ‘4’s at the end of maturation (Figure 4b). The expansion of the representation of ‘5’s into a previously empty subspace evolves as a function of time during the late phase of maturation (Figure 4d).
Robustness of the model
Our results are robust to changes in network architecture. As mentioned earlier, neither the exact number of GABAergic neurons (Supplementary file 2), nor that of DGCs is critical. Indeed, a larger network with 700 DGCs, thus mimicking the anatomically observed expansion factor of about 5 between EC and dentate gyrus (all other parameters unchanged), yields similar results (Supplementary file 3).
In the network with 700 DGCs, 275 cells remain unresponsive after pretraining with digits 3 and 4. In line with our earlier approach in the network with 100 DGCs, we can algorithmically replace all unresponsive neurons with newborn DGCs before patterns of digit 5 are added. Upon maturation, newborn DGC receptive fields provide a detailed representation of the prototypes of the novel digit 5 (Figure 4—figure supplement 1) and good classification performance is obtained (Supplementary file 3). Interestingly, due to the randomness of the recurrent connections, some newborn DGCs become selective for particular prototypes of the familiar (pretrained) digits 3 and 4 that are not already extensively represented by the network (see newborn DGCs selective for digit 4 highlighted by magenta squares in Figure 4—figure supplement 1).
As an alternative to replacing all unresponsive cells simultaneously, we can also replace only a fraction of them by newborn cells so as to simulate a continuous turnover of cells. For example, if 119 of the 275 unresponsive cells are replaced by newborn DGCs before the start of presentations of digit 5, then these 119 cells become selective for different writing styles and generic features of the novel digit 5 (Figure 4—figure supplement 2) and allow a good classification performance of all three digits. On the other hand, replacing only 35 of the 275 unresponsive cells is not sufficient (Supplementary file 3). In an even larger network with more than 144 EC cells and more than 700 DGCs, we could choose to replace 1% of the total DGC population per week by newborn cells, consistent with biology (van Praag et al., 1999; Cameron and McKay, 2001). Importantly, if only a small fraction of unresponsive cells are replaced at a given moment, other unresponsive cells remain available to be replaced later by newborn DGCs that are then ready to learn new stimuli.
Interestingly, the timing of the introduction of the novel stimulus is important. In our main neurogenesis model with 100 DGCs, we introduce the novel digit 5 at the beginning of the early phase of maturation, which consists of one epoch of MNIST training patterns (all patterns are presented once). If the novel digit is only introduced in the middle of the early phase (half epoch), it cannot be properly learned (classification performance for digit 5: 46.52%). However, if introduced after three-eighths or one-quarter of the early phase, the novel digit can be picked out (classification performance for digit 5: 93.61% and 94.17%, respectively). We thus observe an increase in performance the earlier the novel digit is introduced after cell birth (classification performance for digit 5 was 95.18% when introduced at the beginning of the early phase of maturation). Therefore, our model predicts that a novel stimulus has to be introduced early enough with respect to newborn DGC maturation to be well discriminated and that the accuracy of discrimination is better the earlier it is introduced.
This could lead to an online scenario of our model, where adult-born DGCs are produced every day and different classes of novel patterns are introduced at different time points. To understand whether newborn DGCs in their early and late phase of maturation would interfere, two aspects should be kept in mind. First, since model newborn DGCs in the early phase of maturation do not project to other neurons yet, they do not influence the circuit and thus do not affect maturation of other newborn DGCs. Second, since model newborn DGCs in the late phase of maturation project to GABAergic neurons in the dentate gyrus, they will, just like mature cells, indirectly activate newborn DGCs that are in their early phase of maturation. As a result, early-phase newborn DGCs will develop receptive fields that represent an average of all the stimuli that excite the mature and late-phase newborn DGCs, which indirectly activate them. The ultimate selectivity of newborn DGCs is determined after the GABA-switch, when competition sets in, which makes those cells that have recently switched most sensitive to aspects of the input patterns that are not yet well represented by other cells. Therefore, in an online scenario, different model newborn DGCs would become selective for different novel patterns according to both their maturation stage with respect to presentation of the novel patterns, and the selectivity of the mature and late-phase newborn DGCs which indirectly activate them.
Finally, in our neurogenesis model, we have set the learning rate of mature DGCs to zero despite the observation that mature DGCs retain some plasticity (SchmidtHieber et al., 2004; Ge et al., 2007). We therefore studied a variant of the model in which mature DGCs also exhibit plasticity. First, we used our main model with 100 DGCs and 21 newborn DGCs. The implementation was identical, except that the learning rate of the mature DGCs was kept at a nonzero value during the maturation of the 21 newborn DGCs. We do not observe a large change in classification performance, even if the learning rate of the mature cells is the same as that of newborn cells (Supplementary file 4). Second, we used our extended network with 700 DGCs to be able to investigate the effect of plastic mature DGCs while having a proportion of newborn cells matching experiments. We find that with 35 newborn DGCs (corresponding to the experimentally reported fraction of about 5%), plastic mature DGCs (with a learning rate half of that of newborn cells) improve classification performance (Supplementary file 4). This is because several of the mature DGCs (that were previously selective for ‘3’s or ‘4’s) become selective for prototypes of the novel digit 5. Consequently, more than the 35 newborn DGCs specialize for digit 5, so that digit 5 is eventually represented better by the network with mature cell plasticity than by the standard network where plasticity is limited to newborn cells. Note that those mature DGCs that had earlier specialized on writing styles of digit 3 or 4 similar to a digit 5 are most likely to retune their selectivity. If the novel inputs were very distinct from the pretrained familiar inputs, mature DGCs would be unlikely to develop selectivity for the novel inputs.
Newborn DGCs become selective for similar novel patterns
To investigate whether our theory for integration of newborn DGCs can explain why adult dentate gyrus neurogenesis promotes discrimination of similar stimuli, but does not affect discrimination of distinct patterns (Clelland et al., 2009; Sahay et al., 2011a), we use a simplified competitive winner-take-all network (Materials and methods). It contains only as many DGCs as trained clusters, and the GABAergic inhibitory neurons are implicitly modeled through direct DGC-to-DGC inhibitory connections. DGCs are either silent or active (binary activity state, while in the detailed network DGCs had continuous firing rates). The synaptic plasticity rule is however the same as for the detailed network, with different parameter values (Materials and methods). We also construct an artificial data set (Figure 5a,b) that allows us to control the similarity $s$ of pairs of clusters (Materials and methods). The MNIST data set is not appropriate to distinguish similar from dissimilar patterns, because all digit clusters are similar and highly overlapping, reflected by a high within-cluster dispersion (e.g. across the set of all ‘3’s) compared to the separation between clusters (e.g. typical ‘3’ versus typical ‘5’).
After a pretraining period, a first mature DGC responds to patterns of cluster 1 and a second mature DGC to those of cluster 2 (Figure 5e,f). We then fix the feedforward weights of those two DGCs and introduce a newborn DGC in the network. Thereafter, we present patterns from three clusters (the two pretrained ones, as well as a novel one), while the plastic feedforward weights of the newborn DGC are the only ones that are updated. We observe that the newborn DGC ultimately becomes selective for the novel cluster if it is similar ($s=0.8$) to the two pretrained clusters (Figure 5i), but not if it is distinct ($s=0.2$, Figure 5j). The selectivity develops in two phases. In the early phase of maturation of the newborn model cell, a pattern from the novel cluster that is similar to one of the pretrained clusters activates the mature DGC that has a receptive field closest to the novel pattern. The activated mature DGC drives the newborn DGC via lateral excitatory GABAergic connections to a firing rate where LTP is triggered at active synapses onto the newborn DGC. LTP also happens when a pattern from one of the pretrained clusters is presented. Thus, synaptic plasticity leads to a receptive field that reflects the average of all stimuli from all three clusters (Figure 5g).
To summarize our findings in a more mathematical language, we characterize the receptive field of the newborn cell by the vector of its feedforward weights. Analogous to the notion of a firing rate vector that represents the set of firing rates of an ensemble of neurons, the feedforward weight vector represents the set of weights of all synapses projecting onto a given neuron (Figure 1b). In the early phase of maturation, for similar clusters, the feedforward weight vector onto the newborn DGC grows in the direction of the center of mass of all three clusters (the two pretrained ones and the novel one), because for each pattern presentation, be it a novel pattern or a familiar one, one of the mature DGCs becomes active and stimulates the newborn cell (compare Figure 5g and Figure 5k). However, if the novel cluster has a low similarity to pretrained clusters, patterns from the novel cluster do not activate any of the mature DGCs. Therefore, the receptive field of the newborn cell reflects the average of stimuli from the two pretrained clusters only (compare Figure 5h and Figure 5l).
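A minimal sketch of this cooperative-phase weight growth (our own simplification, not the paper's plasticity rule: the paper uses a rate model with heterosynaptic terms, whereas here LTP is reduced to a plain relaxation of the weight vector toward the active input pattern):

```python
import numpy as np

def early_phase_update(w, x, mature_active, eta=0.05):
    """One cooperative-phase update of the newborn DGC's feedforward weights.

    If any mature DGC responds to input pattern x, excitatory GABAergic
    drive activates the newborn cell and Hebbian LTP moves w toward x.
    Averaged over many presentations, w converges to the center of mass
    of all patterns that activated some mature DGC.
    """
    if mature_active:
        w = w + eta * (x - w)  # relaxation toward the input pattern
    return w  # unchanged if no mature DGC (and hence no newborn cell) fired
```

With similar clusters every presentation activates a mature DGC, so `w` drifts to the center of mass of all three clusters; with a distinct novel cluster, only familiar patterns trigger updates, so `w` reflects the two pretrained clusters only.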
As a result of the different orientation of the feedforward weight vector onto the newborn DGC at the end of the early phase of maturation, two different situations arise in the late phase of maturation, when lateral GABAergic connections are inhibitory. If the novel cluster is similar to the pretrained clusters, the weight vector onto the newborn DGC at the end of the early phase of maturation lies at the center of mass of all the patterns across the three clusters. Thus, it is closer to the novel cluster than the weight vector onto either of the mature DGCs (Figure 5g). So if a novel pattern is presented, the newborn DGC wins the competition between the three DGCs, and its feedforward weight vector moves toward the center of mass of the novel cluster (Figure 5i). By contrast, if the novel cluster is distinct, the weight vector onto the newborn DGC at the end of the early phase of maturation is located at the center of mass of the two pretrained clusters (Figure 5h). If a novel pattern is presented, no output unit is activated since their receptive fields are not similar enough to the input pattern. Therefore, the newborn DGC always stays silent and does not update its feedforward weights (Figure 5j). These results are consistent with studies that have suggested that dentate gyrus is only involved in the discrimination of similar stimuli, but not distinct stimuli (Gilbert et al., 2001; Hunsaker and Kesner, 2008). For discrimination of distinct stimuli, another pathway might be used, such as the direct EC to CA3 connection (Yeckel and Berger, 1990; Fyhn et al., 2007).
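The late-phase dynamics can be caricatured as a thresholded winner-take-all update (an illustrative sketch with hypothetical names; the activation threshold `theta` is our stand-in for the condition "no output unit is activated since their receptive fields are not similar enough", not a parameter taken from the paper):

```python
import numpy as np

def late_phase_update(W, x, newborn_idx, eta=0.05, theta=0.7):
    """Competitive (late-phase) update for a network with weight matrix W
    (one row of feedforward weights per DGC) and input pattern x.

    The DGC whose weight vector best matches x wins, provided the match
    exceeds the activation threshold theta; otherwise all cells stay
    silent. Only the newborn cell (index newborn_idx) is still plastic,
    so weights change only when the newborn cell wins.
    """
    scores = W @ x
    winner = int(np.argmax(scores))
    if scores[winner] < theta:
        return W, None           # distinct pattern: nobody fires, no update
    if winner == newborn_idx:    # newborn wins: move toward the pattern
        W[winner] += eta * (x - W[winner])
    return W, winner
```

With a similar novel cluster, the newborn cell (seeded near the overall center of mass by the early phase) wins for novel patterns and specializes; with a distinct cluster, no score crosses the threshold and its weights never move.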
In conclusion, our model suggests that adult dentate gyrus neurogenesis promotes discrimination of similar patterns because newborn DGCs can ultimately become selective for novel stimuli, which are similar to already learned stimuli. On the other hand, newborn DGCs fail to represent novel distinct stimuli, precisely because they are too distinct from other stimuli already represented by the network. Presentation of novel distinct stimuli in the late phase of maturation therefore does not induce synaptic plasticity of the newborn DGC feedforward weight vector toward the novel stimuli. In the simplified network, the transition between similar and distinct can be determined analytically (Materials and methods). This analysis clarifies the importance of the switch from cooperative dynamics (excitatory interactions) in the early phase to competitive dynamics (inhibitory interactions) in the late phase of maturation.
Upon successful integration the receptive field of a newborn DGC represents an average of novel stimuli
With the simplified model network, it is possible to analytically compute the maximal strength of the DGC receptive field via the L2-norm of the feedforward weight vector onto the newborn DGC (Materials and methods). In addition, the angle between the center of mass of the novel patterns and the feedforward weight vector onto the adult-born DGC can also be computed analytically (Materials and methods). To illustrate the analytical results and characterize the evolution of the receptive field of the newborn DGC, we thus examine the angle φ of the feedforward weight vector with the center of mass of the novel cluster (i.e. the average of the novel stimuli), as a function of maturation time (Figure 6b,c, Figure 6—figure supplement 1).
In the early phase of maturation, the feedforward weight vector onto the newborn DGC grows, while its angle with the center of mass of the novel cluster stays constant (Figure 6—figure supplement 1). In the late phase of maturation, the angle φ between the center of mass of the novel cluster and the feedforward weight vector onto the newborn DGC decreases in the case of similar patterns (Figure 6c, Figure 6—figure supplement 1), but not in the case of distinct patterns (Figure 6—figure supplement 1), indicating that the newborn DGC becomes selective for the novel cluster for similar but not for distinct patterns.
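The angle φ used here is the ordinary angle between two vectors; as a sketch (function name is our own):

```python
import numpy as np

def angle_to_center(w, cluster_patterns):
    """Angle (radians) between a feedforward weight vector w and the
    center of mass of a cluster of input patterns
    (array of shape (n_patterns, n_inputs))."""
    c = cluster_patterns.mean(axis=0)
    cos_phi = np.dot(w, c) / (np.linalg.norm(w) * np.linalg.norm(c))
    return np.arccos(np.clip(cos_phi, -1.0, 1.0))  # clip guards rounding
```

A decreasing φ over maturation time means the weight vector is rotating toward the novel cluster, that is, the newborn DGC is becoming selective for the novel stimuli.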
The analysis of the simplified model thus leads to a geometric picture that helps us to understand how the similarity of patterns influences the evolution of the receptive field of the newborn DGC before and after the switch from excitation to inhibition of the GABAergic input. For novel patterns that are similar to known patterns, the receptive field of a newborn DGC at the end of maturation represents the average of novel stimuli.
The cooperative phase of maturation promotes pattern separation for any dimensionality of input data
Although input patterns in our model represent the activity of 144 or 128 model EC cells, the effective dimensionality of the input data is well below 100, because the clusters of the different input classes are concentrated around their respective centers of mass. We define the effective input dimensionality as the participation ratio (Mazzucato et al., 2016; LitwinKumar et al., 2017) (Materials and methods). Using this definition, the input data of both the MNIST 12 × 12 patterns from digits 3, 4, and 5 and the seven clusters of the handmade dataset for similar patterns ($s=0.8$) are relatively low-dimensional ($PR=19$ of a maximum of 144, and $PR=11$ of a maximum of 128, respectively). We emphasize that in both cases the spread of the input data around the cluster centers implies that the effective dimensionality is larger than the number of clusters. In natural settings, we expect the input data to have even higher dimension. Therefore, here we investigate the effect of dimensionality of the input data on our neurogenesis model by increasing the spread around the cluster centers.
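The participation ratio is computed from the eigenvalues of the covariance matrix of the input data; a sketch following the cited definition, $PR = (\sum_i \lambda_i)^2 / \sum_i \lambda_i^2$ (function name is our own):

```python
import numpy as np

def participation_ratio(X):
    """Effective dimensionality of data X (n_samples, n_features).

    PR = (sum_i lambda_i)^2 / sum_i lambda_i^2, where lambda_i are the
    eigenvalues of the covariance matrix of X. PR equals the number of
    features for isotropic data and approaches 1 for data on a line.
    """
    cov = np.cov(X, rowvar=False)
    lam = np.linalg.eigvalsh(cov)
    return lam.sum() ** 2 / (lam ** 2).sum()
```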
We use our simplified network model and create similar artificial datasets ($s=0.8$) with different values for the concentration parameter κ (Materials and methods). The smaller the κ, the broader the distributions around their center of mass; hence, the larger the overlap of patterns generated from different cluster distributions. Therefore, we can increase the effective dimensionality of the input by decreasing the concentration parameter κ. First, as expected from our analytical analysis (Materials and methods), we find that the broader the cluster distributions, the smaller the length of the feedforward weight vector onto newborn DGCs (from just below 1.5 with $\kappa ={10}^{4}$ to about 1.35 with $\kappa =6\cdot {10}^{2}$). Second, we examine the ability of the simplified network to discriminate input patterns coming from input spaces with different dimensionalities. To do so, we compare our neurogenesis model (Neuro.) with a random initialization model (RandInitL.). In both cases, two DGCs are pretrained with patterns from two clusters, as above. Then we fix the weights of the two mature DGCs and introduce patterns from a third cluster as well as a newborn DGC. For the neurogenesis case, after maturation of the newborn DGC we fix its weights (while for the random initialization model we keep them plastic) upon introduction of patterns from a fourth cluster as well as another newborn DGC, and so on until the network contains seven DGCs and patterns from the full dataset of seven clusters have been presented. We compare our neurogenesis model, where each newborn DGC starts with zero weights and undergoes a two-phase maturation (one epoch per phase), with a random initialization model where each newborn DGC is directly fully integrated into the circuit, its feedforward weight vector is randomly initialized with a length of 0.1 (RandInitL.), and its weights are then learned for two epochs.
Since clusters can be highly overlapping, we assess discrimination performance by computing the reconstruction error at the end of training. Reconstruction error is evaluated analogously to classification error, except that the readout layer has the task of an autoencoder: it contains as many readout units as there are input units. Reconstruction error is the mean squared distance between the input vector and the reconstructed output vector based on testing patterns. We observe that for any dimensionality of the input space, even as high as 97-dimensional, the neurogenesis model performs better (has a lower total reconstruction error) than the random initialization model (Supplementary file 5). Indeed, in the neurogenesis case newborn DGCs grow their feedforward weights (from zero) in the direction of presented input patterns in their early cooperative phase of maturation and can later become selective for novel patterns during the competitive phase. In contrast, since the random initialization model has no early cooperative phase, the newborn DGC weight vector does not grow unless an input pattern is by chance well aligned with its randomly initialized weight vector (which is unlikely in a high-dimensional input space). We get similar results for a larger initialization of the synaptic weights (e.g. the length of the weight vector at birth is set to 1, results not shown). Importantly, in high input dimensions, the advantage of a larger weight vector length at birth in the random initialization model is overridden by the capability of newborn DGCs to grow their weight vector in the appropriate direction during their early cooperative phase of maturation.
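This measure can be illustrated as follows (our own simplification: we solve the linear readout in closed form by least squares, whereas in the paper the readout is trained analogously to the classifier; function name is hypothetical):

```python
import numpy as np

def reconstruction_error(H, X):
    """Mean squared reconstruction error of inputs X (n_patterns, n_in)
    from DGC activity H (n_patterns, n_DGC).

    The readout layer acts as an autoencoder with one linear readout
    unit per input dimension; here the readout weights are obtained by
    least squares rather than by training.
    """
    W, *_ = np.linalg.lstsq(H, X, rcond=None)  # readout weights
    X_hat = H @ W                              # reconstructed inputs
    return np.mean(np.sum((X - X_hat) ** 2, axis=1))
```

A representation that separates the clusters well allows the linear readout to reconstruct the inputs accurately, so a lower reconstruction error indicates better discrimination.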
Finally, we note that even if the length of the feedforward weight vector onto newborn DGCs is set to 1.5 (RandInitH., Supplementary file 5), which is the upper bound according to our analytical results (Materials and methods), the random initialization model performs worse than the neurogenesis model for low- up to relatively high-dimensional input spaces ($PR=83$, Supplementary file 5), despite its advantage in the competition conferred by the longer weight vector. It is only when input clusters are extremely broad and overlapping that the random initialization model performs similarly to the neurogenesis model ($PR=90,97$, Supplementary file 5). In other words, a random initialization at full length of weight vectors works well if input data is homogeneously distributed on the positive quadrant of the unit sphere, but fails if the input data is clustered in a few directions. Moreover, random initialization requires that synaptic weights are large from the start, which is not biologically plausible. In summary, the two-phase neurogenesis model is advantageous because the feedforward weights onto newborn cells can start at arbitrarily small values; their growth is, during the cooperative phase, guided to occur in a direction that is relevant for the task at hand; the final competitive phase eventually enables specialization onto novel inputs.
Discussion
While experimental studies, such as manipulating the ratio of NKCC1 to KCC2, suggest that the switch from excitation to inhibition of the GABAergic input onto adult-born DGCs is crucial for their integration into the preexisting circuit (Ge et al., 2006; Alvarez et al., 2016) and that adult dentate gyrus neurogenesis promotes pattern separation (Clelland et al., 2009; Sahay et al., 2011a; Jessberger et al., 2009), the link between channel properties and behavior has remained puzzling (Sahay et al., 2011b; Aimone et al., 2011). Our modeling work shows that the GABA-switch enables newborn DGCs to become selective for novel stimuli, which are similar to familiar, already-stored representations, consistent with the experimentally observed function of pattern separation (Clelland et al., 2009; Sahay et al., 2011a; Jessberger et al., 2009).
Previous modeling studies already suggested that newborn DGCs integrate novel inputs into the representation in dentate gyrus (Chambers et al., 2004; Becker, 2005; Crick and Miranker, 2006; Wiskott et al., 2006; Chambers and Conroy, 2007; Appleby and Wiskott, 2009; Aimone et al., 2009; Weisz and Argibay, 2009; Temprana et al., 2015; Finnegan and Becker, 2015; DeCostanzo et al., 2018). However, our work differs from them in four important aspects. First of all, we implement an unsupervised biologically plausible plasticity rule, while many studies used supervised algorithmic learning rules (Chambers et al., 2004; Becker, 2005; Chambers and Conroy, 2007; Weisz and Argibay, 2009; Finnegan and Becker, 2015; DeCostanzo et al., 2018). Second, as we model the previously neglected GABA-switch, the connection weights from EC to newborn DGCs are grown from small values through cooperativity in the early phase of maturation. This integration step was mostly bypassed in earlier models by initialization of the connectivity weights toward newborn DGCs to random, yet fully grown values (Crick and Miranker, 2006; Aimone et al., 2009; Weisz and Argibay, 2009; Finnegan and Becker, 2015). Third, as the dentate gyrus network is commonly modeled as a competitive network, weight normalization is crucial. In our framework, competition occurs during the late phase of maturation. Previous modeling works applied either algorithmic weight normalization or hard bounds on the weights at each iteration step (Crick and Miranker, 2006; Aimone et al., 2009; Weisz and Argibay, 2009; Temprana et al., 2015; Finnegan and Becker, 2015). Instead, our plasticity rule includes heterosynaptic plasticity, which intrinsically softly bounds connectivity weights by a homeostatic effect. Finally, although some earlier computational models of adult dentate gyrus neurogenesis could explain the pattern separation abilities of newborn cells, separation was obtained independently of the similarity between the stimuli.
Contrary to experimental data, no distinction was made between similar and distinct patterns (Chambers et al., 2004; Becker, 2005; Crick and Miranker, 2006; Wiskott et al., 2006; Chambers and Conroy, 2007; Aimone et al., 2009; Appleby and Wiskott, 2009; Weisz and Argibay, 2012; Temprana et al., 2015; Finnegan and Becker, 2015; DeCostanzo et al., 2018). To our knowledge, we present the first model that can explain both (1) how adult-born DGCs integrate into the preexisting network and (2) why they promote pattern separation of similar stimuli but not of distinct stimuli.
Our work emphasizes why a two-phase maturation of newborn DGCs is beneficial for proper integration into the preexisting network. From a computational perspective, the early phase of maturation, when GABAergic inputs onto newborn DGCs are excitatory, corresponds to cooperative unsupervised learning. Therefore, the synapses grow in the direction of patterns that indirectly activate the newborn DGCs via GABAergic interneurons (Figure 6a). At the end of the early phase of maturation, the receptive field of a newborn DGC represents the center of mass of all input patterns that led to its (indirect) activation. In the late phase of maturation, GABAergic inputs onto newborn DGCs become inhibitory, so that lateral interactions change from cooperation to competition, causing a shift of the receptive fields of the newborn DGCs toward novel features (Figure 6b). At the end of maturation, newborn DGCs are thus selective for novel inputs. This integration mechanism is in agreement with the experimental observation that newborn DGCs are broadly tuned early in maturation, yet highly selective at the end of maturation (Marín-Burgin et al., 2012; Danielson et al., 2016). Loosely speaking, the cooperative phase of excitatory GABAergic input promotes the growth of the synaptic weights coarsely in the relevant direction, whereas the competitive phase of inhibitory GABAergic input helps to specialize on detailed, but potentially important differences between patterns.
In the context of theories of unsupervised learning, the switch of lateral GABAergic input to newborn DGCs from excitatory to inhibitory provides a biological solution to the ‘problem of unresponsive units’ (Hertz et al., 1991). Unsupervised competitive learning has been used to perform clustering of input patterns into a few categories (Rumelhart and Zipser, 1985; Grossberg, 1987; Kohonen, 1989; Hertz et al., 1991; Du, 2010). Ideally, after learning of the feedforward weights between an input layer and a competitive network, input patterns that are distinct from each other activate different neuron assemblies of the competitive network. After convergence of competitive Hebbian learning, the vector of feedforward weights onto a given neuron points to the center of mass of the cluster of input patterns for which it is selective (Kohonen, 1989; Hertz et al., 1991). Yet, if the synaptic weights are randomly initialized, it is possible that the feedforward weights onto some neurons of the competitive network point in a direction ‘quasi-orthogonal’ (Materials and methods) to the subspace of the presented input patterns. Those neurons, called ‘unresponsive units’, therefore never become active during pattern presentation. Different learning strategies have been developed in the field of artificial neural networks to avoid this problem (Grossberg, 1976; Bienenstock et al., 1982; Rumelhart and Zipser, 1985; Grossberg, 1987; DeSieno, 1988; Kohonen, 1989; Hertz et al., 1991; Du, 2010). However, most of these algorithmic approaches lack a biological interpretation. In our model, weak synapses onto newborn DGCs form spontaneously after neuronal birth. The excitatory GABAergic input in the early phase of maturation drives the growth of the synaptic weights in the direction of the subspace of presented patterns that succeed in activating some of the mature DGCs.
Hence, the early cooperative phase of maturation can be seen as a smart initialization of the synaptic weights onto newborn DGCs, close enough to novel patterns so as to become selective for them in the late competitive phase of maturation. However, the cooperative phase is helpful only if the novel patterns are similar to the input statistics defined by the set of known (familiar) patterns.
Our results are in line with the classic view that dentate gyrus is responsible for decorrelation of inputs (Marr, 1969; Albus, 1971; Marr, 1971; Rolls and Treves, 1998), a necessary step for differential storage of similar memories in CA3, and with the observation that dentate gyrus lesions impair discrimination of similar but not distinct stimuli (Gilbert et al., 2001; Hunsaker and Kesner, 2008). To discriminate distinct stimuli, another pathway might be involved, such as the direct EC to CA3 connection (Yeckel and Berger, 1990; Fyhn et al., 2007).
The parallel between neurogenesis in dentate gyrus and in the olfactory bulb suggests that similar mechanisms could be at work in both areas. Yet, even though adult olfactory bulb neurogenesis seems to have a functional role similar to that of adult dentate gyrus neurogenesis (Sahay et al., 2011b), and newborn olfactory bulb cells follow a similar integration sequence and undergo a GABA-switch from excitatory to inhibitory, the circuits differ in several aspects. First, while newborn neurons in dentate gyrus are excitatory, newborn cells in the olfactory bulb are inhibitory. Second, the newborn olfactory cells start firing action potentials only once they are well integrated (Carleton et al., 2003). Therefore, in view of a transfer of results to the olfactory bulb, it would be interesting to adjust our model of adult dentate gyrus neurogenesis accordingly. For example, a voltage-based synaptic plasticity rule could be used to account for subthreshold plasticity mechanisms (Clopath et al., 2010).
Our model of the transition from an early cooperative phase to a late competitive phase makes specific predictions, at the behavioral and cellular level. In our model, the early cooperative phase of maturation can only drive the growth of synaptic weights onto newborn cells if they are indirectly activated by mature DGCs through GABAergic input, which has an excitatory effect due to the high NKCC1/KCC2 ratio early in maturation. Therefore, our model predicts that NKCC1-knockout mice would be impaired in discriminating similar contexts or objects, because newborn cells stay silent due to lack of indirect activation: the feedforward weight vector onto newborn DGCs could not grow in the early phase, and newborn DGCs could not become selective for novel inputs. Our model further predicts that since such newborn DGCs are poorly integrated into the preexisting circuit, they are unlikely to survive. If, however, in the same paradigm newborn cells are activated by light-induced or electrical stimulation, we predict that they become selective for novel patterns; discrimination abilities would thus be restored and newborn DGCs would be likely to survive. Analogously, we predict that in inducible NKCC1-knockout mice, animals would gradually become impaired in discrimination tasks after induced knockout and reach a stable maximum impairment about 3 weeks after the start of induced knockout.
Experimental observations support the importance of the switch from early excitation to late inhibition of the GABAergic input onto newborn DGCs. An absence of early excitation in NKCC1-knockout mice has been shown to strongly affect synapse formation and dendritic development in vivo (Ge et al., 2006). Conversely, a reduction of inhibition in the dentate gyrus through decreased KCC2 expression has been associated with epileptic activity (Pathak et al., 2007; Barmashenko et al., 2011). An analogous switch of the GABAergic input has been observed during development, and its proper timing has been shown to be crucial for sensorimotor gating and cognition (Wang and Kriegstein, 2011; Furukawa et al., 2017). In addition to early excitation and late inhibition, our theory also critically depends on the time scale of the switching process. In our model, the switch makes an instantaneous transition between the early and the late phase of maturation. Several experimental results have suggested that the switch is indeed sharp and occurs within a single day, both during development (Khazipov et al., 2004; Tyzio et al., 2007; Leonzino et al., 2016) and during adult dentate gyrus neurogenesis (Heigele et al., 2016). Furthermore, in hippocampal cell cultures, expression of KCC2 is upregulated by GABAergic activity but not affected by glutamatergic activity (Ganguly et al., 2001). A similar process during adult dentate gyrus neurogenesis would increase the number of newborn DGCs available for representing novel features by advancing the timing of their switch. In this way, instead of a few thousand newborn DGCs ready to switch (3–6% of the whole population [van Praag et al., 1999; Cameron and McKay, 2001], divided by 30 days), a larger fraction of newborn DGCs would be made available for coding, if appropriate stimulation occurs.
Finally, while neurotransmitter switching has been observed following sustained stimulation for hours to days (Li et al., 2020), it is still unclear whether it has the same functional role as the GABA-switch in our model. In particular, it remains an open question whether neurotransmitter switching promotes the integration of neurons in the same way as our model GABA-switch does in the context of adult dentate gyrus neurogenesis.
To conclude, our theory for the integration of adult-born DGCs suggests that newborn cells have a coding – rather than a modulatory – role in the pattern separation function of the dentate gyrus. Our theory highlights the importance of GABAergic input in adult dentate gyrus neurogenesis and links the switch from excitation to inhibition to the integration of newborn DGCs into the preexisting circuit. Finally, it illustrates how Hebbian plasticity of EC-to-DGC synapses, together with the switch, makes newborn cells suitable to promote pattern separation of similar but not distinct stimuli, a long-standing mystery in the field of adult dentate gyrus neurogenesis (Sahay et al., 2011b; Aimone et al., 2011).
Materials and methods
Network architecture and neuronal dynamics
DGCs are the principal cells of the dentate gyrus. They mainly receive excitatory projections from the EC through the perforant path and GABAergic inputs from local interneurons, as well as excitatory input from mossy cells. They project to CA3 pyramidal cells and inhibitory neurons, as well as to local mossy cells (Acsády et al., 1998; Henze et al., 2002; Amaral et al., 2007; Temprana et al., 2015; Figure 1—figure supplement 1). In our model, we omit mossy cells for simplicity and describe the dentate gyrus as a competitive circuit consisting of ${N}_{DGC}$ DGCs and ${N}_{I}$ GABAergic interneurons (Figure 1b). The activity of ${N}_{EC}$ neurons in EC represents an input pattern $\overrightarrow{x}=({x}_{1},{x}_{2},\mathrm{\dots},{x}_{{N}_{EC}})$. Because the perforant path also induces strong feedforward inhibition in the dentate gyrus (Li et al., 2013), we assume that the effective EC activity is normalized, such that $\|\overrightarrow{x}\|=1$ for any input pattern $\overrightarrow{x}$ (Figure 1—figure supplement 1). We use $P$ different input patterns ${\overrightarrow{x}}^{\mu}$, $1\leqslant \mu \leqslant P$, in the simulations of the model.
In our network, model EC neurons have excitatory all-to-all connections to the DGCs. In rodent hippocampus, spiking mature DGCs activate interneurons in dentate gyrus, which in turn inhibit other mature DGCs (Temprana et al., 2015; Alvarez et al., 2016). In our model, the DGCs are thus recurrently connected with inhibitory neurons (Figure 1b). Connections from DGCs to interneurons exist in our model with probability ${p}_{IE}$ and have weight ${w}_{IE}$. Similarly, connections from interneurons to DGCs occur with probability ${p}_{EI}$ and have weight ${w}_{EI}$. All parameters are reported in Table 1 (Biologically plausible network).
Before an input pattern is presented, all rates of model DGCs are initialized to zero. We assume that the DGCs have a frequency–current curve that is given by a rectified hyperbolic tangent (Dayan and Abbott, 2001), which is similar to the frequency–current curve of spiking neuron models with refractoriness (Gerstner et al., 2014). Moreover, we exploit the equivalence of two common firing rate equations (Miller and Fumarola, 2012) and let the firing rate ${\nu}_{i}$ of DGC $i$ upon stimulation with input pattern $\overrightarrow{x}$ evolve according to:
where ${[.]}_{+}$ denotes rectification: ${[a]}_{+}=a$ for $a>0$ and zero otherwise. Here, $b_i$ is a firing threshold, $L$ is the smoothness parameter of the frequency–current curve (${L}^{-1}$ is the slope of the frequency–current curve at the firing threshold), and ${I}_{i}$ is the total input to cell $i$:
with $x_j$ the activity of EC input neuron $j$, ${w}_{ij}\geqslant 0$ the feedforward weight from EC input neuron $j$ to DGC $i$, and ${w}_{ik}^{EI}$ the weight from inhibitory neuron $k$ to DGC $i$. The sum runs over all inhibitory neurons, but the weights are set to ${w}_{ik}^{EI}=0$ if the connection is absent. The firing rate ${\nu}_{i}$ is unit-free and normalized to a maximum of 1, which we interpret as a firing rate of 10 Hz. We take the synaptic weights as unitless parameters such that ${I}_{i}$ is also unit-free.
The firing rate ${\nu}_{k}^{I}$ of inhibitory neuron $k$ is defined as:
with ${p}^{*}$ a parameter which relates to the desired ensemble sparsity, and ${I}_{k}^{I}$ the total input toward interneuron $k$, given as:
with ${w}_{ki}^{IE}$ the weight from DGC $i$ to inhibitory neuron $k$. (We set ${w}_{ki}^{IE}=0$ if the connection is absent.) The feedback from inhibitory neurons ensures a sparse activity of model DGCs for each pattern. With ${p}^{*}=0.1$ we find that more than 70% of model DGCs are silent (firing rate < 1 Hz [Senzai and Buzsáki, 2017]) when an input pattern is presented, and less than 10% are highly active (firing rate > 1 Hz) (Figure 2c,d), consistent with the experimentally observed sparse activity in dentate gyrus (Chawla et al., 2005).
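The rate dynamics described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the explicit form $\nu=[\tanh((I-b)/L)]_+$ of the rectified-tanh transfer function (chosen to be consistent with the stated slope $L^{-1}$ at threshold), the relaxation time constant, and all function names are our assumptions, and the interneuron rates are held fixed for simplicity.

```python
import numpy as np

def dgc_rate(I, b, L):
    """Rectified hyperbolic tangent f-I curve: zero below the threshold b,
    slope 1/L at threshold, saturating at 1 (interpreted as 10 Hz)."""
    return np.maximum(np.tanh((I - b) / L), 0.0)

def relax_rates(x, W_ff, W_ei, nu_I, b, L, tau=1.0, dt=0.1, tol=1e-6, max_steps=10_000):
    """Relax the DGC rates to their fixed point for a fixed input pattern x.
    W_ff: feedforward weights (N_DGC x N_EC); W_ei: interneuron-to-DGC weights
    (signed); nu_I: interneuron rates, held constant here for simplicity."""
    nu = np.zeros(W_ff.shape[0])              # rates initialized to zero
    for _ in range(max_steps):
        I = W_ff @ x + W_ei @ nu_I            # total input I_i to each DGC
        nu_new = nu + (dt / tau) * (dgc_rate(I, b, L) - nu)
        if np.max(np.abs(nu_new - nu)) < tol:
            return nu_new
        nu = nu_new
    return nu
```

In the full model the interneuron rates would be updated alongside the DGC rates; holding them fixed keeps the sketch short.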
Plasticity rule
Projections from EC onto newborn DGCs exhibit Hebbian plasticity (Schmidt-Hieber et al., 2004; Ge et al., 2007; McHugh et al., 2007). Therefore, in our model, the connections from EC neurons to DGCs are plastic, following a Hebbian learning rule that exhibits LTD or LTP depending on the firing rate ${\nu}_{i}$ of the postsynaptic cell (Bienenstock et al., 1982; Artola et al., 1990; Sjöström et al., 2001; Pfister and Gerstner, 2006). Input patterns ${\overrightarrow{x}}^{\mu}$, $1\leqslant \mu \leqslant P$, are presented in random order. For each input pattern, we let the firing rate converge for a time $T$, chosen long enough to achieve convergence to a precision of 10^{−6}. After $n-1$ presentations (i.e. at time $(n-1)\cdot T$), the weight vector has value ${w}_{ij}^{(n-1)}$. We then present the next pattern and update at time $n\cdot T$ (${w}_{ij}^{(n)}={w}_{ij}^{(n-1)}+\mathrm{\Delta}{w}_{ij}$), according to the following plasticity rule (Equation (1), written here for convenience):
where $x_j$ is the firing rate of presynaptic EC input neuron $j$, ${\nu}_{i}$ the firing rate of postsynaptic DGC $i$, η the learning rate, and θ the transition point from LTD to LTP; the relative strengths α and γ of LTP and LTD depend on θ via $\alpha =\frac{{\alpha}_{0}}{{\theta}^{3}}>0$ and $\gamma ={\gamma}_{0}\theta >0$. The values of the parameters ${\alpha}_{0}$, ${\gamma}_{0}$, β, and θ are given in Table 1 (Biologically plausible network). The weights are hard-bounded from below at 0, i.e., if Equation (1) leads to a new weight smaller than zero, ${w}_{ij}$ is set to zero. The first two terms of Equation (1) are a variation of the BCM rule (Bienenstock et al., 1982). The third term implements heterosynaptic plasticity (Chistiakova et al., 2014; Zenke and Gerstner, 2017) with three important features: first, heterosynaptic plasticity has a negative sign and therefore leads to synaptic depression; second, heterosynaptic plasticity sets in above a threshold (${\nu}_{i}>\theta $) that is the same as the threshold for LTP, so that if LTP occurs at some synapses, LTD is induced at other synapses; third, above threshold, the dependence upon the postsynaptic firing rate ${\nu}_{i}$ is supralinear. The interaction of the three terms in the plasticity rule has several consequences. Because the first two terms of the plasticity rule are Hebbian (‘homosynaptic’) and proportional to the presynaptic activity $x_j$, the active DGCs (${\nu}_{i}>\theta $) update their feedforward weights in the direction of the input pattern $\overrightarrow{x}$. Moreover, whenever LTP occurs at some synapses, all weights onto neuron $i$ are downregulated heterosynaptically by an amount that increases supralinearly with the postsynaptic rate ${\nu}_{i}$, implicitly controlling the length of the weight vector (see below) similar to synaptic homeostasis (Turrigiano et al., 1998) but on a rapid time scale (Zenke and Gerstner, 2017).
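Such an update can be sketched compactly. Since Equation (1) itself is displayed above, the Python fragment below is only our reconstruction: the prefactors are chosen so that the stated properties hold — homosynaptic LTD/LTP proportional to $x_j$ with threshold θ, heterosynaptic depression only above θ and supralinear in $\nu_i$, a hard lower bound at zero — and so that the fixed points agree with those given in the next paragraph ($w\to 0$ below θ, $w\to \gamma x/(\beta \widehat{\nu}^{2})$ above).

```python
import numpy as np

def weight_update(w, x, nu, eta, theta, gamma, beta):
    """One update of the feedforward weights onto a single DGC (sketch).
    w: weight vector, x: presynaptic EC rates, nu: postsynaptic rate.
    For nu < theta the homosynaptic term is negative (LTD toward 0); for
    nu > theta the two terms balance at w = gamma * x / (beta * nu**2)."""
    homo = gamma * x * nu * (nu - theta)               # Hebbian, prop. to x_j
    hetero = -beta * w * nu**3 * max(nu - theta, 0.0)  # heterosynaptic, above theta only
    return np.maximum(w + eta * (homo + hetero), 0.0)  # hard bound at zero
```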
Analogous to learning in a competitive network (Kohonen, 1989; Hertz et al., 1991), the vector of feedforward weights onto active DGCs will move toward the center of mass of the cluster of patterns they are selective for, as we will discuss now.
For a given input pattern ${\overrightarrow{x}}^{\mu}$, there are three fixed points of the postsynaptic firing rate: ${\nu}_{i}=0$, ${\nu}_{i}=\theta $, and ${\nu}_{i}={\widehat{\nu}}_{i}$ (the negative root is omitted because ${\nu}_{i}\geqslant 0$ due to Equation (2)). For ${\nu}_{i}<\theta $, there is LTD, so the weights move toward zero: ${w}_{ij}\to 0$; for ${\nu}_{i}>\theta $, there is LTP, so the weights move toward ${w}_{ij}\to \frac{\gamma {x}_{j}^{\mu}}{\beta {\widehat{\nu}}_{i}^{2}}$ (Figure 1c). The value of ${\widehat{\nu}}_{i}$ is defined implicitly by the network Equations (2–5). If a pattern ${\overrightarrow{x}}^{\mu}$ is presented only for a short time, these fixed points are not reached during a single pattern presentation.
Winners, losers, and quasi-orthogonal inputs
We define the winners as the DGCs that become strongly active (${\nu}_{i}>\theta $) during presentation of an input pattern. Since the input patterns are normalized to an L2-norm of 1 ($\|{\overrightarrow{x}}^{\mu}\|=1$ by construction), and the L2-norm of the feedforward weight vectors is bounded (see Section Direction and length of the weight vector), the winning units are the ones whose weight vectors ${\overrightarrow{w}}_{i}$ (rows of the feedforward connectivity matrix) align best with the current input pattern ${\overrightarrow{x}}^{\mu}$.
We emphasize that all synaptic weights and all presynaptic firing rates are nonnegative: ${w}_{ij}\geqslant 0$ and ${x}_{j}\geqslant 0$. Thus, both the weight vectors and the vectors of input firing rates live in the positive quadrant, and the angle between an input pattern ${\overrightarrow{x}}^{\mu}$ and the weight vector ${\overrightarrow{w}}_{i}$ of neuron $i$ can be at most ninety degrees. We say that an input pattern ${\overrightarrow{x}}^{\mu}$ is ‘quasi-orthogonal’ to a weight vector ${\overrightarrow{w}}_{i}$ if, in the stationary state, the input is not sufficient to activate neuron $i$, i.e., ${I}_{i}={\sum}_{j=1}^{{N}_{EC}}{w}_{ij}{x}_{j}+{\sum}_{k=1}^{{N}_{I}}{w}_{ik}^{EI}{\nu}_{k}^{I}<{b}_{i}$. If an input pattern ${\overrightarrow{x}}^{\mu}$ is quasi-orthogonal to a weight vector ${\overrightarrow{w}}_{i}$, then neuron $i$ does not fire in response to ${\overrightarrow{x}}^{\mu}$ after the stimulus has been applied for a long enough time. Note that for a case without inhibitory neurons and with ${b}_{i}\to 0$, we recover the standard orthogonality condition, whereas for finite ${b}_{i}>0$, quasi-orthogonality corresponds to angles larger than some reference angle.
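The stationary condition can be checked directly; a small Python helper (the names are ours, and the interneuron rates are taken as given):

```python
import numpy as np

def is_quasi_orthogonal(x, w_i, w_ei_row, nu_I, b_i):
    """True if pattern x cannot activate neuron i in the stationary state:
    feedforward drive plus (signed) interneuron input stays below threshold b_i."""
    return w_i @ x + w_ei_row @ nu_I < b_i
```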
Direction and length of the weight vector
Let us denote the ensemble of patterns for which neuron $i$ is a winner by ${C}_{i}$ and call this the set of winning patterns (${C}_{i}=\{\mu \mid {\nu}_{i}>\theta \}$). Suppose that neuron $i$ is quasi-orthogonal to all other patterns, so that for all $\mu \notin {C}_{i}$, we have ${\nu}_{i}=0$. Then the feedforward weight vector of neuron $i$ converges in expectation to:
where ${G}_{1}({\nu}_{i})=({\nu}_{i}-\theta ){\nu}_{i}$ and ${G}_{2}({\nu}_{i})=({\nu}_{i}-\theta ){\nu}_{i}^{3}$. Hence ${\overrightarrow{w}}_{i}$ is a weighted average over all winning patterns.
The squared length of the feedforward weight vector can be computed by multiplying Equation (6) with ${\overrightarrow{w}}_{i}$:
Since input patterns have length one, the scalar product on the right-hand side can be rewritten as ${\overrightarrow{w}}_{i}\cdot \overrightarrow{x}=\|{\overrightarrow{w}}_{i}\|\mathrm{cos}(\alpha )$, where α is the angle between the weight vector and pattern $\overrightarrow{x}$. Division by $\|{\overrightarrow{w}}_{i}\|$ yields the L2-norm of the feedforward weight vector:
where the averages run, as before, over all winning patterns.
Let us now derive bounds for $\|{\overrightarrow{w}}_{i}\|$. First, since $\mathrm{cos}(\alpha )\leqslant 1$, we have ${\langle {G}_{1}({\nu}_{i})\mathrm{cos}(\alpha )\rangle}_{\mu \in {C}_{i}}\leqslant {\langle {G}_{1}({\nu}_{i})\rangle}_{\mu \in {C}_{i}}$. Second, since ${\nu}_{i}>\theta $ for all winning patterns, where θ is the LTP threshold, we have ${\langle {G}_{2}({\nu}_{i})\rangle}_{\mu \in {C}_{i}}\geqslant {\langle ({\nu}_{i}-\theta ){\nu}_{i}\rangle}_{\mu \in {C}_{i}}\,{\theta}^{2}$. Thus the length of the weight vector is finite and bounded by:
It is possible to tighten the second bound if we find the winning pattern with the smallest firing rate ${\nu}_{\text{min}}$, such that ${\nu}_{i}\geqslant {\nu}_{\text{min}}\ \forall \mu \in {C}_{i}$:
The bound is reached if neuron $i$ is winner for a single input pattern.
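If, as suggested by the fixed point ${w}_{ij}\to \gamma {x}_{j}^{\mu}/(\beta {\widehat{\nu}}_{i}^{2})$, the length formula carries the prefactor $\gamma /\beta$ (an assumption on our part, since the displayed equation is referenced but not repeated here), the two inequalities above combine, using ${G}_{1}({\nu}_{i})=({\nu}_{i}-\theta ){\nu}_{i}$, into an explicit upper bound:

```latex
\left\Vert \overrightarrow{w}_{i}\right\Vert
= \frac{\gamma}{\beta}\,
  \frac{\left\langle G_{1}(\nu_{i})\cos(\alpha)\right\rangle_{\mu\in C_{i}}}
       {\left\langle G_{2}(\nu_{i})\right\rangle_{\mu\in C_{i}}}
\;\leqslant\; \frac{\gamma}{\beta}\,
  \frac{\left\langle (\nu_{i}-\theta)\,\nu_{i}\right\rangle_{\mu\in C_{i}}}
       {\left\langle (\nu_{i}-\theta)\,\nu_{i}\right\rangle_{\mu\in C_{i}}\,\theta^{2}}
= \frac{\gamma}{\beta\,\theta^{2}} .
```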
We can also derive a lower bound. For a pattern $\mu \in {C}_{i}$, let us write the firing rate of neuron $i$ as ${\nu}_{i}(\mu )={\overline{\nu}}_{i}+\mathrm{\Delta}{\nu}_{i}(\mu )$, where ${\overline{\nu}}_{i}$ is the mean firing rate of neuron $i$ averaged across all winning patterns and ${\langle \mathrm{\Delta}{\nu}_{i}\rangle}_{\mu \in {C}_{i}}=0$. We assume that the absolute size of $\mathrm{\Delta}{\nu}_{i}$ is small, i.e., ${\langle {(\mathrm{\Delta}{\nu}_{i})}^{2}\rangle}_{\mu \in {C}_{i}}\ll {({\overline{\nu}}_{i})}^{2}$. Linearization of Equation (8) around ${\overline{\nu}}_{i}$ yields:
Elementary geometric arguments for a neuron model with a monotonically increasing frequency–current curve show that the value of ${\langle \mathrm{cos}(\alpha )\mathrm{\Delta}{\nu}_{i}\rangle}_{\mu \in {C}_{i}}$ is positive (or zero), because an increase in the angle α lowers both the cosine and the firing rate, giving rise to a positive correlation. Since we are interested in a lower bound, we can therefore drop the term proportional to ${G}_{1}^{\prime}$ and evaluate the ratio ${G}_{1}/{G}_{2}$ to find:
where ${\nu}_{\mathrm{max}}$ is the maximal firing rate of a DGC and $\widehat{\alpha}={\mathrm{max}}_{\mu \in {C}_{i}}\{\alpha \}$ is the angle of the winning pattern that has the largest angle with the weight vector. The first bound is tight and is reached if neuron $i$ is winner for only two patterns.
To summarize, we find that the length of the weight vector remains bounded in a narrow range. Hence, for a reasonable distribution of input patterns and weight vectors, the value of $\|{\overrightarrow{w}}_{i}\|$ is similar for different neurons $i$, so that after convergence the weight vectors have similar lengths for all DGCs that are winners for at least one pattern. In our simulations with the MNIST data set, we find that the length of the feedforward weight vectors lies between 9.3 and 11.1 across all responsive neurons, with a mean value close to 10 (Figure 2e).
Early maturation phase
During the early phase of maturation, the GABAergic input onto a newborn DGC with index $l$ has an excitatory effect. In the model, it is implemented as follows: ${w}_{lk}^{EI}={w}_{EI}>0$ with probability ${p}_{EI}$ for any interneuron $k$ and ${w}_{lk}^{EI}=0$ otherwise (no connection). Since newborn cells do not yet project onto inhibitory neurons (Temprana et al., 2015), we have ${w}_{kl}^{IE}=0\ \forall l$. Newborn DGCs are known to have enhanced excitability (Schmidt-Hieber et al., 2004; Li et al., 2017), so their threshold is kept at ${b}_{l}=0\ \forall l$. Because the newborn model DGCs receive lateral excitation via interneurons and their thresholds are zero during the early phase of maturation, the lateral excitatory GABAergic input is always sufficient to activate them. Hence, if the firing rate of a newborn DGC exceeds the LTP threshold θ, the feedforward weights grow toward the presented input pattern, Equation (1).
Presentation of all patterns of the data set once (one epoch) is sufficient to reach convergence of the feedforward weights onto newborn DGCs. We define the end of the first epoch as the end of the early phase, i.e., simulation of one epoch of the model corresponds to about 3 weeks of biological time.
Late maturation phase
During the late phase of maturation (starting at about 3 weeks [Ge et al., 2006]), the GABAergic input onto newborn DGCs switches from excitatory to inhibitory. In terms of our model, this means that all existing ${w}_{lk}^{EI}$ connections switch their sign to ${w}_{EI}<0$. Furthermore, since newborn DGCs develop lateral connections to inhibitory neurons in the late maturation phase (Temprana et al., 2015), we set ${w}_{kl}^{IE}={w}_{IE}$ with probability ${p}_{IE}$, and ${w}_{kl}^{IE}=0$ otherwise. The thresholds of newborn DGCs are updated after presentation of pattern μ at time $n\cdot T$ (${b}_{l}^{(n)}={b}_{l}^{(n-1)}+\mathrm{\Delta}{b}_{l}$) according to $\mathrm{\Delta}{b}_{l}={\eta}_{b}\left({\nu}_{l}-{\nu}_{0}\right)$, where ${\nu}_{0}$ is a reference rate and ${\eta}_{b}$ a learning rate, to mimic the decrease of excitability as newborn DGCs mature (Table 1, Biologically plausible network). Therefore, at the end of the late phase of maturation, the distribution of firing rates of newborn DGCs is shifted toward lower firing rates compared to the early phase (Figure 2c,d). A sufficient condition for a newborn DGC to win the competition upon presentation of a pattern of the novel cluster is that the scalar product between that pattern and the feedforward weight vector onto the newborn DGC is larger than the scalar product between the pattern and the feedforward weight vector onto any of the mature DGCs. As in the early phase of maturation, a single presentation of all patterns of the data set (one epoch) is sufficient to reach convergence of the feedforward weights onto newborn DGCs. We therefore consider the late phase of maturation finished after one epoch.
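The two modeling steps of the late phase — the sign flip of the existing interneuron-to-newborn connections and the homeostatic threshold update $\mathrm{\Delta}{b}_{l}={\eta}_{b}({\nu}_{l}-{\nu}_{0})$ — can be sketched as follows (function names and the random-number handling are our choices):

```python
import numpy as np

def gaba_switch(W_ei_newborn, w_EI, w_IE, p_IE, rng):
    """Late-phase GABA-switch (sketch). Existing interneuron->newborn-DGC
    connections flip from excitatory (+w_EI) to inhibitory (-w_EI), keeping
    the same topology; newborn DGCs additionally grow connections onto
    interneurons with probability p_IE."""
    W_ei_new = -abs(w_EI) * (W_ei_newborn != 0)           # sign flip, same topology
    W_ie_new = w_IE * (rng.random(W_ei_newborn.T.shape) < p_IE)
    return W_ei_new, W_ie_new

def update_threshold(b, nu, eta_b, nu_0):
    """Threshold update mimicking the gradual loss of enhanced excitability."""
    return b + eta_b * (nu - nu_0)
```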
Input patterns
Two different sets of input patterns are used. Both data sets have a number $K$ of clusters and several thousand patterns per cluster. As a first data set, we use the MNIST 12 × 12 patterns (Lecun et al., 1998) (${N}_{EC}=144$), normalized such that the L2-norm of each pattern is equal to 1. Normalization of inputs (be it implemented algorithmically, as done here, or by explicit inhibitory feedback) ensures that, once weight growth due to synaptic plasticity has ended and weights have stabilized, the overall strength of input onto DGCs is approximately identical for all cells (see Section Direction and length of the weight vector). Equalized lengths of weight vectors are, in turn, an important feature of classic soft or hard competitive networks (Kohonen, 1989; Hertz et al., 1991). The training set contains approximately 6000 patterns per digit, while the testing set contains about 1000 patterns per digit (Figure 1d). Both training and test patterns contain a large variety of writing styles, indicating that the clusters of input patterns for each class are broadly distributed around their centers of mass.
As a second data set, we use handmade artificial patterns designed such that the distance between the centers of any two clusters, or in other words their pairwise similarity, is the same. All clusters lie on the positive quadrant of the surface of a hypersphere of dimension ${N}_{EC}1$. The cluster centers are Walsh patterns shifted along the diagonal (Figure 5b):
with $\xi <1$ a parameter that determines the spacing between clusters. $c_0$ is a normalization factor that ensures that the centers of mass of all clusters have an L2-norm of 1:
The number of input neurons ${N}_{EC}$ is ${N}_{EC}={2}^{K}$. The scalar product, and hence the angle Ω, between the center of mass of any pair of clusters $k$ and $l$ ($k\ne l$) is a function of ξ (Figure 5a):
We define the pairwise similarity $s$ of two clusters as: $s=1\xi $. Highly similar clusters have a large $s$ due to the small distance between their centers (hence a small ξ).
To make the artificial data set comparable to the MNIST 12 × 12 data set, we choose $K=7$, so ${N}_{EC}=128$, and we generate 6000 noisy patterns per cluster for the training set and 1000 other noisy patterns per cluster for the testing set. Since our noisy high-dimensional input patterns have to be symmetrically distributed around the centers of mass ${\overrightarrow{P}}^{k}$, yet lie on the hypersphere, we have to use an appropriate sampling method. The patterns ${\overrightarrow{x}}^{\mu (k)}$ of a given cluster $k$ with center of mass ${\overrightarrow{P}}^{k}$ are thus sampled from a Von Mises–Fisher distribution (Mardia and Jupp, 2009):
with $\overrightarrow{\zeta}$ an L2-normalized vector taken in the space orthogonal to ${\overrightarrow{P}}^{k}$. The vector $\overrightarrow{\zeta}$ is obtained by performing the singular-value decomposition of ${\overrightarrow{P}}^{k}$ ($U\mathrm{\Sigma}{V}^{*}={\overrightarrow{P}}^{k}$) and multiplying the matrix $U$ (after removing its first column), whose remaining columns are the left-singular vectors spanning the space orthogonal to ${\overrightarrow{P}}^{k}$, with a vector whose elements are drawn from the standard normal distribution. Then the L2-norm of the obtained pattern is set to 1, so that it lies on the surface of the hypersphere. A rejection sampling scheme is used to obtain $a$ (Mardia and Jupp, 2009). The sample $a$ is kept if $\kappa a+({N}_{EC}-1)\mathrm{ln}(1-\psi a)-c\geqslant \mathrm{ln}(u)$, with κ a concentration parameter, $\psi =\frac{1-b}{1+b}$, $c=\kappa \psi +({N}_{EC}-1)\mathrm{ln}(1-{\psi}^{2})$, $u$ drawn from a uniform distribution $u\sim U[0,1]$, $a=\frac{1-(1+b)z}{1-(1-b)z}$, $b=\frac{{N}_{EC}-1}{\sqrt{4{\kappa}^{2}+{({N}_{EC}-1)}^{2}}+2\kappa}$, and $z$ drawn from a beta distribution $z\sim \mathcal{B}e(\frac{{N}_{EC}-1}{2},\frac{{N}_{EC}-1}{2})$.
The concentration parameter κ characterizes the spread of the distribution around the center ${\overrightarrow{P}}^{k}$. In the limit where $\kappa \to 0$, sampling from the Von Mises–Fisher distribution becomes equivalent to sampling uniformly on the surface of the hypersphere, so the clusters become highly overlapping. In dimension ${N}_{EC}=128$, if $\kappa >{10}^{3}$, the probability of overlap between clusters is negligible. We use a value $\kappa ={10}^{4}$.
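The sampling scheme just described can be written compactly in Python. This sketch follows the rejection steps given above; for brevity, the unit vector orthogonal to the cluster center is obtained by projecting a Gaussian vector rather than via the singular-value decomposition, which yields the same distribution over the orthogonal directions:

```python
import numpy as np

def sample_vmf(mu, kappa, rng):
    """Draw one sample from a Von Mises-Fisher distribution on the unit
    hypersphere with mean direction mu and concentration kappa, using the
    rejection scheme described in the text."""
    d = mu.shape[0]
    b = (d - 1) / (np.sqrt(4 * kappa**2 + (d - 1) ** 2) + 2 * kappa)
    psi = (1 - b) / (1 + b)
    c = kappa * psi + (d - 1) * np.log(1 - psi**2)
    while True:
        z = rng.beta((d - 1) / 2, (d - 1) / 2)
        a = (1 - (1 + b) * z) / (1 - (1 - b) * z)   # cosine of angle to mu
        if kappa * a + (d - 1) * np.log(1 - psi * a) - c >= np.log(rng.uniform()):
            break
    v = rng.standard_normal(d)                      # random direction
    v -= (v @ mu) * mu                              # project out the mu component
    zeta = v / np.linalg.norm(v)                    # unit vector orthogonal to mu
    x = a * mu + np.sqrt(1 - a**2) * zeta
    return x / np.linalg.norm(x)                    # guard against rounding error
```

With $\kappa ={10}^{4}$ and ${N}_{EC}=128$, samples concentrate tightly around the cluster center, as in the simulations.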
Classification performance (readout network)
It has been observed that classification performance based on DGC population activity is a good proxy for behavioral discrimination (Woods et al., 2020). Hence, to evaluate whether the newborn DGCs contribute to the function of the dentate gyrus network, we study classification performance. Once the feedforward weights have been adjusted upon presentation of many input patterns from the training set (Section Plasticity rule), we keep them fixed and determine classification on the test set using artificial readout units (RO).
To do so, the readout weights (${w}_{ki}^{RO}$ from model DGC $i$ to readout unit $k$) are initialized at random values drawn from a uniform distribution: ${w}_{ki}^{RO}\sim \sigma \mathcal{U}(0,1)$, with $\sigma =0.1$. The number of readout units, ${N}_{RO}$, corresponds to the number of learned classes. To adjust the readout weights, all patterns of the training data set that belong to the learned classes are presented one after the other. For each pattern ${\overrightarrow{x}}^{\mu}$, we let the firing rate of the DGCs converge (values at convergence: ${\nu}_{i}^{\mu}$). The activity of a readout unit $k$ is given by:
As we aim to assess the performance of the network of DGCs, the readout weights are adjusted by an artificial supervised learning rule. The loss function, which corresponds to the difference between the activity of the readout units and a onehot representation of the corresponding pattern label (Hertz et al., 1991),
with ${L}_{k}^{\mu}$ the element $k$ of a onehot representation of the correct label of pattern ${\overrightarrow{x}}^{\mu}$, is minimized by stochastic gradient descent:
The readout units have a rectified hyperbolic tangent frequency–current curve, $g(x)=\mathrm{tanh}\left(2{[x]}_{+}\right)$, whose derivative is ${g}^{\prime}(x)=2\left(1-{\left(\mathrm{tanh}\left(2{[x]}_{+}\right)\right)}^{2}\right)$. We learn the weights of the readout units over 100 epochs of presentations of all training patterns with $\eta =0.01$, which is sufficient to reach convergence.
Thereafter, the readout weights are fixed. Each test-set pattern belonging to one of the learned classes is presented once, and the firing rates of the DGCs are allowed to converge. Finally, the activity of the readout units ${\nu}_{k}^{RO,\mu}$ is computed and compared to the correct label ${L}_{k}^{\mu}$ of the presented pattern. If the readout unit with the highest activity is the one that represents the class of the presented input pattern, the pattern is counted as correctly classified. Classification performance is the number of correctly classified patterns divided by the total number of test patterns of the learned classes.
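A minimal Python sketch of the readout training and decision rule (our reconstruction; the squared-error normalization and function names are assumptions, while $g$ and ${g}^{\prime}$ are taken as given above):

```python
import numpy as np

def g(x):
    """Readout transfer function: rectified hyperbolic tangent."""
    return np.tanh(2 * np.maximum(x, 0.0))

def g_prime(x):
    """Derivative as given in the text."""
    return 2 * (1 - np.tanh(2 * np.maximum(x, 0.0)) ** 2)

def readout_sgd_step(W_ro, nu, label_onehot, eta=0.01):
    """One stochastic-gradient step on E = 0.5 * ||g(W_ro @ nu) - label||^2."""
    h = W_ro @ nu
    err = g(h) - label_onehot
    return W_ro - eta * np.outer(err * g_prime(h), nu)

def classify(W_ro, nu):
    """Predicted class: index of the most active readout unit."""
    return int(np.argmax(g(W_ro @ nu)))
```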
Control cases
In our standard setting, patterns from a third digit are presented to a network that has previously only seen patterns from two digits. The question is whether neurogenesis helps when the third digit is added. We use several control cases to compare with the neurogenesis case. In the first control case, all three digits are learned in parallel (Figure 3a, control 1). In the two other control cases, we either keep all feedforward connections toward the DGCs plastic (Figure 3c, control 3) or fix the feedforward connections of all selective DGCs while keeping unselective neurons plastic (as in the neurogenesis case) (Figure 3b, control 2). However, in both instances, the DGCs do not mature in the two-step process induced by the GABA-switch that is part of our model of neurogenesis.
Pretraining with two digits
As we are interested in neurogenesis at the adult stage, we pretrain the network with patterns from two digits, such that it already stores some memories before neurogenesis takes place. To do so, we randomly initialize the weights from EC neurons to DGCs by drawing them from a uniform distribution (${w}_{ij}\sim U[0,1]$). The L2-norm of the feedforward weight vector onto each DGC is then normalized to 1, to ensure fair competition between DGCs during learning. Then we present all patterns from digits 3 and 4 in random order, as many times as needed for convergence of the weights. During each pattern presentation, the firing rates of the DGCs are computed (Section Network architecture and neuronal dynamics) and their feedforward weights are updated according to our plasticity rule (Section Plasticity rule). We find that approximately 40 epochs suffice for convergence of the weights and use 80 epochs to make sure that all weights are stable. At the end of pretraining, our network is considered to correspond to an adult stage, because some DGCs are selective for prototypes of the pretrained digits (Figure 1e).
Projection on pairwise discriminatory axes
To assess how separability of the DGC activation patterns develops during the late phase of maturation of newborn DGCs, we project the population activity onto axes which are optimized for pairwise discrimination (patterns from digit 3 versus patterns from digit 5, 4 versus 5, and 3 versus 4). Those axes are determined using Fisher linear discriminant analysis, as explained below.
We determine the vector of DGC firing rates, $\overrightarrow{\nu}$, at the end of the late phase of maturation of newborn DGCs upon presentation of each pattern, $\overrightarrow{x}$, from digits 3, 4, and 5 of the training MNIST dataset. The mean activity in response to all training patterns μ from digit $m$, ${\overrightarrow{\mu}}_{m}=\frac{1}{{N}_{m}}{\sum}_{\mu \in m}{\overrightarrow{\nu}}^{\mu}$, is computed for each of the three digits (${N}_{m}$ is the number of training patterns of digit $m$). The pairwise Fisher linear discriminant is defined as the linear function ${\overrightarrow{w}}^{T}\overrightarrow{\nu}$ that maximizes the distance between the means of the projected activity in response to two digits (e.g. $m$ and $n$), while normalizing for withindigit variability. The objective function to maximize is thus given as:
with ${S}_{B}=({\overrightarrow{\mu}}_{m}-{\overrightarrow{\mu}}_{n}){({\overrightarrow{\mu}}_{m}-{\overrightarrow{\mu}}_{n})}^{T}$ the between-digit scatter matrix, and ${S}_{W}={\mathrm{\Sigma}}_{m}+{\mathrm{\Sigma}}_{n}$ the within-digit scatter matrix (${\mathrm{\Sigma}}_{m}$ is the covariance matrix of the DGC activity in response to patterns of digit $m$, and ${\mathrm{\Sigma}}_{n}$ the covariance matrix in response to patterns of digit $n$). It can be shown that the direction of the optimal discriminatory axis between digits $m$ and $n$ is given by the eigenvector of ${S}_{W}^{-1}{S}_{B}$ with the largest eigenvalue.
We arbitrarily set ‘axis 1’ as the optimal discriminatory axis between digit 3 and digit 5, ‘axis 2’ as the optimal discriminatory axis between digit 4 and digit 5, and ‘axis 3’ as the optimal discriminatory axis between digit 3 and digit 4. For each of the three discriminatory axes, we define its origin (i.e. projection value of 0) as the location of the average projection of all training patterns of the three digits on the corresponding axis. Figure 4 represents the projections of DGC activity upon presentation of testing patterns at the end of the early and late phase of maturation of newborn DGCs onto the abovedefined axes.
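Because ${S}_{B}$ has rank one, the leading eigenvector of ${S}_{W}^{-1}{S}_{B}$ is proportional to ${S}_{W}^{-1}({\overrightarrow{\mu}}_{m}-{\overrightarrow{\mu}}_{n})$, so the axis can be computed with a single linear solve. A Python sketch (the small ridge term is our addition to keep ${S}_{W}$ invertible):

```python
import numpy as np

def fisher_axis(acts_m, acts_n, reg=1e-6):
    """Optimal pairwise discriminatory axis between DGC responses to two
    digits: direction of S_W^{-1} (mu_m - mu_n), returned as a unit vector.
    acts_m, acts_n: arrays of shape (n_patterns, N_DGC)."""
    mu_m = acts_m.mean(axis=0)
    mu_n = acts_n.mean(axis=0)
    S_W = np.cov(acts_m.T) + np.cov(acts_n.T) + reg * np.eye(acts_m.shape[1])
    w = np.linalg.solve(S_W, mu_m - mu_n)   # leading eigvec of S_W^{-1} S_B
    return w / np.linalg.norm(w)
```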
Statistics
In the main text, we present a representative example with three digits from the MNIST data set (3, 4, and 5). It is selected from a set of 10 random combinations of three different digits. For each combination, one network is pretrained with two digits for 80 epochs. Then the third digit is added and neurogenesis takes place (one epoch of early phase of maturation, and one epoch of late phase of maturation). Furthermore, another network is pretrained directly with the three digits for 80 epochs. Classification performance is reported for all combinations (Supplementary file 1).
Simplified rate network
We use a toy network and the artificial data set to determine whether our theory of the integration of newborn DGCs can explain why adult dentate gyrus neurogenesis helps with the discrimination of similar, but not of distinct, patterns.
The rate network described above is simplified as follows. We use $K$ DGCs for $K$ clusters. Their firing rate ${\nu}_{i}$ is given by:
where $\mathscr{H}$ is the Heaviside step function. As before, $b_i$ is the threshold, and ${I}_{i}$ the total input to neuron $i$:
with $x_j$ the input of presynaptic EC neuron $j$, ${w}_{ij}$ the feedforward weight between EC neuron $j$ and DGC $i$, and ${\nu}_{k}$ the firing rate of DGC $k$. Inhibitory neurons are modeled implicitly: each DGC directly connects to all other DGCs via inhibitory recurrent connections of value ${w}_{rec}<0$. During presentation of pattern ${\overrightarrow{x}}^{\mu}$, the firing rates of the DGCs evolve according to Equation (21). After convergence, the feedforward weights are updated: ${w}_{ij}^{(\mu )}={w}_{ij}^{(\mu -1)}+\mathrm{\Delta}{w}_{ij}$. The synaptic plasticity rule is the same as before, see Equation (1), but with the parameters reported in Table 1 (Simple network). They differ from those of the biologically plausible network because we now aim for a single winning neuron per cluster. Note that for an LTP threshold $\theta <1$, all active DGCs update their feedforward weights because of the Heaviside function for the firing rate (Equation 21).
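The winner-take-all behavior of this simplified network can be sketched as follows (our illustration; updates are made asynchronously, one DGC at a time, to avoid the oscillations that synchronous Heaviside updates can produce, and no DGC inhibits itself):

```python
import numpy as np

def simple_network_rates(x, W_ff, w_rec, b, max_sweeps=100):
    """Fixed point of the simplified Heaviside rate network. Each DGC i
    receives I_i = W_ff[i] @ x + w_rec * (sum of the other DGCs' rates),
    with w_rec < 0 implementing implicit lateral inhibition."""
    K = W_ff.shape[0]
    nu = np.zeros(K)
    for _ in range(max_sweeps):
        nu_prev = nu.copy()
        for i in range(K):
            I_i = W_ff[i] @ x + w_rec * (nu.sum() - nu[i])
            nu[i] = 1.0 if I_i > b else 0.0
        if np.array_equal(nu, nu_prev):
            break
    return nu
```

With sufficiently strong inhibition, only the DGC whose weight vector aligns best with the input remains active.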
Assuming a single winner ${i}^{*}$ for each pattern presentation, the input (Equation 22) to the winner is:

${I}_{{i}^{*}}={\overrightarrow{w}}_{{i}^{*}}\cdot {\overrightarrow{x}}^{\mu},$

while the input to the losers is:

${I}_{i}={\overrightarrow{w}}_{i}\cdot {\overrightarrow{x}}^{\mu}+{w}_{\text{rec}},$
Therefore, two conditions need to be satisfied for a solution with a single winner:

${\overrightarrow{w}}_{{i}^{*}}\cdot {\overrightarrow{x}}^{\mu}>{b}_{{i}^{*}},$ (25)

for the winner to actually be active, and:

${\overrightarrow{w}}_{i}\cdot {\overrightarrow{x}}^{\mu}+{w}_{\text{rec}}<{b}_{i}\quad \text{for}\ i\ne {i}^{*},$ (26)

to prevent non-winners from becoming active. The value of ${b}_{i}$ in the model is lower in the early phase than in the late phase of maturation to mimic enhanced excitability (Schmidt-Hieber et al., 2004; Li et al., 2017).
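As a minimal sketch of these winner-take-all dynamics (Equations 21 and 22), assuming the toy parameter values quoted later in this section (threshold 1.2, recurrent weight -1.2) and an illustrative two-unit network; the iteration scheme is our assumption, not the authors' released code:

```python
import numpy as np

# Minimal sketch of the simplified winner-take-all rate network
# (Equations 21-22). The iteration scheme and the two-unit example
# are illustrative assumptions, not the authors' released code.
def run_to_convergence(x, W, b, w_rec, n_steps=50):
    """Iterate nu_i = H(I_i - b_i), with implicit recurrent inhibition."""
    nu = np.zeros(W.shape[0])  # output units start silent before each pattern
    for _ in range(n_steps):
        # total input: feedforward drive plus inhibition from the other DGCs
        I = W @ x + w_rec * (nu.sum() - nu)
        nu = (I > b).astype(float)  # Heaviside step function
    return nu

# two DGCs tuned to orthogonal inputs; a single winner emerges
W = np.array([[1.5, 0.0],
              [0.0, 1.5]])
nu = run_to_convergence(np.array([1.0, 0.0]), W, b=np.full(2, 1.2), w_rec=-1.2)
```

With these values, the first unit satisfies the winner condition (Equation 25) and silences the second via the recurrent term (Equation 26).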
Similar versus distinct patterns with the artificial data set
Using the artificial data set with $\xi <1$ (Equation 13), the scalar product between the centers of mass of two different clusters, given by Equation (15), satisfies $0.5\leqslant \frac{1}{1+{\xi}^{2}}\leqslant 1$. This corresponds to ${0}^{\circ}\leqslant \mathrm{\Omega}\leqslant {\mathrm{\Omega}}_{\text{max}}={60}^{\circ}$.
After stimulation with a pattern $\overrightarrow{x}$, it takes some time before the firing rates of the DGCs converge. We call two patterns ‘similar’ if they activate, at least initially, the same output unit, while we consider two patterns as ‘distinct’ if they do not activate the same output unit, not even initially. We now show that, for a large concentration parameter κ, patterns of different clusters are similar if $\xi <\sqrt{\frac{\Vert {\overrightarrow{w}}_{i}\Vert}{{b}_{i}}-1}$ and distinct if $\xi >\sqrt{\frac{\Vert {\overrightarrow{w}}_{i}\Vert}{{b}_{i}}-1}$.
We first consider a DGC $i$ whose feedforward weight vector has converged toward the center of mass of cluster $k$. If an input pattern ${\overrightarrow{x}}^{\mu (k)}$ from cluster $k$ is presented, it will receive the following initial input:

${I}_{i}=\Vert {\overrightarrow{w}}_{i}\Vert \mathrm{cos}({\vartheta}_{kk}),$

where ${\vartheta}_{kk}$ is the angle between the pattern ${\overrightarrow{x}}^{\mu (k)}$ and the center of mass ${\overrightarrow{P}}^{k}$ of the cluster to which it belongs. The larger the concentration parameter κ for the generation of the artificial data set, the smaller the dispersion of the clusters, and thus the larger $\mathrm{cos}({\vartheta}_{kk})$. If instead an input pattern from cluster $l$ is presented, the same DGC will receive a lower initial input:

${I}_{i}=\Vert {\overrightarrow{w}}_{i}\Vert \mathrm{cos}({\vartheta}_{kl})\approx \frac{\Vert {\overrightarrow{w}}_{i}\Vert}{1+{\xi}^{2}}.$
The approximation holds for a small dispersion of the clusters (large concentration parameter κ). We note that the recurrent input is not yet subtracted because output units are initialized with zero firing rate before each pattern presentation. By definition, similar patterns initially stimulate the same DGCs. A DGC can be active for two clusters only if its threshold satisfies:

${b}_{i}<\frac{\Vert {\overrightarrow{w}}_{i}\Vert}{1+{\xi}^{2}}.$ (29)

Therefore, with a high concentration parameter κ, patterns of different clusters are similar if $\xi <\sqrt{\frac{\Vert {\overrightarrow{w}}_{i}\Vert}{{b}_{i}}-1}$, while they are distinct if $\xi >\sqrt{\frac{\Vert {\overrightarrow{w}}_{i}\Vert}{{b}_{i}}-1}$.
Parameter choice
The upper bound of the expected L2-norm of the feedforward weight vector onto the DGCs at convergence can be computed, see Equation (10). With the parameters in Table 1 (Simple network), the value is $\Vert {\overrightarrow{w}}_{i}\Vert \leqslant 1.5$. Moreover, the input patterns of each cluster are highly concentrated; hence, their angle with the center of mass of the cluster they belong to is close to 0, so we have $\Vert {\overrightarrow{w}}_{i}\Vert \approx 1.5$. Therefore, at convergence, a DGC selective for a given cluster $k$ receives an input ${I}_{{i}^{\ast}}={\overrightarrow{w}}_{{i}^{\ast}}\cdot {\overrightarrow{x}}^{\mu (k)}\approx 1.5$ upon presentation of input patterns ${\overrightarrow{x}}^{\mu (k)}$ belonging to cluster $k$. We choose ${b}_{i}=1.2$ to satisfy Equation (25). Given ${b}_{i}$, the threshold value ${\xi}_{\text{thresh}}$ below which two clusters are similar (and above which they are distinct) can be determined from Equation (29): ${\xi}_{\text{thresh}}=0.5$. We created a hand-made data set with $\xi =0.2$ for the case of similar clusters (hence with similarity $s=0.8$), and a hand-made data set with $\xi =0.8$ for the distinct case (hence with similarity $s=0.2$).
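This parameter reasoning can be checked numerically; a small sketch (the helper name is ours) using the quoted values $\Vert \overrightarrow{w}\Vert \approx 1.5$ and $b = 1.2$:

```python
import math

# Similarity threshold from Equation (29): clusters count as 'similar'
# when xi < sqrt(||w||/b - 1). Values are the toy-network parameters
# quoted in the text; the helper name is ours.
w_norm = 1.5   # L2-norm of a converged feedforward weight vector
b = 1.2        # firing threshold of a mature DGC

xi_thresh = math.sqrt(w_norm / b - 1.0)   # = 0.5

def initial_input(xi):
    """Initial input a DGC tuned to one cluster receives from another
    cluster at separation xi (small within-cluster dispersion)."""
    return w_norm / (1.0 + xi**2)
```

For the hand-made data sets, `initial_input(0.2)` is about 1.44 and exceeds $b$ (similar clusters), while `initial_input(0.8)` is about 0.91 and stays below it (distinct clusters).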
Let us suppose that the weights of DGC $i$ have converged and made this cell respond to patterns of cluster $i$. If another DGC $k$ of the network is selective for cluster $k$, cell $i$ receives the input ${I}_{i}={\overrightarrow{w}}_{i}\cdot {\overrightarrow{x}}^{\mu (k)}+{w}_{\text{rec}}\approx \frac{1.5}{1+{\xi}^{2}}+{w}_{\text{rec}}$ upon presentation of input patterns ${\overrightarrow{x}}^{\mu (k)}$ belonging to cluster $k\ne i$. Hence, to satisfy Equation (26), we need ${w}_{\text{rec}}<{b}_{i}-{\mathrm{max}}_{\xi}\left(\frac{1.5}{1+{\xi}^{2}}\right)\approx -0.24$. We set ${w}_{\text{rec}}=-1.2$.
Furthermore, a newborn DGC is born with a null feedforward weight vector, so that at birth its input consists only of the indirect excitatory input from mature DGCs, which vanishes if all DGCs are quiescent and takes a value ${I}_{i}=-{w}_{\text{rec}}>0$ if a mature DGC responds to the input. For the feedforward weight vector to grow, the newborn cell $i$ needs to be active. This could be achieved through spontaneous activity, implemented by setting the intrinsic firing threshold at birth to a value ${b}_{\text{birth}}<0$. In this case, no difference between similar and distinct patterns is expected. Alternatively, activity of newborn cells can be achieved in the absence of spontaneous activity under the condition $-{w}_{\text{rec}}>{b}_{\text{birth}}$. For the simulations with the toy model, we set ${b}_{\text{birth}}=0.9$, which leads to weight growth in newborn cells for similar, but not distinct, patterns.
Neurogenesis with the artificial data set
To save computation time, we initialize the feedforward weight vectors of two mature DGCs at two training patterns randomly chosen from the first two clusters, normalized such that they have an L2-norm of 1.5. We then present patterns from clusters 1 and 2 and let the feedforward weights evolve according to Equation (1) until they reach convergence.
We thereafter fix the feedforward weights onto the two mature cells and introduce a novel cluster of patterns as well as a newborn DGC in the network. The sequence of presentation of patterns from the three clusters (a novel one and two pretrained ones) is random. The newborn DGC is born with a null feedforward weight vector, and its maturation follows the same rules as before (plastic feedforward weights). In the early phase, GABAergic input has an excitatory effect (Ge et al., 2006) and the newborn DGC does not inhibit the mature DGCs (Temprana et al., 2015). This is modeled by setting ${w}_{\text{rec}}^{NM}=-{w}_{\text{rec}}>0$ for the connections from mature to newborn DGC, and ${w}_{\text{rec}}^{MN}=0$ for the connections from newborn to mature DGCs. The threshold of the newborn DGC starts at ${b}_{\text{birth}}=0.9$ at birth, mimicking enhanced excitability (Schmidt-Hieber et al., 2004; Li et al., 2017), and increases linearly up to 1.2 (the threshold of mature DGCs) over 12,000 pattern presentations, reflecting loss of excitability with maturation. The exact time window is not critical. In the late phase of maturation of the newborn DGC, GABAergic input switches to inhibitory (Ge et al., 2006), and the newborn DGC recruits feedback inhibition onto mature DGCs (Temprana et al., 2015). This is modeled by switching the sign of the connection from mature to newborn DGC, ${w}_{\text{rec}}^{NM}={w}_{\text{rec}}$, and establishing connections from newborn to mature DGCs, ${w}_{\text{rec}}^{MN}={w}_{\text{rec}}$. Each of the 6000 patterns is presented once during the early phase of maturation and once during the late phase of maturation.
The above paradigm is run separately for each of the two handmade data sets: the one where clusters are similar ($s=0.8$) and the one where clusters are distinct ($s=0.2$).
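The two-phase maturation schedule described above can be sketched as follows (a minimal sketch; the helper name and the linear-ramp implementation are our assumptions, with the sign conventions taken from the text):

```python
# Sketch of the two-phase maturation schedule for the newborn DGC.
# The helper name and the linear threshold ramp are our assumptions;
# sign conventions for w_rec^{NM}, w_rec^{MN} follow the text.
w_rec = -1.2
b_birth, b_mature = 0.9, 1.2

def newborn_params(phase, presentation, ramp=12000):
    """Return (w_NM, w_MN, b) for the newborn DGC."""
    if phase == "early":
        w_NM = -w_rec   # GABAergic input still has an excitatory effect
        w_MN = 0.0      # no feedback inhibition onto mature DGCs yet
        frac = min(presentation / ramp, 1.0)
        b = b_birth + frac * (b_mature - b_birth)  # excitability decays
    else:            # late phase: GABAergic input has switched to inhibitory
        w_NM = w_rec
        w_MN = w_rec    # newborn DGC now recruits feedback inhibition
        b = b_mature
    return w_NM, w_MN, b
```

At birth the newborn cell is driven by the mature DGCs (`w_NM = 1.2` exceeds `b = 0.9`); after the switch both recurrent couplings are inhibitory.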
Analytical computation of the L2-norm and angle
We consider the case where two mature DGCs have learned their synaptic connections, such that the first mature DGC with feedforward weight vector ${\overrightarrow{w}}_{1}$ is selective for cluster 1 with normalized center of mass ${\overrightarrow{P}}^{1}$, and the second mature DGC with feedforward weight vector ${\overrightarrow{w}}_{2}$ is selective for cluster 2 with normalized center of mass ${\overrightarrow{P}}^{2}$. After convergence, we have ${\overrightarrow{w}}_{1}=\langle \Vert {\overrightarrow{w}}_{1}\Vert \rangle {\overrightarrow{P}}^{1}$ and ${\overrightarrow{w}}_{2}=\langle \Vert {\overrightarrow{w}}_{2}\Vert \rangle {\overrightarrow{P}}^{2}$, where $\langle \Vert {\overrightarrow{w}}_{k}\Vert \rangle$ is the expected L2-norm of the feedforward weight vector onto mature DGC $k$ selective for pretrained cluster $k$. In addition, the upper bound for the L2-norm of the weight vectors of the mature DGCs can be determined: $\langle \Vert {\overrightarrow{w}}_{1}\Vert \rangle =\langle \Vert {\overrightarrow{w}}_{2}\Vert \rangle \leqslant 1.5$. In our case, we obtain $\langle \Vert {\overrightarrow{w}}_{1}\Vert \rangle =\langle \Vert {\overrightarrow{w}}_{2}\Vert \rangle \approx 1.49$ because of the dispersion of the patterns around their center of mass; hence, we use this value for the numerical computations below.
We represent the feedforward weight vector ${\overrightarrow{w}}_{i}$ onto a newborn DGC as an arrow of length $\langle \Vert {\overrightarrow{w}}_{1}\Vert \rangle$ (Figure 6—figure supplement 1). We compute analytically its L2-norm at the end of the early phase of maturation of the newborn DGC, as well as its angle φ with the center of mass of the novel cluster ${\overrightarrow{P}}^{i}$, to confirm the results obtained numerically (Figure 6, Figure 6—figure supplement 1).
In the early phase of maturation, the feedforward weight vector onto the newborn DGC grows. The norm stabilizes at a higher value for similar patterns ($s=0.8$, Figure 6—figure supplement 1) than for distinct patterns ($s=0.2$, Figure 6—figure supplement 1). This is because the center of mass of three similar clusters lies closer to the surface of the sphere than the center of mass of two distinct clusters (see below). In the late phase of maturation, for similar clusters, we observe a slight increase of the L2-norm of the feedforward weight vector onto the newborn DGC, concomitant with a decrease of its angle with the center of mass of the novel cluster (Figure 6—figure supplement 1), because the center of mass of the novel cluster lies closer to the surface of the sphere than the center of mass of the three clusters.
Similar clusters
The angle between the centers of mass of any pair of similar clusters ($s=0.8$, $\xi =0.2$) is given by Equation (15):

$\mathrm{\Omega}=\mathrm{arccos}\left(\frac{1}{1+{\xi}^{2}}\right)\approx {15.9}^{\circ}.$

Half the distance between the projections of the centers of mass of any pair of similar clusters on a concentric sphere with radius $\langle \Vert {\overrightarrow{w}}_{1}\Vert \rangle$ is given by (Figure 6—figure supplement 1):

$x=\langle \Vert {\overrightarrow{w}}_{1}\Vert \rangle \mathrm{sin}\left(\frac{\mathrm{\Omega}}{2}\right).$

The triangle that connects the centers of mass of the three clusters is equilateral, and $y$ separates one of its angles into two equal parts ($\pi /6$ rad each). So the length $y$ can be calculated:

$y=\frac{x}{\mathrm{cos}(\pi /6)}.$

Using Pythagoras' formula, we can thus determine the expected L2-norm $\langle \Vert {\overrightarrow{w}}_{i}\Vert \rangle$ of the feedforward weight vector onto the newborn DGC at the end of the early phase of maturation:

$\langle \Vert {\overrightarrow{w}}_{i}\Vert \rangle =\sqrt{\langle \Vert {\overrightarrow{w}}_{1}\Vert \rangle^{2}-{y}^{2}},$

and finally its angle with the center of mass of the novel cluster:

$\phi =\mathrm{arcsin}\left(\frac{y}{\langle \Vert {\overrightarrow{w}}_{1}\Vert \rangle}\right).$
The numerical values are as follows: $\langle \Vert {\overrightarrow{w}}_{i}\Vert \rangle \approx 1.47$ and $\phi \approx {9.21}^{\circ}$, which correspond to the values in Figure 6—figure supplement 1.
Distinct clusters
In the case of distinct patterns ($s=0.2$, $\xi =0.8$), the angle between the centers of mass of any pair of clusters is given by Equation (15):

$\mathrm{\Omega}=\mathrm{arccos}\left(\frac{1}{1+{\xi}^{2}}\right)\approx {52.4}^{\circ}.$

We can directly compute the expected L2-norm of the feedforward weight vector onto the newborn DGC at the end of the early phase of maturation (Figure 6—figure supplement 1):

$\langle \Vert {\overrightarrow{w}}_{i}\Vert \rangle =\langle \Vert {\overrightarrow{w}}_{1}\Vert \rangle \mathrm{cos}\left(\frac{\mathrm{\Omega}}{2}\right).$

We can then calculate the length $z$ between the projection of the center of mass of one of the two pretrained clusters on a concentric sphere with radius $\langle \Vert {\overrightarrow{w}}_{1}\Vert \rangle$ and the tip of the feedforward weight vector onto the newborn DGC:

$z=\langle \Vert {\overrightarrow{w}}_{1}\Vert \rangle \mathrm{sin}\left(\frac{\mathrm{\Omega}}{2}\right).$

Analogously to the similar case, we observe that $y$ separates one angle of the equilateral triangle connecting the projections of the centers of mass of the clusters on the sphere into two equal parts; consequently:

$y=\frac{z}{\mathrm{tan}(\pi /6)}.$

Finally, the angle between the center of mass of the novel cluster and the feedforward weight vector onto the newborn DGC at the end of the early phase of maturation is:

$\phi =\mathrm{arccos}\left(\frac{\langle \Vert {\overrightarrow{w}}_{i}\Vert \rangle^{2}+\langle \Vert {\overrightarrow{w}}_{1}\Vert \rangle^{2}-{y}^{2}}{2\langle \Vert {\overrightarrow{w}}_{i}\Vert \rangle \langle \Vert {\overrightarrow{w}}_{1}\Vert \rangle}\right).$
We obtain the following approximate values: $\langle \Vert {\overrightarrow{w}}_{i}\Vert \rangle \approx 1.34$ and $\phi \approx {47.2}^{\circ}$, which correspond to the values in Figure 6—figure supplement 1. The angle φ is smaller in the similar case than in the distinct case, hence the norm is larger in the similar case, as observed in Figure 6—figure supplement 1.
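The geometric construction of both cases can be verified numerically; this sketch (the function name is ours, symbols $x$, $y$, $z$ as in the text) reproduces the reported norms and angles:

```python
import math

# Numerical check of the geometric construction; w1 is the expected
# L2-norm (~1.49) of a mature weight vector quoted in the text.
w1 = 1.49

def newborn_geometry(xi, case):
    """Return (expected norm, angle phi in degrees) of the newborn
    weight vector at the end of the early phase of maturation."""
    omega = math.acos(1.0 / (1.0 + xi**2))  # angle between cluster centers
    if case == "similar":  # newborn vector points toward all three centers
        x = w1 * math.sin(omega / 2)      # half chord between two centers
        y = x / math.cos(math.pi / 6)     # circumradius of the triangle
        wi = math.sqrt(w1**2 - y**2)      # Pythagoras
        phi = math.asin(y / w1)
    else:  # distinct: newborn vector points between the two pretrained centers
        wi = w1 * math.cos(omega / 2)
        z = w1 * math.sin(omega / 2)      # distance to a pretrained center
        y = z / math.tan(math.pi / 6)     # distance to the novel center
        phi = math.acos((wi**2 + w1**2 - y**2) / (2 * wi * w1))  # law of cosines
    return wi, math.degrees(phi)
```

For the similar data set (ξ = 0.2) this returns roughly (1.47, 9.2°), and for the distinct one (ξ = 0.8) roughly (1.34, 47.2°), matching the values in the text.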
Effective dimensionality and participation ratio
The effective dimensionality of the input is measured as the participation ratio (PR), defined as $PR={(\text{Tr}(C))}^{2}/\text{Tr}({C}^{2})$, where $C$ is the covariance matrix of the input patterns and $\text{Tr}(C)$ denotes the trace of matrix $C$ (Mazzucato et al., 2016; Litwin-Kumar et al., 2017).
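A direct implementation of this measure, with purely illustrative random data (the function name is ours):

```python
import numpy as np

# Participation ratio PR = Tr(C)^2 / Tr(C^2); an isotropic cloud in d
# dimensions gives PR close to d, a rank-one cloud gives PR close to 1.
def participation_ratio(patterns):
    """patterns: array of shape (n_samples, n_features)."""
    C = np.cov(patterns, rowvar=False)
    return np.trace(C) ** 2 / np.trace(C @ C)

rng = np.random.default_rng(0)
isotropic = rng.standard_normal((20000, 5))                  # effective dim ~5
rank_one = np.outer(rng.standard_normal(20000), np.ones(5))  # effective dim ~1
```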
Data availability
Simulation and plotting scripts can be found at: https://github.com/ogozel/NeurogenesisModel (copy archived at https://archive.softwareheritage.org/swh:1:rev:e46f2dfc10c21d69ac057f31c5800f46644b004a).

The MNIST database of handwritten digits. yann.lecun.com/exdb/mnist/
References

GABAergic cells are the major postsynaptic targets of mossy fibers in the rat hippocampus. The Journal of Neuroscience 18:3386–3403. https://doi.org/10.1523/JNEUROSCI.180903386.1998

A theory of cerebellar function. Mathematical Biosciences 10:25–61. https://doi.org/10.1016/00255564(71)900514

Transition to chaos in random networks with cell-type-specific connectivity. Physical Review Letters 114:088101. https://doi.org/10.1103/PhysRevLett.114.088101

The dentate gyrus: fundamental neuroanatomical organization (dentate gyrus for dummies). Progress in Brain Research 163:3–22. https://doi.org/10.1016/S00796123(07)630015

Book: The Hippocampus Book. Oxford University Press. https://doi.org/10.1093/acprof:oso/9780195100273.001.0001

Additive neurogenesis as a strategy for avoiding interference in a sparsely-coding dentate gyrus. Network: Computation in Neural Systems 20:137–161. https://doi.org/10.1080/09548980902993156

Excitatory actions of GABA during development: the nature of the nurture. Nature Reviews Neuroscience 3:728–739. https://doi.org/10.1038/nrn920

Adult neurogenesis produces a large pool of new granule cells in the dentate gyrus. The Journal of Comparative Neurology 435:406–417. https://doi.org/10.1002/cne.1040

Becoming a new neuron in the adult olfactory bulb. Nature Neuroscience 6:507–518. https://doi.org/10.1038/nn1048

GABA depolarization is required for experience-dependent synapse unsilencing in adult-born neurons. Journal of Neuroscience 33:6614–6622. https://doi.org/10.1523/JNEUROSCI.078113.2013

Heterosynaptic plasticity: multiple mechanisms and multiple roles. The Neuroscientist 20:483–498. https://doi.org/10.1177/1073858414529829

Connectivity reflects coding: a model of voltage-based STDP with homeostasis. Nature Neuroscience 13:344–352. https://doi.org/10.1038/nn.2479

Apoptosis, neurogenesis, and information content in Hebbian networks. Biological Cybernetics 94:9–19. https://doi.org/10.1007/s0042200500268

Short-term and long-term survival of new neurons in the rat dentate gyrus. The Journal of Comparative Neurology 460:563–572. https://doi.org/10.1002/cne.10675

Hippocampal neurogenesis reduces the dimensionality of sparsely coded representations to enhance memory encoding. Frontiers in Computational Neuroscience 12:99. https://doi.org/10.3389/fncom.2018.00099

New neurons and new memories: how does adult hippocampal neurogenesis affect learning and memory? Nature Reviews Neuroscience 11:339–350. https://doi.org/10.1038/nrn2822

Conference: Adding a conscience to competitive learning. IEEE International Conference on Neural Networks, pp. 117–124. https://doi.org/10.1109/ICNN.1988.23839

Clustering: a neural network approach. Neural Networks 23:89–107. https://doi.org/10.1016/j.neunet.2009.08.007

Neurogenesis paradoxically decreases both pattern separation and memory interference. Frontiers in Systems Neuroscience 9:136. https://doi.org/10.3389/fnsys.2015.00136

Neonatal maternal separation delays the GABA excitatory-to-inhibitory functional switch by inhibiting KCC2 expression. Biochemical and Biophysical Research Communications 493:1243–1249. https://doi.org/10.1016/j.bbrc.2017.09.143

Book: Neuronal Dynamics: From Single Neurons to Networks and Models of Cognition. Cambridge University Press. https://doi.org/10.1017/CBO9781107447615

Single granule cells reliably discharge targets in the hippocampal CA3 network in vivo. Nature Neuroscience 5:790–795. https://doi.org/10.1038/nn887

Interneurons of the dentate gyrus: an overview of cell types, terminal fields and neurochemical identity. Progress in Brain Research 163:217–232. https://doi.org/10.1016/S00796123(07)630131

On the role of the hippocampus in learning and memory in the rat. Behavioral and Neural Biology 60:9–26. https://doi.org/10.1016/01631047(93)906644

Paradox of pattern separation and adult neurogenesis: a dual role for new neurons balancing memory resolution and robustness. Neurobiology of Learning and Memory 129:60–68. https://doi.org/10.1016/j.nlm.2015.10.013

Developmental changes in GABAergic actions and seizure susceptibility in the rat hippocampus. European Journal of Neuroscience 19:590–600. https://doi.org/10.1111/j.0953816X.2003.03152.x

Book: Self-Organization and Associative Memory. Springer-Verlag. https://doi.org/10.1007/9783642881633

Gradient-based learning applied to document recognition. Proceedings of the IEEE 86:2278–2324. https://doi.org/10.1109/5.726791

Decoding neurotransmitter switching: the road forward. The Journal of Neuroscience 40:4078–4089. https://doi.org/10.1523/JNEUROSCI.000520.2020

A theory of cerebellar cortex. The Journal of Physiology 202:437–470. https://doi.org/10.1113/jphysiol.1969.sp008820

Simple memory: a theory for archicortex. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 262:23–81. https://doi.org/10.1098/rstb.1971.0078

Stimuli reduce the dimensionality of cortical activity. Frontiers in Systems Neuroscience 10:11. https://doi.org/10.3389/fnsys.2016.00011

Is there more to GABA than synaptic inhibition? Nature Reviews Neuroscience 3:715–727. https://doi.org/10.1038/nrn919

Triplets of spikes in a model of spike timing-dependent plasticity. Journal of Neuroscience 26:9673–9682. https://doi.org/10.1523/JNEUROSCI.142506.2006

Book: Neural Networks and Brain Function. Oxford: Oxford University Press. https://doi.org/10.1093/acprof:oso/9780198524328.001.0001

Feature discovery by competitive learning. Cognitive Science 9:75–112. https://doi.org/10.1207/s15516709cog0901_5

Young adult-born neurons improve odor coding by mitral cells. Nature Communications 11:5867. https://doi.org/10.1038/s41467020194728

Defined types of cortical interneurone structure space and spike timing in the hippocampus. The Journal of Physiology 562:9–26. https://doi.org/10.1113/jphysiol.2004.078915

Running increases cell proliferation and neurogenesis in the adult mouse dentate gyrus. Nature Neuroscience 2:266–270. https://doi.org/10.1038/6368

Monosynaptic inputs to new neurons in the dentate gyrus. Nature Communications 3:1107. https://doi.org/10.1038/ncomms2101

Hebbian plasticity requires compensatory processes on multiple timescales. Philosophical Transactions of the Royal Society B: Biological Sciences 372:20160259. https://doi.org/10.1098/rstb.2016.0259
Decision letter

Tatyana O SharpeeReviewing Editor; Salk Institute for Biological Studies, United States

John R HuguenardSenior Editor; Stanford University School of Medicine, United States

Paul MillerReviewer; Brandeis University, United States
In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.
Acceptance summary:
This paper demonstrates through theoretical modelling how the switch from excitatory to inhibitory GABAergic signaling in newborn neurons can aid their integration into the existing neural circuit. The analysis also shows how this can aid the temporal integration of relevant memories.
Decision letter after peer review:
Thank you for submitting your article "A functional model of adult dentate gyrus neurogenesis" for consideration by eLife. Your article has been reviewed by 2 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and John Huguenard as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Paul Miller (Reviewer #1).
The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.
Essential Revisions:
1. Provide analysis of the model where mature neurons also exhibit plasticity but at reduced levels.
2. Examine how network behaves when inputs have different statistics.
Reviewer #1:
The authors propose a role for newborn cells in the dentate gyrus that relies upon their input from interneurons being initially excitatory before switching to inhibitory as the cells mature. The computational modeling and accompanying analyses show how, when receiving only excitatory input, the newborn cells become responsive to stimuli similar to those that already cause high responses in other cells, but then, following the developmental switch such that they receive inhibitory input, those cells gain responses to novel, but similar, inputs. In a simplified model, the authors are able to quantify the criterion of "sufficient similarity", such that if the novel inputs are not similar enough to the original ones the newborn cells do not gain responses to them. The authors demonstrate that only when newborn cells are incorporated into the network and respond to the novel stimuli can those stimuli be categorized by the network, as is necessary for them to be recognized.
A major achievement of the paper is to identify a role in information processing for the developmental shift in reversal potential of chloride ions. Such a role is well supported by the results in the paper.
As in all modeling papers, some choices must be made so as to simplify the system to render it tractable and its behavior understandable. Some of the choices could be better justified or discussed, as highlighted below.
The normalization of inputs such that the L2norm is fixed seems rather unusual, and is not clearly the actual impact of feedforward inhibition. It would be nice to know whether this feature of the input vector is important. Would it matter if the L1norm were used for normalization, or if normalization were not precise? Perhaps a comment on the importance of this could be made, as well as a justification for the choice.
Throughout the manuscript, the authors employ a homeostatic term in the postsynaptic firing rate. The term is called "heterosynaptic" by the authors, but strictly it is not, since it does not depend on other presynaptic inputs. Rather, the plasticity rule effects "firing rate homeostasis" and is implemented in a manner similar to Renart, Song, and Wang (2003). I think this should be mentioned at a minimum and perhaps entirely renamed throughout the manuscript.
The authors consider the ability of the network to discriminate novel patterns as the newborn cells gain responses to the novel patterns. I assume that the formation of new responses arises when the novel patterns are presented randomly amidst the set of previously learned patterns. It would be valuable to see if there are predictions of any differences in behavior if the novel patterns were presented alone, or if there are any impacts of different manners of interspersing learned patterns with novel patterns.
Lines 68-91 contain a lot of details of the circuitry, many of which are not included in the model. It would be helpful to have a figure showing the full circuit based on the information written (which is rather hard to take in through one reading) and, beside it, a figure of the model circuit so the reader can easily see what is being simplified and omitted.
Lines 118-119: I think you should mention here or in the Discussion that you have selected a specific set of synapses to undergo plasticity; that is, if I understand correctly, you have ignored any plasticity within the dentate gyrus.
Lines 167-168, Equation 1: This equation appears to be different from Equation 6 in the methods. In particular, the "HET" function depends on the postsynaptic rate cubed, not just the difference between rate and threshold as suggested here. Why not just write the exact equation and indicate/describe the behavior of each term?
Lines 260-261: The terminology is a bit confusing, as activation is not clearly a "change in membrane potential" but a change in firing rate, so has different units to the reversal potential. Especially as the membrane potential must be venturing above threshold to produce some spiking activity. Perhaps the criterion is equivalent to "the activity is low enough that the mean membrane potential remains below the reversal potential of the chloride channels"?
Lines 272-273: The statement about the switch in excitability here assumes we already know it, though it is described in the methods much later. Perhaps this sort of issue is inevitable in journals where methods are placed after results, but it would be better if the order were reversed!
Line 320: I see no justification for a one-way t-test. I think they should be two-way unless a change in only one direction is possible a priori.
Line 338: The mention of fixing one set of inputs arises out of nowhere without justification – though that justification comes later as this is just one of two controls. I think it would be better with the order reversed. Or, at least when it is first mentioned here, please be clear why this was chosen, as – I assume – the feedforward weights to selective cells are not fixed in the main set of results. If feedforward weights to selective cells are fixed in the earlier sections, then it should be clearly mentioned, as I did not notice it.
Line 364: Following the previous comment, this line suggests that feedforward weights are not fixed in your primary results with good discrimination. Please clarify if this statement is constrained to the networks without neurogenesis.
Reviewer #2:
Gozel and Gerstner investigated the functional role of adult neurogenesis in the dentate gyrus using simulations and mathematical analysis of a computational model. The novelty of the paper compared to numerous previous studies in the field is the inclusion of the GABAergic switch from excitation to inhibition of new neurons during the maturation process. So far this has been overlooked in the computational literature. G&G propose an elegant and potentially interesting idea for how the two phase maturation process could be functionally beneficial for an animal tasked with discriminating stimuli, and would be the first to recapitulate the experimental finding of adult neurogenesis contributing to pattern separation of similar but not distinct stimuli.
However, my assessment is that the current model simulations and analysis are not sufficient to support the claims made in the paper. Furthermore, the main experimental finding that can be understood based on this modeling work is the emergence of pattern separation for similar but not distinct stimuli. While interesting, this is rather technical, and may depend quite strongly on the details of the model.
1. The input stimuli from the MNIST dataset presented to the network ("3", "4", "5") are low-dimensional to a very good approximation, in contrast to the type of stimuli a real network would be presented with, which are expected to be high-dimensional.
1.1. In the model analyzed, the narrowness of the distribution of synaptic weight vector norms is important for network stability. This narrow distribution could at least in part be inherited from the low dimensionality of the stimuli (all "3"s have a large overlap with the "average 3"). If the overlaps of different stimuli are broadly distributed, so will be the distribution of the number of input patterns for which each neuron is the "winner". It is important to test this stability with more realistic stimulus ensembles, perhaps by controlling the width of the overlap distribution using the binary model the authors present towards the end of the paper.
1.2. The authors claim that synapses of newborn DGCs starting the maturation process from 0 is important for solving the problem of unresponsive neurons. The reason is that during this phase the synaptic weight vector becomes aligned with a specific direction of the input space. It is possible that unresponsive neurons are stuck in a local minimum (like the case with no neurogenesis) precisely because stimuli (and overlaps) are narrowly distributed around the mean. If stimuli are more broadly distributed (~higher dimension), the basins of attraction are expected to be more numerous and more shallow. Therefore one may expect the problem of the system getting stuck in a local minimum to be far less severe in this case, and for "control 2" networks to learn well.
2. Setting no plasticity (eta = 0) for mature cells is a very strong assumption. Some protocols (e.g., TBS2 in Schmidt-Hieber, 2004, and others in Ge, 2007) lead to a roughly two-fold increase in plasticity in young vs. mature neurons. Since mature neurons significantly outnumber young neurons, the effect of plasticity in mature neurons cannot be neglected altogether, especially since the paper's main focus is on the integration of newborn neurons into the circuit. Given the actual degree of synaptic plasticity in mature neurons (according to the papers that the authors themselves cite), I expect the behavior of the authors' model to be much closer to "control 3". To support their claims, I think the authors should show that their network compares favorably to control 3 even if DGCs remain plastic throughout (but to a lesser extent). In this scenario I expect the fraction of neurons that are new at any given time to be much more important than in the current model, since the mature part of the network is fixed. Therefore this fraction should also be matched to experiments.
3. It is not clear to me how the two phase maturation process of DGCs would be affected in a scenario where at any given point some DGCs are in the excitatory phase of GABA and others are in the inhibitory phase. This would be expected if there is a continuous stream of new neurons. Would the plasticity of the neurons in the inhibitory phase not interfere with aligning the activity to similar stimuli due to plasticity of neurons in the excitatory phase? If there is interference, would the authors then predict that neurogenesis occurs in waves (i.e., some kind of global signal would coordinate transition from phase 1 to 2 across synapses)?
Is there evidence supporting that?
It seems to me that the calculation in the section "Analytical computation of the L2-norm and angle" could, at least in principle, be extended to estimate the interference: the competition due to plasticity of neurons in the inhibitory phase increases the angle phi, and thus slows down the alignment of the weights due to plasticity of neurons in the excitatory phase.
Line 107, review of the functional role of DGCs: Aljadeff et al., 2015, and Shani-Narkiss et al., 2020, suggest a dynamical role for new neurons.
Line 312, It would be interesting if the advantage of adding newborn neurons stimulated with "5" to a network pretrained with "3" and "4", over a network pretrained with "3", "4", and "5", would persist if some amount of plasticity remains in mature neurons (Figure 3d).
Line 614: It would be good to discuss the possibility that a neurotransmitter switch (without neurogenesis) has the same functional role as the GABA switch in the current model. See, e.g., Li et al., (2020) J Neuroscience.
Furthermore, can this model teach us anything about neurogenesis in the olfactory bulb? Is there an E to I switch there too?
Line 725: Miller and Fumarola may not be the right reference to cite here. This specific nonlinearity (rectified tanh) is not standard and is not included in that paper.
Line 778: The definition of quasi-orthogonal is not clear. The inhibitory rates can have fluctuations and temporal dynamics of their own even if the network is assumed to be silent when each stimulus is presented. Therefore inputs might be quasi-orthogonal at one time but not at another. If this is used just to qualitatively understand the network behavior, this somewhat sloppy definition is ok, but I think this caveat should be mentioned to avoid confusion.
https://doi.org/10.7554/eLife.66463.sa1
Author response
Essential Revisions:
1. Provide analysis of the model where mature neurons also exhibit plasticity but at reduced levels.
We have included a new paragraph in the Results section “Robustness of the model” (lines 473-495) and a Supplementary File 4 to address this point.
2. Examine how the network behaves when inputs have different statistics.
We have included a new Results section “The cooperative phase of maturation promotes pattern separation for any dimensionality of input data” (lines 601-678) as well as a new Supplementary File 5 to address this point (and a Methods section “Effective dimensionality and participation ratio”, lines 1349-1353, to define our dimensionality measure).
Reviewer #1:
The authors propose a role for newborn cells in the dentate gyrus that relies upon their input from interneurons being initially excitatory before switching to inhibitory as the cells mature. The computational modeling and accompanying analyses show how, when receiving only excitatory input, the newborn cells become responsive to stimuli similar to those that already cause high responses in other cells, but then, following the developmental switch such that they receive inhibitory input, those cells gain responses to novel, but similar, inputs. In a simplified model, the authors are able to quantify the criterion of "sufficient similarity", such that if the novel inputs are not similar enough to the original ones, the newborn cells do not gain responses to them. The authors demonstrate that only when newborn cells are incorporated into the network and respond to the novel stimuli can those stimuli be categorized by the network, as necessary for them to be recognized.
A major achievement of the paper is to identify a role in information processing for the developmental shift in reversal potential of chloride ions. Such a role is well supported by the results in the paper.
As in all modeling papers, some choices must be made so as to simplify the system to render it tractable and its behavior understandable. Some of the choices could be better justified or discussed, as highlighted below.
The normalization of inputs such that the L2-norm is fixed seems rather unusual, and it is not clear that this is the actual impact of feedforward inhibition. It would be nice to know whether this feature of the input vector is important. Would it matter if the L1-norm were used for normalization, or if normalization were not precise? Perhaps a comment on the importance of this could be made, as well as a justification for the choice.
We thank the reviewer for raising an important point. In our model, it is merely a practical simplification to consider input patterns which all have an L2-norm of 1. Indeed, it ensures that the upper bound of the L2-norm of the feedforward weight vectors onto newborn DGCs is fixed and identical for all newborn DGCs (see Methods, section “Direction and length of the weight vector”).
In competitive networks, it is in general important that the lengths of the feedforward weight vectors onto different DGCs are similar; otherwise the cells with longer weight vectors would have an unfair advantage and thus a higher probability of winning the competition. Since the input is normalized and the weight vector converges to the center of the inputs for which the cell becomes active, it follows that all the final weight vectors are expected to be of similar length. Indeed, in our model, at the end of maturation, newborn DGCs do not have identical weight vector lengths, though the distribution is rather narrow. For imprecise L2-normalization of the inputs, we expect that if the imprecision is uniform over the input space, our neurogenesis model should still perform well, because the weight vector length of each newborn DGC would depend on input pattern lengths that have the same statistics. However, if the input space has some regions of input patterns with a much higher L2-norm than others, then we expect the clusters with higher norm to be well represented by model newborn DGCs, while clusters with low L2-norm would be poorly represented, so discrimination of input patterns would decrease.
If the L1-norm were used for input normalization, it would yield weight vectors with slightly variable L2-norms, non-uniformly distributed in the input space. Without loss of generality, we can consider two-dimensional inputs for a visual explanation. If the L1-norm of input patterns is fixed to a value R, then inputs whose direction is horizontal (arrowhead at (R,0)) or vertical (arrowhead at (0,R)) will have a larger L2-norm (R) than inputs whose direction is diagonal (arrowhead at (R/2,R/2), so L2-norm = sqrt(1/2)*R). Therefore, the difference between L1 and L2 normalization will show up when one compares diagonally oriented input vectors with inputs aligned with one of the axes.
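The geometric argument above can be verified in a few lines of NumPy (the vectors are the ones from the two-dimensional example, with R = 1):

```python
import numpy as np

# Two inputs with identical L1-norm (R = 1) but different L2-norms.
v_axis = np.array([1.0, 0.0])   # aligned with an axis: L2-norm = R
v_diag = np.array([0.5, 0.5])   # diagonal direction: L2-norm = sqrt(1/2) * R

l1 = (np.abs(v_axis).sum(), np.abs(v_diag).sum())       # (1.0, 1.0)
l2 = (np.linalg.norm(v_axis), np.linalg.norm(v_diag))   # (1.0, ~0.707)
```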
We added an explanation of the reason for our L2-normalization implementation choice in lines 1033-1039. It reads:
“Normalization of inputs (be it implemented algorithmically as done here or by explicit inhibitory feedback) ensures that, once weight growth due to synaptic plasticity has ended and weights have stabilized, the overall strength of input onto DGCs is approximately identical for all cells (see Section Direction and length of the weight vector). Equalized lengths of weight vectors are, in turn, an important feature of classic soft or hard competitive networks (Kohonen, 1989; Hertz et al., 1991).”
Throughout the manuscript, the authors employ a homeostatic term in the postsynaptic firing rate. The term is called "heterosynaptic" by the authors, but strictly it is not, since it does not depend on other presynaptic inputs. Rather, the plasticity rule effects "firing rate homeostasis" and is implemented in a manner similar to Renart, Song, and Wang (2003). I think this should be mentioned at a minimum and perhaps the term entirely renamed throughout the manuscript.
In our manuscript, we used the definition of “heterosynaptic” that can be found in Chistiakova et al., (2014) and Zenke and Gerstner, (2017). It is a homeostatic mechanism but differs from standard experimentally observed synaptic scaling mechanisms (Turrigiano et al., 1998) mainly by the fact that it occurs on a much shorter timescale: “Homeostatic plasticity differs from the heterosynaptic plasticity […] in two important aspects. First, homeostatic plasticity requires nonspecific dramatic changes of neuronal activity over prolonged periods, which are unlikely to happen during everyday life and learning. Second, it operates on a very long time scale, hours and days, and thus cannot counteract runaway dynamics induced within seconds and minutes by Hebbiantype learning rules.” (Chistiakova et al., 2014). In Renart et al., (2003), they model a “homeostatic” mechanism: it has a long characteristic timescale. However, their actual implementation differs: “since what matters for the steady state of the very slow scaling process is the integrated activity of each cell across different stimuli, we have replaced this temporal average by a spatial average carried out over several network simulations run in parallel” (Renart et al., 2003).
In this terminology, “homeostatic” and “heterosynaptic” mechanisms both depend on postsynaptic activity but are independent of presynaptic activity (the “heterosynaptic” term in our synaptic plasticity rule does not depend on the presynaptic firing rate x_{j}). Since it does not depend on the identity of the presynaptic neuron, heterosynaptic plasticity affects several synapses in parallel, hence the terminology. For example, a strong presynaptic input at synapse j may cause the postsynaptic neuron to fire at a very high rate. The heterosynaptic term then lowers the strength of all other synapses k ≠ j, independent of the presynaptic activity. On the other hand, “homosynaptic” terms are Hebbian: they depend on both pre- and postsynaptic activity.
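This distinction can be made concrete with a schematic rate-based update. The sketch below is not the exact equation (1) of the paper: the Hebbian term (nu * x) is a placeholder, and only the post-only, cubic form of the heterosynaptic term follows the description in the text (the HET term depends on the postsynaptic rate cubed; see the comment on Equation 1 below by Reviewer 1):

```python
import numpy as np

def plasticity_step(w, x, nu, eta=0.1, theta=1.0, dt=0.01):
    """One Euler step of a schematic rate-based plasticity rule.

    Illustrative only: the Hebbian term depends on pre- AND postsynaptic
    activity, while the heterosynaptic term downregulates ALL weights
    (independently of presynaptic activity x) once the postsynaptic rate
    nu exceeds the threshold theta, growing supralinearly (cubically).
    """
    hebbian = nu * x                         # homosynaptic: needs pre and post
    het = max(nu - theta, 0.0) ** 3 * w      # post-only: hits every synapse
    return w + dt * eta * (hebbian - het)

w = np.full(4, 0.5)
x = np.array([1.0, 0.0, 1.0, 0.0])           # presynaptic rates
w_new = plasticity_step(w, x, nu=2.0)
# Synapses with x_j = 0 are purely downregulated: the heterosynaptic effect.
```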
As a personal aside, I (Wulfram) would add that we switched to the term heterosynaptic in my lab after the work of Friedemann Zenke. Whenever I gave a talk and mentioned homeostatic plasticity, all participants immediately had in mind the beautiful work of Turrigiano – but this work considers changes on the time scale of 24 hours. However, Zenke showed that you cannot stabilize firing rates in a recurrent network if the homeostatic mechanism works on the time scale of several hours. We had a meeting at the Royal Academy of Sciences in London some years ago where homeostatic mechanisms were controversially debated. I would like to pull out of the controversy by talking about heterosynaptic plasticity that contributes to firing rate homeostasis. Importantly, the heterosynaptic plasticity that we use is in the same equation (and on the same time scale) as the Hebbian plasticity. Two review papers by Friedemann Zenke give the main arguments for this terminology.
To make this issue clearer to the reader and avoid potential confusion of terminology, we now consistently speak in the text of heterosynaptic plasticity inducing rapid homeostatic weight stabilization, for example:
– lines 185-192: “The third term on the right-hand side of equation (1) implements heterosynaptic plasticity (Chistiakova et al., 2014; Zenke and Gerstner, 2017): whenever the postsynaptic neuron fires at a rate above theta, all weights are downregulated independent of presynaptic activity. It ensures that the weights cannot grow without bounds (Methods). In this sense, the third term has a 'homeostatic' function (Zenke and Gerstner, 2017), yet acts on a time scale faster than experimentally observed homeostatic synaptic plasticity (Turrigiano et al., 1998).”
– lines 240-244: “A detailed mathematical analysis (Methods) shows that heterosynaptic plasticity in equation (1) ensures that the total strength of the receptive field of each selective DGC converges to a stable value which is similar for selective DGCs, confirming the homeostatic function of heterosynaptic plasticity (Zenke and Gerstner, 2017).”
– lines 921-925: “Moreover, all weights onto neuron i are downregulated heterosynaptically by an amount that increases supralinearly with the postsynaptic rate nu_{i}, implicitly controlling the length of the weight vector (see below) similar to synaptic homeostasis (Turrigiano et al., 1998) but on a rapid time scale (Zenke and Gerstner, 2017).”
The authors consider the ability of the network to discriminate novel patterns as the newborn cells gain responses to the novel patterns. I assume that the formation of new responses arises when the novel patterns are presented randomly amidst the set of previously learned patterns. It would be valuable to see whether there are predicted differences in behavior if the novel patterns were presented alone, or any impacts of different manners of interspersing learned patterns with novel patterns.
It is correct that in our implementation, novel patterns are presented randomly amidst the set of previously learned patterns. But an aspect that is even more important (as shown with our simplified network model) is whether novel patterns are, or are not, similar enough to familiar patterns. Let us discuss four situations:
First, if similar novel patterns are presented alone during the early phase of maturation, we expect that the feedforward weight vector onto newborn DGCs will grow directly in the direction of the center of mass of the novel patterns. Indeed, if novel patterns are similar to familiar patterns, they will activate mature DGCs which will indirectly activate the newborn DGCs. Thus, whether novel patterns similar to familiar patterns are presented alone or amidst familiar patterns ultimately makes no difference at the end of maturation.
Second, in the late phase of maturation we expect the feedforward weight vector onto newborn DGCs to represent different prototypes of the novel similar patterns, no matter if only novel patterns are presented or if both novel and familiar patterns are presented. (However, if only familiar patterns are presented during the late competitive phase of maturation, newborn DGCs will never win the competition (because they were selective for novel patterns at the end of the early phase), hence stay silent and not update their weights.)
Third, we show that the timing of introduction of the novel patterns is important (lines 455-472). Introduction of similar novel patterns must happen early enough, otherwise the feedforward weight vectors onto newborn DGCs grow in a direction which is too far from the novel patterns. However, if novel patterns are introduced towards the end of the early phase but are not interspersed with familiar patterns, we expect that a shorter period is sufficient for changing the direction of the weight vector to one which is close enough to the novel patterns for the cell to become selective for them in the late phase of maturation.
Fourth, we expect the L2norm of the feedforward weight vector onto newborn DGCs to grow faster in the early phase if only familiar patterns are presented initially.
In summary, a blocked presentation of novel patterns that are similar to familiar patterns, timed towards the end of the early phase, is actually helpful – an interspersed presentation is the scenario that yields slower learning. The advantage of the interspersed presentation is that the timing does not need to be optimized, which is the reason for our implementation choice.
We now specify our implementation and its rationale more clearly (lines 267-271):
“To mimic exposure of an animal to a novel set of stimuli, we now add input patterns from digit 5 to the set of presented stimuli, which was previously limited to patterns of digits 3 and 4. The novel patterns from digit 5 are randomly interspersed into the sequence of patterns from digits 3 and 4; in other words, the presentation sequence was not optimized with a specific goal in mind.”
Lines 68-91 contain a lot of details of the circuitry, many of which are not included in the model. It would be helpful to have a figure showing the full circuit based on the information written (which is rather hard to take in on one reading) and, beside it, a figure of the model circuit so the reader can easily see what is being simplified and omitted.
Thanks for the suggestion. We added a new Figure 1 —figure supplement 1 and refer to it in the main text where we write about the circuitry (lines 71, 92, 141, 216, 854, 860).
Lines 118-119: I think you should mention here or in the Discussion that you have selected a specific set of synapses to undergo plasticity; that is, if I understand correctly, you have ignored any plasticity within the dentate gyrus.
Thank you for pointing out the need for more clarity. In our model, the only synapses that are plastic and follow our plasticity rule are those between EC and DGCs. There is no plasticity rule involved within the dentate gyrus: connections are absent or present (with fixed values). However, the connections between newborn DGCs (E) and inhibitory neurons (I) within the dentate gyrus still change as a function of maturation: from no E-to-I and fixed positive I-to-E connections in the early phase, to fixed positive E-to-I and fixed negative I-to-E connections in the late phase.
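This maturation-dependent wiring can be summarized in a small lookup structure; the numeric magnitudes below are placeholders of our own choosing (the model fixes only presence and sign of the connections):

```python
# Connectivity between newborn DGCs (E) and inhibitory neurons (I)
# within the dentate gyrus, by maturation phase. None = connection absent.
dg_connectivity = {
    "early_phase": {
        "newborn_E_to_I": None,   # no connection to interneurons yet
        "I_to_newborn_E": +0.5,   # GABAergic input still acts excitatorily
    },
    "late_phase": {
        "newborn_E_to_I": +0.5,   # fixed positive E-to-I connection
        "I_to_newborn_E": -0.5,   # GABAergic input now inhibitory
    },
}
```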
To clarify this important point, we now specify that it is the synaptic connections between EC and newborn DGCs which are plastic, and write in lines 120123 (Introduction):
“[…] our model uses an unsupervised biologically plausible Hebbian learning rule that makes synaptic connections between EC and newborn DGCs either disappear or grow from small values at birth to values that eventually enable feedforward input from EC to drive DGCs.”
Furthermore, we explicitly mention the lack of plasticity at other synapses within the dentate gyrus in lines 200202 (Results):
“For simplicity, no plasticity rule was implemented within the dentate gyrus: connections between newborn DGCs and inhibitory cells are either absent or present with a fixed value (see below).”
Line 167-8, Equation 1: This equation appears to be different from Equation 6 in the Methods. In particular the "HET" function depends on the postsynaptic rate cubed, not just the difference between rate and threshold as suggested here. Why not just write the exact equation and indicate/describe the behavior of each term?
We agree that our “simplified” expression ended up being more confusing than self-explanatory. As suggested, we wrote the exact equation in the Results (lines 174-175).
Lines 260-261: The terminology is a bit confusing, as activation is not clearly a "change in membrane potential" but a change in firing rate, which has different units from the reversal potential. Especially as the membrane potential must be venturing above threshold to produce some spiking activity. Perhaps the criterion is equivalent to "the activity is low enough that the mean membrane potential remains below the reversal potential of the chloride channels"?
Thanks for pointing out the inaccuracy of our terminology, and for the suggested reformulation. We modified lines 281-285 as follows:
“We assume that in natural settings, the activation of GABA_{A} receptors is low enough that the mean membrane potential remains below the chloride reversal potential at which shunting inhibition would be induced (Heigele et al., 2016). In this regime, the net effect of synaptic activity is hence excitatory.”
Lines 272-3: The statement about the switch in excitability here assumes we already know it, though it is described in the Methods much later. Perhaps this sort of issue is inevitable in journals where methods are placed after results, but it would be better if the order were reversed!
Even though we mentioned the change in excitability in the Introduction (lines 38-39), we indeed did not say that we were including it in our model. We hope that we solved this issue by now mentioning, early in the Results section (lines 143-148), that we model the change in excitability of newborn DGCs by modifying their firing threshold:
“Firing rates are modeled by neuronal frequency-current curves that vanish for weak input and increase if the total input into a neuron is larger than a firing threshold. Since newborn DGCs exhibit enhanced excitability early in maturation (Schmidt-Hieber et al., 2004; Li et al., 2017), the firing threshold of model neurons increases during maturation from a lower to a higher value (Methods).”
Line 320: I see no justification for a one-way t-test. I think it should be two-way unless a change in only one direction is possible a priori.
Thanks for your careful reading. It is true that the change in classification performance could be in both directions. However, we were only interested in a potential difference from a zero change: we did not compare the two distributions themselves (a two-way t-test would test whether the mean of one distribution differs from the mean of the other distribution). We clarify this in the new version (lines 353-360):
“Across ten simulation experiments, classification performance is significantly higher when a novel ensemble of patterns is learned sequentially by newborn DGCs (P_{2}; Supplementary File 1) than if all patterns are learned simultaneously (P_{1}; Supplementary File 1). Indeed, the distribution of P_{2} - P_{1} for the ten simulation experiments has a mean which is significantly different from zero (Wilcoxon signed rank test: pval = 0.0020, Wilcoxon signed rank = 55; one-way t-test: pval = 0.0269, tstat = 2.6401, df = 9; Supplementary File 1).”
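Incidentally, the reported statistic already encodes the sign pattern: under the convention that W is the sum of the ranks of the positive differences, W = 55 = 1 + 2 + … + 10 with n = 10 pairs can only arise if every difference P_{2} - P_{1} is positive. A minimal sketch of the statistic (ties and zeros omitted; the data vector is made up for illustration, not taken from Supplementary File 1):

```python
import numpy as np

def signed_rank_W(d):
    """Wilcoxon signed-rank statistic: sum of ranks of |d| over positive d.
    (Tie and zero handling omitted for this illustration.)"""
    d = np.asarray(d, dtype=float)
    ranks = np.argsort(np.argsort(np.abs(d))) + 1   # ranks 1..n of |d|
    return float(ranks[d > 0].sum())

# Ten made-up, all-positive paired differences: W must equal 55.
d = np.array([0.3, 0.5, 0.1, 0.9, 0.2, 0.4, 0.8, 0.6, 0.7, 1.0])
W = signed_rank_W(d)   # 55.0
```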
Line 338: The mention of fixing one set of inputs arises out of nowhere without justification – though that justification comes later as this is just one of two controls. I think it would be better with the order reversed. Or, at least when it is first mentioned here, please be clear why this was chosen, as – I assume – the feedforward weights to selective cells are not fixed in the main set of results. If feedforward weights to selective cells are fixed in the earlier sections, then it should be clearly mentioned, as I did not notice it.
Thanks for your comment. Together with the next comment, it made us realize that our text from the section “Mature neurons represent prototypical input patterns” to the section “The GABA-switch guides learning of novel representations” was not completely clear and that the flow was sometimes misleading. Therefore, we slightly reorganized the text and added some details on the implementation. Here are the main changes we made:
– We added a sentence to mention that pretraining starts from random (nonzero) weights, and that all DGCs have the same learning rate during pretraining, lines 206-210: “Hence we start with a network that already has strong random EC-to-DGC connection weights (Methods). We then pretrain our network of 100 DGCs using the same learning rule (equation (1), with identical learning rate eta for all DGCs) that we will use later for the integration of newborn cells.”
– We moved the information about setting the learning rate of selective cells to zero (which was previously at the beginning of the section “Newborn neurons become selective for novel patterns during maturation”) to the end of the section “Mature neurons represent prototypical input patterns”, and explicitly state that mature cells are not plastic in our main implementation (but we included results for when they are), now lines 250-256: “After convergence of synaptic weights during pretraining, selective DGCs are considered mature cells. Mature cells are less plastic than newborn cells (Schmidt-Hieber et al., 2004; Ge et al., 2007). So in the following, unless specified otherwise, we set eta = 0 in equation (1) for mature cells (feedforward connection weights from EC to mature cells therefore remain fixed). A scenario where mature cells retain synaptic plasticity is also investigated (see Robustness of the model and Supplementary File 4).”
– We state that newborn DGCs in the main neurogenesis model start with zero feedforward weights, lines 261-263: “In our main neurogenesis model, we replace unresponsive model units by plastic newborn DGCs (eta > 0 in equation (1)) which receive lateral GABAergic input but do not receive feedforward input yet (all weights from EC are set to zero).”
– In lines 348-350, we note that “pretraining with three digits” (our control 1) is implemented in exactly the same way as “pretraining with two digits”, except that patterns from three digits (instead of two) are presented: “We compare this performance with that of a network where all three digit ensembles are directly simultaneously pretrained starting from random weights (Figure 3a, control 1).”
– We moved the paragraph about introducing two novel digits (previously in lines 322-331) to the end of the previous section “Newborn neurons become selective for novel patterns during maturation” (now lines 329-338). We believe that the flow is better this way, as this paragraph discusses a variant of the main neurogenesis results (where two novel digits instead of one are introduced during maturation of the newborn cells), and not the pretraining control (as its previous placement may have suggested).
We slightly modified the section “The GABA-switch guides learning of novel representations” (now lines 361-394) to more clearly contrast controls 2 and 3 with the main neurogenesis model in terms of the learning rate of the selective cells and the connectivity of the unresponsive cells.
Line 364: Following the previous comment, this line suggests that feedforward weights are not fixed in your primary results with good discrimination. Please clarify if this statement is constrained to the networks without neurogenesis.
We hope that with the reorganization of the text explained in the previous point, this statement (now at lines 392-393) is clear. Specifically, this statement applies to control 3: all DGCs keep plastic feedforward weights. However, unresponsive units at the end of pretraining are not replaced by newborn DGCs which undergo maturation. Upon presentation of novel patterns, those unresponsive units have a low probability of becoming selective, because their feedforward weight vectors have a low norm and probably point outside the space of presented inputs (otherwise these units would have become selective earlier). Therefore, novel patterns are learned by cells that were already selective for (similar) familiar patterns. On the other hand, in the main neurogenesis model, newborn DGCs (previously unresponsive units) learn the novel patterns, while the selectivity of mature DGCs is not overwritten since they are not plastic.
Reviewer #2:
Gozel and Gerstner investigated the functional role of adult neurogenesis in the dentate gyrus using simulations and mathematical analysis of a computational model. The novelty of the paper compared to numerous previous studies in the field is the inclusion of the GABAergic switch from excitation to inhibition of new neurons during the maturation process. So far this has been overlooked in the computational literature. G&G propose an elegant and potentially interesting idea for how the two-phase maturation process could be functionally beneficial for an animal tasked with discriminating stimuli, and theirs would be the first model to recapitulate the experimental finding of adult neurogenesis contributing to pattern separation of similar but not distinct stimuli.
However, my assessment is that the current model simulations and analysis are not sufficient to support the claims made in the paper. Furthermore, the main experimental finding that can be understood based on this modeling work is the emergence of pattern separation for similar but not distinct stimuli. While interesting, this is rather technical, and may depend quite strongly on the details of the model.
1. The input stimuli from the MNIST dataset presented to the network are low-dimensional to a very good approximation ("3", "4", "5"), in contrast to the type of stimuli a real network would be presented with, which are expected to be high-dimensional.
1.1. In the model analyzed, the narrowness of the distribution of synaptic weight vector norms is important for network stability. This narrow distribution could at least in part be inherited from the low dimensionality of the stimuli (all "3"s have a large overlap with the "average 3"). If the overlaps of different stimuli are broadly distributed, so will be the distribution of the number of input patterns for which each neuron is the "winner". It is important to test this stability on more realistic stimulus ensembles, perhaps by controlling the width of the overlap distribution using the binary model the authors present towards the end of the paper.
1.2. The authors claim that synapses of newborn DGCs starting the maturation process from 0 is important for solving the problem of unresponsive neurons. The reason is that during this phase the synaptic weight vector becomes aligned with a specific direction of the input space. It is possible that unresponsive neurons are stuck in a local minimum (like the case with no neurogenesis) precisely because stimuli (and overlaps) are narrowly distributed around the mean. If stimuli are more broadly distributed (~higher dimension), the basins of attraction are expected to be more numerous and more shallow. Therefore one may expect the problem of the system getting stuck in a local minimum to be far less severe in this case, and for "control 2" networks to learn well.
We thank the reviewer for this important suggestion. Our datasets are indeed rather low-dimensional. To investigate how our network behaves when the input space is higher-dimensional, as suggested, we used our simple network and created new handmade datasets with higher dimensionality. Our dimensionality measure is now defined in a new Methods section “Effective dimensionality and participation ratio” (lines 1349-1353). We included those results in Supplementary File 5 and in a new Results section “The cooperative phase of maturation promotes pattern separation for any dimensionality of input data” (lines 601-678):
“Despite the fact that input patterns in our model represent the activity of 144 or 128 model EC cells, the effective dimensionality of the input data was significantly below 100 because the clusters for different input classes were rather concentrated around their respective center of mass. We define the effective input dimensionality as the participation ratio (Mazzucato et al., 2016; Litwin-Kumar et al., 2017) (Methods). Using this definition, the input data of both the MNIST 12x12 patterns from digits 3, 4 and 5 and the seven clusters of the handmade dataset for similar patterns (s=0.8) are relatively low-dimensional (PR=19 out of a maximum of 144, and PR=11 out of a maximum of 128, respectively). We emphasize that in both cases the spread of the input data around the cluster center implies that the effective dimensionality is larger than the number of clusters. In natural settings, we expect the input data to have even higher dimension. Therefore, here we investigate the effect of dimensionality of the input data on our neurogenesis model by increasing the spread around the cluster centers.
We use our simplified network model and create similar artificial datasets (s=0.8) with different values for the concentration parameter kappa (Methods). The smaller the kappa, the broader the distributions around their center of mass, hence the larger the overlap of patterns generated from different cluster distributions. Therefore, we can increase the effective dimensionality of the input by decreasing the concentration parameter kappa. First, as expected from our analytical analysis (Methods), we find that the broader the cluster distributions, the smaller the length of the feedforward weight vector onto newborn DGCs (from just below 1.5 with kappa = 10^{4} to about 1.35 with kappa = 6 * 10^{2}). Second, we examine the ability of the simplified network to discriminate input patterns coming from input spaces with different dimensionalities. To do so, we compare our neurogenesis model (Neuro.) with a random initialization model (RandInitL.). In both cases, two DGCs are pretrained with patterns from two clusters, as above. Then we fix the weights of the two mature DGCs and introduce patterns from a third cluster as well as a newborn DGC. For the neurogenesis case, after maturation of the newborn DGC we fix its weights (while for the random initialization model we keep them plastic) upon introduction of patterns from a fourth cluster as well as another newborn DGC, and so on until the network contains seven DGCs and patterns from the full dataset of seven clusters have been presented. We compare our neurogenesis model, where each newborn DGC starts with zero weights and undergoes a two-phase maturation (1 epoch per phase), with a random initialization model where each newborn DGC is directly fully integrated into the circuit and whose feedforward weight vector is randomly initialized with a length of 0.1 (RandInitL.) and is then learned for 2 epochs.
Since clusters can be highly overlapping, we assess discrimination performance by computing the reconstruction error at the end of training. Reconstruction error is evaluated analogously to classification error, except that the readout layer has the task of an autoencoder: it contains as many readout units as there are input units. Reconstruction error is the mean squared distance between the input vector and the reconstructed output vector based on testing patterns. We observe that for any dimensionality of the input space, even as high as 97-dimensional, the neurogenesis model performs better (has a lower total reconstruction error) than the random initialization model (Supplementary File 5). Indeed, in the neurogenesis case newborn DGCs grow their feedforward weights (from zero) in the direction of presented input patterns in their early cooperative phase of maturation and can later become selective for novel patterns during the competitive phase. In contrast, since the random initialization model has no early cooperative phase, the newborn DGC weight vector does not grow unless an input pattern is by chance well aligned with its randomly initialized weight vector (which is unlikely in a high-dimensional input space). We get similar results for a larger initialization of the synaptic weights (e.g., the length of the weight vector at birth is set to 1, results not shown). Importantly, in high input dimensions, the advantage of a larger weight vector length at birth in the random initialization model is overridden by the capability of newborn DGCs to grow their weight vector in the appropriate direction during their early cooperative phase of maturation.
Finally, we note that even if the length of the feedforward weight vector onto newborn DGCs is set to 1.5 (RandInitH., Supplementary File 5), which is the upper bound according to our analytical results (Methods), the random initialization model performs worse than the neurogenesis model for low up to relatively high-dimensional input spaces (PR=83, Supplementary File 5), despite its advantage in the competition conferred by the longer weight vector. It is only when input clusters are extremely broad and overlapping that the random initialization model performs similarly to the neurogenesis model (PR=90, 97, Supplementary File 5). In other words, a random initialization at full length of weight vectors works well if input data is homogeneously distributed on the positive quadrant of the unit sphere, but fails if the input data is clustered in a few directions. Moreover, random initialization requires that synaptic weights are large from the start, which is biologically not plausible. In summary, the two-phase neurogenesis model is advantageous because the feedforward weights onto newborn cells can start at arbitrarily small values; their growth is, during the cooperative phase, guided to occur in a direction that is relevant for the task at hand; the final competitive phase eventually enables specialization onto novel inputs.”
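For the interested reader, the participation-ratio definition referenced in the quoted passage (Mazzucato et al., 2016; Litwin-Kumar et al., 2017) can be sketched in a few lines of Python. The dataset below is synthetic and purely illustrative, not the paper's actual input data:

```python
import numpy as np

def participation_ratio(X):
    """Effective dimensionality of a data matrix X (patterns x features),
    defined as PR = (sum_i lambda_i)^2 / sum_i lambda_i^2, where lambda_i
    are the eigenvalues of the covariance matrix of X."""
    lam = np.linalg.eigvalsh(np.cov(X, rowvar=False))
    lam = np.clip(lam, 0.0, None)  # guard against tiny negative eigenvalues
    return lam.sum() ** 2 / (lam ** 2).sum()

rng = np.random.default_rng(0)
# Clustered data: 3 tight clusters in 144 dimensions -> low PR
centers = rng.random((3, 144))
clustered = np.vstack([c + 0.05 * rng.standard_normal((200, 144)) for c in centers])
# Isotropic data in the same space -> PR close to the full dimensionality
isotropic = rng.standard_normal((600, 144))
print(participation_ratio(clustered), participation_ratio(isotropic))
```

As in the quoted passage, data concentrated around a few cluster centers yields a PR far below the number of input units, while homogeneously spread data approaches it.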
Reply to point 1.1: The narrowness of the distribution of synaptic weight vector norms is indeed important. If the norm of the feedforward weight vector onto a particular DGC were much larger than the norms of the feedforward weight vectors onto all other DGCs, then that particular DGC would be a winner of the competition upon presentation of more heterogeneous input patterns. In other words, that particular neuron would have much broader tuning, while other DGCs would have narrower tuning. However, our model ensures that the L2-norm of the feedforward weight vectors onto newborn DGCs at the end of their maturation reaches a value between a lower bound and an upper bound (see Methods, “Direction and length of the weight vector”). This implies that if input patterns are highly concentrated around a center of mass (for example in our standard handmade artificial dataset where the concentration parameter kappa=10^{4}) the L2-norm will end up being higher than if patterns are broadly distributed around their center of mass (for example cases where kappa < 10^{4}). The only way for L2-norms to be widely different is if input patterns are heterogeneously distributed: for example, if there is a high concentration around one center of mass (CM1) and a low concentration around another (CM2), then the DGC selective for CM1 will have a larger L2-norm than the DGC selective for CM2. We do not expect this to be a problem if the patterns from cluster 2 are far enough from the patterns of cluster 1 in input space.
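The broader tuning conferred by a longer weight vector can be illustrated with a minimal winner-take-all sketch (hypothetical dimensions and norms, not the paper's network): the DGC with the larger L2-norm wins the competition for the majority of heterogeneous input patterns.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 50
# Two DGCs tuned to two random non-negative unit directions,
# but with different L2-norms of their feedforward weight vectors
u1 = rng.random(d); u1 /= np.linalg.norm(u1)
u2 = rng.random(d); u2 /= np.linalg.norm(u2)
W = np.vstack([1.5 * u1, 0.8 * u2])   # DGC1 has the larger weight norm

# Present heterogeneous unit-norm input patterns and count the winners
X = rng.random((2000, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
winners = np.argmax(X @ W.T, axis=1)
share_dgc1 = np.mean(winners == 0)
print(share_dgc1)  # DGC1 wins far more than half of the patterns
```

This is exactly why the bounded L2-norm at the end of maturation matters: it prevents any single DGC from dominating the competition.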
Furthermore, we note that if patterns from cluster 2 are distributed so broadly that they overlap with patterns from concentrated cluster 1, then: (1) those patterns will be classified by the network as belonging to cluster 1, even though they were initially generated from a “broad cluster 2” distribution; and (2) since those patterns now belong to cluster 1, they do not activate the DGC selective for cluster 2 anymore, hence the distribution of patterns that activate DGC2 becomes narrower and the L2-norm of DGC2 increases (and thus approaches that of DGC1).
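A toy cluster generator illustrates how the concentration parameter kappa controls cluster breadth. The Gaussian-perturbation scheme below is only a stand-in for the sampling procedure in the paper's Methods (the exact distribution used there is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_cluster(center, kappa, n):
    """Hypothetical stand-in for a cluster generator with concentration
    parameter kappa: perturb a unit-norm center with isotropic noise of
    scale 1/sqrt(kappa), then project back onto the unit sphere.
    Smaller kappa -> broader cluster."""
    pts = center + rng.standard_normal((n, center.size)) / np.sqrt(kappa)
    pts = np.clip(pts, 0.0, None)            # keep firing rates non-negative
    return pts / np.linalg.norm(pts, axis=1, keepdims=True)

center = np.ones(128) / np.sqrt(128.0)       # a unit-norm center of mass
tight = sample_cluster(center, kappa=1e4, n=500)
broad = sample_cluster(center, kappa=600.0, n=500)
# Mean cosine similarity to the center: higher for the concentrated cluster
cos_tight = (tight @ center).mean()
cos_broad = (broad @ center).mean()
print(cos_tight, cos_broad)
```

With kappa=10^4 the sampled patterns hug the center of mass; with kappa=6*10^2 they spread out, increasing both the overlap between clusters and the effective input dimensionality.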
Reply to point 1.2: Our new results (see section “The cooperative phase of maturation promotes pattern separation for any dimensionality of input data”) shed light on this point. Briefly, our neurogenesis model still performs better than a random initialization model (aka “control 2”) even for relatively high input space dimensionalities, because the early cooperative phase acts as a smart initialization for the growth of the feedforward weight vector onto the newborn DGC in the appropriate direction. The higher the input space dimensionality, the more advantageous this smart initialization, as there is a low probability that a randomly initialized feedforward weight vector onto a newborn cell is sufficiently well aligned with input patterns to become selective for them. It is only when classes are extremely broad and overlapping, and the feedforward weight vector starts with a large norm, that the random initialization model performs as well as our neurogenesis model. These results agree with the experimental observation that adult dentate gyrus neurogenesis helps for the discrimination of similar patterns, but not distinct patterns. Similar patterns are close to each other in input space, therefore smaller, deeper basins of attraction are needed to discriminate them. On the other hand, distinct patterns are far from each other in input space, hence larger and shallower basins of attraction are sufficient to discriminate between them.
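The claim that chance alignment becomes unlikely in high dimensions is easy to verify numerically. The sketch below (illustrative only) measures the average alignment between random directions and a fixed cluster center as dimensionality grows:

```python
import numpy as np

rng = np.random.default_rng(4)

def mean_abs_cosine(d, n=2000):
    """Average |cos| between n random directions and a fixed center in R^d."""
    center = np.zeros(d)
    center[0] = 1.0
    v = rng.standard_normal((n, d))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    return np.mean(np.abs(v @ center))

low, high = mean_abs_cosine(4), mean_abs_cosine(144)
print(low, high)  # alignment by chance shrinks as dimensionality grows
```

In 144 dimensions (the size of the MNIST 12x12 input), a randomly initialized weight vector is almost orthogonal to any given cluster center, which is why the cooperative growth phase acts as a smart initialization.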
2. Setting no plasticity (eta = 0) for mature cells is a very strong assumption. Some protocols (e.g., TBS_{2} in Schmidt-Hieber 2004; and others in Ge 2007) lead to ~2-fold increase in plasticity in young vs. mature neurons. Since mature neurons significantly outnumber young neurons, the effect of plasticity in mature neurons cannot be neglected altogether, especially since the paper's main focus is on the integration of newborn neurons into the circuit. Given the actual degree of synaptic plasticity in mature neurons (according to the papers that the authors themselves cite), I expect the behavior of the authors' model to be much closer to "control 3". To support their claims, I think the authors should show that their network compares favorably to control 3 even if DGCs remain plastic throughout (but to a lesser extent). In this scenario I expect the fraction of neurons that are new at any given time to be much more important than in the current model, since the mature part of the network is fixed. Therefore this fraction should also be matched to experiments.
The reviewer raises an interesting point. We addressed this concern with new simulations with the main and extended networks. We added a new Supplementary File 4, and report the new results in section “Robustness of the model” (lines 473–495):
“Finally, in our neurogenesis model, we have set the learning rate of mature DGCs to zero despite the observation that mature DGCs retain some plasticity (SchmidtHieber et al., 2004; Ge et al., 2007). We therefore studied a variant of the model in which mature DGCs also exhibit plasticity. First, we used our main model with 100 DGCs and 21 newborn DGCs. The implementation was identical, except that the learning rate of the mature DGCs was kept at a nonzero value during the maturation of the 21 newborn DGCs. We do not observe a large change in classification performance, even if the learning rate of the mature cells is the same as that of newborn cells (Supplementary File 4). Second, we used our extended network with 700 DGCs to be able to investigate the effect of plastic mature DGCs while having a proportion of newborn cells matching experiments. We find that with 35 newborn DGCs (corresponding to the experimentally reported fraction of about 5%), plastic mature DGCs (with a learning rate half of that of newborn cells) improve classification performance (Supplementary File 4). This is due to the fact that several of the mature DGCs (that were previously selective for '3's or '4's) become selective for prototypes of the novel digit 5. Consequently, more than the 35 newborn DGCs specialize for digit 5, so that digit 5 is eventually represented better by the network with mature cell plasticity than the standard network where plasticity is limited to newborn cells. Note that those mature DGCs that had earlier specialized on writing styles of digits 3 or 4 similar to a digit 5 are most likely to retune their selectivity. If the novel inputs were very distinct from the pretrained familiar inputs, mature DGCs would be unlikely to develop selectivity for the novel inputs.”
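The manipulation described in the reply above, per-cell learning rates that are zero for mature DGCs in the standard model and nonzero in the variant, can be sketched with a generic winner-take-all Hebbian step. This is an illustrative rule only, not the paper's exact plasticity model:

```python
import numpy as np

rng = np.random.default_rng(5)
d, n_cells = 16, 6
W = rng.random((n_cells, d)) * 0.1    # feedforward weights onto DGCs
eta = np.full(n_cells, 0.0)           # standard model: mature DGCs frozen
eta[-1] = 0.05                        # one newborn DGC stays plastic
# Variant under study: mature cells keep half the newborn learning rate
eta_plastic = np.where(eta == 0.0, 0.025, eta)

def competitive_step(W, x, eta):
    """One winner-take-all Hebbian step (a generic sketch, not the paper's
    exact plasticity rule): the winning cell moves toward the input,
    scaled by its own per-cell learning rate."""
    k = np.argmax(W @ x)
    W = W.copy()
    W[k] += eta[k] * (x - W[k])
    return W

x = rng.random(d)
W1 = competitive_step(W, x, eta)          # mature rows can never change
W2 = competitive_step(W, x, eta_plastic)  # a mature winner may retune
```

With per-cell learning rates, a mature DGC whose selectivity is close to a novel input can retune toward it, which is the mechanism behind the improved representation of digit 5 reported in the quoted passage.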
3. It is not clear to me how the two-phase maturation process of DGCs would be affected in a scenario where at any given point some DGCs are in the excitatory phase of GABA and others are in the inhibitory phase. This would be expected if there is a continuous stream of new neurons. Would the plasticity of the neurons in the inhibitory phase not interfere with aligning the activity to similar stimuli due to plasticity of neurons in the excitatory phase? If there is interference, would the authors then predict that neurogenesis occurs in waves (i.e., some kind of global signal would coordinate transition from phase 1 to 2 across synapses)?
Is there evidence supporting that?
It seems to me that the calculation in the section "Analytical computation of the L2-norm and angle" can, at least in principle, be extended to estimate the interference: the competition due to plasticity of neurons in the inhibitory phase increases the angle phi, and thus slows down the alignment of the weights due to plasticity of neurons in the excitatory phase.
It is a good point that in a true online scenario where newborn DGCs are born continuously, some of them would be in their early phase (excitatory phase of GABA) while others would be in their late phase (inhibitory phase of GABA). It is indeed interesting to determine if this would lead to interference in the proper maturation of any of these newborn DGCs. We tackle this question in two ways. (1) Do newborn DGCs in the late (GABA inhibitory) phase interfere with the ability of newborn DGCs in the early (GABA excitatory) phase to become selective for familiar inputs? (2) Do newborn DGCs in the early (GABA excitatory) phase interfere with the ability of newborn DGCs in the late (GABA inhibitory) phase to become selective for novel patterns?
1) In our model, newborn DGCs in the early phase of maturation (“early-DGCs”) receive indirect GABAergic excitation (through inhibitory neurons) from surrounding mature DGCs and from newborn DGCs which are in their late phase of maturation (“late-DGCs”). Throughout their late phase of maturation, late-DGCs become more and more selective for novel input patterns. Therefore, as they go through their late phase of maturation, they push the configuration of the feedforward weight vector onto the early-DGCs away from the center of mass of input patterns that are well represented by mature DGCs and toward the center of mass of all input patterns (i.e., also the ones that were introduced during the early phase of maturation of late-DGCs). We therefore do not expect late-DGCs to interfere with the ability of early-DGCs to become selective for familiar inputs. Rather, at the end of their maturation, early-DGCs will ultimately become selective either for novel patterns that are similar to the ones for which mature DGCs are selective (aka different prototypes of familiar inputs), or for patterns similar to the ones for which late-DGCs eventually became selective. The alignment itself will depend on the maturation stage of the late-DGCs that indirectly activate the early-DGCs.
2) In our model, newborn DGCs in the early phase of maturation (“early-DGCs”) do not yet project to inhibitory neurons (or to any other neurons). Therefore, they do not affect the circuit, and the activity of newborn DGCs in the late phase of maturation (“late-DGCs”) is independent of the activity of early-DGCs. Hence, we do not expect a “slowing down of the alignment of the weights due to plasticity of neurons in the excitatory phase”.
Accordingly, our model does not predict that neurogenesis occurs in waves, and we are not aware of experimental evidence suggesting it.
107, Review of functional role of DGCs.
Aljadeff et al., 2015, and Shani-Narkiss et al., 2020, suggest a dynamical role for new neurons.
Thanks, they are now included in lines 108–109.
312, It would be interesting if the advantage of adding newborn neurons stimulated with "5" to a network pretrained with "3" and "4" over a network pretrained with "3", "4", and "5" would persist if some amount of plasticity remains in mature neurons (Figure 3d).
We hope that we answered this point in our reply to main concern 2 and the related further analyses. Plastic mature DGCs do not impair the classification performance of our neurogenesis model; rather, they even improve it in some cases. The model with plastic mature DGCs therefore stays close to the main model in terms of performance, and still performs better than simultaneous pretraining of digits 3, 4 and 5 (our “control 1”).
614, It would be good to discuss the possibility that neurotransmitter switch (without neurogenesis) has the same functional role as GABA switch in the current model. See e.g., Li et al., (2020) J Neuroscience.
Furthermore, can this model teach us anything about neurogenesis in the olfactory bulb? Is there an E to I switch there too?
We now touch upon a potential link between neurotransmitter switching and the GABA switch in our model in lines 832–837:
“Finally, while neurotransmitter switching has been observed following sustained stimulation for hours to days (Li et al., 2020), it is still unclear if it has the same functional role as the GABA switch in our model. In particular, it remains an open question if neurotransmitter switching promotes the integration of neurons in the same way as our model GABA switch does in the context of adult dentate gyrus neurogenesis.”
The suitability of our model to investigate adult olfactory bulb neurogenesis is now considered in the Discussion (lines 779–790):
“The parallel of neurogenesis in dentate gyrus and olfactory bulb suggests that similar mechanisms could be at work in both areas. Yet, even though adult olfactory bulb neurogenesis seems to have a similar functional role to adult dentate gyrus neurogenesis (Sahay et al., 2011b), follows a similar integration sequence, and undergoes a GABA switch from excitatory to inhibitory, the circuits differ in several aspects. First, while newborn neurons in dentate gyrus are excitatory, newborn cells in the olfactory bulb are inhibitory. Second, the newborn olfactory cells start firing action potentials only once they are well integrated (Carleton et al., 2003). Therefore, in view of a transfer of results to the olfactory bulb, it would be interesting to adjust our model of adult dentate gyrus neurogenesis accordingly. For example, a voltage-based synaptic plasticity rule could be used to account for subthreshold plasticity mechanisms (Clopath et al., 2010).”
725, Miller and Fumarola may not be the right reference to cite here. This specific nonlinearity (rectified tanh) is not standard and is not included in that paper.
Thanks for pointing out that our formulation was too dense here. In fact, there are two different aspects.
i) Miller and Fumarola (2012) show the mathematical equivalence of two expressions commonly used for rate models: a “voltage equation” (their equation (1)) and a “firing rate equation” (their equation (2)). The function ‘f’ in their firing rate equation can be any nonlinear function. Therefore, our rate equation (2) corresponds to their equation (2) with a particular choice for the nonlinear function ‘f’: the rectified hyperbolic tangent.
ii) The rectified hyperbolic tangent function is a choice that “combines a hard threshold with a saturation” (equation (2.11) in the “Theoretical Neuroscience” book by Dayan and Abbott). Through rectification, negative firing rates, which are not biologically plausible, are avoided. Another way to avoid negative firing rates without rectification of the input would be to use ½ + (½)*tanh(input). The main difference is that the firing rate increases more slowly from 0 in that case, while it increases linearly with our rectified tanh. The rectified tanh is also closer to the f-I curve of spiking neuron models such as the leaky integrate-and-fire model with absolute refractory period.
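For concreteness, the two non-negative rate functions discussed above can be written as a minimal sketch:

```python
import numpy as np

def rect_tanh(x):
    """Rectified hyperbolic tangent: a hard threshold at zero combined
    with saturation at 1. Rises approximately linearly just above 0."""
    return np.maximum(0.0, np.tanh(x))

def soft_tanh(x):
    """Alternative without rectification: 0.5 + 0.5*tanh(x) is also
    non-negative everywhere, but rises more slowly from small inputs."""
    return 0.5 + 0.5 * np.tanh(x)

# Below threshold, rect_tanh is exactly zero while soft_tanh stays positive
print(rect_tanh(-1.0), soft_tanh(-1.0))
```

Plotting both on [-2, 2] makes the difference obvious: only the rectified version has a strict firing threshold, which is what makes it resemble the f-I curve of a leaky integrate-and-fire neuron with refractoriness.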
To clarify these issues, in the new version of the paper we now write (lines 871–876):
“We assume that the DGCs have a frequency-current curve that is given by a rectified hyperbolic tangent (Dayan and Abbott, 2001) which is similar to the frequency-current curve of spiking neuron models with refractoriness (Gerstner et al., 2014). Moreover, we exploit the equivalence of two common firing rate equations (Miller and Fumarola, 2012) and let the firing rate nu_{i} of DGC i upon stimulation with input pattern $\overrightarrow{x}$ evolve according to: […]”
778, Definition of quasi-orthogonal is not clear. The inhibitory rates can have fluctuations and temporal dynamics of their own even if the network is assumed to be silent when each stimulus is presented. Therefore inputs might be quasi-orthogonal at one time but not at another. If this is used just to qualitatively understand the network behavior, this somewhat sloppy definition is ok, but I think this caveat should be mentioned to avoid confusion.
Thanks for pointing this out. We now clarify that the definition is applied to the stationary state (i.e., after the transient); note that the model is noise-free and expected to be non-chaotic so that, apart from an initial transient, fluctuations cannot appear with stationary input. We now write (lines 943–954):
“We emphasize that all synaptic weights, and all presynaptic firing rates nu_{j}, are nonnegative: w_{ij} ≥ 0 and nu_{j} ≥ 0. […] Note that for a case without inhibitory neurons and with b_{i} → 0, we recover the standard orthogonality condition, but for finite b_{i} > 0 quasi-orthogonality corresponds to angles larger than some reference angle.”
https://doi.org/10.7554/eLife.66463.sa2
Article and author information
Author details
Funding
Swiss National Science Foundation (200020 184615)
 Wulfram Gerstner
Horizon 2020 Framework Programme (785907)
 Wulfram Gerstner
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We thank Josef Bischofberger and Laurenz Wiskott for great discussions and useful remarks, as well as Paul Miller and an anonymous reviewer for constructive comments and suggestions. This research was supported by the Swiss National Science Foundation (no. 200020 184615) and by the European Union Horizon 2020 Framework Program under grant agreement no. 785907 (Human Brain Project, SGA2).
Senior Editor
 John R Huguenard, Stanford University School of Medicine, United States
Reviewing Editor
 Tatyana O Sharpee, Salk Institute for Biological Studies, United States
Reviewer
 Paul Miller, Brandeis University, United States
Publication history
 Received: January 12, 2021
 Accepted: June 16, 2021
 Accepted Manuscript published: June 17, 2021 (version 1)
 Version of Record published: July 6, 2021 (version 2)
Copyright
© 2021, Gozel and Gerstner
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.