A reservoir of timescales emerges in recurrent circuits with heterogeneous neural assemblies
Abstract
The temporal activity of many physical and biological systems, from complex networks to neural circuits, exhibits fluctuations simultaneously varying over a large range of timescales. Long-tailed distributions of intrinsic timescales have been observed across neurons simultaneously recorded within the same cortical circuit. The mechanisms leading to this striking temporal heterogeneity are yet unknown. Here, we show that neural circuits, endowed with heterogeneous neural assemblies of different sizes, naturally generate multiple timescales of activity spanning several orders of magnitude. We develop an analytical theory using rate networks, supported by simulations of spiking networks with cell-type specific connectivity, to explain how neural timescales depend on assembly size and show that our model can naturally explain the long-tailed timescale distribution observed in the awake primate cortex. When driving recurrent networks of heterogeneous neural assemblies with a time-dependent broadband input, we found that large and small assemblies preferentially entrain slow and fast spectral components of the input, respectively. Our results suggest that heterogeneous assemblies can provide a biologically plausible mechanism for neural circuits to demix complex temporal input signals by transforming temporal into spatial neural codes via frequency-selective neural assemblies.
Editor's evaluation
This fundamental work uses computational network models to suggest a possible origin of the wide range of time scales observed in cortical activity. This claim is supported by convincing evidence based on comparisons between mathematical theory, simulations of spiking network models, and analysis of recordings from the orbitofrontal cortex. This manuscript will be of interest to the broad community of systems and computational neuroscience.
https://doi.org/10.7554/eLife.86552.sa0

Introduction
Experimental evidence shows that the temporal activity of many physical and biological systems exhibits fluctuations simultaneously varying over a large range of timescales. In condensed matter physics, spin glasses typically exhibit aging and relaxation effects whose timescales span several orders of magnitude (Bouchaud, 1992). In biological systems, metabolic networks of E. coli generate fluxes with a power-law distribution of rates (Almaas et al., 2004; Emmerling et al., 2002). Gas release in yeast cultures exhibits frequency distributions spanning many orders of magnitude (Roussel and Lloyd, 2007), endowing them with robust and flexible responses to the environment (Aon et al., 2008). In the mammalian brain, a hierarchy of timescales in the activity of single neurons is observed across different cortical areas from occipital to frontal regions (Murray et al., 2014; Siegle et al., 2019; Gao et al., 2020). Moreover, neurons within the same local circuit exhibit a large range of timescales from milliseconds to minutes (Bernacchia et al., 2011; Cavanagh et al., 2016; Miri et al., 2011). This heterogeneity of neuronal timescales was observed in awake animals during periods of ongoing activity in the absence of external stimuli or behavioral tasks, suggesting that long-tailed distributions of intrinsic timescales may be an intrinsic property of recurrent cortical circuits. Recent studies highlighted the benefits of leveraging computations on multiple timescales when performing complex tasks in primates (Iigaya et al., 2019) as well as in artificial neural networks (Perez-Nieves et al., 2021). However, the neural mechanisms underlying the emergence of multiple timescales are not yet understood.
Here, we present a simple and robust neural mechanism generating heterogeneous timescales of activity in recurrent circuits. The central feature of our model is a heterogeneous distribution of cell assemblies, a common ingredient observed in cortical architecture (Perin et al., 2011; Marshel et al., 2019; Figure 1a). We first demonstrate that rate networks, whose rate units represent cell assemblies, can generate long-tailed distributions of timescales when endowed with heterogeneous assemblies (Figure 1b). We then show that the heterogeneity of timescales, observed in electrophysiological recordings from awake primate cortex (Cavanagh et al., 2016), can be explained by the presence of heterogeneous cell assemblies (Figure 1b). Using methods from statistical physics, we develop an analytical framework explaining how an assembly's intrinsic timescale depends on its size, revealing the emergence of a new chaotic regime where activity is bistable. We show that our theory applies to biologically plausible models of cortical circuits based on spiking networks with cell-type specific clustered architectures.
We then study the stimulus-response properties of networks with heterogeneous assemblies. In networks with homogeneous assemblies, chaotic activity is suppressed at a single resonant frequency (Rajan et al., 2010). However, when we drive heterogeneous networks with a time-dependent broadband input featuring a superposition of multiple frequencies, we find that the chaotic activity is suppressed across multiple frequencies that depend on each assembly's own size. Large and small assemblies are preferentially entrained by the low- and high-frequency components of the input, respectively (Figure 1c). This spectral specificity suggests that a reservoir of timescales may be a natural mechanism for cortical circuits to flexibly demix different spectral features of complex time-varying inputs. This mechanism may endow neural circuits with the ability to transform temporal neural codes into spatial neural codes via frequency-selective neural assemblies.
Results
To develop a theory of heterogeneous timescales, we first focus on random neuronal networks whose rate units are recurrently connected, with couplings that are chosen randomly. In this model, we will be able to leverage analytical methods from statistical field theory (Sompolinsky et al., 1988; Buice and Chow, 2013; Helias and Dahmen, 2020) to link analytical model parameters to circuit dynamics. In our rate network model, each network unit represents a functional assembly of cortical neurons with similar response properties. We interpret the unit's self-coupling as the size of the corresponding neural assembly (if recurrent couplings across the population vary significantly, we also interpret the self-coupling as representing the average coupling strength within an assembly). In the case where the self-couplings are zero or weak (order $1/\sqrt{N}$, with $N$ being the size of the network), random networks are known to undergo a phase transition from silence to chaos when the variance of the random couplings exceeds a critical value (Sompolinsky et al., 1988). When the self-couplings are strong (order 1) and are all equal, a third phase appears, featuring multiple stable fixed points accompanied by long transient activity (Stern et al., 2014). In all these cases, all network units exhibit the same intrinsic timescale, estimated from their autocorrelation function. Here, we demonstrate a novel class of recurrent networks, capable of generating temporally heterogeneous activity whose multiple timescales span several orders of magnitude. We show that when the self-couplings are heterogeneous, a reservoir of multiple timescales emerges, where each unit's intrinsic timescale depends both on its own self-coupling and on the network's self-coupling distribution.
Random networks with heterogeneous selfcouplings
We start by considering a recurrent network of $N$ rate units obeying the dynamical equations
$$\frac{dx_i}{dt} = -x_i + s_i\,\varphi(x_i) + g\sum_{j\neq i} J_{ij}\,\varphi(x_j)\,, \qquad i=1,\dots,N\,, \qquad (1)$$
where the random couplings $J_{ij}$ from unit $j$ to unit $i$ are drawn independently from a Gaussian distribution with mean 0 and variance $1/N$; $g$ represents the network gain and we chose a transfer function $\varphi(x) \equiv \tanh(x)$. The self-couplings $s_i$ are drawn from a distribution $P(s)$. The special case of equal self-couplings ($s_i = s$) was studied by Stern et al., 2014, and a summary of the results can be found in Appendix 1 for convenience. Here, we study the network properties in relation to both discrete and continuous distributions $P(s)$.
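These dynamics can be integrated numerically. Below is a minimal sketch (our own illustration, not the authors' code) that simulates a network of this type with two subpopulations of self-couplings, using Euler integration; all parameter values are arbitrary choices for illustration.

```python
import numpy as np

def simulate_rate_network(N=200, g=2.0, s_values=(1.0, 3.0), frac_slow=0.2,
                          T=100.0, dt=0.05, seed=0):
    """Euler-integrate dx_i/dt = -x_i + s_i*tanh(x_i) + g*sum_j J_ij*tanh(x_j)
    with Gaussian couplings J_ij ~ N(0, 1/N) and no self-connections in J."""
    rng = np.random.default_rng(seed)
    J = rng.normal(0.0, 1.0 / np.sqrt(N), size=(N, N))
    np.fill_diagonal(J, 0.0)                 # self-interaction enters via s_i only
    s = np.full(N, s_values[0])
    s[: int(frac_slow * N)] = s_values[1]    # a fraction of units with large s
    x = rng.normal(0.0, 1.0, N)
    traj = np.empty((int(T / dt), N))
    for t in range(traj.shape[0]):
        phi = np.tanh(x)
        x = x + dt * (-x + s * phi + g * (J @ phi))
        traj[t] = x
    return traj, s
```

The returned trajectory can then be used, for example, to compare autocorrelation timescales between the two subpopulations.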
Using standard methods of statistical field theory (Buice and Chow, 2013; Helias and Dahmen, 2020; see Methods: 'Dynamic mean-field theory with multiple self-couplings' for details), in the limit of large $N$ we can average over realizations of the disordered couplings $J_{ij}$ to derive a set of self-consistent dynamic mean-field equations for each population of units $x_\alpha$ with self-coupling strengths $s_\alpha \in S$
$$\frac{dx_\alpha}{dt} = -x_\alpha + s_\alpha\,\varphi(x_\alpha) + \eta(t)\,. \qquad (2)$$
In our notation, $S$ denotes the set of different values of self-couplings $s_\alpha$, indexed by $\alpha \in A$; we denote by $N_\alpha$ the number of units with the same self-coupling $s_\alpha$ and, accordingly, by $n_\alpha = N_\alpha / N$ their fraction. The mean field $\eta(t)$ is the same Gaussian process for all units and has zero mean $\langle \eta(t)\rangle = 0$ and autocorrelation
$$\langle \eta(t)\,\eta(t+\tau)\rangle = g^2 \sum_{\alpha \in A} n_\alpha \big\langle \varphi[x_\alpha(t)]\,\varphi[x_\alpha(t+\tau)]\big\rangle\,, \qquad (3)$$
where $\langle \cdot \rangle$ denotes an average over the mean field.
We found that networks with heterogeneous self-couplings exhibit a complex landscape of fixed points $x_\alpha^{*}$, obtained as the self-consistent solutions to the static version of Equation 2 and Equation 3, subject to stability conditions (see Methods: 'Fixed points and transition to chaos' and 'Stability conditions'). For fixed values of the network gain $g$, these fixed points can be destabilized by varying the self-couplings of different assemblies, inducing a transition to time-varying chaotic activity (Figure 2). The fixed-point landscape exhibits remarkable features inherited directly from the single-valued self-coupling case, studied extensively in Stern et al., 2014. Here, we focus on the dynamical properties of the time-varying chaotic activity, which constitute new features resulting from the heterogeneity of the self-couplings. We illustrate the network's dynamical features in the case of a network with two subpopulations, comprising fractions $n_1$ and $n_2 = 1 - n_1$ of the units with self-couplings $s_1$ and $s_2$, respectively. In the $(s_1, s_2)$ plane, this model gives rise to a phase diagram with a single chaotic region separating four disconnected stable fixed-point regions (Figure 2a). In the case of a Gaussian distribution of self-couplings in the stable fixed-point regime, a complex landscape of stable fixed points emerges: the unit values at the stable fixed points continuously interpolate between a unimodal distribution around zero (for units with $s_i < 1$) and a bimodal distribution (for units with $s_i > 1$) within the same network (Figure 3a).
A reservoir of heterogeneous timescales explains cortical recordings
In the chaotic phase, we can estimate the intrinsic timescale $\tau_i$ of a unit $x_i$ from its autocorrelation function $C(\tau) = \langle \varphi[x_i(t)]\,\varphi[x_i(t+\tau)]\rangle_t$ as the half-width at half-maximum of the autocorrelation (Figure 2aiv, $\tau_1$ and $\tau_2$). The chaotic phase of the network in Equation 1 is characterized by a large range of timescales that can be simultaneously realized across the units with different self-couplings. In a network with two self-couplings $s_1$ and $s_2$ in the chaotic regime, we found that the ratio of the timescales $\tau_2/\tau_1$ increases as we increase the self-coupling ratio $s_2/s_1$ (Figure 2b). The separation of timescales depends on the relative fractions $n_1$ and $n_2 = 1 - n_1$ of the fast and slow populations, respectively. When the fraction $n_2$ approaches zero (with $n_1 \to 1$), the log of the timescale ratio exhibits a supralinear dependence on the self-coupling ratio, as described analytically in Methods ('Universal colored-noise approximation to the Fokker–Planck theory'), with a simplified estimate given in Equation 4, leading to a vast separation of timescales. For other self-coupling ratios $s_2/s_1$, the timescale separation approaches this supralinear behavior as the fraction $n_1$ increases. We note that 'log' denotes the natural logarithm throughout.
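The half-width-at-half-maximum estimate can be computed directly from simulated activity. The sketch below (our own helper, with a synthetic test signal; not the authors' code) estimates a unit's timescale as the first lag at which the normalized autocorrelation of $\varphi(x)$ drops below one half.

```python
import numpy as np

def intrinsic_timescale(x, dt):
    """Timescale of a unit: half-width at half-maximum of the autocorrelation
    C(tau) of phi(x) = tanh(x), i.e. the first lag where C drops below C(0)/2."""
    phi = np.tanh(x)
    phi = phi - phi.mean()
    ac = np.correlate(phi, phi, mode='full')[len(phi) - 1:]  # non-negative lags
    ac = ac / ac[0]                                          # normalize: C(0) = 1
    below = np.nonzero(ac < 0.5)[0]
    return below[0] * dt if below.size else np.inf
```

Applied to each unit of a simulated network, this yields the per-unit timescale distribution discussed in the text.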
In the case of a lognormal distribution of self-couplings, in the chaotic regime the network generates a reservoir of multiple timescales $\tau_i$ of chaotic activity across network units, spanning several orders of magnitude (Figure 3b). For long-tailed distributions such as the lognormal, mean-field theory can generate predictions for rare units with large self-couplings from the tail of the distribution by solving Equation 2 and the continuous version of Equation 3 (see Methods: 'Dynamic mean-field theory with multiple self-couplings', Equation 13). The solution highlights the exponential relation between a unit's self-coupling and its autocorrelation decay time (Figure 3aii, purple dashed line).
During periods of ongoing activity, the distribution of single-cell autocorrelation timescales in primate cortex was found to be right-skewed and approximately lognormal, ranging from 10 ms to a few seconds (Cavanagh et al., 2016; Figure 3bi). Can the reservoir of timescales generated in our heterogeneous network model explain the distribution of timescales observed in primate cortex? We found that a model with a lognormal distribution of self-couplings can generate a long-tailed distribution of timescales which fits the distribution observed in awake primate orbitofrontal cortex (Figure 3bi). This result shows that neural circuits with heterogeneous assemblies can naturally generate the heterogeneity in intrinsic timescales observed in cortical circuits from awake primates.
Separation of timescales in the bistable chaotic regime
To gain an analytical understanding of the parametric separation of timescales in networks with heterogeneous self-couplings, we consider the special case of a network with two self-couplings where a large subpopulation ($N_1 = N - 1$) with $s_1 = 1$ comprises all but one slow probe unit, $x_2$, with large self-coupling $s_2 \gg s_1$ (see Methods: 'Universal colored-noise approximation to the Fokker–Planck theory' for details). In the large-$N$ limit, we can neglect the backreaction of the probe unit on the mean field and approximate the latter as an external Gaussian colored noise $\eta(t)$ with autocorrelation $g^2 C(\tau) = g^2 \langle \varphi[x_1(t)]\,\varphi[x_1(t+\tau)]\rangle$, independent of $x_2$. The noise $\eta(t)$ then represents the effect on the probe unit $x_2$ of all other units in the network and can be parameterized by the noise strength $D$ and its timescale (color) $\tau_1$. For large $s_2$, the dynamics of the probe unit $x_2$ leads to the emergence of a bistable chaotic phase whereby its activity is localized around the critical points $x^{\pm} \simeq \pm s_2$ (Figure 4a–i) and switches between them at random times. In the regime of colored noise (as we have here, with $\tau_1 \simeq 7.9 \gg 1$), the stationary probability distribution $p(x_2)$ (Figure 4a–ii and b) satisfies the unified colored-noise approximation to the Fokker–Planck equation (see Methods: 'Universal colored-noise approximation to the Fokker–Planck theory'; Hänggi and Jung, 1995; Jung and Hänggi, 1987), based on an analytical expression for its effective potential $U_{\mathrm{eff}}(x)$ as a function of the self-coupling $s_2$ and the noise color $\tau_1$. The distribution $p(x_2)$ is concentrated around the two minima $x^{\pm} \simeq \pm s_2$ of $U_{\mathrm{eff}}$.
The main effect of the strong color $\tau_1 \gg 1$ is to sharply decrease the variance of the distribution around the minima $x^{\pm}$, compared to the white-noise case ($\tau_1 = 0$). This is evident when comparing the colored-noise with the white-noise case (Figure 4aiv–vi).
In our network with colored noise, the probe unit's temporal dynamics are captured by the mean first-passage time $\langle T\rangle$ for the escape out of the potential well defined by the effective potential $U_{\mathrm{eff}}$, yielding good agreement with simulations at increasing $s_2$, as expected on theoretical grounds (Hänggi and Jung, 1995; Jung and Hänggi, 1987; Figure 4c). The asymptotic scaling of the mean first-passage time for large $s_2$ is given by Equation 4 (see Methods), with $\log\langle T\rangle$ growing approximately quadratically in $s_2$.
In this slow-probe regime, we thus achieved a parametric separation of timescales between the population $x_1$, with its intrinsic timescale $\tau_1$, and the probe unit $x_2$, whose activity fluctuations exhibit two separate timescales: the slow timescale $\langle T\rangle$ of the dwelling in each of the bistable states and the fast timescale $\tau_1$ of the fluctuations around the metastable states (obtained by expanding the dynamical equation around the metastable values $x^{\pm} = \pm s_2$). One can generalize this metastable regime to a network with $N - p$ units which belong to a group with $s_1 = 1$ and $p \ll N$ slow probe units $x_\alpha$, for $\alpha = 2, \dots, p+1$, with large self-couplings $s_\alpha$. The slow dynamics of each probe unit $x_\alpha$ is captured by its own mean first-passage time between the bistable states, $\langle T\rangle_\alpha$ (Equation 22), and all slow units are driven by a shared external colored noise $\eta(t)$ with timescale $\tau_1$. In summary, in our model multiple timescales can be robustly generated with specific values, varying over several orders of magnitude.
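The bistable-probe picture can be illustrated with a toy simulation, replacing the self-consistent mean field with an Ornstein–Uhlenbeck (colored) noise of timescale $\tau_1$. This is our own simplified stand-in with arbitrary parameter values, not the self-consistent theory: the probe unit dwells near $x^{\pm}$ and switches at random times, and the dwell times can be read off directly.

```python
import numpy as np

def probe_dwell_times(s2=1.5, tau1=8.0, D=0.3, T=20000.0, dt=0.05, seed=1):
    """Probe unit dx/dt = -x + s2*tanh(x) + eta(t), with eta an
    Ornstein-Uhlenbeck process of timescale tau1 and strength D, standing in
    for the mean field. Returns the dwell times between sign switches."""
    rng = np.random.default_rng(seed)
    x, eta = s2, 0.0
    sign, t_enter, dwells = 1.0, 0.0, []
    for i in range(int(T / dt)):
        eta += dt * (-eta / tau1) + np.sqrt(2.0 * D * dt / tau1) * rng.normal()
        x += dt * (-x + s2 * np.tanh(x) + eta)
        if x * sign < 0:                     # crossed between x ~ +s2 and x ~ -s2
            dwells.append(i * dt - t_enter)
            t_enter, sign = i * dt, -sign
    return np.array(dwells)
```

Increasing `s2` in this sketch makes the effective barrier higher and the dwell times rapidly longer, qualitatively matching the separation of timescales described above.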
Does the relationship between a unit's self-coupling and its timescale rely on single-unit properties, or does it arise from network effects? To answer this question, we compared the dynamics of a unit when driven by a white-noise input vs. the self-consistent input generated by the rest of the recurrent network (i.e. the mean field). If the neural mechanism underlying the timescale separation were a property of the single cell itself, we would observe the same effect regardless of the details of the input noise. We found that the increase in the unit's timescale as a function of $s_2$ is absent when driving the unit with white noise, and it only emerges when the unit is driven by the self-consistent mean field (Figure 4c). We thus concluded that this neural mechanism is not an intrinsic property of a single unit but requires the unit to be part of a recurrently connected network.
A reservoir of timescales in E/I spiking networks
We next investigated whether the neural mechanism endowing the rate network (Equation 1) with a reservoir of timescales could be implemented in a biologically plausible model based on spiking activity and excitatory/inhibitory cell-type specific connectivity. To this end, we modeled the local cortical circuit as a recurrent network of excitatory (E) and inhibitory (I) current-based leaky integrate-and-fire neurons (see Methods: 'Spiking network model' for details), where both E and I populations were arranged in neural assemblies (Figure 5a; Amit and Brunel, 1997; Litwin-Kumar and Doiron, 2012; Wyrick and Mazzucato, 2021). Synaptic couplings between neurons in the same assembly were potentiated compared to those between neurons in different assemblies. Using mean-field theory, we found that the recurrent interactions of cell-type specific neurons belonging to the same assembly can be interpreted as a self-coupling, expressed in terms of the underlying network parameters as $s_i^E = \overline{J}_{EE}^{(in)} C_i^E$, where $C_i^E$ is the assembly size and $\overline{J}_{EE}^{(in)}$ is the average synaptic coupling between E neurons within the assembly (see Methods: 'Spiking network model' for details). The spiking network's time-varying activity unfolds through sequences of metastable attractors (Litwin-Kumar and Doiron, 2012; Wyrick and Mazzucato, 2021), characterized by the co-activation of different subsets of neural assemblies (Figure 5a). These dynamics rely on the bistable activity of each assembly, switching between high and low firing-rate states. The dwell time of an assembly in a high-activity state increases with larger sizes and with stronger intra-assembly coupling strength (Figure 5b). This metastable regime in spiking networks is similar to the bistable, heterogeneous-timescale activity observed in the random neural networks endowed with heterogeneous self-couplings.
We further examined the features of the metastable regime in spiking networks in order to compare the mechanisms underlying the heterogeneous timescale distributions in the rate and spiking models. The two models exhibit clear differences in their “building blocks”. In the rate network, the transfer function is odd ($\tanh$), leading to bistable states with values localized around $x^{\pm} \simeq \pm s$. In the spiking model, the single-neuron current-to-rate transfer function is strictly positive, so that the bistable states can have both high and low firing-rate values. As a result, in the spiking network, unlike in the rate network, a variety of firing-rate levels can be attained during the high-activity epochs, depending on both the assembly size and the number of other simultaneously active assemblies. In other words, the rate level depends on the specific network attractor visited within the complex attractor landscape (Mazzucato et al., 2015; Wyrick and Mazzucato, 2021).
Despite these differences, we found crucial similarities between the time-varying activity in the two models related to the underlying neural mechanism. The characteristic timescale $T$ of an assembly's metastable dynamics can be estimated from its average activation time (Figure 5a and b). We tested whether heterogeneity in the assembly self-coupling distribution could lead to a heterogeneous distribution of timescales. We endowed the network with a heterogeneous distribution of assembly sizes (an additional source of heterogeneity originates from the Erdős–Rényi connectivity), yielding a heterogeneous distribution of self-couplings (Figure 5b). We found that the assembly activation timescale $T$ grew as the assembly's self-coupling increased, spanning overall a large range of timescales from 20 ms to 100 s, i.e., the whole range of our simulation epochs (Figure 5b). In particular, the functional dependence of $\log(T)$ vs. the self-coupling $s_E$ was best fit by a quadratic polynomial (Figure 5b inset; see Methods: 'Spiking network model' for details), in agreement with the functional dependence obtained from the analytical calculation in the rate model (Equation 4). We thus concluded that a reservoir of timescales can naturally emerge in biologically plausible spiking models of cortical circuits from a heterogeneous distribution of assembly sizes. Both the range of timescales (20 ms–100 s) (Cavanagh et al., 2016) and the distribution of assembly sizes (50–100 neurons) (Perin et al., 2011; Marshel et al., 2019) are consistent with experimental observations.
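The quadratic fit of $\log(T)$ against the self-coupling can be reproduced with standard polynomial regression. The sketch below uses synthetic data generated from an assumed quadratic relationship (our own illustrative numbers, not the paper's measurements) purely to show the fitting procedure.

```python
import numpy as np

# Synthetic (self-coupling, activation timescale) pairs obeying an assumed
# quadratic law log T = 0.4*s^2 + 0.2*s - 1.0 plus noise; illustrative only.
rng = np.random.default_rng(0)
s = np.linspace(1.0, 4.0, 40)
logT = 0.4 * s**2 + 0.2 * s - 1.0 + rng.normal(0.0, 0.05, s.size)

coeffs = np.polyfit(s, logT, deg=2)        # quadratic fit of log(T) vs. s
pred = np.polyval(coeffs, s)
r2 = 1.0 - np.sum((logT - pred) ** 2) / np.sum((logT - logT.mean()) ** 2)
```

With real data one would compare the quadratic fit against a linear one (e.g. via cross-validated residuals) before claiming supralinear growth.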
What is the relationship between the distribution of timescales in the E/I spiking model and the chaotic rate network? We can obtain crucial insights by combining analysis of the synaptic weight matrix with a mean-field approach in a linear approximation, following Murphy and Miller, 2009; Schaub et al., 2015. In the spiking network, the non-normal weight matrix $J$ exhibits the typical E/I structure with four submatrices representing E/I cell-type specific connectivity; within each of the four E/I submatrices, diagonal blocks highlight the paired E/I clustered architecture (the heterogeneous distribution of cluster sizes is manifest in the increasing size of the diagonal blocks, Appendix 2—figure 1a). To interpret the dynamical features emerging from this weight matrix, we examined a mean-field reduction of the $N$-dimensional network to a set of $2p+2$ mean-field variables, representing the $2p$ E and I clusters plus the two unclustered background E and I populations (see Appendix 2). The $2p+2$ eigenvalues of the mean-field-reduced weight matrix $J^{MF}$ comprise a subset of those of the full weight matrix $J$, capturing the salient features of the spiking network dynamics in a linear approximation (Figure 5c; see Appendix 2 and Murphy and Miller, 2009; Schaub et al., 2015). The weight matrix $J^{MF}$ exhibits a spectral gap, beyond which a distribution of $p-1$ eigenvalues with real parts larger than one corresponds to slow dynamical modes. To identify these slow modes, we examined the Schur eigenvectors of $J^{MF}$, which represent independent dynamical modes in the linearized theory (i.e. an orthonormal basis; see Appendix 2 and Appendix 2—figure 2).
We found that the Schur eigenvectors associated with these large positive eigenvalues can be approximately mapped onto E/I cluster pairs. More specifically, eigenvalues with increasingly larger values correspond to assemblies of increasingly larger sizes (Figure 5c), which, in turn, are associated with slower timescales (Figure 5b). We conclude that the slow switching dynamics in the spiking network is linked to large positive eigenvalues of the synaptic weight matrix, and the different timescales emerge from a heterogeneous distribution of these eigenvalues. For comparison, in the chaotic rate network, the eigenvalue distribution of the weight matrix also exhibits a set of eigenvalues with large positive real parts (Appendix 2—figure 3a). The relation between an eigenvalue's magnitude and slow dynamics holds in the rate networks as well: increasingly larger eigenvalues correspond to increasingly larger cluster self-couplings (Appendix 2—figure 3), which are associated with slower dynamics (Figure 3aii). Therefore, the dynamical structure in the rate networks qualitatively matches the structure uncovered in the mean-field-reduced spiking-network weight matrix $J^{MF}$ (Figure 5c). In summary, our analysis shows that in both spiking and rate networks the reservoir of timescales is associated with the emergence of a heterogeneous distribution of large positive eigenvalues in the weight matrix. This analysis suggests that the correspondence between rate networks and spiking networks should be performed at the level of the dynamical modes associated with these large positive eigenvalues in the Schur basis, where rate units in the rate network can be related to E/I cluster pairs (Schur eigenvectors) in the spiking network. A potential difference between the two models may lie in the nature of the transitions between bistable states in the rate network vs. the transitions between low- and high-activity states in the spiking network.
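The Schur-based identification of slow modes can be sketched with standard linear-algebra routines. The example below uses a toy weight matrix of our own construction (not the paper's $J^{MF}$): the real Schur form is sorted so that eigenvalues with real part above a threshold, together with their orthonormal Schur vectors, come first.

```python
import numpy as np
from scipy.linalg import schur

def slow_modes(J, threshold=1.0):
    """Real Schur decomposition J = Q T Q^T. Columns of Q form an orthonormal
    basis of dynamical modes; sorting puts eigenvalues with real part above
    `threshold` (slow modes in a linearized rate model) in the leading block."""
    T, Q, sdim = schur(J, output='real', sort=lambda re, im: re > threshold)
    return Q[:, :sdim], sdim

# Toy weight matrix: weak random couplings plus one strong self-coupling.
rng = np.random.default_rng(0)
J = 0.05 * rng.normal(size=(10, 10))
J[0, 0] = 2.0
modes, n_slow = slow_modes(J)
```

In this toy case the single leading Schur vector is concentrated on the strongly self-coupled unit, mirroring how, in the paper's analysis, the leading Schur vectors map onto the large E/I cluster pairs.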
In the rate network, transitions are driven by the self-consistent colored noise, the hallmark of the chaotic activity arising from the disorder in the synaptic couplings. In the spiking network, although the disorder in the inter-assembly effective couplings may contribute to the state transitions, finite-size effects may also be at play, due to the small assembly size.
Spatial demixing of time-varying broadband input
What are the computational benefits of having multiple timescales simultaneously operating in the same circuit? Previous work in random networks with no self-couplings ($s_i = 0$ in Equation 1) showed that stimulus-driven suppression of chaos is enhanced at a particular input frequency, related to the network's intrinsic timescale (Rajan et al., 2010). The phenomenon was preserved when a single rate of adaptation was added to all units (Muscinelli et al., 2019). We investigated whether, in our network with two different self-couplings $s_1 < s_2$ (in the chaotic regime), the stimulus-dependent suppression of chaos exhibited different features across the two subpopulations, depending on their different intrinsic timescales. We drove each network unit $x_i$ with an external broadband stimulus $I_i(t) = A\sum_{l=1}^{L}\sin(2\pi f_l t + \theta_i)$, consisting of the superposition of $L$ sinusoidal inputs of different frequencies $f_l$ in the range 1–200 Hz, with equal amplitude $A = 0.5$ and random phases $\theta_i$. We found that the subpopulation with a slow, or fast, intrinsic timescale preferentially entrained its activity with slower, or faster, spectral components of the broadband stimulus, respectively (Figure 6a). We quantified this effect using a spectral modulation index $m(f) = [P_2(f) - P_1(f)]/[P_2(f) + P_1(f)]$, where $P_\alpha(f)$ is the power-spectrum peak of subpopulation $\alpha$ at the frequency $f$ (Figure 6b). A positive, or negative, value of $m(f)$ reveals a stronger, or weaker, respectively, entrainment at frequency $f$ in the subpopulation with $s_2$ compared to the one with $s_1$. The modulation index exhibited a crossover behavior whereby the low-frequency components of the input predominantly entrained the slow population $s_2$, while the fast components predominantly entrained the fast population $s_1$.
When fixing $s_1 = 1$ and varying $s_2$, we found that the dependence of the crossover frequency $f_c$ on $s_2$ was strong at low input amplitudes and was progressively tamed at larger input amplitudes (Figure 6c). This is consistent with the fact that the input amplitude completely suppresses chaos beyond a certain critical value, as previously reported in networks with no self-couplings (Rajan et al., 2010) and with adaptation (Muscinelli et al., 2019).
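The modulation index can be computed directly from the two subpopulations' power spectra. Below is a minimal sketch (our own helper, tested on a synthetic two-tone signal; in the paper the spectra come from the driven network activity) using the FFT.

```python
import numpy as np

def modulation_index(x1, x2, dt, freqs):
    """m(f) = (P2(f) - P1(f)) / (P2(f) + P1(f)), with P_a(f) the FFT power of
    population a's activity at the FFT bin nearest each requested frequency."""
    f_axis = np.fft.rfftfreq(x1.size, d=dt)
    P1 = np.abs(np.fft.rfft(x1 - x1.mean())) ** 2
    P2 = np.abs(np.fft.rfft(x2 - x2.mean())) ** 2
    idx = [int(np.argmin(np.abs(f_axis - f))) for f in freqs]
    return np.array([(P2[i] - P1[i]) / (P2[i] + P1[i]) for i in idx])
```

Evaluating this index over the input frequencies then exposes the crossover between the slow- and fast-dominated spectral ranges.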
Discussion
We demonstrated a new, robust, and biologically plausible network mechanism whereby multiple timescales emerge across units with heterogeneous self-couplings. In our model, units are interpreted as neural assemblies, consistent with experimental evidence from cortical circuits (Perin et al., 2011; Lee et al., 2016; Kiani et al., 2015; Miller et al., 2014; Marshel et al., 2019) and previous theoretical modeling (Litwin-Kumar and Doiron, 2012; Wyrick and Mazzucato, 2021). We found that the neural mechanism underlying the large range of timescales is the heterogeneity in the distribution of self-couplings (representing neural assembly size). We showed that this mechanism can be naturally implemented in a biologically plausible model of a neural circuit based on spiking neurons with excitatory/inhibitory cell-type specific connectivity. This spiking network represents a microscopic realization of our mechanism where neurons are arranged in assemblies, and an assembly's self-coupling represents the strength of the recurrent interactions between neurons belonging to that assembly, proportional to its size. A heterogeneous distribution of assembly sizes, in turn, generates a reservoir of timescales. Crucially, our model captured the distribution of intrinsic timescales observed across neurons recorded within the same area in primate cortex (Cavanagh et al., 2016).
Several experimental studies uncovered heterogeneity of timescales of neural activity across brain areas and species. Comparison of the population-averaged autocorrelations across cortical areas revealed a hierarchical structure, varying from 50 ms to 350 ms along the occipital-to-frontal axis (Murray et al., 2014). Neurons within the same area exhibit a wide distribution of timescales as well. A heterogeneous distribution of timescales (from 0.5 s to 50 s) was found across neurons in the oculomotor system of the fish (Miri et al., 2011) and primate brainstem (Joshua et al., 2013), suggesting that timescale heterogeneity is conserved across phylogeny. During periods of ongoing activity, the distribution of single-cell autocorrelation timescales in primates was found to be right-skewed and approximately lognormal, ranging from 10 ms to a few seconds (Cavanagh et al., 2016 and Figure 3). Single-neuron activity was found to encode long reward memory traces in primate frontal areas over a wide range of timescales, up to 10 consecutive trials (Bernacchia et al., 2011). In these studies, autocorrelation timescales were estimated using parametric fits, which may be affected by statistical biases, although Bayesian generative approaches might overcome this issue (Zeraati et al., 2020). In our model, we estimated timescales nonparametrically as the half-width at half-maximum of the autocorrelation function. In our biologically plausible model based on a spiking network with cell-type specific connectivity, the distribution of timescales was in the range between 20 ms and 100 s, similar to the range of timescales observed in experiments (Miri et al., 2011; Joshua et al., 2013; Cavanagh et al., 2016). Moreover, the distribution of assembly sizes in our model is within the range of 50–100 neurons, consistent with the size of functional assemblies experimentally observed in cortical circuits (Perin et al., 2011; Marshel et al., 2019).
A fundamental new prediction of our model, to be tested in future experiments, is the direct relationship between assembly size and its timescale.
Previous neural mechanisms for generating multiple timescales of neural activity relied on single-cell biophysical properties, such as membrane or synaptic time constants (Gjorgjieva et al., 2016). In feedforward networks, developmental changes in single-cell conductance can modulate the timescale of information transmission, explaining the transition from slow waves to rapid fluctuations observed in the developing cortex (Gjorgjieva et al., 2014). However, the extent to which this single-cell mechanism might persist in the presence of strong recurrent dynamics was not assessed. To elucidate this issue, we examined whether a heterogeneous distribution of single-unit integration time constants could lead to a separation of timescales in a random neural network (see Appendix 3 for details). In this model, half of the units were endowed with a fixed fast time constant and the other half with a slow time constant, whose value varied across networks. We found that, although the average network timescale increased proportionally to the value of the slower time constant, the difference in autocorrelation time between the two populations remained negligible. These results suggest that, although heterogeneity in single-cell time constants may affect the dynamics of single neurons in isolation or within feedforward circuits (Gjorgjieva et al., 2014), the presence of strong recurrent dynamics fundamentally alters these single-cell properties in a counterintuitive way. Our results thus suggest that heterogeneity in single-cell time constants may not lead to a diversity of timescales in the presence of recurrent dynamics.
Our results further clarified that the relationship between an assembly's self-coupling and its timescale relies on strong recurrent dynamics. This relationship is absent when driving an isolated assembly with a white-noise external input (Figure 4). Indeed, the mechanism linking the self-coupling to the timescale only emerged when driving the unit with a mean-field whose color was self-consistently obtained from an underlying recurrent network of self-coupled units.
Previous models showed that a heterogeneity of timescales may emerge from circuit dynamics through a combination of structural heterogeneities and heterogeneous long-range connections arranged along a spatial feedforward gradient (Chaudhuri et al., 2014; Chaudhuri et al., 2015). These networks can reproduce the population-averaged hierarchy of timescales observed across the cortex in the range of 50–350 ms (Murray et al., 2014; Chaudhuri et al., 2015). A similar network architecture can also reproduce the heterogeneous relaxation times after a saccade, found in the brainstem oculomotor circuit (Miri et al., 2011; Joshua et al., 2013), in the range of 10–50 s (Inagaki et al., 2019; Recanatesi et al., 2022). This class of models can explain a timescale separation within a factor of 10, but it is not known whether they can be extended to several orders of magnitude, as observed between neurons in the same cortical area (Cavanagh et al., 2016). Moreover, while the feedforward spatial structure underlying these two models is a known feature of the cortical hierarchy and of the brainstem circuit, respectively, it is not known whether such a feedforward structure is present within a local cortical circuit. Our model, on the other hand, relies on strong recurrent connectivity and local functional assemblies, two hallmarks of the architecture of local cortical circuits (Perin et al., 2011; Lee et al., 2016; Kiani et al., 2015; Miller et al., 2014; Marshel et al., 2019). Other network models generating multiple timescales of activity fluctuations have been proposed, based on self-tuned criticality with anti-Hebbian plasticity (Magnasco et al., 2009) or on multiple block-structured connectivity (Aljadeff et al., 2015).
In our model, the dynamics of units with large self-couplings, exhibiting slow switching between bistable states, can be captured analytically using the unified colored-noise approximation (UCNA) to the Fokker-Planck equation (Hänggi and Jung, 1995; Jung and Hänggi, 1987). This is a classic tool from the theory of stochastic processes, which we applied here, to our knowledge for the first time, to investigate neural network dynamics. This slow-switching regime may underlie the emergence of metastable activity, ubiquitously observed in the population spiking activity of behaving mammals (Abeles et al., 1995; Jones et al., 2007; Mazzucato et al., 2015; Mazzucato et al., 2019; Recanatesi et al., 2022; Engel et al., 2016; Kadmon Harpaz et al., 2019). In these spiking networks, it is not known how to estimate the timescales of metastable activity from network parameters, and we anticipate that the UCNA may provide a powerful new tool for investigating network dynamics in these biologically plausible models.
What is the functional relevance of neural circuits exhibiting a reservoir of multiple timescales? First, the presence of long timescales deep in the chaotic regime is a new feature of our model, which may be beneficial for memory capacity away from the edge of chaos (Toyoizumi and Abbott, 2011). Second, we found that, in our model, time-dependent broadband inputs suppress chaos in a population-specific way, whereby populations of large (small) assemblies preferentially entrain slow (fast) spectral components of the input. Previously studied spiking models suggested that preferential entrainment of input is possible via cellular mechanisms (Lindner, 2016) or finite-size fluctuations in a feedforward network structure (Deger et al., 2014). Here, we presented a recurrent network mechanism for population-specific chaos suppression, independent of the network size. This mechanism may thus grant recurrent networks a natural and robust tool to spatially demix complex temporal inputs (Perez-Nieves et al., 2021), as observed in visual cortex (Mazzoni et al., 2008). Third, the presence of multiple timescales may be beneficial for performing flexible computations involving simultaneously fast and slow timescales, such as in role-switching tasks (Iigaya et al., 2019) or as observed in time cells in the hippocampus (Kraus et al., 2013; Howard et al., 2014). A promising direction for future investigation is the exploration of the computational properties of our model in the context of reservoir computing (Sussillo and Abbott, 2009) or recurrent networks trained to perform complex cognitive tasks (Yang et al., 2019).
Methods
Dynamic mean-field theory with multiple self-couplings
We derive the dynamic mean-field theory in the limit $N\to \mathrm{\infty}$ by using the moment generating functional (Sompolinsky and Zippelius, 1982; Crisanti and Sompolinsky, 1987). For the derivation, we follow the Martin-Siggia-Rose-De Dominicis-Janssen path-integral formalism (Martin et al., 1973), as presented extensively in Helias and Dahmen, 2020; we borrow their notation as well.
The model includes two sets of random variables: the connectivity couplings $J_{ij}$ for $1\le i,j\le N; i\ne j$, drawn independently from a Gaussian distribution with mean 0 and variance $\frac{1}{N}$; and the self-couplings $s_i$ for $1\le i\le N$, whose values are of order 1. When we examine the dynamics governing each unit in Equation 1, the sum over the random couplings $J_{ij}$ contributes $N$ terms which, in the limit $N\to \mathrm{\infty}$, ensure that the net contribution (mean and variance) from this sum remains of order 1. Hence, in our model, as in previous models, $J$ is the quenched disorder parameter, whose sum gives rise to the mean-field. The self-couplings (one for each unit) contribute an additional term to the moment generating functional. Each unit's activity strongly depends on the value of its own self-coupling, which hence cannot be averaged over when we study a unit's dynamics. After averaging over $J$, we can study all units with the same self-coupling together, as they obey the same mean-field equation, Equation 2. Moreover, we show that all units, regardless of their self-coupling, obey a single mean-field due to the structure of $J$. We note that the results of this Methods section, including Equation 5 and Equation 6, would not be affected by diagonal elements of $J$ which are not zero but rather drawn from the same distribution as the off-diagonal elements (as in the main text), since the overall contribution of such nonzero elements is negligible. To maintain the clarity of the text, and since the results are not affected by it, we left out the distinction between including and excluding diagonal elements of $J$ of order $1/\sqrt{N}$ in the main text.
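To fix ideas, a minimal simulation of this setup is sketched below, assuming Equation 1 takes the form $\dot{x}_i = -x_i + s_i\varphi(x_i) + g\sum_{j\ne i}J_{ij}\varphi(x_j)$ with $\varphi = \tanh$; the network size, gain, and the two self-coupling values are illustrative choices of ours.

```python
import numpy as np

rng = np.random.default_rng(1)
N, g, dt, steps = 400, 2.0, 0.05, 4000
J = rng.standard_normal((N, N)) / np.sqrt(N)   # quenched disorder: J_ij ~ N(0, 1/N)
np.fill_diagonal(J, 0.0)                       # self-interaction carried by s_i, not J_ii
s = rng.choice([1.0, 4.0], size=N)             # two subpopulations of order-1 self-couplings

x = rng.standard_normal(N)
for _ in range(steps):
    phi = np.tanh(x)
    x += dt * (-x + s * phi + g * (J @ phi))   # Euler step of the assumed Equation 1
```

After the transient, units with the larger self-coupling hover near $\pm 4$ and decorrelate slowly, while units with $s_i = 1$ fluctuate around zero on faster timescales.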
For our model, Equation 1, the moment generating functional is, therefore, given by:
where $\mathcal{D}x=\prod _{i}\mathcal{D}{x}_{i}$ and $\mathcal{D}\tilde{x}=\prod _{i}\mathcal{D}{\tilde{x}}_{i}/2\pi i$. To start, we calculate $\langle Z(J)\rangle_{J}$. We take advantage of the self-averaging nature of our model by averaging over the quenched disorder, $J$. The couplings $J_{ij}$ are i.i.d. variables drawn from a normal distribution and appear only in the last term of (Equation 5). We hence focus this calculation step on that term and derive the result to leading order in $N$, yielding:
The result above suggests that all the units in our network are coupled to one another equivalently (by being coupled only to sums that depend on all units’ activity). To further decouple the network, we define the quantity
We enforce this definition by multiplying the disorder-averaged moment generating functional by the appropriate Dirac delta function, $\delta$, in its integral form:
where $d{Q}_{2}$ is an integral over the imaginary axis (including its $1/(2\pi i)$ factor). We can now rewrite the disorder-averaged moment generating functional, using (Equation 6) to replace its last term, using the definition of $Q_1$, and multiplying the functional by the $\delta$ function above. Altogether, we get:
with ${n}_{\alpha}={N}_{\alpha}/N$ the fraction of units with self-coupling $s_\alpha$ across the population, for $\alpha \in A$. In the expression above, we made use of the fact that $Q_1$ and $Q_2$, now in the role of auxiliary fields, couple to sums of the fields $x_i^2$ and $\varphi_i^2$; hence, the generating functional for $x_i$ and $\tilde{x}_i$ can be factorized into identical factors $Z_\alpha$. Note that in our network, due to the dependency on $s_i$, the $x_i$ are statistically equivalent as long as their $s_i$ are equal. The factorization is therefore into one factor $Z_\alpha$ for all $x_i$ with $s_i = s_\alpha$. Each factor $Z_\alpha$ includes the functional integrals $\mathcal{D}x_\alpha$ and $\mathcal{D}\tilde{x}_\alpha$ for a single unit with self-coupling $s_\alpha$.
In the large $N$ limit, we evaluate the auxiliary fields in (Equation 7) by the saddle-point approach. We denote variables evaluated at the saddle point by ($\ast$), obtaining:
and yielding the saddle point values $({Q}_{1}^{\ast},{Q}_{2}^{\ast})$:
where $C(\tau )$, with $\tau = t^{\prime} - t$, represents the average autocorrelation function of the network (as defined in the main text, Equation 3). The second saddle-point value vanishes, $Q_2^{\ast}=0$, due to $\langle \tilde{x}_{\alpha}(t)\tilde{x}_{\alpha}(t^{\prime})\rangle = 0$, as can be immediately extended from Helias and Dahmen, 2020; Sompolinsky and Zippelius, 1982. The action at the saddle point reduces to the sum of actions for individual, non-interacting units with self-coupling $s_\alpha$. All units are coupled to a common external field $Q_1^{\ast}$. Inserting the saddle-point values back into Equation 7, we obtain $Z^{\ast}=\prod _{\alpha}({Z}_{\alpha}^{\ast})^{{N}_{\alpha}}$, where
Thus, in the large $N$ limit, the network dynamics reduce to those of a set of $A$ units ${x}_{\alpha}(t)$, each representing the subpopulation with self-coupling $s_\alpha$ and obeying dynamics governed by
for all $\alpha \in A$, and where $\eta (t)$ is a Gaussian mean-field with autocorrelation
The results above can be immediately extended to the continuous case of a self-coupling distribution $P(s)$, yielding:
with $p(s)$ the density function of the self-coupling distribution in the network and the unit dynamics depending on their respective self-couplings:
An iterative solution
We use an iterative approach to solve the mean-field equations: Equation 11 and Equation 12 for a discrete distribution of self-couplings, or Equation 13 and Equation 14 for a continuous distribution of self-couplings. The approach is adopted from Stern et al., 2014 and adapted to allow for multiple self-couplings. We briefly describe it in what follows. We start by making an initial guess for the mean-field autocorrelation function $C(\tau )$, as defined in Equation 3. In its Fourier space, we multiply it by a random phase and by $g$, and transform it back to generate an instance of the mean-field $\eta (t)$ (see Stern et al., 2014 for more details). We create additional $\eta (t)$ instances by repeating the procedure described above. At least one instance is created for each value $s_\alpha$ of a discrete distribution $P(s)$ of self-couplings with support set $S$, or for each value $s_\alpha$ drawn from $P(s)$ in the case of a continuous distribution. We then solve Equation 11 (or, equivalently, Equation 14 in the continuous case) to obtain solutions for $x_\alpha$, one for each value of $s_\alpha$. The set of solutions allows us to calculate the set ${c}_{\alpha}(t,{t}^{\mathrm{\prime}})=\langle \varphi ({x}_{\alpha}(t))\varphi ({x}_{\alpha}({t}^{\mathrm{\prime}}))\rangle$. For a discrete distribution, we then multiply each $c_\alpha$ by its relative weight $n_\alpha$ to compute $C(\tau )$, Equation 12. For a continuous distribution, we sum all $c_\alpha$, each multiplied by $1/n$, where $n$ is their number, to estimate $C(\tau )$, Equation 13 (since the $s_\alpha$ values were drawn from $P(s)$, each $c_\alpha$ captures approximately $1/n$ of the distribution). We use these sampled mean-field autocorrelations $C(\tau )$ instead of our initial guess to repeat the entire procedure, which yields another $C(\tau )$. We iterate until the average of $C(\tau )$ across iterations converges.
We note that, for the continuous distribution case, we increase the number of drawn $s_\alpha$ values as the iterations progress (starting from very few and ending with many). This allows us to maintain a rapid iterative process while still obtaining an accurate solution, thanks to the refinement of the process with each iteration.
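The iterative scheme above can be sketched as follows, assuming single-unit dynamics of the form $\dot{x}_\alpha = -x_\alpha + s_\alpha\tanh(x_\alpha) + \eta(t)$ (our reading of Equation 11) and drawing mean-field instances by assigning random Fourier phases to the spectrum of $C(\tau)$; the grid sizes, self-coupling values, and damping of the update are our own choices.

```python
import numpy as np

rng = np.random.default_rng(2)
T, dt, g = 4096, 0.05, 2.0
s_vals = np.array([1.0, 3.0])        # discrete self-couplings
weights = np.array([0.5, 0.5])       # relative fractions n_alpha

def sample_field(C):
    """One mean-field realization eta(t) with autocorrelation ~ g^2 C(tau):
    random Fourier phases applied to the spectrum of C (after Stern et al., 2014)."""
    S = np.clip(np.fft.fft(C).real, 0.0, None)       # power spectrum of C
    A = np.sqrt(T * S) * np.exp(2j * np.pi * rng.random(T))
    return g * np.sqrt(2.0) * np.fft.ifft(A).real    # sqrt(2): real part halves the power

def unit_autocorr(s, eta):
    """Integrate dx/dt = -x + s*tanh(x) + eta(t); return the autocorrelation of phi(x)."""
    x, phi = 0.0, np.empty(T)
    for t in range(T):
        phi[t] = np.tanh(x)
        x += dt * (-x + s * phi[t] + eta[t])
    f = np.fft.fft(phi - phi.mean())
    return np.fft.ifft(f * np.conj(f)).real / T      # circular autocorrelation

# initial guess: exponentially decaying C(tau), zero lag at index 0
lags = np.minimum(np.arange(T), T - np.arange(T))    # circular lag distances
C = np.exp(-lags * dt)
for _ in range(20):                                  # damped fixed-point iteration
    C_new = sum(w * unit_autocorr(s, sample_field(C))
                for s, w in zip(s_vals, weights))
    C = 0.5 * C + 0.5 * C_new
```

The damping factor is not part of the published procedure; it simply stabilizes this toy implementation of the fixed-point iteration.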
Fixed points and transition to chaos
Networks with heterogeneous self-couplings exhibit a complex landscape of fixed points $x_\alpha^{\ast}$, obtained as the self-consistent solutions to the static version of Equation 2 and Equation 3, namely
where the mean-field $\eta$ is a Gaussian random variable with zero mean and variance given by
The solution for each unit depends on its respective $s_\alpha$ (Appendix 1—figure 1). If $s_\alpha < 1$, a single interval around zero is available. For $s_\alpha > 1$ and a range of values of $\eta$, $x_\alpha^{\ast}$ can take values in one of three possible intervals. Let us consider a network with two subpopulations comprising fractions $n_1$ and $n_2 = 1 - n_1$ of the units, with self-couplings $s_1$ and $s_2$, respectively. In the $({s}_{1},{s}_{2})$ plane, this model gives rise to a phase diagram with a single chaotic region separating four disconnected stable fixed-point regions (Figure 2a). We first discuss the stable fixed points, which present qualitatively different structures depending on the values of the self-couplings. Within the region where both self-couplings satisfy $s_1, s_2 < 1$, the only possibility for a stable fixed point is the trivial solution with all ${x}_{i}=0$ (Figure 2a), where the network activity quickly decays to zero. When at least one self-coupling is greater than one, there are three stable fixed-point regions (Figure 2a); in these three regions, the network activity starting from random initial conditions unfolds via a long-lived transient period and eventually settles into a stable fixed point. This transient activity with late fixed points is a generalization of the network phase found in Stern et al., 2014. When both self-couplings are greater than one ($s_1, s_2 > 1$), the fixed-point distribution in each subpopulation is bimodal (Figure 2a–ii). When $s_1 > 1$ and $s_2 < 1$, the solutions for the respective subpopulations are localized around bimodal fixed points and around zero, respectively (Figure 2a–i).
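The one-versus-three interval structure can be verified directly from the static condition, here assumed to read $x_\alpha^{\ast} = s_\alpha\tanh(x_\alpha^{\ast}) + \eta$ for a frozen value of the mean-field $\eta$; the grid-based root counter below is an illustrative sketch.

```python
import numpy as np

def fixed_points(s, eta, lo=-10.0, hi=10.0, n=200_001):
    """Real roots of -x + s*tanh(x) + eta = 0, located by sign changes
    on a fine grid and refined by bisection."""
    def F(x):
        return -x + s * np.tanh(x) + eta
    grid = np.linspace(lo, hi, n)
    Fg = F(grid)
    idx = np.where(np.sign(Fg[:-1]) != np.sign(Fg[1:]))[0]
    roots = []
    for i in idx:
        a, b = grid[i], grid[i + 1]
        for _ in range(60):                 # bisection refinement of each bracket
            m = 0.5 * (a + b)
            if np.sign(F(a)) == np.sign(F(m)):
                a = m
            else:
                b = m
        roots.append(0.5 * (a + b))
    return roots

n_small = len(fixed_points(0.5, 0.1))   # s < 1: a single solution near zero
n_large = len(fixed_points(4.0, 0.1))   # s > 1, small |eta|: three solutions
```

For $s = 4$ and $\eta = 0.1$, the three roots sit near $-4$, near $0$, and near $+4$, matching the three intervals described above.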
However, the available solutions in the latter case are further restricted by stability conditions. In the next Methods section, we derive the stability condition by expanding the dynamical Equation 1 around the fixed point and requiring that all eigenvalues of the corresponding stability matrix be negative. Briefly, the fraction $n_\alpha$ of units with $s_\alpha > 1$ at a stable fixed point is restricted to have support on two disjoint intervals $[{x}_{\alpha}^{\ast}({s}_{\alpha})<{x}_{\alpha}^{-}({s}_{\alpha})]\cup [{x}_{\alpha}^{\ast}({s}_{\alpha})>{x}_{\alpha}^{+}({s}_{\alpha})]$. We refer to this regime as multimodal, a direct generalization of the stable fixed-point regime found in Stern et al., 2014 for a single self-coupling $s>1$, characterized by transient dynamics leading to an exponentially large number of stable fixed points. For the fraction $n_\alpha$ of units with $s_\alpha < 1$, the stable fixed point is supported by a single interval around zero.
Stability conditions
To determine the onset of instability, we look for conditions such that at least one eigenvalue develops a positive real part. An eigenvalue of the stability matrix exists at a point $z$ in the complex plane if (Stern et al., 2014; Ahmadian et al., 2015)
The denominator of the expression above is $z$ plus the slope of the curve in Appendix 1—figure 1a–i and a–ii. Hence, a solution whose value $x_\alpha^{\ast}$ gives a negative slope (available when $s_\alpha > 1$) leads to a vanishing value of the denominator at some positive $z$, and therefore to a positive eigenvalue and instability. As a result, stable fixed-point solutions for units with $s_\alpha > 1$ are confined to the two disjoint intervals $[{x}_{\alpha}^{\ast}({s}_{\alpha})<{x}_{\alpha}^{-}({s}_{\alpha})]\cup [{x}_{\alpha}^{\ast}({s}_{\alpha})>{x}_{\alpha}^{+}({s}_{\alpha})]$ introduced above, while units with $s_\alpha < 1$ are confined to a single interval around zero.
A fixed-point solution becomes unstable as soon as an eigenvalue occurs at $z=0$, yielding from Equation 17 the stability condition
where $q_\alpha = [{s}_{\alpha}-{\mathrm{cosh}}^{2}({x}_{\alpha})]^{2}$. For $s_\alpha > 1$, the two possible consistent solutions to (Equation 15) that may result in a stable fixed point (from the two disjoint intervals in Appendix 1—figure 1a–i) contribute differently to $q_\alpha$. A larger $x_\alpha^{\ast}$ decreases $q_\alpha^{-1}$ (Appendix 1—figure 1b–i), thus improving stability. Choices for distributions of $x_\alpha^{\ast}$ along the two intervals become more restricted as $g$ increases or $s_\alpha$ decreases, since both raise the value of the stability condition, Equation 18, forcing more solutions $x_i$ to decrease $q_\alpha^{-1}$. This restricts a larger fraction of the $x_\alpha^{\ast}$ at the fixed points to the solution with the higher absolute value. At the transition to chaos, a single last and most stable solution exists, with all $x_i$ values chosen at their higher absolute value $x_\alpha^{\ast}$ (Appendix 1—figure 1b–i, light green segments). For units with $s_\alpha < 1$, only one solution is available, obtained from the distribution of $\eta$ through the consistency condition (Equation 15) at the fixed point. In this configuration, the most stable solution transitions from stability to instability exactly where (Equation 18) reaches unity. Hence, the transition from stable fixed points to chaos occurs for a choice of $g$ and $P(s)$ such that solving (Equation 15) and (Equation 16) self-consistently leads to saturation of the stability condition (Equation 18) at one.
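As a numerical sanity check of this eigenvalue criterion, the sketch below relaxes a network (assuming the Equation-1 form $\dot{x}_i = -x_i + s_i\varphi(x_i) + g\sum_j J_{ij}\varphi(x_j)$) to a fixed point well inside the stable regime and verifies that the stability matrix $M_{ij} = [-1 + s_i\varphi^{\prime}(x_i)]\,\delta_{ij} + gJ_{ij}\varphi^{\prime}(x_j)$ has only eigenvalues with negative real parts; the parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
N, g, s, dt = 200, 0.5, 3.0, 0.1
J = rng.standard_normal((N, N)) / np.sqrt(N)
np.fill_diagonal(J, 0.0)

# relax to a fixed point (g and s chosen well inside the stable regime)
x = rng.standard_normal(N)
for _ in range(20_000):
    x += dt * (-x + s * np.tanh(x) + g * (J @ np.tanh(x)))
residual = np.max(np.abs(-x + s * np.tanh(x) + g * (J @ np.tanh(x))))

# stability matrix of the dynamics linearized around the fixed point
phi_p = 1.0 / np.cosh(x) ** 2                      # phi'(x) for phi = tanh
M = -np.eye(N) + np.diag(s * phi_p) + g * J * phi_p[None, :]
max_re = np.max(np.linalg.eigvals(M).real)         # negative at a stable fixed point
```

With these parameters, the units settle onto the outer branches near $\pm s$, where $\varphi^{\prime}$ is small, so the spectrum of $M$ sits close to $-1$.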
Unified colored-noise approximation to the Fokker-Planck theory
We consider the special case of a network with two self-couplings, where a large subpopulation (${N}_{1}=N-1$) with ${s}_{1}=1$ comprises all but one slow probe unit $x_2$ with large self-coupling $s_2 \gg s_1$. The probe unit obeys the dynamical equation $d{x}_{2}/dt=f({x}_{2})+\eta (t)$, with $f(x)=-x+{s}_{2}\varphi (x)$. In the large $N$ limit, we can neglect the back-reaction of the probe unit on the mean-field and approximate the latter as an external Gaussian colored noise $\eta (t)$ with autocorrelation ${g}^{2}C(\tau )={g}^{2}\langle \varphi [{x}_{1}(t)]\varphi [{x}_{1}(t+\tau )]\rangle$, independent of $x_2$. The noise $\eta (t)$ can be parameterized by its strength, defined as $D={\int}_{0}^{\mathrm{\infty}}d\tau \phantom{\rule{thinmathspace}{0ex}}C(\tau )$, and its timescale (color) $\tau_1$. For large $s_2$, the dynamics of the probe unit $x_2$ can be captured by a bistable chaotic phase whereby its activity is localized around the critical points ${x}_{2}={x}^{\pm}\simeq \pm {s}_{2}$ (Figure 4a–i) and switches between them at random times. In the regime of colored noise (as we have here, with ${\tau}_{1}\simeq 7.9\gg 1$), the stationary probability distribution $p(x)$ (for $x\equiv {x}_{2}$; Figure 4a–ii and b) satisfies the unified colored-noise approximation (UCNA) to the Fokker-Planck equation (Hänggi and Jung, 1995; Jung and Hänggi, 1987):
where $Z$ is a normalization constant, $h(x)\equiv 1-{\tau}_{1}{f}^{\mathrm{\prime}}(x)$, and the effective potential ${U}_{eff}(x)=-{\int}^{x}f(y)h(y)\,dy$ is therefore:
The distribution $p(x)$ has support in the region $h(x)>0$, comprising the two disjoint intervals $|x|>{x}_{c}$, where $\mathrm{tanh}({x}_{c}{)}^{2}=1-\frac{1+{\tau}_{1}}{{\tau}_{1}{s}_{2}}$ (Figure 4b). Moreover, the probability distribution is concentrated around the two minima ${x}^{\pm}\simeq \pm {s}_{2}$ of ${U}_{eff}$. The new UCNA-based term $\frac{{\tau}_{1}}{2}f(x{)}^{2}$ dominates the effective potential. The main effect of the strong color ${\tau}_{1}\gg 1$ is to sharply decrease the variance of the distribution around the minima ${x}^{\pm}$. This is evident when comparing the colored noise with white noise driving the same bistable probe, $d{x}_{2}/dt=-{x}_{2}+{s}_{2}\varphi ({x}_{2})+\xi (t)$, where $\xi (t)$ is a white noise with strength equivalent to the colored noise (Figure 4a iv–vi). The naive potential for the white-noise case, $U={x}^{2}/2-{s}_{2}\mathrm{log}\mathrm{cosh}(x)$, is obtained from Equation 19 by sending ${\tau}_{1}\to 0$ in the prefactor $h$ and in the potential ${U}_{eff}$. It results in a wider activity distribution compared to our network-generated colored noise, in agreement with the simulations (Figure 4b).
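These expressions can be tabulated numerically. The sketch below evaluates the UCNA stationary distribution $p(x)\propto h(x)\,e^{-U_{eff}(x)/D}$ on a grid, using $\tau_1 = 7.9$ from the text; the values of $s_2$ and $D$ are our own illustrative choices.

```python
import numpy as np

s2, tau1, D = 4.0, 7.9, 0.5           # tau1 from the text; s2 and D illustrative
x = np.linspace(-8.0, 8.0, 16001)
f = -x + s2 * np.tanh(x)              # drift of the probe unit
fp = -1.0 + s2 / np.cosh(x) ** 2      # f'(x)
h = 1.0 - tau1 * fp
U = x**2 / 2 - s2 * np.log(np.cosh(x))           # naive (white-noise) potential
U_eff = U + 0.5 * tau1 * f**2                    # UCNA correction term (tau1/2) f^2
p = np.where(h > 0, h * np.exp(-(U_eff - U_eff.min()) / D), 0.0)
p /= p.sum() * (x[1] - x[0])                     # normalize on the grid

peak = abs(x[np.argmax(p)])                      # concentrates near |x| ~ s2
x_c = np.arctanh(np.sqrt(1 - (1 + tau1) / (tau1 * s2)))  # edge of the support
```

The tabulated $p(x)$ vanishes on the central interval $|x|<x_c$ where $h(x)\le 0$ and is sharply peaked near $\pm s_2$, reproducing the variance reduction discussed above.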
In our colored-noise network, the probe unit's temporal dynamics are captured by the mean first passage time $\langle T\rangle$ for the escape out of the potential well:
where the upper limit $x_c$ in the outer integral is the edge of the support of $p(x)$. In the small-$D$ approximation, we can evaluate the integrals by steepest descent. The inner integrand $p(x)$ is peaked at the minimum ${x}^{-}=-{s}_{2}$ of the effective potential, yielding
The outer integrand can be rewritten as $\psi (x)=\mathrm{exp}\frac{\rho (x)}{D}$, where $\rho (x)={U}_{eff}(x)+D\mathrm{ln}h(x)$ peaks at ${x}_{f}$ with $\mathrm{tanh}({x}_{f}{)}^{2}\simeq 1-1/2{s}_{2}$. The mean first passage time can thus be estimated as
where $\mathrm{\Delta}=\rho ({x}_{f})-{U}_{eff}({x}^{-})$, and its asymptotic scaling for large $s_2$ leads to Equation 4. We validated the UCNA approach to calculating the mean first passage time by estimating the distribution of escape points $x_{esc}$ from one well to the other, which was found to lie predominantly within the support $|x|>{x}_{c}$ of the stationary probability distribution $p(x)$. Only a fraction $(1.8\pm 0.4)\times 10^{-3}$ of the activity in the simulations (mean±SD over 10 probe units run with parameters as in Figure 4b) entered the forbidden region (see Figure 4—figure supplement 1 for details).
A comparison with white noise
To test the impact of the input generated by the network (or, equivalently, as mimicked by the colored noise), we replaced this input (the rightmost term in Equation 1) with white noise. The probe unit $x$ in the white-noise case therefore follows the dynamical equation:
with $\eta (t)$ drawn from a normal distribution, $\varphi \equiv \mathrm{tanh}$, and $s$, $g$, and $D$ constants whose values are set to match the network-driven probe case: the probe unit's self-coupling strength, the original network gain, and the noise strength (the integral under the autocorrelation function of the network input to the probe), respectively. Simulation results of the probe dynamics are shown in Figure 4a–ii, along with its distribution in Figure 4b (light green area; parameter values for $s$, $g$, and $D$ are specified in the caption). To estimate the probability (Equation 19) and the potential (Equation 20) in this case, since $\eta$ here is white noise, we substitute ${\tau}_{1}=0$, as no correlation in the input exists. Similarly, we calculate the probe's approximate mean first passage time when driven by white noise (Equation 4). The result (Figure 4c, light green dashed line) estimates the simulations well (Figure 4c, green line). Note that since $\mathrm{log}\langle T\rangle$ depends on $\tau_1$ linearly, the mean first passage time itself depends on $\tau_1$ exponentially. Hence the importance of the 'color' (correlations) in the network input for generating long timescales, and the failure of these long timescales to materialize when the 'color' is removed (as in this white-noise-driven probe, which replicates the assembly-endowed network model except for its correlated input drive).
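The white-noise mean first passage time can also be checked by direct simulation, integrating $dx = (-x + s\tanh x)\,dt + \sqrt{2D}\,dW$ with the Euler-Maruyama scheme; the values of $s$ and $D$ below are illustrative, not those of the figure caption.

```python
import numpy as np

rng = np.random.default_rng(4)
s, D, dt = 2.0, 0.4, 0.005            # illustrative probe parameters
x_star = 1.915                        # positive root of x = s*tanh(x) for s = 2

steps = 1_000_000
noise = np.sqrt(2 * D * dt) * rng.standard_normal(steps)
x, t, passage_times = -x_star, 0.0, []
for step in range(steps):             # Euler-Maruyama integration
    x += dt * (-x + s * np.tanh(x)) + noise[step]
    t += dt
    if x >= x_star:                   # reached the opposite well
        passage_times.append(t)
        x, t = -x_star, 0.0           # restart to collect the next passage
mean_T = float(np.mean(passage_times))
```

Averaging over many passages gives a direct Monte-Carlo estimate to compare against the steepest-descent formula in the $\tau_1 \to 0$ limit.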
Spiking network model
Network architecture
We simulated a recurrent network of $N$ excitatory (E) and inhibitory (I) spiking neurons (for $N=2000,5000,10000$) with relative fractions $n_E = 80\%$ and $n_I = 20\%$ and connection probabilities ${p}_{EE}=0.2$ and ${p}_{EI}={p}_{IE}={p}_{II}=0.5$ (Figure 5). Nonzero synaptic weights from presynaptic neuron $j$ to postsynaptic neuron $i$ were $J_{ij}$, whose values depended only on the types of the two neurons, $i,j\in \{\alpha ,\beta \}$ for $\alpha ,\beta =E,I$. Neurons were arranged in $p$ cell-type specific assemblies. E assemblies had heterogeneous sizes drawn from a uniform distribution with a mean of ${N}_{E}^{clust}=60+N/100$ E-neurons and 30% standard deviation. The number of assemblies was determined as $p=\text{round}({n}_{E}N(1-{n}_{bgr})/{N}_{E}^{clust})$, where ${n}_{bgr}=0.1$ is the fraction of background neurons in each population, i.e., those not belonging to any assembly. I assemblies were paired with E assemblies, and the size of each I assembly was matched to the corresponding E assembly with a proportionality factor ${n}_{I}/{n}_{E}=1/4$. Neurons belonging to the same assembly had intra-assembly weights potentiated by a factor $J_{\alpha \beta}^{+}$, while those belonging to different assemblies had inter-assembly weights depressed by a factor $J_{\alpha \beta}^{-}$, where: ${J}_{EI}^{+}=p/(1+(p-1)/{g}_{EI})$, ${J}_{IE}^{+}=p/(1+(p-1)/{g}_{IE})$, ${J}_{EI}^{-}={J}_{EI}^{+}/{g}_{EI}$, ${J}_{IE}^{-}={J}_{IE}^{+}/{g}_{IE}$, and ${J}_{\alpha \alpha}^{-}=1-\gamma ({J}_{\alpha \alpha}^{+}-1)$ for $\alpha =E,I$, with $\gamma =f(2-f(p+1){)}^{-1}$. $f=(1-{n}_{bgr})/p$ is the fraction of E neurons in each assembly. Parameter values are in Table 1.
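The assembly-size and weight-assignment rules can be sketched as follows for the E-to-E block; the connection probability, baseline weight, and potentiation/depression factors are illustrative placeholders rather than the Table 1 values.

```python
import numpy as np

rng = np.random.default_rng(5)
N, n_E, n_bgr = 2000, 0.8, 0.1
N_E = int(n_E * N)
mean_size = 60 + N / 100                        # mean E-assembly size
p = round(n_E * N * (1 - n_bgr) / mean_size)    # number of assemblies

# heterogeneous sizes: uniform distribution with 30% standard deviation
half_width = 0.3 * mean_size * np.sqrt(3.0)     # SD of U(a,b) is (b-a)/(2*sqrt(3))
sizes = rng.uniform(mean_size - half_width, mean_size + half_width, p).astype(int)

labels = np.full(N_E, -1)                       # -1 marks background E neurons
start = 0
for k, size in enumerate(sizes):                # assign neurons to assemblies
    labels[start:start + size] = k
    start += size

# E-to-E weights: potentiated within assemblies, depressed between assemblies,
# baseline for connections involving background neurons
p_EE, J_EE, J_plus, J_minus = 0.2, 0.3, 1.9, 0.9   # illustrative values
clustered = labels >= 0
same = (labels[:, None] == labels[None, :]) & clustered[:, None]
between = clustered[:, None] & clustered[None, :]
factor = np.where(same, J_plus, np.where(between, J_minus, 1.0))
W = (rng.random((N_E, N_E)) < p_EE) * J_EE * factor
```

For $N = 2000$, this yields $p = 18$ assemblies with sizes spread around $80$ neurons, matching the 50–100 neuron range quoted in the Discussion.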
Single neuron dynamics
We simulated current-based leaky-integrate-and-fire (LIF) neurons, with membrane potential $V$ and dynamical equation
where $\tau_m$ is the membrane time constant. Input currents included a contribution $I_{rec}$ from the other recurrently connected neurons and a constant external current ${I}_{ext}={N}_{ext}{J}_{\alpha 0}{r}_{ext}$ (units of mV s$^{-1}$), for $\alpha =E,I$, representing afferent inputs from other brain areas, with ${N}_{ext}={n}_{E}N{p}_{EE}$. When the membrane potential $V$ hits the threshold ${V}_{\alpha}^{thr}$ (for $\alpha =E,I$), a spike is emitted and $V$ is held at the reset value ${V}^{reset}$ for a refractory period $\tau_{refr}$. We chose the thresholds so that the homogeneous network (i.e., where all ${J}_{\alpha \beta}^{\pm}=1$) was in a balanced state with average spiking activity at rates $({r}_{E},{r}_{I})=(2,5)$ spks/s. The postsynaptic currents evolved according to
where $\tau_s$ is the synaptic time constant, $J_{ij}$ are the recurrent couplings, and $t_k$ is the time of the $k$-th spike from the $j$-th presynaptic neuron. Parameter values are in Table 1.
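A minimal Euler integration of the two equations above (current-based LIF membrane dynamics plus exponential synapses) is sketched below; all parameter values are illustrative rather than those of Table 1, and a small random matrix stands in for the clustered connectivity.

```python
import numpy as np

rng = np.random.default_rng(6)
N, dt, T_steps = 100, 1e-4, 5000              # 100 neurons, 0.5 s at 0.1 ms steps
tau_m, tau_s, tau_refr = 20e-3, 5e-3, 5e-3    # seconds
V_thr, V_reset, I_ext = 1.0, 0.0, 60.0        # mV, mV, mV/s
J = 0.1 * rng.standard_normal((N, N)) / np.sqrt(N)   # stand-in couplings [mV]

V = rng.random(N) * V_thr
I_rec = np.zeros(N)                           # postsynaptic currents [mV/s]
last_spike = np.full(N, -np.inf)
spike_count = 0
for step in range(T_steps):
    t = step * dt
    V += dt * (-V / tau_m + I_rec + I_ext)    # membrane dynamics
    V[(t - last_spike) < tau_refr] = V_reset  # clamp refractory neurons
    I_rec -= dt * I_rec / tau_s               # exponential synaptic decay
    fired = V >= V_thr
    if fired.any():
        spike_count += int(fired.sum())
        last_spike[fired] = t
        V[fired] = V_reset
        I_rec += J[:, fired].sum(axis=1) / tau_s  # delta-pulse synaptic input
```

With these illustrative values, the external drive alone ($\tau_m I_{ext} = 1.2$ mV $> V^{thr}$) makes neurons fire tonically at roughly 25 spikes/s.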
Self-couplings from mean-field theory
We can estimate the E-assembly self-couplings in this model using mean-field methods (Amit and Brunel, 1997; Wyrick and Mazzucato, 2021). This method allows one to obtain, self-consistently, the fixed-point values of the firing rates ${r}_{l}^{E},{r}_{l}^{I}$ in the $l$-th assembly ($l=1,\dots ,p$) via the equation
where $\mathbf{r}=({r}_{1}^{E},\dots ,{r}_{p}^{E},{r}_{1}^{I},\dots ,{r}_{p}^{I})$ is the vector of firing rates, and the leaky-integrate-and-fire current-to-rate transfer function for each $\alpha =E,I$ population is
where ${H}_{l}=({V}^{reset}-{\mu}_{l}^{\alpha})/{\sigma}_{l}^{\alpha}+ak$ and ${\mathrm{\Theta}}_{l}=({V}_{\alpha}^{thr}-{\mu}_{l}^{\alpha})/{\sigma}_{l}^{\alpha}+ak$, with $a=-\zeta (1/2)/\sqrt{2}$, are terms accounting for the synaptic dynamics (Fourcaud and Brunel, 2002). The infinitesimal means ${\mu}_{l}^{E},{\mu}_{l}^{I}$ and variances $({\sigma}_{l}^{E}{)}^{2},({\sigma}_{l}^{I}{)}^{2}$ of the network populations comprising E and I assemblies (for $l=1,\dots ,p$ assemblies) are themselves functions of the firing rates, thus leading to self-consistent equations for the fixed points (for more details, see Wyrick and Mazzucato, 2021).
where ${r}_{1}^{E}$ is the firing rate of the E assembly in focus and ${r}_{1}^{I}$ is the firing rate of its paired I assembly; ${r}_{l}^{E},{r}_{l}^{I}$, for $l=2,\dots ,p$, are the firing rates of the other E and I assemblies; and ${r}_{bg}^{E},{r}_{bg}^{I}$ are the firing rates of the background (unclustered) populations. ${f}_{i}^{E},{f}_{i}^{I}$ represent the fractions of E and I neurons in each assembly, which are drawn from a uniform distribution (see above). The first line in Equation 26 represents the contribution to the input current coming from neurons within the same E assembly, or, in other words, the self-coupling of the assembly in focus. We can thus recast the first term in the input current as ${s}_{1}{r}_{1}^{E}$, where ${s}_{1}=N{n}_{E}{p}_{EE}{J}_{EE}{J}_{EE}^{+}{f}_{1}^{E}$. The number of neurons in the assembly is given by ${N}_{1}=N{n}_{E}{f}_{1}^{E}$, and the average E-to-E synaptic coupling is ${\overline{J}}_{EE}^{(in)}={p}_{EE}{J}_{EE}{J}_{EE}^{+}$, from which we obtain ${s}_{1}={N}_{1}{\overline{J}}_{EE}^{(in)}$, which is the expression we used in Figure 5.
where $\hat{J}$ represents the effective synaptic couplings, which depend on the underlying spiking network parameters in Equation 26. In the spiking model, the self-couplings have units of [mV]. The first line in Equation 27 exhibits the same functional form as the rate model in Equation 1, if we identify each rate unit as a functional assembly with a corresponding self-coupling. A crucial simplification occurring in the rate model Equation 1 is the absence of cell-type specific connectivity and the corresponding difference in the statistics of the distribution of the effective couplings $\hat{J}$, whose mean is zero in Equation 1 but nonzero in Equation 27. If we interpret the current $x$ in the rate model as a membrane potential with units of [mV] (see Miller and Fumarola, 2012), and the current-to-rate function $\varphi (x)=\mathrm{tanh}(x)$ as a normalized (min-maxed) firing rate fluctuation around baseline (see Ahmadian et al., 2015), then the self-coupling in the rate model exhibits units of [mV], as in the spiking model. However, a direct numerical comparison of the self-couplings between the two models is hampered by the fact that the spiking model is a balanced network, where E and I contributions to the total input current are large and cancel to leading order (Wyrick and Mazzucato, 2021), whereas the rate network does not operate in the balanced regime.
Model selection for timescale fit
The degree of the polynomial that best fits the dependence of the logarithm of the assembly timescales on the assembly self-couplings was estimated using cross-validation (inset in Figure 5b), according to the following steps. (1) The dataset was split into a training set and a test set of the same size. (2) Polynomial functions of increasing degree were fit to the training set. (3) The mean-squared error on the test set was obtained from the corresponding fit. (4) A minimum was achieved for a polynomial of degree 2. All logarithms, abbreviated as ‘log’ in the main text, are in the natural base $e$.
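These steps can be sketched in a few lines (a minimal illustration on synthetic data; the function name, parameter values, and the synthetic quadratic relation are ours, not taken from the paper's released code):

```python
import numpy as np

def select_poly_degree(x, y, max_degree=6, seed=0):
    """Return the degree (1..max_degree) minimizing test MSE on a 50/50 split."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(x.size)
    train, test = idx[: x.size // 2], idx[x.size // 2:]
    mse = []
    for deg in range(1, max_degree + 1):
        coeffs = np.polyfit(x[train], y[train], deg)   # fit on training set only
        mse.append(np.mean((np.polyval(coeffs, x[test]) - y[test]) ** 2))
    return 1 + int(np.argmin(mse)), mse

# Synthetic check: log-timescales depending quadratically on the self-coupling
rng = np.random.default_rng(1)
s = rng.uniform(1.2, 3.0, 200)                   # hypothetical self-couplings
log_tau = s**2 + 0.1 * rng.normal(size=s.size)   # quadratic trend + noise
best_degree, mse = select_poly_degree(s, log_tau)
```

On such data the linear fit leaves a large residual, so the cross-validated minimum lands at degree 2 or close to it, mirroring the procedure described above.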
Appendix 1
Dynamical regions of networks with identical self-couplings
It is instructive to briefly survey the results of Stern et al., 2014, who studied the special case of a single self-coupling value $s$ shared by all assemblies in the network, $P({s}_{i})={\delta}_{s,{s}_{i}}$. In this case, the dynamics of all units in the network follow:
Two variables determine the network dynamics: the network gain $g$ and the self-coupling value $s$. The network gain $g$ sets the strength of the network's impact on its units; without self-coupling ($s=0$), it drives the network into chaotic activity for values $\displaystyle g>1$ (Sompolinsky et al., 1988). The self-coupling $s$ generates bistability. Without network input ($g=0$), the dynamical Equation 28 for each unit becomes
which has two stable solutions for $\displaystyle s>1$, both at $x\ne 0$ (Appendix 1—figure 2a). For $\displaystyle s<1$, a single stable solution exists at $x=0$ (Appendix 1—figure 2b).
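The bistability can be made concrete by locating the fixed points of $\dot{x}=-x+s\,\tanh(x)$ numerically (a self-contained sketch; the bracketing grid and parameter values are our own choices):

```python
import numpy as np

def fixed_points(s, lo=-5.0, hi=5.0, n=2000):
    """Roots of f(x) = -x + s*tanh(x), found by bracketing sign changes."""
    f = lambda x: -x + s * np.tanh(x)
    grid = np.linspace(lo, hi, n)     # even n avoids landing exactly on x = 0
    vals = f(grid)
    roots = []
    for i in np.where(np.sign(vals[:-1]) != np.sign(vals[1:]))[0]:
        a, b = grid[i], grid[i + 1]
        for _ in range(60):           # bisection refinement of each bracket
            m = 0.5 * (a + b)
            if f(a) * f(m) <= 0:
                b = m
            else:
                a = m
        roots.append(0.5 * (a + b))
    return np.array(roots)

roots_bistable = fixed_points(2.0)    # s > 1: x = 0 plus two symmetric roots
roots_single = fixed_points(0.8)      # s < 1: only x = 0
```

For $s=2$ the two nonzero roots sit near $x\approx\pm 1.915$, while $x=0$ becomes unstable; for $s=0.8$ only the trivial fixed point remains.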
When a small network gain $g$ is introduced into the network dynamics, Equation 28, with identical bistable units ($\displaystyle s>1$), each unit's solution jitters around one of its two possible fixed points. After a transient of irregular activity, the network settles into a stable fixed point. This generates a region of transient irregular activity with stable fixed points (Appendix 1—figure 2c). As $g$ increases and $s$ decreases, different possible fixed-point configurations lose their stability (as a result, the typical time spent in the transient activity increases). When the critical line ${s}_{c}\approx 1+0.157\mathrm{ln}(0.443g+1)$ is crossed, no fixed point remains stable and the network activity becomes chaotic (Stern et al., 2014). The ‘last’ stable fixed point at the transition line has a unique configuration, with all unit values located farthest from $x=0$ (Appendix 1—figure 2a, light green lines). A further decrease of $s$ and $g$ leads to a region where any initial activity of the network decays and the trivial solution (${x}_{i}=0$ for all $i$) is stable (Appendix 1—figure 2c).
Appendix 2
Metastable dynamics in the spiking network model
The spiking network model with E/I clusters is based on a weight matrix $W$ with four E/I submatrices, each exhibiting diagonal blocks representing potentiated intra-cluster synaptic weights (see Appendix 2—figure 1a, b). We can approximate the full $N\times N$ weight matrix with a mean-field reduction to a set of $2(p+1)$ mean-field variables ${r}_{i}^{MF}$, representing the $p$ E and I clusters and the two unclustered background E and I populations. To gain an intuitive understanding of the population dynamics encoded by ${J}^{MF}$, we follow the approach of Murphy and Miller, 2009; Schaub et al., 2015 and consider an auxiliary linear system
which was shown to capture the transition from asynchronous irregular activity to metastable attractor dynamics in Schaub et al., 2015. A Schur decomposition ${J}^{MF}=VT{V}^{T}$ gives an upper triangular matrix $T$, whose Schur eigenvectors (columns of $V$) represent independent dynamical modes (i.e. an orthonormal basis; Appendix 2—figure 1b). The Schur eigenvalues (diagonal values of the Schur matrix) correspond to the real parts of the eigenvalues of the original matrix ${J}^{MF}$. The Schur eigenvectors associated with the large positive eigenvalues approximately correspond to E/I cluster pairs; larger eigenvalues correspond to clusters of increasingly larger size (Appendix 2—figure 1b).
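The decomposition can be illustrated on a toy coupling matrix (the numerical values below are hypothetical, chosen only to mimic the block structure of paired E/I clusters whose self-excitation grows with cluster size; they are not the paper's $J^{MF}$):

```python
import numpy as np
from scipy.linalg import schur

p = 3                                # three E/I cluster pairs
J = np.zeros((2 * p, 2 * p))
for k in range(p):
    wE = 1.0 + 0.5 * k               # larger "clusters" -> stronger self-excitation
    J[k, k] = wE                     # E -> E (within pair)
    J[k, p + k] = -0.3               # I -> E
    J[p + k, k] = 0.9                # E -> I
    J[p + k, p + k] = -0.5           # I -> I

T, V = schur(J, output='real')       # J = V T V^T, columns of V orthonormal
schur_eigs = np.sort(np.diag(T))[::-1]
```

The leading diagonal entries of $T$ match the largest real eigenvalues of $J$, and they increase with the cluster self-excitation, in line with the correspondence between eigenvalue and cluster size described above.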
Clustered networks generate metastable attractor dynamics where coupled E/I cluster pairs switch between periods of low and high firing rate activity, yielding a bimodal firing rate distribution (Appendix 2—figure 2a, b; see Litwin-Kumar and Doiron, 2012; Deco and Hugues, 2012; Mazzucato et al., 2015; Mazzucato et al., 2019; Mazzucato et al., 2016; Wyrick and Mazzucato, 2021; Schaub et al., 2015). A more detailed analysis revealed that the network activity explores metastable attractors with varying numbers of simultaneously active clusters (from one to four in the representative network with $N=2000$ neurons in Appendix 2—figure 2c), yielding a complex attractor landscape (Mazzucato et al., 2015; Wyrick and Mazzucato, 2021). The firing rate of neurons in active clusters is inversely proportional to the number of simultaneously active clusters. Therefore, the activity of single neurons is not simply bistable, but rather multistable, with several levels of firing rates attainable depending on which attractor the network is expressing at any given time (one inactive and four active levels in this representative case). This single-neuron multistability is a biologically plausible effect observed in cortical activity (Mazzucato et al., 2015; Recanatesi et al., 2022); however, it is not present in the rate network, where neurons are just bistable (Figure 4). In the spiking model, clusters are uncorrelated (0.01±0.12, mean±S.D. across 20 networks; Appendix 2—figure 2c), similarly to neurons in the rate network.
Appendix 3
RNN with heterogeneous time constants
Our recurrent neural network model in Equation 1 assumes that all units share the same time constant, $\theta =1$ ms, which sets the rate of change of a neuron’s membrane potential. We examined whether a network of units with heterogeneous time constants could give rise to multiple timescales of dynamics. We simulated the model from Equation 1 with no self-coupling term, ${s}_{i}=0$, and with neuron-specific time constants ${\theta}_{i}$:
Following the same strategy as in Figure 2, we consider the scenario where our network contains two equal-size populations of neurons (${N}_{1}={N}_{2}$) with different time constants ${\theta}_{1}\ne {\theta}_{2}$. We quantified each unit’s timescale, ${\tau}_{i}$, as the width of the autocorrelation function at half height. When keeping ${\theta}_{1}$ fixed and increasing ${\theta}_{2}$, we found that both populations increased their timescale (Figure 1a(iv)), and the ratio between the timescales of the two populations, ${\tau}_{2}/{\tau}_{1}$, did not appreciably change over a large range of time constant ratios ${\theta}_{2}/{\theta}_{1}$ (Figure 1b). Hence, we conclude that heterogeneity in single-cell time constants does not lead to a large separation of timescales in networks with strong recurrent dynamics.
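The half-height autocorrelation measure can be sketched as follows (our own minimal illustration: the Ornstein–Uhlenbeck-style surrogate trace and all parameter values are hypothetical stand-ins for the network simulation, used only to check that the estimator scales with the underlying time constant):

```python
import numpy as np

def acf(x, max_lag):
    """Normalized autocorrelation of x up to max_lag."""
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[: x.size - k], x[k:]) / denom for k in range(max_lag)])

def half_width_timescale(x, dt, max_lag=500):
    """Lag (in time units) at which the autocorrelation first falls below 1/2."""
    a = acf(x, max_lag)
    below = np.where(a < 0.5)[0]
    return below[0] * dt if below.size else np.inf

def ou_trace(theta, n=100_000, dt=0.1, seed=0):
    """Euler simulation of dx/dt = -x/theta + white noise; ACF ~ exp(-lag/theta)."""
    rng = np.random.default_rng(seed)
    noise = np.sqrt(dt) * rng.normal(size=n)
    x = np.zeros(n)
    for t in range(n - 1):
        x[t + 1] = x[t] * (1 - dt / theta) + noise[t]
    return x

tau_fast = half_width_timescale(ou_trace(theta=1.0), dt=0.1)
tau_slow = half_width_timescale(ou_trace(theta=5.0), dt=0.1)
# For an exponential ACF the half-width is theta * ln(2), so the two
# estimates should differ by roughly the ratio of the time constants.
```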
Data availability
https://github.com/nistrate/multipleTimescalesRNN (copy archived at Istrate, 2022).

Dryad Digital Repository. Data from: Autocorrelation structure at rest predicts value correlates of single neurons during reward-guided choice. https://doi.org/10.5061/dryad.5b331
References

Properties of networks with partially structured and partially random connectivity. Physical Review E 91:012820. https://doi.org/10.1103/PhysRevE.91.012820

Transition to chaos in random networks with cell-type-specific connectivity. Physical Review Letters 114:088101. https://doi.org/10.1103/PhysRevLett.114.088101

A reservoir of time constants for memory traces in cortical neurons. Nature Neuroscience 14:366–372. https://doi.org/10.1038/nn.2752

Weak ergodicity breaking and aging in disordered systems. Journal de Physique I 2:1705–1713. https://doi.org/10.1051/jp1:1992238

Beyond mean field theory: statistical field theory for neural networks. Journal of Statistical Mechanics 2013:P03003. https://doi.org/10.1088/1742-5468/2013/03/P03003

Neural network mechanisms underlying stimulus driven variability reduction. PLOS Computational Biology 8:e1002395. https://doi.org/10.1371/journal.pcbi.1002395

Fluctuations and information filtering in coupled populations of spiking neurons with adaptation. Physical Review E 90:062704. https://doi.org/10.1103/PhysRevE.90.062704

Metabolic flux responses to pyruvate kinase knockout in Escherichia coli. Journal of Bacteriology 184:152–164. https://doi.org/10.1128/JB.184.1.152-164.2002

Dynamics of the firing probability of noisy integrate-and-fire neurons. Neural Computation 14:2057–2110. https://doi.org/10.1162/089976602320264015

Intrinsic neuronal properties switch the mode of information transmission in networks. PLOS Computational Biology 10:e1003962. https://doi.org/10.1371/journal.pcbi.1003962

Colored noise in dynamical systems. Advances in Chemical Physics 89:239–326. https://doi.org/10.1002/9780470141489

A unified mathematical framework for coding time, space, and sequences in the hippocampal region. The Journal of Neuroscience 34:4692–4707. https://doi.org/10.1523/JNEUROSCI.5808-12.2014

Software: multipleTimescalesRNN, version swh:1:rev:bb4e9879442c1f3e3efa72420d75f453488bd0c4. Software Heritage.

Dynamical systems: a unified colored-noise approximation. Physical Review A 35:4464–4466. https://doi.org/10.1103/physreva.35.4464

Movement decomposition in the primary motor cortex. Cerebral Cortex 29:1619–1633. https://doi.org/10.1093/cercor/bhy060

Mechanisms of information filtering in neural systems. IEEE Transactions on Molecular, Biological and Multi-Scale Communications 2:5–15. https://doi.org/10.1109/TMBMC.2016.2618863

Slow dynamics and high variability in balanced cortical networks with clustered connections. Nature Neuroscience 15:1498–1505. https://doi.org/10.1038/nn.3220

Self-tuned critical anti-Hebbian networks. Physical Review Letters 102:258102. https://doi.org/10.1103/PhysRevLett.102.258102

Statistical dynamics of classical systems. Physical Review A 8:423–437. https://doi.org/10.1103/PhysRevA.8.423

Dynamics of multistable states during ongoing and evoked cortical activity. The Journal of Neuroscience 35:8214–8231. https://doi.org/10.1523/JNEUROSCI.4819-14.2015

Stimuli reduce the dimensionality of cortical activity. Frontiers in Systems Neuroscience 10:11. https://doi.org/10.3389/fnsys.2016.00011

Spatial gradients and multidimensional dynamics in a neural integrator circuit. Nature Neuroscience 14:1150–1159. https://doi.org/10.1038/nn.2888

A hierarchy of intrinsic timescales across primate cortex. Nature Neuroscience 17:1661–1663. https://doi.org/10.1038/nn.3862

How single neuron properties shape chaotic dynamics and signal transmission in random neural networks. PLOS Computational Biology 15:e1007122. https://doi.org/10.1371/journal.pcbi.1007122

Neural heterogeneity promotes robust learning. Nature Communications 12:5791. https://doi.org/10.1038/s41467-021-26022-3

Stimulus-dependent suppression of chaos in recurrent neural networks. Physical Review E 82:011903. https://doi.org/10.1103/PhysRevE.82.011903

Emergence of slow-switching assemblies in structured neuronal networks. PLOS Computational Biology 11:e1004196. https://doi.org/10.1371/journal.pcbi.1004196

Chaos in random neural networks. Physical Review Letters 61:259–262. https://doi.org/10.1103/PhysRevLett.61.259

Dynamics of random neural networks with bistable units. Physical Review E 90:062710. https://doi.org/10.1103/PhysRevE.90.062710

State-dependent regulation of cortical processing speed via gain modulation. The Journal of Neuroscience 41:3988–4005. https://doi.org/10.1523/JNEUROSCI.1895-20.2021

Task representations in neural networks trained to perform many cognitive tasks. Nature Neuroscience 22:297–306. https://doi.org/10.1038/s41593-018-0310-2
Decision letter

Srdjan OstojicReviewing Editor; École Normale Supérieure  PSL, France

Michael J FrankSenior Editor; Brown University, United States

Tilo SchwalgerReviewer; Technische Universität Berlin, Germany
Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.
Decision letter after peer review:
Thank you for submitting your article "A reservoir of timescales emerges in recurrent circuits with heterogeneous neural assemblies" for consideration by eLife. Your article has been reviewed by 2 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Michael Frank as the Senior Editor.
The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.
https://doi.org/10.7554/eLife.86552.sa1Author response
Essential revisions:
– Clarify to what extent the rate model and the spiking model rely on identical mechanisms. Tone down the claims if the correspondence is more indirect than initially suggested.
We overhauled our manuscript to present a critical assessment of both the similarities and the differences between the rate and spiking model, and performed a series of theoretical investigations, supported by new simulations, aimed at comparing the neural mechanisms of timescale heterogeneity between the two models. First, we introduced a mean-field-theory description of the spiking network, yielding a reduced network whose degrees of freedom are the neural assemblies themselves. In this reduced network, we identified the Schur eigenvectors of the synaptic coupling matrix as the approximate network dynamical modes, which represent strongly coupled pairs of excitatory and inhibitory assemblies. We showed that each mode is associated with a large positive eigenvalue in the synaptic matrix, and that the distribution of such gapped eigenvalues lies at the origin of the heterogeneous distribution of timescales. A similar analysis performed in the rate model suggested that the best correspondence between spiking and rate models is at the level of these Schur eigenvectors. This new analysis shows that the heterogeneity of timescales in the two models arises from a similar dynamical mechanism, namely, a heterogeneity in the eigenvalue distribution, which, in the spiking network, can be traced back to the self-couplings of neural assemblies via the mean-field reduced picture.
– Clarify the importance of coloured noise in the mechanism by adding comparisons with the whitenoise case. When does the coloured noise theory in Figure 4c break down?
In the revised manuscript and in our detailed response to the Reviewers below, we elaborate on the comparison between colored noise and white noise in the probe approximation of the rate network for large self-couplings, including the analysis of several alternative models based on the white noise case, as suggested by the Reviewers. Moreover, we clarify the assumptions underlying our Universal Colored Noise Approximation and its limits of validity. Regarding specifically the white-noise-based theory, we argue that it works well for a unit with self-coupling s^{2} > g, where g is the network gain. The reason is that the network dynamics is responsible for generating the mean field "noise", g, which acts to reduce the potential barrier between the bistable states (and generates a concave part of the potential between its two convex spaces), while the self-coupling acts to increase the barrier and the convexity of the two bistable potential states. Since the self-coupling s impacts the dynamical behavior by its squared value (as manifest in the log timescale dependency and the mean-field solutions), while the network impact g remains linear, we estimate a good approximation at s^{2} > g.
– Show how the distribution of time scales in the spiking network depends on total network size and indegree
In the revised manuscript, we overhauled the presentation of the spiking network model by showing how network parameters scale with the network size. We show that, in networks of increasing size, the heterogeneous distribution of timescales is maintained and its agreement with the theory improves. We further highlight how the Schur decomposition into dynamical modes described above scales with network size. Because our spiking network synaptic coupling matrix was based on Erdos-Renyi connectivity, neurons within the same neural assemblies have different in-degrees, which we chose to scale with a small linear dependence on network size.
Reviewer #2 (Recommendations for the authors):
Relation between time scales and singleunit properties
The text in lines 223ff and Figure 4 argue that the large separation in time scales is due to an interplay between the bistability of single units due to strong selfcoupling and the colored noise input from the recurrent network. The authors show that a large time scale does not emerge if the recurrent network input is replaced by white noise input (Figure 4c). How does this result go together with the following arguments: Assume low white noise input such that one can effectively linearize the dynamics. Then the selfcoupling s effectively changes the time constant of the neuron (tau = 1/(1s)). A large time scale could then be achieved by sending s > 1.
So would for example a nonchaotic regime (small g), low external white noise input and finetuned selfcouplings be an alternative explanation for the large time scales?
Appendix 2 seems to also touch upon this issue. However, there the network is in a chaotic regime such that differences in singleneuron time scales seem to get overwritten by the large colored recurrent network input. Adding a bit of white noise input to eq 28 and lowering g to operate in the regular regime would in my opinion create a large diversity in taus for different thetas.
Can the authors please elaborate in much more detail on this issue and on possible alternative mechanisms to create a reservoir of time scales?
We thank the Reviewer for suggesting potential alternative mechanisms, whose viability we address below.
In the section "Separation of timescales in the bistable chaotic regime," our goal is to find an analytical expression for the relation between the timescales and the self-coupled units (the basic structural building blocks of the network) representing neural assemblies. We also aim to explore the reason behind this relation; i.e., we aim to understand how the presence of neural assemblies generates long timescales. As part of this aim, we replace the network input, as mimicked by the colored mean field, with white noise. By comparing the resulting dynamics, we can deduce the network's contribution, as an input generator, to creating long timescales. For the comparison to hold, we must replace the network's mean-field-mimicking input with white noise in otherwise completely equivalent dynamical scenarios. This exercise clarifies that the emergence of long timescales requires a combination of both the single-unit property (large self-couplings) and the self-consistent colored noise arising from the network dynamics (because the white noise fails to generate the long timescales even in the presence of large self-couplings).
We agree with the Reviewer that long timescales might be generated in neural networks once other possibilities are considered, including weak network couplings and fine-tuning options. Indeed, the Reviewer suggests interesting models, all of which use the help of external white noise to drive long timescales.
First of all, we note that one crucial ingredient missing in the alternative models proposed by the Reviewer, and which is one of the main motivations for our investigation, is that they are not self-consistent: they need to postulate the presence of an external noise drive with specific properties in order to sustain activity, and hence they still require an explanation for how this drive is generated. Nevertheless, we explore here each of the ideas suggested by the Reviewer, their advantages, disadvantages, and plausibility (all references to figures in the answer below refer to figures within this response document; in all the simulations used to generate the figures, the unit of t is 1 ms, as in the main text).
Units that are driven by low white noise and have self-coupling values approaching 1. The low noise drives weak activity (in magnitude; see Figure 2a, including for the dynamical equation and parameters). An effective linearized dynamics can be considered (due to the weak activity) with an effective time constant tau = 1/(1 − s). A large timescale is achieved by sending s → 1; see Figure 2b. For the phenomenon to occur (and the linearization to hold), the fluctuation magnitude has to be very weak (about 1% of the fixed-point value, Figure 2a), and the self-coupling must be tuned to within 5% of its value for a single order of magnitude increase in timescale, or 0.5% for two orders of magnitude (Figure 2c inset). Thus, achieving a heterogeneity of timescales over orders of magnitude requires extreme hierarchical fine-tuning of the self-couplings. In contrast, our model achieves a large heterogeneity due to the exponential relationship between transition times and the square of the self-coupling.

A network operating in the non-chaotic regime, g < 1, with low external white noise and fine-tuned self-coupled units. There are a few options here for the choice of the network gain g in relation to the noise level. If g << 1, the network drive to each unit is negligible, and the units are practically decoupled; this is effectively the scenario discussed in the previous example. If g is less than but close to 1, yet in the non-chaotic regime, the noise fails to create long-timescale effects, since it is overridden by the network, whose impact by itself dies quickly. Hence, the interesting scenario is when g and the noise are roughly on the same scale. In this case, the timescales of the activity of self-coupled units with s → 1 are somewhat lengthened, and that propagates via the network to other units (Figure 3a).
Interestingly, the timescales are strongly dependent on the realization, to the point of almost no correlation between each unit’s timescale in different network couplings realization. As a result, this is an intriguing way to generate, from white noise, via a network with selfcoupled units, a correlated noise signal with randomly spread timescales.
A network operating in the non-chaotic regime, g < 1, with low external white noise and different membrane time constants. To test this model, we split the network into two groups, each with a different membrane time constant (similar to Appendix 3, except that the network here operates in the non-chaotic regime). We varied the membrane time constant of the second group. A lengthening of the timescale of the neurons in the second group is observed (Figure 3b). The lengthening (in log scale) was sublinear and converging, meaning a significant increase in the membrane time constant was needed to change the magnitude of the timescales. Moreover, the first group of neurons (with a fixed membrane time constant) followed the changes in timescales, so the span of the timescales in the system remains bounded.
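The fine-tuning argument in the first scenario above can be made explicit (a sketch of our own, not taken from the response figures): near a weakly driven fixed point, dx/dt = −x + s·tanh(x) linearizes to dx/dt ≈ −(1 − s)x, so the effective time constant is tau = 1/(1 − s), and each extra order of magnitude in tau requires tuning s ten times closer to 1.

```python
def tau_eff(s):
    """Effective time constant of the linearized unit dx/dt = -(1 - s) x."""
    return 1.0 / (1.0 - s)

# Hierarchical fine-tuning: tau grows by 10x only when (1 - s) shrinks by 10x.
taus = {s: tau_eff(s) for s in (0.9, 0.99, 0.999)}
```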
Mapping of clustered spiking network to network of bistable rate units
 The abstract is a bit misleading as it suggests that the authors build an analytic theory for time scales in relation to assembly size. Yet, the mapping between clustered spiking networks and bistable rate units is only qualitative, and the theory only applies to the rate network. There is no quantitative theoretical prediction for the spiking network. For the purpose of the current study, this rationale is fully sufficient. The authors should just make this point more clear.
We agree with the Reviewer that the analytical theory holds for rate networks and that the results of the spiking models are based on simulations only. We clarified this point explicitly in the revised abstract.
Finetuning needed?
Can the authors elaborate more on how much finetuning is needed in the spiking network to obtain the large repertoire of time scales? The time scales in Figure 5b are all below 100s while the experimental data also show larger time scales. Is it possible to obtain even larger time scales in the spiking network model? What modification would be needed?
We apologize to the Reviewer for a typo in the previous version of the manuscript in the caption of Figure 3b: the unit for the timescales in the data and in the model is milliseconds, not seconds (see also the corresponding plot in Figure 2c of (?)). We corrected the typo in the revised version. The range of timescales in the spiking network is thus well above the range observed in the data, and it would be interesting to design ad hoc experiments with chronic recordings to estimate the actual maximum range of timescales in empirical data, before circadian rhythms or secular effects come into play. Regarding the amount of fine-tuning required in the model, in the revised manuscript we performed an extensive set of simulations to investigate the scaling properties of the timescale distribution as one increases the size of the network. We found that the timescale distribution maintains a similar range when network sizes are scaled up from N = 2000 to N = 10,000. In the revised manuscript, we overhauled the Methods section (e) and the Table to present the spiking network scaling parameterization. Moreover, we replaced the network in the main Figure 5 with the largest size, N = 10,000, and introduced a new Appendix 2 to showcase the scaling properties of the spiking network and its Schur decomposition. Regarding potential fine-tuning of other parameters, we believe that an extensive exploration of the full high-dimensional parameter space of the model is beyond the scope of the present paper, whose goal was to present an example of a spiking model that realizes the theoretical mechanism proposed in the rate model.
https://doi.org/10.7554/eLife.86552.sa2Article and author information
Author details
Funding
National Institute of Neurological Disorders and Stroke (R01NS118461)
 Luca Mazzucato
National Institute on Drug Abuse (R01DA055439)
 Luca Mazzucato
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We would like to thank J Wallis and S Cavanagh for sharing their previously published data (Cavanagh et al., 2016), and G Mongillo and G La Camera for discussions. LM was supported by National Institute of Neurological Disorders and Stroke grant R01NS118461 and by National Institute on Drug Abuse grant R01DA055439 (CRCNS). MS was supported by The Hebrew University of Jerusalem 'Emergency response to COVID-19' grant and hosted by the Theoretical Sciences Visiting Program at the Okinawa Institute of Science and Technology while completing this study.
Senior Editor
 Michael J Frank, Brown University, United States
Reviewing Editor
 Srdjan Ostojic, École Normale Supérieure  PSL, France
Reviewer
 Tilo Schwalger, Technische Universität Berlin, Germany
Version history
 Preprint posted: October 12, 2021 (view preprint)
 Received: January 31, 2023
 Accepted: December 7, 2023
 Accepted Manuscript published: December 12, 2023 (version 1)
 Version of Record published: January 25, 2024 (version 2)
Copyright
© 2023, Stern, Istrate et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.