Network dynamics underlying OFF responses in the auditory cortex
Abstract
Across sensory systems, complex spatiotemporal patterns of neural activity arise following the onset (ON) and offset (OFF) of stimuli. While ON responses have been widely studied, the mechanisms generating OFF responses in cortical areas have so far not been fully elucidated. We examine here the hypothesis that OFF responses are singlecell signatures of recurrent interactions at the network level. To test this hypothesis, we performed population analyses of twophoton calcium recordings in the auditory cortex of awake mice listening to auditory stimuli, and compared them to linear singlecell and network models. While the singlecell model explained some prominent features of the data, it could not capture the structure across stimuli and trials. In contrast, the network model accounted for the lowdimensional organization of population responses and their global structure across stimuli, where distinct stimuli activated mostly orthogonal dimensions in the neural statespace.
Introduction
Neural responses within the sensory cortices are inherently transient. In the auditory cortex (AC) even the simplest stimulus, for instance a pure tone, evokes neural responses that strongly vary in time following the onset and offset of the stimulus. A number of past studies have reported a prevalence of ON compared to OFFresponsive neurons in different auditory areas (Phillips et al., 2002; Luo et al., 2008; Fu et al., 2010; Pollak and Bodenhamer, 1981). As a result, the transient onset component has been long considered the dominant feature of auditory responses and has been extensively studied across the auditory pathway (Liu et al., 2019b; Kuwada and Batra, 1999; Grothe et al., 1992; Guo and Burkard, 2002; Yu et al., 2004; Heil, 1997a; Heil, 1997b), with respect to its neurophysiological basis and perceptual meaning (Phillips et al., 2002). In parallel, due to less evidence of OFFresponsive neurons in anaesthetized animals, OFF cortical responses have received comparably less attention. Yet, OFF responses have been observed in awake animals throughout the auditory pathway, and in the mouse AC they arise in 30–70% of the responsive neurons (Scholl et al., 2010; Keller et al., 2018; Joachimsthaler et al., 2014; Liu et al., 2019a; Sollini et al., 2018).
While the generation of ON responses has been attributed to neural mechanisms based on shortterm adaptation, most likely inherited from the auditory nerve fibers (Phillips et al., 2002; Smith and Brachman, 1980; Smith and Brachman, 1982), the mechanisms that generate OFF responses are more diverse and seem to be areaspecific (Xu et al., 2014; KoppScheinpflug et al., 2018). In subcortical regions, neurons in the dorsal cochlear nucleus and in the superior paraolivary nucleus of the brainstem nuclei may generate OFF responses by postinhibitory rebound, a synaptic mechanism in which a neuron emits one or more spikes following the cessation of a prolonged hyperpolarizing current (Suga, 1964; Hancock and Voigt, 1999; KoppScheinpflug et al., 2011). In the midbrain inferior colliculus and in the medial geniculate body of the thalamus, OFF responses appear to be mediated by excitatory inputs from upstream areas and potentially boosted by a postinhibitory facilitation mechanism (Kasai et al., 2012; Vater et al., 1992; Yu et al., 2004; He, 2003). Unlike in subcortical areas, OFF responses in AC do not appear to be driven by hyperpolarizing inputs during the presentation of the stimulus, since synaptic inhibition has been found to be only transient with respect to the stimulus duration (Qin et al., 2007; Scholl et al., 2010). The precise cellular or network mechanisms underlying transient OFF responses in cortical areas therefore remain to be fully elucidated.
Previous studies investigating the transient responses in the auditory system mostly adopted a singleneuron perspective (Henry, 1985; Scholl et al., 2010; Qin et al., 2007; He, 2002; Wang et al., 2005; Wang, 2007). However, in recent years, population approaches to neural data have proven valuable for understanding the role of transients dynamics in various cortical areas (Buonomano and Maass, 2009; Remington et al., 2018; Saxena and Cunningham, 2019). Work in the olfactory system has shown that ON and OFF responses encode the stimulus identity in the dynamical patterns of activity across the neural population (Mazor and Laurent, 2005; Stopfer et al., 2003; Broome et al., 2006; Friedrich and Laurent, 2001; Saha et al., 2017). In motor and premotor cortex, transient responses during movement execution form complex population trajectories (Churchland and Shenoy, 2007; Churchland et al., 2010) that have been hypothesized to be generated by a mechanism based on recurrent network dynamics (Shenoy et al., 2011; Hennequin et al., 2014; Sussillo et al., 2015; Stroud et al., 2018). In the AC, previous works have suggested a central role of the neural dynamics across large populations for the coding of different auditory features (Deneux et al., 2016; Saha et al., 2017; Lim et al., 2016), yet how these dynamics are generated remains an open question.
Leveraging the observation that the AC constitutes a network of neurons connected in a recurrent fashion (Linden and Schreiner, 2003; Oswald and Reyes, 2008; Oswald et al., 2009; Barbour and Callaway, 2008; Bizley et al., 2015), in this study we test the hypothesis that transient OFF responses are generated by a recurrent network mechanism broadly analogous to the motor cortex (Churchland et al., 2006; Hennequin et al., 2014). We first analyzed OFF responses evoked by multiple auditory stimuli in large neural populations recorded using calcium imaging in the mouse AC (Deneux et al., 2016). These analyses identified three prominent features of the auditory cortical data: (i) OFF responses correspond to transiently amplified trajectories of population activity; (ii) for each stimulus, the corresponding trajectory explores a lowdimensional subspace; and (iii) responses to different stimuli lie mostly in orthogonal subspaces. We then determined to what extent these features can be accounted for by a linear singlecell or network model. We show that the singlecell model can reproduce the first two features of population dynamics in response to individual stimuli, but cannot capture the structure across stimuli and single trials. In contrast, the network model accounts for all three features. Identifying the mechanisms responsible for these features led to additional predictions that we verified on the data.
Results
ON/OFF responses in AC reflect transiently amplified population dynamics
We analyzed the population responses of 2343 cells from the AC of three awake mice recorded using calcium imaging techniques (data from Deneux et al., 2016). The neurons were recorded while the mice passively listened to randomized presentations of different auditory stimuli. In this study we consider a total of 16 stimuli, consisting of two groups of intensity modulated UP or DOWNramping sounds. In each group, there were stimuli with different frequency content (either 8 kHz pure tones or white noise [WN] sounds), different durations (1 or 2 s) and different intensity slopes (either 50–85 dB or 60–85 dB and reversed, see Table 1 and Materials and methods, Section 'The data set').
We first illustrate the responses of single cells to the presentation of different auditory stimuli, focusing on the periods following the onset and offset of the stimulus. The activity of individual neurons to different stimuli was highly heterogeneous. In response to a single stimulus, we found individual neurons that were strongly active only during the onset of the stimulus (ON responses), or only during the offset (OFF responses), while other neurons in the population responded to both stimulus onset and offset, consistent with previous analyses (Deneux et al., 2016). Importantly, across stimuli some neurons in the population exhibited ON and/or OFF responses only when specific stimuli were presented, showing stimulusselectivity of transient responses, while others strongly responded at the onset and/or at the offset of multiple stimuli (Figure 1A).
Because of the intrinsic heterogeneity of singlecell responses, we examined the structure of the transient ON and OFF responses to different stimuli using a population approach (Buonomano and Maass, 2009; Saxena and Cunningham, 2019). The temporal dynamics of the collective response of all the neurons in the population can be represented as a sequence of states in a highdimensional state space, in which the ith coordinate corresponds to the (baselinesubtracted) firing activity ${r}_{i}(t)$ of the ith neuron in the population. At each time point, the population response is described by a population activity vector $\mathbf{\mathbf{r}}(t)$ which draws a neural trajectory in the state space (Figure 1B left panel).
To quantify the strength of the population transient ON and OFF responses, we computed the distance of the population activity vector from its baseline firing level (average firing rate before stimulus presentation), corresponding to the norm of the population activity vector $\mathbf{\mathbf{r}}(t)$ (Mazor and Laurent, 2005). This revealed that the distance from baseline computed during ON and OFF responses was larger than the distance computed for the state at the end of stimulus presentation (Figure 1B right panel). We refer to this feature of the population transient dynamics as the transient amplification of ON and OFF responses.
To examine what the transient amplification of ON and OFF responses implies in terms of stimulus decoding, we trained a simple decoder to classify pairs of stimuli that differed in their frequency content, but had the same intensity modulation and duration. We found that the classification accuracy was highest during the transient phases corresponding to ON and OFF responses, while it decreased at the end of stimulus presentation (Figure 1C). This result revealed a robust encoding of the stimulus features during ON and OFF responses, as previously found in the locust olfactory system (Mazor and Laurent, 2005; Saha et al., 2017).
OFF responses rely on orthogonal lowdimensional subspaces
To further explore the structure of the neural trajectories associated with the population OFF responses to different stimuli, we analyzed neural activity using dimensionality reduction techniques (Cunningham and Yu, 2014). We focused specifically on responses within the period starting 50 ms before stimulus offset to 300 ms after stimulus offset.
By performing principal component analysis (PCA) independently for the responses to individual stimuli, we found that the dynamics during the transient OFF responses to individual stimuli explored only about five dimensions, as 80% of the variance of the OFF responses to individual stimuli was explained on average by the first five principal components (Figure 2A; see Figure 2—figure supplement 1 for crossvalidated controls; note that, given the temporal resolution, the maximal dimensionality explaining 80% of the variance of the responses to individual stimuli was 9). The projection of the lowdimensional OFF responses to each stimulus onto the first two principal components revealed circular activity patterns, where the population vector smoothly rotated between the two dominant dimensions (Figure 2B).
A central observation revealed by the dimensionality reduction analysis was that the OFF response trajectories relative to stimuli with different frequency content spanned orthogonal lowdimensional subspaces. For instance, the response to the 8 kHz sound was poorly represented on the plane defined by the two principal components of the response to the WN sound (Figure 2B), and vice versa, showing that they evolved in distinct subspaces. To quantify the relationship between the subspaces spanned by the OFF responses to different stimuli, we proceeded as follows. We first computed the first five principal components of the (baselinesubtracted) OFF response ${\mathbf{\mathbf{r}}}^{(s)}(t)$ to each individual stimulus s. Therefore, for each stimulus s these dimensions define a fivedimensional subspace. Then, for each pair of stimuli, we computed the relative orientation of the corresponding pair of subspaces, measured by the subspace overlap (Figure 2D; see Materials and methods, Section 'Correlations between OFF response subspaces').
This approach revealed an interesting structure between the OFF response subspaces for different stimuli (Figure 2D and Figure 2—figure supplement 2). Stimuli with different frequency content evoked in general nonoverlapping OFF responses reflected in low values of the subspace overlap. Two clusters of correlated OFF responses emerged, corresponding to the 8 kHz UPramps and WN UPramps of different durations and intensity. Surprisingly, DOWNramps evoked OFF responses that were less correlated than UPramps, even for sounds with the same frequency content.
The fact that most of the stimuli evoked nonoverlapping OFF responses is reflected in the number of principal components that explain 80% of the variance for all OFF responses considered together, which is around 60 (Figure 2C and Figure 2—figure supplement 1). This number is in fact close to the number of dominant components of the joint response to all 16 stimuli (see Table 1) that we would expect if the responses to individual stimuli evolved on uncorrelated subspaces (given by #PC per stimulus × #stimuli ≈80). Notably this implies that while the OFF responses to individual stimuli span lowdimensional subspaces, the joint response across stimuli shows highdimensional structure.
We finally examined to what extent the structure observed between the OFF response trajectories ${\mathbf{\mathbf{r}}}^{(s)}(t)$ to different stimuli (Figure 2D) could be predicted from the structure of the population activity states reached at the end of stimulus presentation, corresponding to the initial states ${\mathbf{\mathbf{r}}}^{(s)}(0)$. Remarkably, we found that the initial states exhibited structure across stimuli that matched well the structure of OFF response dynamics (Figure 2E left panel and Figure 2—figure supplement 4), even though the component along the initial states was substracted from the corresponding OFF response trajectories (Figure 2E right panel and Figure 2—figure supplement 4). This suggests that initial states contribute to determining the subsequent dynamics of the OFF responses.
Singlecell model for OFF responses
Our analyses of auditory cortical data identified three prominent features of population dynamics: (i) OFF responses correspond to transiently amplified trajectories; (ii) responses to individual stimuli explore lowdimensional subspaces; (iii) responses to different stimuli lie in largely orthogonal subspaces. We next examined to what extent these three features could be accounted for by a singlecell mechanisms for OFF response generation.
The AC is not the first stage where OFF responses arise. Robust OFF responses are found throughout the auditory pathway, and in subcortical areas the generation of OFF responses most likely relies on mechanisms that depend on the interaction between the inhibitory and excitatory synaptic inputs to single cells (e.g. postinhibitory rebound or facilitation, see KoppScheinpflug et al., 2018). To examine the possibility that OFF responses in AC are consistent with a similar singlecell mechanism, we considered a simplified, linear model that could be directly fit to calcium recordings in the AC. Adapting previously used models (Anderson and Linden, 2016; Meyer et al., 2016), we assumed that the cells received no external input after stimulus offset, and that the response of neuron i after stimulus offset is specified by a characteristic linear filter, which describes the cell’s intrinsic response generated by intracellular or synaptic mechanisms (Figure 3A). We moreover assumed that the shape of this temporal response is set by intrinsic properties, and is therefore identical across different stimuli. In the model, each stimulus modulates the response of a single neuron linearly depending on its activity at stimulus offset, so that the (baselinesubtracted) OFF response of neuron i to stimulus s is written as:
where $r}_{0,i}^{(s)$ is the modulation factor for neuron i and stimulus s. We estimated the singlecell responses ${L}_{i}(t)$ from the data by fitting basis functions to the responses of neurons subject to prior normalization by the modulation factors $r}_{0,i}^{(s)$ using linear regression (see Materials and methods, Section 'Singlecell model for OFF response generation'). We first fitted the singlecell model to responses to a single stimulus, and then increased the number of stimuli.
The singlecell model accounted well for the first two features observed in the data: (i) because the singlecell responses ${L}_{i}(t)$ are in general nonmonotonic, the distance from baseline of the population activity vector displayed amplified dynamics (Figure 3B); (ii) at the population level, the trajectories generated by the singlecell model spanned lowdimensional subspaces of at least two dimensions (Figure 3C) and provided excellent fits to the trajectories in the data when the response to a single stimulus was considered (Figure 3C). However, we found that the singlecell model could not account for the structure of responses across multiple stimuli. Indeed, although fitting the model to OFF responses to a single stimulus led to an excellent match with auditory cortical data (coefficient of determination ${R}^{2}=0.75$), increasing the number of simultaneously fitted stimuli led to increasing deviations from the data (Figure 3B,C), and strongly decreased the goodness of fit (Figure 3D). When fitting all stimuli at once, the goodness of fit was extremely poor (coefficient of determination ${R}^{2}=0.1$), and therefore the singlecell model could not provide useful information about the third feature of the data, the structure of subspaces spanned in response to different stimuli. A simple explanation for this inability to capture structure across stimuli is that in the singlecell model the temporal shape of the response of each neuron is the same across all stimuli, while this was not the case in the data (Figure 3E).
Network model for OFF responses
We contrasted the singlecell model with a network model that generated OFF responses through collective interactions between neurons. Specifically, we studied a linear network of $N$ recurrently coupled linear rate units with time evolution given by:
The quantity ${r}_{i}(t)$ represents the deviation of the activity of the unit i from its baseline firing rate, while ${J}_{ij}$ denotes the effective strength of the connection from neuron j to neuron i (Figure 3F). As in the singlecell model, we assumed that the network received no external input after stimulus offset. The only effect of the preceding stimulus was to set the initial pattern of activity of the network at $t=0$. The internal recurrent dynamics then fully determined the subsequent evolution of the activity. Each stimulus s was thus represented by an initial state ${\mathbf{\mathbf{r}}}_{0}^{(s)}$ that was mapped onto a trajectory of activity ${\mathbf{\mathbf{r}}}^{(s)}(t)$. We focused on the dynamics of the network following stimulus offset, which we represent as $t=0$.
To quantify to what extent recurrent interactions could account for auditory cortical dynamics, we fitted the network model to OFF responses using linear regression (see Materials and methods, Section 'Fitting the network model'). For each stimulus, the pattern of initial activity ${\mathbf{\mathbf{r}}}_{0}^{(s)}$ was taken from the data, and the connectivity matrix $\mathbf{\mathbf{J}}$ was fitted to the responses to a progressively increasing number of stimuli.
Qualitatively, we found that the fitted network model reproduced well the first two features of the data, transient amplification and lowdimensional population trajectories (Figure 3G,H). Quantitatively, we evaluated the goodness of fit by computing the coefficient of determination ${R}^{2}$. While the goodness of fit of the network model was lower than the singlecell model when fitting the response to a single stimulus (${R}^{2}=0.52$), for the network model the goodness of fit remained consistently high as the number of simultaneously fitted stimuli was increased (Figure 3D, ${R}^{2}=0.52$ when fitted on the responses to all stimuli). Computing the goodness of fit by taking into account the number of parameters of the network and singlecell models led to the same conclusion (Figure 3—figure supplement 1). When fitted to all stimuli at once, the fitted network model captured the structure of the subspace overlaps (Figure 3I left panel) and its correlation with the structure of initial conditions (Figure 3I right panel). In contrast to the singlecell model, the network model therefore accounted well for the third feature of the data, the structure of responses across stimuli. This can be explained by the fact that, in the network model, the temporal OFFresponses of single cells can in general differ across stimuli (see Figure 6A,D), similar to the activity in the AC (Figure 3E).
Testing the network mechanisms of OFF responses on the data
Having found that the fitted network model reproduced the three main features of the data, we next analyzed this model to identify the mechanisms responsible for each feature. The identified mechanisms provided new predictions that we tested on the effective connectivity matrix $\mathbf{\mathbf{J}}$ obtained from the fit to the data.
Transiently amplified OFF responses
The first feature of the data was that the dynamics were transiently amplified, i.e. the OFF responses first deviated from baseline before eventually converging to it. A preliminary requirement to reproduce this feature is that dynamics are stable, that is, eventually decay to baseline following any initial state. This requirement leads to the usual condition that the eigenvalues of the connectivity matrix $\mathbf{\mathbf{J}}$ have real parts less than unity. Provided this basic requirement is met, during the transient dynamics from the initial state at stimulus offset, the distance from baseline can either monotonically decrease, or transiently increase before eventually decreasing. To generate the transient increase, the connectivity matrix needs to belong to the class of nonnormal matrices (Trefethen and Embree, 2005; Murphy and Miller, 2009; Goldman, 2009; Hennequin et al., 2012; see Materials and methods, Section 'Normal and nonnormal connectivity matrices'), but this is not a sufficient condition. More specifically, the largest eigenvalue of the symmetric part defined by ${\mathbf{\mathbf{J}}}_{S}=(\mathbf{\mathbf{J}}+{\mathbf{\mathbf{J}}}^{T})/2$ needs to be larger than unity, while the initial state ${\mathbf{\mathbf{r}}}_{0}$ needs to belong to a subset of amplified patterns (Bondanelli and Ostojic, 2020; see Materials and methods, Section 'Sufficient condition for amplified OFF responses').
To test the predictions derived from the condition for amplified transient responses, that is, that $\mathbf{\mathbf{J}}$ belongs to a specific subset of nonnormal matrices, we examined the spectra of the full connectivity $\mathbf{\mathbf{J}}$ and of its symmetric part ${\mathbf{\mathbf{J}}}_{S}$ (Bondanelli and Ostojic, 2020). The eigenvalues of the fitted connectivity matrix had real parts smaller than one, indicating stable dynamics, and large imaginary parts (Figure 4A). Our theoretical criterion predicted that for OFF responses to be amplified, the spectrum of the symmetric part ${\mathbf{\mathbf{J}}}_{S}$ must have at least one eigenvalue larger than unity. Consistent with this predictions, we found that the symmetric part of the connectivity had indeed a large number of eigenvalues larger than one (Figure 4B) and could therefore produce amplified responses (Figure 3G).
Lowdimensionality of OFF response trajectories
The second feature of auditory cortical data was that each stimulus generated a population response embedded in an approximately fivedimensional subspace of the full state space. We hypothesized that lowdimensional responses in the fitted network model originated from a lowrank structure in the connectivity (Mastrogiuseppe and Ostojic, 2018), implying that the connectivity matrix could be approximated in terms of $R\ll N$ modes, that is, as
where each mode was specified by two $N$dimensional vectors ${\mathbf{\mathbf{u}}}^{(r)}$ and ${\mathbf{\mathbf{v}}}^{(r)}$, which we term the right and left connectivity patterns respectively (Mastrogiuseppe and Ostojic, 2018). This set of connectivity patterns is uniquely defined from the singular value decomposition (SVD) of the connectivity matrix $\mathbf{\mathbf{J}}$ (see Materials and methods, Section 'Lowdimensional dynamics').
To test the lowrank hypothesis, we refitted the network model while constraining the rank of connectivity matrix (Figure 5A, Figure 5—figure supplement 1, Figure 5—figure supplement 2; see Materials and methods, Section 'Fitting the network model'). Progressively increasing the rank $R$, we found that $R=6$ was sufficient to capture more than 80% of the variance explained by the network model when fitting the responses to individual stimuli (Figure 5A).
The dynamics in the obtained lowrank network model are therefore fully specified by a set of patterns over the neural population: patterns corresponding to initial states ${\mathbf{\mathbf{r}}}_{0}$ determined by the stimuli, and patterns corresponding to connectivity vectors ${\mathbf{\mathbf{u}}}^{(r)}$ and ${\mathbf{\mathbf{v}}}^{(r)}$ for $r=1,\mathrm{\dots},R$ that determine the network dynamics. Recurrent neural networks based on such connectivity directly generate lowdimensional dynamics: for a given stimulus s the trajectory of the dynamics lies in the subspace spanned by the initial state ${\mathbf{\mathbf{r}}}_{0}^{(s)}$ and the set of right connectivity patterns ${\{{\mathbf{\mathbf{u}}}^{(r)}\}}_{r=1,\mathrm{\dots},R}$ (Mastrogiuseppe and Ostojic, 2018; Bondanelli and Ostojic, 2020; Schuessler et al., 2020; Beiran et al., 2020). More specifically, the transient dynamics can be written as (see Materials and methods, Section 'Lowdimensional dynamics')
where $\mathbf{\mathbf{U}}$ and $\mathbf{\mathbf{V}}$ are $N\times R$ matrices that contain respectively the $R$ right and left connectivity vectors as columns, and ${\mathbf{\mathbf{J}}}^{ov}$ is the $R\times R$ matrix of overlaps ${\mathbf{\mathbf{J}}}_{kl}^{ov}={\mathbf{\mathbf{v}}}^{(k)T}{\mathbf{\mathbf{u}}}^{(l)}$ between left and right connectivity vectors. This overlap matrix therefore fully determines the dynamics in the network (see Materials and methods, Section 'Lowdimensional dynamics').
Inspecting the overlap matrix ${\mathbf{\mathbf{J}}}^{ov}$ obtained from the fitted connectivity matrix revealed a clear structure, where left and right vectors were strongly correlated within pairs, but uncorrelated between different pairs (Figure 5B left panel). This structure effectively defined a series of $R/2$ rank2 channels that were orthogonal to each other. Within individual rank2 channels, strong correlations were observed only across the two modes (e.g. between ${\mathbf{\mathbf{v}}}^{(1)}$ and ${\mathbf{\mathbf{u}}}^{(2)}$, ${\mathbf{\mathbf{v}}}^{(2)}$ and ${\mathbf{\mathbf{u}}}^{(1)}$, etc.; Figure 5B), so that the connectivity matrix corresponding to each rank2 channel can be written as
where ${\mathrm{\Delta}}_{1}$ and ${\mathrm{\Delta}}_{2}$ are two scalar values. Rank2 matrices of this type have purely imaginary eigenvalues given by $\pm i\sqrt{{\mathrm{\Delta}}_{1}{\mathrm{\Delta}}_{2}}$, reflecting the strong imaginary components in the eigenspectrum of the full matrix (Figure 4A). These imaginary eigenvalues lead to strongly rotational dynamics in the plane defined by ${\mathbf{\mathbf{v}}}^{(1)}$ and ${\mathbf{\mathbf{v}}}^{(2)}$ (Figure 6C; see Materials and methods, Section 'Dynamics of a lowrank rotational channel'), qualitatively similar to the lowdimensional dynamics in the data (Figure 2B). Such rotational dynamics however do not necessarily imply transient amplification. In fact, a rank2 connectivity matrix as in Equation (5) generates amplified dynamics only if two conditions are satisfied (Figure 6B; see Materials and methods, Section 'Dynamics of a lowrank rotational channel'): (i) the difference ${\mathrm{\Delta}}_{2}{\mathrm{\Delta}}_{1}/2$ is greater than unity; (ii) the initial state ${\mathbf{\mathbf{r}}}_{0}$ overlaps strongly with the left connectivity patterns ${\mathbf{\mathbf{v}}}^{(r)}$. A direct consequence of these constraints is that when dynamics are transiently amplified, the population activity vector at the peak of the transient dynamics lies along a direction that is largely orthogonal to the initial state at stimulus offset (Figure 6C,E, see Materials and methods, Section 'Correlation between initial and peak state').
The theoretical analyses of the model therefore provide a new set of predictions that we directly tested on the connectivity fitted to the data. We first computed the differences ${\mathrm{\Delta}}_{2}{\mathrm{\Delta}}_{1}/2$ for each channel using the SVD of the fitted matrix ${\mathbf{\mathbf{J}}}^{(s)}$, and found that they were sufficiently large to amplify the dynamics within each rank$R$ channel (Figure 5C left panel). We next examined for each stimulus the component of the initial state ${\mathbf{\mathbf{r}}}_{0}$ on the ${\mathbf{\mathbf{v}}}^{(r)}$’s and found that it was significantly larger than the component on the ${\mathbf{\mathbf{v}}}^{(r)}$’s of a random vector, across all stimuli (Figure 5C right panel). Finally, computing the correlation between the peak state and the initial state at the end of stimulus presentation, we found that this correlation took values lower than values predicted by chance for almost all stimuli, consistent with the prediction of the network model (Figure 5D, Figure 5—figure supplement 3). The singlecell model instead predicts stronger correlations between initial and peak states, in clear conflict with the data (Figure 5E).
Orthogonal OFF response trajectories across stimuli
The third feature observed in the data was that population responses evoked by different stimuli were orthogonal for many of the considered stimuli. Orthogonal trajectories for different stimuli can be reproduced in the model under the conditions that (i) initial patterns of activity corresponding to different stimuli are orthogonal, (ii) the connectivity matrix is partitioned into a set of orthogonal lowrank terms, each individually leading to transiently amplified dynamics, and (iii) each stimulus activates one of the lowrank terms. Altogether, for $P$ stimuli leading to orthogonal responses, the connectivity matrix can be partitioned in $P$ different groups of modes:
with
where the vectors ${\mathbf{\mathbf{v}}}^{(r;s)}$ have unit norm. The set of modes indexed by s correspond to the sth stimulus, so that the pattern of initial activity ${\mathbf{\mathbf{r}}}_{0}^{(s)}$ evoked by stimulus s overlaps only with those modes. Moreover, modes corresponding to different groups are orthogonal, and generate dynamics that span orthogonal subspaces (Figure 2D). Each term in Equation (6) can therefore be interpreted as an independent transient coding channel that can be determined from the OFFresponse to stimulus s alone.
We therefore examined whether the fitted connectivity $\mathbf{\mathbf{J}}$ consisted of independent lowrank coding channels (Equation (6)), a constraint that would generate orthogonal responses to different stimuli as observed in the data. If $\mathbf{\mathbf{J}}$ consisted of fully independent channels as postulated in Equation (6), then it could be equivalently estimated by fitting the recurrent model to the OFF responses to each stimulus s independently, leading to one matrix ${\mathbf{\mathbf{J}}}^{(s)}$ for each stimulus. The hypothesis of fully independent channels then implies that (i) the connectivity vectors for the different connectivity matrices are mutually orthogonal, (ii) the full connectivity matrix can be reconstructed by summing the matrices ${\mathbf{\mathbf{J}}}^{(s)}$ estimated for individual stimuli. To test these two predictions, we estimated the connectivity matrices ${\mathbf{\mathbf{J}}}^{(s)}$ (the rank parameter for each stimulus was set to $R=5$). We then computed the overlaps between the connectivity vectors corresponding to different pairs of stimuli (see Materials and methods, Section 'Analysis of the transient channels') and compared them to the overlaps between the subspaces spanned in response to the same pairs of stimuli (see Figure 2D). We found a close match between the two quantities (Figure 7A,B): pairs of stimuli with strong subspace overlaps corresponded to high connectivity overlaps, and pairs of stimuli with low subspace overlaps corresponded to low connectivity overlaps. This indicated that some of the stimuli, but not all, corresponded to orthogonal coding channels. We further reconstructed the OFF responses using the matrix ${\mathbf{\mathbf{J}}}_{\mathrm{Sum}}={\sum}_{s}{\mathbf{\mathbf{J}}}^{(s)}$. We then compared the goodness of fit computed using the matrices ${\mathbf{\mathbf{J}}}_{\mathrm{Sum}}$, ${\mathbf{\mathbf{J}}}_{\mathrm{Full}}$ and multiple controls where the elements of the matrix ${\mathbf{\mathbf{J}}}_{\mathrm{Sum}}$ were randomly shuffled. While the fraction of variance explained when using the matrix ${\mathbf{\mathbf{J}}}_{\mathrm{Sum}}$ was necessarily lower than the one computed using ${\mathbf{\mathbf{J}}}_{\mathrm{Full}}$, the model with connectivity ${\mathbf{\mathbf{J}}}_{\mathrm{Sum}}$ yielded values of the goodness of fit significantly higher than the ones obtained from the shuffles (Figure 7C). These results together indicated that the full matrix ${\mathbf{\mathbf{J}}}_{\mathrm{Full}}$ can indeed be approximated by the sum of the lowrank channels represented by the ${\mathbf{\mathbf{J}}}^{(s)}$.
Amplification of singletrial fluctuations
So far we examined the activity obtained by averaging for each neuron the response to the same stimulus over multiple trials. We next turned to trialtotrial variability of simultaneously recorded neurons and compared the structure of the variability in the data with the network and singlecell models. Specifically, we examined the variance of the activity along a direction ${\mathbf{\mathbf{z}}}_{0}$ of statespace (Figure 8A; Hennequin et al., 2012), defined as:
where $\u27e8\cdot \u27e9$ denotes the averaging across trials. We compared the variance along the direction ${\mathbf{\mathbf{r}}}_{\mathrm{ampl}}$ corresponding to the maximum amplification (distance from baseline) of the trialaveraged dynamics with variance along a random direction ${\mathbf{\mathbf{r}}}_{\mathrm{rand}}$.
Inspecting the variability during the OFFresponses in the AC data revealed two prominent features: (i) fluctuations across trials are amplified along the same direction ${\mathbf{\mathbf{r}}}_{\mathrm{ampl}}$ of statespace as trialaveraged dynamics, but not along random directions (Figure 8B,C,D); (ii) cancelling crosscorrelations by shuffling trial labels independently for each cell strongly reduces the amplification of the variance (Figure 8C,D). This second feature demonstrates that the amplification of variance is due to noisecorrelations across neurons that have been previously reported in the AC (Rothschild et al., 2010). Indeed the variance along a direction ${\mathbf{\mathbf{z}}}_{0}$ at time $t$ can be expressed as
where $\mathbf{\mathbf{C}}(t)$ represents the covariance matrix of the population activity at time $t$. Shuffling trial labels independently for each cell effectively cancels the offdiagonal elements of the covariance matrix and leaves only the diagonal elements that correspond to singleneuron variances. The comparison between simultaneous and shuffled data (Figure 8C,D) demonstrates that an increase in singleneuron variance is not sufficient to explain the total increase in variance along the amplified direction ${\mathbf{\mathbf{r}}}_{\mathrm{ampl}}$.
Simulations of the fitted models and a mathematical analysis show that the network model reproduces both features of trialtotrial variability observed in the data (Figure 8E,F), based on a minimal assumption that variability originates from independent noise in the initial condition at stimulus offset (see Materials and methods, Section 'Structure of singletrial responses in the network model'). Surprisingly, the singlecell model also reproduces the first feature, the amplification of fluctuations along the same direction of statespate as trialaveraged dynamics, although in that model the different neurons are not coupled and noisecorrelations are absent (Figure 8G,H). Instead, the singlecell model fails to capture the second feature, as it produces amplified variability in both simultaneous and shuffled activity. This demonstrates that the amplification of variability in the singlecell model is due to an amplification of singlecell variances, that is, the diagonal elements of covariance matrix (see Materials and methods, Section 'Structure of singletrial responses in the singlecell model'). Somewhat counterintuitively, the disagreement between the singlecell model and the data is not directly due to the lack of noisecorrelations in that model, but due to the fact that the singlecell model does not predict accurately the variance of singleneuron activity during the OFFresponses, and therefore fails to capture variance in shuffled activity.
In summary, the network model accounts better than the singlecell model for the structure of singletrial fluctuations present in the AC data.
Discussion
Adopting a population approach, we showed that strong OFF responses observed in auditory cortical neurons form transiently amplified trajectories that encode individual stimuli within lowdimensional subspaces. A geometrical analysis revealed a clear structure in the relative orientation of these subspaces, where subspaces corresponding to different auditory stimuli were largely orthogonal to each other. We found that a simple, linear singleneuron model captures well the average response to individual stimuli, but not the structure across stimuli and across individual trials. In contrast, we showed that a simple, linear recurrent network model accounts for all the properties of the population OFF responses, notably their global structure across multiple stimuli and the fluctuations across single trials. Our results therefore argue for a potential role of network interactions in shaping OFF responses in AC. Ultimately, future work could combine both singlecell and network mechanisms in a network model with more complex intrinsic properties of individual neurons (Beiran and Ostojic, 2019; Muscinelli et al., 2019).
In this study, we focused specifically on the responses following stimulus offset. Strong transients during stimulus onset display similar transient coding properties (Mazor and Laurent, 2005) and could be generated by the same network mechanism as we propose for the OFF responses. However, during ONtransients, a number of additional mechanisms are likely to play a role, in particular singlecell adaptation, synaptic depression, or feedforward inhibition. Indeed, recent work has shown that ON and OFF trajectories elicited by a stimulus are orthogonal to each other in the statespace (Saha et al., 2017), and this was also the case in our data set (Figure 2—figure supplement 3). Linear network models instead produce ON and OFF responses that are necessarily correlated and cannot account for the observed orthogonality of ON and OFF responses for a given stimulus. Distinct ON and OFF response dynamics could also result from intrinsic nonlinearities known to play a role in the AC (Calhoun and Schreiner, 1998; Rotman et al., 2001; Sahani and Linden, 2003; Machens et al., 2004; Williamson et al., 2016; Deneux et al., 2016).
A major assumption of both the singlecell and network models we considered is that the AC does not receive external inputs after the auditory stimulus is removed. Indeed, the observation that the structure of the dynamics across stimuli could be predicted by the structure of their initial states (Figure 2D,E and Figure 2—figure supplement 4) indicates that autonomous dynamics at least partly shape the OFF responses. The comparisons between data and models moreover show that autonomous dynamics are in principle sufficient to reproduce a number of features of OFF responses in the absence of any external drive. However, neurons in the AC do receive direct afferents from the medial geniculate body of the thalamus, and indirect input from upstream regions of the auditory pathway, where strong OFF responses have been observed. Thus, in principle, OFF responses observed in AC could be at least partly inherited from upstream auditory regions (KoppScheinpflug et al., 2018). Disentangling the contributions of upstream inputs and recurrent dynamics is challenging if one has access only to auditory cortical activity (but see Seely et al., 2016 for an interesting computational approach). Ultimately, the origin of OFF responses in AC needs to be addressed by comparing responses between related areas, an approach recently adopted in the context of motor cortical dynamics (Lara et al., 2018). A direct prediction of our model is that the inactivation of recurrent excitation in auditory cortical areas should weaken OFF responses (Li et al., 2013). However, recurrency in the auditory system is not present only within the cortex but also between different areas along the pathway (Ito and Malmierca, 2018; Winer et al., 1998; Lee et al., 2011). Therefore OFF responses could be generated at a higher level of recurrency and might not be abolished by inactivation of AC.
The dimensionality of the activity in large populations of neurons in the mammalian cortex is currently the subject of debate. A number of studies have found that neural activity explores lowdimensional subspaces during a variety of simple behavioral tasks (Gao et al., 2017). In contrast, a recent study in the visual cortex has shown that the response of large populations of neurons to a large number of visual stimuli is instead highdimensional (Stringer et al., 2019). Our results provide a potential way of reconciling these two sets of observations. We find that the population OFF responses evoked by individual auditory stimuli are typically lowdimensional, but lie in orthogonal spaces, so that the dimensionality of the responses increases when considering an increasing number of stimuli. Note that in contrast to Stringer et al., 2019, we focused here on the temporal dynamics of the population response. The dimensionality of these dynamics is in particular limited by the temporal resolution of the recordings (Figure 2—figure supplement 1).
The analyses we performed in this study were directly inspired by an analogy between OFF responses in the sensory areas and neural activity in the motor cortices (Churchland and Shenoy, 2007; Churchland et al., 2010). Starting at movement onset, singleneuron activity recorded in motor areas exhibits strongly transient and multiphasic firing lasting a few hundreds of milliseconds. Populationlevel dynamics alternate between at least two dimensions, shaping neural trajectories that appear to rotate in the statespace. These results have been interpreted as signatures of an underlying dynamical system implemented by recurrent network dynamics (Churchland et al., 2012; Shenoy et al., 2011), where the population state at movement onset provides the initial condition able to generate the transient dynamics used for movement generation. Computational models have explored this hypothesis (Sussillo et al., 2015; Hennequin et al., 2014; Stroud et al., 2018) and showed that the complex transient dynamics observed in motor cortex can be generated in network models with strong recurrent excitation balanced by finetuned inhibition (Hennequin et al., 2014). Surprisingly, fitting a linear recurrent network model to auditory cortical data, we found that the arrangement of the eigenvalues of the connectivity matrix was qualitatively similar to the spectrum of this class of networks (Hennequin et al., 2014), suggesting that a common mechanism might account for the responses observed in both areas.
The perceptual significance of OFF responses in the auditory pathway is still matter of ongoing research. Singlecell OFF responses observed in the auditory and visual pathways have been postulated to form the basis of duration selectivity (Brand et al., 2000; Alluri et al., 2016; He, 2002; Aubie et al., 2009; Duysens et al., 1996). In the auditory brainstem and cortex, OFF responses of single neurons exhibit tuning in the frequencyintensity domain, and their receptive field has been reported to be complementary to the receptive field of ON responses (Henry, 1985; Scholl et al., 2010). The complementary spatial organization of ON and OFF receptive fields may result from two distinct sets of synaptic inputs to cortical neurons (Scholl et al., 2010), and has been postulated to form the basis for higherorder stimulus features selectivity in single cells, such as frequencymodulated (FM) sounds (Sollini et al., 2018) and amplitudemodulated (AM) sounds (Deneux et al., 2016), both important features of natural sounds (Sollini et al., 2018; Nelken et al., 1999). Complementary tuning has also been observed between cortical ON and OFF responses to binaural localization cues, suggesting OFF responses may contribute to the encoding of soundsource location or motion (Hartley et al., 2011). At the population level, the proposed mechanism for OFF response generation may provide the basis for encoding complex sequences of sounds. Seminal work in the olfactory system has shown that sequences of odors evoked specific transient trajectories that depend on the history of the stimulation (Broome et al., 2006; Buonomano and Maass, 2009). Similarly, within our framework, different combinations of sounds could bring the activity at the end of stimulus offset to different regions of the statespace, setting the initial condition for the subsequent OFF responses. If the initial conditions corresponding to different sequences are associated with distinct transient coding channels, different sound sequences would evoke transient trajectories along distinct dimensions during the OFF responses, therefore supporting the robust encoding of complex sound sequences.
Materials and methods
Data analysis
The data set
Request a detailed protocolNeural recordings
Request a detailed protocolNeural data was recorded and described in previous work (Deneux et al., 2016). We analyzed the activity of 2343 neurons in mouse AC recorded using twophoton calcium imaging while mice were passively listening to different sounds. Each sound was presented 20 times. Data included recordings from three mice across 13 different sessions. Neural recordings in the three mice comprised respectively 1251, 636, and 456 neurons. Recordings in different sessions were performed at different depths, typically with a 50 µm difference (never less than 20 µm). Since the soma diameter is ∼15 µm, this ensured that different cells were recorded in different sessions. We analyzed the trialaveraged activity of a pseudopopulation of neurons built by pooling neural activity across all recording sessions and all animals. The raw calcium traces (imaging done at 31.5 frames per second) were smoothed using a Gaussian kernel with standard deviation $\sigma =32$ ms. We then subtracted the baseline firing activity (i.e. the average neural activity before stimulus presentation) from the activity of each neuron, and used the baselinesubtracted neural activity for the analyses.
The stimuli set
Request a detailed protocolThe stimuli consisted of a randomized presentation of 16 different sounds, 8 UPramping sounds, and 8 DOWNramping sounds. For each type, sounds had different frequency content (either 8 kHz or WN), different durations (1 or 2 s), and different combinations of onset and offset intensity levels (for UPramps either 50–85 dB or 60–85 dB, while for DOWNramps 85–50 dB or 85–60 dB). The descriptions of the stimuli are summarized in Table 1.
Decoding analysis
Request a detailed protocolTo assess the accuracy of stimulus discrimination (8 kHz vs. WN sound) on singletrials, we trained and tested a linear discriminant classifier (Bishop, 2006) using crossvalidation. For each trial, the pseudopopulation activity vectors (built by pooling across sessions and animals) were built at each 50 ms time bin. We used leaveoneout crossvalidation. At each time bin we used 19 out of 20 trials for the training set, and tested the trained decoder on the remaining trial. The classification accuracy was computed as the average fraction of correctly classified stimuli over all 20 crossvalidation folds.
At each time $t$ the decoder for classification between stimuli s_{1} and s_{2} was trained using the trialaveraged pseudopopulation vectors ${\mathbf{\mathbf{c}}}_{1t}$ and ${\mathbf{\mathbf{c}}}_{2t}$. These vectors defined the decoder ${\mathbf{\mathbf{w}}}_{t}$ and the bias term ${\mathbf{\mathbf{b}}}_{t}$ as:
A given population vector $\mathbf{\mathbf{x}}$ was classified as either stimulus s_{1} or stimulus s_{2} according to the value of the function $y(\mathbf{\mathbf{x}})={\mathbf{\mathbf{w}}}_{t}^{T}\mathbf{\mathbf{x}}{\mathbf{\mathbf{b}}}_{t}$:
Random performance was evaluated by training and testing the classifier using crossvalidation on surrogate data sets built by shuffling stimulus singletrial labels at each time bin. We performed the same analysis using more sophisticated decoder algorithms, that is, Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), and CSVM decoders. For none of these decoder algorithms was the crossvalidated accuracy substantially better than the naive classifier reported in Figure 1C (and for the QDA algorithm the accuracy was substantially worse; not shown).
Principal component analysis
Request a detailed protocolTo perform PCA on the population responses to $C$ stimuli ${s}_{{i}_{1}},\mathrm{\dots},{s}_{{i}_{C}}$ we considered the matrix $\mathbf{\mathbf{X}}\in {\mathbb{R}}^{N\times TC}$, where $N$ is the number of neurons and $T$ is the number of time steps considered. The matrix $\mathbf{\mathbf{X}}$ contained the population OFF responses to the stimuli ${s}_{{i}_{1}},\mathrm{\dots},{s}_{{i}_{C}}$, centered around the mean over times and stimuli. If we denote by ${\lambda}_{i}$ the ith eigenvalue of the covariance matrix $\mathbf{X}{\mathbf{X}}^{T}/(TC)$, the percentage of variance explained by the ith principal component is given by:
while the cumulative percentage of variance explained by the first $M$ principal components (Figure 2A,C) is given by:
In Figure 2A, for each stimulus $s$ we consider the matrix ${\mathbf{\mathbf{X}}}^{(s)}\in {\mathbb{R}}^{N\times T}$ containing the population OFF responses to stimulus $s$. We computed the cumulative variance explained as a function of the number of principal components $M$ and then averaged over all stimuli.
Crossvalidated PCA
Request a detailed protocolWe used crossvalidation to estimate the variance of OFF responses to individual stimuli and across stimuli attributable to the stimulusrelated component, discarding the component due to trialtotrial variability (Figure 2—figure supplement 1). Specifically, we applied to the calcium activity data the method of crossvalidated PCA (cvPCA) developed in Stringer et al., 2019. This method provides an unbiased estimate of the stimulusrelated variance by computing the covariance between a training and a test set of responses (e.g. two different trials or two different sets of trials) to an identical collection of stimuli. Let ${\mathbf{\mathbf{X}}}^{(train)}$ and ${\mathbf{\mathbf{X}}}^{(test)}$ be the $N\times TC$ matrices containing the meancentered responses to the same $C$ stimuli. We consider the training and test responses to be two distinct trials. We first perform ordinary PCA on the training responses ${\mathbf{\mathbf{X}}}^{(train)}$ and find the principal component ${\mathbf{\mathbf{u}}}_{i}^{(train)}$ ($i=1,...,N$). We then evaluate the crossvalidated PCA spectrum $\{{\lambda}_{i}\}$ as:
We repeat the procedure for all pairs of trials $(i,j)$ with $i\ne j$ and average the result over pairs. The crossvalidated cumulative variance is finally computed as in Equation (13).
Correlations between OFF response subspaces
Request a detailed protocolTo quantify the degree of correlation between pairs of OFF responses corresponding to two different stimuli, termed subspace overlap, we computed the cosine of the principal angle between the corresponding lowdimensional subspaces (Figure 2D, Figure 3I). In general, the principal angle ${\theta}_{P}$ between two subspaces $U$ and $V$ corresponds to the largest angle between any two pairs of vectors in $U$ and $V$ respectively, and it is defined by Bjorck and Golub, 1973; Knyazev and Argentati, 2002:
To compute the correlations between the OFF responses to stimuli s_{1} and s_{2} we first identified the first $K=5$ principal components of the response to stimulus s_{1} and organized them in a $N\times K$ matrix $\mathbf{\mathbf{Q}}({s}_{1})$. We repeated this for stimulus s_{2}, which yielded a matrix $\mathbf{\mathbf{Q}}({s}_{2})$. Therefore the columns of $\mathbf{\mathbf{Q}}({s}_{1})$ and $\mathbf{\mathbf{Q}}({s}_{2})$ define the two subspaces on which the responses to stimuli s_{1} and s_{2} live. The cosine of the principal angle between these two subspaces is given by Bjorck and Golub, 1973; Knyazev and Argentati, 2002:
where ${\sigma}_{\mathrm{max}}(\mathbf{\mathbf{A}})$ denotes the largest singular value of a matrix $\mathbf{\mathbf{A}}$. We note that this procedure directly relates to canonical correlation analysis (CCA; see Hotelling, 1936; Uurtio et al., 2018). In particular the first principal angle corresponds to the first canonical weight between the subspaces spanned by the columns of $\mathbf{\mathbf{Q}}({s}_{1})$ and $\mathbf{\mathbf{Q}}({s}_{2})$ (Golub and Zha, 1992; Bjorck and Golub, 1973).
Controls for subspace overlaps and initial statepeak correlations
Request a detailed protocolIn this section we describe the controls that we used to evaluate the statistical significance of the measures of orthogonality between subspaces spanned by neural activity during OFF responses to different stimuli (Figure 2D, Figure 3I), and between initial and peak activity vectors for a single stimulus (Figure 5D). We assessed the significance of the orthogonality between OFF response subspaces across stimuli using two separate controls, which aim at testing for two distinct null hypotheses (Figure 2D, Figure 3I, Figure 2—figure supplement 2). The first hypothesis is that small subspace overlaps (i.e. low correlations) between OFF responses to different stimuli may be due to the high number of dimensions of the statespace in which they are embedded. To test for this hypothesis we compared the subspace overlap computed on the trialaveraged activity with the subspace overlaps computed on the trialaveraged activity where the stimulus labels had been shuffled across trials for each pair of stimuli. We shuffled stimulus labels multiple times, resulting in one value of the subspace overlap for each shuffle. For each pair of stimuli, significance levels were then computed as the fraction of shuffles for which the subspace overlap was lower than the subspace overlap for the real data (lower tail test; Figure 2—figure supplement 2A).
Alternatively, small subspace overlaps could be an artifact of the trialtotrial variability present in the calcium activity data. In fact, for a single stimulus, maximum values of the correlation between two different trials were of the order of 0.2 (Deneux et al., 2016). To test for the possibility that small subspace overlaps may be due to trialtotrial variability, for each pair of stimuli we computed the values of the subspace overlaps by computing the trialaveraged activity on only half of the trials (10 trials), subsampling the set of 10 trials multiple times for each stimulus. This yielded a set of values $\mathrm{cos}{\theta}_{\mathrm{real}}({s}_{1},{s}_{2},n)$, where s_{1} and s_{2} are the specific stimuli considered and $n=1,\mathrm{\dots},{N}_{\mathrm{shuffle}}$. We then computed the subspace overlaps between the trialaveraged responses to the same stimulus, but averaged over two different sets of 10 trials each, over multiple permutations of the trials, resulting in a set of values $\mathrm{cos}{\theta}_{\mathrm{shuffle}}(s,n)$, where $s\in \{{s}_{1},{s}_{2}\}$ is the specific stimulus considered and $n=1,\mathrm{\dots},{N}_{\mathrm{shuffle}}$. For each pair of stimuli s_{1} and s_{2}, significance levels were computed using twotailed ttest and taking the maximum between the pvalues given by $p(\mathrm{cos}{\theta}_{\mathrm{real}}({s}_{1},{s}_{2},:),\mathrm{cos}{\theta}_{\mathrm{shuffle}}({s}_{1},:))$ and $p(\mathrm{cos}{\theta}_{\mathrm{real}}({s}_{1},{s}_{2},:),\mathrm{cos}{\theta}_{\mathrm{shuffle}}({s}_{2},:))$ for those stimulus pairs for which ${\u27e8\mathrm{cos}{\theta}_{\mathrm{real}}({s}_{1},{s}_{2},:)\u27e9}_{n}<{\u27e8\mathrm{cos}{\theta}_{\mathrm{shuffled}}({s}_{1},:)\u27e9}_{n}$ and ${\u27e8\mathrm{cos}{\theta}_{\mathrm{real}}({s}_{1},{s}_{2},:)\u27e9}_{n}<{\u27e8\mathrm{cos}{\theta}_{\mathrm{shuffled}}({s}_{2},:)\u27e9}_{n}$, where the bar symbol indicated the mean over shuffles (Figure 2—figure supplement 2B).
The same null hypotheses have been used to test the significance of the orthogonality between initial state and peak state for individual stimuli (Figure 5D, Figure 5—figure supplement 3). A procedure analogous to the one outlined above was employed. Here, instead of the subspace overlaps, the quantity of interest is the correlation (scalar product) between the initial and peak state. For the first control shuffled data are obtained by shuffling the labels ‘initial state’ and ‘peak state’ across trials. Significance levels were evaluated as outlined above for the first control (Figure 5—figure supplement 3A). To test for the second hypothesis, we computed correlations between activity vectors defined at the same time point, but averaged over two different sets of 10 trials each. For each trial permutations we computed these correlations for all time points belonging to the OFF response (35 time points) and average over time points. Significance level was then assessed as outlined above for the second control (Figure 5—figure supplement 3B).
Singlecell model for OFF response generation
Request a detailed protocolIn the next section we describe the procedure used to fit the singlecell model to the auditory cortical OFF responses. We consider the singlecell model given by Equation (1), where ${r}_{0,i}^{(s)}{L}_{i}(0)$ represents the initial state of the response of unit i to stimulus s. Without loss of generality we assume that the dynamic range of the temporal filters, defined as ${\mathrm{max}}_{t}{L}_{i}(t){\mathrm{min}}_{t}{L}_{i}(t)$, is equal to unity, so that $r}_{0,i}^{(s)$ represents the firing rate range of neuron i for stimulus s. If that was not true, i.e. if the singleneuron responses were given by ${r}_{i}^{(s)}(t)={r}_{0,i}^{(s)}{K}_{i}(t)$ with ${\alpha}_{i}={\mathrm{max}}_{t}{K}_{i}(t){\mathrm{min}}_{t}{K}_{i}(t)\ne 1$ we could always write the responses as ${r}_{i}^{(s)}(t)={\stackrel{~}{r}}_{0,i}^{(s)}{\stackrel{~}{L}}_{i}(t)$, where $\stackrel{~}{r}}_{0,i}^{(s)}={r}_{0,i}^{(s)}{\alpha}_{i$ and ${\stackrel{~}{L}}_{i}(t)={K}_{i}(t)/{\alpha}_{i}$, formally equivalent to Equation (1).
Fitting the singlecell model
Request a detailed protocolTo fit the singlecell OFF responses ${r}_{i}^{(s)}(t)$, we expressed the singlecell responses on a set of basis functions (Pillow et al., 2008).
where the shape and the number of basis function ${N}_{\mathrm{basis}}$ are predetermined. We choose the functions ${f}_{i}(t)$ to be Gaussian functions centered around a value ${\overline{t}}_{i}$ and with a given width $w}_{i$, that is, ${f}_{i}(t)=\mathrm{exp}({(t{\overline{t}}_{i})}^{2}/2{w}_{i}^{2})$. The problem then consists in finding the coefficients ${a}_{ij}^{(s)}$ that best approximate Equation (17). By dividing the left and righthand side of Equation (17) by the firing rate range $r}_{0,i}^{(s)$ we obtain:
In general the coefficients ${b}_{ij}^{(s)}$ could be fitted independently for each stimulus. However, the singlecell model assumes that the coefficients ${b}_{ij}$ do not change across stimuli (see Equation (1)). Therefore, to find the stimulusindependent coefficients ${b}_{ij}$ that best approximate Equation (18) across a given set of stimuli, we minimize the mean squared error given by:
The minimization of the mean squared error can be solved using linear regression techniques. Suppose we want to fit the population responses to $C$ different stimuli simultaneously. Let ${\mathbf{\mathbf{R}}}^{(C)}$ be the matrix of size $N\times TC$ obtained by concatenating the $N\times T$ matrices $({\mathbf{r}}_{\mathrm{n}\mathrm{o}\mathrm{r}\mathrm{m}}^{(s)}{)}_{it}={r}_{i}^{(s)}(t)/{r}_{0,i}^{(s)}$ ($i=1,\mathrm{\dots},N$, $t=1,\mathrm{\dots},T$, $s=1,\mathrm{\dots},C$) corresponding to the normalized responses to the $C$ stimuli. Let ${\mathbf{\mathbf{F}}}^{(C)}$ be the ${N}_{\mathrm{basis}}\times TC$ matrix obtained by concatenating $C$ times the ${N}_{\mathrm{basis}}\times T$ matrix ${(\mathbf{\mathbf{f}})}_{it}={f}_{i}(t)$. Let $\mathbf{\mathbf{B}}$ be the $N\times {N}_{\mathrm{basis}}$ matrix given by ${\mathbf{\mathbf{B}}}_{ij}={b}_{ij}$. Then Equation (18) can be written as:
which can be solved using linear regression.
In order to avoid normalizing by very small values, we fit only the most responding neurons in the population, as quantified by their firing rate range. We set ${w}_{i}=w=35ms$ for all $i=1,\mathrm{\dots},{N}_{\mathrm{basis}}$, and we set the number of basis functions to ${N}_{\mathrm{basis}}=10$, corresponding to the minimal number of basis functions sufficient to reach the highest crossvalidated goodness of fit.
The network model
Request a detailed protocolWe consider a recurrent network model of $N$ randomly coupled linear rate units. Each unit $i$ is described by the timedependent variable ${r}_{i}(t)$, which represents the difference between the firing rate of neuron $i$ at time $t$ and its baseline firing level. The equation governing the temporal dynamics of the network reads:
where $\tau $ represents the time constant (fixed to unity), and ${J}_{ij}$ is the effective synaptic strength from neuron $j$ to neuron $i$. The system has only one fixed point corresponding to ${r}_{i}=0$ for all $i$. To have stable dynamics, we require that the real part of the eigenvalues of the connectivity matrix $\mathbf{\mathbf{J}}$ is smaller than unity, that is, ${\left(\Re \U0001d522\lambda (\mathbf{\mathbf{J}})\right)}_{\mathrm{max}}<1$. We represent each stimulus as the state of the system reached at the end of stimulus presentation, which we denote by the vector ${\mathbf{\mathbf{r}}}_{0}$. We moreover assume that during the OFF response the network receives no external input. The OFF response to the stimulus associated with the state ${\mathbf{\mathbf{r}}}_{0}$ therefore corresponds to the network dynamics starting from the initial condition ${\mathbf{\mathbf{r}}}_{0}$.
Fitting the network model
Request a detailed protocolIn this section we describe the procedure we used for fitting a linear recurrent network model (Equation (21)) to the auditory cortical data using ridge and reducedrank ridge regression. To fit the network model to the responses to $C$ stimuli, we first concatenate the $C$ matrices ${\mathbf{\mathbf{X}}}^{(s)}\in {\mathbb{R}}^{T\times D}$ (where ${\mathbf{\mathbf{X}}}_{t,i}^{(s)}={r}_{i}^{(s)}(t)$, with $s=1,\mathrm{\dots},C$ and $i=1,\mathrm{\dots},D$) containing the neural responses to stimulus s across $T$ timesteps and $D$ neurons (or PC dimensions) along the first dimension, obtaining a matrix $\mathbf{\mathbf{X}}\in {\mathbb{R}}^{CT\times D}$ contains the neural activity across $C$ stimuli, $T$ time steps, and $D$ neurons or PC dimensions. We then fit the network model $\dot{\mathbf{\mathbf{X}}}=\mathbf{\mathbf{X}}(\mathbf{\mathbf{J}}\mathbf{\mathbf{I}})$ by first computing from the data the velocity of the trajectory as
Both ridge and reducedrank ridge regression aim at minimizing the mean squared error ${\dot{\mathbf{\mathbf{X}}}\mathbf{\mathbf{X}}(\mathbf{\mathbf{J}}\mathbf{\mathbf{I}})}^{2}$ subject to different constraints, which are described below. Since ridge regression involves computationally expensive matrix inversion, we first reduced the dimensionality of the original data set by using PCA. Unless otherwise stated, we kept a number of PC components equal to K = 100 for the fit of individual stimuli (Figure 3G,H, Figure 5A–C, Figure 8, Figure 7) and for the fit of all stimuli together (Figure 3I, Figure 4, Figure 7C, Figure 3—figure supplement 2, Figure 3—figure supplement 3, Figure 5—figure supplement 1; K = 100 principal components sufficed to explain more than 90% of the variance of the responses to all stimuli together). As a result, the data matrix $\mathbf{\mathbf{X}}\in {\mathbb{R}}^{TC\times K}$ contains the activity across $T$ time bins and across all $K$ dimensions for a number $C$ of stimuli.
Ridge regression
Request a detailed protocolRidge regression aims at determining the connectivity matrix $\mathbf{\mathbf{J}}$ that minimizes the mean squared error ${\dot{\mathbf{\mathbf{X}}}\mathbf{\mathbf{X}}(\mathbf{\mathbf{J}}\mathbf{\mathbf{I}})}^{2}$ with a penalty for large entries of the matrix $\mathbf{\mathbf{J}}$, so that the cost function to minimize is given by ${\dot{\mathbf{\mathbf{X}}}\mathbf{\mathbf{X}}(\mathbf{\mathbf{J}}\mathbf{\mathbf{I}})}^{2}+\lambda {(\mathbf{\mathbf{J}}\mathbf{\mathbf{I}})}^{2}$, where $\lambda $ is a hyperparameter of the model. We can write the ridge regression optimization problem as:
where we defined ${\dot{\mathbf{\mathbf{X}}}}_{\lambda}=(\dot{\mathbf{\mathbf{X}}},\mathrm{})$ and ${\mathbf{\mathbf{X}}}_{\lambda}=(\mathbf{\mathbf{X}},\sqrt{\lambda}\mathbf{\mathbf{I}})$. The solution to Equation (23) is given by:
This procedure was used for fits shown in Figure 3, Figure 3—figure supplement 1, Figure 3—figure supplement 3, and Figure 4.
Reducedrank ridge regression
Request a detailed protocolReducedrank regression aims at minimizing the mean squared error ${\dot{\mathbf{\mathbf{X}}}\mathbf{\mathbf{X}}(\mathbf{\mathbf{J}}\mathbf{\mathbf{I}})}^{2}$ under a rank constraint on the matrix $\mathbf{\mathbf{J}}$, that is, $\mathrm{rank}\mathbf{\mathbf{J}}\le R$, where $R$ is a hyperparameter of the model (Izenman, 1975; Davies and Tso, 1982). Here we combined reducedrank and ridge regression in the same framework. The reducedrank ridge regression optimization problem with hyperparameters $R$ and $\lambda $ can be therefore written as (Mukherjee et al., 2015):
To solve Equation (25) we consider the solution to the ridge regression problem given by Equation (24). If the matrix ${\mathbf{\mathbf{X}}}_{\lambda}{\mathbf{\mathbf{J}}}_{\lambda}^{*}$ has SVD given by ${\mathbf{\mathbf{X}}}_{\lambda}{\mathbf{\mathbf{J}}}_{\lambda}^{*}=\mathbf{\mathbf{U}}\mathrm{\Sigma}{\mathbf{\mathbf{V}}}^{T}$, then it can be shown that the solution to the reducedrank ridge regression problem given by Equation (25) can be written as:
We note that each term in the sum of Equation (26) has unit rank, so that the resulting matrix ${\mathbf{\mathbf{J}}}_{R,\lambda}^{*}$ has rank equal to or less than $R$.
This procedure was used for fits shown in Figure 5, Figure 7, Figure 8, Figure 3—figure supplement 2, Figure 5—figure supplement 1, and Figure 5—figure supplement 2.
Selection of hyperparameters for ridge regression
Request a detailed protocolTo select the value of the hyperparameter $\lambda $, we fitted the network model to the data and computed the goodness of fit as the coefficient of determination ${R}^{2}(\lambda )$ using crossvalidation. We then selected the value of $\lambda $ which maximized the goodness of fit.
Selection of hyperparameters for reducedrank ridge regression
Request a detailed protocolTo select the values of the hyperparameters $\lambda $ and $R$, we fitted the network model to the data with hyperparameters $(\lambda ,R)$ and computed the goodness of fit as the coefficient of determination ${R}^{2}(\lambda ,R)$ using crossvalidation. We repeated the process for a range of values of $(\lambda ,R)$. We observed that, independent of the value of $\lambda $, the goodness of fit as a function of the rank $R$ saturates at a particular value of the rank ${R}^{*}$, but does not exhibit a clear maximum. We took the value ${R}^{*}$ as the minimal rank hyperparameter, while we defined the best ridge parameter as ${\lambda}^{*}={\mathrm{argmax}}_{\lambda}{R}^{2}(\lambda ,{R}^{*})$ (Figure 5—figure supplement 1).
For both ridge and reducedrank ridge regression, we used $K$fold crossvalidation, with $K=10$. When fitting multiple stimuli at once, for each stimulus we partitioned the temporal activity into $K$ chunks, resulting in a total of $KC$ chunks. At the $i$th iteration of the crossvalidation procedure, we leave out the $i$th partition for each stimulus to construct the training set (consisting of $(K1)C$ chunks) and test the trained model on the remaining $C$ folds.
In Figure 4A,B, the temporal window considered for the fit was extended from 350 ms to 600 ms to include the decay to baseline of the OFF responses, thus obtaining stable eigenvalues. The extension of the temporal window was possible only for the 1 s long stimuli ($n=8$), since the length of the temporal window following stimulus offset for the 2 s stimuli was limited to approximately 380 ms by the length of the neural recordings.
Control data sets
Request a detailed protocolTo evaluate whether the fitted network model captured nontrivial collective dynamics of auditory cortical OFF responses, we followed the approach introduced in Elsayed and Cunningham, 2017. We first computed the goodness of fit (as quantified by the coefficient of determination ${R}^{2}$), then repeated the model fitting on control data sets which kept no additional structure than the one defined by correlations across time, neurons and stimuli of the original neural responses. We found that the goodness of fit obtained from fitting the model to the original data set was significantly higher than the one obtained from fitting the control data sets (Figure 3—figure supplement 2, Figure 3—figure supplement 3), confirming that the recurrent network model captured nontrivial collective dynamics in the data beyond the correlations across time, neurons and stimuli.
We generated the control data sets using a recent method based on a maximum entropy model (Savin and Tkačik, 2017) and described in Elsayed and Cunningham, 2017. This method, termed Tensor Maximum Entropy, allowed us to build surrogate data sets that are maximally random (entropy maximization), but constrained in a way that their marginal means and covariances are equal to the means and covariances of the original data set.
Marginal means and covariances
Request a detailed protocolLet the temporal activity along all $K$ dimensions for all $C$ stimuli be organized in a tensor $\mathbf{\mathbf{Z}}\in {\mathbb{R}}^{T\times K\times C}$. The mean tensor $\mathbf{\mathbf{M}}$ is defined as the tensor that makes all the marginal means of $\mathbf{\mathbf{Z}}$ vanish. Specifically, if $\overline{\mathbf{\mathbf{Z}}}=\mathbf{\mathbf{Z}}\mathbf{\mathbf{M}}$, the tensor $\mathbf{\mathbf{M}}$ is such that:
The marginal covariances of the tensor $\overline{\mathbf{\mathbf{Z}}}$ across times, neural dimensions, and stimuli are therefore defined as:
Tensor maximum entropy method
Request a detailed protocolThe method generates the desired number of surrogate data sets ${\mathbf{\mathbf{S}}}^{(i)}\in {\mathbb{R}}^{T\times K\times C}$. Each of these surrogates is randomly drawn from a probability distribution that assumes a priori no structure apart from that expected from the marginal means and covariances of the original data. Let µ, ${\mathrm{\Lambda}}^{T}$, ${\mathrm{\Lambda}}^{K}$, and ${\mathrm{\Lambda}}^{C}$ be the marginal means and covariances of surrogate $\mathbf{\mathbf{S}}$. The method computes the probability $P(\mathbf{\mathbf{S}})$ over the surrogates that maximizes the entropy function
subject to the constraints
where ${\mathbb{E}}_{P}[\cdot ]$ denotes the expectation over the probability density $P$. We used three types of surrogate data sets, denoted as T, TK, and TKC. All the three types of surrogates obey to the first constraint in Equation (30) on the marginal means. In addition, surrogates of type T obey the constraint on the time covariances, surrogates of type TK on time and dimension covariance, while surrogates TKC obey all the three covariance constraints.
Analysis of the transient channels
Request a detailed protocolIn this section, we provide details on the analysis of the structure of the transient channels obtained from fitting the network model to OFF responses to individual stimuli using reducerank regression (Figure 7, see Materials and methods, Section 'Fitting the network model'). Let ${\mathbf{\mathbf{J}}}_{\mathrm{Full}}$ be the connectivity matrix resulting from fitting the network model to the responses to all stimuli at once, and ${\mathbf{\mathbf{J}}}^{(s)}$ the connectivity obtained from fitting the model to the response to stimulus s. Before fitting both matrices ${\mathbf{\mathbf{J}}}_{\mathrm{Full}}$ and ${\mathbf{\mathbf{J}}}^{(s)}$, we reduced the dimensionality of the responses to the first 100 principal components. We set the value of the rank parameter to $R=70$ and $R=5$ respectively for ${\mathbf{\mathbf{J}}}_{\mathrm{Full}}$ and ${\mathbf{\mathbf{J}}}^{(s)}$ (Figure 5). Using the SVD, we can write the matrices ${\mathbf{\mathbf{J}}}^{(s)}$ as:
where ${\widehat{\mathbf{\mathbf{U}}}}^{(s)}$ and ${\mathbf{\mathbf{V}}}^{(s)}$ contain respectively the ${R}_{s}$ right and left connectivity vectors as columns (see Equation (6)), while ${\mathrm{\Sigma}}^{(s)}$ contains the norms of ${\mathbf{\mathbf{U}}}^{(s)}$ on the diagonal (see Equation (6)). The transient dynamics elicited by stimulus s has a strong component along the statespace dimensions specified by the right connectivity vectors ${\mathbf{\mathbf{u}}}^{(r;s)}$ (columns of ${\mathbf{\mathbf{U}}}^{(s)}$; see also Equation (43)). We therefore define the overlap between the transient channels (connectivity overlaps) corresponding to pairs of stimuli s_{1} and s_{2} (Figure 7A) as the principal angle between the subspaces defined by ${\widehat{\mathbf{\mathbf{U}}}}^{({s}_{1})}$ and ${\widehat{\mathbf{\mathbf{U}}}}^{({s}_{2})}$ (see Equations (15 and 16)). We show that the structure of the connectivity overlaps (Figure 7A) matched well with the structure of the subspace overlaps (Figure 2D) across pairs of stimuli in Figure 7B.
To test whether the connectivity fitted on all stimuli at once ${\mathbf{\mathbf{J}}}_{\mathrm{Full}}$ consisted of the sum of lowrank transient coding channels, we defined the matrix ${\mathbf{\mathbf{J}}}_{\mathrm{Sum}}$ as the sum of the individual transient channels for all stimuli (see Equation 6):
We then compared the crossvalidated goodness of fit of the population OFF responses using the full matrix ${\mathbf{\mathbf{J}}}_{\mathrm{Full}}$ and the matrix ${\mathbf{\mathbf{J}}}_{\mathrm{Sum}}$ (Figure 7C).
In Figure 7, when fitting individual stimuli, the ridge and rank hyperparameters have been set respectively to $\lambda =5$, $R=5$, which correspond to the values that maximize the agreement (coefficient of determination) between the connectivity overlaps (Figure 7A) and the subspace overlaps (Figure 2D). For these values the coefficient of determination between the transient channel overlaps and the subspace overlaps is >0.7.
Analysis of the network model
Request a detailed protocolIn this section we provide a detailed mathematical analysis of the network model given by Equation (21)
Specifically, we derive the conditions on the network components (i.e. connectivity spectra, connectivity vectors, and initial conditions) to produce lowdimensional, transiently amplified OFF responses in networks with lowrank connectivity. We then focus on the specific case of rotational channels in the connectivity.
Normal and nonnormal connectivity matrices
Request a detailed protocolIn this section, we summarize the relationship between amplified OFF responses and nonnormal connectivity matrices in linear recurrent networks previously reported in Trefethen and Embree, 2005; Hennequin et al., 2012; Bondanelli and Ostojic, 2020. To characterize the amplification of OFF responses we focus on the temporal dynamics of the distance from baseline, defined as the norm of the population activity vector $\mathbf{\mathbf{r}}(t)$ (Hennequin et al., 2014). The network generates an amplified OFF response to the stimulus associated with the initial condition ${\mathbf{\mathbf{r}}}_{0}$ when the value of $\mathbf{\mathbf{r}}(t)$ transiently increases before decaying to its asymptotic value $\mathbf{\mathbf{r}}(t\to \mathrm{\infty})=0$. Note that having a transiently increasing value of the distance from baseline implies that the OFF response ${r}_{i}(t)$ of at least one unit displays nonmonotonic temporal dynamics. Importantly, the transient behavior of $\mathbf{\mathbf{r}}(t)$ depends on the stimulus through ${\mathbf{\mathbf{r}}}_{0}$, and on the properties of the connectivity matrix $\mathbf{\mathbf{J}}$, in particular on the relationship between its eigenvectors (Trefethen and Embree, 2005).
Connectivity matrices for which the eigenvectors are orthogonal to each other are called normal matrices and they are formally defined as matrices $\mathbf{\mathbf{J}}$ that satisfy ${\mathbf{\mathbf{J}\mathbf{J}}}^{T}={\mathbf{\mathbf{J}}}^{T}\mathbf{\mathbf{J}}$. In networks with normal connectivity, any stimulus ${\mathbf{\mathbf{r}}}_{0}$ evokes an OFF response for which the distance from baseline decays monotonically to zero. Such networks thus cannot produce amplified OFF responses, as defined by a transiently increasing $\mathbf{\mathbf{r}}(t)$. Note that any symmetric matrix is normal.
On the other hand, connectivity matrices for which some eigenvectors are not mutually orthogonal are called nonnormal (Trefethen and Embree, 2005), and they consist of all connectivity $\mathbf{\mathbf{J}}$ for which ${\mathbf{\mathbf{J}\mathbf{J}}}^{T}\ne {\mathbf{\mathbf{J}}}^{T}\mathbf{\mathbf{J}}$. It is well known that nonnormal networks can lead to transiently increasing values of $\mathbf{\mathbf{r}}(t)$, therefore producing amplified OFF responses. However, the nonnormality of the network connectivity $\mathbf{\mathbf{J}}$ constitutes only a necessary, but not a sufficient condition for the generation of amplified responses.
Sufficient condition for amplified OFF responses
Request a detailed protocolIn this section we identify the necessary and sufficient condition for the generation of amplified OFF responses in linear recurrent networks. To find the necessary and sufficient condition for amplified responses, we start by writing the differential equation for the dynamics of the distance from baseline as (Neubert and Caswell, 1997; Bondanelli and Ostojic, 2020):
where ${\mathbf{\mathbf{J}}}_{S}$ denotes the symmetric part of the connectivity $\mathbf{\mathbf{J}}$. A linear recurrent network exhibits amplified responses when the rate of change of the distance from baseline, $\mathrm{d}\mathbf{\mathbf{r}}/\mathrm{d}t$, takes positive values at time $t=0$. The righthand side of Equation (33) takes its largest value when the initial condition ${\mathbf{\mathbf{r}}}_{0}$ is aligned with the eigenvector of ${\mathbf{\mathbf{J}}}_{S}$ associated with the largest eigenvalue ${\lambda}_{\mathrm{max}}({\mathbf{\mathbf{J}}}_{S})$. In this case, the rate of change of the distance from baseline at time $t=0$ takes the value ${\lambda}_{\mathrm{max}}({\mathbf{\mathbf{J}}}_{S})1$. From Equation (33) it can be shown that the necessary and sufficient condition for the generation of amplified responses in a recurrent networks with connectivity $\mathbf{\mathbf{J}}$ is given by
This criterion defines two classes of networks based on the properties of the connectivity matrix: networks in which amplified responses cannot be evoked by any stimulus, and networks able to generate amplified responses to at least one stimulus.
Lowrank networks
Request a detailed protocolIn the following section we examine OFF dynamics in networks with lowrank connectivity of the form given by Equation (3):
We first show that such connectivity directly constraints the network dynamics to a lowdimensional subspace. We then derive the conditions for the stability and amplification of OFF responses. Finally we apply these results to the specific case of lowrank rotational channels as in Equation (5).
Lowdimensional dynamics
Request a detailed protocolHere we study the dynamics of the population activity vector for a unitrank network ($R=1$) and for a general network with rank$R$ connectivity structure. We consider lowrank networks in which the initial state is set to $\mathbf{\mathbf{r}}(0)={\mathbf{\mathbf{r}}}_{0}$ and no external input acts on the system at later times $t>0$. By construction, the autonomous dynamics generated by lowrank networks are constrained by the rank $R$ of the connectivity matrix and are therefore lowdimensional when $R\ll N$.
We first illustrate the linear dynamics in the case of a unitrank connectivity ($R=1$), given by
with ${\mathbf{\mathbf{v}}}^{(1)T}$ of unit norm. The vectors ${\mathbf{\mathbf{u}}}^{(1)}$ and ${\mathbf{\mathbf{v}}}^{(1)}$ are respectively the right and left eigenvectors of $\mathbf{\mathbf{J}}$, and in the following analysis we refer to them respectively as the right and left connectivity vectors. In this case, the dynamics following the initial state ${\mathbf{\mathbf{r}}}_{0}$ can be explicitly computed as the product between the timedependent propagator of the dynamics, given by ${\mathbf{\mathbf{P}}}_{t}=\mathrm{exp}(t(\mathbf{\mathbf{J}}\mathbf{\mathbf{I}}))$, and the initial condition ${\mathbf{\mathbf{r}}}_{0}$ (Arnold, 1973; Bondanelli and Ostojic, 2020):
By expanding the exponential matrix $\mathrm{exp}(t(\mathbf{\mathbf{J}}\mathbf{\mathbf{I}}))$ in power series, we can write the propagator for the unitrank dynamics as:
where $\lambda $ is the only nonzero eigenvalue of $\mathbf{\mathbf{J}}$, given by
As a result, the full dynamics in the case of rank1 connectivity structure can be written as:
where
Since ${\mathbf{\mathbf{u}}}^{(1)}$ is the right eigenvector of $\mathbf{\mathbf{J}}$ corresponding to $\lambda $, from Equation (40) we note that, when ${\mathbf{\mathbf{r}}}_{0}$ is fully aligned with ${\mathbf{\mathbf{u}}}^{(1)}$ the dynamics are onedimensional and exhibit a monotonic decay along the same dimension. Instead, when the initial state is not fully aligned with ${\mathbf{\mathbf{u}}}^{(1)}$, the dynamics are confined to the plane defined by ${\mathbf{\mathbf{r}}}_{0}$ and ${\mathbf{\mathbf{u}}}^{(1)}$. In this case, while the component of the dynamics along ${\mathbf{\mathbf{r}}}_{0}$ decays exponentially as a function of time, the component along the direction of ${\mathbf{\mathbf{u}}}^{(1)}$ increases initially in proportion to the value of its norm, ${\mathbf{\mathbf{u}}}^{(1)}$ (since at time $t=0$, $\mathrm{d}{e}^{t}a(t)/\mathrm{d}t={\mathbf{\mathbf{u}}}^{(1)}$). Eventually, the activity decays to its asymptotic value given by $\mathbf{\mathbf{r}}(t\to \mathrm{\infty})=\mathrm{}$. Therefore in a unitrank network the dynamics draw a neural trajectory that explores at most twodimensions in the state space.
These observations can be extended to the general case of rank$R$ connectivity matrices. For simplicity we rewrite Equation (3) as
where the matrices $\mathbf{\mathbf{U}}$ and $\mathbf{\mathbf{V}}$ contain respectively the $R$ right and left connectivity vectors as columns. Writing the connectivity matrix in this form is always possible by applying the SVD on $\mathbf{\mathbf{J}}$. The SVD allows us to write the connectivity matrix as $\mathbf{\mathbf{J}}=\widehat{\mathbf{\mathbf{U}}}{\mathbf{\mathbf{S}\mathbf{V}}}^{T}$, where $\mathbf{\mathbf{U}}=\widehat{\mathbf{\mathbf{U}}}\mathbf{\mathbf{S}}$, and ${\widehat{\mathbf{\mathbf{U}}}}^{T}\widehat{\mathbf{\mathbf{U}}}={\mathbf{\mathbf{V}}}^{T}\mathbf{\mathbf{V}}=\mathbf{\mathbf{I}}$. In particular, this implies the norm of each left connectivity vector ${\mathbf{\mathbf{v}}}^{(r)}$ is unity, while the right connectivity vectors ${\mathbf{\mathbf{u}}}^{(r)}$ are not normalized.
Following steps similar to Equation (38), the linear dynamics evoked by the initial state ${\mathbf{\mathbf{r}}}_{0}$ can be written as
where we defined the $R$dimensional column vector
in analogy with Equation (40). Therefore, in the case of rank$R$ connectivity matrix, the dynamics evolve in a $(R+1)$dimensional space determined by the initial state ${\mathbf{\mathbf{r}}}_{0}$ and the $R$ right connectivity vectors ${\mathbf{\mathbf{u}}}^{(r)}$ (columns of $\mathbf{\mathbf{U}}$). Equation (44) shows that the dynamics of a rank$R$ system are determined by the matrix of scalar products between left and right connectivity vectors, which we refer to as the overlap matrix (Schuessler et al., 2020; Beiran et al., 2020)
We conclude that lowrank connectivity matrices of the form given by Equation (3) with $R\ll N$ generate lowdimensional dynamics that explore at most $R+1$ dimensions.
Conditions for stability and amplification of OFF responses in lowrank networks
Request a detailed protocolIn this paragraph we examine the conditions required to generate stable amplified dynamics in networks with lowrank connectivity $\mathbf{\mathbf{J}}$ and initial state ${\mathbf{\mathbf{r}}}_{0}$. Specifically, two sets of conditions need to be satisfied. The first set of conditions is directly derived by applying the general criterion for stable amplified dynamics given by Equation (34) to lowrank networks, which effectively constrains the connectivity $\mathbf{\mathbf{J}}$ through the relative arrangement of the connectivity vectors ${\mathbf{\mathbf{u}}}^{(r)}$ and ${\mathbf{\mathbf{v}}}^{(r)}$ ($r=1,\mathrm{\dots},R$). When this criterion is satisfied, amplified dynamics can be generated only if the initial condition ${\mathbf{\mathbf{r}}}_{0}$ is aligned with specific directions in the statespace. We thus examine a second set of conditions on the initial state ${\mathbf{\mathbf{r}}}_{0}$ for which amplified trajectories can be evoked, and express these conditions in terms of relationship between ${\mathbf{\mathbf{r}}}_{0}$ and the modes ${\mathbf{\mathbf{u}}}^{(r)}$${\mathbf{\mathbf{v}}}^{(r)}$. As before, without loss of generality, we assume that the norm of the vectors ${\mathbf{\mathbf{v}}}^{(r)}$ is equal to one, while the norm of the vectors ${\mathbf{\mathbf{u}}}^{(r)}$ can vary.
Conditions on the modes ${\mathbf{\mathbf{u}}}^{(r)}$${\mathbf{\mathbf{v}}}^{(r)}$
Request a detailed protocolWe first consider a network with unitrank connectivity $\mathbf{\mathbf{J}}={\mathbf{\mathbf{u}}}^{(1)}{\mathbf{\mathbf{v}}}^{(1)T}$, with ${\mathbf{\mathbf{v}}}^{(1)}$ of unit norm. In unitrank networks, the dynamics is stable only if the nonzero eigenvalue $\lambda $ is smaller than one. From Equation (39) this yields the stability condition
The condition for the generation of amplified responses given by Equation (34) can be derived in terms of ${\mathbf{\mathbf{u}}}^{(1)}$ and ${\mathbf{\mathbf{v}}}^{(1)}$ by computing the eigenvalues of the symmetric part of the connectivity, ${\mathbf{\mathbf{J}}}_{S}=({\mathbf{\mathbf{u}}}^{(1)}{\mathbf{\mathbf{v}}}^{(1)T}+{\mathbf{\mathbf{v}}}^{(1)}{\mathbf{\mathbf{u}}}^{(1)T})/2$. The matrix ${\mathbf{\mathbf{J}}}_{S}$ is of rank two, and has in general two nonzero eigenvalues, given by:
Therefore the condition for amplified OFF responses Equation (34) can be written in terms of the length of the vector ${\mathbf{\mathbf{u}}}^{(1)}$ as
Previous work has shown that amplification in a unitrank network can take arbitrarily large values only if $0\le \mathrm{cos}\theta <1/{\mathbf{\mathbf{u}}}^{(1)}$ (Bondanelli and Ostojic, 2020). As a result, stable and unbounded amplified dynamics can be generated in a unitrank network if and only if the norm of ${\mathbf{\mathbf{u}}}^{(1)}$ is sufficiently large (Equation (48)) and the correlation between the connectivity vectors is positive and sufficiently small (Equation (46)).
The conditions for stability and amplification derived for a rank1 network can be implicitly generalized to a rank$R$ network model, using the fact that the nonzero eigenvalues $\lambda $ of the lowrank connectivity matrix $\mathbf{\mathbf{J}}={\mathbf{\mathbf{U}\mathbf{V}}}^{T}$ (where $\mathbf{\mathbf{J}}$ is a $N\times N$ matrix) are equal to the nonzero eigenvalues of its overlap matrix ${\mathbf{\mathbf{J}}}^{ov}={\mathbf{\mathbf{V}}}^{T}\mathbf{\mathbf{U}}$ (where ${\mathbf{\mathbf{J}}}^{ov}$ is a $R\times R$ matrix) (Nakatsukasa, 2019):
Therefore, the condition for stability can be implicitly written in terms of the nonzero eigenvalues $\lambda $ of ${\mathbf{\mathbf{J}}}^{ov}$ as:
To derive the conditions for transient amplification, we need that the eigenvalues of the symmetric part of the connectivity be larger than unity, where the symmetric part ${\mathbf{\mathbf{J}}}_{S}$ is a matrix of rank $2R$ given by:
Using Equation (49) (Nakatsukasa, 2019), we can write the nonzero eigenvalues ${\lambda}_{S}$ of ${\mathbf{\mathbf{J}}}_{S}$ as:
where we used the fact that ${\mathbf{\mathbf{V}}}^{T}\mathbf{\mathbf{V}}=\mathbf{\mathbf{I}}$, and defined ${\mathbf{\mathbf{S}}}^{2}=\mathrm{diag}({\mathbf{\mathbf{u}}}^{(1)},{\mathbf{\mathbf{u}}}^{(2)},\mathrm{\dots})$. By definition, the eigenvalues of the symmetric part multiplied by 2, that is, $2{\lambda}_{S}$, should satisfy:
where we used the expression of the determinant of a $2\times 2$ block matrix (Horn and Johnson, 2012). Since for two square matrices $\mathbf{\mathbf{A}}$ and $\mathbf{\mathbf{B}}$ we have that $det(\mathbf{\mathbf{A}\mathbf{B}})=det(\mathbf{\mathbf{A}})det(\mathbf{\mathbf{B}})$, $det({\mathbf{\mathbf{A}}}^{T})=det(\mathbf{\mathbf{A}})$, and $det({\mathbf{\mathbf{A}}}^{1})=1/det(\mathbf{\mathbf{A}})$, we can write:
Thus, the condition for amplification becomes:
Thus, in the general case of a rank$R$ network, the condition for stable and amplified dynamics is given by Equations (50) and (55).
Dynamics of a lowrank rotational channel
Request a detailed protocolIn this section we describe the structure of the lowrank connectivity obtained by fitting the network model to the OFF responses to individual stimuli (Figure 5B,C) and analyze the resulting dynamics. When fitting a lowrank network model to the OFF responses to individual stimuli, we observed a specific structure in the pattern of correlations between right and left connectivity vectors ${\mathbf{\mathbf{u}}}^{(i)}$ and ${\mathbf{\mathbf{v}}}^{(j)}$ of the fitted connectivity $\mathbf{\mathbf{J}}$. This pattern exhibited low values of the correlation for almost all pairs of connectivity vectors, except for pairs of left and right vectors coupled across nearby modes, e.g. ${\mathbf{\mathbf{u}}}^{(1)}$${\mathbf{\mathbf{v}}}^{(2)}$, ${\mathbf{\mathbf{u}}}^{(2)}$${\mathbf{\mathbf{v}}}^{(1)}$, ${\mathbf{\mathbf{u}}}^{(3)}$${\mathbf{\mathbf{v}}}^{(4)}$, ${\mathbf{\mathbf{u}}}^{(4)}$${\mathbf{\mathbf{v}}}^{(3)}$, and so forth. This structures gives rise to independent rank2 channels grouping pairs of modes of the form ${\mathbf{\mathbf{J}}}_{1}={\mathbf{\mathbf{u}}}^{(1)}{\mathbf{\mathbf{v}}}^{(1)T}+{\mathbf{\mathbf{u}}}^{(2)}{\mathbf{\mathbf{v}}}^{(2)T}$. Within a single channel, the values of the correlation was high, and opposite in sign, for different pairs of vectors, that is, ${\widehat{\mathbf{\mathbf{u}}}}^{(1)}\cdot {\mathbf{\mathbf{v}}}^{(2)}\approx 1$ and ${\widehat{\mathbf{\mathbf{u}}}}^{(2)}\cdot {\mathbf{\mathbf{v}}}^{(1)}\approx 1$. As a result, each of these channels can be cast in the form:
where we set
with ${\mathrm{\Delta}}_{1}$, ${\mathrm{\Delta}}_{2}$ two positive scalars, and the vectors $\mathbf{\mathbf{v}}$’s are of unit norm. Scheme 1 illustrates the structure of the rank2 channel in terms of the left and right connectivity vectors, and the dynamics evoked by an initial condition along ${\mathbf{\mathbf{v}}}^{(1)}$.
For a rank2 connectivity as in Equation (5), the 2x2 overlap matrix ${\mathbf{\mathbf{J}}}^{ov}$ is therefore given by
Note that the overlap matrix also corresponds to the connectivity matrix ${\mathbf{\mathbf{J}}}_{2}$ projected on the basis of the left connectivity vectors $[{\mathbf{\mathbf{v}}}^{(1)},{\mathbf{\mathbf{v}}}^{(2)}]$ (third equality in Equation (58)).
To derive the necessary and sufficient conditions for the channel to exhibit stable and amplified dynamics in response to at least one initial condition ${\mathbf{\mathbf{r}}}_{0}$, we need to compute the nonzero eigenvalues of the connectivity matrix and of its symmetric part (equal to the eigenvalues of ${\stackrel{~}{\mathbf{\mathbf{J}}}}_{2}$ and of its symmetric part). The eigenvalues of ${\stackrel{~}{\mathbf{\mathbf{J}}}}_{2}$ are purely imaginary for all values of ${\mathrm{\Delta}}_{1}$ and ${\mathrm{\Delta}}_{2}$ and are given by ${\lambda}_{1,2}=\pm i\sqrt{{\mathrm{\Delta}}_{1}{\mathrm{\Delta}}_{2}}\equiv \pm i\omega $. The dynamics are therefore always stable. Next, we compute the eigenvalues of the symmetric part of the connectivity, which reads:
The eigenvalues of ${\stackrel{~}{\mathbf{\mathbf{J}}}}_{2,S}$ are $\pm {\mathrm{\Delta}}_{2}{\mathrm{\Delta}}_{1}/2$, so that the dynamics are amplified when
We next derive the full dynamics of the rank2 connectivity matrix ${\mathbf{\mathbf{J}}}_{2}$ in Equation (56). To this end, we use the general expression of the propagator for a rankR network given by Equations (43),(44) in terms of the overlap matrix ${\mathbf{\mathbf{J}}}^{ov}$ (Equation (58)). To compute the inverse and the exponential of the overlap matrix, we start by diagonalizing ${\mathbf{\mathbf{J}}}^{ov}$. Its eigenvalues are given by ${\lambda}_{1,2}=\pm i\omega $, while the corresponding eigenvectors are specified in the columns of the matrix $\mathbf{\mathbf{E}}$, where $\mathbf{\mathbf{E}}$ and ${\mathbf{\mathbf{E}}}^{1}$ read:
The inverse and exponential of the overlap matrix can be therefore computed as:
so that we obtain the term $\mathbf{\mathbf{a}}(t)$ in Equation (44) as:
From Equations (43), (44), (57), and (63), we can write the expression for the dynamics of the transient channel ${\mathbf{\mathbf{J}}}_{2}$ with initial condition given by ${\mathbf{\mathbf{r}}}_{0}={\alpha}_{1}{\mathbf{\mathbf{v}}}^{(1)}+{\alpha}_{2}{\mathbf{\mathbf{v}}}^{(2)}+\beta {\mathbf{\mathbf{z}}}^{(\u27c2)}$ (with ${\mathbf{\mathbf{z}}}^{(\u27c2)}$ a vector orthogonal to both ${\mathbf{\mathbf{v}}}^{(1)}$ and ${\mathbf{\mathbf{v}}}^{(2)}$) as:
Therefore, the squared distance from baseline of the trajectory reads:
Note that when ${\mathrm{\Delta}}_{2}={\mathrm{\Delta}}_{1}$, the connectivity matrix is purely antisymmetric (and therefore normal) and cannot produce amplified dynamics. In fact, in this case, the evolution of the distance from baseline results in an exponential decay $\mathbf{\mathbf{r}}(t)={e}^{t}$. Amplification therefore requires that ${\mathrm{\Delta}}_{1}\ne {\mathrm{\Delta}}_{2}$ and specifically that Equation (60) holds.
We observe that while an antisymmetric matrix (therefore normal, and with purely imaginary eigenvalues) cannot produce amplified dynamics, nonetheless a matrix with purely imaginary eigenvalues is not necessarily normal, and can thus produce amplified dynamics. Qualitatively, this is reflected in the dynamics tracing elliptical rotations (as opposed to circular rotations in the case of an antisymmetric matrix) which determine oscillations (and therefore rising and decaying phases) in the distance from baseline. This analysis can be extended to the case of a superposition of K rank2 mutually orthogonal channels with connectivity matrix given by:
where each channel $k$ is defined by the connectivity vectors ${\mathbf{\mathbf{v}}}^{(1;k)}$, ${\mathbf{\mathbf{v}}}^{(2;k)}$ (Figure 5B), defining together a rank2K rotational channel. If we write the initial condition along the connectivity vectors of the different K channels as
with ${\sum}_{k=1}^{K}\left({\alpha}_{1}^{(k)2}+{\alpha}_{2}^{(k)2}\right)+{\beta}^{2}=1$, we can write the resulting dynamics as the sum of the dynamics within each channel as:
where the terms $\mathbf{\mathbf{r}}(t;{\mathbf{\mathbf{r}}}_{0}^{(k)})$ have the same form of the second term of the righthand side of Equation (64).
Conditions on the initial state ${\mathbf{\mathbf{r}}}_{0}$
Request a detailed protocolIn this paragraph we consider the rotational channel examined in Section 'Dynamics of a lowrank rotational channel' and study the conditions on the relationship between the initial condition ${\mathbf{\mathbf{r}}}_{0}$ and the connectivity vectors that ensure that the evoked dynamics be amplified. We first examine the case of a rank2 rotational channel, and then generalize to a superposition of orthogonal rank2 channels.
We consider the equation for the distance from baseline of the dynamics for a rank2 channel (Equation (65)) and assume, without loss of generality, that ${\mathrm{\Delta}}_{2}>{\mathrm{\Delta}}_{1}$. We observe that when ${\mathrm{\Delta}}_{2}>{\mathrm{\Delta}}_{1}$, an initial condition along ${\mathbf{\mathbf{v}}}^{(1)}$ does not evoke amplified dynamics. On the other hand, an initial condition along ${\mathbf{\mathbf{v}}}^{(2)}$ can evoke amplified dynamics, depending on the values of ${\mathrm{\Delta}}_{1}$ and ${\mathrm{\Delta}}_{2}$. In fact, we have:
Therefore, when ${\mathrm{\Delta}}_{2}>{\mathrm{\Delta}}_{1}$ (resp. ${\mathrm{\Delta}}_{1}>{\mathrm{\Delta}}_{2}$), the initial condition ${\mathbf{\mathbf{r}}}_{0}$ needs to have a substantial component along the vector ${\mathbf{\mathbf{v}}}^{(2)}$ (resp. ${\mathbf{\mathbf{v}}}^{(1)}$). To formalize this observation, we compute the peak time ${t}^{*}$ when $\mathbf{\mathbf{r}}(t;{\mathbf{\mathbf{r}}}_{0}={\mathbf{\mathbf{v}}}^{(2)}){}^{2}$ takes the largest value. If Equation (60) holds and ${\mathrm{\Delta}}_{2}\gg {\mathrm{\Delta}}_{1}$, the peak time can be approximated by $\omega {t}^{*}\approx \pi /2$.
We then examine the value of the distance from baseline ${\mathbf{\mathbf{r}}({t}^{*};{\mathbf{\mathbf{r}}}_{0})}^{2}$ when the initial condition ${\mathbf{\mathbf{r}}}_{0}$ is a linear combination of the vectors ${\mathbf{\mathbf{v}}}^{(1)}$, ${\mathbf{\mathbf{v}}}^{(2)}$ and a vector orthogonal to both $\mathbf{\mathbf{v}}$’s, that is, ${\mathbf{\mathbf{r}}}_{0}={\alpha}_{1}{\mathbf{\mathbf{v}}}^{(1)}+{\alpha}_{2}{\mathbf{\mathbf{v}}}^{(2)}+\beta {\mathbf{\mathbf{z}}}^{(\u27c2)}$, with ${\alpha}_{1}^{2}+{\alpha}_{2}^{2}+{\beta}^{2}=1$. At the peak time we can write the distance from baseline as:
By requiring that, at the peak time, the distance from baseline is larger than unity we can derive the following sufficient conditions on the component of the initial conditions along the vectors ${\mathbf{\mathbf{v}}}^{(1)}$, ${\mathbf{\mathbf{v}}}^{(2)}$ and ${\mathbf{\mathbf{z}}}^{(\u27c2)}$:
When the component of the initial condition along ${\mathbf{\mathbf{z}}}^{(\u27c2)}$ is zero ($\beta =0$), Equation (71) is satisfied when
Equation (71) shows that in an amplified rotational channel the initial state ${\mathbf{\mathbf{r}}}_{0}$ may have a significant component orthogonal to the vector ${\mathbf{\mathbf{v}}}^{(2)}$ and yet be able to generate amplified dynamics. However, from Equation (70) we observe that, for a fixed value of ${\mathrm{\Delta}}_{1}$ and ${\mathrm{\Delta}}_{2}$ (with ${\mathrm{\Delta}}_{2}>{\mathrm{\Delta}}_{1}$), the amplification decreases when the component along ${\mathbf{\mathbf{v}}}^{(2)}$, that is, ${\alpha}_{2}$, decreases. Therefore, to have amplification the component of ${\mathbf{\mathbf{r}}}_{0}$ along ${\mathbf{\mathbf{v}}}^{(2)}$ should be sufficiently strong.
The above condition Equation (71) can be generalized to the case of a superposition of orthogonal rank2 rotational channels (see Equation (68)) if we assume that the peak time is approximately the same when considering separately the dynamics within each rank2 channel. Under these assumptions, the distance from baseline at the peak time can be written as
and the condition for the relationship between initial condition and connectivity vectors in a rank2K channel becomes:
When the component of the initial condition along ${\mathbf{\mathbf{z}}}^{(\u27c2)}$ is zero ($\beta =0$) and the ${\mathrm{\Delta}}_{i}^{(k)}$ do not depend on the channel $k$, Equation (74) is satisfied when
where the lefthand side represents the component of the initial condition on all the vectors ${\mathbf{\mathbf{v}}}^{(2;k)}$’s. From Equation (73) we note that, for fixed values of the ${\mathrm{\Delta}}_{1}^{(k)}$ and ${\mathrm{\Delta}}_{2}^{(k)}$ (assuming ${\mathrm{\Delta}}_{2}^{(k)}>{\mathrm{\Delta}}_{1}^{(k)}$ for all $k$), amplification decreases by decreasing the components of the initial condition on the vectors ${\mathbf{\mathbf{v}}}^{(2;k)}$’s. Therefore, to generate amplified dynamics, the components of the initial condition ${\mathbf{\mathbf{r}}}_{0}$ on the vectors ${\mathbf{\mathbf{v}}}^{(2;k)}$’s should be sufficiently strong.
Correlation between initial and peak state
Request a detailed protocolIn this section we derive the expression for the correlation between the initial state ${\mathbf{\mathbf{r}}}_{0}$ and the state at the peak of the transient dynamics, for a rank2K rotational channel consisting of K rotational rank2 channels (see Section 'Dynamics of a lowrank rotational channel'). We set the initial condition to ${\mathbf{\mathbf{r}}}_{0}={\sum}_{k=1}^{K}{\alpha}_{1}^{(k)}{\mathbf{\mathbf{v}}}^{(1;k)}+{\sum}_{k=1}^{K}{\alpha}_{2}^{(k)}{\mathbf{\mathbf{v}}}^{(2;k)}+\beta {\mathbf{\mathbf{z}}}^{(\u27c2)}$ and evaluate the state at the peak at time ${t}^{*}$ (corresponding to the peak time when ${\mathbf{\mathbf{r}}}_{0}$ has a strong component on the ${\mathbf{\mathbf{v}}}^{(2;k)}$’s and ${\mathrm{\Delta}}_{2}^{(k)}\gg {\mathrm{\Delta}}_{1}^{(k)}$). The initial condition and the peak state are therefore given by:
From the mutual orthogonality between $\mathbf{v}}^{(1;k)$, ${\mathbf{\mathbf{v}}}^{(2;k)}$, and ${\mathbf{\mathbf{z}}}^{(\u27c2)}$ it follows that
Thus, we can write the correlation between the initial condition and the peak state as
If the initial condition has a strong component on the vectors ${\mathbf{\mathbf{v}}}^{(2)}$’s (and the component on ${\mathbf{\mathbf{z}}}^{(\u27c2)}$ can be neglected, that is, $\beta =0$), so that ${\alpha}_{1}^{(k)\mathrm{\hspace{0.17em}2}}={\u03f5}^{2}$ and ${\alpha}_{2}^{(k)\mathrm{\hspace{0.17em}2}}=1{\u03f5}^{2}$, then the correlation between initial and peak state satisfies
and equal to zero when the initial condition is a linear combination of the vectors ${\mathbf{\mathbf{v}}}^{(2)}$’s.
Singletrial analysis of population OFF responses
Request a detailed protocolIn this section we focus on the singletrial structure of the dynamics generated by the network and singlecell models. In particular, we consider a setting where dynamics are amplified and examine the amount of singletrial variability along specific directions in the state space: the direction corresponding to the maximum amplification of the dynamics, denoted by ${\mathbf{\mathbf{r}}}_{\mathrm{ampl}}$, and a random direction ${\mathbf{\mathbf{r}}}_{\mathrm{rand}}$. Variability along a given direction ${\mathbf{\mathbf{z}}}_{0}$ at time $t$ can be written as a function of the covariance matrix $\mathbf{\mathbf{C}}(t)$ of the population activity at time $t$ as:
In particular, for a given direction ${\mathbf{\mathbf{z}}}_{0}$, we examine the ratio between the variability at the time of maximum amplification $t={t}^{*}$ and the variability at time $t=0$ (corresponding to the end of stimulus presentation), which we termed variability amplification (VA), defined as:
We show that the network model predicts that the variability should increase from $t=0$ to $t={t}^{*}$ along the amplified direction ${\mathbf{\mathbf{r}}}_{\mathrm{ampl}}$, but not along a random direction ${\mathbf{\mathbf{r}}}_{\mathrm{rand}}$, resulting in the relationship $VA({\mathbf{\mathbf{r}}}_{\mathrm{ampl}})>VA({\mathbf{\mathbf{r}}}_{\mathrm{rand}})$. In contrast, in the singlecell model the presence of VA depends on the shapes of the singlecell responses. Further, we show that changes in VA between the amplified and random directions are due to distinct mechanisms in the network and singlecell models: in the network model they are due to the interneuron correlations generated by network recurrency; in the singlecell model they directly stem from changes in the variance of single neurons in the population.
Structure of singletrial responses in the network model
Request a detailed protocolIn the following we examine the structure of singletrial population OFF responses generated by a general linear network model. We show that the network model is able to reproduce the experimental observations on VA under the minimal assumption that the variability in the singletrial responses is due to the noise in the initial condition ${\mathbf{\mathbf{r}}}_{0}$ at stimulus offset. The analysis can nevertheless be extended to the case where external input noise is fed into the network without affecting the key results.
We consider a linear network model with connectivity $\mathbf{\mathbf{J}}$ and a single stimulus s modeled by the trialaveraged initial condition ${\mathbf{\mathbf{r}}}_{0}$. We assume that the trialtotrial noise in the initial condition is Gaussian, so that for trial $\nu $ the corresponding initial condition ${\mathbf{\mathbf{r}}}_{0}^{(\nu )}$ is given by
where $\u27e8{\eta}_{i}^{(\nu )}\u27e9=0$ and $\u27e8{\eta}_{i}^{(\nu )}{\eta}_{j}^{(\nu )}\u27e9={s}^{2}{\delta}_{ij}$, so that $\u27e8{\mathbf{\mathbf{r}}}_{0}^{(\nu )}\u27e9={\mathbf{\mathbf{r}}}_{0}$. The solution of the linear system can thus be analytically expressed as:
where the timedependent matrix ${\mathbf{\mathbf{P}}}_{t}$ is the propagator of the system (Arnold, 1973).
We start by computing the covariance matrix of the population activity at an arbitrary time $t$ as defined by:
Using Equations (83) and (84), we can write the covariance matrix as (Farrell and Ioannou, 1996; Farrell and Ioannou, 2001):
so that $\mathbf{\mathbf{C}}(0)={s}^{2}\mathbf{\mathbf{I}}$.
At a given time $t={t}^{*}$, let the SVD of the propagator is given by
where the singular values ${\sigma}_{i}$ are positive numbers. If none of the singular values ${\sigma}_{i}$ is larger than unity, no initial condition ${\mathbf{\mathbf{r}}}_{0}$ can lead to amplified dynamics at time ${t}^{*}$, meaning that for all initial conditions $\mathbf{\mathbf{r}}({t}^{*})<1$(Bondanelli and Ostojic, 2020). Instead, suppose that the first $K$ singular values ${\{{\sigma}_{k}\}}_{k=1}^{K}$ are larger than unity, while the remaining $NK$ singular values are smaller than one (with arbitrary $K$). Under this condition, any initial condition ${\mathbf{\mathbf{r}}}_{0}$ consisting of a linear combination of the first K vectors ${\mathbf{\mathbf{n}}}^{(k)}$ (i.e. ${\mathbf{\mathbf{r}}}_{0}={\sum}_{k=1}^{K}{\alpha}_{k}{\mathbf{\mathbf{n}}}^{(k)}$) will be amplified, so that the norm of the population vector at time ${t}^{*}$ is larger than one (Bondanelli and Ostojic, 2020):
In the following analysis, we explore this last scenario, where the first $K$ singular values are larger than unity. We make the following two assumptions. First, we assume that the initial condition is a linear combination of the first $K$ vectors ${\mathbf{\mathbf{n}}}^{(k)}$’s, so that the amplified direction is given by ${\mathbf{\mathbf{r}}}_{\mathrm{ampl}}=\mathbf{\mathbf{r}}({t}^{*})/\mathbf{\mathbf{r}}({t}^{*})$ (see Equation (87)). Notably, for any specified initial condition ${\mathbf{\mathbf{r}}}_{0}$ defined in this way, the following analysis holds for all times ${t}^{*}$ for which $\mathbf{\mathbf{r}}({t}^{*})>1$, including the peak time where the distance from baseline is maximum. Second, we assume that averages of the first or second powers of the singular values are of order (at most) one, that is, ${\sum}_{i=1}^{M}{\sigma}_{i}/M,{\sum}_{i=1}^{M}{\sigma}_{i}^{2}/M=O(1)$ for any integer $M$. This assumption holds for typically studied network topologies (e.g. Gaussian random, lowrank networks), where a few amplified dimensions (corresponding to ${\sigma}_{i}>1$) coexist with many nonamplified ones (corresponding to ${\sigma}_{i}<1$) (see Bondanelli and Ostojic, 2020).
Using Equation (84) and the SVD of the propagator ${\mathbf{\mathbf{P}}}_{{t}^{*}}$ (Equation (86)), we can write the covariance matrix at time ${t}^{*}$ as
Thus, we can compute the amount of variability along the amplified direction ${\mathbf{\mathbf{r}}}_{\mathrm{ampl}}$ (Equation (87)) and a random direction ${\mathbf{\mathbf{r}}}_{\mathrm{rand}}$ (with $\mathbf{r}}_{\mathrm{r}\mathrm{a}\mathrm{n}\mathrm{d}}=\sum _{i=1}^{N}{\beta}_{i}{\mathbf{m}}^{(i)$, with ${\beta}_{i}\sim \mathcal{\mathcal{N}}(0,1/N)$), using Equation (80) as:
where in the third equation the sum runs only over the singular values larger than one, while in the fourth equation the sum runs over all singular values (both larger and smaller than unity), and brackets denote average over realizations of $\mathbf{r}}_{\mathrm{r}\mathrm{a}\mathrm{n}\mathrm{d}$. Equation (89) shows that in a linear network model the variability along the amplified direction ${\mathbf{\mathbf{r}}}_{\mathrm{ampl}}$ increases from the initial time $t=0$ to the peak time $t={t}^{*}$, while the variability along a random direction does not, and instead decreases. We quantified these observations by computing the VA, defined in Equation (81) (Figure 8). Using Equation (89) we obtain:
thus showing that, for a linear network model, VA is larger when computed for the amplified direction than a random direction.
Shuffled responses
Request a detailed protocolIn the following we compare the VA of the original responses with the VA obtained from the responses where the trial labels have been shuffled independently for each cell. The covariance matrix of the shuffled responses ${\mathbf{\mathbf{C}}}^{(\mathrm{sh})}(t)$ is obtained by retaining only the diagonal elements of the real covariance matrix $\mathbf{\mathbf{C}}(t)$:
The first two quantities in Equation (89) computed using ${\mathbf{\mathbf{C}}}^{\mathrm{sh}}$ instead of $\mathbf{\mathbf{C}}$ do not change, since $\mathbf{\mathbf{C}}(0)$ is already diagonal. Instead, for shuffled responses the variability along both directions at the peak time ${t}^{*}$ is of order $O({s}^{2})$, since:
Following the same steps it is possible to show that ${[\mathrm{var}({\mathbf{\mathbf{r}}}_{\mathrm{ampl}};{t}^{*};\mathrm{shuffled})]}_{{\mathbf{\mathbf{r}}}_{\mathrm{ampl}},{\{{\mathbf{\mathbf{m}}}^{(k)}\}}_{k}}=O({s}^{2})$. In fact if we set, according to Equation (87), ${\mathbf{\mathbf{r}}}_{\mathrm{ampl}}={\sum}_{k=1}^{K}{\alpha}_{k}{\sigma}_{k}{\mathbf{\mathbf{m}}}^{(k)}/C$, where $C={\sum}_{k=1}^{K}{\alpha}_{k}^{2}{\sigma}_{k}^{2}=O(1)$, we have:
where in the last equality we assumed that $[{\alpha}_{k}^{2}]\approx 1/K$ if ${\alpha}_{k}\sim \mathcal{N}(0,1/K)$. To summarize, for shuffled responses we have:
so that the VA along both directions is given by:
implying a weaker modulation of the VA between random and amplified direction for shuffled responses with respect to the original responses. These analyses show that in the network model, the difference in the VA between the amplified and random directions is a result of the recurrent interactions within the network, reflected in the offdiagonal elements of the covariance matrix $\mathbf{\mathbf{C}}(t)$, and it therefore cannot be explained solely by changes in the variability of single units.
Structure of singletrial responses in the singlecell model
Request a detailed protocolHere we examine the structure of the singletrial population responses generated by the singlecell model (Equation (1)). We consider singletrial responses ${\mathbf{r}}^{(\nu )}(t)$ to a single stimulus generated by perturbing the initial state of the trial averaged responses $\mathbf{r}}_{0$, yielding
where ${\eta}^{(\nu )}$ has zero mean and variance equal to ${s}^{2}$ ($\u27e8{\eta}_{i}^{(\nu )}\u27e9=0$ and $\u27e8{\eta}_{i}^{(\nu )}{\eta}_{j}^{(\nu )}\u27e9={s}^{2}{\delta}_{ij}$). From Equation (96) we can compute the covariance matrix of the population response for the singlecell model as:
We can obtain the variability along a given direction ${\mathbf{\mathbf{z}}}_{0}$ as:
Notably, the term on the righthand side is proportional to the norm of the dynamics that would be evoked by the initial condition ${\mathbf{\mathbf{z}}}_{0}$. In the singlecell model, the variability is maximally amplified along the initial condition that leads to largest amplification of the trial averaged dynamics. In contrast, in the network model, the variability is maximally amplified along the amplified direction of the trialaveraged dynamics, which is in general orthogonal to the corresponding initial condition (see Materials and methods, Section 'Conditions on the initial state ${\mathbf{\mathbf{r}}}_{0}$'; Figure 6C,E). In general, the relationship between the modulation of the variability along the amplified direction and modulation along a random direction depends on the specific shape of the temporal filters ${L}_{i}(t)$, and we used simulations of the fitted model to determine it. We can however contrast the singletrial structure generated by the singlecell model with the structure generated by the recurrent model by noting that the covariance matrix given by Equation (97) is diagonal, and is therefore equal to the covariance matrix of the shuffled responses, ${\mathbf{\mathbf{C}}}_{ij}(t)={\mathbf{\mathbf{C}}}_{ij}^{\mathrm{sh}}(t)$. This implies that no difference is observed when computing the VA for the amplified and random directions for the real and shuffled responses (i.e. $\mathrm{VA}({\mathbf{\mathbf{r}}}_{\mathrm{rand}})=\mathrm{VA}({\mathbf{\mathbf{r}}}_{\mathrm{rand}};\mathrm{shuffled})$ and $\mathrm{VA}({\mathbf{\mathbf{r}}}_{\mathrm{ampl}})=\mathrm{VA}({\mathbf{\mathbf{r}}}_{\mathrm{ampl}};\mathrm{shuffled})$), indicating that changes in the trialtotrial variability can be fully explained in terms of changes in the variability of single units.
In Figure 8E,G,F,H, for both network and singlecell models, singletrial population responses were generated by drawing the singletrial initial conditions from a random distribution with mean ${\mathbf{\mathbf{r}}}_{0}$ and covariance matrix computed from the singletrials initial conditions of the original data (across neurons for the singlecell model, across PC dimensions for the network model). The results do not substantially change if we draw the singletrial initial conditions from a random distribution with mean ${\mathbf{\mathbf{r}}}_{0}$ and isotropic covariance matrix (${\mathbf{\mathbf{r}}}_{0}^{(\nu )}\sim \mathcal{N}({\mathbf{\mathbf{r}}}_{0},\mathbf{\mathbf{C}}(0))$, with $\mathbf{\mathbf{C}}(0)={s}^{2}\mathbf{\mathbf{I}}$), as assumed in the analysis above (not shown).
Software
Numerical simulations and data analyses were done using Python 3.6 (using NumPy [Harris et al., 2020], SciPy [Virtanen et al., 2020], and the scikitlearn package [Pedregosa et al., 2011]). Code is available at https://github.com/gbondanelli/OffResponses; Bondanelli, 2021; copy archived at swh:1:rev:2438e688ad719eb9870af8c032803a7367fe1140. Surrogate data sets for model validation (see Materials and methods, Section 'Control data sets', Figure 3, Figure 3—figure supplement 2, Figure 3—figure supplement 3) were generated using the Matlab code provided by Elsayed and Cunningham, 2017 available at https://github.com/gamaleldin/TME.
Data availability
Python code and data are available at https://github.com/gbondanelli/OffResponses copy archived at https://archive.softwareheritage.org/swh:1:rev:2438e688ad719eb9870af8c032803a7367fe1140/.
References

Mind the gap: two dissociable mechanisms of temporal processing in the auditory systemThe Journal of Neuroscience 36:1977–1995.https://doi.org/10.1523/JNEUROSCI.165215.2016

Computational models of millisecond level duration tuning in neural circuitsJournal of Neuroscience 29:9255–9270.https://doi.org/10.1523/JNEUROSCI.108509.2009

Excitatory local connections of superficial neurons in rat auditory cortexJournal of Neuroscience 28:11174–11185.https://doi.org/10.1523/JNEUROSCI.209308.2008

CorticoCortical connectivity within ferret auditory cortexJournal of Comparative Neurology 523:2187–2210.https://doi.org/10.1002/cne.23784

Numerical methods for computing angles between linear subspacesMathematics of Computation 27:579–594.https://doi.org/10.2307/2005662

SoftwareOffResponses, version swh:1:rev:2438e688ad719eb9870af8c032803a7367fe1140Software Heritage.

Coding with transient trajectories in recurrent neural networksPLOS Computational Biology 16:e1007655.https://doi.org/10.1371/journal.pcbi.1007655

Duration tuning in the mouse auditory midbrainJournal of Neurophysiology 84:1790–1799.https://doi.org/10.1152/jn.2000.84.4.1790

Statedependent computations: spatiotemporal processing in cortical networksNature Reviews Neuroscience 10:113–125.https://doi.org/10.1038/nrn2558

Spectral envelope coding in cat primary auditory cortex: linear and nonlinear effects of stimulus characteristicsEuropean Journal of Neuroscience 10:926–940.https://doi.org/10.1046/j.14609568.1998.00102.x

Neural variability in premotor cortex provides a signature of motor preparationJournal of Neuroscience 26:3697–3712.https://doi.org/10.1523/JNEUROSCI.376205.2006

Temporal complexity and heterogeneity of singleneuron activity in premotor and motor cortexJournal of Neurophysiology 97:4235–4257.https://doi.org/10.1152/jn.00095.2007

Dimensionality reduction for largescale neural recordingsNature Neuroscience 17:1500–1509.https://doi.org/10.1038/nn.3776

Procedures for ReducedRank regressionApplied Statistics 31:244–255.https://doi.org/10.2307/2347998

Cortical off response tuning for stimulus durationVision Research 36:3243–3251.https://doi.org/10.1016/00426989(96)000405

Structure in neural population recordings: an expected byproduct of simpler phenomena?Nature Neuroscience 20:1310–1318.https://doi.org/10.1038/nn.4617

Generalized stability theory. part I: autonomous operatorsJournal of the Atmospheric Sciences 53:2025–2040.https://doi.org/10.1175/15200469(1996)053<2025:GSTPIA>2.0.CO;2

Accurate LowDimensional approximation of the linear dynamics of fluid flowJournal of the Atmospheric Sciences 58:2771–2789.https://doi.org/10.1175/15200469(2001)058<2771:ALDAOT>2.0.CO;2

BookThe canonical correlations of matrix pairs and their numerical computationIn: Bojanczyk A, Cybenko G, editors. Linear Algebra for Signal Processing. Stanford University. pp. 27–49.https://doi.org/10.1007/9781461242284_3

Wideband inhibition of dorsal cochlear nucleus type IV units in cat: a computational modelAnnals of Biomedical Engineering 27:73–87.https://doi.org/10.1114/1.150

Binaural sensitivity changes between cortical on and off responsesJournal of Neurophysiology 106:30–43.https://doi.org/10.1152/jn.01070.2010

OFF responses in the auditory thalamus of the guinea pigJournal of Neurophysiology 88:2377–2386.https://doi.org/10.1152/jn.00083.2002

Corticofugal modulation on both ON and OFF responses in the nonlemniscal auditory thalamus of the guinea pigJournal of Neurophysiology 89:367–381.https://doi.org/10.1152/jn.00593.2002

Auditory cortical onset responses revisited. I. Firstspike timingJournal of Neurophysiology 77:2616–2641.https://doi.org/10.1152/jn.1997.77.5.2616

Auditory cortical onset responses revisited. II. response strengthJournal of Neurophysiology 77:2642–2660.https://doi.org/10.1152/jn.1997.77.5.2642

Nonnormal amplification in random balanced neuronal networksPhysical Review E 86:011909.https://doi.org/10.1103/PhysRevE.86.011909

Relations between two sets of variatesBiometrika 28:321–377.https://doi.org/10.1093/biomet/28.34.321

BookNeurons, Connections, and Microcircuits of the Inferior ColliculusIn: Oliver D. L, editors. Mammalian Auditory Pathways: Synaptic Organization and Microcircuits. Springer International Publishing. pp. 127–167.https://doi.org/10.1007/9783319717982

Reducedrank regression for the multivariate linear modelJournal of Multivariate Analysis 5:248–264.https://doi.org/10.1016/0047259X(75)900421

Quantitative analysis of neuronal response properties in primary and higherorder auditory cortical fields of awake house mice (Mus musculus)The European Journal of Neuroscience 39:904–918.https://doi.org/10.1111/ejn.12478

Gap encoding by parvalbuminexpressing interneurons in auditory cortexJournal of Neurophysiology 120:105–114.https://doi.org/10.1152/jn.00911.2017

Principal Angles between Subspaces in an A Based Scalar Product: Algorithms and Perturbation EstimatesSIAM Journal on Scientific Computing 23:2008–2040.https://doi.org/10.1137/S1064827500377332

When sound stops: offset responses in the auditory systemTrends in Neurosciences 41:712–728.https://doi.org/10.1016/j.tins.2018.08.009

Wiring of divergent networks in the central auditory systemFrontiers in Neuroanatomy 5:46.https://doi.org/10.3389/fnana.2011.00046

Intracortical multiplication of thalamocortical signals in mouse auditory cortexNature Neuroscience 16:1179–1181.https://doi.org/10.1038/nn.3493

Linearity of cortical receptive fields measured with natural soundsJournal of Neuroscience 24:1089–1100.https://doi.org/10.1523/JNEUROSCI.444503.2004

Models of neuronal StimulusResponse functions: elaboration, estimation, and evaluationFrontiers in Systems Neuroscience 10:109.https://doi.org/10.3389/fnsys.2016.00109

How single neuron properties shape chaotic dynamics and signal transmission in random neural networksPLOS Computational Biology 15:e1007122.https://doi.org/10.1371/journal.pcbi.1007122

Spatial profile and differential recruitment of GABAB modulate oscillatory activity in auditory cortexJournal of Neuroscience 29:10321–10334.https://doi.org/10.1523/JNEUROSCI.170309.2009

Maturation of intrinsic and synaptic properties of layer 2/3 pyramidal neurons in mouse auditory cortexJournal of Neurophysiology 99:2998–3008.https://doi.org/10.1152/jn.01160.2007

Scikitlearn: machine learning in PythonJournal of Machine Learning Research 12:2825–2830.

Comparison between offset and onset responses of primary auditory cortex ONOFF neurons in awake catsJournal of Neurophysiology 97:3421–3431.https://doi.org/10.1152/jn.00184.2007

A dynamical systems perspective on flexible motor timingTrends in Cognitive Sciences 22:938–952.https://doi.org/10.1016/j.tics.2018.07.010

Functional organization and population dynamics in the mouse primary auditory cortexNature Neuroscience 13:353–360.https://doi.org/10.1038/nn.2484

ConferenceHow linear are auditory cortical responses?Advances in Neural Information Processing Systems. pp. 125–132.

Maximum entropy models as a tool for building precise neural controlsCurrent Opinion in Neurobiology 46:120–126.https://doi.org/10.1016/j.conb.2017.08.001

Towards the neural population doctrineCurrent Opinion in Neurobiology 55:103–111.https://doi.org/10.1016/j.conb.2019.02.002

Dynamics of random recurrent networks with correlated lowrank structurePhysical Review Research 2:013111.https://doi.org/10.1103/PhysRevResearch.2.013111

Adaptation in auditorynerve fibers: a revised modelBiological Cybernetics 44:107–120.https://doi.org/10.1007/BF00317970

Motor primitives in space and time via targeted gain modulation in cortical networksNature Neuroscience 21:1774–1783.https://doi.org/10.1038/s4159301802760

Single unit activity in cochlear nucleus and inferior colliculus of echolocating batsThe Journal of Physiology 172:449–474.https://doi.org/10.1113/jphysiol.1964.sp007432

A neural network that finds a naturalistic solution for the production of muscle activityNature Neuroscience 18:1025–1033.https://doi.org/10.1038/nn.4042

BookSpectra and Pseudospectra: The Behavior of Nonnormal Matrices and OperatorsPrinceton University Press.https://doi.org/10.1007/9783662039724_6

A tutorial on canonical correlation methodsACM Computing Surveys 50:1–33.https://doi.org/10.1145/3136624

The functional role of GABA and glycine in monaural and binaural processing in the inferior colliculus of horseshoe batsJournal of Comparative Physiology A 171:541–553.https://doi.org/10.1007/BF00194587

Neural coding strategies in auditory cortexHearing Research 229:81–93.https://doi.org/10.1016/j.heares.2007.01.019

Auditory cortical projections to the cat inferior colliculusThe Journal of Comparative Neurology 400:147–174.https://doi.org/10.1002/(SICI)10969861(19981019)400:2<147::AIDCNE1>3.0.CO;29

The function of offset neurons in auditory information processingTranslational Neuroscience 5:275–285.https://doi.org/10.2478/s1338001402355
Decision letter

Peter LathamReviewing Editor; University College London, United Kingdom

Barbara G ShinnCunninghamSenior Editor; Carnegie Mellon University, United States

Guillaume HennequinReviewer; University of Cambridge, United Kingdom
In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.
Acceptance summary:
This paper explores the hypothesis that OFF responses are singlecell signatures of network dynamics, and shows convincingly that a model based on recurrent interactions does a much better job explaining the data than the de facto standard model, which is based purely on single cell dynamics. In addition, the proposed recurrent network model generates transient OFF responses very similar to what is seen in experiments.
Decision letter after peer review:
Thank you for submitting your article "Population coding and network dynamics during OFF responses in auditory cortex" for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Barbara ShinnCunningham as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Guillaume Hennequin (Reviewer #2).
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission. Please aim to submit the revised version within two months.
Summary:
In this paper, Bondanelli and colleagues study the spatiotemporal structure of OFF responses in A1. They propose that these responses arise from a simple linear mechanism involving a certain form of nonnormal population dynamics. They fit such a network model to A1 data and find that it explains several important aspects of OFF responses, some, but importantly, not all, of which are also captured by a singleneuron mechanism.
Essential revisions:
We're very sorry for the long delay. That's in part because at least one of use found the paper very hard to read, and it took a long time to figure out what was going on. And as you'll see, it's not clear if we did that very well. We think the problem is that you're missing quantitative analysis; in many cases it seems that we're supposed to understand why a particular piece of analysis supported your model, but it simply wasn't clear to us why that was true. Explicit examples of this will be given below.
A major issue for us has been that you begin with a discussion of special cases of Equation (2) (last two paragraphs of page 6) but then go on to test predictions of THE model without telling us which of these special assumptions you are actually making (if any) in the production of your results (Figure 4 specifically). We believe that the first paragraph of "Testing the network mechanisms on the data" should begin with a clear description of the actual model considered (including whether your transient channels are mutually orthogonal, and how initial conditions relate to those planes).
More specifically, to summarize our current understanding: under the assumption that the v's are orthonormal, the u's are orthogonal, and u_{k} dot v_{l} = 0 for all k \ne l (not strictly true, but close enough in high dimensions), the most general solution (a generalization of Eq. 34 to an arbitrary number of modes) is
r(t) = [uperp + sum_{k} u_{k} (a_{k} + v_{k} dot u_{k} t)] exp(t)
where uperp is perpendicular to all the u_{k} and the a_{k} are arbitrary.
This appears late in Methods. But it's important, because there seem to be hidden assumptions about uperp and the a_{k}. Namely,
 the a_{k} are all set to zero
 uperp = u_{k} for only one k, with k dependent on the stimulus.
Assuming this is correct, it's a critical part of the model. And it is, in fact, an extra assumption.
Given that setup, we believe there were several predictions:
a. Lines 216228: There's a prediction about the correlational structure that we didn't fully understand. Equations are needed, in the main text, that make the predictions explicit.
b. Lines 229238: Correlations between the beginning and the peak of the off response should small. Again, equations are needed; this should be quantified.
c. Lines 239253:
"To further examine to which extent the population dynamics during OFF responses could be accounted for by a nonnormal linear dynamical system, we next fitted our network model population OFF responses to all stimuli at once using reducedrank regression (Figure S1; see Methods). Qualitatively, we found that the fit reproduced well the lowdimensional population trajectories (Figure 4C)."
As far as we could tell, this just tells us that the fit is good; it doesn't tell us anything about whether the OFF responses could be accounted for by a nonnormal linear dynamical system. Is that correct, or did we miss something? If that is correct, it seems like this paragraph is mainly a distractor.
d. Lines 254262: The fitted connectivity, J, had eigenvalues less than 1 and the symmetrized version had eigenvalues greater than 1. That's consistent with the low rank, nonnormal model. However, couldn't such a J could come from a high rank, nonnormal model? If so, it's not strong evidence for your model. This should be clarified.
e. Lines 263283: There were several results here, but it's not clear how much they supported the model. For example,
"We computed the overlap between the subspaces spanned by the transient channels for each pair of stimuli s_{1} and s_{2} , analogous to the overlap between the vectors u^{(s1)} and u^{(s2)} in Eq. (2) (see Methods). We found that the overlap between the transient channels matched the overlap between the response dynamics and population states at the end of stimulus presentation (Figure 4F, right panel), consistent with the interpretation that individual OFF responses may be generated through transient coding channels of the form given by Eq. (2)."
It's not clear what we should expect here. Is this strong or weak evidence for the model? You need to quantify what we should expect, and be clear whether this is only consistent with the model, or provides evidence for it.
"While the fraction of variance explained when using the matrix J_{Sum} was necessarily lower than the one computed using J_{Full} , the model with connectivity J_{Sum} could still explain a consistent fraction of the total variance, and yielded values of R^{2} significantly higher than the ones obtained from the shuffles."
Again, it's not clear what we should expect. Is this strong or weak evidence for the model?
f. Lines 328331: for the single cell model in Eq. 3, the correlations between the beginning and peak of the off response are high. This seems easily fixable with a very minor tweak. Suppose there are two kinds of neurons, some that peak early,
r_{i}(t) = a_{i}(s) f_{i}(t)
and some that peak late,
r_{i}(t) = b_{i}(s) g_{i}(t)
(for instance, f_{i}(t) = Theta(t) e^{t} and g_{i}(t) = Theta(t) t*e^{t}). Different stimuli (s) could activate different sets of neurons.
Alternatively, the single neuron responses could be of the form
r_{i}(t) = a_{i}(s) f_{i}(t) + b_{i}(s) g_{i}(t)
where the a's and b's are anti correlated (either a is large and b is small or vice versa) and f_{i} and g_{i} are as above.
Are these model ruled out by the data? If not, that seems to weaken the paper (although maybe there are other things wrong with these models). In either case, models like these should be discussed.
g. Lines 332353:
"A second important difference between the two models is that in the singlecell model the temporal shape of the response of each neuron is the same across all stimuli, while for the recurrent model this is not necessarily the case."
Given this opening sentence, why not just show that for the recurrent network model, the temporal shape of the response is not the same across all stimuli? Instead, there's a lot of analysis that we didn't fully understand. In particular, there's a rather sudden switch from specifying the form of L_{i}(t) (Eq. 44) and fitting L_{i}(t) (Eq. 48). What's missing is a quantitative prediction of what we should expect, followed by a demonstration that the data is consistent with that prediction.
In addition, we have some specific suggestions.
1. Lines 174177,
"We focus on two outstanding properties identified in the OFF responses to different stimuli. The first one is the strong amplification of OFF responses, as quantified by the transient increase of the distance from baseline r(t). The second important feature is the lowdimensionality of OFF response trajectories and their global structure across different stimuli."
It seems that this is all you're going to focus on, but it's not. It would be nice to know, at some point early in the manuscript, exactly what you're going to do.
2. What's the difference between Figures 2D and 4A (left panel).
3. Lines 32931,
"The singlecell model predicts that these two states are only weakly decorrelated and essentially lie along a single dimension (Figure 5E and S7), which is inconsistent with experimental observations"
Which experimental observations?
4. Throughout the paper we found it unclear how initial conditions were chosen for the model, e.g. in Figure 3. From line 167, we are led to think that they are picked from the actual data, "we model the firing activity at the end of stimulus presentation as the initial condition of the network dynamics"  since the model doesn't explicitly models the stimulus period as far as we understand (?), we infer this activity is taken from the data? But does the model even have the same number of neurons?
5. The purpose of Figure 2 is to convey that different stimuli evoke orthogonal OFF responses, and these OFF responses are low dimensional. By my calculation, for each conditional mean (over 20 trials), there are 400ms * 31.5 frame/s ~= 13 samples used to estimate the PCs and variance explained resulting in a maximum dimensionality (PCs to get to 80%) of 13. If this is the case, a more careful comparison to the crosscondition dimensionality is warranted, which accounts for sample size.
Also, we would expect there to be fewer time points on the traces in Figure 2B. How were the trialaveraged responses interpolated?
What steps were taken to insure that the same unit wasn't double counted during pooling across sessions. Mistakes in this process can contribute to inflated variance along primary PC's creating the illusion of low dimensionality.
6. Statements of relative degree of subspace overlap should be verified with a hypothesis test. For data, this can be done by resampling neurons.
7. You should justify why LDA (naive covariance assumption) is more suitable for assessing linear separability in Figure 1C than an SVM. Absent justification, an SVM should be used to get the clearest assessment of linear separability.
[Editors' note: further revisions were suggested prior to acceptance, as described below.]
Thank you for resubmitting your work entitled "Network dynamics underlying OFF responses in the auditory cortex" for further consideration by eLife. Your revised article has been evaluated by Barbara ShinnCunningham (Senior Editor) and a Reviewing Editor.
The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:
The good news is that the paper is much improved. The writing is more clear, and the analysis of dimensions of amplified trialtotrial variability is compelling. The less good news is that now that we understand it better, additional issues have come up. All can, we believe, be dealt with relatively easily.
Let's start with the data you're are trying to explain. Basically, after the offset of an auditory stimulus, there is transient amplification, with different stimuli activating nearly orthogonal subspaces.
To model this, low rank linear dynamics is used, of the form
dr/dt = sum_{k=1}^R u_{k} v_{k} r – r,
where r (=r1, r2,.…, rN) is the trial averaged activity. The network had a few additional properties,
a. The rank, R, is much less than the number of neurons, N.
b. The u's and v's are nearly orthogonal.
c. The symmetric part of the connectivity matrix must have eigenvalues with real part >1.
d. The model has no external input, so the response to a particular stimulus (after offset) depends only on the value of r immediately after stimulus offset, here denoted r0(s) where s is the stimulus.
e. For any stimulus, s, r0(s) has appreciable overlap with a small number of v_{k} (on the order of 5).
This model did an OK job explaining the data. In particular, Figure 4H, left (the model), is supposed to match Figure 2D (the data). But the match is not so good: the offdiagonal channel overlap in Figure 2D is near zero, whereas it's around 0.5 in Figure 4H. And the full model explains only about half the variance (Figure 4H, Right).
In addition, you claim that the dynamics is autonomous. The test for this is described on lines 2519:
"We first tested the starting assumption that the OFF response trajectories are consistent with linear, autonomous network dynamics driven only by the initial pattern of activity and network interactions. If that is the case, the OFF response dynamics are fully determined by the initial activity at stimulus offset (t = 0). This assumption implies that the geometric structure of the initial activity states r_{0}^{(s)} fully predicts the geometric structure of the dynamics r^{(s)}(t) during the OFF responses (see Materials and methods). In particular, two stimuli s_{1} and s_{2} corresponding to correlated initial states r_{0}^{(s1)} and r_{0}^{(s2)} will lead to correlated trajectories r^{(s1)}(t) and r^{(s2)}(t) in the model. Conversely, if two OFFresponse trajectories r^{(s1)}(t) and r^{(s2)}(t) are orthogonal, the initial states r_{0}^{(s1)} and r_{0}^{(s}_{2}^{)} are necessarily orthogonal."
We're pretty sure that isn't correct: correlations are preserved only if the dynamics causes pure rotation at constant angular speed, something that doesn't happen in general, even for linear dynamics.
And finally, Figure 4E, right, shows that many u's and v's are not orthogonal  judging from that figure, on the order of 1020%. These terms were of the form
u1 v2 v1 – u2 v1 v2
where v1 and v2 are nearly orthogonal. You claim that terms like this can't lead to transient amplification. That's true if u_1=u_2. But if u_1 and u_2 are sufficiently different, it's not true. But maybe u_1 \approx u_2 in the data? If so, that should be clear. In any case, this seems like a detail  it's certainly possible to do the analysis with these terms included.
The lack of orthogonality between the u's and v's doesn't seem so serious. But the lack of agreement between Figures 2D and 4H, combined with no strong evidence for autonomous dynamics, seems potentially problematic. Of course, no model ever fits perfectly. But it's hard to figure out exactly what the takehome message is. If there was strong evidence that the dynamics is autonomous, that would help  at least we would know that the intrinsic dynamics is responsible for the transient amplification.
As a potentially important aside: the eigenvalue spectrum of Figure 4D shows one marginally stable, nonoscillatory mode. Based on past experience of one of the reviewers, it may be that the lack of agreement between Figures 2D and 4H is because of this mode. The reasoning is that if this slow mode is present by a small amount in all initial conditions, it will generate the fairly weak overlap in Figure 2D, but because it sticks, it might end up dominating the late offresponses for each stimulus after all the rest has decayed away. If that's the case, then this mode will be part of the principal subspace extracted for each stimulus. The measure of overlap based on "subspace angle" is very sensitive to this, which could explain the large overlaps in Figure 4H. Importantly, that marginally stable eigenvalue may be an artifact of insufficient regularization. There are a lot of "ifs" in the above reasoning, but if this mode can be eliminated, perhaps by different regularization, agreement may improve. We don't know if that's possible, but it's worth thinking about.
There is a lot of very nice analysis in the paper, and all reviewers agree that it should be published. However, it needs a clear takehome message. Exactly what that is depends, we believe, on how strongly the evidence is for autonomous dynamics. If that's very strong, then at the very least the takehome message is:
"Off cells show transient amplification that is generated by internal dynamics, not external input. That transient dynamics is mildly consistent with low rank connectivity in which dynamics can mainly be reduced to multiple 2D systems. However, we can't rule out more complex behavior, with transient amplification fundamentally involving several dimensions."
Possibly you can come up with a stronger conclusion, but that's the best we could come up with.
If the evidence for autonomous dynamics is weak, that seems to be a fundamental problem, since the transient amplification may be caused solely by external input. But presumably that can never be ruled out  a fact that you should probably mention. It's not a plausible scenario, so we don't think it will weaken the paper..
Our main comment, then, is to explore more carefully whether or not dynamics is autonomous, and/or be more careful about the conclusions. One possible way to test for autonomous dynamics is to plot the trajectories and look for intersections, or near intersections  what Mark Churchland calls tangled trajectories, for which quantification is possible. Absence of near intersections would at least be consistent with autonomous dynamics. But there may be better methods. For instance, you might try LFADS: you can fit both an RNN and a linear dynamical model, and see which one fits better. That will, though, be a huge amount of work, and you should do it only if you want to.
In addition, we suggest exploring a richer singleneuron model. The current model has the form
r_{i}^s(t) = r_{0i}^s L_{i}(t).
If r_{i} is nonnegative, this can be written
dr_i/dt = f_{i}(t) r_{i},
with r_{i}(t=0) a function of the stimulus. It would be more convincing to consider a model with a nonlinearity; say
dr_{i}/dt = f_{i}(t) g_{i}(r_{i}).
The nonlinearity g_{i} could be parameterized with, probably, only a handful of parameters. For instance, it could be written as a sum of basis functions, much the way L_{i} was parameterised. The singlecell model was rejected because it explained only about 20% of the variance, versus 50% for the recurrent network. It would be reassuring if adding a nonlinearity didn't bring this up to 50%. Again, this also is only a suggestion. However, if you stick with the current single cell model, you'll need to frame your comparison less generally than "recurrent versus single cell".
https://doi.org/10.7554/eLife.53151.sa1Author response
Essential revisions:
We're very sorry for the long delay. That's in part because at least one of use found the paper very hard to read, and it took a long time to figure out what was going on. And as you'll see, it's not clear if we did that very well. We think the problem is that you're missing quantitative analysis; in many cases it seems that we're supposed to understand why a particular piece of analysis supported your model, but it simply wasn't clear to us why that was true. Explicit examples of this will be given below.
We thank the reviewers for a very constructive feedback which has allowed us to significantly improve the manuscript. Following the reviewers’ suggestions we have rewritten large parts of the Results and Methods. Here we first briefly list the main analyses we added. Further below, we explain the new structure of the manuscript, and how the new analyses address the reviewers’ comments.
List of new analyses:
– We theoretically derived quantitative predictions on the structure of the network connectivity for a general l owrank connectivity matrix and tested these predictions on the AC data (Figure 4E, Methods Section 2.3, 2.7);
– We derived quantitative predictions for the relationship between the structure of the network connectivity and the population vector at stimulus offset and tested these predictions on AC data (Figure 4F, Methods Section 2.3);
– We derived quantitative prediction for the value of the correlation between the population activity vectors at stimulus offset and at the time of maximum distance from baseline (Figure 4G, Methods Section 2.5);
– We added a method section illustrating the prediction for he correlation between the structure of the population vectors at stimulus offset and dynamical trajectories (Figure 4A) f or a general lowrank network model (Methods Section 2.4);
– We adopted a different approach when comparing the recurrent model with the singlecell model. In the new version of the manuscript, we fit directly the form of the temporal responses of the single units L_{i}(t) to the data (we correspondingly updated Figure 5BC; Methods Section 3.1). We then compare the predictions of the two models on AC data (Figure 5F).
– We added a new (unrequested) analysis comparing the predictions of the recurrent and singlecell model on the population dynamics across singletrials (Figure 6, Methods Section 3.2, 3.3). We show that the two models yield quantitatively different predictions for the structure of the trialtotrial response variability, and that the AC data i s consistent with the recurrent model and not with the singlecell model. This adds substantial new evidence i n favor of the recurrent model.
– We added controls f or the principal component analysis of the OFF responses to individual stimuli and across stimuli (Figure S1, Methods Section 1.4), and for the values of subspace overlap (i.e. Figure 2D) and correlation between i nitial and peak states (i.e. Figure 4G). These controls are described i n detail i n Methods Sections 1.6 and shown i n Figure S7, S8.
A major issue for us has been that you begin with a discussion of special cases of Equation (2) (last two paragraphs of page 6) but then go on to test predictions of THE model without telling us which of these special assumptions you are actually making (if any) in the production of your results (Figure 4 specifically). We believe that the first paragraph of "Testing the network mechanisms on the data" should begin with a clear description of the actual model considered (including whether your transient channels are mutually orthogonal, and how initial conditions relate to those planes).
Following the reviewers’ suggestion, we have totally rewritten the subsection “Recurrent network mechanism” (and the corresponding Methods) and most of the subsections “Testing the network mechanism” and “Comparison with a singlecell model”.
To help the reviewers navigate the changes, here we provide a summary of the new organization of the manuscript.
First two subsections of Results: we use population analysis of calcium activity identify three main features of OFF responses in the auditory cortex:
F1. OFF responses correspond to transiently amplified trajectories;
F2. responses to individual stimuli lie in lowdimensional subspaces;
F3. responses to different stimuli lie in largely orthogonal subspaces.
Subsection “Recurrent network mechanism”: we propose a recurrent network model f or OFF responses based on only two hypotheses:
H1. dynamics are driven exclusively by recurrent interactions (Eq. 1), and the only effect of each stimulus s is to set the initial state r_{0}^{s} that corresponds to activity at stimulus offset;
H2. the recurrent connectivity is of l owrank type, i.e. characterized by a set of connectivity vectors u^{r} and v_{r} (Eq. 1).
We then systematically derive a set of sufficient conditions on vectors u^{r}, v^{r} and r_{0}^{s} to reproduce the observed features F1F3. These sufficient conditions provide corresponding predictions P1P3 that can be tested on the data.
Subsection “Testing the network mechanism”: we systematically test on the data the hypotheses H1 and H2, as well as the predictions P1P3. The structure of this subsection now mirrors the structure of “Recurrent network mechanism”.
Subsection “Comparison with a singlecell model”: we first compare predictions of the fitted models on trialaveraged data; we then compare new predictions f or trialtotrial variability.
More specifically, to summarize our current understanding: under the assumption that the v's are orthonormal, the u's are orthogonal, and u_{k} dot v_{l} = 0 for all k \ne l (not strictly true, but close enough in high dimensions), the most general solution (a generalization of Eq. 34 to an arbitrary number of modes) is
r(t) = [uperp + sum_{k} u_{k} (a_{k} + v_{k} dot u_{k} t)] exp(t)
where uperp is perpendicular to all the u_{k} and the a_{k} are arbitrary.
This appears late in Methods. But it's important, because there seem to be hidden assumptions about uperp and the a_{k}. Namely,
– the a_{k} are all set to zero
– uperp = u_{k} for only one k, with k dependent on the stimulus.
Assuming this is correct, it's a critical part of the model. And it is, in fact, an extra assumption.
The new version of the text clarifies i n detail the questions raised by the reviewers. In short, we make no a priori assumptions on vectors u^{r}, v^{r} and r_{0}^{s} (the initial state) other than that the v's are orthonormal, the u's are orthogonal, but there is no loss of generality there (such u’s and v’s can be uniquely specified by an SVD of the connectivity matrix).
We then derive sufficient conditions on u+ and v^{r} (u_{k}, v_{k} in the reviewers' notation) and r_{0} to reproduce F1F3, and find i n particular:
– An upper bound on u^{r} dot v^{s} (always satisfied if u^{r} dot v_{s} = 0 for all r,s) to have transient amplification (F1)
– Conditions on r_{0} with respect to u^{r}, v^{r}, that correspond to conditions on uperp and a_{k} in the reviewers’ notation
We next fit the network model on the data (without specific constraints on u^{r}, v^{r} and r_{0}^{s}), and check whether the predicted constraints on u^{r}, v^{r} and r_{0}^{s} hold.
The methods now start with equations equivalent to the general solution put forward by the reviewers (Eq. 33 i n a unitrank example, and Eq. 35 for arbitrary rank).
Given that setup, we believe there were several predictions:
a. Lines 216228: There's a prediction about the correlational structure that we didn't fully understand. Equations are needed, in the main text, that make the predictions explicit.
The goal was simply to test H1, i.e. that the states of initial activity at stimulus offset determine the ensuing dynamics. We have rewritten that part of the text, and added a detailed section i n the Methods (Section 2.4, "Correlation between structure of initial states and structure of the dynamics").
b. Lines 229238: Correlations between the beginning and the peak of the off response should small. Again, equations are needed; this should be quantified.
In the new organization of the text, this prediction now appears later. We have now derived a quantitative prediction for the upper bound of the correlation between initial state and peak response (Methods Section 2.5, "Correlation between initial and peak state"), and added a panel (Figure 4G) demonstrating this prediction is quantitatively satisfied.
c. Lines 239253:
"To further examine to which extent the population dynamics during OFF responses could be accounted for by a nonnormal linear dynamical system, we next fitted our network model population OFF responses to all stimuli at once using reducedrank regression (Figure S1; see Methods). Qualitatively, we found that the fit reproduced well the lowdimensional population trajectories (Figure 4C)."
As far as we could tell, this just tells us that the fit is good; it doesn't tell us anything about whether the OFF responses could be accounted for by a nonnormal linear dynamical system. Is that correct, or did we miss something? If that is correct, it seems like this paragraph is mainly a distractor.
The aim of this paragraph is twofold: (i) test H1, i.e. show an autonomous dynamical system is a good model of the data, using the approach introduced in Elsayed and Cunningham (2017) (ii) determine the effective connectivity matrix, and extract vectors u^{r} and v^{r}, which are then used to test the predictions on the relationship between u^{r}, v^{r} and r_{0}^{s} (Figure 4 F and F). We have now rephrased that part (and reorganized the whole subsection).
d. Lines 254262: The fitted connectivity, J, had eigenvalues less than 1 and the symmetrized version had eigenvalues greater than 1. That's consistent with the low rank, nonnormal model. However, couldn't such a J could come from a high rank, nonnormal model? If so, it's not strong evidence for your model. This should be clarified.
The reviewers are correct, the examination of the eigenvalues tests only the predictions that ensue from H1 (autonomous dynamical system), not H2 (lowrank structure). So this test provides only evidence for H1, not H2, and we have clarified that i n the new version of the text. H2 and the ensuing predictions are tested separately.
e. Lines 263283: There were several results here, but it's not clear how much they supported the model. For example,
"We computed the overlap between the subspaces spanned by the transient channels for each pair of stimuli s_{1} and s_{2} , analogous to the overlap between the vectors u^{(s1)} and u^{(s2)} in Eq. (2) (see Methods). We found that the overlap between the transient channels matched the overlap between the response dynamics and population states at the end of stimulus presentation (Figure 4F, right panel), consistent with the interpretation that individual OFF responses may be generated through transient coding channels of the form given by Eq. (2)."
It's not clear what we should expect here. Is this strong or weak evidence for the model? You need to quantify what we should expect, and be clear whether this is only consistent with the model, or provides evidence for it.
Here we test the predictions that result from the requirement that the model reproduces F3 (orthogonal trajectories in response to different stimuli). The corresponding prediction has now been made explicit in the “Recurrent network mechanism” subsection. Specifically, we test Eqs. 34.
"While the fraction of variance explained when using the matrix J=Sum was necessarily lower than the one computed using J=Full , the model with connectivity J=Sum could still explain a consistent fraction of the total variance, and yielded values of R^{2} significantly higher than the ones obtained from the shuffles."
Again, it's not clear what we should expect. Is this strong or weak evidence for the model?
We have now rephrased this part. If the prediction (Eqs.34) i s exact, the explained variance should be the same i f the model was f it on the responses to all stimuli at once, or separately on responses to different stimuli. We find that the explained variance i s slightly lower i n the second case, but still much larger than chance (as could be expected, since the responses are orthogonal to many, but not all stimuli). Hence Eq.3 i s not exact, but a reasonable approximation. The strength of the evidence f or the model i s therefore directly quantified by the value of R^{2} when fitting the model separately to different stimuli.
f. Lines 328331: for the single cell model in Eq. 3, the correlations between the beginning and peak of the off response are high. This seems easily fixable with a very minor tweak. Suppose there are two kinds of neurons, some that peak early,
r_{i}(t) = a_{i}(s) f_{i}(t)
and some that peak late,
r_{i}(t) = b_{i}(s) g_{i}(t)
(for instance, f_{i}(t) = Theta(t) e^{t} and g_{i}(t) = Theta(t) t*e^{t}). Different stimuli (s) could activate different sets of neurons.
Alternatively, the single neuron responses could be of the form
r_{i}(t) = a_{i}(s) f_{i}(t) + b_{i}(s) g_{i}(t)
where the a's and b's are anti correlated (either a is large and b is small or vice versa) and f_{i} and g_{i} are as above.
Are these model ruled out by the data? If not, that seems to weaken the paper (although maybe there are other things wrong with these models). In either case, models like these should be discussed.
In the original manuscript, the correlations between initial and peak activity were assessed by assuming a specific form of the singleneuron response, without constraining i t by the data. Rather than explore this quantity for various assumptions on the singleneuron responses, we decided to first fit the singleneuron responses to data, and then compute the resulting correlations between initial and peak activity. Figure 5F shows that these correlations are much higher than on actual activity trajectories. This clearly demonstrates that the single model does not capture the decorrelation present in the data. (note that the second form suggested by the reviewers, r_{i}(t) = a_{i}(s) f_{i}(t) + b_{i}(s) g_{i}(t), is not consistent with our hypothesis that a singlecell response depends only on the activity of the cell at stimulus offset; it is instead equivalent to a unitrank network).
We moreover added a new analysis and figure (Figure 6) comparing the predictions of the recurrent and singlecell mechanisms for trialtotrial fluctuations. The results clearly indicate that the singlecell model is inconsistent with the data, and provide significant new evidence in favor of the network model.
g. Lines 332353:
"A second important difference between the two models is that in the singlecell model the temporal shape of the response of each neuron is the same across all stimuli, while for the recurrent model this is not necessarily the case."
Given this opening sentence, why not just show that for the recurrent network model, the temporal shape of the response is not the same across all stimuli? Instead, there's a lot of analysis that we didn't fully understand. In particular, there's a rather sudden switch from specifying the form of L_{i}(t) (Eq. 44) and fitting L_{i}(t) (Eq. 48). What's missing is a quantitative prediction of what we should expect, followed by a demonstration that the data is consistent with that prediction.
We thank the reviewers for this important comment. In the new version of the manuscript, we have totally removed the part where we assumed a specific form of singleneuron responses. We now fit i t to the data right away.
Following the reviewers’ suggestion, we also show that in the network model the temporal shape of the response is not the same across stimuli (Figure 3E).
In addition, we have some specific suggestions.
1. Lines 174177,
"We focus on two outstanding properties identified in the OFF responses to different stimuli. The first one is the strong amplification of OFF responses, as quantified by the transient increase of the distance from baseline r(t). The second important feature is the lowdimensionality of OFF response trajectories and their global structure across different stimuli."
It seems that this is all you're going to focus on, but it's not. It would be nice to know, at some point early in the manuscript, exactly what you're going to do.
As mentioned above, we have now completely reorganized the manuscript to make clear we focus on three features identified i n the data (F1F3 listed above). These features are now clearly listed in the abstract, introduction and provide the guiding line throughout the results.
2. What's the difference between Figures 2D and 4A (left panel).
Figure 2D displays the correlation matrix between the subspaces explored during the OFF responses to different stimuli. Figure 4A displays the correlation matrix of the initial states (=activity at stimulus offset, not included in the OFF responses). We have now clarified this i n the text and figure legends.
3. Lines 32931,
"The singlecell model predicts that these two states are only weakly decorrelated and essentially lie along a single dimension (Figure 5E and S7), which is inconsistent with experimental observations"
Which experimental observations?
We have rephrased this sentence. As mentioned above, in the new version of the manuscript, we don't fix the shape of the singlecell responses, but instead fit them directly to the data. We compute the correlation between initial state and peak state from the fitted responses when fitting the singlecell model to a progressively increasing number of stimuli. We found that the value of the correlation predicted by the singlecell model deviated from the experimental value as more stimuli were included in the fit, in contrast to the recurrent model, which predicted a value of the correlation consistent with data for any number of fitted stimuli (Figure 5F).
4. Throughout the paper we found it unclear how initial conditions were chosen for the model, e.g. in Figure 3. From line 167, we are led to think that they are picked from the actual data, "we model the firing activity at the end of stimulus presentation as the initial condition of the network dynamics"  since the model doesn't explicitly models the stimulus period as far as we understand (?), we infer this activity is taken from the data? But does the model even have the same number of neurons?
Figure 3 is only an illustration of the model (with rank=1), where the initial conditions r_{0} are picked to satisfy the conditions derived on the relationship between u_{r}, v_{r} and r_{0} to reproduce F1F3. Once we turn to testing the predictions on the data, the initial conditions are taken from the data: for each stimulus they correspond to the calcium activity recorded at stimulus offset, corresponding to 50ms before the termination of the stimulus. Prior to fitting the model, we applied dimensionality reduction (PCA) to the data to avoid overfitting, so that the dimensionality of the fitted model i s equal to the number of principal components. This has been specified i n Section 1.7 of the Methods "Linear dynamical system fit".
5. The purpose of Figure 2 is to convey that different stimuli evoke orthogonal OFF responses, and these OFF responses are low dimensional. By my calculation, for each conditional mean (over 20 trials), there are 400ms * 31.5 frame/s ~= 13 samples used to estimate the PCs and variance explained resulting in a maximum dimensionality (PCs to get to 80%) of 13. If this is the case, a more careful comparison to the crosscondition dimensionality is warranted, which accounts for sample size.
We thank the reviewers for pointing this out. We have now added two types of controls for the principal component analysis, which aim at controlling respectively for sample size and trialtotrial response variability. To control f or sample size we performed PCA on the trialaveraged responses to individual stimuli using the raw data (13 time points). Next we performed PCA on the responses where we shuffled cell labels independently for each time point, thus preserving the firing rate statistics at each time point, but removing temporal correlations i n the response. We found that the dimensionality of the shuffled response was higher than the dimensionality of the real responses, indicating that the value of the dimensionality obtained for individual stimuli is not the direct consequence of sample size. These results are shown in Figure S1 A.
To account for trialtotrial variability i n the response we performed a second control which uses crossvalidated PCA. This approach (Stringer et al. 2018) provides an unbiased estimate of the signal variance by computing the covariance between a training and a test set of responses (e.g. two different trials) to an identical collection of stimuli. We show that the same number of dimensions obtained with ordinary PCA explains 80% of variance when the trialtotrial variability of the responses to a fixed stimulus i s taken into account. We describe the details of this procedure in the Methods, Section 1.4 "Crossvalidated PCA", and i n Figure S1 BC.
Also, we would expect there to be fewer time points on the traces in Figure 2B. How were the trialaveraged responses interpolated?
Indeed, the time points were interpolated using a Gaussian filter with standard deviation equal to 32ms. This is specified in Section 1.1 "Neural recordings".
What steps were taken to insure that the same unit wasn't double counted during pooling across sessions. Mistakes in this process can contribute to inflated variance along primary PC's creating the illusion of low dimensionality.
Recordings in different sessions were performed at different depths, typically with a 50 micron difference (never less than 20 microns). Since the soma diameter is 15 microns, this ensured different cells were recorded in different sessions. We now specify this in the Methods (Section 1.1).
6. Statements of relative degree of subspace overlap should be verified with a hypothesis test. For data, this can be done by resampling neurons.
We thank the reviewers for pointing this out. We performed two types of statistical tests for assessing the significance of the orthogonality between responses to different stimuli, reflected in l ow values of the subspace overlap (Figure 2D). The two controls correspond to two different null hypotheses.
The first hypothesis is that low values of the subspace overlap may be due to the high dimensionality of the statespace in which they are embedded (e.g. the correlation between any pair of random vectors in high dimension is likely to be low). To test for this hypothesis we compared the subspace overlap computed f or each pair of stimulis 1 and s2 on the real trialaveraged responses with the subspace overlap computed on the responses where the stimulus labels s1 and s2 had been shuffled across trials, and then averaged over trials. This yielded pairs of responses that are embedded in the same dimensions of the real responses but are otherwise random. We found that f or most of the stimuli the subspace overlap was significantly lower for the real responses than f or the shuffled responses, indicating that l ow values of the subspace overlap were not a direct consequence of the high dimensionality of the state space.
The second null hypothesis is that l ow values of the subspace overlap might be an artifact of the trialtotrial variability present in the calcium activity data. To test for this possibility, for each pair of stimuli we computed the values of the subspace overlap by computing the trialaveraged activity on only half of the trials (10 trials), subsampling the set of 10 trials multiple times for each pair of stimuli. We then compared these values with the values of the subspace overlap between the trialaveraged responses to the same stimulus, but averaged over two different sets of 10 trials each, over multiple permutations of the trials (if the trialtotrial variability was l ow, these values should be close to 1 since we compute it on a single stimulus, but should decrease as a function of the variability). We found that for most of the stimuli the subspace overlap was significantly lower for the real responses than f or the controls, indicating that l ow values of the subspace overlap were not a direct consequence of the trialtotrial response variability.
The same type of controls have been applied to test f or the decorrelation between the initial state and peak state (see Figure 4G). We describe the controls in detail in the Methods, Section 1.6 "Controls for subspace overlaps and initial statepeak correlations", while the results are reported in Figure S7S8.
7. You should justify why LDA (naive covariance assumption) is more suitable for assessing linear separability in Figure 1C than an SVM. Absent justification, an SVM should be used to get the clearest assessment of linear separability.
We do not make a strong statement on linear separability in Figure 1C. LDA assuming naive covariance was used simply for convenience, and for consistency with previous studies that did not quantify the impact of noise correlations on decoding (Mazor and Laurent 2005, Saha et al. 2014).
[Editors' note: further revisions were suggested prior to acceptance, as described below.]
The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:
The good news is that the paper is much improved. The writing is more clear, and the analysis of dimensions of amplified trialtotrial variability is compelling. The less good news is that now that we understand it better, additional issues have come up. All can, we believe, be dealt with relatively easily.
We thank the reviewers for detailed and constructive feedback. We have significantly revised the manuscript to address the reviewers’ comments.
A major comment was that the paper lacked a clear take home message. Our main message is in fact very simple: A1 OFF responses across stimuli are better described by a network mechanism than by a singlecell mechanism. This is an important message, because the whole OFFresponse literature is based on singlecell mechanisms (see KoppScheinpflug, Sinclair and Linden, Trends in Neuroscience 2018), and a network model has been lacking.
The reviewers’ feedback however made us realize that this message was not communicated in a sufficiently clear manner. We have therefore thoroughly reorganized the manuscript to make this message more prominent.
The new organisation of the paper is the following:
1. The first part is unchanged: we start with modelfree analyses of auditory cortex data.
2. We now first discuss the singlecell model, and show that it describes well responses to a single stimulus, but not responses across stimuli.
3. We then introduce the network model, and show that it accounts better for the responses across stimuli than the singlecell model.
4. We next analyze the structure of the network model, which leads to additional predictions we verify on the data. A subsection of this part has also been substantially revised following the reviewers’ feedback (see below).
As a consequence of this reorganization, the figures have also been thoroughly rearranged, and several panels have been added:
– Figure 1: unchanged.
– Figure 2: panel E (previously in former Figure 4) has been added.
– Figure 3 contains the fit of the singlecell and network models (previously in former Figure 5 and Figure 4 respectively). We added one more example neuron to panel E. We added panel G (distance from baseline for the fitted network model) and panel I (subspace overlaps for the fitted network model).
– Figure 4: the spectra of the fitted connectivity matrix J and of its symmetric part J_{S} have been recomputed using ridge regression (without lowrank constraints), and extending the temporal window of the OFF response from 350 to 600 ms to get a stable connectivity matrix. To evaluate uncertainty in the connectivity spectra, we now compute the spectra over multiple neuron subsamplings.
– Figure 5 (former Figure 3) has been updated, and now illustrates the dynamics of orthogonal rank2 rotational channels.
– Figure 6 illustrates the structure of the connectivity fitted to individual stimuli, and the predictions ensuing from this structure. We added panel A showing the goodness of fit computed using reducedrank regression normalized by its value computed when using ordinary least squares regression. We added panel C(left) (condition for amplification for orthogonal rotational channels) and removed former Figure 4F(right). Former Figure 5F has been moved to Figure 6E.
– Figure 7 contains the results shown in former Figure 4H. Panel B has been added (correlation between subspace overlap and overlap between transient channels).
– Figure 8 (former Figure 6): the panels have been reorganized: following the structure of the main text, the data panels are now shown before the model panels.
– Former Figure 4C has been removed and is now shown in Figure 3Supplementary3.
– Figure 3Supplementary1 has been updated to include up to 16 stimuli.
Below we reply to the details of the reviewers’ comments.
Let's start with the data you're are trying to explain. Basically, after the offset of an auditory stimulus, there is transient amplification, with different stimuli activating nearly orthogonal subspaces.
To model this, low rank linear dynamics is used, of the form
dr/dt = sum_{k=1}^R u_{k} v_{k} r – r,
where r (=r1, r2,.…, rN) is the trial averaged activity. The network had a few additional properties,
a. The rank, R, is much less than the number of neurons, N.
b. The u's and v's are nearly orthogonal.
c. The symmetric part of the connectivity matrix must have eigenvalues with real part >1.
d. The model has no external input, so the response to a particular stimulus (after offset) depends only on the value of r immediately after stimulus offset, here denoted r0(s) where s is the stimulus.
e. For any stimulus, s, r0(s) has appreciable overlap with a small number of v_{k} (on the order of 5).
This model did an OK job explaining the data. In particular, Figure 4H, left (the model), is supposed to match Figure 2D (the data). But the match is not so good: the offdiagonal channel overlap in Figure 2D is near zero, whereas it's around 0.5 in Figure 4H.
We thank the reviewers for pointing out the inconsistency between the former Figure 4H and Figure 2D. We have traced it back to a mismatch between preprocessing parameters, and updated former Figure 4H (now Figure 7A) so that it now matches much better Figure 2D.
As indicated in the Methods (sections "Fitting the network model" and "Analysis of the transient channels"), before fitting the network model, we denoise the data using a PCA. In the original Figure 4H, the total number of dimensions (PC dimensionality) after denoising was set to D=50, which was too low compared to the dimensionality of responses to individual stimuli, and introduced correlations between the fitted lowrank components. Increasing this preprocessing parameter to D=100 and setting the rank parameter to R = 5 instead leads to a high match between Figure 2D and Figure 4H (new Figure 7C, coeff. of determination >0.7).
And the full model explains only about half the variance (Figure 4H, Right).
Our main argument for the network model is not based on the absolute value of explained variance, but on the fact that the explained variance does not increase, in contrast to the singlecell model.
The total variance explained by our model is consistent with previous reports from the literature that used similar methods. Fitting a linear dynamical system to motor cortical population activity yielded R^{2} values between 0.6 and 0.8 (see Elsayed et al. 2017, Lara et al. 2018). In our case, for D=100, R=5 and λ=1 for individual stimuli, we found that the crossvalidated (10fold) goodness of fit, as quantified by the coefficient of determination R^{2}, was equal to 0.65. Note also that we used calcium recordings, which are significantly more noisy than electrophysiological recordings used in Elsayed et al. 2017, Lara et al. 2018.
In addition, you claim that the dynamics is autonomous. The test for this is described on lines 2519:
"We first tested the starting assumption that the OFF response trajectories are consistent with linear, autonomous network dynamics driven only by the initial pattern of activity and network interactions. If that is the case, the OFF response dynamics are fully determined by the initial activity at stimulus offset (t = 0). This assumption implies that the geometric structure of the initial activity states r_{0}^{(s)} fully predicts the geometric structure of the dynamics r^{(s)}(t) during the OFF responses (see Materials and methods). In particular, two stimuli s_{1} and s_{2} corresponding to correlated initial states r_{0}^{(s1)} and r_{0}^{(s2)} will lead to correlated trajectories r^{(s1)}(t) and r^{(s2)}(t) in the model. Conversely, if two OFFresponse trajectories r^{(s1)}(t) and r^{(s2)}(t) are orthogonal, the initial states r_{0}^{(s1)} and r_{0}^{(s}_{2}^{)} are necessarily orthogonal."
We're pretty sure that isn't correct: correlations are preserved only if the dynamics causes pure rotation at constant angular speed, something that doesn't happen in general, even for linear dynamics.
We apologize, this paragraph appears to have caused a lot of confusion, and we have now removed it. Our goal was in fact to simply show that the OFF responses depend on the initial state of the network at the end of stimulus presentation. This is consistent with a dynamical mechanism, as has been argued previously in the motor cortex (Churchland et al. 2012), but does not exclude a contribution from external inputs. Since this aspect is consistent with both singlecell and network models, we have now moved the previous Figure 4A to modelfree analysis in Figure 2.
We did not intend to claim to demonstrate that the OFFresponse dynamics in the auditory cortex are autonomous. In fact the word “autonomous” was used only twice in the Results (and only in the paragraph copied above), and we have now completely removed it, as technically the singlecell model we use is not an autonomous dynamical system.
A paragraph in the Discussion (“A major assumption of our model…”) clarifies that we cannot exclude contributions from external inputs during the OFF responses, and do not claim the dynamics are purely autonomous.
The reviewers are absolutely correct that linear dynamics do not in general preserve correlations – additional constraints are needed. The relationship between initial conditions and subsequent responses found in the data is nontrivial. We have now added a plot showing that the fitted network model accounts for this observation (new Figure 3I).
And finally, Figure 4E, right, shows that many u's and v's are not orthogonal  judging from that figure, on the order of 1020%. These terms were of the form
u1 v2 v1 – u2 v1 v2
where v1 and v2 are nearly orthogonal. You claim that terms like this can't lead to transient amplification. That's true if u_1=u_2. But if u_1 and u_2 are sufficiently different, it's not true. But maybe u_1 \approx u_2 in the data?If so, that should be clear. In any case, this seems like a detail  it's certainly possible to do the analysis with these terms included.
The reviewers are absolutely correct. We have now thoroughly reorganized this part of the results and the corresponding methods. Specifically:
– We do not assume anymore orthogonality between the u’s and v’s.
– Instead, based on the properties of the fitted model, we now test the hypothesis that the network can be approximated as a sum of orthogonal rank2 channels of the form u1 v2 v1 – u2 v1 v2
– We now describe under which conditions such channels lead to amplified dynamics (new Figure 5). In particular, as pointed out by the reviewers, a key condition is  u1u2 >2.
– We then test those conditions in the data, in particular in the new panel Figure 5C.
The lack of orthogonality between the u's and v's doesn't seem so serious. But the lack of agreement between Figures 2D and 4H, combined with no strong evidence for autonomous dynamics, seems potentially problematic. Of course, no model ever fits perfectly. But it's hard to figure out exactly what the takehome message is. If there was strong evidence that the dynamics is autonomous, that would help  at least we would know that the intrinsic dynamics is responsible for the transient amplification.
As pointed out at the beginning of the reply, our claim is not that the dynamics are autonomous. Our main message is that A1 OFF responses across stimuli are better described by a network mechanism than by a singlecell mechanism.
The manuscript has been thoroughly reorganized to better communicate this message.
As a potentially important aside: the eigenvalue spectrum of Figure 4D shows one marginally stable, nonoscillatory mode. Based on past experience of one of the reviewers, it may be that the lack of agreement between Figures 2D and 4H is because of this mode. The reasoning is that if this slow mode is present by a small amount in all initial conditions, it will generate the fairly weak overlap in Figure 2D, but because it sticks, it might end up dominating the late offresponses for each stimulus after all the rest has decayed away. If that's the case, then this mode will be part of the principal subspace extracted for each stimulus. The measure of overlap based on "subspace angle" is very sensitive to this, which could explain the large overlaps in Figure 4H. Importantly, that marginally stable eigenvalue may be an artifact of insufficient regularization. There are a lot of "ifs" in the above reasoning, but if this mode can be eliminated, perhaps by different regularization, agreement may improve. We don't know if that's possible, but it's worth thinking about.
We thank the reviewer suggesting this possibility. As explained above, the disagreement between former Figure 2D and former Figure 4H (new Figure 7A) was due to an incorrect setting in the preprocessing. Changing the PC dimensionality from 50 (as in the previous version of the manuscript) to 100 (new version) fixed the disagreement.
As noted by the reviewers, there was indeed a marginally stable eigenvalue in the full spectrum of J. We traced this back to the fact that, for the longest stimuli, the intertrial interval is too short, and the OFF responses do not have enough time to decay to baseline. Excluding the longest stimuli, and extending the analyzed window eliminates the marginal eigenvalues (updated Figure 4).
There is a lot of very nice analysis in the paper, and all reviewers agree that it should be published. However, it needs a clear takehome message. Exactly what that is depends, we believe, on how strongly the evidence is for autonomous dynamics. If that's very strong, then at the very least the takehome message is:
"Off cells show transient amplification that is generated by internal dynamics, not external input. That transient dynamics is mildly consistent with low rank connectivity in which dynamics can mainly be reduced to multiple 2D systems. However, we can't rule out more complex behavior, with transient amplification fundamentally involving several dimensions."
Possibly you can come up with a stronger conclusion, but that's the best we could come up with.
If the evidence for autonomous dynamics is weak, that seems to be a fundamental problem, since the transient amplification may be caused solely by external input. But presumably that can never be ruled out  a fact that you should probably mention. It's not a plausible scenario, so we don't think it will weaken the paper..
As pointed out at the beginning of the reply, our claim is not that the dynamics are autonomous. While we show evidence that initial conditions play a role in the subsequent dynamics, we cannot exclude that external inputs are present. This is explicitly acknowledged in the Discussion.
Our main message is instead that A1 OFF responses across stimuli are better described by a network mechanism than by a singlecell mechanism. As summarized at the top of the response, we have thoroughly reorganized the manuscript to better communicate this message.
Our main comment, then, is to explore more carefully whether or not dynamics is autonomous, and/or be more careful about the conclusions. One possible way to test for autonomous dynamics is to plot the trajectories and look for intersections, or near intersections  what Mark Churchland calls tangled trajectories, for which quantification is possible. Absence of near intersections would at least be consistent with autonomous dynamics. But there may be better methods. For instance, you might try LFADS: you can fit both an RNN and a linear dynamical model, and see which one fits better. That will, though, be a huge amount of work, and you should do it only if you want to.
We thank the reviewers for this suggestion. We have however decided not to follow this route, because our main claim is not that the dynamics are purely autonomous, as explained above.
In addition, we suggest exploring a richer singleneuron model. The current model has the form
r_{i}^{s}(t) = r_{0i}^s L_{i}(t).
If r_{i} is nonnegative, this can be written
dr_{i}/dt = f_{i}(t) r_{i},
with r_{i}(t=0) a function of the stimulus. It would be more convincing to consider a model with a nonlinearity; say
dr_{i}/dt = f_{i}(t) g_{i}(r_{i}).
The nonlinearity g_{i} could be parameterized with, probably, only a handful of parameters. For instance, it could be written as a sum of basis functions, much the way L_{i} was parameterised. The singlecell model was rejected because it explained only about 20% of the variance, versus 50% for the recurrent network. It would be reassuring if adding a nonlinearity didn't bring this up to 50%. Again, this also is only a suggestion. However, if you stick with the current single cell model, you'll need to frame your comparison less generally than "recurrent versus single cell".
We thank the reviewers for this suggestion. We now explicitly frame our results as comparing linear singlecell and network models (Abstract).
Note that the singlecell model captures about 75% of the variance when fitted to responses to a single stimulus. It is therefore clearly expressive enough to capture the dynamics seen in the data. What it does not capture is dynamics across stimuli, and it is not clear to us how a nonlinearity could fix this, while linear network interactions do. The effects of a nonlinearity would need to be included both in the singlecell and network models, and this goes beyond the scope of the present study.
https://doi.org/10.7554/eLife.53151.sa2Article and author information
Author details
Funding
Agence Nationale de la Recherche (ANR16CE370016)
 Giulio Bondanelli
 Srdjan Ostojic
Agence Nationale de la Recherche (ANR17EURE0017)
 Giulio Bondanelli
 Srdjan Ostojic
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Senior Editor
 Barbara G ShinnCunningham, Carnegie Mellon University, United States
Reviewing Editor
 Peter Latham, University College London, United Kingdom
Reviewer
 Guillaume Hennequin, University of Cambridge, United Kingdom
Publication history
 Received: October 30, 2019
 Accepted: March 19, 2021
 Accepted Manuscript published: March 24, 2021 (version 1)
 Version of Record published: April 20, 2021 (version 2)
Copyright
© 2021, Bondanelli et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics

 1,721
 Page views

 267
 Downloads

 5
 Citations
Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.
Download links
Downloads (link to download the article as PDF)
Open citations (links to open the citations from this article in various online reference manager services)
Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)
Further reading

 Neuroscience
The neurophysiology of cells and tissues are monitored electrophysiologically and optically in diverse experiments and species, ranging from flies to humans. Understanding the brain requires integration of data across this diversity, and thus these data must be findable, accessible, interoperable, and reusable (FAIR). This requires a standard language for data and metadata that can coevolve with neuroscience. We describe design and implementation principles for a language for neurophysiology data. Our opensource software (Neurodata Without Borders, NWB) defines and modularizes the interdependent, yet separable, components of a data language. We demonstrate NWB’s impact through unified description of neurophysiology data across diverse modalities and species. NWB exists in an ecosystem, which includes data management, analysis, visualization, and archive tools. Thus, the NWB data language enables reproduction, interchange, and reuse of diverse neurophysiology data. More broadly, the design principles of NWB are generally applicable to enhance discovery across biology through data FAIRness.

 Neuroscience
Many species of animals exhibit an intuitive sense of number, suggesting a fundamental neural mechanism for representing numerosity in a visual scene. Recent empirical studies demonstrate that early feedforward visual responses are sensitive to numerosity of a dot array but substantially less so to continuous dimensions orthogonal to numerosity, such as size and spacing of the dots. However, the mechanisms that extract numerosity are unknown. Here we identified the core neurocomputational principles underlying these effects: (1) centersurround contrast filters; (2) at different spatial scales; with (3) divisive normalization across network units. In an untrained computational model, these principles eliminated sensitivity to size and spacing, making numerosity the main determinant of the neuronal response magnitude. Moreover, a model implementation of these principles explained both wellknown and relatively novel illusions of numerosity perception across space and time. This supports the conclusion that the neural structures and feedforward processes that encode numerosity naturally produce visual illusions of numerosity. Together, these results identify a set of neurocomputational properties that gives rise to the ubiquity of the number sense in the animal kingdom.