Attentional modulation of neuronal variability in circuit models of cortex
Abstract
The circuit mechanisms behind shared neural variability (noise correlation) and its dependence on neural state are poorly understood. Visual attention is well-suited to constrain cortical models of response variability because attention both increases firing rates and their stimulus sensitivity and decreases noise correlations. We provide a novel analysis of population recordings in rhesus primate visual area V4 showing that a single biophysical mechanism may underlie these diverse neural correlates of attention. We explore model cortical networks where top-down mediated increases in excitability, distributed across excitatory and inhibitory targets, capture the key neuronal correlates of attention. Our models predict that top-down signals primarily affect inhibitory neurons, whereas excitatory neurons are more sensitive to stimulus-specific bottom-up inputs. Accounting for trial variability in models of state-dependent modulation of neuronal activity is a critical step in building a mechanistic theory of neuronal cognition.
https://doi.org/10.7554/eLife.23978.001
eLife digest
The world around us is complex and our brains need to navigate this complexity. We must focus on relevant inputs from our senses – such as the bus we need to catch – while ignoring distractions – such as the eye-catching displays in the shop windows we pass on the same street. Selective attention is a tool that enables us to filter complex sensory scenes and focus on whatever is most important at the time. But how does selective attention work?
Our sense of vision results from the activity of cells in a region of the brain called visual cortex. Paying attention to an object affects the activity of visual cortex in two ways. First, it causes the average activity of the brain cells in the visual cortex that respond to that object to increase. Second, it reduces spontaneous moment-to-moment fluctuations in the activity of those brain cells, known as noise. Both of these effects make it easier for the brain to process the object in question.
Kanashiro et al. set out to build a mathematical model of visual cortex that captures these two components of selective attention. The cortex contains two types of brain cells: excitatory neurons, which activate other cells, and inhibitory neurons, which suppress other cells. Experiments suggest that excitatory neurons contribute to the flow of activity within the cortex, whereas inhibitory neurons help cancel out noise. The new mathematical model predicts that paying attention affects inhibitory neurons far more than excitatory ones. According to the model, selective attention works mainly by reducing the noise that would otherwise distort the activity of visual cortex.
The next step is to test this prediction directly. This will require measuring the activity of the inhibitory neurons in an animal performing a selective attention task. Such experiments, which should be achievable using existing technologies, will allow scientists to confirm or disprove the current model, and to dissect the mechanisms that underlie visual attention.
https://doi.org/10.7554/eLife.23978.002
Introduction
The behavioral state of the brain exerts a powerful influence on cortical responses. For example, electrophysiological recordings from both rodents and primates show that the level of wakefulness (Steriade et al., 1993), active sensory exploration (Crochet et al., 2011), and attentional focus (Treue, 2001; Reynolds and Chelazzi, 2004; Gilbert and Sigman, 2007; Moore and Zirnsak, 2017) all modulate synaptic and spiking activity. Despite the diversity of behavioral contexts, in all of these cases an overall elevation and desynchronization of cortical activity accompanies heightened states of processing (Harris and Thiele, 2011). Exploration of the neuronal mechanisms that underlie such state changes has primarily centered on how various neuromodulators shift the cellular and synaptic properties of cortical circuits (Hasselmo, 1995; Lee and Dan, 2012; Noudoost and Moore, 2011; Moore and Zirnsak, 2017). However, a coherent theory linking the modulation of cortical circuits to an active desynchronization of population activity is lacking. In this study we provide a circuit-based theory for the known attention-guided modulations of neuronal activity in the visual cortex of primates performing a stimulus change detection task.
The investigation of the neuronal correlates of attention has a rich history. Attention increases the firing rates of neurons engaged in feature- and spatial-based processing tasks (McAdams and Maunsell, 2000; Reynolds et al., 1999). Attentional modulation of the stimulus-response sensitivity (gain) of firing rates is more complicated, often depending on stimulus specifics such as the size and contrast of a visual image (Williford and Maunsell, 2006; Reynolds and Heeger, 2009; Sanayei et al., 2015). In recent years there has been increased focus on how brain states affect trial-to-trial spiking variability (Crochet et al., 2011; Lin et al., 2015; Doiron et al., 2016; Stringer et al., 2016). In particular, attention decreases the shared variability (noise correlations) of the firing rates from pairs of neurons (Cohen and Maunsell, 2009; Mitchell et al., 2009; Cohen and Maunsell, 2011; Herrero et al., 2013; Ruff and Cohen, 2014; Engel et al., 2016). The combination of a reduction in noise correlations and an increase in response gain has potentially important functional consequences through an improved population code (Cohen and Maunsell, 2009; Rabinowitz et al., 2015). In total, there is an emerging picture of the impact of attention on the trial-averaged and trial-variable spiking dynamics of cortical populations.
Phenomenological models of attentional modulation have been popular (Reynolds and Heeger, 2009; Navalpakkam and Itti, 2005; Gilbert and Sigman, 2007; Ecker et al., 2016); however, such analyses cannot provide insight into the circuit mechanics of attentional modulation. Biophysical models of attention circuits are difficult to constrain, due in large part to the diversity of mechanisms that control the firing rate and response gain of neurons (Silver, 2010; Sutherland et al., 2009). Nonetheless, several circuit models for attentional modulation have been proposed (Ardid et al., 2007; Deco and Thiele, 2011; Buia and Tiesinga, 2008), but analysis has been mostly confined to trial-averaged responses. Taking inspiration from these studies, mechanistic models of attentional modulation can be broadly grouped along two hypotheses. First, the circuit mechanisms that control trial-averaged responses may be distinct from those that modulate neuronal variability. This hypothesis has support from experiments in primate V1 showing that N-methyl-D-aspartate receptors have no impact on top-down attentional modulation of firing rates, yet have a strong influence on attentional control of noise correlations (Herrero et al., 2013). A second hypothesis is that the modulations of firing rates and noise correlations are reflections of a single biophysical mechanism. Support for this comes from the observation that pairs of V4 neurons that each show strong attentional modulation of firing rates also show strong attention-mediated reductions in noise correlation (Cohen and Maunsell, 2011). In this study we provide a novel analysis of the covariability of V4 population activity engaged in an attention-guided detection task (Cohen and Maunsell, 2009) that is consistent with the second hypothesis. Specifically, the modulation of spike count covariance between unattended and attended states has the same dimensionality as the firing rate modulation.
We use the results from our dimensionality analysis to show that an excitatory-inhibitory recurrent circuit model subject to global fluctuations is sufficient to capture both the increase in firing rate and response gain and the population-wide decrease of noise correlations. Our model makes two predictions regarding neuronal modulation: (1) that attentional modulation favors inhibitory neurons, and (2) that stimulus drive favors excitatory neurons. Finally, we show that our model predicts increased informational content in the excitatory population, which would result in improved readout by potential downstream targets. In total, our study provides a simple, parsimonious, and biologically motivated model of attentional modulation in cortical networks.
Results
Attention decreases noise correlations primarily by decreasing covariance
Two rhesus monkeys (Macaca mulatta) with microelectrode arrays implanted bilaterally in V4 were trained in an orientation change detection task (Figure 1a; see Materials and methods: Data preparation). A display with oriented Gabor gratings on the left and right flashed on and off. The monkey was cued to attend to either the left or right grating before each block of trials, while keeping fixation on a point between the two gratings. After a random number of presentations, one of the gratings changed orientation. The monkey then had to saccade to that side to obtain a reward. The behavioral task and data collection have been previously reported (Cohen and Maunsell, 2009).
A neuron is considered to be in an 'attended state' when the attended stimulus is in the hemifield containing that neuron’s receptive field (contralateral hemifield), and in an 'unattended state' when it is in the other (ipsilateral) hemifield. The trial-averaged firing rates from both attended and unattended neurons displayed a brief transient rise ($\sim$100 ms after stimulus onset), and eventually settled to an elevated sustained rate before the trial concluded (Figure 1b). During the sustained period the mean firing rate of attended neurons ($22.0$ sp/s) was greater than that of unattended neurons ($20.6$ sp/s) ($t$ test, $P < 10^{-5}$).
A major finding of Cohen and Maunsell (2009) was that the pairwise trial-to-trial noise correlations of the neuronal responses decreased with attention (Figure 1c, left; mean unattended 0.065, mean attended 0.045, $t$ test, $P < 10^{-5}$). The noise correlation between neurons $i$ and $j$ is a normalized measure, $\rho_{ij} = \mathrm{Cov}(n_i, n_j)/\sqrt{\mathrm{Var}(n_i)\,\mathrm{Var}(n_j)}$, where Cov and Var denote spike count covariance and variance, respectively. Both spike count variance and covariance change significantly with attention (${\langle \text{Var}^U \rangle}_{\text{trials}} = 5.02\ \text{spikes}^2$, ${\langle \text{Var}^A \rangle}_{\text{trials}} = 5.10\ \text{spikes}^2$, $t$ test, $P < 10^{-3}$; ${\langle \text{Cov}^A \rangle}_{\text{trials}} = 0.252\ \text{spikes}^2$, $t$ test, $P < 10^{-5}$), but the decrease in covariance ($34.0\%$) is much more pronounced than the increase in variance ($1.61\%$; Figure 1c, middle and right). We therefore conclude that the attention-mediated decrease in noise correlation is primarily due to decreased covariance.
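The statistics in this paragraph can be computed from a trials-by-neurons matrix of spike counts for one stimulus condition. The following is a minimal NumPy sketch, not the analysis code behind Figure 1; the function names and the synthetic data (Poisson neurons sharing a multiplicative trial-to-trial fluctuation, which induces positive noise correlations) are hypothetical.

```python
import numpy as np


def noise_correlations(counts):
    """Pairwise noise correlations from a (trials x neurons) spike-count array.

    counts[k, i] is the spike count of neuron i on trial k for one stimulus condition.
    Returns rho_ij and Cov(n_i, n_j) for the distinct pairs i < j, plus Var(n_i).
    """
    cov = np.cov(counts, rowvar=False)        # spike count covariance matrix
    var = np.diag(cov)                        # spike count variances
    rho = cov / np.sqrt(np.outer(var, var))   # rho_ij = Cov / sqrt(Var_i Var_j)
    iu = np.triu_indices_from(cov, k=1)       # each pair counted once
    return rho[iu], cov[iu], var


def percent_change(unattended, attended):
    """Percent change of a trial-averaged statistic from the unattended to the attended state."""
    return 100.0 * (attended - unattended) / unattended


# Synthetic example: one shared multiplicative fluctuation per trial couples
# otherwise independent Poisson neurons.
rng = np.random.default_rng(0)
n_trials, n_neurons = 2000, 10
shared = rng.normal(size=(n_trials, 1))
rates = np.repeat(5.0 * np.exp(0.2 * shared), n_neurons, axis=1)
counts = rng.poisson(rates)

rho, cov_pairs, var = noise_correlations(counts)
print(f"mean noise correlation: {rho.mean():.3f}")
```

Running the same functions separately on attended and unattended trials, and passing the trial-averaged variances and covariances to `percent_change`, gives the kind of comparison reported above.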
To further validate this observation, we consider the distributions of pairwise changes in covariance (black) and variance (gray) with attention over the entire data set (Figure 1d). Covariance and variance are normalized by their respective maximal unattended or attended values (see Materials and methods: Comparing change in covariance to change in variance). The change in covariance with attention is concentrated below zero with a large spread, whereas the change in variance is centered on zero with a narrower spread. Taken together, these results suggest that to understand the mechanism by which noise correlations decrease it is necessary and sufficient to understand how spike count covariance decreases with attention.
Attention is a low-rank modulation of noise covariance
A reasonable simplification of V4 neurons is that they receive a bottom-up stimulus alongside an attention-mediated top-down modulatory input. However, to properly model top-down attention we need to first understand the dimension of attentional modulation on the V4 circuit as a whole. Let $A_\varphi: \varphi^U \mapsto \varphi^A$ denote the attentional modulation of measure $\varphi$ from its value in the unattended state, $\varphi^U$, to its value in the attended state, $\varphi^A$. For example, the firing rate modulation $A_r$ can be written as $\mathbf{r}^A = A_r \circ \mathbf{r}^U$, where $\mathbf{r}^A$ is an $N \times 1$ vector of neural firing rates in the attended state, $\mathbf{r}^U$ denotes the firing rate vector in the unattended state, $A_r$ is a vector the same size as $\mathbf{r}$, and $\circ$ denotes elementwise multiplication. In this case, the entries $a_i$ of $A_r$ are the ratios of the firing rates: $a_i = r_i^A / r_i^U$ (Figure 2a).
A less trivial aspect of attentional modulation is the modulation of covariance matrices:
$$\mathbf{C}^A = A_C \circ \mathbf{C}^U. \tag{1}$$
Here $\mathbf{C}^A$ is the attended spike count covariance matrix, $\mathbf{C}^U$ the unattended spike count covariance matrix, and $A_C$ is a matrix the same size as $\mathbf{C}^U$, consisting of entries $g_{ij}$, which we will call covariance gains. Unlike the firing rate modulation $A_r$, the transformation matrix $A_C$ can be of varying rank. On the one hand, $A_C$ could be constructed from the ratios of the individual elements, $g_{ij} = c_{ij}^A / c_{ij}^U$, with each pair of neurons $(i, j)$ receiving an individualized attentional modulation $g_{ij}$ of their shared variability (Figure 2b, left). Under this modulation $A_C$ is a rank $N$ matrix. A rank $N$ matrix $A_C$ will always perfectly (and trivially) capture the matrix mapping in Equation (1). However, it is difficult to conceive of a top-down circuit mechanism that would allow attention to modulate each pair individually. On the other hand, $g_{ij}$ could depend not on the specific pair $(i, j)$, but on the individual neurons of the pairing: $g_{ij} = g_i g_j$ (Figure 2b, right). In this case, only $N$ values are needed to characterize $A_C$: $A_C = \mathbf{g}\mathbf{g}^T$, where $\mathbf{g}$ is an $N \times 1$ column vector, meaning that $A_C$ has rank 1. This is a more parsimonious and biophysically plausible scenario for attentional modulation, since in this case the covariance gain $g_{ij}$ of neurons $i$ and $j$ simply emerges from the attentional modulation of the individual neurons. To test whether $A_C$ is low rank we analyzed the V4 population recordings during the visual attention task (Figure 1), specifically measuring $A_C$ under the assumption that $A_C$ is rank 1:
$$\mathbf{C}^A = \mathbf{g}\mathbf{g}^T \circ \mathbf{C}^U. \tag{2}$$
Equation (2) is a system of $N(N-1)/2$ equations of the form $c_{ij}^A = g_i g_j c_{ij}^U$ in $N$ unknowns $\mathbf{g} = [g_1, \dots, g_N]^T$ (we only consider $i \ne j$ to exclude variance modulation from our analysis). For $N > 3$ this is an overdetermined system, and we solve for $\mathbf{g}$ using a nonlinear equation solver. Let $\hat{\mathbf{g}}$ be the optimal solution obtained by the solver (measured as a minimization of the $L^2$-norm of the error; see Materials and methods). Then $\hat{\mathbf{C}}^A := \hat{\mathbf{g}}\hat{\mathbf{g}}^T \circ \mathbf{C}^U$ provides an approximation to the attended covariance matrix. In an example data set from a single recording session with $N = 39$ units, the correlation coefficient $\rho$ between the actual attended covariance values from $\mathbf{C}^A$ and the approximated attended covariance values from $\hat{\mathbf{C}}^A$ was $0.77$ (Figure 2c). A shuffled $\mathbf{C}^A$ matrix provides a reasonable null model, and for the example data set it produces the lower bound correlation $\rho_{\text{shuf}} = 0.22$ (Figure 2d; see Materials and methods: Shuffled covariance matrices). Finally, a Poisson model that decomposes exactly as in Equation (2), yet is sampled with the same number of trials as in the experiment, gives an upper bound for the rank one structure; for the example data this yields $\rho_{\text{ub}} = 0.90$ (Figure 2e; see Materials and methods: Upper bound covariance matrices). In total, the combination of $\rho$, $\rho_{\text{shuf}}$, and $\rho_{\text{ub}}$ (Figure 2f) suggests that the rank one model of attentional modulation of covariance $A_C$ is well justified.
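A minimal sketch of the rank one fit and its shuffled null is given below, assuming SciPy's `least_squares` as the nonlinear solver; the solver and objective actually used are described in Materials and methods and may differ. The null model here permutes the off-diagonal entries of $\mathbf{C}^A$ (keeping the matrix symmetric) and refits, which is one plausible reading of "shuffled" and should be treated as an assumption. Synthetic matrices stand in for recorded data, and all function names are hypothetical.

```python
import numpy as np
from scipy.optimize import least_squares


def fit_covariance_gains(C_U, C_A):
    """Least-squares fit of g in C_A[i, j] = g[i] * g[j] * C_U[i, j] over pairs i != j."""
    n = C_U.shape[0]
    iu, ju = np.triu_indices(n, k=1)  # distinct pairs only; variances are excluded

    def residuals(g):
        return g[iu] * g[ju] * C_U[iu, ju] - C_A[iu, ju]

    # Start from g = 1 (no modulation); g is only determined up to a global sign.
    return least_squares(residuals, x0=np.ones(n)).x


def fit_correlation(C_U, C_A, g_hat):
    """Correlation between actual and rank-one-predicted attended covariances."""
    iu, ju = np.triu_indices(C_U.shape[0], k=1)
    C_hat = np.outer(g_hat, g_hat) * C_U  # \hat C^A = g g^T (elementwise) C^U
    return np.corrcoef(C_A[iu, ju], C_hat[iu, ju])[0, 1]


def shuffled_correlation(C_U, C_A, rng):
    """Null model (assumed form): permute off-diagonal entries of C_A, then refit."""
    iu, ju = np.triu_indices(C_U.shape[0], k=1)
    vals = rng.permutation(C_A[iu, ju])
    C_shuf = C_A.copy()
    C_shuf[iu, ju] = vals
    C_shuf[ju, iu] = vals  # keep the shuffled matrix symmetric
    return fit_correlation(C_U, C_shuf, fit_covariance_gains(C_U, C_shuf))


# Synthetic check: an attended matrix that is exactly rank-one modulated.
rng = np.random.default_rng(1)
n = 20
X = rng.normal(size=(n, n))
C_U = X @ X.T / n + np.eye(n)
g_true = rng.uniform(0.6, 1.0, size=n)
C_A = np.outer(g_true, g_true) * C_U

g_hat = fit_covariance_gains(C_U, C_A)
rho = fit_correlation(C_U, C_A, g_hat)
rho_shuf = shuffled_correlation(C_U, C_A, rng)
print(f"rho = {rho:.4f}, rho_shuf = {rho_shuf:.3f}")
```

With noiseless synthetic data the fit recovers $\mathbf{g}$ essentially exactly, so $\rho$ is close to 1; recorded covariance matrices are estimated from a finite number of trials, which is why the Poisson upper bound $\rho_{\text{ub}}$ is the relevant ceiling in practice.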
We applied this analysis to 21 recording sessions from the right hemisphere of one monkey (Figure 2g). For most of the recording sessions $\rho$ is closer to $\rho_{\text{ub}}$ than to $\rho_{\text{shuf}}$. The averaged performance of all sessions for both hemispheres of two monkeys generally agreed with this trend (Figure 2h). We normalized $\rho$ and $\rho_{\text{shuf}}$ by $\rho_{\text{ub}}$ for each session to better compare different sessions that were subject to day-to-day variations outside of the experimenter’s control, such as task performance or the internal state of the monkey. To further validate our model we show the distribution of the $g_i$ values computed from the entire data set (Figure 3a). The majority of $g_i$ values are less than one, consistent with ${\langle \text{Cov}^A \rangle}_{\text{trials}} < {\langle \text{Cov}^U \rangle}_{\text{trials}}$ (Figure 1c). Further, there was little relation between the attentional modulation of firing rates, measured by $r_i^A / r_i^U$, and the attentional modulation of covariance through $g_i$ (Figure 3b). This indicates that the circuit modulations of firing rates and covariance are not trivially related to one another (Doiron et al., 2016).
We additionally tested the validity of our model in Equation (2) with a leave-one-out cross-validation analysis (see Materials and methods: Leave-one-out cross-validation). We accurately predicted an omitted covariance $C_{ij}^A$ (Figure 2i and j), consistent with our original analysis (Figure 2g and h). The individual session-by-session performance values for both the standard and leave-one-out setups are provided (Appendix: Model performance for all monkeys and hemispheres).
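The leave-one-out test can be sketched as follows: for each pair $(i, j)$, refit $\mathbf{g}$ with that pair omitted and predict the held-out value as $g_i g_j C_{ij}^U$. This is a minimal illustration, assuming SciPy's `least_squares` as the solver and a synthetic attended matrix (rank one modulation plus a small symmetric perturbation); it is not the procedure from Materials and methods verbatim, and the function name is hypothetical.

```python
import numpy as np
from scipy.optimize import least_squares


def leave_one_out(C_U, C_A):
    """For every pair i < j, refit g without that pair and predict the omitted C_A[i, j]."""
    n = C_U.shape[0]
    iu, ju = np.triu_indices(n, k=1)
    predictions = np.empty(len(iu))
    for k in range(len(iu)):
        keep = np.arange(len(iu)) != k  # omit pair (iu[k], ju[k]) from the fit
        ik, jk = iu[keep], ju[keep]

        def residuals(g):
            return g[ik] * g[jk] * C_U[ik, jk] - C_A[ik, jk]

        g = least_squares(residuals, x0=np.ones(n)).x
        predictions[k] = g[iu[k]] * g[ju[k]] * C_U[iu[k], ju[k]]
    return C_A[iu, ju], predictions


# Synthetic check: rank-one modulation plus a small symmetric perturbation.
rng = np.random.default_rng(3)
n = 8
X = rng.normal(size=(n, n))
C_U = X @ X.T / n + np.eye(n)
g_true = rng.uniform(0.6, 1.0, size=n)
E = rng.normal(scale=0.01, size=(n, n))
C_A = np.outer(g_true, g_true) * C_U + (E + E.T) / 2

actual, predicted = leave_one_out(C_U, C_A)
rho_loo = np.corrcoef(actual, predicted)[0, 1]
print(f"leave-one-out correlation: {rho_loo:.3f}")
```

Because each $g_i$ still appears in $N-2$ of the retained pairs, the held-out product $g_i g_j$ remains identifiable, which is what makes this a genuine out-of-sample test of the rank one structure.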
Finally, we investigated to what extent the actual value of the covariance gain $g_i$ of neuron $i$ depends on the population of neurons in which it was computed. We solved the system of equations $C_{ij}^A = g_i g_j C_{ij}^U$ using covariance matrices computed from recordings from distinct sets of neurons, overlapping only in neuron $i$. This gives two estimates of $g_i$, which nevertheless largely agreed with one another (Appendix: Low-dimensional modulation is intrinsic to neurons). This supports the hypothesis that the covariance gain $g_i$ is an intrinsic property of neuron $i$.
The standard and cross-validation tests verify that the low-rank model of attentional modulation defined in Equation (2) explains between $66\%$ and $82\%$ (standard), or between $56\%$ and $77\%$ (cross-validation), of the data. Taking this to be a positive result, we conclude that the covariance gain modulation depends largely on the modulation of individual neurons.
Network requirements for attentional modulation
Having described attentional modulation statistically, our next goal is to develop a circuit model to understand the process mechanistically. Consider a network of $N$ coupled neurons, and let the spike count from neuron $i$ on a given trial be $y_i$. The network output has the covariance matrix $\mathbf{C}$ with elements $c_{ij} = \text{Cov}(y_i, y_j)$. In this section we identify the minimal circuit elements needed for the attentional mapping $A_{\mathbf{C}}: \mathbf{C}^U \mapsto \mathbf{C}^A$ to satisfy the following two conditions (on average):
C1: $c_{ij}^A = g_i g_j c_{ij}^U$; attentional modulation of covariance is rank one (Figure 2).
C2: $g_i < 1$; spike count covariance decreases with attention (Figure 1).
What follows is only a sketch of our derivation (a complete treatment is given in Appendix: Network requirements for attentional modulation).
If inputs are weak then $y_i$ can be described by a linear perturbation about a background state (Ginzburg and Sompolinsky, 1994; Doiron et al., 2004; Trousdale et al., 2012):
$$y_i = y_{iB} + L_i\left(\sum_k J_{ik}\, y_k + \xi_i\right). \tag{3}$$
Here $y_{iB}$ is the background activity of neuron $i$, $J_{ik}$ is the coupling strength from neuron $k$ to $i$, and $L_i$ is the input-to-output gain of neuron $i$. In addition to internal coupling we assume a source of external fluctuations $\xi_i$ to neuron $i$. Here $y_i$, $y_{iB}$, and $\xi_i$ are random variables that vary across trials. The trial-averaged firing rate of neuron $i$ is $r_i = \langle y_i \rangle / T$ (where $\langle \cdot \rangle$ denotes averaging over trials of length $T$). The background state has variability $b_i = \mathrm{Var}(y_{iB})$, which we assume to be independent across neurons, meaning the background network covariance is $\mathbf{B} = \mathrm{diag}(b_i)$. Finally, the external fluctuations have covariance matrix $\mathbf{X}$ with elements $x_{ij} = \mathrm{Cov}(\xi_i, \xi_j)$.
Motivated by our analysis of population recordings (Figure 2) we study attentional modulations that target individual neurons. This amounts to considering only $A_r: r_i^U \mapsto r_i^A$ and $A_L: L_i^U \mapsto L_i^A$. Additionally, we assume that any model of attentional modulation must result in $r_i^A > r_i^U$ (Figure 1b). A widespread property of both cortical pyramidal cells and interneurons is that an increase of firing rate $r_i$ causes an increase of input-output gain $L_i$ (Cardin et al., 2007); thus we also require $L^A > L^U$.
Spiking covariability in recurrent networks can be due to internal interactions (through $J_{ik}$), external fluctuations (through $\xi_i$), or both (Ocker et al., 2017). Networks with unstructured connectivity have internally generated covariability that vanishes as $N$ grows. This is true if the connectivity is sparse (van Vreeswijk and Sompolinsky, 1998), or if it is dense with either weak synapses, $J_{ik} \sim 1/N$ (Trousdale et al., 2012), or strong synapses, $J_{ik} \sim 1/\sqrt{N}$, combined with a balance between excitation and inhibition (Renart et al., 2010; Rosenbaum et al., 2017). In these cases spiking covariability requires external fluctuations to be applied and subsequently filtered by the network. We follow this second scenario and choose $\mathbf{X}$ so as to provide external covariability to our network.
Recent analyses of cortical population recordings show that the shared spiking variability across the population can be well approximated by a rank one model of covariability (Kelly et al., 2010; Ecker et al., 2014; Lin et al., 2015; Ecker et al., 2016; Rabinowitz et al., 2015; Whiteway and Butts, 2017) (we remark that Rabinowitz et al., 2015 analyzed the same data set that we have in Figures 1 and 2). Thus motivated, we take the external fluctuations $\mathbf{X}$ to be rank one with $x_{ij} = x_i x_j$, reflecting a single source of global external variability $\xi$ with unit variance (neuron $i$ receives $\xi_i = x_i \xi$). Combining this assumption with the linear ansatz in Equation (3) yields:
$$\mathbf{C} = (\mathbf{I} - \mathbf{K})^{-1}\,\mathbf{B}\,(\mathbf{I} - \mathbf{K})^{-T} + \mathbf{c}\,\mathbf{c}^T, \tag{4}$$
where matrix $\mathbf{K}$ has elements $K_{ij} = L_i J_{ij}$ and $\mathbf{L} = \mathrm{diag}(L_i)$. We have also defined the vectors $\mathbf{x} = [x_1, \dots, x_N]^T$ and $\mathbf{c} = [c_1, \dots, c_N]^T$ with $c_i = \left((\mathbf{I} - \mathbf{K})^{-1}\mathbf{L}\mathbf{x}\right)_i$. In total, apart from the contribution of the independent background variability $\mathbf{B}$, the output covariability $\mathbf{C}$ will simply inherit the rank of the input covariability $\mathbf{X}$. Attentional modulation affects $c_i$ through $\mathbf{K}$ and $\mathbf{L}$, and we easily satisfy condition C1 with $g_i = c_i^A / c_i^U$.
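The linear-response covariance can be checked numerically. The sketch below assumes the linear ansatz $y_i = y_{iB} + L_i\left(\sum_k J_{ik} y_k + x_i \xi\right)$ with a unit-variance global $\xi$, so that $\mathbf{y} = (\mathbf{I}-\mathbf{K})^{-1}(\mathbf{y}_B + \mathbf{L}\mathbf{x}\,\xi)$ and $\mathbf{C} = (\mathbf{I}-\mathbf{K})^{-1}\mathbf{B}(\mathbf{I}-\mathbf{K})^{-T} + \mathbf{c}\mathbf{c}^T$. It compares that closed form against Monte Carlo samples, then changes the gains to show that the shared part is rescaled by $\mathbf{g}\mathbf{g}^T$ with $g_i = c_i^A / c_i^U$. The coupling, gains, background variances and loadings are hypothetical values chosen only for illustration.

```python
import numpy as np


def linear_response_covariance(J, gains, b, x):
    """Covariance of y solving (I - K) y = y_B + L x xi, with K = L J and unit-variance xi."""
    n = len(gains)
    L = np.diag(gains)
    M = np.linalg.inv(np.eye(n) - L @ J)          # (I - K)^{-1}
    c = M @ L @ x                                 # c = (I - K)^{-1} L x
    C = M @ np.diag(b) @ M.T + np.outer(c, c)     # background term + rank-one shared term
    return C, c, M


rng = np.random.default_rng(2)
n = 6
J = rng.normal(scale=0.3 / n, size=(n, n))       # weak random coupling (illustrative)
L_U = rng.uniform(0.5, 1.0, size=n)              # hypothetical unattended gains
b = rng.uniform(0.5, 1.5, size=n)                # independent background variances
x = rng.uniform(0.5, 1.0, size=n)                # loadings on the global fluctuation xi
C, c_U, M = linear_response_covariance(J, L_U, b, x)

# Monte Carlo check: sample y = (I - K)^{-1} (y_B + L x xi) directly.
T = 200_000
y_B = rng.normal(size=(T, n)) * np.sqrt(b)
xi = rng.normal(size=(T, 1))
y = (y_B + xi * (L_U * x)) @ M.T
C_mc = np.cov(y, rowvar=False)

# Raising the gains rescales the shared part by g g^T with g_i = c_i^A / c_i^U.
L_A = 1.2 * L_U
_, c_A, _ = linear_response_covariance(J, L_A, b, x)
g = c_A / c_U
print("max |C - C_mc|:", np.abs(C - C_mc).max())
print("g:", np.round(g, 3))
```

Note that in this example every gain is scaled up by the same factor, so it illustrates condition C1 only; whether $g_i < 1$ (condition C2) depends on how excitatory and inhibitory gains are modulated, which is the question taken up next.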
What remains is to find constraints on $\mathbf{J}$ and the attentional modulation of $\mathbf{L}$ that satisfy condition C2. Let us consider the case where $c_i^U, c_i^A > 0$, so that condition C2 is satisfied when $c_i^A - c_i^U < 0$. For the sake of mathematical simplicity let us separate the population into $qN$ excitatory neurons and $(1-q)N$ inhibitory neurons ($0 < q < 1$). Let all excitatory (inhibitory) neurons project with synaptic strength $J_E$ ($J_I$), have gain $L_E$ ($L_I$), and receive external inputs of strength $x_E$ ($x_I$). Finally, let the probability for all connections be $p$, and consider only weak connections ($J \propto 1/N$ and $N$ large) so that we can ignore the influence of polysynaptic paths in the network (Pernice et al., 2011; Trousdale et al., 2012). Then the attentional modulation of an excitatory neuron decomposes into:
$$c_E^A - c_E^U = \left(L_E^A - L_E^U\right) x_E + qNpJ_E\, x_E\left[\left(L_E^A\right)^2 - \left(L_E^U\right)^2\right] - (1-q)NpJ_I\, x_I\left(L_E^A L_I^A - L_E^U L_I^U\right), \tag{5}$$
where $J_I > 0$ denotes the magnitude of the inhibitory coupling.
The first term is the direct transfer of the external fluctuations, and the second and third terms are the indirect transfer of external fluctuations via the excitatory and inhibitory populations, respectively. Recall that $L^A - L^U > 0$, meaning that for $c_E^A - c_E^U < 0$ to be satisfied we require the third term to outweigh the combination of the first and second terms. In other words, the inhibitory population must experience a sizable attentional modulation. A similar cancellation of correlations by recurrent inhibition has recently been studied in a variety of cortical models (Renart et al., 2010; Tetzlaff et al., 2012; Ly et al., 2012; Doiron et al., 2016; Rosenbaum et al., 2017).
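To see how the three contributions compete, the sketch below evaluates the first-order expression $c_E = L_E x_E + qNpJ_E L_E^2 x_E - (1-q)NpJ_I L_E L_I x_I$, which follows from $\mathbf{c} = (\mathbf{I}-\mathbf{K})^{-1}\mathbf{L}\mathbf{x} \approx (\mathbf{I}+\mathbf{K})\mathbf{L}\mathbf{x}$ when polysynaptic paths are neglected, with $J_I > 0$ taken as the magnitude of inhibition. All parameter values are hypothetical; they only contrast an equal increase of both gains with a strong increase of the inhibitory gain.

```python
def shared_input_terms(L_E, L_I, x_E=1.0, x_I=1.0, J_E=0.001, J_I=0.004, q=0.8, N=1000, p=0.2):
    """Direct, E-mediated and I-mediated contributions to c_E (first order in the coupling).

    J_I > 0 is the magnitude of inhibitory coupling; all defaults are illustrative only.
    """
    direct = L_E * x_E
    via_E = q * N * p * J_E * L_E**2 * x_E
    via_I = -(1 - q) * N * p * J_I * L_E * L_I * x_I
    return direct, via_E, via_I


c_U = sum(shared_input_terms(L_E=1.0, L_I=1.0))          # unattended
c_A_weak = sum(shared_input_terms(L_E=1.1, L_I=1.1))     # both gains raised equally
c_A_strong = sum(shared_input_terms(L_E=1.1, L_I=3.0))   # inhibitory gain strongly raised

for label, c in [("unattended", c_U), ("equal modulation", c_A_weak),
                 ("strong I modulation", c_A_strong)]:
    print(f"{label:>20s}: c_E = {c:.4f}, g_E = {c / c_U:.3f}")
```

With these numbers an equal increase of both gains raises $c_E$ ($g_E > 1$), whereas a large increase in $L_I$ lowers it while keeping it positive ($0 < g_E < 1$), in line with the argument above.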
In the above we considered weak synaptic connections where $J_{ij} \sim 1/N$. If instead we scale $J_{ij} \sim 1/\sqrt{N}$, as would be the case for classical balanced networks (van Vreeswijk and Sompolinsky, 1998), then for very large $N$ the solution no longer depends upon the gain $L$. Finite $N$ or the inclusion of synaptic nonlinearities through short-term plasticity (Mongillo et al., 2012) may be necessary to satisfy condition C2 with large synapses. Furthermore, the large synaptic weights associated with $J_{ij} \sim 1/\sqrt{N}$ do not allow us to neglect polysynaptic paths, as is needed for Equation (5). Extending our analysis to networks with balanced scaling will be the focus of future work.
In summary, our analysis has identified two circuit features that allow recurrent networks to capture conditions C1 and C2 for attentional modulation. First, the network must be subject to a global source of external fluctuations that dominates network covariability (C1). Second, the network must have recurrent inhibitory connections that are subject to a large attentional modulation (C2).
Mean field model of attention
We next apply the intuition gained in the preceding section to propose a cortical model that captures key neural correlates of attentional modulation. We model V4 as a recurrently coupled network of excitatory and inhibitory leaky integrate-and-fire model neurons (Tetzlaff et al., 2012; Ledoux and Brunel, 2011; Trousdale et al., 2012; Doiron et al., 2004) (Figure 4a). In addition to recurrent synaptic inputs, each neuron receives private and global sources of external fluctuating input (Figure 4b). The global noise is an attention-independent source of input correlation that the network filters and transforms into network-wide output spiking correlations (Figure 4c).
While the linear response theory introduced in Equation (3) is well suited to study large networks of integrate-and-fire neurons driven by weakly correlated inputs (Tetzlaff et al., 2012; Ledoux and Brunel, 2011; Trousdale et al., 2012; Doiron et al., 2004), the analysis offers little analytic insight. Instead, we consider the instantaneous activity across population $\alpha$: $r_\alpha(t) = \frac{1}{N_\alpha}\sum_i y_{i\alpha}(t)$, where $y_{i\alpha}(t)$ is the spike train from neuron $i$ of population $\alpha$ and $N_\alpha$ is the population size ($\alpha = E$ or $I$). This approach reduces the model to just two dynamic variables, the excitatory population rate $r_E(t)$ and the inhibitory population rate $r_I(t)$ ($r_E(t)$ is shown in Figure 4d). Despite this severe reduction the model retains the key ingredients for attentional modulation identified in the previous section: recurrent excitation and inhibition combined with a source of global fluctuations.
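We take the population sizes to be large and consider a phenomenological dynamic mean field (Tetzlaff et al., 2012; Ledoux and Brunel, 2011) of the cortical network (see Materials and methods: Mean field model):
$$\tau_\alpha \frac{dr_\alpha}{dt} = -r_\alpha + f_\alpha\left(J_{\alpha E}\, r_E - J_{\alpha I}\, r_I + \mu_\alpha + \sigma_\alpha\, \xi(t)\right), \qquad \alpha \in \{E, I\}. \tag{6}$$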
The function $f_\alpha$ is the input-output transfer of population $\alpha$, taken to be the mean firing rate for a fixed input (Figure 4e for the $E$ population and Figure 4f for the $I$ population). The parameter $J_{\alpha\beta}$ is the coupling strength from population $\beta$ to population $\alpha$. Finally, $\mu_\alpha$ and $\sigma_\alpha$ are the respective strengths of the mean input and the global fluctuation $\xi(t)$ to population $\alpha$ (throughout, $\xi(t)$ has zero mean). To simplify our exposition we take symmetric coupling $J_{EE} = J_{IE} \equiv J_E$ and $J_{EI} = J_{II} \equiv J_I$ and symmetric timescales $\tau_E = \tau_I\ (= 1)$. We set the recurrent coupling so that the model has a stationary mean firing rate $(\bar{r}_E, \bar{r}_I)$, about which $\xi(t)$ induces fluctuations in $r_E(t)$ and $r_I(t)$.
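The two-population dynamics can be integrated numerically. The sketch below is a minimal forward-Euler illustration under several assumptions that are not taken from the paper: $\tau_E = \tau_I = 1$, the symmetric couplings above with inhibition entering as $-J_I r_I$, a hypothetical rectified-quadratic transfer $f_\alpha(u) = k_\alpha [u]_+^2$ (chosen because its slope grows with rate, as required of the gain $L$, rather than the spiking-network transfer functions of Figure 4e,f), and a global fluctuation $\xi(t)$ modeled as a zero-mean Ornstein-Uhlenbeck process with approximately unit variance. The function names and parameter values are hypothetical.

```python
import numpy as np


def f(u, k):
    """Hypothetical rectified-quadratic transfer function f(u) = k * [u]_+^2."""
    return k * np.maximum(u, 0.0) ** 2


def simulate_mean_field(mu_E, mu_I, sigma_E, sigma_I, J_E=1.0, J_I=2.0, k_E=1.0, k_I=1.0,
                        T=100.0, dt=0.01, tau_xi=0.1, seed=0):
    """Forward-Euler integration of the two-population rate model with tau_E = tau_I = 1."""
    rng = np.random.default_rng(seed)
    steps = int(round(T / dt))
    r_E = r_I = xi = 0.0
    trace = np.empty((steps, 2))
    for t in range(steps):
        # Global fluctuation: Ornstein-Uhlenbeck process, zero mean, ~unit variance (assumed form).
        xi += -xi * dt / tau_xi + np.sqrt(2.0 * dt / tau_xi) * rng.normal()
        u = J_E * r_E - J_I * r_I          # same recurrent input to E and I (J_EE = J_IE, J_EI = J_II)
        drive_E = u + mu_E + sigma_E * xi
        drive_I = u + mu_I + sigma_I * xi
        r_E, r_I = r_E + dt * (-r_E + f(drive_E, k_E)), r_I + dt * (-r_I + f(drive_I, k_I))
        trace[t] = r_E, r_I
    return trace


trace = simulate_mean_field(mu_E=1.0, mu_I=1.0, sigma_E=0.05, sigma_I=0.05)
steady = trace[len(trace) // 2:]            # discard the initial transient
print("mean rates:", steady.mean(axis=0))
print("std of r_E:", steady[:, 0].std())
```

With these illustrative inputs the rates fluctuate around a stationary point near $\bar r_E = \bar r_I \approx 0.38$; any transfer function with a rising slope could be substituted for `f`.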
Attention is modeled as a top-down influence on the static input: $\mu_\alpha = \mu_{\alpha B} + A\,\Delta\mu_\alpha$. Here $\mu_{\alpha B}$ is a background input, the parameter $A$ models attention with $A = 0$ denoting the unattended state and $A = 1$ the fully attended state, and $\Delta\mu_\alpha > 0$ is the increase in $\mu_\alpha$ due to attention. We note that representing the unattended state by $A = 0$ and the attended state by $A = 1$ is only a matter of convenience, and is not meant to make any statement about particular bounds on these states. In this model attention simply increases the excitability of all of the neurons in the network (Figure 4a). This modulation is consistent with the rank one structure of attentional modulation in the data (Figure 2), since $\mu_\alpha$ is a single neuron property. The attention-induced increase in $(\mu_E, \mu_I)$ causes an increase in the mean firing rates $(\bar{r}_E, \bar{r}_I)$ (red paths in Figure 4e,f), consistent with recordings from putative excitatory (McAdams and Maunsell, 2000; Reynolds et al., 1999) and inhibitory neurons (Mitchell et al., 2007) in visual area V4. Since $f_\alpha$ is a monotonically increasing function, there is a unique mapping of an attentional path in $(\mu_E, \mu_I)$ space to a path in $(\bar{r}_E, \bar{r}_I)$ space (Figure 4g).
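The stationary rates for the unattended ($A = 0$) and attended ($A = 1$) states can be computed in a toy version of the model. The sketch assumes a hypothetical rectified-quadratic transfer $f_\alpha(u) = k_\alpha [u]_+^2$, couplings $J_E = 1$ and $J_I = 2$ with inhibition entering as $-J_I r_I$, and illustrative background inputs and attentional increments (not fitted values). Because both populations receive the same recurrent input $a = J_E \bar r_E - J_I \bar r_I$, the fixed point reduces to the one-dimensional equation $a = J_E f_E(a + \mu_E) - J_I f_I(a + \mu_I)$, which SciPy's `brentq` solves.

```python
import numpy as np
from scipy.optimize import brentq


def f(u, k):
    """Hypothetical rectified-quadratic transfer function f(u) = k * [u]_+^2."""
    return k * np.maximum(u, 0.0) ** 2


def stationary_rates(mu_E, mu_I, J_E=1.0, J_I=2.0, k_E=1.0, k_I=1.0):
    """Fixed point (r_E, r_I), assuming mu_E, mu_I > 0 and J_I k_I > J_E k_E.

    If several fixed points exist, brentq returns only one; stability is not checked.
    """
    def h(a):
        # Zero when a equals the recurrent input J_E r_E - J_I r_I generated by the rates.
        return J_E * f(a + mu_E, k_E) - J_I * f(a + mu_I, k_I) - a

    lo = -max(mu_E, mu_I)        # both populations silent here, so h(lo) = -lo > 0
    hi = 1.0
    while h(hi) > 0:             # h -> -inf for large a because inhibition dominates
        hi *= 2.0
    a = brentq(h, lo, hi)
    return np.array([f(a + mu_E, k_E), f(a + mu_I, k_I)])


mu_B = np.array([1.0, 1.0])      # hypothetical background inputs (mu_EB, mu_IB)
d_mu = np.array([0.9, 1.1])      # hypothetical attentional increments (d_mu_E, d_mu_I)
r_U = stationary_rates(*(mu_B + 0.0 * d_mu))   # A = 0, unattended
r_A = stationary_rates(*(mu_B + 1.0 * d_mu))   # A = 1, attended
print("unattended rates:", r_U)
print("attended rates:  ", r_A)
```

For these values both $\bar r_E$ and $\bar r_I$ increase with attention, as required; sweeping $A$ between 0 and 1 traces an attentional path in $(\bar r_E, \bar r_I)$ space.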
In total, our population model has the core features required to satisfy Conditions C1 and C2 of the previous section. We next use our mean field model to investigate how attentional paths in $({\overline{r}}_{E},{\overline{r}}_{I})$ space affect population spiking variability.
Attention modulates population variability
The global input $\xi (t)$ causes fluctuations about the network stationary state: ${r}_{\alpha}(t)={\overline{r}}_{\alpha}+\delta {r}_{\alpha}(t)$. The fluctuations $\delta {r}_{\alpha}(t)$ are directly related to coordinated spiking activity in population $\alpha $. In particular, in the limit of large ${N}_{\alpha}$ we have ${V}_{E}\equiv \mathrm{Var}({r}_{E})\propto \langle \mathrm{Cov}({y}_{i},{y}_{j})\rangle $, where the expectation is over $(i,j)$ pairs in the spiking network. Thus, in our mean field network we require attentional modulation to decrease the population variance ${V}_{E}$.
For sufficiently small ${\sigma}_{\alpha}$ the fluctuations $\delta {r}_{E}(t)$ and $\delta {r}_{I}(t)$ obey linearized mean field equations (see Materials and methods: Mean field model, Equation (17)). The linear system is readily analyzed and we obtain the population variance ${V}_{E}$ computed over long time windows (see Materials and methods: Computing ${V}_{E}$):
Here ${L}_{\alpha}\equiv {f}_{\alpha}^{\prime}$ is the response gain of neurons in population $\alpha $. Equation (7) shows that ${V}_{E}$ depends directly on ${L}_{\alpha}$, and ${L}_{\alpha}$ changes with attention (the slope of ${f}_{\alpha}$ in Figure 4e,f). Thus, while the derivation of ${V}_{E}$ requires linear fluctuations about a steady state, attentional modulation samples the nonlinearity in the transfer ${f}_{\alpha}$ by changing the state about which we linearize. The direction of any attention-mediated change in ${V}_{E}$ is not obvious, since both ${L}_{I}^{A}>{L}_{I}^{U}$ and ${L}_{E}^{A}>{L}_{E}^{U}$, meaning that both the numerator and the denominator in Equation (7) change with attention.
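The following sketch shows how an expression of the form of Equation (7) can arise. It rests on assumptions of ours: $\tau_E=\tau_I=1$, the symmetric coupling above, inhibition entering the input with a negative sign, and $\xi(t)$ a unit-intensity white noise. Under these assumptions the long-window variance is proportional to the squared zero-frequency response of $\delta r_E$. Window-dependent constants are dropped, and the numerator need not match the published Equation (7) term by term.

```latex
\begin{aligned}
\frac{d}{dt}\begin{pmatrix}\delta r_E\\ \delta r_I\end{pmatrix}
&= M\begin{pmatrix}\delta r_E\\ \delta r_I\end{pmatrix}
+ \mathbf{b}\,\xi(t),
\qquad
M=\begin{pmatrix} -1 + J_E L_E & -J_I L_E\\ J_E L_I & -1 - J_I L_I\end{pmatrix},
\quad
\mathbf{b}=\begin{pmatrix} L_E\sigma_E\\ L_I\sigma_I\end{pmatrix},\\[6pt]
\det M &= 1 + J_I L_I - J_E L_E,\\[6pt]
V_E &\propto \left[\left(-M^{-1}\mathbf{b}\right)_E\right]^{2}
= \frac{L_E^{2}\,\bigl[\sigma_E + J_I L_I(\sigma_E-\sigma_I)\bigr]^{2}}{\left(1 + J_I L_I - J_E L_E\right)^{2}}.
\end{aligned}
```

The denominator is the squared eigenvalue product that appears in the stability analysis. With $\sigma_E=\sigma_I$ the numerator reduces to $L_E^2\sigma_E^2$.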
We explore ${V}_{E}$ by sweeping over $({\overline{r}}_{E},{\overline{r}}_{I})$ space (Figure 5a). When the network has high ${\overline{r}}_{E}$ and low ${\overline{r}}_{I}$, ${V}_{E}$ is large, while ${V}_{E}$ is low in the opposite case of high ${\overline{r}}_{I}$ and low ${\overline{r}}_{E}$. Along our attention path ${\overline{r}}_{E}$ increases while ${V}_{E}$ decreases (Figure 5b), satisfying our requirements for attentional modulation. The attention path that we highlight is just one of many paths that reduce population variability; however, all paths that reduce ${V}_{E}$ share a large attention-mediated recruitment of inhibition. Starting from the unattended state (turquoise dot in Figure 5c), we can label all $(\mathrm{\Delta}{\mu}_{E}\geq 0,\mathrm{\Delta}{\mu}_{I}\geq 0)$ points that have a smaller population variance than the unattended point (light green region in Figure 5c). These modulations all share $\mathrm{\Delta}{\mu}_{I}>\mathrm{\Delta}{\mu}_{E}$ (Figure 5c, the green region lies below the $\mathrm{\Delta}{\mu}_{E}=\mathrm{\Delta}{\mu}_{I}$ line). While the absolute comparison between $\mathrm{\Delta}{\mu}_{E}$ and $\mathrm{\Delta}{\mu}_{I}$ may depend on model parameters, a robust necessary feature of top-down attentional modulation is that it must significantly recruit the inhibitory population. This observation is a major circuit prediction of our model.
An intuitive way to understand inhibition's role in the decrease in population variance is through a stability analysis of the mean field equations. The eigenvalues of the linearized system are ${\lambda}_{1}=-1-{J}_{I}{L}_{I}+{J}_{E}{L}_{E}<0$ and ${\lambda}_{2}=-1$ (see Materials and methods: Mean field model, Equation (18)). Note that the denominator of the population variance (Equation 7) equals the square of the eigenvalue product ${\lambda}_{1}{\lambda}_{2}=1+{J}_{I}{L}_{I}-{J}_{E}{L}_{E}$. The stability of the network activity is determined by ${\lambda}_{1}$. The more negative ${\lambda}_{1}$ is, the more stable the point $({\overline{r}}_{E},{\overline{r}}_{I})$, and the better the network dampens perturbations about that point due to the input fluctuations $\xi (t)$. The decrease of ${\lambda}_{1}$ along the example attention path is clear (Figure 5d). It overcomes the increase in the numerator of ${V}_{E}$ due to increases in ${L}_{E}$ and ${L}_{I}$. This enhanced damping is why ${V}_{E}$ decreases, as seen explicitly in the steeper decline of the excitatory population autocovariance function in the attended compared to the unattended state (Figure 5e).
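The stability argument can be checked numerically with a short Python sketch. It builds the Jacobian of the linearized symmetric-coupling system, using our reconstruction with a single unit-intensity white-noise source. It then compares a hypothetical unattended gain pair $(L_E,L_I)=(0.5,0.5)$ with a hypothetical attended pair $(0.6,1.5)$. The couplings $J_E=1$, $J_I=2$ are likewise illustrative, not values from the paper's model.

```python
import numpy as np


def jacobian(L_E, L_I, J_E, J_I):
    """Jacobian of the linearized E-I rate equations (tau_E = tau_I = 1).

    Assumes symmetric coupling J_EE = J_IE = J_E and J_EI = J_II = J_I,
    with inhibition entering the input with a negative sign.
    """
    return np.array([[-1.0 + J_E * L_E, -J_I * L_E],
                     [J_E * L_I, -1.0 - J_I * L_I]])


def long_window_variance_E(L_E, L_I, J_E, J_I, sigma_E=1.0, sigma_I=1.0):
    """Long-window variance of delta r_E for one unit-intensity white-noise
    source xi(t): the squared zero-frequency response, constants dropped."""
    M = jacobian(L_E, L_I, J_E, J_I)
    b = np.array([L_E * sigma_E, L_I * sigma_I])  # how xi(t) enters each population
    x0 = -np.linalg.solve(M, b)                   # zero-frequency response to xi
    return x0[0] ** 2


J_E, J_I = 1.0, 2.0                  # hypothetical coupling strengths
states = {"unattended": (0.5, 0.5),  # hypothetical (L_E, L_I) gains
          "attended": (0.6, 1.5)}    # attention raises L_E a little, L_I a lot

for name, (L_E, L_I) in states.items():
    lam = np.sort(np.linalg.eigvals(jacobian(L_E, L_I, J_E, J_I)).real)
    print(f"{name}: eigenvalues={lam}, V_E={long_window_variance_E(L_E, L_I, J_E, J_I):.4f}")
```

With these numbers $\lambda_1$ moves from $-1.5$ to $-3.4$ while $\lambda_2$ stays at $-1$. $V_E$ falls from about 0.111 to about 0.031, even though $L_E$ increased.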
This enhanced stability due to recurrent inhibition reflects inhibition canceling the population variability provided by external fluctuations and recurrent excitation (Renart et al., 2010; Tetzlaff et al., 2012; Ozeki et al., 2009). Indeed, taking the coupling $J$ to be weak allows the expansion ${(1+{J}_{I}{L}_{I}-{J}_{E}{L}_{E})}^{-2}\approx 1+2{J}_{E}{L}_{E}-2{J}_{I}{L}_{I}$ in Equation (7), so that the attention-mediated increase in ${L}_{I}$ reduces population variance through cancellation, as in Equation (5). However, this expansion is not formally required to compute the eigenvalues ${\lambda}_{1}$ and ${\lambda}_{2}$, which measure the stability of the firing rate dynamics. We mention the expansion only to connect with the original motivation for inhibition.
The expression for ${V}_{E}$ given above (Equation 7) assumes a symmetry in the network coupling, namely ${J}_{EE}={J}_{IE}\equiv {J}_{E}$ and ${J}_{EI}={J}_{II}\equiv {J}_{I}$. This allowed ${V}_{E}$ to be written compactly, facilitating the analysis of how attention affects both the numerator and denominator of Equation (7). However, the linearization of the mean field equations and the subsequent analysis of population variability do not require this assumption (see Materials and methods: Mean field model, Equations (18–20)). To explore the robustness of our main result we let ${J}_{IE}=\alpha {J}_{E}$ and ${J}_{II}=\beta {J}_{I}$, thereby breaking the coupling symmetry for $\alpha ,\beta \ne 1$. The reduction in ${V}_{E}$ with attention is robust over a large region of $(\alpha ,\beta )$ (Figure 6a, green region). Focusing on selected $(\alpha ,\beta )$ pairings within the region where ${V}_{E}$ decreases shows that the attentional path identified for the network with coupling symmetry produces qualitatively similar behavior in the more general network (compare Figure 5c to Figure 6b–e). In total, the inhibitory mechanism for attention-mediated reduction in population variability is robust to changes in the recurrent coupling within the network.
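Breaking the coupling symmetry only changes the Jacobian. The sketch below keeps the same illustrative assumptions: hypothetical gains and couplings, one white-noise source, and constants dropped. It sets $J_{IE}=\alpha J_E$ and $J_{II}=\beta J_I$, then checks on a coarse grid whether the hypothetical attended gains lower $V_E$. It is a toy analogue of the Figure 6a sweep, not a reproduction of it.

```python
import numpy as np


def long_window_variance_E(L_E, L_I, J_E, J_I, alpha=1.0, beta=1.0,
                           sigma_E=1.0, sigma_I=1.0):
    """Long-window variance of delta r_E with J_EE = J_E, J_EI = J_I,
    J_IE = alpha * J_E and J_II = beta * J_I (one white-noise source,
    constants dropped). Returns inf if the fixed point is unstable."""
    M = np.array([[-1.0 + J_E * L_E, -J_I * L_E],
                  [alpha * J_E * L_I, -1.0 - beta * J_I * L_I]])
    if np.max(np.linalg.eigvals(M).real) >= 0:
        return np.inf  # linear analysis does not apply
    b = np.array([L_E * sigma_E, L_I * sigma_I])
    return (-np.linalg.solve(M, b))[0] ** 2


def attention_reduces_variance(alpha, beta, J_E=1.0, J_I=2.0,
                               unattended=(0.5, 0.5), attended=(0.6, 1.5)):
    """True if moving from the unattended to the attended (L_E, L_I) gains
    lowers V_E while both states are stable. Default values are hypothetical."""
    v_u = long_window_variance_E(*unattended, J_E, J_I, alpha, beta)
    v_a = long_window_variance_E(*attended, J_E, J_I, alpha, beta)
    if not (np.isfinite(v_u) and np.isfinite(v_a)):
        return False
    return bool(v_a < v_u)


grid = np.linspace(0.5, 1.5, 5)  # coarse (alpha, beta) grid
region = np.array([[attention_reduces_variance(a, b) for b in grid] for a in grid])
print("fraction of grid where attention lowers V_E:", region.mean())
```

For these particular illustrative values the reduction holds on the whole grid $\alpha,\beta\in[0.5,1.5]$. Other gain pairs or wider grids need not behave the same way.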
While the reduced mean field equations are straightforward to analyze, a similar attenuation of the pairwise covariance $\mathrm{C}\mathrm{o}\mathrm{v}({y}_{i},{y}_{j})$ along the same attentional path occurs in the LIF model network (Appendix: Spiking network). Using linear response analysis for the spiking network we can relate the effect of inhibition to previous work in spiking networks (Renart et al., 2010; Tetzlaff et al., 2012; Ly et al., 2012; Doiron et al., 2016). In particular, the attention-mediated decrease of $\mathrm{C}\mathrm{o}\mathrm{v}({y}_{i},{y}_{j})$ occurs over a wide range of timescales, down to windows as short as 20 ms. However, on short timescales matching the higher gamma frequency range (approximately 60–70 Hz), this attentional modulation increases $\mathrm{C}\mathrm{o}\mathrm{v}({y}_{i},{y}_{j})$ (Appendix 1—figure 6). This finding is consistent with reports of attention-mediated increases of neuronal synchrony on gamma frequency timescales (Fries et al., 2001; Buia and Tiesinga, 2008), particularly when inhibitory circuits are engaged (Kim et al., 2016).
Attention can simultaneously increase stimulus gain and decrease noise covariance
An important neural correlate of attention is enhanced stimulus response gain (McAdams and Maunsell, 2000). The previous section outlines how the recruitment of recurrent inhibitory feedback by attention reduces response variability. However, inhibitory feedback is also a common gain control mechanism, and increased inhibition reduces response gain through the same mechanism that dampens population variability (Sutherland et al., 2009). Thus it is possible that the decorrelating effect of attention in our model also reduces stimulus response gain, which would make the model inconsistent with experimental data.
To insert a bottom-up stimulus $s$ in our model we let the attention-independent background input have a stimulus term: ${\mu}_{\alpha B}={k}_{\alpha}s+{\widehat{\mu}}_{\alpha B}$. Here ${k}_{\alpha}$ is the feedforward stimulus gain to population $\alpha $ and ${\widehat{\mu}}_{\alpha B}$ is the background input that is independent of both attention and stimulus. Our model captures a bulk firing rate ${r}_{E}$ rather than a population with distributed tuning. Because of this, the stimulus $s$ should either be conceived as the contrast of an input, or the population conceived as a collection of identically tuned neurons (i.e., a single cortical column).
Straightforward analysis shows that the stimulus response gain of the excitatory population can be written as (Materials and methods: Computing stimulus response gain):
If ${k}_{E}={k}_{I}$ then ${G}_{E}\propto \sqrt{{V}_{E}}$, and thus any attentional modulation that reduces population variability will necessarily reduce population stimulus sensitivity. However, for ${k}_{E}>{k}_{I}$ the second term in Equation (8) can counteract this effect and decouple stimulus sensitivity and variability modulations.
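The steady-state gain can be sketched under the same linearization used for the variance. This is our reconstruction, assuming $\tau=1$, symmetric coupling, and the stimulus entering as $k_\alpha s$. The gain then follows from the zero-frequency response to a small change in $s$. The block shows one way the two terms referred to in Equation (8) can arise; it is not necessarily the published form.

```latex
\begin{aligned}
M &= \begin{pmatrix} -1 + J_E L_E & -J_I L_E\\ J_E L_I & -1 - J_I L_I\end{pmatrix},\\[6pt]
G_E \equiv \frac{d\overline{r}_E}{ds}
&= \left[-M^{-1}\begin{pmatrix}k_E L_E\\ k_I L_I\end{pmatrix}\right]_E
= \frac{L_E\,\bigl[k_E(1+J_I L_I) - k_I J_I L_I\bigr]}{1 + J_I L_I - J_E L_E}\\[6pt]
&= \frac{k_E L_E}{1 + J_I L_I - J_E L_E}
+ \frac{(k_E-k_I)\,J_I L_E L_I}{1 + J_I L_I - J_E L_E}.
\end{aligned}
```

The second term vanishes when $k_E=k_I$ and is positive whenever $k_E>k_I$. In this reconstruction, the proportionality of the first term to $\sqrt{V_E}$ additionally uses $\sigma_E=\sigma_I$.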
Consider the example attentional path (Figure 4g) with the extreme choice ${k}_{E}=1$ and ${k}_{I}=0$. In this case attention causes an increase in ${G}_{E}$ (Figure 7a,b), while simultaneously causing a decrease in ${V}_{E}$ (Figure 5a,b). This is a robust effect, as seen by the region in $({\overline{r}}_{E},{\overline{r}}_{I})$ space for which the change in ${V}_{E}$ from the unattended state is negative and the change in ${G}_{E}$ is positive (green region, Figure 7c). Further, for fixed ${k}_{I}$ the proportion of the gray rectangle occupied by the green region increases as ${k}_{E}$ grows relative to ${k}_{I}$ (Figure 7d). Thus, the decoupling of attentional effects on population variability and stimulus sensitivity is robust to the choice of both attentional path $(\mathrm{\Delta}{\mu}_{E},\mathrm{\Delta}{\mu}_{I})$ and feedforward gains $({k}_{E},{k}_{I})$. The condition ${k}_{E}>{k}_{I}$ implies that feedforward stimuli must directly target excitatory neurons to a larger degree than inhibitory neurons (or at least the inhibitory neurons subject to attentional modulation). This gives a prediction complementary to the one from the previous section: while top-down attention favors inhibitory neurons, the bottom-up stimulus favors excitatory neurons.
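The decoupling can be illustrated numerically with our linearized reconstruction. The values are hypothetical: $J_E=1$, $J_I=2$, unattended gains $(L_E,L_I)=(0.5,0.5)$, and attended gains $(0.6,1.5)$. With $k_E=1$, $k_I=0$ the excitatory gain $G_E$ rises while $V_E$ falls. With $k_E=k_I=1$, both fall.

```python
import numpy as np


def jacobian(L_E, L_I, J_E, J_I):
    """Linearized symmetric-coupling E-I system (tau = 1), as in our reconstruction."""
    return np.array([[-1.0 + J_E * L_E, -J_I * L_E],
                     [J_E * L_I, -1.0 - J_I * L_I]])


def stimulus_gain_E(L_E, L_I, J_E, J_I, k_E, k_I):
    """G_E = d rbar_E / ds: zero-frequency response to an input k_alpha * s."""
    drive = np.array([L_E * k_E, L_I * k_I])
    return (-np.linalg.solve(jacobian(L_E, L_I, J_E, J_I), drive))[0]


def long_window_variance_E(L_E, L_I, J_E, J_I, sigma_E=1.0, sigma_I=1.0):
    """Squared zero-frequency response to one white-noise source (constants dropped)."""
    drive = np.array([L_E * sigma_E, L_I * sigma_I])
    return (-np.linalg.solve(jacobian(L_E, L_I, J_E, J_I), drive))[0] ** 2


J_E, J_I = 1.0, 2.0                     # hypothetical couplings
unatt, att = (0.5, 0.5), (0.6, 1.5)     # hypothetical (L_E, L_I) gains
for k_E, k_I in [(1.0, 0.0), (1.0, 1.0)]:
    g_u = stimulus_gain_E(*unatt, J_E, J_I, k_E, k_I)
    g_a = stimulus_gain_E(*att, J_E, J_I, k_E, k_I)
    print(f"k_E={k_E}, k_I={k_I}: G_E {g_u:.3f} -> {g_a:.3f}")
v_u = long_window_variance_E(*unatt, J_E, J_I)
v_a = long_window_variance_E(*att, J_E, J_I)
print(f"V_E {v_u:.3f} -> {v_a:.3f}")
```

Here $G_E$ goes from $2/3$ to about 0.706 for $(k_E,k_I)=(1,0)$, but from $1/3$ to about 0.176 for $k_E=k_I=1$.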
In total, our model of attentional modulation in recurrently coupled excitatory and inhibitory cortical networks subject to global fluctuations captures three main neural correlates of attention: (1) an increase in excitatory firing rates and (2) an increase in stimulus-response gain, with (3) a decrease in pairwise excitatory neuron covariability.
Impact of attentional modulation on neural coding
Attention serves to enhance cognitive performance, especially on discrimination tasks that are difficult (Moore and Zirnsak, 2017). Thus, the attention-mediated reduction in population variability and increase in stimulus response gain are expected to subserve enhanced stimulus estimation (Cohen and Maunsell, 2009; Ruff and Cohen, 2014). In this section we investigate how the attentional modulation outlined in the previous sections affects stimulus coding by the population.
As mentioned above, our simplified mean field model (Equation 6) considers only a bulk response, in which individual neuron tuning is lost. As such, a proper analysis of population coding is not possible. Nonetheless, our model has two basic features often associated with enhanced coding: decreased population variability (Figure 5) and increased stimulus-response gain (Figure 7).
Fisher information (Averbeck et al., 2006; Beck et al., 2011) is a commonly used metric for population coding: its inverse gives a lower bound on the variance of any unbiased stimulus estimate constructed from noisy population responses. The linear Fisher information (Beck et al., 2011) ${\text{FI}}_{EI}$ computed from our two-dimensional recurrent network is:
Here ${V}_{\alpha}=\mathrm{V}\mathrm{a}\mathrm{r}({r}_{\alpha})$, ${G}_{\alpha}=d{\overline{r}}_{\alpha}/ds$, and ${C}_{EI}=\mathrm{C}\mathrm{o}\mathrm{v}({r}_{E},{r}_{I})$. The important result is that ${\text{FI}}_{EI}$ is invariant with attention, meaning that attention does not increase the network’s capacity to estimate the stimulus $s$.
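For reference, the linear Fisher information of a two-dimensional readout has the standard form below, written with the symbols just defined. The second line sketches one way attention invariance can arise, under an assumption not stated in the text. The assumption is that the input signal $\mathbf{k}=(k_E,k_I)^T$ and an invertible input-noise covariance $\Sigma$ both pass through the same invertible linear map $T$; for the linearized network, $T=-M^{-1}\,\mathrm{diag}(L_E,L_I)$. Any attention-dependent change in $T$ then cancels. This sketch need not mirror the derivation in Materials and methods.

```latex
\begin{aligned}
\mathrm{FI}_{EI} &= \mathbf{G}^{T}C^{-1}\mathbf{G}
= \frac{G_E^{2}V_I - 2\,G_E G_I C_{EI} + G_I^{2}V_E}{V_E V_I - C_{EI}^{2}},
\qquad
\mathbf{G}=\begin{pmatrix}G_E\\ G_I\end{pmatrix},\quad
C=\begin{pmatrix}V_E & C_{EI}\\ C_{EI} & V_I\end{pmatrix},\\[6pt]
\mathbf{G}=T\mathbf{k},\;\; C=T\Sigma T^{T}
&\;\Longrightarrow\;
\mathrm{FI}_{EI}=\mathbf{k}^{T}T^{T}\,T^{-T}\Sigma^{-1}T^{-1}\,T\mathbf{k}=\mathbf{k}^{T}\Sigma^{-1}\mathbf{k}.
\end{aligned}
```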
While the proof of Equation (9) is straightforward and applies to our recurrent excitatory-inhibitory population (see Materials and methods: Fisher information), the invariance of the total information ${\text{FI}}_{EI}$ with attention is most easily understood by analogy with an uncoupled, one-dimensional excitatory population (Figure 8a). Without coupling, the input to the population is simply ${k}_{E}s+{\sigma}_{E}\xi (t)$, which is then passed through the firing rate nonlinearity ${f}_{E}$. In this case the gain is ${G}_{E}={k}_{E}{L}_{E}$ and, assuming a linear transfer, the population variance is ${V}_{E}={\sigma}_{E}^{2}{L}_{E}^{2}$. The linear Fisher information from the uncoupled population is then ${\text{FI}}_{E}={G}_{E}^{2}/{V}_{E}={k}_{E}^{2}{L}_{E}^{2}/({\sigma}_{E}^{2}{L}_{E}^{2})={k}_{E}^{2}/{\sigma}_{E}^{2}$.
The factor ${L}_{E}^{2}$ by which attention increases the squared gain (Figure 8a, top) is exactly matched by the attention-related increase in population variance (Figure 8a, bottom), so that all attention-dependent terms in ${\text{FI}}_{E}$ cancel.
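The majority of projection neurons in the neocortex are excitatory, so we now consider stimulus estimation from a readout of only the excitatory population. Combining our previous results, we obtain ${\text{FI}}_{E}={G}_{E}^{2}/{V}_{E}$, with ${G}_{E}$ given by Equation (8) and ${V}_{E}$ by Equation (7).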
Restricting the readout to only the excitatory population drastically reduces the total information (compare ${\text{FI}}_{EI}$ to ${\text{FI}}_{E}$ in Figure 8c). As with the uncoupled population, the response gain ${G}_{E}$ of the excitatory neurons in the coupled population increases with attention (Figure 8b, top). Yet unlike the uncoupled population, the net input variability to the $E$ population is reduced by attention through a cancellation of the external variability $\xi (t)$ via inhibition (Figure 8b, bottom). These two components combine so that, despite ${\text{FI}}_{E}<{\text{FI}}_{EI}$, ${\text{FI}}_{E}$ does increase with attention (Figure 8c). In sum, even though the total stimulus information in the network does not change with attention, the amount of information extractable from the excitatory population increases, which could lead to improved downstream stimulus estimation in the attended state.
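The increase of ${\text{FI}}_{E}$ can be illustrated with the same linearized reconstruction, assuming a single white-noise source, $\sigma_E=\sigma_I=1$, and hypothetical gains and couplings. For a single excitatory readout ${\text{FI}}_{E}=G_E^2/V_E$. In this sketch, with $k_I=0$, $L_E$ and the denominator cancel and ${\text{FI}}_{E}=(1+J_I L_I)^2$. It therefore grows as attention recruits inhibition by raising $L_I$.

```python
import numpy as np


def fisher_info_E(L_E, L_I, J_E, J_I, k_E, k_I, sigma_E=1.0, sigma_I=1.0):
    """Linear Fisher information G_E**2 / V_E of an excitatory-only readout,
    using the linearized reconstruction with one white-noise source."""
    M = np.array([[-1.0 + J_E * L_E, -J_I * L_E],
                  [J_E * L_I, -1.0 - J_I * L_I]])
    signal = -np.linalg.solve(M, np.array([L_E * k_E, L_I * k_I]))         # d rbar / ds
    noise = -np.linalg.solve(M, np.array([L_E * sigma_E, L_I * sigma_I]))  # response to xi
    return signal[0] ** 2 / noise[0] ** 2


# Hypothetical (L_E, L_I) gains in the two attentional states.
for name, (L_E, L_I) in {"unattended": (0.5, 0.5), "attended": (0.6, 1.5)}.items():
    print(name, fisher_info_E(L_E, L_I, J_E=1.0, J_I=2.0, k_E=1.0, k_I=0.0))
```

Setting $J_E=J_I=0$ recovers the uncoupled result $k_E^2/\sigma_E^2$. With $k_E=k_I$ and $\sigma_E=\sigma_I$, ${\text{FI}}_{E}$ does not change with the gains at all, which again points to the role of $k_E>k_I$.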
Discussion
Using population recordings from visual area V4 we identified rank-one structure in the mapping of population spike count covariability between unattended and attended states. We used this finding to motivate an excitatory-inhibitory cortical circuit model that captures both the attention-mediated increases in firing rate and stimulus response gain, and the decrease in noise correlations. Our model accomplishes this with only an attention-dependent shift in the overall excitability of the cortical population, in contrast to a scheme where distinct biophysical mechanisms would be responsible for the respective firing rate and noise correlation modulations. The model makes two key predictions about how stimulus and modulatory inputs are distributed over the excitatory-inhibitory cortical circuit. First, top-down attentional signals must affect inhibitory neurons more than excitatory neurons to allow better damping of global fluctuations in the attended state. Second, bottom-up stimulus information must be biased towards excitatory cells to permit higher gain in the attended state. In total, the increased response gain and decreased correlations enhance the flow of information when the readout is confined to the excitatory population.
Candidate physiological mechanisms for attentional modulation
Our model does not consider a specific type of inhibitory neuron, but rather models a generic recurrent excitatory-inhibitory circuit. However, inhibitory circuits in cortex are complex, with at least three distinct interneuron types prominent in many areas: parvalbumin (PV), somatostatin (SOM), and vasoactive intestinal peptide-expressing (VIP) interneurons (Rudy et al., 2011; Pfeffer et al., 2013; Kepecs and Fishell, 2014). In mouse visual cortex, both SOM and PV cells form recurrent circuits with pyramidal cells, with PV cells having stronger inhibitory projections to pyramidal cells than SOM cells (Pfeffer et al., 2013). Furthermore, PV and SOM neurons directly inhibit one another, with the SOM to PV connection being stronger than the PV to SOM connection (Pfeffer et al., 2013). Finally, VIP cells project strongly to SOM cells (Pfeffer et al., 2013) and are activated by inputs from outside the circuit (Lee et al., 2013; Fu et al., 2014), making them an attractive target for modulation. Recent studies in visual, auditory, and somatosensory cortical circuits show that VIP cell activation provides an active disinhibition of pyramidal cells via a suppression of SOM cells (Kepecs and Fishell, 2014). Basal forebrain (BF) stimulation modulates both muscarinic and nicotinic ACh receptors (mAChRs and nAChRs, respectively) in a fashion that mimics attentional modulation (Alitto and Dan, 2012). In particular, the recruitment of VIP cell activity in vivo through BF stimulation depends strongly on both the muscarinic and nicotinic cholinergic pathways (Alitto and Dan, 2012; Kuchibhotla et al., 2017; Fu et al., 2014), and it has thus been hypothesized that VIP cell activation could be an important component of attentional modulation (Alitto and Dan, 2012; Poorthuis et al., 2014).
If we consider the inhibitory population in our model to be PV interneurons, then the recruitment of VIP cell activity via top-down cholinergic pathways is consistent with our attentional model in two ways. First, activation of the VIP $\to $ SOM $\to $ pyramidal cell pathway provides a disinhibition to pyramidal cells, modeled simply as an overall depolarization of pyramidal cells in the attended state (Figure 4). Second, the activation of the VIP $\to $ SOM $\to $ PV cell pathway disinhibits PV cells, and the strong SOM $\to $ PV projection suggests that this disinhibition is sizable, as required by our model (Figure 5c). Finally, a recent study in mouse medial prefrontal cortex reports that identified PV interneurons show an attention-related increase in activity, and that optogenetic silencing of PV neurons impairs attentional processing (Kim et al., 2016).
However, our logic is perhaps overly simplistic and neglects the direct modulation of SOM cells via muscarinic and nicotinic cholinergic pathways (Alitto and Dan, 2012; Kuchibhotla et al., 2017) that could compromise the disinhibitory pathways. Further, there is evidence of a direct ACh modulation of PV cells (Disney et al., 2014), as opposed to modulation through a disinhibitory pathway. Finally, there may be important differences across both species (mouse vs. primate) and visual areas (V1 vs. V4) that fundamentally change the pyramidal, PV, SOM, and VIP circuit as understood from mouse V1 (Pfeffer et al., 2013). Future studies of the inhibitory-to-excitatory circuitry of primate visual cortex, and of its attentional modulation via neuromodulation, are required to navigate these issues.
Finally, the simultaneous increase in response gain and decrease in noise correlations with attention requires excitatory neurons to be more sensitive to bottom-up visual stimuli than inhibitory neurons (${k}_{E}>{k}_{I}$, Figure 7). In mouse visual cortex, GABAergic interneurons show overall less stimulus selectivity than pyramidal neurons (Sohya et al., 2007); however, this involves both direct feedforward and recurrent contributions to stimulus tuning. While our model simplified the feedforward stimulus gains ${k}_{E}$ and ${k}_{I}$ to be constant with attention, attention is known to also modulate feedforward gain through presynaptic nAChRs (Disney et al., 2007). Notably, nAChRs are found at thalamocortical synapses onto layer 4 excitatory cells and not onto inhibitory neurons. This suggests that ${k}_{E}$ would increase with attention while ${k}_{I}$ would not, further supporting ${k}_{E}>{k}_{I}$.
Modeling global network fluctuations and their modulation
Our model considered the source of global fluctuations to be external to the network. This choice was due in part to difficulties in producing global, long-timescale fluctuations through strictly internal coupling (Renart et al., 2010; Rosenbaum et al., 2017). Our model assumed that the intensity of these external input fluctuations was independent of attention. Rather, attention shifted the operating point of the network such that the transfer of input variability to population-wide output activity was attenuated in the attended state.
Recent analyses of population recordings show that generative models of spike trains that consider gain fluctuations in conjunction with standard spike emission variability capture much of the variability of cortical dynamics (Rabinowitz et al., 2015; Lin et al., 2015). Further, these gain fluctuations are well approximated by a one-dimensional, global stochastic process affecting all neurons in the population (Ecker et al., 2014; Rabinowitz et al., 2015; Lin et al., 2015; Ecker et al., 2016; Engel et al., 2016; Whiteway and Butts, 2017). When these techniques are applied to population recordings subject to attentional modulation, the global gain fluctuations are considerably reduced in the attended state (Rabinowitz et al., 2015; Ecker et al., 2016). Our assumption that the external input fluctuations to our network are attention-invariant is consistent with this statistical analysis, since that analysis is necessarily constructed from output activity alone. Nevertheless, another potential model is that the reduction in population variability is simply inherited from an attention-mediated suppression of the global input fluctuations. Unfortunately, it is difficult to distinguish between these two mechanisms when restricted to output spiking activity alone.
However, a model where output variability reductions are simply inherited from external inputs is open to two criticisms. First, it raises the question: what is the mechanism behind the shift in input variability? Second, our model requires only an increase in the external depolarization to excitatory and inhibitory populations to account for all attentional correlates. An inheritance model would necessarily decouple the attentional mechanisms behind increases in network firing rate (still requiring a depolarization) and the decrease in global input variability. Thus, our model offers a parsimonious and biologically motivated explanation of these neural correlates of attention. Further work dissecting the various external and internal sources of variability to cortical networks, and their attentional modulation, is needed to properly validate or refute these different models.
Attentional modulation of neural coding through inhibition
Our network model assumed attention-invariant external fluctuations and weak recurrent inputs, permitting a linear analysis of network activity. As a consequence, the linear information transfer by the entire population was attention-invariant (Figure 8), because attention modulated the network's transfer of signal and noise equivalently. However, this invariance held only if the decoder had access to both the excitatory and inhibitory populations, whereas most of the neurons in cortex that project between areas are excitatory. When the decoder was restricted to the activity of the excitatory population, our analysis uncovered two main results. First, the excitatory population carried less information than the combined excitatory-inhibitory activity, suggesting an inherently suboptimal coding scheme used by the cortex. Second, the attention-mediated modulation of the inhibitory neurons increased the information carried by the excitatory population. This agrees with the wealth of studies showing that attention improves behavioral performance on stimulus discrimination tasks.
Determining the impact of population-wide spiking variability on neural coding is complicated (Averbeck et al., 2006; Kohn et al., 2016). A recent theoretical study has shown that noise correlations that limit stimulus information must be parallel to the direction in which population activity encodes the stimulus (Moreno-Bote et al., 2014). The fluctuations in our network satisfy this criterion, albeit trivially, since all neurons share the same stimulus input. Indeed, in our network the external inputs appear to the network as $s+\xi (t)$, meaning that fluctuations from the noise source $\xi (t)$ are indistinguishable from fluctuations in the stimulus $s$. This view is oversimplified and assumes that the decoder treats the neurons as indistinguishable from one another, at odds with classic work in population coding (Pouget et al., 2000). Extending our network to include distributed tuning and feature-based recurrent connectivity is a natural next step (Ben-Yishai et al., 1995; Rubin et al., 2015). To do this, the spatial scales of feedforward tuning, recurrent projections, external fluctuations, and attentional modulation must all be specified. It is not clear how noise correlations will depend on these choices, yet work in spatially distributed balanced networks shows that solutions can be complex (Rosenbaum et al., 2017).
The role of inhibition in shaping cortical function is a longstanding topic of study (Isaacson and Scanziani, 2011), including recent work showing inhibition can actively decorrelate cortical responses (Renart et al., 2010; Tetzlaff et al., 2012; Ly et al., 2012). Our work gives a concrete example of how this decorrelation can be gated and used to control the flow of information. Of interest are tasks that probe a distributed population where attention again decreases noise correlations between neurons with similar stimulus preference, yet increases noise correlations between cells with dissimilar stimulus preference (Ruff and Cohen, 2014). The circuit mechanisms underlying this neural correlate of attention are unclear. However, there is ample work in understanding how recurrent inhibition shapes cortical activity in distributed populations (Isaacson and Scanziani, 2011), including in models of attentional circuits (Ardid et al., 2007; Buia and Tiesinga, 2008). Adapting our model to include distributed tuning is an important next step and will be a better framework to discuss the coding consequences of the attentional modulation circuits proposed in our study.
Methods and materials
Data preparation
Data were collected from two rhesus monkeys with microelectrode arrays implanted bilaterally in V4 as they performed an orientation-change detection task (Figure 1a) (Cohen and Maunsell, 2009). All animal procedures were in accordance with the Institutional Animal Care and Use Committee of Harvard Medical School. Two oriented Gabor stimuli flashed on and off several times, until one of them changed orientation. The monkey's task was then to saccade to the stimulus that changed. Each recording session consisted of at least four blocks of trials in which the monkey's attention was cued to the left or right. We excluded from the analysis three types of trials. First, we excluded instruction trials, which occurred at the start of each block to cue the monkey to the side to attend. Second, we excluded catch trials, in which the monkey was rewarded just for fixating. Third, we excluded trials in which the monkey did not perform the task correctly. Moreover, the first and last stimulus presentations in each trial were not analyzed, to prevent transients due to stimulus appearance or change from affecting the results. The total number of trials included in the analysis from all the recording sessions was $42,496$. Each trial consisted of between $3$ and $12$ stimulus presentations, of which all but the first and last were analyzed.
Recordings from the left and right hemispheres of each monkey were analyzed separately because the activities of the neurons in opposite hemispheres had near-zero correlations (Cohen and Maunsell, 2009). Neurons in the right hemisphere were considered to be in the attended state when the attentional cue was on the left, and vice versa. We note that because our criteria for choosing which trials and units to analyze were based on different needs for data analysis compared to the original study (Cohen and Maunsell, 2009), the specific firing rates and covariances differ quantitatively from those previously reported.
In monkey 1, an average of $51.1$ (min $35$, max $80$) units were analyzed from the right hemisphere, and an average of $27.5$ (min $14$, max $56$) units were analyzed from the left hemisphere. From monkey 2, an average of $56.6$ (min $43$, max $71$) units from the right hemisphere, and an average of $37.7$ (min $32$, max $46$) units from the left hemisphere were analyzed. From each recording, spikes falling between $60$ and $260$ ms from stimulus onset were considered for the firing rate analysis, to account for the latency of neuronal responses in V4.
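The counting window can be made concrete with a minimal Python sketch. The data layout (a list of spike-time arrays per unit plus an array of stimulus onsets) is hypothetical. The half-open window $[60, 260)$ ms is our assumption; the text gives only the two edges. The resulting presentations-by-units matrix is the kind of spike-count matrix used for the covariance analysis below.

```python
import numpy as np


def spike_counts(spike_times_ms, onsets_ms, window=(60.0, 260.0)):
    """Spike counts of one unit in [onset + 60, onset + 260) ms per presentation.

    spike_times_ms: 1-D array of the unit's spike times (ms).
    onsets_ms: 1-D array of stimulus-onset times (ms) for analyzed presentations.
    The half-open window is an assumption; the text gives only 60 and 260 ms.
    """
    spikes = np.sort(np.asarray(spike_times_ms, dtype=float))
    onsets = np.asarray(onsets_ms, dtype=float)
    # Number of spikes strictly before each boundary; the difference counts [start, stop).
    before_stop = np.searchsorted(spikes, onsets + window[1], side="left")
    before_start = np.searchsorted(spikes, onsets + window[0], side="left")
    return before_stop - before_start


def count_matrix(units_spike_times_ms, onsets_ms):
    """Presentations x units spike-count matrix S (hypothetical data layout)."""
    return np.column_stack([spike_counts(t, onsets_ms) for t in units_spike_times_ms])


# Example with made-up spike times for two units and two presentations.
S = count_matrix([[10, 70, 100, 259, 260, 300], [65, 270, 455]], onsets_ms=[0, 200])
C = np.cov(S, rowvar=False)  # spike-count covariance across presentations
```

Using `np.searchsorted` on sorted spike times counts all windows in one vectorized pass rather than looping over presentations.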
Comparing change in covariance to change in variance
Let ${S}^{U}$ be the matrix containing spike counts of the neurons on trials in which they are in the unattended state, and ${S}^{A}$ the matrix containing spike counts of the neurons on trials in which they are in the attended state. Denote the unattended spike count covariance matrix by ${C}^{U}=\mathrm{C}\mathrm{o}\mathrm{v}({S}^{U})$, and the attended one by ${C}^{A}=\mathrm{C}\mathrm{o}\mathrm{v}({S}^{A})$. Attentional changes in covariance and variance were measured both on average (Figure 1c) and as distributions (Figure 1d). The distributions of the normalized differences
reveal a concentration of negative covariance changes, and a distribution of variance changes symmetric about zero. Here, $\mathrm{Cov}^{A}$ and $\mathrm{Cov}^{U}$ ($\mathrm{Var}^{A}$ and $\mathrm{Var}^{U}$) are vectors containing the covariance (variance) values of the entire data set. Note that the distributions are bounded between $-2$ and $2$ by construction.
Solving systems of equations by error minimization
When solving systems of the form of Equation (2) to quantify the fit of the model, we used the unconstrained nonlinear minimizer fminunc in MATLAB. The solver found minima of an objective function, which we defined as the Euclidean norm of the difference between the approximation of the attended covariance matrix and the original attended covariance matrix. In other words, the objective is the error of the approximation, $E(\mathbf{g})=\Vert \mathbf{g}{\mathbf{g}}^{T}\circ {C}^{U}-{C}^{A}\Vert $.
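The fit can be sketched in Python, with SciPy's BFGS minimizer standing in for MATLAB's fminunc, whose default algorithm is also quasi-Newton. This is not the authors' code. The sketch minimizes the squared Frobenius norm $\Vert \mathbf{g}\mathbf{g}^{T}\circ C^{U}-C^{A}\Vert^2$, which has the same minimizer as the norm itself, and supplies an analytic gradient. The initialization from variance ratios is our own hypothetical choice.

```python
import numpy as np
from scipy.optimize import minimize


def fit_rank_one_gain(C_U, C_A, g0=None):
    """Fit g minimizing || g g^T o C_U - C_A || (o = elementwise product).

    BFGS on the squared Frobenius norm (same minimizer as the norm itself);
    a Python stand-in for MATLAB's fminunc, not the authors' code.
    C_U and C_A are assumed symmetric.
    """
    if g0 is None:
        # Hypothetical initialization: sqrt of attended/unattended variance ratios,
        # which is exact on the diagonal when C_A = g g^T o C_U with g > 0.
        g0 = np.sqrt(np.clip(np.diag(C_A) / np.diag(C_U), 1e-12, None))

    def objective_and_grad(g):
        R = np.outer(g, g) * C_U - C_A        # residual matrix
        value = np.sum(R ** 2)
        grad = 4.0 * (R * C_U) @ g            # uses symmetry of R and C_U
        return value, grad

    res = minimize(objective_and_grad, g0, jac=True, method="BFGS")
    g_hat = res.x
    error = np.linalg.norm(np.outer(g_hat, g_hat) * C_U - C_A)  # Frobenius norm
    return g_hat, error
```

The prediction is then `np.outer(g_hat, g_hat) * C_U`. Note that $\mathbf{g}$ is identified only up to an overall sign.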
Shuffled covariance matrices
For finite population sizes ($N<\infty $) we expect our algorithm to extract some low-rank structure even between arbitrary covariance matrices. Let $\sqrt{{C}^{A}}$ be the principal square root of the attended covariance matrix, that is, the unique positive-semidefinite square root of a positive-semidefinite matrix. Consider the symmetric matrix $D=\mathrm{perm}(\sqrt{{C}^{A}})$ computed from a random permutation of the upper-triangular entries of $\sqrt{{C}^{A}}$. Finally, let ${C}_{\mathrm{shuf}}^{A}=\mathrm{real}(DD)$. The square root, permutation, and squaring procedure guarantees a positive-semidefinite matrix, since the square of any real symmetric matrix is positive-semidefinite. Shuffling removes any relation between ${C}^{U}$ and ${C}_{\mathrm{shuf}}^{A}$, so any remaining detected structure would be due to finite sampling. The shuffled covariance gain ${\hat{\mathbf{g}}}_{\mathrm{shuf}}$ provides the prediction ${\hat{C}}_{\mathrm{shuf}}^{A}:={\hat{\mathbf{g}}}_{\mathrm{shuf}}{\hat{\mathbf{g}}}_{\mathrm{shuf}}^{T}\circ {C}^{U}$, and ${\rho}_{\mathrm{shuf}}$ measures the relation between ${\hat{C}}_{\mathrm{shuf}}^{A}$ and ${C}_{\mathrm{shuf}}^{A}$. Synthetic data show that as the population size $N$ becomes large the coefficient ${\rho}_{\mathrm{shuf}}$ approaches 0 (Appendix: Detected structure in random covariance matrices is a finite-size effect).
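The square root, permutation, and squaring procedure can be sketched in a few lines of Python using `scipy.linalg.sqrtm`. The text does not say whether the diagonal of $\sqrt{C^{A}}$ is included in the shuffle; this sketch includes it, which is an assumption.

```python
import numpy as np
from scipy.linalg import sqrtm


def shuffled_covariance(C_A, rng):
    """Square root -> permute upper-triangular entries -> square.

    Assumption: the permutation includes the diagonal of sqrt(C_A); the text
    does not say whether diagonal entries are shuffled.
    """
    n = C_A.shape[0]
    S = np.real(sqrtm(C_A))            # principal square root (PSD input)
    iu = np.triu_indices(n)            # upper triangle including the diagonal
    D = np.zeros_like(S)
    D[iu] = rng.permutation(S[iu])     # shuffle the upper-triangular entries
    D = D + np.triu(D, k=1).T          # mirror to make D symmetric
    return np.real(D @ D)              # D symmetric, so D @ D = D @ D.T is PSD
```

Mirroring the shuffled upper triangle is what makes $D$ symmetric, and symmetry is what guarantees that $DD=DD^{T}$ is positive-semidefinite.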
Upper bound covariance matrices
The covariance matrices $C^U$ and $C^A$ are estimates obtained from a finite number of trials, and any estimation error will compromise the ability to detect rank-one structure in $A_C$. Here we outline an upper bound for the model performance given the finite number of trials over which the covariance matrices were originally estimated. Let $\widehat{C}^A:=\widehat{\mathbf{g}}\widehat{\mathbf{g}}^T\circ C^U$, where $\widehat{\mathbf{g}}$ minimizes the $L^2$ norm of the error of the approximation $C^A\approx\mathbf{g}\mathbf{g}^T\circ C^U$. Note that $\widehat{C}^A$ decomposes perfectly according to the statistical model in Equation (2). We used $\widehat{C}^A$ to generate an artificial set of $N$ correlated Poisson spike counts, using an algorithm based on a latent multivariate Gaussian model (Macke et al., 2009). We sampled these population spike counts over a fixed number of trials ($M$), letting $D$ be the resulting $M\times N$ matrix of Poisson samples. Let $C^A_{\mathrm{ub}}=\mathrm{Cov}(D)$ be the 'upper bound' covariance matrix: a finite-trial sampling approximation to the perfectly decomposable matrix $\widehat{C}^A$. Finally, we apply our algorithm to obtain $\hat{C}^A_{\mathrm{ub}}:=\hat{\mathbf{g}}_{\mathrm{ub}}\hat{\mathbf{g}}_{\mathrm{ub}}^T\circ C^U$, where the vector $\hat{\mathbf{g}}_{\mathrm{ub}}$ minimizes the $L^2$ norm of the error with respect to $C^A_{\mathrm{ub}}$.
Since $\widehat{C}^A$ is perfectly decomposable, for $M\to\infty$ we have $\hat{C}^A_{\mathrm{ub}}=C^A_{\mathrm{ub}}=\widehat{C}^A$. Thus in the large-$M$ limit the coefficient $\rho_{\mathrm{ub}}$ between the elements of $\hat{C}^A_{\mathrm{ub}}$ and $C^A_{\mathrm{ub}}$ converges to 1 (Appendix: Performance limited by available number of trials). For finite $M$, however, $\rho_{\mathrm{ub}}<1$, solely because of inaccuracies in estimating $\widehat{C}^A$ by $C^A_{\mathrm{ub}}$. To account for the possibility that particular realizations of $D$ introduce random biases into $C^A_{\mathrm{ub}}$, we repeated this analysis on $10$ independently generated upper-bound covariance matrices $C^A_{\mathrm{ub}}$.
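The effect of a finite number of trials can be illustrated by re-estimating a perfectly decomposable $\widehat{C}^A$ from $M$ samples and averaging the resulting correlation over 10 repetitions. For brevity this sketch substitutes multivariate Gaussian samples for the latent-Gaussian Poisson counts of Macke et al. (2009) and measures $\rho$ as the Pearson correlation of off-diagonal entries; both choices are assumptions, and the names are hypothetical. It compares $C^A_{\mathrm{ub}}$ with $\widehat{C}^A$ directly, without refitting $\hat{\mathbf{g}}_{\mathrm{ub}}$, to isolate the sampling error.

```python
import numpy as np


def offdiag_corr(C1, C2):
    """Pearson correlation between the off-diagonal (i < j) entries of two matrices."""
    iu = np.triu_indices(C1.shape[0], k=1)
    return np.corrcoef(C1[iu], C2[iu])[0, 1]


def mean_sampling_rho(C_hat, n_trials, n_repeats, rng):
    """Average correlation between C_hat and covariances re-estimated from n_trials samples."""
    N = C_hat.shape[0]
    rhos = []
    for _ in range(n_repeats):
        # Gaussian stand-in for correlated Poisson spike counts
        samples = rng.multivariate_normal(np.zeros(N), C_hat, size=n_trials)
        C_ub = np.cov(samples, rowvar=False)  # finite-trial estimate
        rhos.append(offdiag_corr(C_ub, C_hat))
    return float(np.mean(rhos))
```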
Leave-one-out cross-validation
Instead of solving the complete system of equations of the form of Equation (2), we remove one of them. Denote the complete set of equations by $S$, an individual equation by $s_{ij}:=\{C^A_{ij}=g_ig_jC^U_{ij}\}$, and the set of equations with $s_{ab}$ removed by $S_{ab}:=S\setminus\{s_{ab}\}$. We then solve the system $S_{ab}$ and denote the solution by $\mathbf{g}_{ab}$. We can then compare $C^A_{ab}$ with the prediction $\widehat{C}^A_{ab}=\mathbf{g}_{ab}(a)\,\mathbf{g}_{ab}(b)\,C^U_{ab}$. We do this for $\min(1000,N(N-1)/2)$ of the possible systems $S_{ab}$. The correlation coefficient $\rho$ between the resulting values of $C^A_{ab}$ and $\widehat{C}^A_{ab}$ measures how well the system predicts one of its elements, in other words, how well the structure holds together when one element is taken out. This leave-one-out cross-validation was also performed for the shuffled and upper-bound cases.
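A sketch of the leave-one-out procedure, assuming the same least-squares objective restricted to the retained equations and removing both mirror entries $(a,b)$ and $(b,a)$ of the symmetric equation $s_{ab}$. The Gauss-Newton solver is a stand-in for fminunc, and all names are hypothetical.

```python
import numpy as np


def fit_gain_masked(C_U, C_A, mask, n_iter=100):
    """Gauss-Newton fit of C_A ~ g g^T o C_U using only entries where mask is True."""
    N = C_U.shape[0]
    W = mask.astype(float)

    def loss(g):
        return np.sum((W * (np.outer(g, g) * C_U - C_A)) ** 2)

    g = np.ones(N)
    for _ in range(n_iter):
        r = (W * (np.outer(g, g) * C_U - C_A)).ravel()
        J = np.zeros((N, N, N))
        for k in range(N):
            J[k, :, k] += W[k, :] * g * C_U[k, :]
            J[:, k, k] += W[:, k] * g * C_U[:, k]
        step = np.linalg.lstsq(J.reshape(N * N, N), -r, rcond=None)[0]
        t = 1.0
        while loss(g + t * step) > loss(g) and t > 1e-10:
            t *= 0.5  # step halving keeps the objective from increasing
        g = g + t * step
        if np.max(np.abs(t * step)) < 1e-12:
            break
    return g


def leave_one_out_rho(C_U, C_A, max_systems=1000, rng=None):
    """Hold out each equation s_ab (a < b), refit, and predict the held-out entry."""
    rng = np.random.default_rng() if rng is None else rng
    N = C_U.shape[0]
    pairs = np.column_stack(np.triu_indices(N, k=1))
    if len(pairs) > max_systems:  # use min(1000, N(N-1)/2) systems
        pairs = pairs[rng.choice(len(pairs), size=max_systems, replace=False)]
    held_out, predicted = [], []
    for a, b in pairs:
        mask = np.ones((N, N), dtype=bool)
        mask[a, b] = mask[b, a] = False  # remove equation s_ab
        g = fit_gain_masked(C_U, C_A, mask)
        held_out.append(C_A[a, b])
        predicted.append(g[a] * g[b] * C_U[a, b])
    return np.corrcoef(held_out, predicted)[0, 1]
```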
Mean field model
The mean spiking activity over the population $\alpha$ ($=E$ or $I$) is
where $y_{i\alpha}(t)=\sum_{j=1}^{n_{i\alpha}}\delta(t-t^j_{i\alpha})$ is the spike train of neuron $i$ in population $\alpha$, $n_{i\alpha}$ is the number of spikes from that neuron, and $t^j_{i\alpha}$ is the time of spike $j$. Following previous studies (Tetzlaff et al., 2012; Ozeki et al., 2009; Ledoux and Brunel, 2011), we consider the firing rate dynamics of the $E$ and $I$ populations given by the system in Equations (6):
Here $\mu_{\alpha B}$ is the attention-independent drive to population $\alpha$, $A\in[0,1]$ is the attention variable, and $\Delta\mu_\alpha$ is the maximal drive to population $\alpha$ due to attention. The parameter $J_{\alpha\beta}$ is the coupling from population $\beta$ to population $\alpha$. The stochastic processes $x_E(t)$, $x_I(t)$, and $x(t)$ are the global fluctuations applied to the network: the excitatory and inhibitory populations receive private fluctuations $x_\alpha(t)$ as well as common fluctuations $x(t)$ shared by both populations, and the parameter $\chi$ scales the degree of private versus common fluctuations. We perform calculations for arbitrary $\chi$ and then take $\chi\to1$ to match the system given in Equations (6). The total intensity of fluctuations to population $\alpha$ is set by $\sigma_\alpha$. These simplified rate equations give an accurate picture of the long-timescale dynamics of networks of coupled spiking neuron models in the fluctuation-driven regime (Ledoux and Brunel, 2011). The operative timescale of the rate dynamics reflects a combination of synaptic and membrane integration; since we are interested in spiking covariance over time windows much longer than these timescales, we set the timescale to unity for simplicity.
To give a quantitative match between the equilibrium statistics of the rate equations and the leaky integrate-and-fire (LIF) network simulations, we take the transfer function $f$ to be the inverse first passage time of an LIF neuron driven by white noise (Ledoux and Brunel, 2011):
The parameter ${\eta}_{\alpha}$ is the intensity of the external fluctuations given to the LIF neurons (Appendix: Spiking model). The membrane timescale $\tau $ gives the dimensions of 1/s to the firing rate ${r}_{\alpha}$. The parameter ${V}_{T}$ denotes spike threshold while ${V}_{R}$ is the reset potential. Model parameters are given in Table 1.
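The transfer function equation is not shown above. Assuming it is the standard first-passage-time rate of a white-noise-driven LIF neuron without refractory period, $r=\left[\tau\sqrt{\pi}\int_{(V_R-\mu)/\eta}^{(V_T-\mu)/\eta}e^{u^2}(1+\mathrm{erf}\,u)\,du\right]^{-1}$, a sketch of $f$ and of its slope $L_\alpha$ follows. The exact scaling convention for $\eta$, the default parameter values (not those of Table 1), and the function names are assumptions; direct evaluation of $e^{u^2}$ restricts it to moderate $|u|$.

```python
import math


def lif_rate(mu, eta, tau=1.0, V_T=1.0, V_R=0.0, n=2000):
    """Assumed Siegert-type rate of a white-noise-driven LIF neuron (no refractory period).

    The integral is evaluated with composite Simpson's rule (n must be even).
    """
    a, b = (V_R - mu) / eta, (V_T - mu) / eta
    h = (b - a) / n

    def integrand(u):
        return math.exp(u * u) * math.erfc(-u)  # e^{u^2} (1 + erf u)

    s = integrand(a) + integrand(b)
    s += 4 * sum(integrand(a + j * h) for j in range(1, n, 2))
    s += 2 * sum(integrand(a + j * h) for j in range(2, n, 2))
    return 1.0 / (tau * math.sqrt(math.pi) * s * h / 3)


def lif_gain(mu, eta, d=1e-5, **kwargs):
    """Slope L = df/dI at mean input mu, by central finite difference."""
    return (lif_rate(mu + d, eta, **kwargs) - lif_rate(mu - d, eta, **kwargs)) / (2 * d)
```

As a sanity check, for suprathreshold drive and weak noise the rate approaches the deterministic value $1/[\tau\ln(\mu/(\mu-1))]$ when $V_R=0$ and $V_T=1$.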
If the input fluctuations $x(t)$, $x_E(t)$, and $x_I(t)$ are white noise processes, then the nonlinearity in $f$ makes the stochastic dynamics of $r_E(t)$ and $r_I(t)$ complicated (non-diffusive). To simplify the analysis we consider $x(t)$ as the limiting process of:
for $\tau_x\to0$, with $\langle\xi_x(t)\rangle=0$ and $\langle\xi_x(t)\xi_x(t')\rangle=\delta(t-t')$. This makes $x(t)$ sufficiently smooth in time (the same is true for $x_E(t)$ and $x_I(t)$).
We restrict the coupling ${J}_{\alpha \beta}$ such that for ${\sigma}_{\alpha}=0$ the equilibrium point $({\overline{r}}_{E},{\overline{r}}_{I})$ is stable and given by:
For sufficiently small $\sigma_\alpha$, the fluctuations in population activity about the equilibrium firing rate, $\delta r_\alpha(t)=r_\alpha(t)-\bar{r}_\alpha$, obey the linearized stochastic system:
Here $L_\alpha=\left.\frac{df_\alpha}{dI}\right|_{I=I^{\mathrm{eff}}_\alpha}$ is the slope of the transfer function $f_\alpha$ evaluated at the equilibrium point $I^{\mathrm{eff}}_\alpha=\mu_\alpha+A\Delta\mu_\alpha+J_{\alpha E}\bar{r}_E-J_{\alpha I}\bar{r}_I$. Equation (17) is a two-dimensional Ornstein–Uhlenbeck process (Gardiner, 2004) that is readily amenable to analysis.
Computing ${V}_{E}$
In matrix form the system in Equation (17) is written as:
Here $\delta \mathbf{r}=[\delta {r}_{E},\delta {r}_{I}]$, $\mathbf{\mathbf{x}}=[{x}_{E},{x}_{I},x]$, and
$M=\begin{bmatrix}-1+L_EJ_{EE} & -L_EJ_{EI}\\ L_IJ_{IE} & -1-L_IJ_{II}\end{bmatrix}$ and $D=\begin{bmatrix}L_E\sigma_E\sqrt{1-\chi} & 0 & L_E\sigma_E\sqrt{\chi}\\ 0 & L_I\sigma_I\sqrt{1-\chi} & L_I\sigma_I\sqrt{\chi}\end{bmatrix}$.
The stationary autocovariance function is computed as:
where $s$ is a time lag and $\Sigma=\frac{(\mathrm{Det}\,M)DD^T+[M-(\mathrm{Tr}\,M)\mathbb{1}]DD^T[M-(\mathrm{Tr}\,M)\mathbb{1}]^T}{-2(\mathrm{Tr}\,M)(\mathrm{Det}\,M)}$ is the variance matrix (Det and Tr denote the determinant and trace, respectively). Here $\mathbb{1}$ is the $2\times2$ identity matrix.
The covariance between populations $\alpha $ and $\beta $ over long time scales is given by
where the integration is performed over the appropriate element of the matrix $\tilde{C}(s)$. In particular, the long-timescale variance of the excitatory population is (after some algebra):
We remark that the long-timescale covariance matrix can alternatively be computed from $C=M^{-1}D[M^{-1}D]^T$ (Gardiner, 2004). To obtain the compact expression for $V_E$ we have assumed symmetric coupling, $J_I:=J_{EI}=J_{II}$ and $J_E:=J_{EE}=J_{IE}$, and taken $\chi\to1$. These assumptions are not required for the main results of our study and merely simplify the algebra.
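These matrices can be checked numerically. The sketch assumes the sign convention $\frac{d}{dt}\delta\mathbf{r}=M\,\delta\mathbf{r}+D\mathbf{x}$ with $M=\begin{bmatrix}-1+L_EJ_{EE} & -L_EJ_{EI}\\ L_IJ_{IE} & -1-L_IJ_{II}\end{bmatrix}$ and $\tilde{C}(s)=e^{Ms}\Sigma$ for $s\ge0$, and uses illustrative parameter values rather than those of Table 1. It verifies that the closed-form $\Sigma$ solves the Lyapunov equation $M\Sigma+\Sigma M^T+DD^T=0$, and that integrating $\tilde{C}(s)$ over all lags, $-(M^{-1}\Sigma+\Sigma M^{-T})$, agrees with $M^{-1}D[M^{-1}D]^T$. Function names are hypothetical.

```python
import numpy as np


def mean_field_matrices(L_E, L_I, J_EE, J_EI, J_IE, J_II, sig_E, sig_I, chi):
    """M and D of d(delta r)/dt = M delta r + D x (sign convention assumed)."""
    M = np.array([[-1.0 + L_E * J_EE, -L_E * J_EI],
                  [L_I * J_IE, -1.0 - L_I * J_II]])
    D = np.array([[L_E * sig_E * np.sqrt(1 - chi), 0.0, L_E * sig_E * np.sqrt(chi)],
                  [0.0, L_I * sig_I * np.sqrt(1 - chi), L_I * sig_I * np.sqrt(chi)]])
    return M, D


def stationary_variance(M, D):
    """Closed-form 2x2 variance matrix Sigma; needs Tr M < 0 < Det M (stability)."""
    DD = D @ D.T
    tr, det = np.trace(M), np.linalg.det(M)
    P = M - tr * np.eye(2)
    return (det * DD + P @ DD @ P.T) / (-2.0 * tr * det)


def long_time_covariance(M, D):
    """C = M^{-1} D [M^{-1} D]^T, the integral of the covariance function over all lags."""
    MinvD = np.linalg.solve(M, D)
    return MinvD @ MinvD.T
```

The excitatory long-timescale variance $V_E$ is the $(0,0)$ entry of the returned matrix.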
Computing stimulus response gain
We decompose $\mu_{\alpha B}=k_\alpha s+\widehat{\mu}_{\alpha B}$ and define the gain of population $\alpha$ to stimulus $s$ as $G_\alpha=\frac{d\bar{r}_\alpha}{ds}=L_\alpha\frac{dI_\alpha}{ds}$. The term $\frac{dI_\alpha}{ds}$ is obtained by differentiating Equations (16) with respect to $s$:
Solving the system of two equations for ${G}_{E}$ yields:
For the sake of compactness we set ${\sigma}_{E}={\sigma}_{I}$ to obtain the result in Equation (8).
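Differentiating the equilibrium condition gives $dI_\alpha/ds=k_\alpha+J_{\alpha E}G_E-J_{\alpha I}G_I$, so the two gains solve a $2\times2$ linear system. A sketch, assuming the effective input enters as $J_{\alpha E}\bar{r}_E-J_{\alpha I}\bar{r}_I$ and using illustrative values; `stimulus_gain` and `SIGNS` are hypothetical names.

```python
import numpy as np

# Sign pattern of the effective input: +J_aE r_E - J_aI r_I
SIGNS = np.array([[1.0, -1.0], [1.0, -1.0]])


def stimulus_gain(L, J, k):
    """Solve G = L * (k + (SIGNS * J) @ G) for G = (G_E, G_I).

    L: slopes (L_E, L_I); J: magnitudes [[J_EE, J_EI], [J_IE, J_II]];
    k: stimulus sensitivities (k_E, k_I).
    """
    L, k = np.asarray(L, float), np.asarray(k, float)
    J_signed = SIGNS * np.asarray(J, float)
    A = np.eye(2) - L[:, None] * J_signed
    return np.linalg.solve(A, L * k)
```

Equivalently, $\mathbf{G}=-M^{-1}\mathbf{L}\mathbf{k}$ with $M=-\mathbb{1}+\mathrm{diag}(L_E,L_I)J_{\mathrm{signed}}$, so the same matrix that governs the fluctuations also sets the stimulus gains.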
Fisher information
Linear Fisher information depends on the stimulus response gains and on the covariance matrix of the excitatory and inhibitory populations:
When the input correlation satisfies $0\le\chi<1$ we have:
and
Inserting these expressions and those for ${G}_{E}$ and ${G}_{I}$ into Equation (23) and simplifying yields:
We remark that ${\text{FI}}_{EI}$ is independent of ${L}_{E}$ and ${L}_{I}$ and thus independent of attentional modulation.
Note that we have reintroduced the correlation constant $\chi$ into the equations, rather than only considering the limit $\chi\to1$. If $\chi=1$, the excitatory and inhibitory populations receive identical (perfectly correlated) noise. In that case the correlation cancellation would be perfect and the information content infinite, as can be seen from Equation (27).
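A short numerical sketch of why the result does not depend on attention, under the same linearized model: sign convention $d\delta\mathbf{r}/dt=M\delta\mathbf{r}+D\mathbf{x}$, $D=\mathrm{diag}(L_E,L_I)\,S$, linear Fisher information taken as $\mathbf{G}^TC^{-1}\mathbf{G}$, and illustrative parameters (all assumptions; the function name is hypothetical). Since $\mathbf{G}=-M^{-1}\mathbf{L}\mathbf{k}$ and $C=M^{-1}\mathbf{L}SS^T\mathbf{L}M^{-T}$, the factors of $M$ and $\mathbf{L}$ cancel, leaving $\mathbf{k}^T(SS^T)^{-1}\mathbf{k}$, which diverges as $\chi\to1$ unless $k_E/\sigma_E=k_I/\sigma_I$.

```python
import numpy as np


def linear_fisher_info(L, J, k, sig, chi):
    """G^T C^{-1} G for the linearized E-I model (illustrative sketch)."""
    L, k, sig = (np.asarray(v, float) for v in (L, k, sig))
    J_signed = np.asarray(J, float) * np.array([[1.0, -1.0], [1.0, -1.0]])
    M = -np.eye(2) + L[:, None] * J_signed  # linearized dynamics matrix
    S = np.array([[sig[0] * np.sqrt(1 - chi), 0.0, sig[0] * np.sqrt(chi)],
                  [0.0, sig[1] * np.sqrt(1 - chi), sig[1] * np.sqrt(chi)]])
    D = L[:, None] * S  # D = diag(L_E, L_I) S
    G = -np.linalg.solve(M, L * k)  # stimulus response gains
    MinvD = np.linalg.solve(M, D)
    C = MinvD @ MinvD.T  # long-timescale covariance
    return float(G @ np.linalg.solve(C, G))
```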
Appendix 1
Detected structure in random covariance matrices is a finite-size effect
Here we show that any prediction of rank-one structure in our shuffled covariance matrix (nonzero $\rho_{\mathrm{shuf}}$ in Figure 2 of the main text) is a finite-data effect. The trial-by-trial covariance matrices of the experimental data are computed from the spike counts recorded from a fixed number of units. To explore the effect of population size on the detected structure in the shuffled covariance matrices, we must therefore rely on synthetic data.
We construct the synthetic covariance matrices by generating Gaussian random numbers with the same mean and standard deviation as the actual covariance matrices from the data. This construction serves as a substitute for the shuffled covariance matrices and allows for arbitrarily large populations. As we increase the number of units from about $10$ to $500$, $\rho_{\mathrm{shuf}}$ decreases, indicating that any positive $\rho_{\mathrm{shuf}}$ is due to the finite population size rather than to any inherent structure in the data (Appendix 1—figure 1).
Model performance is limited by number of trials in data
The upper bound for our model, $\rho_{\mathrm{ub}}$, did not reach 1 (see Figure 2 of the main text). Here we show that this is also due to the finite data available. If infinitely many trials were available to compute the spike count covariance matrices, and the data obeyed the low-rank statistical model, the performance of the model ($\rho_{\mathrm{ub}}$) should tend to one. To test this, we generate synthetic data from correlated Poisson processes as in the upper bound computation of the main text, but do not limit the number of samples to the number of trials in the original data. As the number of samples increases we find that $\rho_{\mathrm{ub}}\to1$ (Appendix 1—figure 2).
Model performance for all monkeys and hemispheres
The model performance for individual recording sessions is given here for transparency (Appendix 1—figure 3 for the full data and Appendix 1—figure 4 for the leave-one-out cross-validation).
Low-dimensional modulation is intrinsic to neurons
To further test our model, we asked to what extent the value of the covariance gain $g_i$ of neuron $i$ depends on the population of neurons whose covariance matrix was used to estimate it. If we had solved the system $S$ of equations $C^A_{i,j}=g_ig_jC^U_{i,j}$ using covariance matrices computed from recordings of a different set of neurons (still including neuron $i$), would the value of $g_i$ be different? If not, this would further indicate that the attentional modulation of neuron $i$ is independent of the particular set of other neurons it is analyzed with.
We tackle this question by dividing a set of $N$ neurons into $k$ sets $S^{(1)}_i,S^{(2)}_i,\dots,S^{(k)}_i$ of $m\equiv(N+1)/2$ neurons each ($m\equiv N/2+1$ if $N$ is even), all of which contain neuron $n_i$. As an example, take $k=2$ and consider the set of neurons $n_1,\dots,n_{2i-1}$ partitioned into the two subsets $S^{(1)}_i=\{n_1,\dots,n_i\}$ and $S^{(2)}_i=\{n_i,\dots,n_{2i-1}\}$ (Appendix 1—figure 5a). We solve Equation (1) using the systems of equations obtained from $S^{(1)}_i$ and $S^{(2)}_i$, and obtain two solutions $\mathbf{g}^{(1)}_i$ and $\mathbf{g}^{(2)}_i$. We take the variance of the $g$-estimates as a metric for how closely the different subsets estimate an intrinsic value of $g$: a higher variance indicates poorer convergence, and therefore a lower degree of independence from the other neurons. Appendix 1—figure 5b shows the spread of $g$-estimates from one dataset for the data, as well as for the upper (UB) and lower (shuf) bounds; this spread includes the estimates for all neurons. The spread in the shuffled case (SEM $=7.42$) is the largest by two orders of magnitude, and the spread of the upper bound (SEM $=2.60\times10^{-3}$) is only one order of magnitude tighter than that of the data (SEM $=1.03\times10^{-2}$), so this case is close to ideal.
For each dataset, the analysis is done for each neuron with $100$ different permutations of the neurons to generate $S^{(k)}_i$, $k=1,\dots,100$. For the shuffled and upper-bound analyses, $10$ shuffles or Poisson realizations and $10$ permutations were used. In all cases there was a total of $100\times\#\text{neurons}$ points. Appendix 1—figure 5c shows an overview of the performance for all datasets. The abscissa is the mean variance of the $g$-estimates computed from the data, normalized by the mean variance computed from the shuffled data: $\frac{\langle\mathrm{Var}_k(g^{(k)}_i)\rangle_i}{\langle\langle\mathrm{Var}_k(g^{(k)}_{i,\mathrm{shuf}})\rangle_{\mathrm{shuf}}\rangle_i}$, where the subscript 'shuf' denotes averaging over shuffles. The ordinate is the mean variance of the $g$-estimates computed from the upper bound, with the same normalization: $\frac{\langle\langle\mathrm{Var}_k(g^{(k)}_{i,\mathrm{UB}})\rangle_{\mathrm{poiss}}\rangle_i}{\langle\langle\mathrm{Var}_k(g^{(k)}_{i,\mathrm{shuf}})\rangle_{\mathrm{shuf}}\rangle_i}$, where the subscript 'poiss' denotes averaging over Poisson realizations of the upper-bound covariance matrix. We normalized the mean data and upper-bound variances by the mean shuffled variance so that a value of $1$ means equality with the lower bound (the only detected structure comes from finite-size effects), and a value of $0$ means perfect convergence of the $g$-estimates. The gray regions mark points that are closer to $1$ than to $0$ (values above $0.5$) on the log axes. Unsurprisingly, most of the data fall below the diagonal, so the variance is greater for the data than for the upper bound.
Less trivially, most of the data fall outside the gray regions and are much closer to $0$ than to $1$, indicating excellent performance. This implies a structure in the modulation of the (unshuffled) covariance matrices that is preserved when the analysis is performed in the context of different groups of other neurons. In other words, attention modulates individual neurons largely independently, in a low-dimensional manner.
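The subset construction and the normalized variances defined above can be sketched as follows. Drawing each subset as $m-1$ random partners of neuron $i$ is an assumed stand-in for the permutations described in the text, the array layout is hypothetical, and the population variance (`np.var`, divisor equal to the number of subsets) is assumed for $\mathrm{Var}_k$.

```python
import numpy as np


def subset_size(N):
    """m = (N + 1) / 2 for odd N and N / 2 + 1 for even N."""
    return (N + 1) // 2 if N % 2 else N // 2 + 1


def subsets_containing(i, N, n_subsets, rng):
    """Random subsets of size m that all contain neuron i (hypothetical helper)."""
    m = subset_size(N)
    others = np.delete(np.arange(N), i)
    return [np.sort(np.append(rng.choice(others, m - 1, replace=False), i))
            for _ in range(n_subsets)]


def normalized_g_variances(g_data, g_ub, g_shuf):
    """Normalized variances of g-estimates for the data and the upper bound.

    g_data: (neurons, subsets); g_ub, g_shuf: (realizations, neurons, subsets).
    Var_k is taken over the last axis; means run over neurons and realizations.
    """
    denom = np.mean(np.var(g_shuf, axis=-1))
    x = np.mean(np.var(g_data, axis=-1)) / denom  # abscissa (data)
    y = np.mean(np.var(g_ub, axis=-1)) / denom  # ordinate (upper bound)
    return x, y
```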
Network requirements for attentional modulation
In this section we study a network of $N$ neurons, the spike train output of neuron $i$ being $y_i(t)=\sum_k\delta(t-t_{ik})$, where $t_{ik}$ is the $k^{\text{th}}$ spike time of neuron $i$. We consider multiple trials of the discrimination experiment and model the spike train only over a time period $t\in(0,T)$, over which we assume that the spike trains have reached equilibrium statistics. We abuse notation and take the spike count of neuron $i$ over a trial as $y_i=\int_0^Ty_i(t)\,dt$. The trial-to-trial covariance matrix of the network response is $\mathbf{C}$, with elements $c_{ij}=\mathrm{Cov}(y_i,y_j)$.
To analyze the network activity we first assume that each spike train is simply perturbed about a background state and employ the linear response ansatz (Ginzburg and Sompolinsky, 1994; Doiron et al., 2004; Trousdale et al., 2012):
Here $J_{ik}$ is the synaptic coupling from neuron $k$ to neuron $i$ (proportional to the synaptic weight), and $\xi_i$ is a fluctuating external input to neuron $i$. The background state of neuron $i$ is $y_{iB}$; it represents the stochastic output of a neuron that is due neither to recurrence from the network ($J=0$) nor to the external input ($\xi_i=0$). Finally, $L_i$ is the input-to-output gain of neuron $i$. In this framework $y_i$, $y_{iB}$, and $\xi_i$ are random variables, while $L_i$ and $J_{ik}$ are parameters that describe the intrinsic and network properties of the system. Without loss of generality we take $\langle y_{iB}\rangle=0$ and $\langle\xi_i\rangle=0$, making $\langle y_i\rangle=0$ a solution for the mean activity. We remark that Equation (28) is formally incorrect as written: $y_i$ is a random integer while, for instance, $L_iJ_{ik}y_k$ need not be an integer. Equation (28) is only correct upon taking an expectation (over trials) of $y_i$.
Here we derive the requirements for external fluctuations and internal coupling for network covariability $\mathbf{\mathbf{C}}$ to satisfy the following two conditions (on average):
C1: $c^A_{ij}=g_ig_jc^U_{ij}$; attentional modulation of covariance is rank one.
C2: $g_i<1$; spike count covariance decreases with attention.
It is convenient to write Equation (28) in matrix form and solve for the population response:
Here $\vec{\mathbf{y}}=[y_1,\dots,y_N]^T$, with similar notation for $\vec{\mathbf{y}}_{\mathbf{B}}$ and $\vec{\xi}$. The matrix $\mathbf{K}$ has elements $\mathbf{K}_{ij}=L_iJ_{ij}$, while $\mathbf{L}=\mathrm{diag}(L_i)$ and $\mathbf{I}$ is the identity matrix. Using Equation (29) we can express the covariance matrix $\mathbf{C}=\langle\vec{\mathbf{y}}\vec{\mathbf{y}}^T\rangle$ as:
where $^T$ denotes the transpose. Here $\mathbf{B}=\langle\vec{\mathbf{y}}_{\mathbf{B}}\vec{\mathbf{y}}_{\mathbf{B}}^T\rangle$ is the background covariance, which we take to be simply $\mathbf{B}=\mathrm{diag}(b_i)$. The input covariance matrix is $\mathbf{X}=\langle\vec{\xi}\vec{\xi}^T\rangle$, with elements $x_{ij}$. In the above we assumed that $\langle\vec{\mathbf{y}}_{\mathbf{B}}\vec{\xi}^T\rangle=\mathbf{0}$, meaning that the background state is uncorrelated with the external noisy input.
It is clear that $\mathbf{\mathbf{C}}$ naturally decomposes into two terms. The first term represents the correlations that are internally generated within the network, via the direct synaptic coupling $\mathbf{\mathbf{K}}$ acting upon the background state $\mathbf{\mathbf{B}}$. The second term is how the direct synaptic coupling $\mathbf{\mathbf{K}}$ filters the externally applied correlations $\mathbf{\mathbf{X}}$.
Satisfying C1
The background matrix $\mathbf{B}$ is diagonal and hence of rank $N$. The high-rank $\mathbf{B}$, combined with attentional modulations of both $\mathbf{B}$ and $\mathbf{K}$, makes it impossible to satisfy condition C1. If the spectral radius of $\mathbf{K}$ is less than 1, we can expand $(\mathbf{I}-\mathbf{K})^{-1}=\mathbf{I}+\sum_{n=1}^{\infty}\mathbf{K}^n$ (Pernice et al., 2011; Trousdale et al., 2012). Inserting this expansion into the expression for the internally generated covariability yields:
Extracting the covariance between neuron $i$ and $j$ ($i\ne j$) due to internal coupling within the network gives:
If we take $J_{ij}\sim1/N$ and the network connectivity to be dense (meaning that the connection probability is $\mathcal{O}(1)$), then each term is $\mathcal{O}(1/N)$. As long as the spectral radius of $\mathbf{K}$ is less than one the series converges, and as $N\to\infty$ the internally generated covariance $c_{ijB}$ vanishes (Pernice et al., 2011; Trousdale et al., 2012; Helias et al., 2014).
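A quick numerical illustration of this scaling, assuming a purely excitatory random coupling $J_{ij}=w/N$ with connection probability 0.2, unit gains (so $\mathbf{K}=\mathbf{J}$) and $\mathbf{B}=\mathbf{I}$; these choices and the function names are hypothetical. It also checks that the truncated series $\mathbf{I}+\sum_{n\le n_{\max}}\mathbf{K}^n$ reproduces $(\mathbf{I}-\mathbf{K})^{-1}$ when the spectral radius is below one.

```python
import numpy as np


def internal_covariance(N, w=0.5, p=0.2, seed=0):
    """Internally generated covariance (I - K)^{-1} B (I - K)^{-T} with B = I.

    Hypothetical network: excitatory coupling J_ij = w / N with probability p,
    unit gains L_i = 1, so K = J.
    """
    rng = np.random.default_rng(seed)
    K = (rng.random((N, N)) < p) * (w / N)
    np.fill_diagonal(K, 0.0)
    if np.max(np.abs(np.linalg.eigvals(K))) >= 1:
        raise ValueError("spectral radius of K must be < 1")
    P = np.linalg.inv(np.eye(N) - K)
    return P @ P.T, K  # B = I, so P B P^T = P P^T


def neumann_inverse(K, n_max):
    """Truncated series I + K + K^2 + ... + K^n_max."""
    term = np.eye(K.shape[0])
    total = term.copy()
    for _ in range(n_max):
        term = term @ K
        total += term
    return total
```

With these settings the mean absolute off-diagonal entry of the internal covariance shrinks roughly as $1/N$.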
This argument can be extended to networks with ${J}_{ij}\sim 1/\sqrt{N}$ when combined with a balance condition between recurrent excitation and inhibition. Such networks also produce an asynchronous state where ${c}_{ijB}\sim 1/N$, vanishing in the large $N$ limit (Renart et al., 2010). However, formally balanced networks in the asynchronous state with $N\to \mathrm{\infty}$ have solutions that do not depend on the firing rate transfer $L$. The attention dependent modulation ${A}_{L}:{L}^{U}\to {L}^{A}$ is a critical component of our model and care must be taken in ensuring that
In contrast, the external covariance $\mathbf{X}$ is not diagonal, so the contributions from external fluctuations to $\mathbf{C}$ scale as $N^2J^2$, which is $\mathcal{O}(1)$ for $J\propto1/N$. Thus, while the terms in $\mathbf{X}$ must be weak for the linear approximation in Equation (28) to hold, they need not vanish for large $N$. Indeed, for moderate $\mathbf{X}$ and large network size it is reasonable to ignore the contribution of internally generated fluctuations to $\mathbf{C}$. Recent analyses of cortical population recordings show that the shared spiking variability across the population can be well approximated by a rank-one model of covariability (Ecker et al., 2014; Lin et al., 2015; Ecker et al., 2015; Rabinowitz et al., 2015). Thus motivated, we take the external fluctuations to be $\mathbf{X}=\mathbf{x}\mathbf{x}^T$, where $\mathbf{x}=[x_1,\dots,x_N]^T$. In total, for large $N$ we have the approximation:
Hence $\mathbf{C}=\mathbf{c}\mathbf{c}^T$ is a rank-one matrix with $\mathbf{c}=(\mathbf{I}-\mathbf{K})^{-1}\mathbf{L}\mathbf{x}=[c_1,\dots,c_N]^T$. It is then trivial to satisfy condition C1 with $g_i=c^A_i/c^U_i$.
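A sketch confirming that, with the internal term neglected and $\mathbf{X}=\mathbf{x}\mathbf{x}^T$, attention acting through the gains $L_i$ produces a modulation of exactly the rank-one form C1 with $g_i=c^A_i/c^U_i$. The coupling, gains and input vector are hypothetical illustrative values, as is the function name.

```python
import numpy as np


def rank_one_covariance(L, J, x):
    """C = c c^T with c = (I - K)^{-1} L x and K_ij = L_i J_ij."""
    N = len(x)
    K = L[:, None] * J  # K = diag(L) J
    c = np.linalg.solve(np.eye(N) - K, L * x)
    return np.outer(c, c), c
```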
Satisfying C2
We again use the expansion $(\mathbf{I}-\mathbf{K})^{-1}=\mathbf{I}+\sum_{n=1}^{\infty}\mathbf{K}^n$. Truncating this expansion at $n=1$ yields an approximation that considers only synaptic paths of length one in the network and neglects higher-order paths, which is appropriate for sufficiently small $J_{ij}$. Inserting the truncated expansion into Equation (30) yields the following approximation for $\mathbf{c}$:
The analysis in the main text begins with this approximation to derive Equation (5) of the main text.
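The quality of the length-one truncation can be checked numerically: since the neglected terms start at $\mathbf{K}^2\mathbf{L}\mathbf{x}$, halving the coupling should reduce the relative error by roughly a factor of four. The random network, parameter values and function name below are hypothetical.

```python
import numpy as np


def first_order_error(J_scale, N=40, seed=5):
    """Relative error of c ~ (I + K) L x against the exact (I - K)^{-1} L x."""
    rng = np.random.default_rng(seed)  # same seed, so only the scale changes
    J = (rng.random((N, N)) < 0.2) * (J_scale / N)
    L = rng.uniform(0.8, 1.2, N)
    x = rng.uniform(0.5, 1.0, N)
    K = L[:, None] * J
    c_exact = np.linalg.solve(np.eye(N) - K, L * x)
    c_first = L * x + K @ (L * x)  # keep paths of length <= 1
    return np.linalg.norm(c_exact - c_first) / np.linalg.norm(c_exact)
```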
Spiking network
Spiking network description
We implement a network of leaky integrate-and-fire (LIF) neurons with $1000$ excitatory and $200$ inhibitory neurons. Individual neurons were modeled as integrate-and-fire units whose voltages obeyed
for neuron $i$. When the voltage reached the threshold $V_{\text{th}}=1$, a spike was recorded and the voltage was reset to $V_{\text{re}}=0$. Time was measured in units of the membrane time constant, $\tau=1$ for all neurons. The bias $\mu$ depended on neuron type and attentional state. In the unattended state, the biases of excitatory and inhibitory neurons were $\mu^{\text{un}}_E=0.6089$ and $\mu^{\text{un}}_I=0.5388$; in the attended state, $\mu^{\text{att}}_E=0.8713$ and $\mu^{\text{att}}_I=0.8996$. The recurrent input to neuron $i$ was
where $\mathbf{W}_{ij}$ is the strength of the connection from neuron $j$ to neuron $i$, $J_{ij}(t)$ is the synaptic filter for the projection from neuron $j$ to neuron $i$, $*$ denotes convolution, and $y_j(t)$ is the spike train of neuron $j$, a series of $\delta$-functions centered at the spike times. The synaptic filters were taken to be alpha functions,
with $\tau_s=0.3$ (in units of the passive membrane time constant) for all synapses. The connection probability from neurons in population $A$ to population $B$ was $p^{AB}$, with $p^{EE}=0.2$ and $p^{EI}=p^{IE}=p^{II}=0.4$. The synaptic weights were $\mathbf{W}^{EE}=0.0075$, $\mathbf{W}^{IE}=0.0037$, $\mathbf{W}^{EI}=0.0375$, and $\mathbf{W}^{II}=0.0375$. These parameters, together with the bias voltages $\mu$, were chosen so that the mean field theory derived above was valid for the firing rates of the spiking network.
The excitatory neurons were divided into four clusters, with each excitatory neuron receiving half of its inputs from neurons in its own cluster and half from neurons in the other clusters. Projections to and from inhibitory neurons were unclustered.
External input from outside the network was contained in $I^{\text{ext}}_i$. We modeled this as a partially correlated Gaussian white noise process, $I^{\text{ext}}_i(t)=\sigma_i\left(\sqrt{1-c}\,\xi_i(t)+\sqrt{c}\,\xi_c(t)\right)$, where $\xi_i(t)$ was Gaussian white noise private to neuron $i$ and $\xi_c(t)$ was shared by all neurons. The parameter $c=0.05$ set the fraction of common input, and the noise intensity was $\sigma_E=0.3$ for excitatory neurons and $\sigma_I=0.35$ for inhibitory neurons.
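A discrete-time sketch of this input, assuming an Euler scheme in which white noise sampled at step $dt$ has standard deviation $1/\sqrt{dt}$. The correlation between the inputs of any two neurons is $\sigma_i\sigma_jc/(\sigma_i\sigma_j)=c$, which the sample estimate should approach. The step size, sample count and function name are illustrative assumptions.

```python
import numpy as np


def correlated_inputs(sigma, c, n_steps, dt, rng):
    """Euler-discretized I_i = sigma_i (sqrt(1 - c) xi_i + sqrt(c) xi_c).

    Returns an (n_steps, n_neurons) array.
    """
    sigma = np.asarray(sigma, float)
    private = rng.normal(size=(n_steps, sigma.size))  # xi_i, one column per neuron
    common = rng.normal(size=(n_steps, 1))  # xi_c, shared by all neurons
    # white noise sampled at resolution dt has standard deviation 1 / sqrt(dt)
    return sigma * (np.sqrt(1 - c) * private + np.sqrt(c) * common) / np.sqrt(dt)
```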
The firing rate of neuron $i$ is estimated from its spike count $n^L_i$ in trials of length $L$ as $r_i=\langle n^L_i\rangle/L$, where $\langle\cdot\rangle$ denotes averaging over trials. The spike train covariance between neurons $i$ and $j$ describes the above-chance likelihood that action potentials occur in the two spike trains separated by a time lag $s$:
For simulations, we measure the population-averaged spike train cross-covariance function $Q(s)=\left(N_E(N_E-1)\right)^{-1}\sum_{i,j=1,\,i\ne j}^{N_E}q_{ij}(s)$ by averaging a randomly chosen subsample of 100 spike train cross-covariances from pairs of neurons in the same cluster.
To calculate the covariance of the spike counts $n^T_i$ and $n^T_j$ of neurons $i$ and $j$ in windows of length $T$, we use the relation
The cross-correlation of input currents was averaged over the same random subsample of the network as the spike train covariances. Current cross-correlations were normalized so that the autocorrelation of each current at zero lag was 1.
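The relation between spike train covariance and spike count covariance is not shown above. Assuming it takes the standard form $\mathrm{Cov}(n^T_i,n^T_j)=\int_{-T}^{T}q_{ij}(s)(T-|s|)\,ds$, it can be evaluated numerically and checked against the closed form for an exponential covariance function $q(s)=a\,e^{-|s|/\tau_c}$, namely $2a[T\tau_c-\tau_c^2(1-e^{-T/\tau_c})]$. The exponential shape, its parameters, and the function name are illustrative only.

```python
import numpy as np


def count_covariance(q, T, n=20001):
    """Cov(n_i^T, n_j^T) = integral over s in [-T, T] of q(s) (T - |s|), trapezoidal rule."""
    s = np.linspace(-T, T, n)  # odd n puts a grid point at the kink s = 0
    y = q(s) * (T - np.abs(s))
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(s)))
```

For windows much longer than the correlation time, the count covariance grows as $T\int q(s)\,ds$, which is why long counting windows emphasize slow covariability.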
Spiking network analysis
The LIF model simulates voltages and produces spike trains, from which we can compute firing rates and covariances. Appendix 1—figure 6a,b show example voltage traces of individual excitatory neurons, with the spikes they produce shown above. In the attended state more spikes are produced, corresponding to a higher firing rate. Appendix 1—figure 6c,d show rasters for all neurons in the unattended (c) and attended (d) states; higher firing rates can be observed, especially for the inhibitory neurons. Averaging the spike trains over the excitatory population gives the PSTH of the excitatory neurons. Appendix 1—figure 6e (left) shows the unattended (turquoise) and attended (orange) PSTHs smoothed with a sliding Gaussian window with a standard deviation of $10$ ms. The histograms on the right demonstrate the decrease in population variance with attention.
The spiking model makes it possible to compute the pairwise spiking covariance directly, in addition to the population variance. Appendix 1—figure 6f shows the pairwise spike count covariance computed over counting windows from $0$ to $200$ ms. For small counting windows, corresponding to high-frequency correlations, neurons in the attended state have slightly higher spike count covariance. This is consistent with the slightly higher peak in the attended autocovariance function from the mean-field theory (Figure 4e, main text), as well as with experimental results (Fries et al., 2001). For counting windows longer than $30$ ms, the spike count covariance decreases notably with attention. The experiments we are modeling (Cohen and Maunsell, 2009) measure spike count correlations over $200$ ms counting windows, corresponding to the rightmost points in Appendix 1—figure 6f. The proportional change in spike count covariance is expressed by the covariance ratio $R_{\mathrm{Cov}}=\mathrm{Cov}^A(n_1,n_2)/\mathrm{Cov}^U(n_1,n_2)$, shown in Appendix 1—figure 6g. Values of $R_{\mathrm{Cov}}$ greater than one indicate increased spike count covariance with attention, and values less than one indicate decreased spike count covariance. The $R_{\mathrm{Cov}}=1$ line is crossed at counting windows of approximately $30$ ms. The theoretical values were computed using linear response theory (Trousdale et al., 2012).
To dissect the spike count covariance by time lag, we consider the spike train covariance function (Equation 36), which is the pairwise-neuron analogue of the autocovariance function of the population-averaged activity (Figure 5e, main text). Appendix 1—figure 6h,i show the spike train covariance functions of excitatory neurons in the unattended and attended states. To compare the two, Appendix 1—figure 6j shows them normalized to a maximum value of $1$. In accordance with our mean-field results, the attended spike train covariance decays faster than the unattended one, indicating increased stability in the attended state.
The spiking model also makes it possible to investigate the inputs to individual neurons, which is difficult to do experimentally and not possible in mean-field models. Appendix 1—figure 6k,l show the correlation functions of different types of inputs to pairs of excitatory neurons, averaged over pairs, in the unattended (k) and attended (l) states. The correlation functions of the total recurrent input (black curves) reveal that correlations between excitatory inputs (EPSC-EPSC, dashed green) and between inhibitory inputs (IPSC-IPSC, dashed red) are canceled by anticorrelations between excitatory and inhibitory inputs (EPSC-IPSC, blue). This is consistent with the idea of correlation cancellation by inhibitory tracking of excitatory activity (Renart et al., 2010; Tetzlaff et al., 2012; Ly et al., 2012). By shifting the system into a more stable state, attention allows this cancellation to occur more efficiently, thereby reducing the pairwise covariance. Appendix 1—figure 6m shows the correlation functions of the total recurrent input currents to pairs of excitatory neurons, normalized to peak at $1$. We conclude that correlation cancellation by recurrent inhibitory feedback suppresses correlations of the total recurrent input, which in turn decreases the output correlations.
References

Cell-type-specific modulation of neocortical activity by basal forebrain input. Frontiers in Systems Neuroscience 6:79. https://doi.org/10.3389/fnsys.2012.00079

An integrated microcircuit model of attentional processing in the neocortex. Journal of Neuroscience 27:8486–8495. https://doi.org/10.1523/JNEUROSCI.1145-07.2007

Neural correlations, population coding and computation. Nature Reviews Neuroscience 7:358–366. https://doi.org/10.1038/nrn1888

Role of interneuron diversity in the cortical microcircuit for attention. Journal of Neurophysiology 99:2158–2182. https://doi.org/10.1152/jn.01004.2007

Stimulus feature selectivity in excitatory and inhibitory neurons in primary visual cortex. Journal of Neuroscience 27:10333–10344. https://doi.org/10.1523/JNEUROSCI.1692-07.2007

Attention improves performance primarily by reducing interneuronal correlations. Nature Neuroscience 12:1594–1600. https://doi.org/10.1038/nn.2439

Cholinergic control of cortical network interactions enables feedback-mediated attentional modulation. European Journal of Neuroscience 34:146–157. https://doi.org/10.1111/j.1460-9568.2011.07749.x

The mechanics of state-dependent neural correlations. Nature Neuroscience 19:383–393. https://doi.org/10.1038/nn.4242

On the structure of neuronal population activity under fluctuations in attentional state. Journal of Neuroscience 36:1775–1789. https://doi.org/10.1523/JNEUROSCI.2044-15.2016

Handbook of Stochastic Methods for Physics, Chemistry and the Natural Sciences (3rd edn). Springer-Verlag.

Theory of correlations in stochastic neural networks. Physical Review E 50:3171–3191. https://doi.org/10.1103/PhysRevE.50.3171

Cortical state and attention. Nature Reviews Neuroscience 12:509–523. https://doi.org/10.1038/nrn3084

Neuromodulation and cortical function: modeling the physiological basis of behavior. Behavioural Brain Research 67:1–27. https://doi.org/10.1016/0166-4328(94)00113-T

The correlation structure of local neuronal networks intrinsically results from recurrent dynamics. PLoS Computational Biology 10:e1003428. https://doi.org/10.1371/journal.pcbi.1003428

Local field potentials indicate network state and account for neuronal response variability. Journal of Computational Neuroscience 29:567–579. https://doi.org/10.1007/s10827-009-0208-9

Correlations and Neuronal Population Information. Annual Review of Neuroscience 39:237–256. https://doi.org/10.1146/annurev-neuro-070815-013851

Parallel processing by cortical inhibition enables context-dependent behavior. Nature Neuroscience 20:62–71. https://doi.org/10.1038/nn.4436

Dynamics of networks of excitatory and inhibitory neurons in response to time-dependent inputs. Frontiers in Computational Neuroscience 5:25. https://doi.org/10.3389/fncom.2011.00025

A disinhibitory circuit mediates motor integration in the somatosensory cortex. Nature Neuroscience 16:1662–1670. https://doi.org/10.1038/nn.3544

Cellular and circuit mechanisms maintain low spike covariability and enhance population coding in somatosensory cortex. Frontiers in Computational Neuroscience 6:7. https://doi.org/10.3389/fncom.2012.00007

Generating spike trains with specified correlation coefficients. Neural Computation 21:397–423. https://doi.org/10.1162/neco.2008.02-08-713

Attention to both space and feature modulates neuronal responses in macaque area V4. Journal of Neurophysiology 83:1751–1755.

Neural mechanisms of selective visual attention. Annual Review of Psychology 68:47–72. https://doi.org/10.1146/annurev-psych-122414-033400

Modeling the influence of task on attention. Vision Research 45:205–231. https://doi.org/10.1016/j.visres.2004.07.042

The role of neuromodulators in selective attention. Trends in Cognitive Sciences 15:585–591. https://doi.org/10.1016/j.tics.2011.10.006

How structure determines correlations in neuronal networks. PLoS Computational Biology 7:e1002059. https://doi.org/10.1371/journal.pcbi.1002059

Information processing with population codes. Nature Reviews Neuroscience 1:125–132. https://doi.org/10.1038/35039062

Competitive mechanisms subserve attention in macaque areas V2 and V4. Journal of Neuroscience 19:1736–1753.

Attentional modulation of visual processing. Annual Review of Neuroscience 27:611–647. https://doi.org/10.1146/annurev.neuro.26.041002.131039

The spatial structure of correlated neuronal variability. Nature Neuroscience 20:107–114. https://doi.org/10.1038/nn.4433

Three groups of interneurons account for nearly 100% of neocortical GABAergic neurons. Developmental Neurobiology 71:45–61. https://doi.org/10.1002/dneu.20853

Attention can either increase or decrease spike count correlations in visual cortex. Nature Neuroscience 17:1591–1597. https://doi.org/10.1038/nn.3835

Attention and normalization circuits in macaque V1. European Journal of Neuroscience 41:949–964. https://doi.org/10.1111/ejn.12857

Feedback-induced gain control in stochastic spiking networks. Biological Cybernetics 100:475–489. https://doi.org/10.1007/s00422-009-0298-5

Decorrelation of neural-network activity by inhibitory feedback. PLoS Computational Biology 8:e1002596. https://doi.org/10.1371/journal.pcbi.1002596

Neural correlates of attention in primate visual cortex. Trends in Neurosciences 24:295–300. https://doi.org/10.1016/S0166-2236(00)01814-2

Impact of network structure and cellular response on spike time correlations. PLoS Computational Biology 8:e1002408. https://doi.org/10.1371/journal.pcbi.1002408

Chaotic balanced state in a model of cortical circuits. Neural Computation 10:1321–1371. https://doi.org/10.1162/089976698300017214

Effects of spatial attention on contrast response functions in macaque area V4. Journal of Neurophysiology 96:40–54. https://doi.org/10.1152/jn.01207.2005
Decision letter

Peter Latham, Reviewing Editor; University College London, United Kingdom
In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.
Thank you for submitting your article "Attentional modulation of neuronal variability in circuit models of cortex" for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Timothy Behrens as the Senior Editor. The following individual involved in review of your submission has agreed to reveal his identity: Ruben Coen-Cagli (Reviewer #3).
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.
All three reviewers found the paper interesting and potentially important. Of particular note is the demonstration that a single source of attentional modulation in an E-I network can explain both increases in firing rate and decreases in noise covariance, without the need to postulate separate effects. In addition, the modelling is solid and convincing, and the result is an important advance in our understanding of attentional effects.
However, there are a couple of issues that need to be addressed:
1) Robustness to network parameters and assumptions needs to be explored.
2) The rank-1 covariance needs to be better quantified.
3) Information was about contrast, whereas the experiment was about orientation change detection. This is important because the rank-1 covariance matrix is unlikely to have much effect on information about orientation.
It won't be completely trivial to address these issues, but we believe they are addressable.
It is eLife's policy to provide a summary of essential revisions. That's hard to do for these reviews, as they were all relatively extensive. I am, though, going to give it a shot. The exposition will be a little uneven, as I combined the reviews, without trying to edit for uniformity.
1) A big issue is robustness to parameters. Essentially what we want to know is: what are the constraints on parameter space for the results to hold qualitatively? It's probably hard to fully answer this, but it should be possible to provide answers for some of the more important parameters. Following are some more specific points.
First is a long comment about two of the modelling assumptions:
- The weights scale as 1/N.
- Perfect balance (J_{EE} = J_{IE} = J_{E} and J_{II} = J_{EI} = J_{I}).
This 1/N scaling is different from the usual one, which is $1/\sqrt{N}$, and in that regime perfect balance is problematic. Granted, the $1/\sqrt{N}$ scaling is probably too large, but 1/N is probably too small. At the very least, the authors need to comment on this scaling – after all, the last author just published a paper on correlations which was based on $1/\sqrt{N}$ scaling. We're not suggesting that the analysis be redone, but it would be good to know whether the analysis really does apply to biologically plausible connectivity.
Now, a couple of technical comments relating to these points.
The first one is mainly a suggestion. The authors use the 1/N scaling to argue that the first term in Equation 3 of SM (the full-rank component of the covariance matrix) is O(1/N). But this may not be necessary. An instructive case is completely homogeneous coupling, in which J_{ij} depends only on the type (E versus I) of neurons i and j, and the probability of connection is 1. In this case, if the scaling is $1/\sqrt{N}$, a back-of-the-envelope calculation indicates that the first term in Equation 3 of SM scales as 1/N. I believe that if iid noise is added to the weight matrix (while retaining the $1/\sqrt{N}$ scaling), the first term in Equation 3 of SM would still scale as 1/N, but I'm not sure. This should probably be checked: if it turns out to be correct, it would go a long way toward dispelling doubts about the 1/N scaling.
Second, in Equation 7 the authors derive an expression for the variance over long time windows. This was derived under the perfect balance assumption. If that assumption is dropped, there's an additional term in the denominator that scales as L_{E} L_{I} Det(J) (where J is the 2x2 matrix of weights). If the components of J are large – as they probably are in realistic networks – this can have a large effect. Is it possible to estimate its effect for realistic networks? How would that change the results?
A couple of semi-minor points on robustness:
a) Equation 7 depends on σ_{E} – σ_{I}. That's taken to be negative (Table 1). How much do the results change if it's positive?
b) In real networks, an increase in drive (which is how top-down attention is modelled) would probably lead to an increase in noise (because variance scales with spike count). I think it would be important to estimate how large an increase in noise could be tolerated without an increase in covariance with attention.
2) The rank-1 covariance
a) The authors tell us that the modulation matrix is close to being rank one, but they tell us nothing about the vector of modulation gains defining this rank-one matrix. I would like to see answers to basic questions such as: What is the distribution of the g's? Are they correlated with the a's (firing rate modulations) and/or with baseline firing rates? Or with the vectors obtained using a low-rank approximation of the covariance matrix itself? The model must make specific predictions about the answers to all these questions, and it would be nice to see these predictions tested.
b) It is unclear why the authors used their method for low-rank approximation, as opposed to more standard methods based on SVD (which naturally provides a quantification of the quality of a general low-rank approximation based on singular values). I think it would be useful to check what they get using alternate methods, to check the robustness of their results.
c) The data in Figure 1D show a broad range of effects of attention on noise covariance, but the model addresses only the overall reduction in the mean, not any other property of the distribution (including the fact that there is a substantial minority of cases with increased covariance under attention). Isn't it possible to study the distribution across the network model, at least in simulations? And again, the structure of the covariance (and tuning) is important to determine information.
d) A separate, smaller issue is the assumption that attention acts as a low-rank modulation of noise covariance. The opening statement in the Results, subsection “Attention as a low-rank modulation of noise covariance”, is that "we need to first understand the dimension of attentional modulation", as if a model comparison of some sort was going to be performed between low-rank and full-rank modulations. Instead, there is only a quantification that the low-rank assumption works reasonably well, but no comparison to a higher-rank description. Also, why is the assumption of a multiplicative effect better than, e.g., additive modulation? This could be quantified.
3) Fisher information: contrast versus orientation.
My main concern is that I see a disconnect between the modeling and the data/experimental paradigm that motivate the modeling.
I am not convinced about the generality of the conclusions on the effects of attentional modulation on population coding in the model. The experiment is about orientation-change detection, but in the modeling the stimulus dependence is more like contrast (all neurons are identically modulated by the stimulus intensity) than like orientation (where a change in stimulus value would drive some neurons up, and others down). This is acknowledged in the closing paragraphs, and suggested as future work, but I wonder if it should instead be done as part of this paper. I am no expert in E-I networks, I don't know how long it would take, but here is concretely what I would like to see and why. Add the stimulus drive in the actual network, not just in the mean-field solutions. And while doing that, assume heterogeneity (and possibly nonlinearity) of tuning. If the stimulus acts like contrast, then doing the information analysis on the mean field is fine; but otherwise the mean-field solution is effectively a suboptimal decoder (weight all neurons equally), and the conclusions about information may be only valid for that decoder, not for the optimal decoder. The rank-one external noise by itself does not limit information for orientation-like stimulus dimensions (unless you modify it to exactly align it with the signal) (e.g. Moreno-Bote et al., 2014), so some other source of differential correlations needs to be considered if you want the attentional modulation to have any chance of improving information.
We'll admit that this may be a hard one to address rigorously. But the authors should provide an extended discussion of this issue. And an attempt should be made to provide approximate calculations and/or estimates.
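The point about rank-one noise can be illustrated with the linear Fisher information $I = f'^{T} C^{-1} f'$, where $f'$ is the vector of tuning-curve derivatives and $C$ the noise covariance. The minimal sketch below assumes a hypothetical population (not the authors' model) with independent unit-variance private noise plus a shared rank-one component of strength eps. Shared noise along the uniform direction leaves information unchanged for an orientation-like $f'$ (some neurons up, some down, summing to zero), but caps information below 1/eps for a contrast-like $f'$ in which all neurons change together, i.e. when the noise is aligned with the signal (differential correlations).

```python
import numpy as np


def linear_fisher(fprime, C):
    """Linear Fisher information f'^T C^{-1} f' (solve instead of an explicit inverse)."""
    return fprime @ np.linalg.solve(C, fprime)


N, eps = 100, 0.05
C_private = np.eye(N)                          # independent noise
C_shared = C_private + eps * np.ones((N, N))   # plus rank-one shared noise

# Orientation-like: derivatives of hypothetical tuning curves with evenly spaced
# preferred orientations; they sum to zero, so f' is orthogonal to the shared mode.
theta_pref = np.linspace(-np.pi, np.pi, N, endpoint=False)
f_orient = np.sin(theta_pref)
# Contrast-like: every neuron's rate increases with the stimulus.
f_contrast = np.ones(N)

I_orient = (linear_fisher(f_orient, C_private), linear_fisher(f_orient, C_shared))
I_contrast = (linear_fisher(f_contrast, C_private), linear_fisher(f_contrast, C_shared))
print(I_orient)    # both approximately N/2 = 50: shared noise does not limit information
print(I_contrast)  # N = 100 versus N / (1 + eps * N), which stays below 1 / eps
```

By the Sherman-Morrison identity the contrast-like value is exactly N / (1 + eps N), which saturates at 1/eps however many neurons are added.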
https://doi.org/10.7554/eLife.23978.018
Author response
All three reviewers found the paper interesting and potentially important. Of particular note is the demonstration that a single source of attentional modulation in an E-I network can explain both increases in firing rate and decreases in noise covariance, without the need to postulate separate effects. In addition, the modelling is solid and convincing, and the result is an important advance in our understanding of attentional effects.
However, there are a couple of issues that need to be addressed:
We thank the reviewers for their very careful read of our manuscript and for their insightful comments concerning our work. We have made changes to the paper in response to these comments, and as a result we feel our manuscript has significantly improved. In particular, we have added two new figures (Figures 3 and 6). We have also incorporated the previous supplementary section as an appendix, in response to both reviewer requests as well as those from the editorial staff. Below, we list our responses to each detailed comment from the reviewers and summarize the changes we have made.
1) Robustness to network parameters and assumptions needs to be explored.
2) The rank-1 covariance needs to be better quantified.
3) Information was about contrast, whereas the experiment was about orientation change detection. This is important because the rank-1 covariance matrix is unlikely to have much effect on information about orientation.
It won't be completely trivial to address these issues, but we believe they are addressable.
It is eLife's policy to provide a summary of essential revisions. That's hard to do for these reviews, as they were all relatively extensive. I am, though, going to give it a shot. The exposition will be a little uneven, as I combined the reviews, without trying to edit for uniformity.
1) A big issue is robustness to parameters. Essentially what we want to know is: what are the constraints on parameter space for the results to hold qualitatively? It's probably hard to fully answer this, but it should be possible to provide answers for some of the more important parameters. Following are some more specific points.
First is a long comment about two of the modelling assumptions:
- The weights scale as 1/N.
- Perfect balance (J_{EE} = J_{IE} = J_{E} and J_{II} = J_{EI} = J_{I}).
The reviewers raise two excellent points and we outline our response to both of them below.
This 1/N scaling is different from the usual one, which is $1/\sqrt{N}$, and in that regime perfect balance is problematic. Granted, the $1/\sqrt{N}$ scaling is probably too large, but 1/N is probably too small. At the very least, the authors need to comment on this scaling – after all, the last author just published a paper on correlations which was based on $1/\sqrt{N}$ scaling. We're not suggesting that the analysis be redone, but it would be good to know whether the analysis really does apply to biologically plausible connectivity.
We understand the reviewer’s suggestion and also think that we should discuss the case where weights scale as $1/\sqrt{N}$. Our analysis aims to establish the network requirements for:
C1: $c_{ij}^{A} = g_{i} g_{j} c_{ij}^{U}$; attentional modulation of covariance is rank one (Figure 2).
C2: $g_{i} < 1$; spike count covariance decreases with attention (Figure 1).
We will argue below that $1/\sqrt{N}$ coupling will not affect how C1 is satisfied, but will technically prevent C2 (in the large N limit).
To establish C1 we first observe that the covariance C decomposes as:
Here the matrix K has elements $K_{ij} = L_{i} J_{ij}$, where $L_{i}$ is the linear response of neuron i and $J_{ij}$ is the synaptic strength from neuron j to neuron i. The issue at hand is that we require the internally generated covariability to vanish for large N. If the spectral radius of K is less than 1, then we can expand $(I-K)^{-1} = I + \sum_{n=1}^{\infty} K^{n}$ (Pernice et al., 2011; Trousdale et al., 2012). Inserting this expansion into the expression for the internally generated covariability yields:
$(I-K)^{-1} B (I-K^{T})^{-1} = B + BK^{T} + KB + KBK^{T} + \cdots$
Extracting the covariance between neurons i and j (i ≠ j) due to internal coupling within the network gives:
$c_{ij}^{B} = b_{i} L_{j} J_{ji} + b_{j} L_{i} J_{ij} + \sum_{k} L_{i} L_{j} b_{k} J_{ik} J_{jk} + \cdots$
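The expansion above can be checked numerically. In this minimal sketch the network is hypothetical (weights of order 1/N, with the linear responses L_i and the diagonal entries b_i of B drawn at random; none of these values come from the paper). It verifies that the truncated Neumann series reproduces $(I-K)^{-1} B (I-K^{T})^{-1}$ when the spectral radius of K is below 1, and that an off-diagonal element of the second-order truncation matches the expression for $c_{ij}^{B}$ term by term.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 50
L = rng.uniform(0.5, 1.5, N)            # hypothetical linear responses L_i
J = rng.normal(0.0, 1.0, (N, N)) / N    # hypothetical weights, J_ij ~ 1/N
b = rng.uniform(0.5, 1.5, N)            # hypothetical diagonal entries of B
K = L[:, None] * J                      # K_ij = L_i J_ij
B = np.diag(b)

rho = np.max(np.abs(np.linalg.eigvals(K)))
print("spectral radius of K:", rho)    # must be < 1 for the series to converge

# Exact internally generated covariance (I - K)^{-1} B (I - K^T)^{-1}.
R = np.linalg.inv(np.eye(N) - K)
C_int = R @ B @ R.T

# Truncated Neumann series: (I - K)^{-1} ~ sum_{n=0}^{59} K^n.
S = np.zeros((N, N))
P = np.eye(N)
for _ in range(60):
    S += P
    P = P @ K
C_series = S @ B @ S.T

# Second-order truncation B + B K^T + K B + K B K^T versus the element formula.
second = B + B @ K.T + K @ B + K @ B @ K.T
i, j = 3, 7
c_ij = (b[i] * L[j] * J[j, i] + b[j] * L[i] * J[i, j]
        + np.sum(L[i] * L[j] * b * J[i, :] * J[j, :]))
```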
If we take $J_{ij} \sim 1/N$ and the network connectivity to be dense (meaning the connection probability is O(1)), then each term is O(1/N). So long as the spectral radius of K is less than 1, the series converges, and as N → ∞ we have that $c_{ij}^{B}$ vanishes (Pernice et al., 2011; Trousdale et al., 2012; Helias et al., 2014). As the reviewers intuit, this argument can be extended to networks with $J_{ij} \sim 1/\sqrt{N}$; however, the analysis is more complicated. When $J \sim 1/\sqrt{N}$ we require a balance condition in which large excitation and inhibition effectively cancel. Renart et al. (2010) showed that even densely coupled balanced networks produce an asynchronous state where $c_{ij}^{B} \sim 1/N$, vanishing in the large N limit. Showing this requires that we consider the excitatory and inhibitory subnetworks and extend the balance condition so that $c_{ij}^{B}$ is the sum of O(1) terms that nonetheless combine to be O(1/N).
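The O(1/N) claim for dense connectivity with $J_{ij} \sim 1/N$ can likewise be probed numerically. The sketch below is an illustration under stated assumptions, not the authors' analysis: a hypothetical network with connection probability p = 0.2, 80% excitatory presynaptic neurons with weight J_E/N and 20% inhibitory ones with weight −J_I/N (values chosen so that mean excitation and inhibition cancel), and all L_i and b_i set to 1. If the internally generated covariance is O(1/N), then N times the mean absolute off-diagonal entry of $(I-K)^{-1}B(I-K^{T})^{-1}$ should stay roughly constant as N grows.

```python
import numpy as np


def mean_offdiag_internal_cov(N, p=0.2, JE=1.0, JI=4.0, frac_E=0.8, seed=0):
    """Mean |c_ij^B| (i != j) for a hypothetical dense E-I network with J_ij ~ 1/N."""
    rng = np.random.default_rng(seed)
    NE = int(frac_E * N)
    W = (rng.random((N, N)) < p).astype(float)  # dense: p does not shrink with N
    W[:, :NE] *= JE / N                         # excitatory presynaptic columns
    W[:, NE:] *= -JI / N                        # inhibitory presynaptic columns
    np.fill_diagonal(W, 0.0)                    # no self-coupling
    K = W                                       # L_i = 1, so K_ij = J_ij
    assert np.max(np.abs(np.linalg.eigvals(K))) < 1
    R = np.linalg.inv(np.eye(N) - K)
    C = R @ R.T                                 # B = identity (b_i = 1)
    return np.mean(np.abs(C[~np.eye(N, dtype=bool)]))


Ns = [100, 200, 400, 800]
means = [mean_offdiag_internal_cov(N) for N in Ns]
for N, m in zip(Ns, means):
    print(N, m, N * m)  # N * m roughly constant if c_ij^B = O(1/N)
```

Subleading contributions (of order $N^{-3/2}$ in this construction) make N·m drift slightly at small N, so the check is approximate rather than exact.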
While we appreciate the reviewers' comment, we feel that this is a somewhat tangential point that would derail the flow of the manuscript. The above arguments feature in our Supplementary Materials, and in the main text we now write:
“Spiking covariability in recurrent networks can be due to internal interactions (through J_{ik}) or external fluctuations (through ξ_{i}), or both (Ocker, 2017). […] In these cases spiking covariability requires external fluctuations to be applied and subsequently filtered by the network.”
Condition C2 involves the attentional modulation itself. In the main text we derived:
In the above, each term involves the gain modulation $L_A - L_U$; recall that L = dr/dµ (the slope of the firing rate curve). In networks with $J \sim 1/\sqrt{N}$, as N → ∞ the balance condition takes precedence, and the firing rate solution depends neither on the firing rate function f nor, consequently, on L (van Vreeswijk and Sompolinsky, 1998). As such, a modulation of L will not change the ability of a balanced network to track an external input. Of course, finite-size balanced networks may rescue us from this, and short-term plasticity mechanisms would also impart nonlinearities in the transfer (Mongillo et al., 2012). But we again feel that these issues are best left for another paper. We have addressed this in our resubmitted manuscript by including the sentences:
“In the above we considered weak synaptic connections where J_{ij} ∼ 1/N. […] synaptic nonlinearities through short term plasticity (Mongillo et al., 2012) may be necessary to satisfy condition C2 with large synapses.”
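The claim that gain modulation loses its effect in the strongly coupled balanced limit can be illustrated with a two-population mean-field sketch. Assumptions (not from the manuscript): threshold-linear rates $r = L\,[\sqrt{N}(J r + I_{\mathrm{ext}})]_{+}$ with the same gain L for E and I, a hypothetical 2x2 weight matrix J and input $I_{\mathrm{ext}}$ chosen so that both rates stay positive (so the rectification is inactive), and attention modelled as raising L from 1.0 to 1.5. Solving the fixed point shows that the attended-unattended rate difference shrinks as the coupling scale $\sqrt{N}$ grows, with both solutions converging to the balanced solution $r = -J^{-1} I_{\mathrm{ext}}$, which does not involve L.

```python
import numpy as np

# Hypothetical E-I weights (rows: onto E, onto I; columns: from E, from I).
J = np.array([[1.0, -2.0],
              [1.0, -1.8]])
I_ext = np.array([1.0, 0.8])


def fixed_point(L, N):
    """Fixed point of r = L * sqrt(N) * (J r + I_ext), i.e. the threshold-linear
    mean field when all rates are positive (checked below)."""
    A = np.eye(2) / (L * np.sqrt(N)) - J
    r = np.linalg.solve(A, I_ext)
    assert np.all(r > 0), "rectification would be active; linear solution invalid"
    return r


r_balanced = -np.linalg.solve(J, I_ext)  # independent of L; equals [1, 1] here
diffs = []
for N in [1e2, 1e4, 1e6]:
    r_unatt = fixed_point(1.0, N)  # unattended gain
    r_att = fixed_point(1.5, N)    # attended (higher) gain
    diffs.append(np.max(np.abs(r_att - r_unatt)))
    print(int(N), r_unatt, r_att, diffs[-1])
```

The residual dependence on L enters only through the term $1/(L\sqrt{N})$, which is why finite-size corrections (or other nonlinearities) are needed for gain modulation to matter with large synapses.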
Now, a couple of technical comments relating to these points.
The first one is mainly a suggestion. The authors use the 1/N scaling to argue that the first term in Equation 3 of SM (the full-rank component of the covariance matrix) is O(1/N). But this may not be necessary. An instructive case is completely homogeneous coupling, in which J_{ij} depends only on the type (E versus I) of neurons i and j, and the probability of connection is 1. In this case, if the scaling is $1/\sqrt{N}$, a back-of-the-envelope calculation indicates that the first term in Equation 3 of SM scales as 1/N. I believe that if iid noise is added to the weight matrix (while retaining the $1/\sqrt{N}$ scaling), the first term in Equation 3 of SM would still scale as 1/N, but I'm not sure. This should probably be checked: if it turns out to be correct, it would go a long way toward dispelling doubts about the 1/N scaling.
We hope that the above discussion satisfies this point.
Second, in Equation 7 the authors derive an expression for the variance over long time windows. This was derived under the perfect balance assumption. If that assumption is dropped, there's an additional term in the denominator that scales as L_{E} L_{I} Det(J) (where J is the 2x2 matrix of weights). If the components of J are large – as they probably are in realistic networks – this can have a large effect. Is it possible to estimate its effect for realistic networks? How would that change the results?
This is an excellent point. The assumption of perfect balance was made so that the formula for population variance was compact (our old Equation 7). However, the reduction of population variance with our attentional modulation is not critically dependent on this assumption. To demonstrate this we have added the following new text to the manuscript as well as a new figure exploring the robustness of our result.
“The expression for V_E given above (Equation 7) assumes a symmetry in the network coupling, namely that J_{EE} = J_{IE} ≡ J_{E} and J_{EI} = J_{II} ≡ J_{I}. […] In total, the inhibitory mechanism for attention-mediated reduction in population variability is robust to changes in the recurrent coupling within the network.”
A couple of semi-minor points on robustness:
a) Equation 7 depends on σ_{E} – σ_{I}. That's taken to be negative (Table 1). How much do the results change if it's positive?
b) In real networks, an increase in drive (which is how top-down attention is modelled) would probably lead to an increase in noise (because variance scales with spike count). I think it would be important to estimate how large an increase in noise could be tolerated without an increase in covariance with attention.
There are several parameters that we could vary, as well as the transfer functions f_E and f_I themselves. In most cases, parameters like σ are scaled by J. We hope that by varying the synaptic coupling J (see new Figure 6) we have at least partly addressed these concerns. A full analysis remains to be done.
2) The rank-1 covariance
a) The authors tell us that the modulation matrix is close to being rank one, but they tell us nothing about the vector of modulation gains defining this rank-one matrix. I would like to see answers to basic questions such as: What is the distribution of the g's? Are they correlated with the a's (firing rate modulations) and/or with baseline firing rates? Or with the vectors obtained using a low-rank approximation of the covariance matrix itself? The model must make specific predictions about the answers to all these questions, and it would be nice to see these predictions tested.
The reviewers raise some important points. To address them we have added some new analysis and a new figure to the manuscript.
“To further validate our model we show the distribution of g_{i}s computed from the entire data set (Figure 3A). […] This indicates that the circuit modulation of firing rates and covariance may not be trivially related to one another (Doiron et al., 2016).”
b) It is unclear why the authors used their method for low-rank approximation, as opposed to more standard methods based on SVD (which naturally provides a quantification of the quality of a general low-rank approximation based on singular values). I think it would be useful to check what they get using alternate methods, to check the robustness of their results.
The low-rank approximation we require is that $[gg^{T}]_{ij} = [C_A]_{ij}/[C_U]_{ij}$ (here $[A]_{ij}$ denotes the ijth element of matrix A). The number of trials in our data is limited, and while we are confident that C is well estimated, the ratio matrix $[C_A]_{ij}/[C_U]_{ij}$ unfortunately is not. The justification for the low-rank approximation of C is given in Rabinowitz et al., 2016, as they analyze exactly the same data that we have analyzed.
c) The data in Figure 1D show a broad range of effects of attention on noise covariance, but the model addresses only the overall reduction in the mean, not any other property of the distribution (including the fact that there is a substantial minority of cases with increased covariance under attention). Isn't it possible to study the distribution across the network model, at least in simulations? And again, the structure of the covariance (and tuning) is important to determine information.
Our study focuses on a mean field theory of population dynamics. Necessarily, this theory cannot address the heterogeneity of correlations in the data. Our spiking simulations do have heterogeneity, owing to the random coupling within the network. However, the simplified binary coupling makes this heterogeneity quite weak. A full accounting of the spread in noise correlations shown in Figure 1 would require a better understanding of the mechanistic source of the heterogeneity in the network firing rates and variability itself. We feel that this is beyond the scope of this study and hope that our field theory provides sufficient insight into the main mechanisms underlying attentional modulation.
d) A separate, smaller issue is the assumption that attention acts as a low-rank modulation of noise covariance. The opening statement in the Results, subsection “Attention as a low-rank modulation of noise covariance”, is that "we need to first understand the dimension of attentional modulation", as if a model comparison of some sort was going to be performed between low-rank and full-rank modulations. Instead, there is only a quantification that the low-rank assumption works reasonably well, but no comparison to a higher-rank description. Also, why is the assumption of a multiplicative effect better than, e.g., additive modulation? This could be quantified.
A high-rank model (rank N) of attentional modulation would always work perfectly, since we would simply set $g_{ij} = c_{ij}^{A}/c_{ij}^{U}$. The problem is that each pair of neurons would then require a modulation specific to that pair. From a biological standpoint this would be very complicated, while a rank-1 modulation is much simpler, since only individual neurons need to be targeted. To express this thought we have reworded the text in the Results, subsection “Attention as a low-rank modulation of noise covariance”, as follows (for reference, Equation 1 is $C_A = A_C \circ C_U$):
“On the one hand, $A_C$ could be constructed from the ratios of the individual elements: $g_{ij} = c_{ij}^{A}/c_{ij}^{U}$, with each pair of neurons (i, j) receiving an individualized attentional modulation $g_{ij}$ of their shared variability (Figure 2B, left). […] To test whether $A_C$ is low rank we analyzed the V4 population recordings during the visual attention task (Figure 1), specifically measuring $A_C$ under the assumption that $A_C$ is rank 1:
$C_A = gg^{T} \circ C_U$.”
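For intuition about the rank-1 structure, here is a minimal sketch on synthetic data (a hypothetical covariance and hypothetical gains; this is neither the V4 data nor the authors' estimation procedure). If $C_A = gg^{T} \circ C_U$ holds exactly, the element-wise ratio matrix with entries $g_{ij} = c_{ij}^{A}/c_{ij}^{U}$ equals $gg^{T}$, so its singular values show a single nonzero mode (the SVD-based check suggested by the reviewers), and g can be read off from the leading eigenvector up to sign.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 30
X = rng.standard_normal((N, 2 * N))
C_U = X @ X.T / (2 * N) + 0.5 * np.eye(N)  # hypothetical unattended covariance
g = rng.uniform(0.6, 1.0, N)               # hypothetical gains, all below 1 (condition C2)
C_A = np.outer(g, g) * C_U                 # Equation 1 with A_C = g g^T (Hadamard product)

G = C_A / C_U                              # element-wise ratios g_ij
sv = np.linalg.svd(G, compute_uv=False)    # singular values in descending order
print(sv[:3] / sv[0])                      # one dominant value, the rest near zero

w, V = np.linalg.eigh(G)                   # eigenvalues in ascending order
g_hat = np.sqrt(w[-1]) * V[:, -1]          # leading mode: G = (sqrt(w) v)(sqrt(w) v)^T
g_hat *= np.sign(g_hat.sum())              # eigenvectors are defined only up to sign
```

With trial-limited estimates the trailing singular values are no longer zero, and their size relative to the first quantifies the departure from rank 1; as noted above, however, the element-wise ratio is itself poorly estimated when trials are few, which limits this direct approach in practice.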
3) Fisher information: contrast versus orientation.
My main concern is that I see a disconnect between the modeling and the data/experimental paradigm that motivate the modeling.
I am not convinced about the generality of the conclusions on the effects of attentional modulation on population coding in the model. The experiment is about orientation-change detection, but in the modeling the stimulus dependence is more like contrast (all neurons are identically modulated by the stimulus intensity) than like orientation (where a change in stimulus value would drive some neurons up, and others down). This is acknowledged in the closing paragraphs, and suggested as future work, but I wonder if it should instead be done as part of this paper. I am no expert in E-I networks, I don't know how long it would take, but here is concretely what I would like to see and why. Add the stimulus drive in the actual network, not just in the mean-field solutions. And while doing that, assume heterogeneity (and possibly nonlinearity) of tuning. If the stimulus acts like contrast, then doing the information analysis on the mean field is fine; but otherwise the mean-field solution is effectively a suboptimal decoder (weight all neurons equally), and the conclusions about information may be only valid for that decoder, not for the optimal decoder. The rank-one external noise by itself does not limit information for orientation-like stimulus dimensions (unless you modify it to exactly align it with the signal) (e.g. Moreno-Bote et al., 2014), so some other source of differential correlations needs to be considered if you want the attentional modulation to have any chance of improving information.
We'll admit that this may be a hard one to address rigorously. But the authors should provide an extended discussion of this issue. And an attempt should be made to provide approximate calculations and/or estimates.
We agree with the reviewers that we have only scratched the surface of the implications of our attentional modulation mechanism for neural coding. Extending our model to include distributed tuning would be a natural extension, yet one that would be overly cumbersome for this study. In particular, this would involve putting the network on a ring (at minimum) where orientation preference is coded. We would need to choose (and explore) how noise correlations depend on the spatial scale of recurrent interactions and the spatial scale of feedforward stimulus tuning, and even allow attentional modulation to be localized on the ring (to model feature attention). These aspects make a full study worthy of its own report, and we (Doiron and Cohen) currently have a student working on this project.
Rather, we chose to highlight only the simplest consequences of the attentional modulation mechanism we present. If we use a linear framework to study the input-output transfer of both signal and noise, then in our simplified model the attentional modulation has no effect on information transfer as decoded from the whole network. However, if the decoder has access only to the excitatory population, then 1) the code is suboptimal, and 2) attention can now improve the code.
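The first claim follows from a general property of linear Fisher information: an invertible linear transformation of the inputs neither adds nor removes it, so a change in network gain is invisible to a decoder that sees every population. Below is a minimal sketch, assuming a hypothetical two-population (E, I) rate network linearized as r = L(Wr + x), with gains L = diag(gE, gI), arbitrary weights W, and input x = b(ks + σξ) + η, in which stimulus and shared noise share one direction b and η is private input noise. All names and parameter values are illustrative and are not fit to the paper.

```python
import numpy as np


def response_matrix(gains, W):
    """Linearized rate network r = L (W r + x)  =>  r = G x with G = (I - L W)^{-1} L."""
    L = np.diag(gains)
    return np.linalg.solve(np.eye(len(gains)) - L @ W, L)


def network_info(gains, W, b, k=1.0, sigma2=0.5, private=(1.0, 1.0)):
    """Return (full-network, E-only) linear Fisher information about s.

    Input x = b (k s + sigma xi) + eta: stimulus and shared noise share direction b,
    and eta is private input noise with diagonal covariance `private` (hypothetical).
    """
    G = response_matrix(gains, W)
    cov_in = sigma2 * np.outer(b, b) + np.diag(private)
    fprime = k * (G @ b)        # d<r>/ds for each population
    cov = G @ cov_in @ G.T      # output covariance
    full = float(fprime @ np.linalg.solve(cov, fprime))
    e_only = float(fprime[0] ** 2 / cov[0, 0])  # decoder sees only the E rate
    return full, e_only


# Hypothetical weights: rows = target (E, I), columns = source (E, I);
# inhibitory connections carry a negative sign.
W = np.array([[1.0, -2.0],
              [1.5, -1.5]])
b = np.array([1.0, 1.0])  # input reaches E and I equally (assumption)

baseline = network_info((0.5, 0.5), W, b)
modulated = network_info((0.6, 0.9), W, b)  # larger gains, more so for I (illustrative)
print("baseline  full=%.4f  E-only=%.4f" % baseline)
print("modulated full=%.4f  E-only=%.4f" % modulated)
```

With these numbers the full-network information is identical in both gain states, while the E-only value is smaller and shifts with the gains. Whether an E-only readout gains or loses information depends on where signal and noise enter the circuit; this toy example does not reproduce the paper's model and makes no claim about the sign of the attentional effect.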
By virtue of the scalar nature of the model, the signal and noise are ‘aligned’ in a trivial sense (µ = ks + σξ(t)). The reviewers are correct to point out that, should we consider a true population model with distributed tuning in which these fluctuations are orthogonal (meaning ‘not parallel’ when speaking about high dimensions) to the dimension over which the stimulus is coded, our treatment would be overly simplistic. Since extending the model to include a feature dimension is the focus of a later study, we instead interpret our stimulus or population in simpler terms. When first setting up the stimulus we now write:
“Our model captures a bulk firing rate rE rather than a population model with distributed tuning. Because of this, either the stimulus s should be conceived as the contrast of an input, or the population conceived as a collection of identically tuned neurons (i.e., a single cortical column).”
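For the scalar input µ = ks + σξ(t), the trivial alignment of signal and noise can be made explicit. Assuming a linearized bulk response with an arbitrary (for example, attention-dependent) gain g, and treating ξ as zero-mean, unit-variance Gaussian noise sampled over a counting window, the Fisher information about s does not depend on g:

```latex
\begin{aligned}
r &= g\,(k s + \sigma \xi), \qquad \langle \xi \rangle = 0,\quad \operatorname{Var}(\xi) = 1,\\
\frac{\partial \langle r \rangle}{\partial s} &= g k, \qquad \operatorname{Var}(r) = g^{2}\sigma^{2},\\
I_F(s) &= \frac{\left(\partial \langle r \rangle / \partial s\right)^{2}}{\operatorname{Var}(r)}
        = \frac{g^{2}k^{2}}{g^{2}\sigma^{2}} = \frac{k^{2}}{\sigma^{2}}.
\end{aligned}
```

In a single bulk channel, a gain change alone therefore scales signal and noise together and leaves the information unchanged.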
In the section where we analyze the stimulus estimation by our model we now write:
“As mentioned above, our simplified mean-field model (Equation 6) considers only a bulk response, in which individual neuron tuning is lost. As such, a proper analysis of population coding is not possible. Nonetheless, our model has two basic features often associated with enhanced coding: decreased population variability (Figure 5) and increased stimulus-response gain (Figure 7).”
Finally, in the Discussion section we now write:
“Determining the impact of population-wide spiking variability on neural coding is complicated (Averbeck et al., 2006; Kohn et al., 2016). […] It is not clear how noise correlations will depend on these choices, yet work in spatially distributed balanced networks shows that solutions can be complex (Rosenbaum et al., 2017).”
https://doi.org/10.7554/eLife.23978.019
Article and author information
Author details
Funding
National Science Foundation (DMS1313225)
 Tatjana Kanashiro
 Gabriel Koch Ocker
 Brent Doiron
National Science Foundation (DMS1517082)
 Gabriel Koch Ocker
 Brent Doiron
Simons Foundation (Simons Collaboration on the Global Brain)
 Marlene R Cohen
 Brent Doiron
National Institutes of Health (R01 EY022930)
 Marlene R Cohen
National Institutes of Health (CRCNSR01DC015139)
 Brent Doiron
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
The research was supported by National Science Foundation grants NSF DMS1313225 (BD) and NSF DMS1517082 (BD), a grant from the Simons Foundation collaboration on the global brain (SCGB #325293MC; BD), NIH grants 4R00EY02084403 and R01 EY022930 (MRC), a Whitehall Foundation Grant (MRC), a Klingenstein-Simons Fellowship (MRC), a Sloan Research Fellowship (MRC), and a McKnight Scholar Award (MRC). We thank John Maunsell for the generous use of the data, and Kenneth Miller, Ashok Litwin-Kumar, Douglas Ruff, and Robert Rosenbaum for useful discussions.
Ethics
Animal experimentation: All animal procedures were in accordance with the Institutional Animal Care and Use Committee of Harvard Medical School (Harvard IACUC protocol number: 04214).
Reviewing Editor
 Peter Latham, University College London, United Kingdom
Version history
 Received: December 8, 2016
 Accepted: May 20, 2017
 Accepted Manuscript published: June 7, 2017 (version 1)
 Accepted Manuscript updated: June 8, 2017 (version 2)
 Version of Record published: June 19, 2017 (version 3)
Copyright
© 2017, Kanashiro et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.