Neuroscience

Amplitude modulations of cortical sensory responses in pulsatile evidence accumulation

  1. Sue Ann Koay
  2. Stephan Thiberge
  3. Carlos D Brody (corresponding author)
  4. David W Tank (corresponding author)
  1. Princeton Neuroscience Institute, Princeton University, United States
  2. Bezos Center for Neural Circuit Dynamics, Princeton University, United States
  3. Howard Hughes Medical Institute, Princeton University, United States
Research Article
Cite this article as: eLife 2020;9:e60628 doi: 10.7554/eLife.60628

Abstract

How does the brain internally represent a sequence of sensory information that jointly drives a decision-making behavior? Studies of perceptual decision-making have often assumed that sensory cortices provide noisy but otherwise veridical sensory inputs to downstream processes that accumulate and drive decisions. However, sensory processing in even the earliest sensory cortices can be systematically modified by various external and internal contexts. We recorded from neuronal populations across posterior cortex as mice performed a navigational decision-making task based on accumulating randomly timed pulses of visual evidence. Even in V1, only a small fraction of active neurons had sensory-like responses time-locked to each pulse. Here, we focus on how these ‘cue-locked’ neurons exhibited a variety of amplitude modulations from sensory to cognitive, notably by choice and accumulated evidence. These task-related modulations affected a large fraction of cue-locked neurons across posterior cortex, suggesting that future models of behavior should account for such influences.

Introduction

As sensory information about the world is often noisy and/or ambiguous, an evidence accumulation process for increasing signal-to-noise ratio is thought to be fundamental to perceptual decision-making. Neural circuits that perform this are incompletely known, but canonically hypothesized to involve multiple stages starting from the detection of momentary sensory signals, which are then accumulated through time and later categorized into an appropriate behavioral action (Gold and Shadlen, 2007; Brody and Hanks, 2016; Caballero et al., 2018). In this picture, the sensory detection stage has a predominantly feedforward role, that is, providing input to but not otherwise involved in accumulation and decision formation. However, another large body of literature has demonstrated that sensory processing in even the earliest sensory cortices can be modified by various external and internal contexts, including motor feedback, temporal statistics, learned associations, and attentional control (Roelfsema and de Lange, 2016; Gilbert and Sigman, 2007; Kimura, 2012; Gavornik and Bear, 2014; Glickfeld and Olsen, 2017; Niell and Stryker, 2010; Saleem et al., 2013; Shuler and Bear, 2006; Fiser et al., 2016; Haefner et al., 2016; Lee and Mumford, 2003; Zhang et al., 2014; Saleem et al., 2018; Makino and Komiyama, 2015; Keller et al., 2012; Poort et al., 2015; Li et al., 2004; Stănişor et al., 2013; Petreanu et al., 2012; Romo et al., 2002; Luna et al., 2005; Nienborg et al., 2012; Yang et al., 2016; Britten et al., 1996; Froudarakis et al., 2019; Keller and Mrsic-Flogel, 2018). For example, feedback-based gain control of sensory responses has been suggested as an important mechanism for enhancing behaviorally relevant signals, while suppressing irrelevant signals (Manita et al., 2015; Hillyard et al., 1998; Harris and Thiele, 2011; Azim and Seki, 2019; Douglas and Martin, 2007; Ahissar and Kleinfeld, 2003).

The above two ideas—evidence accumulation and context-specific modulations—make two different but both compelling points about how sensory signals should be processed to support behavior. The two ideas are not mutually incompatible, and insight into the brain’s specific implementation may be gained from a systematic investigation of sensory representations in the brain. To observe how each sensory increment influences neural dynamics, we utilized a behavioral paradigm with precisely initiated timings of sensory inputs that should drive an evidence accumulation process (Brunton et al., 2013). Specifically, we recorded from posterior cortical areas during a navigational decision-making task (Pinto et al., 2018; BRAIN CoGS Collaboration, 2017) where as mice ran down the central corridor of a virtual T-maze, pulses of visual evidence (‘cues’) randomly appeared along both left and right sides of the corridor. To obtain rewards, mice should accumulate the numerosities of cues, then turn down the maze arm corresponding to the side with more cues. The well-separated and randomized timing of cues allowed us to clearly identify putative sensory responses that were time-locked to each pulse, whereas the seconds-long periods over which cues were delivered allowed us to observe the timecourse of neural responses throughout a gradually unfolding decision.

Across posterior cortices, the bulk of neural activity consisted of neurons that were sequentially active vs. time in the trial, in a manner that did not depend directly on the sensory cues, as we describe in detail in another article (Koay et al., 2019). Even in the primary (V1) and secondary visual areas, only 5–15% of neurons active during the task had responses that were time-locked to sensory cues (‘cue-locked cells’). Still, it is known that remarkably small signals on the order of a few cortical neurons can influence behavior (Doron and Brecht, 2015; Buchan and Rowland, 2018; Tanke et al., 2018; Lerman et al., 2019; Carrillo-Reid et al., 2019; Marshel et al., 2019). Here, we focused on the cue-locked cells, as candidates for momentary sensory inputs that may drive an accumulation and decision-making process. The responses of these cells to cues were well-described by a single impulse response function per neuron, but with amplitudes that varied across the many cue presentations. The cue-response amplitudes of most cells varied systematically across time in the trial, as well as across trials depending on behavioral context, thus suggesting gain modulation effects potentially related to decision-making dynamics. Across posterior cortices and including as early as in V1, these variations in cue-response amplitudes contained information about multiple visual, motor, cognitive, and memory-related contextual variables. Notably, in all areas about 50% of cue-locked cells had response amplitudes that depended on the choice reported by the animal at the end of the trial, or depended on the value of the gradually accumulating evidence. Top-down feedback, potentially from non-sensory regions in which the choice is formed, has been proposed to explain choice-related effects in sensory responses (Britten et al., 1996; Romo et al., 2003; Nienborg and Cumming, 2009; Yang et al., 2016; Bondy et al., 2018; Wimmer et al., 2015; Haefner et al., 2016).
The dependence on accumulating evidence that we observed supports the hypothesis that this feedback may originate from an accumulator that itself eventually drives choice.

In sum, the amplitude modulations of cue-locked responses in this report can be thought of as due to multiplicative effects (or equivalently, changes in gain) on the brain’s internal representation of individual sensory pulses. These multiplicative effects were moreover not entirely random from one cue to the next, but rather depended on task-specific factors including those that the brain presumably keeps track of using internal neural dynamics, such as the accumulated evidence. We thus suggest that psychophysical studies of pulsatile-evidence accumulation may benefit from considering that even at the earliest, sensory input stage, neural variability can have a component that is correlated across responses to multiple cues in a trial, as opposed to the independent noise often assumed under lack of knowledge otherwise. Our findings in this article point to candidate neural bases for temporally correlated noise that has been deduced from behavioral data to limit perceptual accuracy, for example in odor discrimination tasks for rats where the subject was free to continue acquiring sensory samples (Zariwala et al., 2013).

Results

We used cellular-resolution two-photon imaging to record from six posterior cortical regions of 11 mice trained in the Accumulating-Towers task (Figure 1a–c). These mice were from transgenic lines that express the calcium-sensitive fluorescent indicator GCaMP6f in cortical excitatory neurons (Materials and methods), and prior to behavioral training underwent surgical implantation of an optical cranial window centered over either the right or left parietal cortex. The mice then participated in previously detailed behavioral shaping (Pinto et al., 2018) and neural imaging procedures as summarized below.

Figure 1 with 1 supplement
Two-photon calcium imaging of posterior cortical areas during a navigation-based evidence accumulation task.

(a) Layout of the virtual T-maze in an example left-rewarded trial. (b) Example snapshot of the cue region corridor from a mouse’s point of view when facing straight down the maze. Two cues on the right and left sides can be seen, closer and further from the mouse in that order. (c) Illustration of the virtual viewing angle θ. The visual angle ϕcue of a given cue is measured relative to θ and to the center of the cue. The y spatial coordinate points straight down the stem of the maze, and the x coordinate is transverse. v is the velocity of the mouse in the virtual world. (d) Sigmoid curve fits to behavioral data for how frequently mice turned right for a given difference in total right vs. total left cue counts at the end of the trial, Δ = #R − #L. Dots: Percent of trials (out of those with a given Δ) in which mice turned right, pooling data from all mice. Error bars: 95% binomial C.I. (e) Logistic regression weights for predicting the mice’s choice given spatially binned evidence {Δi} where i ∈ {1, 2, 3} indexes three equally sized spatial bins of the cue region. Error bars: 95% C.I. across bootstrap experiments. (f) Average visual field sign map (n = 5 mice) and visual area boundaries, with all recorded areas labeled. The visual field sign is −1 (dark blue) where the cortical layout is a mirror image and +1 (dark red) where it follows a non-inverted layout of the physical world.

Figure 1—source data 1

Data points, summary statistics, and kernel bandwidths.

https://cdn.elifesciences.org/articles/60628/elife-60628-fig1-data1-v2.zip

Mice were trained in a head-fixed virtual reality system (Dombeck et al., 2010) to navigate in a T-maze. As they ran down the stem of the maze, a series of transient, randomly located tower-shaped cues (Figure 1b,c) appeared along the right and left walls of the cue region corridor (length Lcue ≈ 200 cm, average running speed in cue region ≈ 60 cm/s; see Materials and methods), followed by a delay region where no cues appeared. The locations of cues were drawn randomly per trial, with Poisson-distributed mean counts of 7.7 on the majority and 2.3 on the minority side, and mice were rewarded for turning down the arm corresponding to the side with more cues. In agreement with previous work (Pinto et al., 2018), all mice in this study exhibited characteristic psychometric curves (Figure 1d) and utilized multiple pieces of evidence to make decisions, with a small primacy effect (Figure 1e).
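The logistic-regression readout behind Figure 1e can be sketched in a few lines. Everything below is a hypothetical stand-in (simulated trials using the task's Poisson cue statistics, plus a hand-rolled gradient-ascent fitter), not the study's actual analysis code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated trials: cue counts in 3 spatial bins of the cue region, with
# Poisson means 7.7 (majority side) and 2.3 (minority side) split evenly.
n_trials = 5000
right_is_majority = rng.integers(0, 2, n_trials)
maj = rng.poisson(7.7 / 3, (n_trials, 3))
mino = rng.poisson(2.3 / 3, (n_trials, 3))
# Per-bin evidence Δ_i = #R_i − #L_i
delta = np.where(right_is_majority[:, None] == 1, maj - mino, mino - maj)

# Noisy ideal-observer choice (1 = turned right)
choice = (delta.sum(1) + rng.normal(0, 2, n_trials) > 0).astype(float)

def fit_logistic(X, y, lr=0.05, n_iter=2000):
    """Plain gradient-ascent logistic regression; returns [bias, w_1, w_2, w_3]."""
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w += lr * Xb.T @ (y - p) / len(y)
    return w

w = fit_logistic(delta, choice)
print(np.round(w[1:], 2))  # one weight per spatial bin
```

A primacy effect would appear as the first bin's weight exceeding the later ones; the simulation above has no such built-in bias, so its weights come out roughly equal.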

As the timing and visual location of the tower-shaped cues are important information about the behavior that we wished to relate to the neural activity, we programmed the virtual reality software to make a given cue visible to the mouse exactly when it reached a distance of 10 cm before the cue’s location along the T-maze stem (i.e. in the y coordinate, see Figure 1c). We refer to this instant at which a cue becomes visible as the ‘onset time’ for that cue, and used it to define the behavioral timings of visual pulses in all neural data analyses. Cues were made to vanish from view after 200 ms, although 1/11 mice ran so quickly that cues occasionally fell outside of the display range of the virtual reality system before that period (cue duration was ~190 ms for that mouse, see Materials and methods and Figure 1—figure supplement 1 for details on timing precision). Lastly, as mice controlled the virtual viewing angle θ, cues could appear at a variety of visual angles ϕcue (Figure 1c). We accounted for this in all relevant data analyses, as well as conducted control experiments in which θ was restricted to be exactly zero from the beginning of the trial up to midway in the delay period (referred to as θ-controlled experiments; see Materials and methods).

For each mouse, we first identified the locations of the visual areas (Figure 1f; Materials and methods) using one-photon widefield imaging and a retinotopic visual stimulation protocol (Zhuang et al., 2017). Then, while the mice performed the task, we used two-photon imaging to record from 500 μm × 500 μm fields of view in either layer 2/3 or layer 5 of one of six areas (Supplementary file 1, Supplementary file 2): the primary visual cortex (V1), secondary visual areas (V2, including anteromedial area [AM], posteromedial area [PM], medial-to-AM area [MMA], medial-to-PM area [MMP] [Zhuang et al., 2017]), and retrosplenial cortex (RSC). These fields of view were selected only to have good imaging quality (high apparent density of cells as unobscured as possible by brain vasculature), that is, prior to the start of the behavioral session and without any criteria based on neural responses. After correction for rigid brain motion, regions of interest representing putative single neurons were extracted using a semi-customized (Materials and methods) demixing and deconvolution procedure (Pnevmatikakis et al., 2016). The fluorescence-to-baseline ratio ΔF/F was used as an estimator of neural activity, and only cells with ≥ 0.1 transients per trial were selected for analysis. In total, we analyzed 10,113 cells from 143 imaging sessions, focusing on 891 neurons identified as time-locked to the visual cues as explained in the next sections.

Pulses of evidence evoke transient, time-locked responses in all recorded areas

We found neurons in all areas/layers that had activities clearly time-locked to the pulsatile cues (examples in Figure 2a–b). In trials with sparse occurrences of preferred-side cues, the activities of these cells tended to return to baseline following a fairly stereotyped impulse response. Individually, they thus represented only momentary information about the visual cues, although as a population they can form a more persistent stimulus memory (Goldman, 2009; Scott et al., 2017; Miri et al., 2011). Interestingly, the amplitudes of these cells’ responses seemed to vary in a structured way, both across time in a trial and across trials in which the mouse eventually chose to turn right vs. left (columns of Figure 2a–b). We therefore wished to quantify whether or not these putatively sensory amplitude changes also encoded other task-related information.

Figure 2 with 1 supplement
Pulses of evidence evoke transient, time-locked responses that are well described by an impulse response model.

(a) Trial-by-trial activity (rows) vs. time of an example right-cue-locked cell recorded in area AM, aligned in time to the end of the cue period (dashed line). Onset times of left (right) cues in each trial are shown as red (blue) dots. (b) Same as (a), but for an atypical right-cue-locked cell (in area AM) that has some left-cue-locked responses. (c) Depiction of the impulse response model for the activity level ΔF/F of a neuron vs. time (x-axis). Star indicates the convolution operator. (d) Prediction of the impulse response model for the cell in (a) in one example trial. This cell had no significant secondary (left-cue) responses. (e) Same as (d) but for the cell in (b). The model prediction is the sum of primary (right-cue) and secondary (left-cue) responses. (f) Trial-average impulse response model prediction (purple) vs. the residual of the fit (ΔF/F data minus model prediction, black), in 10 equally sized spatial bins of the cue region. For a given cell, the average model prediction (or average residual) is computed in each spatial bin, then the absolute value of this quantity is averaged across trials, separately per spatial bin. Line: Mean across cells. Dashed line: 95% C.I. across cells. Band: 68% C.I. across cells. For comparability across cells, ΔF/F was expressed in units such that the mean model prediction of each cell is 1. The model prediction rises gradually from baseline at the beginning of the cue period due to nonzero lags in response onsets. (g) Distribution (kernel density estimate) of cue-locking significance for cells in various areas/layers. Significance is defined per cell, as the number of standard deviations beyond the median AICc score of models constructed using shuffled data (Materials and methods). Error bars: S.E.M. of cells. Stars: significant differences in means (Wilcoxon rank-sum test). (h) Percent of significantly cue-locked cells in various areas/layers. Chance: 10⁻³%. Error bars: 95% binomial C.I. across sessions.
(i) Distribution (kernel density estimate) of the half-maximum onset time of the primary response, for cells in various areas. Data were pooled across layers (inter-layer differences not significant). Error bars: S.E.M. across cells. Stars: significant differences in means (Wilcoxon rank-sum test). (j) As in (i) but for the full-width-at-half-max. Statistical tests use data pooled across layers. Means were significantly different across layers for areas AM and PM (Wilcoxon rank-sum test).

Figure 2—source data 1

Data points including individual entries for histograms, summary statistics and kernel bandwidths.

https://cdn.elifesciences.org/articles/60628/elife-60628-fig2-data1-v2.zip

For a given cell, we estimated the amplitude of its response to each cue i by modeling the cell’s activity as a time series of non-negative amplitudes Ai convolved with an impulse response function (Figure 2c). The latter was defined by lag, rise-time and fall-time parameters that were fit to each cell, but were the same for all cue-responses of that cell (deconvolving calcium dynamics; see Materials and methods). For a subset of neurons, this impulse response model resulted in excellent fits when the model included only primary responses to either right- or left-side cues (e.g. Figure 2d). In much rarer instances, adding a secondary response to the opposite-side cues resulted in a significantly better fit (e.g. Figure 2e; discounting for number of parameters by using AICc (Hurvich and Tsai, 1989) as a measure of goodness of fit). We defined cells to be cue-locked if the primary-response model yielded a much better fit to the actual data than to permuted data in which cue timings were shuffled within the cue region (permutation test; see Materials and methods). For these cells, the trial-averaged activity predicted by the impulse response model (Figure 2f, purple) was substantially larger than the magnitude of residuals of the fits (Figure 2f, ‘data − model’ prediction in black). For example, if cells had systematic rises or falls in baseline activity levels vs. time/place that could not be explained as transient responses to cues, then the residual would grow/diminish vs. y location in the cue region. Figure 2—figure supplement 1a shows that systematic trends (i.e. slopes) for the residual vs. y were small for most cells (the 68% C.I. of slopes across cells was within [−0.098, 0.062], where a slope of ±1 corresponds to a change in residuals from the start to the end of the cue region equal to the average signal predicted by the impulse response model). There were thus no large, unaccounted-for components in the activity of these identified cue-locked cells, in particular no components with long timescales.
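The per-cue amplitude estimation can be illustrated with a minimal sketch. The kernel below (a delayed difference of exponentials with made-up lag/rise/fall constants) and all numbers are hypothetical; the point is only that, given a fixed impulse response, the non-negative amplitudes Ai are recoverable by non-negative least squares:

```python
import numpy as np

rng = np.random.default_rng(1)

dt = 0.1                                   # imaging frame period (s), assumed
t = np.arange(0, 20, dt)                   # one trial, 20 s

def kernel(tau, lag=0.2, rise=0.05, fall=0.4):
    """Hypothetical impulse response: delayed difference of exponentials."""
    s = np.clip(tau - lag, 0, None)
    h = (1 - np.exp(-s / rise)) * np.exp(-s / fall)
    return np.where(tau >= lag, h, 0.0)

# Ground-truth cue onset times and per-cue amplitudes A_i >= 0
cue_times = np.array([1.0, 3.5, 5.0, 9.0, 13.0])
A_true = np.array([1.0, 0.4, 1.6, 0.8, 1.2])

# Design matrix: one shifted copy of the kernel per cue
X = np.stack([kernel(t - tc) for tc in cue_times], axis=1)
dff = X @ A_true + rng.normal(0, 0.05, t.size)   # noisy ΔF/F trace

# Non-negative least squares via projected gradient descent
A = np.zeros(len(cue_times))
step = 1.0 / np.linalg.norm(X.T @ X, 2)          # 1 / Lipschitz constant
for _ in range(5000):
    A = np.clip(A - step * (X.T @ (X @ A - dff)), 0, None)

print(np.round(A, 2))  # should approximate A_true
```

In the actual fits the kernel's lag/rise/fall parameters are optimized per cell alongside the amplitudes; here the kernel is held fixed for brevity.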

Significantly cue-locked cells comprised a small fraction of the overall neural activity, but were nevertheless present in all areas/layers and exhibited some progression of response properties across posterior cortical areas in a roughly lateral-to-medial order (V1, V2, RSC). Cells with the most precisely time-locked responses to cues were found in the visual areas as opposed to RSC (high-significance tail of distributions in Figure 2g; low significance means that the model fit comparably well to data where cue timings were shuffled within the cue region). Reflecting this, about 5–15% of cells in visual areas were significantly cue-locked, compared to ~5% in RSC (Figure 2h). Of these significant cells, only ∼3% had secondary responses, and these were moreover much less significantly time-locked (Figure 2—figure supplement 1b); most cells responded to only contralateral cues (Figure 2—figure supplement 1c). The onset of the half-maximum response was ∼200 ms after each pulse (Figure 2i), and the response full-width-at-half-max (FWHM) was ∼100 ms but increased from V1 to secondary visual areas to RSC (Figure 2j). The impulse response model thus identified cells that follow what one might expect of purely visual-sensory responses on a cue-by-cue basis, up to the amplitude changes that we discuss next.
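The cue-locking significance metric of Figure 2g (standard deviations beyond the median AICc of shuffled-timing fits) reduces to a robust z-score; a sketch with placeholder AICc values (lower AICc = better fit):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical AICc scores: one from the fit to real cue timings, and many
# from fits to data with cue timings shuffled within the cue region.
aicc_real = 120.0
aicc_shuffled = rng.normal(200.0, 10.0, 100)

# Significance = how many shuffle-distribution standard deviations the real
# fit lies beyond the median shuffled score.
significance = (np.median(aicc_shuffled) - aicc_real) / aicc_shuffled.std()
print(round(significance, 1))
```

A cell would be deemed significantly cue-locked when this score clears a threshold; the threshold itself is set in the Materials and methods, not here.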

Cue-locked response amplitudes contain information about visual, motor, cognitive, and memory-related contextual task variables

Studies of perceptual decision-making have shown that the animal’s upcoming choice affects the activity of stimulus-selective neurons in a variety of areas (Britten et al., 1996; Nienborg and Cumming, 2009). We analogously looked for such effects (and more) while accounting for the highly dynamical nature of our task. As neurons responded predominantly to only one laterality of cues, all our subsequent analyses focus on the primary-response amplitudes of cue-locked cells. Importantly, the impulse response model deconvolves responses to individual cues, so the response amplitude Ai can be conceptualized as a multiplicative gain factor that the cell’s response was subject to at the instant at which the ith cue appeared.

We used a neural-population decoding analysis to quantify how much information the cue-locked response amplitudes contained about various contextual variables. First, for the ith cue in the trial, we defined the neural state as the vector of amplitudes Ai of cells that responded to contralateral cues only. Then using the neural states corresponding to cues that occurred in the first third of the cue period, we trained a support vector machine (SVM) to linearly decode a given task variable from these neural states (cross-validated and corrected for multiple comparisons; see Materials and methods). This procedure was repeated for the other two spatial bins (second third and final third) of the cue period, to observe changes in neural information that may reflect place-/time-related changes in task conditions (illustrated in Figure 3a). Figure 3b shows that across posterior cortex, four task variables were accurately decodable from the cue-response amplitudes: the view angle θ, running speed, the running tally of evidence (Δ = #R − #L), and the eventual choice to turn right or left. The reward outcome from the previous trial could also be decoded, albeit less accurately, while in contrast decoding of the past-trial choice was near chance levels.
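The decoding procedure might be sketched as follows, with simulated amplitude vectors whose cells carry oppositely signed choice modulations. Note the hedges: the data are synthetic, and a plain cross-validated logistic decoder stands in for the paper's linear SVM:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical neural states: amplitude vector A_i of 8 contralateral
# cue-locked cells per cue, multiplicatively modulated by upcoming choice.
n_cues, n_cells = 600, 8
choice = rng.integers(0, 2, n_cues)
gain = np.linspace(-0.6, 0.6, n_cells)           # oppositely signed modulations
states = rng.gamma(2.0, 1.0, (n_cues, n_cells)) * np.exp(np.outer(2 * choice - 1, gain))

def decode_cv(X, y, k=5, lam=1e-2, lr=0.1, n_iter=500):
    """k-fold cross-validated linear decoder (logistic stand-in for the SVM)."""
    idx = rng.permutation(len(y))
    accs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        Xtr = np.column_stack([np.ones(len(train)), X[train]])
        w = np.zeros(Xtr.shape[1])
        for _ in range(n_iter):                  # ridge-regularized gradient ascent
            p = 1.0 / (1.0 + np.exp(-Xtr @ w))
            w += lr * (Xtr.T @ (y[train] - p) / len(train) - lam * w)
        Xte = np.column_stack([np.ones(len(fold)), X[fold]])
        accs.append(np.mean((Xte @ w > 0) == y[fold]))
    return float(np.mean(accs))

acc = decode_cv(states, choice)
print(round(acc, 2))  # held-out accuracy, well above the 0.5 chance level
```

In the actual analysis the performance measure is a correlation between decoded and true variable values, and the decoder is trained separately per spatial bin of the cue region.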

Figure 3 with 3 supplements
Multiple visual, motor, cognitive, and memory-related variables can be decoded from the amplitudes of cue-locked cell responses.

(a) Example time-traces of two statistically correlated task variables, the view angle θ (black) and the eventual navigational choice (magenta). (b) Cross-validated performance for decoding six task variables (individual plots) from the amplitudes of cue-locked neuronal responses, separately evaluated using responses to cues in three spatial bins of the cue region (Materials and methods). The performance measure is Pearson’s correlation between the actual task variable value and the prediction using cue-locked cell amplitudes. Lines: mean performance across recording sessions for various areas (colors). Bands: S.E.M. across sessions, for each area. (c) Example time-traces of the two uncorrelated modes obtained from a polar decomposition of the correlated task variables in (a). This decomposition (Materials and methods) solves for these uncorrelated modes such that they were linear combinations of the original time-traces that were closest, in the least-squares sense, to the original traces, while constrained to be themselves uncorrelated with each other. Correlation coefficients between individual uncorrelated modes and their corresponding original variables were > 0.85 for all modes (Figure 3—figure supplement 1). (d) As in (b), but for decoding the uncorrelated task-variable modes illustrated in (c). (e) Proportion of imaging sessions that had significant decoding performance for the six task variables in (b) (dark gray points) and uncorrelated modes in (d) (blue points), compared to shuffled data and corrected for multiple comparisons. Data were restricted to 140/143 sessions with at least one cue-locked cell. Error bars: 95% binomial C.I. across sessions. (f) Linear regression (Support Vector Machine) weights for how much the decoding performance for uncorrelated task-variable modes in (d) depended on cortical area/layer and number of recorded cue-locked cells. The decoder accuracy was evaluated at the middle of the cue region for each dataset.
The area and layer regressors are indicator variables, e.g. a recording from layer 5 of V1 would have regressor values (V1 = 1, AM = 0, PM = 0, MMA = 0, MMP = 0, RSC = 0, layer = 1). Weights that are not statistically different from zero are indicated with open circles. The negative weight for layer dependence of past-reward decoding means that layer 5 had significantly lower decoding performance than layers 2/3. Error bars: 95% C.I. computed via bootstrapping sessions.

As the six variables were statistically correlated by nature of the task (e.g. the mouse controls θ to execute the navigational choice, Figure 3a), indirect neural information about one variable could be exploited to increase the performance of decoding another correlated variable (Krumin et al., 2018; Koay et al., 2019). To account for this, we repeated the decoding analyses for a modified set of variables that had statistical correlations removed. As explained in the Materials and methods and illustrated in Figure 3c, we solved for uncorrelated modes: linear combinations of the original time-traces that were closest, in the least-squares sense, to the original traces, while constrained to be mutually uncorrelated. As inter-variable correlations were low throughout the cue region, these uncorrelated modes were very similar to the original task variables. Each uncorrelated mode was identified with its closest original variable and labeled as such; correlation coefficients between individual uncorrelated modes and their corresponding original variables were > 0.85 for all modes (Figure 3—figure supplement 1). Performances for decoding the uncorrelated modes were slightly lower than for the original task variables (Figure 3d), as expected since contributions from indirect neural information could no longer be present. Nevertheless, the modes that resembled view angle, speed, evidence, choice, and past-trial reward could all be consistently decoded across imaging sessions for all examined areas (Figure 3e). There was also comparably high performance of decoding evidence and choice in the θ-controlled experiments (Figure 3—figure supplement 2), which explicitly shows that neural information about these variables does not originate solely from changes in visual perspective. In a comparable task where choice was highly correlated with view angle (θ) and y spatial location in the maze, it has previously been reported that θ and y explain most of the neural responses in parietal posterior cortex, with small gains from including choice as a third factor (Krumin et al., 2018). Interestingly, however, our findings indicate that in a task where choice was distinguishable from other behavioral factors (here, at least within the cue region), there was significant neural information in all examined posterior cortical areas about this internally generated variable, choice.
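One standard construction with exactly this least-squares-closest property is symmetric (ZCA-style) whitening, Z = Xc C^(−1/2), which arises from a polar decomposition; whether this matches the authors' implementation in every detail is an assumption. A sketch with two hypothetical correlated variables:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical correlated task variables (columns), e.g. view angle tracks choice
n = 1000
choice = rng.integers(0, 2, n).astype(float)
view = 0.8 * (2 * choice - 1) + rng.normal(0, 1, n)
Xc = np.column_stack([view, choice])
Xc = Xc - Xc.mean(0)

# Symmetric decorrelation: Z = Xc @ C^(-1/2) is the set of mutually
# uncorrelated linear combinations closest to Xc in the least-squares sense.
C = np.cov(Xc, rowvar=False)
evals, evecs = np.linalg.eigh(C)
C_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
Z = Xc @ C_inv_sqrt

offdiag = np.cov(Z, rowvar=False)[0, 1]                       # ~0: uncorrelated
match = [np.corrcoef(Z[:, j], Xc[:, j])[0, 1] for j in range(2)]  # mode ~ original
print(round(offdiag, 6), np.round(match, 2))
```

The `match` correlations play the role of the > 0.85 mode-to-variable correlations reported in Figure 3—figure supplement 1: each decorrelated mode remains identifiable with its original variable.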

As a population, the amplitudes of cue-locked cells thus reflected a rich set of present- and past-trial contextual information, with some apparent anatomical differences seen in Figure 3b,d. However, instead of the neural representation being different across different cortical regions, an alternative explanation could be that the accuracy of decoding task variable information depended on experimental factors such as the number of recorded neurons (which differed systematically across cortical areas/layers). To address this, we constructed a linear regression model to predict the decoding accuracy for various datasets as a weighted sum of a set of factors: the cortical area, layer, and number of recorded cells. The cortical area and layer regressors are indicator variables (0 or 1) that specify whether a given dataset was a recording from a particular area and layers 2/3 vs. 5. Likely due to the small numbers of recorded cue-locked cells per session (~0–10), the decoding performance for all variables depended most strongly on the number of cells (Figure 3f). Figure 3f also shows that RSC had significantly lower view angle and speed decoding performance than other regions, which we can think of as increased invariance of cue-locked response amplitudes to low-level visual parameters of the stimuli. Layer 5 was also distinguishable from layer 2/3 data in having reduced performance for decoding speed and past-trial reward.

Decision-related changes in cue-locked response amplitudes are compatible with a feedback origin

Interestingly, the response amplitudes of some individual cue-locked cells appeared to systematically depend on time (e.g. Figure 2a–b), as did the population-level decoding performance for variables such as choice (Figure 3b,d). To understand if these neural dynamics may reflect a gradually unfolding decision-making process, we turned to modeling how amplitudes of cue-locked cell responses may depend on choice and place/time, while accounting for other time-varying behavioral factors.

As a null hypothesis based on previous literature, we hypothesized that cue-response amplitudes can depend on a receptive field specified by the visual angle of the cue (ϕcue, Figure 1c), as well as running speed (Niell and Stryker, 2010; Saleem et al., 2013). Given limited data statistics, we compared this null hypothesis to three other conceptually distinct models (Materials and methods), each of which aims to parsimoniously explain cue-response amplitudes using small sets of behavioral factors. These models predict the observed cue-response amplitudes to be random samples from a Gamma distribution, where the mean of the Gamma distribution is a function of various behavioral factors at the time at which a given cue appeared. The mean functions for all models have the form ρ(ϕcue) · f(v) · g(·), where ρ(ϕcue) is an angular receptive field function, f(v) is a running speed (v) dependence function, and g(·) is specific to each of the three models, as follows. First, the ‘SSA’ model parameterizes stimulus-specific adaptation (Ulanovsky et al., 2003; Sobotka and Ringo, 1994) or enhancement (Vinken et al., 2017; Kaneko et al., 2017) with exponential time-recovery in between cues. Second, the ‘choice’ model allows for a flexible change in amplitudes vs. place/time in the cue region, with a potentially different trend for right- vs. left-choice trials. Third, the ‘cue-counts’ model allows the amplitudes to depend on the running tally of #R, #L, or Δ = #R − #L. This selection of models allows us to ask if cue-locked responses are sufficiently explained by previously known effects, or if, after accounting for such effects, there remain effects related to the accumulation process, such as choice or cue-count dependence.
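The Gamma observation model can be sketched as follows. The functional forms of ρ, f, and g below, the shape parameter, and all constants are hypothetical illustrations of the ρ(ϕcue) · f(v) · g(·) structure, not the fitted models:

```python
import numpy as np
from math import lgamma

rng = np.random.default_rng(5)

def rho(phi):                 # hypothetical Gaussian angular receptive field
    return np.exp(-0.5 * (phi / 30.0) ** 2)

def f(v):                     # hypothetical mild speed dependence
    return 1.0 + 0.005 * v

def g(y, choice):             # 'choice' model: choice-dependent ramp along the maze
    return 1.0 + (0.8 if choice == 1 else -0.3) * (y / 200.0)

k = 4.0                       # Gamma shape; scale = mean / k

def gamma_loglik(a, mean):
    """log p(a) under a Gamma with shape k and the given mean."""
    theta = mean / k
    return (k - 1) * np.log(a) - a / theta - k * np.log(theta) - lgamma(k)

# Simulate amplitudes for one behavioral condition and score them
phi, v, y, choice = 10.0, 55.0, 120.0, 1
mean = rho(phi) * f(v) * g(y, choice)
samples = rng.gamma(k, mean / k, 500)

# A correctly specified mean function outscores a misspecified one
ll_true = gamma_loglik(samples, mean).sum()
ll_wrong = gamma_loglik(samples, 2 * mean).sum()
print(ll_true > ll_wrong)
```

Model comparison then amounts to evaluating such log-likelihoods, penalized for parameter count via AICc, across the null, SSA, choice, and cue-counts variants of g.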

We constructed the amplitude model prediction as the AICc-likelihood-weighted average of the above models, which accounts for cases in which two or more models are comparably good (Volinsky et al., 1999). As illustrative examples, Figure 4a shows how the amplitudes of two simultaneously recorded cue-locked cells in area AM depended on behavioral factors, compared to model predictions. There are clear differences in predictions for right- vs. left-choice trials that can also be seen in the raw amplitude data (restricted to a range of ϕcue such that angular receptive field effects are small, 2nd and 3rd columns of Figure 4a). Although both cells responded preferentially to right-side cues, they had oppositely signed choice modulation effects, defined as the difference between amplitude model predictions on contralateral- vs. ipsilateral-choice trials (Materials and methods). Figure 4b shows two more example choice-modulated cells that had near-constant angular receptive fields. We note that except for the parameters of SSA, all findings in this section were qualitatively similar in θ-controlled experiments where there can be no angular receptive field effects (Figure 4—figure supplement 1).
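The AICc-based weighting follows the standard Akaike-weights recipe; a small sketch (function names ours), assuming each model's maximized negative log-likelihood, parameter count, and number of observations are known:

```python
import numpy as np

def aicc(nll, n_params, n_obs):
    """Corrected Akaike Information Criterion from a negative
    log-likelihood, with the small-sample correction term."""
    aic = 2 * n_params + 2 * nll
    return aic + 2 * n_params * (n_params + 1) / (n_obs - n_params - 1)

def akaike_weights(aicc_values):
    """Relative likelihoods exp(-delta/2) vs. the best model,
    normalized to sum to 1 (used for model-averaged predictions)."""
    a = np.asarray(aicc_values, float)
    rel = np.exp(-(a - a.min()) / 2)
    return rel / rel.sum()
```

The unnormalized quantity exp(−(AICc_i − AICc_min)/2) is a model's relative likelihood, the same quantity used for the per-cell model-selection threshold.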

Figure 4 with 2 supplements see all
Cue-locked response amplitudes depend on view angle, speed, and cue frequency, but a large fraction of cells exhibit choice-related modulations that increase during the course of the trial.

(a) Response amplitudes of two example right-cue-locked cells (one cell per row) vs. (columns) the visual angle at which the cue appeared (ϕcue), running speed (v), and y location of the cue in the cue region. Points: amplitude data in blue (red) according to the upcoming right (left) choice. Lines: AICc-weighted model mean functions for right- vs. left-choice trials; the model predicts the data to be random samples from a Gamma distribution with this behavior-dependent mean function. The data in the right two columns were restricted to a subset where angular receptive field effects are small, corresponding to the indicated area in the leftmost plots. (b) Same as (a) but for two (left-cue-locked) cells with broader angular receptive fields. (c) Percentages of cells that significantly favor various amplitude modulation models (relative likelihood <0.05, defaulting to the null model if none are significant), in the indicated cortical areas and layers. For layer 2/3 data, V1 has a significantly higher fraction of cells preferring the null model than other areas (p = 0.02, two-tailed Wilcoxon rank-sum test). For layer 5 data, V1 has a significantly lower choice-model-preferring fraction than the other areas (p = 0.003). (d) Distribution (kernel density estimate) of adaptation/enhancement factors for cells that favor the SSA model. A factor of 1 corresponds to no adaptation; for other values, the subsequent response is scaled by this amount, with exponential recovery toward 1. Error bars: S.E.M. Stars: significant differences in means (Wilcoxon rank-sum test). (e) Comparison of the behaviorally deduced weighting of cues (green, same as Figure 1e) to the neural choice modulation strength vs. location in the cue region (for contralateral-cue-locked cells only; ipsilateral-cue-locked cells in Figure 4—figure supplement 2g have similar trends).
The choice modulation strength is defined using the amplitude-modulation model predictions, and is the difference between predicted amplitudes on preferred-choice and anti-preferred-choice trials, where the preferred choice is the one for which the neuron has higher amplitudes compared to trials of the opposite (anti-preferred) choice. For comparability across cells, the choice modulation strength is normalized to the average amplitude for each cell (Materials and methods). Lines: mean across cue-locked cells, computed separately for positively vs. negatively choice-modulated cells (data from all brain regions). Bands: S.E.M.

To summarize the prevalence and composition of amplitude-modulation effects, we selected the best model per cell using AICc, defaulting in ambiguous cases (relative likelihood <0.05) to the null hypothesis. Figure 4—figure supplement 2a shows that there were large fractions of cells with very high AICc likelihoods for all three alternative models compared to the null hypothesis. Cells that favored the cue-counts model could also be clearly distinguished from those that favored the SSA model (Figure 4—figure supplement 2b); in fact, exclusion of cells that exhibited SSA had little effect on how well evidence and other variables could be decoded from the neural population (Figure 3—figure supplement 3). In all areas and layers, >85% of cue-locked cells exhibited some form of amplitude modulation beyond angular receptive field and running speed effects (Figure 4c). Overall, 27 ± 4% of cells were best explained by SSA, while 67 ± 4% favored either the choice or cue-counts models. The one notable inter-area difference was for layer 5 data, where V1 had a smaller proportion of choice-model-preferring cells than the other areas (p = 0.003, Wilcoxon rank-sum test). Most cells thus exhibited some form of amplitude modulation beyond visuomotor effects, with little difference in composition across areas and layers.

Although SSA, choice, and cue-counts dependencies all predict changes in cue-response amplitudes vs. time in the trial, there were qualitative differences that distinguished SSA from choice and cue-count modulations, as we next discuss. Cells in the two largest categories, SSA and choice, had qualitatively different population statistics for how their cue-response amplitudes depended on place/time in the trial. Most cells (92 (+3/−4)%) that favored the SSA model corresponded to a phenotype with decreased responses to subsequent cues. Adaptation effects were weakest in V1 and stronger in other areas (Figure 4d, but see Figure 4—figure supplement 1f–g for θ-controlled experiments), although the ∼0.8 s recovery timescale had no significant inter-area differences (Figure 4—figure supplement 2d). In contrast, cue-locked cells of both choice laterality preferences were intermixed in all areas and layers (Figure 4—figure supplement 2e). Also unlike the decrease in response amplitudes vs. time for cells that favored the SSA model, both subpopulations of positively and negatively choice-modulated cells exhibited gradually increasing effect sizes vs. place/time in the trial (Figure 4e for contralateral cue-locked cells, Figure 4—figure supplement 2g for ipsilateral cue-locked cells). Cells in the cue-counts category also had qualitatively different population statistics compared to cells that exhibited SSA. Comparable proportions of cue-counts modulated cells were best explained by dependence on counts on the contralateral side, the ipsilateral side, or the difference of the two sides (Figure 4—figure supplement 2f). For (say) right-cue-locked cells, #L or Δ dependencies are not directly explainable by SSA, because the modulation is by left-side cues that the cells do not otherwise respond to. The remaining time-independent #R modulation also cannot be explained by SSA, unless SSA has an infinitely long timescale.
Such infinite-timescale SSA would require some additional prescription for ‘resetting’ the adaptation factor, for example at the start of each trial, because otherwise amplitudes would continue to decrease/increase throughout the ~1 hr long session (which we do not observe).
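The SSA phenotype, a per-cue multiplicative factor with exponential recovery toward 1 between cues, can be sketched as follows. The α and τ values below are illustrative only, with τ ≈ 0.8 s matching the recovery timescale reported above:

```python
import numpy as np

def ssa_gain(cue_times, alpha=0.7, tau=0.8):
    """Multiplicative gain applied to each cue response under SSA:
    after responding to a cue the gain is scaled by alpha (alpha < 1 =
    adaptation, alpha > 1 = enhancement), then recovers exponentially
    toward 1 with time constant tau (seconds) until the next cue."""
    gains, g, t_prev = [], 1.0, None
    for t in cue_times:
        if t_prev is not None:
            g = 1.0 + (g - 1.0) * np.exp(-(t - t_prev) / tau)  # recovery
        gains.append(g)      # gain applied to the response at this cue
        g *= alpha           # adaptation/enhancement after responding
        t_prev = t
    return np.array(gains)
```

With a finite τ the gain always relaxes back to 1 between widely spaced cues, which is why a time-independent dependence on total cue count cannot be produced by this mechanism.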

Although relationships between sensory responses and choice can arise in a purely feedforward circuit structure, because sensory neurons play a causal role in producing the behavioral choice (Shadlen et al., 1996), others have noted that this should result in similar timecourses of neural and behavioral fluctuations (Nienborg and Cumming, 2009). Instead, we observed contrasting timecourses: as each trial evolved, there was a slow increase over time in choice modulations of cue-locked responses (Figure 4e; Figure 4—figure supplement 2g), which was opposite to the behaviorally assessed decrease over time in how sensory evidence fluctuations influenced the mice’s choice (green line in Figure 4e, replicated from Figure 1e). Additionally, a feedforward structure predicts that positive fluctuations in right- (left-) preferring cue-locked neurons should produce rightward (leftward) fluctuations in choice. Instead, we observed that about half of the cue-locked cells were modulated by choice in a manner opposite to their cue-side preference (Figure 4—figure supplement 2e). Both of these observations argue against a purely feedforward structure, and thus support the existence of feedback influences on sensory responses (Wimmer et al., 2015; Nienborg and Cumming, 2009; Haefner et al., 2016).

Discussion

Psychophysics-motivated evidence accumulation models (Ratcliff and McKoon, 2008; Stone, 1960; Bogacz et al., 2006) have long guided research into how such algorithms may map onto neural activity and areas in the brain. A complementary, bottom-up approach starts from data-driven observations and formulates hypotheses based on the structure of the observations (Shadlen et al., 1996; Wimmer et al., 2015). In this direction, we exploited the mouse model system to systematically record from layers 2/3 and 5 of six posterior cortical areas during a task involving temporal accumulation of pulsatile visual evidence. A separate optogenetic perturbation study showed that all of these areas contributed to mice’s performance of the Accumulating-Towers task (Pinto et al., 2019). We reasoned that to understand how cortical areas contribute to evidence accumulation, a necessary first step is to understand the neural representation of sensory inputs to the process. In this work, we therefore focused on cue-locked cells that had sensory-like responses, that is, responses time-locked to individual pulses of evidence, which comprised ~5–15% of active neurons in visual areas and ~5% in the RSC. These cells are candidates for sensory inputs that may feed into an accumulation process that drives behavior, but could also reflect more complex neural dynamics such as from top-down feedback throughout the seconds-long decision formation process. We characterized properties of cue-locked responses across the posterior cortex, which revealed that although we selected cells that had highly stereotypical time-courses of impulse responses to individual cues, the amplitudes of these responses varied across cue presentations in intriguingly task-specific ways.

One long-standing postulated function of the visual cortical hierarchy is to generate invariant visual representations (DiCarlo et al., 2012), for example, representing the visual cues regardless of viewing perspective or placement in the T-maze. On the other hand, predictive processing theories propose that visual processing intricately incorporates multiple sources of external and internal contextual information, in a continuous loop of hypothesis formation and checking (Rao and Ballard, 1999; Bastos et al., 2012; Keller and Mrsic-Flogel, 2018). Compatible with the latter hypotheses, we observed that across posterior cortices, cue-locked cells had amplitude modulations that reflected not only visual perspective and running speed (Niell and Stryker, 2010; Saleem et al., 2013), but also accumulated evidence, choice, and reward history (neural population decoding in Figure 3). Inter-area differences were mostly in degree (Minderer et al., 2019), with V1 having significantly lower performance for decoding view angle and choice, whereas RSC had lower decoding performance for speed but higher decoding performance for evidence (Figure 3f). We also observed an anatomical progression from V1 to secondary visual areas to RSC in terms of increasing timescales of cue-locked responses (Figure 2i–j) and increasing strengths of stimulus-specific adaptation (Figure 4d). Our results are compatible with other experimental findings of increasing timescales along a cortical hierarchy (Murray et al., 2014; Runyan et al., 2017; Dotson et al., 2018; Schmolesky et al., 1998), and with theoretical proposals that all cortical circuits contribute to accumulation with intrinsic timescales that follow a progression across brain areas (Hasson et al., 2015; Chaudhuri et al., 2015; Christophel et al., 2017; Sreenivasan et al., 2014).

The amplitude modulations of cue-locked cells can be interpreted as multiplicative gain changes on otherwise sensory responses, and could be clearly distinguished from additive effects due to our experimental design with pulsatile stimuli and high signal-to-noise calcium imaging (Figure 2). While a number of other studies have quantified the presence of multiplicative noise correlations in cortical responses (Goris et al., 2014; Arandia-Romero et al., 2016; Lin et al., 2015), we showed that for most cells the amplitude variations were not random, but instead depended systematically on visuomotor and cognitive variables (Figure 4c). Relationships between sensory responses and choice can arise in a purely feedforward circuit structure (Shadlen et al., 1996), where the causal role of sensory neurons in producing the behavioral choice predicts that choice-related neural and behavioral fluctuations should have similar timecourses (Nienborg and Cumming, 2009). Incompatible with a solely feedforward circuit hypothesis, we instead observed that choice modulations of cue-locked responses increased in time (Figure 4e), whereas the behavioral influence of sensory evidence fluctuations on the mice’s choice decreased in time (Figure 1e). Both the choice- and count-modulation observations discussed here were suggestive of signals originating from an accumulator.

Our findings extend previous reports of relationships between sensory responses and perceptual decisions, termed ‘choice probability’ (CP; Britten et al., 1996), and may constitute a form of conjunctive coding of cue and contextual information that preserves both the specificity and precise timing of responses to cues. An interesting question arises as to whether such multiplexing of cue and contextual information can cause interference between the different multiplexed signals. For example, many evidence accumulation studies have reported positive correlations between CP and the stimulus selectivity of cells (Britten et al., 1996; Celebrini and Newsome, 1994; Cohen and Newsome, 2009; Dodd et al., 2001; Law and Gold, 2009; Price and Born, 2010; Kumano et al., 2016; Sasaki and Uka, 2009; Gu et al., 2014; Nienborg and Cumming, 2014) (for a differing view, see Zaidel et al., 2017 for analyses that better separate effects of stimulus vs. choice responses, and Zhao et al., 2020 for a recent re-analysis at the neural-population level). Translated to our task, positively correlated CP vs. stimulus preferences means that neurons that responded selectively to right cues tended to have increased firing rates when the animal will make a choice to the right. In this kind of coding scheme, increased activity in right-cue-locked cells could be due to either more right-side cues being presented or an internally generated right-choice signal, and there is no obvious way to distinguish between these two possibilities from just the activities of these cells. Our data deviate from the abovementioned CP studies in that highly contralateral-cue-selective neurons could be divided into two near-equally sized subpopulations with positive choice modulation (analogous to CP >0.5) and negative choice modulation (CP <0.5), respectively (Figure 4—figure supplement 2e).
As two simultaneously recorded cells that respond to the same visual cue can be oppositely modulated (Figure 4a), these phenomena are not expected from canonical accounts of spatial- or feature/object-based attention in visual processing (Cohen and Maunsell, 2014; Treue, 2014), but rather more compatible with mixed choice- and sensory-selectivity reported in other perceptual decision-making experiments (Raposo et al., 2014).

We can conceptualize how our CP-related findings differ from previous literature by considering how choice modifies the neural-population-level representations of the visual cues, as illustrated in Figure 5 for two hypothetical neurons that both respond to right-side cues. We refer to the joint activity levels of these two hypothetical neurons as the neural (population) state. Figure 5a illustrates that when there is no cue both neurons have near-zero activity levels (gray dots), whereas when a right-side cue is present both neurons have high activity levels with some variations due to noise (purple dots). Consequently, the presence or absence of a right-side cue can be better decoded from the neural-population activity than from individual noisy neurons, by summing their activities, or equivalently, projecting the two-dimensional neural state onto a cue-decoding direction dcue as depicted in Figure 5a. If in addition these two neurons both have CP >0.5 (positive choice modulation), this means that the neural responses in the presence of a right-side cue can further be separated into two distinguishable distributions depending on whether the subject will eventually make a right or left behavioral choice (Figure 5b, blue or red dots for the two choices, respectively). The CP >0.5 case corresponds to both neurons having slightly higher (lower) activity levels on right (left) choice trials, which means that we can decode the subject’s behavioral choice by projecting the neural state onto a choice-decoding direction dchoice that is more or less aligned with the cue-decoding direction dcue (arrows in Figure 5b). However, as noted above, collinearity of dchoice with dcue means that, based on neural activities alone, there can be many cases where we cannot unambiguously decide whether the subject saw more right-side cues or will make a right behavioral choice (overlap between blue and red points in Figure 5b).
This is distinct from the case—as observed in our data—where the two neurons have opposite choice modulations, for example neuron 1 has CP <0.5 (negative choice modulation) whereas neuron 2 has CP >0.5. As depicted in Figure 5c, neuron 1 now has lower activity on right-choice than left-choice trials, whereas neuron 2 has higher activity on right-choice than left-choice trials, leading to a choice-decoding direction dchoice that is orthogonal to dcue. Intuitively, if comparable proportions of sensory units are positively vs. negatively modulated by choice, the opposite signs of these modulations can cancel out when sensory unit activities are summed (projected onto dcue), leading to a readout of sensory information that is less confounded by internally generated choice signals. Our results are compatible with findings from areas MSTd and VIP of nonhuman primates that used alternative analyses (to CP) that more rigorously separate stimulus vs. choice effects on neural activity, as these are behaviorally interrelated (Zaidel et al., 2017), as well as with a recent re-analysis of area MT data in nonhuman primates performing an evidence-accumulation task (Zhao et al., 2020). Similar arguments have been made for how motor preparatory activity and feedback do not interfere with motor output (Kaufman et al., 2014; Stavisky et al., 2017), and how attentional-state signals can be distinguished from visual stimulus information (Snyder et al., 2018). The use of both positive and negative modulations for coding non-sensory information, such as choice here, may hint at a general coding principle that allows non-destructive multiplexing of information in the same neuronal population.
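This population-geometry intuition can be made concrete with a two-neuron simulation: when one cell is positively and the other negatively choice-modulated, choice information lies along a direction orthogonal to the cue-decoding direction and cancels in the summed (cue) readout. All numbers below are arbitrary simulation choices, not fits to data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
choice = rng.integers(0, 2, n)          # 0 = left, 1 = right upcoming choice
cue_drive = 2.0                          # both neurons respond to right cues
mod = 0.5 * (2 * choice - 1)            # signed choice modulation

# neuron 1: negatively choice-modulated; neuron 2: positively modulated
r1 = cue_drive - mod + rng.normal(0, 0.3, n)
r2 = cue_drive + mod + rng.normal(0, 0.3, n)
states = np.stack([r1, r2], axis=1)     # neural state per time-point

d_cue = np.array([1.0, 1.0]) / np.sqrt(2)      # summed-activity readout
d_choice = np.array([-1.0, 1.0]) / np.sqrt(2)  # orthogonal readout

cue_proj = states @ d_cue
choice_proj = states @ d_choice

# choice information cancels along d_cue but is preserved along d_choice
gap_cue = abs(cue_proj[choice == 1].mean() - cue_proj[choice == 0].mean())
gap_choice = abs(choice_proj[choice == 1].mean() - choice_proj[choice == 0].mean())
```

Here `gap_cue` is near zero (the opposite modulations cancel in the summed readout), while `gap_choice` is large, so the cue signal is read out nearly unconfounded by choice.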

Conceptualization of how choice-related modulations can modify sensory representations at the neural-population level.

(a) Illustrated distribution of the joint activity levels (‘neural state’) of two cue-locked cells, at time-points when there is no visual cue (dark gray), vs. time-points when a cue of the preferred laterality for these cells (purple) is present. Each time-point in this simulation corresponds to different samples of noise in the two neural responses, which results in variations in the neural state (multiple dots each corresponding to a different neural state). dcue is a direction that best separates neural states for the ‘no cue’ vs. ‘cue visible’ conditions. (b) Illustrated distribution of neural states as in (a), but for time-points when a cue is present, colored differently depending on whether the mouse will eventually make a right-turn (blue) or left-turn choice. dchoice is a direction that best separates neural states for right- vs. left-choice conditions, which was chosen here to be parallel to dcue (defined as in (a)). (c) Same as (b), but for a scenario where dchoice was chosen to be orthogonal to dcue.

All in all, our neurophysiological observations in a mouse pulsatile evidence-accumulation task bear some similarities to, but also notable differences from, an extensive body of related work on evidence accumulation tasks in nonhuman primates (NHP). We hypothesize that although a sensory evidence accumulation process may underlie the decision-making behaviors in all of these tasks, there are qualitative differences in both the nature of the tasks and our methods of investigation that may shed light on the differences in reported neurophysiological findings. From a methodological standpoint, the use of randomized pulsatile stimuli gives us the power to exploit the unpredictable (but known to the experimenter) timing of sensory pulses to separate stimulus responses from responses to other aspects of the behavior. One downside to this random design, together with the navigational nature of the task that the mouse controls, is that no two trials are literally identical. We thus trade off the richness of the behavior and the ability to directly identify sensory responses against an inability to directly measure effects that require exactly repeated trials, such as noise correlations. The scope of our study should therefore be understood as being on signal responses across the posterior cortex, and we do not attempt here to report features such as noise correlations that are contingent on having a fully correct model of signal responses in order to interpret the residual as ‘noise’.

Starting from our most basic neurophysiological observation, the small fractions of cue-locked neural activity in even the visual cortices are not unexpected, because the visual inputs of the task were not tuned to elicit maximal responses from the recorded neurons. In fact, the virtual spatial environment that the mice experienced corresponds to a high rate of visual information beyond just the tower-like cues, all of which are highly salient visual inputs for performing the navigation aspect of the task and may therefore be expected to influence much of the activity in visual cortices. Our observation that choice-related variability in cue-locked cell responses was not lateralized according to brain hemisphere (Figure 4—figure supplement 2e) is similar to the non-lateralized choice information in the activities of other (non-cue-locked) neurons that we report in an upcoming article (Koay et al., 2019). These findings of cells with intermixed choice preferences within the same brain hemisphere are compatible with several other rodent neurophysiological findings in evidence-accumulation tasks (Erlich et al., 2011; Hanks et al., 2015; Scott et al., 2017), but not, to the best of our knowledge, with the NHP choice probability literature discussed above, except via alternative/extended analyses such as in Zaidel et al., 2017; Zhao et al., 2020. Beyond analysis methodology and interspecies differences in brain architecture as plausible causes for different neural representations of choice, we wonder if these differences could arise from choice and stimulus preferences being related in a more abstract way in our task (and other rodent behavioral paradigms) than in the NHP studies. In the Accumulating-Towers task, although the mouse should choose to turn to the side of the T-maze corresponding to the side with more cues, a navigational goal location is qualitatively different in modality from the retinotopic location of the tower-shaped cues.
In contrast, in classic NHP evidence-accumulation tasks (Gold and Shadlen, 2007), the subject should saccade in the direction in which they perceive the random-dot motion stimulus to be moving, that is, perform a directly visual-direction-based action to indicate their choice. Our overall hypothesis is that if there is additional task-relevant information that has a potentially abstract relationship to the visual cues to be accumulated, the brain may need to employ more complex neural representational schemes—including in areas as early as V1—in order to keep track of not only the momentary sensory information in visual cortices, but also the various environmental and memory-based contexts in which it occurs.

Materials and methods

Experiment subjects

Request a detailed protocol

All procedures were approved by the Institutional Animal Care and Use Committee at Princeton University (protocol 1910) and were performed in accordance with the Guide for the Care and Use of Laboratory Animals (National Research Council, Division on Earth and Life Studies, Institute for Laboratory Animal Research, and Committee for the Update of the Guide for the Care and Use of Laboratory Animals, 2011). We used 11 mice for the main experiments (+4 mice for control experiments), aged 2–16 months and of both genders, from three transgenic strains (see Supplementary file 2) that express the calcium-sensitive fluorescent indicator GCaMP6f (Chen et al., 2013) in excitatory neurons of the neocortex:

  • Six (+2 control) mice (six male, two female): Thy1-GCaMP6f (Dana et al., 2014) [C57BL/6J-Tg(Thy1-GCaMP6f)GP5.3Dkim/J, Jackson Laboratories, stock # 028280]. Abbreviated as ‘Thy1 GP5.3’ mice.

  • Five (+1 control) mice (three male, three female): Triple transgenic crosses expressing GCaMP6f under the CaMKIIα promoter, from the following two lines: Ai93-D; CaMKIIα-tTA [Igs7tm93.1(tetO-GCaMP6f)Hze Tg(Camk2a-tTA)1Mmay/J (Gorski et al., 2002), Jackson Laboratories, stock #024108] (Manita et al., 2015); Emx1-IRES-Cre [B6.129S2-Emx1tm1(cre)Krj/J, Jackson Laboratories, stock #005628]. Abbreviated as ‘Ai93-Emx1’ mice.

  • One mouse (control experiments; female): quadruple transgenic cross expressing GCaMP6f in the cytoplasm and the mCherry protein in the nucleus, both Cre-dependent, from the three lines: Ai93-D; CaMKIIα-tTA, Emx1-IRES-Cre, and Rosa26 LSL H2B mCherry [B6;129S-Gt(ROSA)26Sortm1.1Ksvo/J, Jackson Laboratories, stock #023139].

Mice were randomly assigned such that there were about the same numbers of either gender and various transgenic lines in each group (main vs. control experiments). As the Ai93-Emx1 strain had higher expression levels of the fluorescent indicator, they produced significantly higher signal-to-noise (SNR) recordings than the Thy1 GP5.3 strain, and contributed more to the layer 5 datasets (see Supplementary file 2). Strain differences in the results were small and not of a qualitative nature (Figure 2—figure supplement 1d–g, Figure 4—figure supplement 2h).

Surgery

Request a detailed protocol

Young adult mice (2–3 months of age) underwent aseptic stereotaxic surgery to implant an optical cranial window and a custom lightweight titanium headplate under isoflurane anesthesia (2.5% for induction, 1–1.5% for maintenance). Mice received one pre-operative dose of meloxicam subcutaneously for analgesia (1 mg/kg) and another one 24 hr later, as well as peri-operative intraperitoneal injections of sterile saline (0.5 cc, body-temperature) and dexamethasone (2–5 mg/kg). Body temperature was maintained throughout the procedure using a homeothermic control system (Harvard Apparatus). After asepsis, the skull was exposed and the periosteum removed using sterile cotton swabs. A 5 mm diameter craniotomy approximately centered over the parietal bone was made using a pneumatic drill. The cranial window implant consisted of a 5 mm diameter round #1 thickness glass coverslip bonded to a steel ring (0.5 mm thickness, 5 mm diameter) using a UV-curing optical adhesive. The steel ring was glued to the skull with cyanoacrylate adhesive. Lastly, a titanium headplate was attached to the cranium using dental cement (Metabond, Parkell).

Behavioral task

Request a detailed protocol

After at least three days of post-operative recovery, mice were started on water restriction and the Accumulating-Towers training protocol (Pinto et al., 2018), summarized here. Mice received 1–2 mL of water per day, or more in case of clinical signs of dehydration or body mass falling below 80% of the pre-operative value. Behavioral training started with mice being head-fixed on an 8-inch Styrofoam ball suspended by compressed air, and ball movements were measured with optical flow sensors. The VR environment was projected at 85 Hz onto a custom-built Styrofoam toroidal screen and the virtual environment was generated by a computer running the Matlab (Mathworks) based software ViRMEn (Aronov and Tank, 2014), plus custom code.

For historical reasons, 3 out of 11 mice were trained on mazes that were longer (30 cm pre-cue region + 250 cm cue region + 100–150 cm delay region) than the rest of the cohort (30 cm pre-cue region + 200 cm cue region + 100 cm delay region). In VR, as the mouse navigated down the stem of the maze, tall, high-contrast visual cues appeared along either wall of the cue region when the mouse arrived within 10 cm of a predetermined cue location; cues were then made to disappear after 200 ms (see the following section for details on timing precision). Cue locations were drawn randomly per trial according to a spatial Poisson process with a 12 cm refractory period between consecutive cues on the same wall side. The mean number of majority:minority cues was 8.5:2.5 for the 250 cm cue region maze and 7.7:2.3 for the 200 cm cue region maze. Mice were rewarded with 4 μL of a sweet liquid reward (10% diluted condensed milk, or 15% sucrose) for turning down the arm on the side with the majority number of cues. Correct trials were followed by a 3 s inter-trial interval (ITI), whereas error trials were followed by a loud sound and an additional 9 s time-out period. To discourage a tendency of mice to systematically turn to one side, we used a de-biasing algorithm that adjusts the probabilities of sampling right- vs. left-rewarded trials (Pinto et al., 2018). Per session, we computed the percent of correct choices using a sliding window of 100 trials and included the dataset for analysis if the maximum performance was at least 65%.
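The cue placement described above (a spatial Poisson process with a 12 cm refractory distance between consecutive same-side cues) can be sketched as shifted exponential inter-cue gaps. The rate parameter below is an assumption, chosen only so that the expected cue count is in the range reported for the 200 cm maze:

```python
import numpy as np

def sample_cue_positions(rate_per_cm, length_cm=200.0,
                         refractory_cm=12.0, rng=None):
    """Draw cue locations along one wall of the cue region from a spatial
    Poisson process with a refractory distance: each inter-cue gap is an
    exponential draw shifted by the refractory distance."""
    if rng is None:
        rng = np.random.default_rng()
    positions, y = [], 0.0
    while True:
        y += refractory_cm + rng.exponential(1.0 / rate_per_cm)
        if y > length_cm:
            break
        positions.append(y)
    return np.array(positions)
```

Sampling is done one wall at a time; the de-biasing adjustment of right- vs. left-rewarded trial probabilities is a separate step not shown here.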

Functional identification of visual areas

Request a detailed protocol

We adapted methods (Garrett et al., 2014; Kalatsky and Stryker, 2003; Zhuang et al., 2017) to functionally delineate the primary and secondary visual areas using widefield imaging of calcium activity paired with presentation of retinotopic stimuli to awake and passively running mice. We used custom-built, tandem-lens widefield macroscopes consisting of a back-to-back objective system (Ratzlaff and Grinvald, 1991) connected through a filter box holding a dichroic mirror and emission filter. One-photon excitation was provided using a blue (470 nm) LED (Luxeon Star), and the returning green fluorescence was bandpass-filtered at 525 nm (Semrock) before reaching an sCMOS camera (QImaging, or Hamamatsu). The LED delivered about 2–2.5 mW/cm2 of power at the focal plane, while the camera was configured for a 20–30 Hz frame rate and about 5–10 µm spatial resolution. Visual stimuli were displayed on either a 32-inch AMVA LED monitor (BenQ BL3200PT) or the same custom Styrofoam toroidal screen as for the VR rigs. The screens were placed to span most of the visual hemifield on the side contralateral to the mouse’s optical window implant. The space between the headplate and the objective was covered using a custom-made cone of opaque material.

The software used to generate the retinotopic stimuli and coordinate the stimulus with the widefield imaging acquisition was a customized version of the ISI package (Juavinett et al., 2017) and utilized the Psychophysics Toolbox (Brainard, 1997). Mice were presented with a 20° wide bar with a full-contrast checkerboard texture (25° squares) that inverted in polarity at 12 Hz, and drifted slowly (9°/s) across the extent of the screen in either of four cardinal directions (Zhuang et al., 2017). Each sweep direction was repeated 15 times, totaling four consecutive blocks with a pause in between. Retinotopic maps were computed similarly to previous work (Kalatsky and Stryker, 2003) with some customization that improved the robustness of the algorithms for preparations with low signal-to-noise ratios (SNR). Boundaries between the primary and secondary visual areas were detected using a gradient-inversion-based algorithm (Garrett et al., 2014), again with some changes to improve stability for a diverse range of SNR.

Two-photon imaging during VR-based behavior

The virtual reality plus two-photon scanning microscopy rig used in these experiments follows a previous design (Dombeck et al., 2010). The microscope was designed to minimally obscure the ∼270° horizontal and ∼80° vertical span of the toroidal VR screen, and also to isolate the collection of fluorescence photons from the brain from the VR visual display. Two-photon illumination was provided by a Ti:Sapphire laser (Chameleon Vision II, Coherent) operating at 920 nm wavelength, and fluorescence signals were acquired using a 40 × 0.8 NA objective (Nikon) and GaAsP PMTs (Hamamatsu) after passing through a bandpass filter (542/50, Semrock). The amount of laser power at the objective ranged from ~40–150 mW. The region between the base of the objective lens and the headplate was shielded from external sources of light using a black rubber tube. Horizontal scans of the laser were performed using a resonant galvanometer (Thorlabs), resulting in a frame acquisition rate of 30 Hz and configured for a field of view (FOV) of approximately 500 × 500 μm in size. Microscope control and image acquisition were performed using the ScanImage software (Pologruto et al., 2003). Data related to the VR-based behavior were recorded using custom Matlab-based software embedded in the ViRMEn engine loop, and synchronized with the fluorescence imaging frames using the I2C digital serial bus communication capabilities of ScanImage. A single FOV at a fixed cortical depth and location relative to the functional visual area maps was continuously imaged throughout the 1–1.5 hr behavioral session. The vasculature pattern at the surface of the brain was used to locate a two-photon imaging FOV of interest.

Identification of putative neurons

All imaging data were downsampled in time by a factor of 2 to facilitate analysis (i.e. 15 Hz effective frame rate), and first corrected for rigid brain motion by using the Open Source Computer Vision (OpenCV) software library function cv::matchTemplate. Fluorescence timecourses corresponding to individual neurons were then extracted using a deconvolution and demixing procedure that utilizes the Constrained Non-negative Matrix Factorization algorithm (CNMF [Pnevmatikakis et al., 2016]). A custom, Matlab Image Processing Toolbox (Mathworks) based algorithm was used to construct initial hypotheses for the neuron shapes in a data-driven way. In brief, the 3D fluorescence movie was binarized to mark significantly active pixels, then connected components of this binary movie were found. Each of these components arose from a hypothetical neuron, but a neuron could have contributed to multiple components. A shape-based matching procedure was used to remove duplicates before using these as input to CNMF. The ‘finalized’ components from CNMF were then selected post-hoc to identify those that resembled neural somata, using a multivariate classifier with a manual vetting step.

General statistics

We summarize the distribution of a given quantity vs. areas and layers using quantile-based statistics, which are less sensitive to non-Gaussian tails. The standard deviation is computed as half the difference between the 84% and 16% quantiles of the data points. The standard error (S.E.M.) is computed as the standard deviation divided by √n, where n is the number of data points. For uncertainties on fractions/proportions, we compute a binomial confidence interval using a formulation with the equal-tailed Jeffreys prior interval (DasGupta et al., 2001). The significance of differences in means of distributions was assessed using a two-sided Wilcoxon rank sum test. The p-value threshold for evaluating significance is 0.05 for all tests, unless otherwise stated.
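
As a concrete sketch of these statistics (function names and the synthetic check below are ours, not the paper's analysis code):

```python
import numpy as np
from scipy.stats import beta

def quantile_std(x):
    """Robust spread: half the distance between the 84% and 16% quantiles
    (equals sigma exactly for Gaussian data)."""
    q16, q84 = np.quantile(x, [0.16, 0.84])
    return (q84 - q16) / 2.0

def quantile_sem(x):
    """Standard error of the mean from the quantile-based standard deviation."""
    return quantile_std(x) / np.sqrt(len(x))

def jeffreys_interval(k, n, alpha=0.05):
    """Equal-tailed Jeffreys binomial confidence interval for k successes in n
    trials, i.e. quantiles of the Beta(k + 1/2, n - k + 1/2) posterior."""
    lo = beta.ppf(alpha / 2, k + 0.5, n - k + 0.5) if k > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, k + 0.5, n - k + 0.5) if k < n else 1.0
    return lo, hi
```

For Gaussian data the quantile-based estimate matches the usual standard deviation, while heavy tails inflate it far less than the sample variance would.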

Behavioral metrics

These analyses were described in a previous study (Pinto et al., 2018) and are outlined here. The fraction of trials in which a given mouse turned right was computed in 11 bins of the evidence level Δ ≡ #R − #L at the end of each trial, and fit to a 4-parameter sigmoid function p_R(Δ) = p₀ + B [1 + e^{−(Δ − Δ₀)/λ}]^{−1} to obtain psychometric curves. A logistic regression model was used to assess the dependence of the mice's choices on the spatial location of cues, that is, with factors being the evidence {Δ_i | i = 1, 2, 3} computed using cues in equally sized thirds of the cue region (indexed by i). Statistical uncertainties on the regression weights were determined by repeating this fit using 1000 bootstrapped pseudo-experiments.
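
A minimal sketch of such a sigmoid fit, using SciPy on synthetic, noiseless choice fractions (all parameter values below are illustrative, not from the paper):

```python
import numpy as np
from scipy.optimize import curve_fit

def p_right(delta, p0, B, delta0, lam):
    """4-parameter sigmoid: p_R(delta) = p0 + B * [1 + exp(-(delta - delta0)/lam)]^-1."""
    return p0 + B / (1.0 + np.exp(-(delta - delta0) / lam))

# synthetic psychometric data over 11 evidence bins (illustrative values)
delta = np.arange(-10, 11, 2).astype(float)      # evidence #R - #L per bin
true_params = (0.05, 0.9, 0.0, 2.0)              # p0, B, delta0, lambda
frac_right = p_right(delta, *true_params)
fit_params, _ = curve_fit(p_right, delta, frac_right, p0=[0.1, 0.8, 0.0, 1.0])
```

On noiseless data the fit recovers the generating parameters; on real choice fractions the same call returns the maximum-likelihood-like least-squares estimates.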

Precision of behavioral cue timings

The cue onset is defined as the instant at which a given cue is made visible in the virtual reality display, that is, when the mouse comes within 10 cm of the predetermined cue location in y, the coordinate along the stem of the maze. Given a typical mouse running speed of about 70 cm/s (Figure 1—figure supplement 1a–b) and the virtual reality display refresh rate of 85 Hz, there can be a lag of up to one frame (12 ms), or equivalently about 0.8 cm of distance, in the cue onset relative to the intended 10 cm approach definition. Regardless, the actual frame at which the cue appeared was recorded in the behavioral logs and was used in all analyses.

The cues were made to vanish after 200 ms, but it is possible for a mouse to run so quickly that a given cue falls outside of the 270° virtual reality display range in less than 200 ms. Only 1/11 mice exhibited running speeds (~90 cm/s) that occasionally ran into this regime. Figure 1—figure supplement 1c–d shows that for 10/11 mice the actual duration of visibility of cues was essentially 200 ms (standard deviation <10 ms), while for the one fast mouse the cue duration was ~190 ms (standard deviation <30 ms).

We do not expect cue-responsive neurons in the visual cortices to continue responding strongly to a given cue for the entire 200 ms for which it is visible, because of the expected retinotopy of visual cortical responses and previous reports of 15°−20° receptive field radii. For a neuron with a 20° receptive field that has one edge at the cue onset location (10 cm ahead and 4 cm lateral of the mouse), the cue would fall outside of a 40° diameter receptive field within 130 ms (110 ms) if the mouse ran straight past it at 60 cm/s (70 cm/s). In sum, we expect the variability in how long a cue remains in neural receptive fields to be on the order of tens of milliseconds (or less for neurons with more lateralized receptive fields).

Impulse response model for cue-locked cells

This analysis excluded some rare trials where the mouse backtracks through the T-maze, by using only trials where the y displacement between two consecutive behavioral iterations was > −0.2 cm (including all time-points up to the entry to the T-maze arm), and if the duration of the trial up to and not including the ITI was no more than 50% different from the median trial duration in that session.

We modeled the activity of each cell as a time series of non-negative amplitudes Ai in response to the ith cue, convolved with a parametric impulse response function g(t):

$$\frac{\Delta F}{F}(t) \;=\; \sum_{i=1}^{m} A_i\, g\!\left(t - t_i - \tau_{\mathrm{lag}} - \delta\tau_i;\; \sigma_{\uparrow}, \sigma_{\downarrow}\right) \;+\; \text{i.i.d. noise}$$

$$g(t; \sigma_{\uparrow}, \sigma_{\downarrow}) \;=\; \left[\frac{\sqrt{2/\pi}}{\sigma_{\uparrow} + \sigma_{\downarrow}} \begin{cases} e^{-t^{2}/2\sigma_{\uparrow}^{2}}, & t < 0 \\ e^{-t^{2}/2\sigma_{\downarrow}^{2}}, & t \geq 0 \end{cases}\right] \ast h_{\mathrm{Ca}^{2+}}(t) \tag{1}$$

where {t_i | i = 1, …, m} are the appearance times of cues throughout the behavioral session. The free parameters of this model are the lag (τ_lag), rise (σ↑), and fall (σ↓) times of the impulse response function, the amplitudes A_i, and small (L2-regularized) time jitters δτ_i that decorrelate variability in response timings from amplitude changes. h_Ca²⁺(t) is a calcium indicator response function using parameters from the literature (Chen et al., 2013), which deconvolves calcium and indicator dynamics from our reports of timescales. This function is parameterized as a difference of exponentials, h_Ca²⁺(t) = (1 − e^{−t/τ↑Ca}) e^{−t/τ↓Ca}/h₀, where τ↑Ca ≈ 35 ms, τ↓Ca ≈ 300 ms, and h₀ is a normalization constant such that the peak of this function is 1. The per-cue time jitter parameters δτ_i additionally allow this model to flexibly account for some experimental uncertainty in the assumed cue onset times (see previous section). The distribution of jitter parameters that we obtained from fitting this model (as explained below) had a standard deviation of about 50 ms across neurons, which is within the expected range of behavioral timing variations. We also note that neural response timescales cannot be resolved to better than the Nyquist rate of the imaging data, (1/15 Hz)/2 ≈ 33 ms.
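
A numerical sketch of this impulse response (the 1 ms grid, σ values, and function names are illustrative; this is not the authors' fitting code):

```python
import numpy as np

DT = 1e-3  # 1 ms time grid for the sketch (finer than the 15 Hz imaging rate)

def asym_gaussian(t, sig_rise, sig_fall):
    """Unit-area asymmetric Gaussian: sigma_rise for t < 0, sigma_fall for t >= 0."""
    norm = np.sqrt(2.0 / np.pi) / (sig_rise + sig_fall)
    sig = np.where(t < 0, sig_rise, sig_fall)
    return norm * np.exp(-t**2 / (2.0 * sig**2))

def calcium_kernel(t, tau_rise=0.035, tau_decay=0.300):
    """Difference-of-exponentials indicator response, peak-normalized to 1."""
    h = (1.0 - np.exp(-t / tau_rise)) * np.exp(-t / tau_decay)
    return h / h.max()

def impulse_response(sig_rise, sig_fall, t_max=1.5):
    """g(t): asymmetric Gaussian convolved with the calcium indicator kernel."""
    t = np.arange(-0.5, t_max, DT)       # time axis for the response
    tk = np.arange(0.0, t_max, DT)       # kernel support (t >= 0)
    g = np.convolve(asym_gaussian(t, sig_rise, sig_fall),
                    calcium_kernel(tk))[:len(t)] * DT
    return t, g
```

Because the indicator kernel is causal and slow, the convolved response peaks after the underlying asymmetric Gaussian does, which is why the model separates neural timescales (σ↑, σ↓) from indicator dynamics.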

We maximized the model likelihood to obtain point estimates of all the parameters, using a custom coordinate-descent-like algorithm (Wright, 2015). The significance of a given cell's time-locking to cues was defined as the number of standard deviations that the impulse response model AICc score (bias-corrected Akaike Information Criterion [Hurvich and Tsai, 1989]) lies below the median AICc of null hypothesis models in which the timings {t_i} of cues were randomly shuffled within the cue region. Given the ΔF/F time-series F(t) for a given cell and the predicted activity time-trace m(t), which we treat as vectors F and m respectively, the AICc score is:

$$\mathrm{AICc}(\mathbf{F}, \mathbf{m}) \;:=\; 2 n_{\mathrm{par}} \;+\; n_F \ln\!\left(\frac{\lVert \mathbf{F} - \mathbf{m} \rVert^{2}}{n_F}\right) \;+\; \frac{2 n_{\mathrm{par}} (n_{\mathrm{par}} + 1)}{n_F - n_{\mathrm{par}} - 1}$$

where n_F is the number of time-points that comprise the data and n_par is the number of free parameters in the model. Lastly, a small fraction of cells responded to both left- and right-side cues. We parsimoniously allowed for different impulse responses to these by first selecting a primary response (preferred-side cues) as that which yields the best single-side model AICc, then adding a secondary response if and only if it would improve the model likelihood. This criterion is exp([AICc(F, m⁽¹⁾ + m⁽²⁾) − AICc(F, m⁽¹⁾)]/2) ≤ 0.05, where m⁽¹⁾ is the model prediction with only primary responses and m⁽¹⁾ + m⁽²⁾ is the model prediction with both primary and secondary responses; that is, the primary-only model must be at most 5% as likely as the two-response model. We defined cells to be cue-locked if the primary response significance exceeded three standard deviations of the abovementioned null hypotheses. Other than a factor of about two reduction in the number of identified cue-locked neurons, we found no qualitative difference in our conclusions for a much stricter significance threshold of five standard deviations.
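
For illustration, the AICc score above can be computed directly (Gaussian-residual form; the synthetic model comparison below is ours):

```python
import numpy as np

def aicc(F, m, n_par):
    """Bias-corrected Akaike Information Criterion for a fit m to data F,
    assuming Gaussian residuals and n_par free parameters."""
    F, m = np.asarray(F), np.asarray(m)
    n = len(F)
    rss = np.sum((F - m) ** 2)
    return 2 * n_par + n * np.log(rss / n) + 2 * n_par * (n_par + 1) / (n - n_par - 1)
```

A model that captures real structure in the data attains a lower AICc than a simpler model despite its extra parameters, which is the comparison used for the shuffled-timing null models and for adding a secondary response.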

Decoding from cue-locked amplitudes

The decoding models were fit separately using responses to cues in three equally sized spatial bins of the cue region. We defined the neural state response as the vector of contralateral-cue-locked cell response amplitudes to a given cue, and used a Support Vector Machine classifier (SVM) to predict a task variable of interest from this neural state (using data across trials but restricted to responses to cues in a given third of the cue region, as mentioned). To assess the performance of these classifiers using threefold cross-validation, we trained the SVM using two-thirds of the data and computed Pearson's correlation coefficient between the predicted and actual task variable values in the held-out third of the data. Significance was assessed by constructing 100 null hypothesis pseudo-experiments where the neural state for a given epoch bin was permuted across trials, that is, preserving inter-neuron correlations but breaking any potential relationship between neural activity and behavior.
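
The cross-validation and trial-permutation logic can be sketched as follows, with a simple ridge regression standing in for the SVM purely for illustration (all names and parameter values are ours):

```python
import numpy as np

def cv_decode_corr(X, y, n_folds=3, lam=1.0, rng=None):
    """3-fold CV: regress task variable y from neural-state matrix X (trials x cells)
    and return Pearson's r between held-out predictions and truth."""
    n, d = X.shape
    idx = np.arange(n) if rng is None else rng.permutation(n)
    folds = np.array_split(idx, n_folds)
    preds = np.empty(n)
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        w = np.linalg.solve(X[train].T @ X[train] + lam * np.eye(d),
                            X[train].T @ y[train])
        preds[test] = X[test] @ w
    return np.corrcoef(preds, y)[0, 1]

def permutation_pvalue(X, y, n_null=100, seed=0):
    """Permute trials (rows of X) to build a null distribution of decoding scores."""
    rng = np.random.default_rng(seed)
    observed = cv_decode_corr(X, y, rng=rng)
    null = [cv_decode_corr(X[rng.permutation(len(y))], y, rng=rng)
            for _ in range(n_null)]
    return observed, np.mean([r >= observed for r in null])
```

Shuffling whole rows of X keeps each trial's inter-neuron correlation structure intact while destroying any trial-by-trial link to the behavioral variable, which is exactly the null described above.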

To correct for multiple comparisons when determining whether the decoding p-value for a particular dataset was significant, we used the Benjamini-Hochberg procedure (Benjamini and Hochberg, 1995) as follows. For a given type of decoder, we sorted the p-values of all data points (spatial bins and imaging sessions) in ascending order, [p₁, p₂, ..., p_n], and found the largest rank i_α such that p_{i_α} ≤ i_α × 0.05/n. The decoding performance was then considered to be significantly above chance for all p ≤ p_{i_α}.
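
The standard step-up formulation of this procedure, as a sketch:

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Return a boolean mask of p-values deemed significant under BH FDR control:
    find the largest rank k with p_(k) <= k * alpha / n, accept all ranks up to k."""
    p = np.asarray(pvals, dtype=float)
    n = len(p)
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, n + 1) / n
    mask = np.zeros(n, dtype=bool)
    if below.any():
        k = np.max(np.where(below)[0])       # largest passing rank (0-based)
        mask[order[:k + 1]] = True
    return mask
```

Note the step-up property: a p-value that misses its own per-rank threshold is still accepted if any larger rank passes.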

Uncorrelated modes of task variables

We wished to define a set of uncorrelated behavioral modes such that the original set of six task variables are each a linear combination of these modes, with the additional requirement that each mode should be as similar as possible to one of the task variables. In matrix notation, this means that we want to solve:

$$\underset{Y}{\operatorname{argmin}} \;\lVert Y - X \rVert_F \quad \text{s.t.} \quad Y^{\top} Y = I$$

where each column of X corresponds to values of a given task variable across trials, each column of Y is one of the uncorrelated behavioral modes, and ‖A‖_F ≡ (Σ_ij A_ij²)^{1/2} is the Frobenius norm of a matrix A. This can be computed using the polar decomposition (Higham, 1988): X = YH, where Y is an orthogonal matrix and H a symmetric matrix. To obtain the polar decomposition, we used an algorithm based on the singular value decomposition X = UΣVᵀ, which gives the solution Y = UVᵀ.
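
A NumPy sketch of this computation (the synthetic trials × variables matrix below is illustrative):

```python
import numpy as np

def nearest_orthogonal(X):
    """Closest matrix with orthonormal columns to X in Frobenius norm,
    via the orthogonal factor of the polar decomposition X = Y H,
    computed from the SVD X = U S V^T as Y = U V^T."""
    U, _, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ Vt

# usage sketch: 100 trials x 6 correlated task variables (synthetic)
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 6))
X[:, 1] += 0.8 * X[:, 0]          # introduce correlation between two variables
Y = nearest_orthogonal(X)
```

The residual factor H = YᵀX = VSVᵀ is symmetric, and Y H reconstructs X exactly, so each original task variable is indeed a linear combination of the uncorrelated modes.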

Amplitude modulation models

These models used as input the following behavioral data: t_i is the onset time of the ith cue, which is located at distance y_i along the cue region, and appears at a visual angle ϕ_cue(t_i) relative to the mouse (Figure 1c). Δ(t_i) is the cumulative cue count (explained further below) up to and including cue i, and C(t_i) is the upcoming choice of the mouse in that trial. v_i ≡ v(t_i) is the running speed of the mouse in the virtual world at the time that the ith cue appeared, and for the simple linear speed dependencies explained below, the standardized version ṽ(t) ≡ [v(t) − Q^v_50%] / [Q^v_90% − Q^v_10%] is used, where Q^v_p is the quantile at probability content p of the speed distribution.

To account for the stochastic and nonnegative nature of pulsatile responses, the cue-locked cell response amplitudes A_i were modeled as random samples from a Gamma distribution, A_i | μ_A(t_i), k ∼ Γ[k, μ_A(t_i)/k], parameterized by shape and scale so that the mean is μ_A(t_i). The shape parameter k of the Gamma distribution is a free parameter, and is furthermore indexed by choice for the choice model. The four models discussed in the text are defined by having different behavior-dependent mean functions μ_A(t_i) of the following forms (detailed below):

$$\mu_A(t_i) \;=\; \rho\!\left[\phi_{\mathrm{cue}}(t_i)\right] \times \begin{cases} s_v\!\left[v(t_i)\right] & \text{null hypothesis} \\ \left[1 + \psi\,\tilde{v}(t_i)\right] h\!\left[\mu_A(t_{i-1}),\, t_i - t_{i-1}\right] & \text{SSA} \\ \left[1 + \psi_C\,\tilde{v}(t_i)\right] s_y^{C}(y_i) & \text{choice} \\ \left[1 + \psi\,\tilde{v}(t_i)\right] s_\Delta\!\left[\Delta(t_i)\right] & \text{cue-counts} \end{cases} \tag{2}$$

In all of the models, ρ(ϕcue) is an angular receptive field function that has either a skew-Gaussian (Priebe et al., 2006) or sigmoidal dependence on ϕcue:

$$\rho(\phi) \;=\; \begin{cases} \exp\!\left(-\dfrac{1}{2}\, \dfrac{(\phi - \phi_0)^{2}}{\left[\sigma + \zeta\,(\phi - \phi_0)\right]^{2}}\right) & \text{skew-Gaussian} \\[2ex] 1 - \dfrac{\rho_0}{2} + \rho_0 \left[1 + \exp\!\left(-\left[\phi - \phi_0\right]/\zeta\right)\right]^{-\nu} & \text{sigmoid} \end{cases}$$

ϕ₀, σ, ζ, ρ₀, and ν are all free parameters, and either the skew-Gaussian or the sigmoidal hypothesis is selected depending on which produces a better fit for the cell (using the AICc score as explained below).

All the models also have a speed dependence that multiplies the angular receptive field function. For the null hypothesis, we allowed this to be highly flexible so as to potentially match the explanatory power of the other models (which have other behavioral dependencies). Specifically, the function s_v(v) is defined to be a cubic spline (piecewise 3rd-order polynomial [Gan, 2004]) with control points at five equally spaced quantiles of the running speed distribution, that is, at v = {Q^v_0, Q^v_25%, Q^v_50%, Q^v_75%, Q^v_100%}. A cubic spline model has as many free parameters as the number of control points. For the other models, we used a simple linear parameterization of the speed dependence, 1 + ψ ṽ, where ψ is a free parameter (for the choice model, there are two free parameters ψ_C, where C indexes the choice).
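
As a sketch, a spline of this form can be built with SciPy's CubicSpline, with control points at quantiles of a hypothetical running-speed distribution (all values below are illustrative):

```python
import numpy as np
from scipy.interpolate import CubicSpline

# hypothetical running-speed samples (cm/s); ~70 cm/s typical in this task
rng = np.random.default_rng(4)
speeds = rng.gamma(shape=20.0, scale=3.5, size=5000)

# control points at five quantiles of the speed distribution
knots = np.quantile(speeds, [0.0, 0.25, 0.5, 0.75, 1.0])
values = np.array([0.8, 0.9, 1.0, 1.1, 1.15])   # stand-ins for fitted free parameters
s_v = CubicSpline(knots, values)                # smooth speed-dependence function
```

Because the spline interpolates its control points exactly, the five knot values play the role of the model's free parameters, evaluated smoothly at any intermediate speed.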

The SSA, choice, and cue-counts models are further distinguished by the h, s_y^C, and s_Δ functions, respectively. For the SSA model:

$$h\!\left[\mu_A(t_{i-1}),\, t_i - t_{i-1}\right] \;=\; 1 + \left[\xi\, \mu_A(t_{i-1}) - 1\right] \exp\!\left[-(t_i - t_{i-1})/\lambda\right]$$

The response to the first cue in the session is defined to be μ_A(t₁) = 1. The h function can be understood as follows. Right after the cue at t_{i−1}, the response is scaled by the free parameter ξ, that is, the new response level is ξ μ_A(t_{i−1}), where ξ > 1 corresponds to facilitation and ξ < 1 to depression. This facilitation/depression effect decays exponentially with time toward 1, that is, the amount by which the response μ_A(t_i) deviates from 1 is equal to the deviation (from 1) of the facilitated/depressed response, ξ μ_A(t_{i−1}) − 1, multiplied by the time-recovery factor exp[−(t_i − t_{i−1})/λ]. Here λ is another free parameter that specifies the timescale of recovery.
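
The recursion implied by the h function can be sketched as follows (parameter values are illustrative):

```python
import numpy as np

def ssa_mean_amplitudes(cue_times, xi, lam):
    """Sequential mean amplitudes mu_A under the SSA model: each cue scales the
    level by xi (>1 facilitation, <1 depression), which then relaxes back
    toward 1 with timescale lam."""
    mu = np.empty(len(cue_times))
    mu[0] = 1.0                          # response to the first cue is defined as 1
    for i in range(1, len(cue_times)):
        dt = cue_times[i] - cue_times[i - 1]
        mu[i] = 1.0 + (xi * mu[i - 1] - 1.0) * np.exp(-dt / lam)
    return mu
```

With ξ < 1, rapid cue trains progressively suppress the response, while a long gap lets it recover to baseline, which is the classic stimulus-specific adaptation signature.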

The choice model has smooth dependencies on the y location within the cue region, parameterized by choice. These are given by two functions s_y^C(y), where C indexes the right or left choice; each is a cubic spline with control points at y = {0, 0.5 L_cue, L_cue} (recall that L_cue is the total length of the cue region).

Lastly, the cue-counts model also has smooth dependencies on the cue count Δ, that is, the function s_Δ(Δ) is a cubic spline. As the responses of cells can depend on counts on either the right, left, or both sides (Scott et al., 2017), we allowed Δ to be either the cumulative right or cumulative left cue count (control points at Δ = {0, 3, 8}), or the cumulative difference #R − #L in cue counts (control points at Δ = {−4, 0, 4}). The best definition of Δ was selected per cell according to which produced the best AICc score.

Because neural activity can be very different in the rare cases where the mouse halts in the middle of the cue region, only data where the speed v is within 25% of its median value were included in the analysis of this model. Point estimates for the model parameters were obtained by minimizing the Gamma-distribution negative log-likelihood:

$$-\ln L \;=\; \sum_i \left[\, (1 - k) \ln A_i \;+\; \frac{k\, A_i}{\mu_A(t_i)} \;+\; k \ln\!\frac{\mu_A(t_i)}{k} \;+\; \ln \Gamma(k) \right]$$

Because the Gamma distribution is defined only on the positive domain, we had to make an assumption about how to treat data points where A_i = 0. We reasoned that we could substitute these with a noise-like distribution of amplitudes, which were obtained by fitting the impulse response model (Equation 1) using the same cue timings but simulated noise-only data, which comprised a ΔF/F time-series drawn i.i.d. from a Gaussian distribution with zero mean and standard deviation σ_F, the estimated fluorescence noise level for that cell. The relative AICc-based likelihood used for model selection, as described in the text, is exp([AICc(model 1) − AICc(model 2)]/2).
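
As a check on the likelihood above, here is a direct implementation of the Gamma negative log-likelihood in the mean parameterization (ours, for illustration):

```python
import numpy as np
from scipy.special import gammaln

def gamma_negloglik(A, mu, k):
    """Negative log-likelihood of amplitudes A under Gamma(shape k, scale mu/k),
    so that E[A] = mu, matching the amplitude-modulation models."""
    A = np.asarray(A, dtype=float)
    return np.sum((1.0 - k) * np.log(A) + k * A / mu
                  + k * np.log(mu / k) + gammaln(k))
```

This agrees term-by-term with the standard Gamma density once the scale is written as μ/k, which can be verified against SciPy's log-pdf.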

Choice modulation strength

The location-dependent choice modulation strength for cue-locked amplitudes is defined as δA_choice(y) = [A^choice_contra(y) − A^choice_ipsi(y)] ⟨A⟩, where A^choice_contra(y) ≡ μ_A(y; C = contralateral choice) as in Equation 2, and analogously for ipsilateral choices. This is computed by evaluating the amplitude model prediction vs. location in the cue region, but at fixed ϕ_cue corresponding to zero view angle (+22° for right-side cues and −22° for left-side cues) and Δ = 0. The normalization constant is:

$$\langle A \rangle \;=\; \left(\frac{1}{2 L_{\mathrm{cue}}} \sum_{C \in \{R, L\}} \int_{0}^{L_{\mathrm{cue}}} \frac{\mathrm{d}y}{\max\!\left[A_C^{\mathrm{choice}}(y),\; \sigma_F\right]}\right)^{-1}$$

Data availability

A condensed set of imaging and behavioral data as well as secondary results from analyses and modeling have been deposited in Dryad with the DOI: https://doi.org/10.5061/dryad.tb2rbnzxv. This dataset contains all of the information required to reproduce the figures in the manuscript. As the full, raw data generated in this study are extremely large, access to these raw data can be arranged upon reasonable request to the authors.

The following data sets were generated
    1. Koay SA
    2. Thiberge SY
    3. Brody CD
    4. Tank DW
    (2020) Dryad Digital Repository
    Amplitude modulations of cortical sensory responses in pulsatile evidence accumulation.
    https://doi.org/10.5061/dryad.tb2rbnzxv

References

    1. Doron G
    2. Brecht M
    (2015) What single-cell stimulation has told us about neural coding
    Philosophical Transactions of the Royal Society B: Biological Sciences 370:20140204.
    https://doi.org/10.1098/rstb.2014.0204
  1. Book
    1. Gan LK
    (2004)
    Interpolation: Cubic Spline Interpolation and Hermite Interpolation
    Cubic Spline.
  2. Conference
    1. Higham NJ
    (1988)
    Matrix nearness problems and applications
    Citeseer.
  3. Conference
    1. Lerman GM
    2. Gill JV
    3. Rinberg D
    4. Shoham S
    (2019) Precise optical probing of perceptual detection
    Biophotonics Congress: Optics in the Life Sciences Congress 2019 (BODA,BRAIN,NTM,OMA,OMP), BM3A.2. Optical Society of America.
    https://doi.org/10.1101/456764

Decision letter

  1. Emilio Salinas
    Reviewing Editor; Wake Forest School of Medicine, United States
  2. Michael J Frank
    Senior Editor; Brown University, United States

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

This study investigates how sensory representations in visual cortex are modulated by ongoing task requirements as mice navigate a virtual environment and make a choice based on the total numbers of discrete stimuli, or 'pulses,' seen along the path. The main finding is that only a small fraction of active neurons had sensory-like responses time-locked to each pulse, and furthermore, for those that did, the amplitude of the response changed systematically as the impending choice advanced to completion. This shows that, even at a very basic level, the representation of sensory stimuli is strongly modulated and shaped by cognitive factors and behavioral relevance, and that a lot of the variability associated with sensory activity is not just random noise, as it often appears.

Decision letter after peer review:

Thank you for submitting your article "Amplitude modulations of sensory responses, and deviations from Weber's Law in pulsatile evidence accumulation" for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Michael Frank as the Senior Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

We would like to draw your attention to changes in our revision policy that we have made in response to COVID-19 (https://elifesciences.org/articles/57162). Specifically, when editors judge that a submitted work as a whole belongs in eLife but that some conclusions require additional new data or analyses, as they do with your paper, we are asking that the manuscript be revised to either limit claims to those supported by data in hand, or to explicitly state that the relevant conclusions require additional supporting data.

Our expectation is that the authors will eventually carry out the additional work and report on how it affects the relevant conclusions either in a preprint on bioRxiv or medRxiv, or if appropriate, as a Research Advance in eLife, either of which would be linked to the original paper.

Summary:

This manuscript carefully studies the properties of sensory responses in several visual areas during performance of a task in which head-fixed mice run along a virtual corridor and must turn toward the side that has more visual cues (small towers) along the wall. The results provide insight into the mechanisms whereby sensory evidence is accumulated and weighted to generate a choice, and on the sources of variability that limit the observed behavioral performance. All reviewers thought the work was generally interesting, carefully done, and novel.

However, the reviewers' impression was that the manuscript as it stands is very dense. In fact, it is largely two studies with different methods and approaches rolled into one. The first one (physiology) is still dense but less speculative and with interesting, solid results, and the revisions suggested by the reviewers should be relatively straightforward to address. In contrast, the modeling effort is no doubt connected to the physiology, but it really addresses a separate issue. The general feeling was that this material is probably better suited for a separate, subsequent article, for two reasons. First, because it will require substantial further work (see details below), and second, because it adds a fairly complex chapter to an already intricate analysis of the neurophysiological data.

So, going forward, we suggest that the authors revise the neurophys analyses along the lines suggested below (largely addressing clarity and completeness), leaving out the modeling study for a later report. If, however, the authors wish to maintain the current structure, they should address all the comments, and understand that we would reconsider the manuscript's suitability for publication after full re-review.

Revisions for this paper:

1) More should be done to highlight how very different the sensory representation is in this study compared with the great majority of earlier related work in the primate. This merits at least some discussion, and optimally, additional analyses of correlations in the data. See the comments from reviewer 3 for details.

2) Figure 4E was confusing. What is the point of showing the shades (which extend very far)? If the idea is to contrast the SSA and feedback models, then it would be better to plot their corresponding effects directly, on the same graph, or to show predictions versus actual data in each case, in two graphs. In any case, the data need to be shown in a different way, or the point made differently.

Similarly for Figure 3F. Could the authors explain how each point is calculated? I was especially confused about the meaning of the points for each area on the x-axis.

3) The prediction about the Fano Factor (FF) is problematic in a couple of ways. First, it seems to come out of the blue because Figure 5 is described before any discussion of the variability in the model is presented (except for the dice in the model schematic).

And second, the FF prediction itself is verified for a very small fraction of neurons even when an unusual p-value of 0.1 is used. Furthermore, the mathematical derivation relies on a Taylor series around NR ~ 0? (In most of the paper, the CLT is invoked on the assumption that NR is "large".) Due to the lack of transparency of this prediction and the mild support, the authors could consider dropping it, at least from the main manuscript.

4) The first prediction in Figure 5 seems very unspecific. In particular, it would seem that any "open loop" modulation of the cue-locked response which depended on time, or on location along the track, would induce a trend like the one assessed in Figure 5C-E. It is not clear this is a prediction specific to the multiplicative-feedback model the authors are advocating for.

5) The number of cells showing responses consistent with the model (Figure 5E) seems very small (~15% of 5-10% of cells with cue-locked responses). Could they really underlie the behavioral effects? The authors could perhaps comment on this.

6) It wasn't completely clear how the time of a particular cue onset was defined. In a real environment the cues would appear small (from afar) and get progressively bigger as the animal advances (at least if they are 3D objects, as depicted in Figure 1). What would be the cue onset in that case, and does the virtual environment work in the same way? This is probably not a serious issue, but it comes across as a bit at odds with the supposed "pulsatile" nature of the sensory stream, and would seem somewhat different from the auditory case with clicks.

A related question concerns multiple references to cue timing made in the Introduction, as if such timing were very precise. This seems strange given that all time points depend on the running speed of the mice, which is surely variable. So, how exactly is cue position converted to cue time, and why is there an assumption of very low variability? Some of this detail may be in previous reports, but it would be important to make at least a brief, explicit clarification early on.

Revisions expected in follow-up work:

For details, see comments 1-3 from reviewer 2 and comment 1 from reviewer 1, below.

Reviewer #1:

This study investigates the responses of neurons in the parietal cortex of mice (recorded via two-photon Ca imaging) performing a virtual navigation task, and then relates their activity to the animal's psychophysical performance. It is essentially two studies rolled into one. The analysis of neurophysiological activity in the first part shows that visually driven responses in the recorded "cue cells" are strongly modulated by the eventual choice and/or by the integrated quantity that defines that choice (the difference in left vs. right stimulus counts), as well as by other task variables, such as running speed. The model comparison study of the second part shows that, in the context of a sensory-motor circuit for performing the task, this type of feedback may account for subtle but robust psychophysical effects observed in the mice from this study and in rats from previous studies from the lab. Notably, the feedback explains intriguing deviations in choice accuracy from the Weber-Fechner law.

Both parts are interesting and carefully executed, although both are pretty dense; there are a ton of important technical details at each step. I wonder if this isn't too much for a single study. Had I not been reading it as a reviewer, I probably would have stopped after Figure 4 or just skimmed the rest. After that, the motivation, methods, and analyses shift markedly. I'm not pushing hard on this issue, but I think the authors should ponder it.

Other comments:

1) Figure 6 and the accompanying section of the manuscript investigate a variety of models with different architectures (feedback vs. purely feedforward) and noise sources. Here, if I understood correctly, the actual cue-driven responses are substituted with variables that are affected by different types of noise. It is this part that I found a bit disconnected from the rest, and somewhat confusing.

Here, there's a jump from the actual cells to model responses. I think this needs an earlier and more explicit introduction. It is clear what the objective of the modeling effort is; what's unclear are the elements that initially go into it. This is partly because the section jumps off with a discussion about accumulator noise, but the modeling involves many more assumptions (i.e., simplifications about the inputs to the accumulators).

What I wondered here was, what happened to all the variance that was carefully peeled away from the cue driven responses in the earlier part of the manuscript? Were the dependencies on running speed, viewing angle, contra versus ipsi sensitivity, etc still in play, or were the modeled cue-driven responses considering just the sensory noise from the impulse responses? I apologize if I missed this. I guess the broader question is how exactly the noise sources in the model relate to all the dependencies of the cue cells exposed in the earlier analyses.

Overall, my general impression is that this section requires more unpacking; perhaps it should become an independent report.

Reviewer #2:

In this manuscript, the authors present an in-depth analysis of the properties of sensory responses in several visual areas during performance of an evidence-accumulation task for head-fixed running mice (developed and studied by the authors previously), and of how these properties can illuminate aspects of the performance of mice and rats during pulsatile evidence accumulation, with a focus on the effect of "overall stimulus strength" on discriminability (Weber-Fechner scaling).

The manuscript is very dense and presents many findings, but the most salient ones are a description of how the variability in the large Ca++ transients evoked by the behaviourally-relevant visual stimuli (towers) is related to several low-level behavioural variables (speed, view) and also to variables relevant for the task (future choice, running count of accumulated evidence), and a framework based on multiplicative top-down feedback that seeks to explain some aspects of this variability and ultimately the psychophysical performance in the accumulating-towers task. The first topic is framed in the context of the literature on choice probability, and the second in the context of "Weber-Fechner" scaling, which in the current task would imply constant performance for given ratios of Left/Right counts as their total number is varied.

Overall, the demonstration of how trial-to-trial variability is informative about various relevant variables is important and convincing, and the model with multiplicative feedback is elegant, novel, naturally motivated by the neural data, and an interesting addition to a topic with a long history.

1) Non-integrable variability. In addition to 'sensory noise' (independent variability in the magnitude of each pulse), it is critical in the model to include a source of variability whose impact does not decay through temporal averaging (to recover Weber-Fechner asymptotically for large N). This is achieved in the model by positing trial-to-trial variability (but not within-trial) in the dot product of the feedforward (w) and feedback (u) directions. But the way this is done seems to me problematic:

The authors model variability in w*u as LogNormal (subsection “Sources of noise in various accumulator architectures”). First, the justification for this choice is incorrect as far as I can tell. The authors write: "We model m̂R with a lognormal distribution, which is the limiting case of a product of many positive random variables". But neither is the dot product of w and u a product (it's a sum of many products), nor are the elements of this sum positive variables (the vector u has near zero mean and both positive and negative elements allowing different neurons to have opposite preferences on choice – see e.g., in the subsection “Cue-locked amplitude modulations motivate a multiplicative feedback-loop circuit model” where it is stated that ui<0 for some cells), nor would it have a LogNormal distribution even if the elements of the sum were indeed positive. Without further assumptions, the dot product w*u will have a normal distribution with mean and variance dependent on the (chosen) statistics of u and w.

Two conditions seem to be necessary for u*w: it should have a positive mean close to zero (if it's too large, a(t) will explode), and it should have enough variability to make non-integrable noise have an impact in practice. For a normal distribution, this would imply that for approximately half of the trials, w*u would need to be negative, meaning a decaying accumulator and effectively no feedback. This does not seem like a sensible strategy that the brain would use.

The authors should clarify how this LogNormality is justified and whether it is a critical modelling choice (as an aside, although LogNormality in u*w allows non-negativity, low mean and large variability, the fact that it has very long tails sometimes leads to instability in the values of a(t)).

2) Related to this point, it would be helpful to have more clarity on exactly what is being assumed about the feedback vector u. The neural data suggests u has close to zero mean (across neurons). At the same time, it is posited that u varies across trials ("accumulator feedback is noisy") and that this variability is significant and important (previous comment). However, it would seem like neurons keep their choice preference across trials, meaning the trial-to-trial variability in each element of u has to be smaller than the mean. The authors only describe variability in u*w (LogNormal), but, in addition to the issues just mentioned about this choice, what implications does this have for the variability in u? The logic of the approach would be greatly strengthened if the authors made assumptions about the statistics of u consistent with the neural data, and then derived the statistics of u*w.

3) Overall, it seems like there is an intrinsically hard problem to be solved here, which is not acknowledged: how to obtain large variability in the effective gain of a feedback loop while at the same time keeping the gain "sufficiently restricted", i.e., neither too large and positive (runaway excitation) nor negative (counts are forgotten). While the authors avoid worrying about model parameters by fitting their values from data (with the caveats discussed above), their case would become much stronger if they studied the phenomenology of the model itself, exposing clearly the computational challenges faced and whether robust solutions to these problems exist.

Reviewer #3:

This manuscript describes measurements of neuronal activity in mice performing a discrimination task, and a new model that links these data to psychophysical performance. The key element of the new model is that sensory neurons are subject to gain modulations that evolve during each trial. They show that the model can produce pure sensory integration, Weber-Fechner performance, or intermediate states that nicely replicate the behavioral observations. This is an interesting and valuable contribution.

My only significant comment relates to the Discussion, which should do more to make sure the reader understands how very different the sensory representation is in this study compared with the great majority of earlier related work in the primate:

First, choice related signals are not systematically related to stimulus preferences (no Choice Probability). This is mentioned, but only very briefly.

Second, there appears to be no relationship between stimulus preference (visual field in this case) and noise correlation. Unfortunately, this emerges from the model fits, not an analysis of data. But it is an important difference with profound implications for how the coding of information is organized. It really needs a discussion. It should also be supported by an analysis of correlations in the data. I know some people argue that 2-photon measures make this difficult, but if that's true then surely they can’t be used to support a model in which correlations are a key component.

https://doi.org/10.7554/eLife.60628.sa1

Author response

Revisions for this paper:

To our understanding, items 3-5 listed in this section of the decision letter are only relevant for the accumulator modeling work, and we have therefore moved them to the next section.

1) More should be done to highlight how very different the sensory representation is in this study compared with the great majority of earlier related work in the primate. This merits at least some discussion, and optimally, additional analyses of correlations in the data. See the comments from reviewer 3 for details.

We have replaced the last half of the Discussion, which used to be about the accumulator circuit models, with an extended discussion of how the neural responses to visual cues in our task relate to and differ from previous work on nonhuman primates (NHP) and rodents. Most of this discussion concerns points brought up by reviewer 3 (please see the specific reply to reviewer 3 below for details). In particular, we discuss how the subject’s eventual choice modifies sensory representations, which we illustrate in an added conceptual Figure 5. The last Discussion paragraph provides a more general comparison between our work, other rodent work, and the NHP literature. With respect to reviewer 3’s request for analyses of correlations in the data, we have refrained from doing so because we do not think that we can make correct claims about noise correlations in our data. The reason is the nature of the behavior, which does not permit truly repeated trials. Please see the reply to reviewer 3 for details.

2) Figure 4E was confusing. What is the point of showing the shades (which extend very far)? If the idea is to contrast the SSA and feedback models, then it would be better to plot their corresponding effects directly, on the same graph, or to show predictions versus actual data in each case, in two graphs. In any case, the data need to be shown in a different way, or the point made differently.

We have replaced Figure 4E in the revised manuscript to provide a direct comparison of the neural vs. behavioral timecourses. We wanted the timecourse of neural choice modulations in Figure 4E to be compared to the timecourse of how cues influenced the behavioral performance data in Figure 1E. Perhaps because these panels were separated by many figures, it was not obvious what the reader should take away by the time they came to Figure 4E.

Similarly for Figure 3F. Could the authors explain how each point is calculated? I was specially confused about the meaning of the points for each area in the x-axis.

We have added a more detailed explanation of the computation of Figure 3F to both the text and the caption, as follows. The goal of Figure 3F is to address the visible differences across brain areas, in Figure 3B, D, in how well various task variables could be decoded from the amplitudes of a population of cue-locked cells. We wanted to know if these were indeed region-specific differences or whether they could be explained by differences in the number of recorded neurons (which differed systematically across cortical areas/layers). To do this we constructed a linear regression model to predict the decoding performance (evaluated in the middle of the cue region for each dataset) as a weighted sum of a set of factors, namely the x-axis coordinates in Figure 3F. The cortical area and layer regressors had values of either 0 or 1 depending on whether the dataset was for the stated area and layer, e.g. a recording from layer 5 of V1 would have regressor values (V1=1, AM=0, PM=0, MMA=0, MMP=0, RSC=0, layer=1). This explanation is now in the text as the last paragraph of the subsection “Cue-locked response amplitudes contain information about visual, motor, cognitive, and memory-related contextual task variables”.
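To make the structure of this regression concrete, here is a minimal numerical sketch. The dataset values and the helper `design_row` are hypothetical illustrations (not the actual data); the point is only the 0/1 indicator encoding of area and layer alongside the neuron count:

```python
import numpy as np

areas = ["V1", "AM", "PM", "MMA", "MMP", "RSC"]

# Hypothetical datasets: (area, layer-5 flag, number of neurons, decoding accuracy)
datasets = [("V1", 1, 120, 0.82), ("V1", 0, 95, 0.78),
            ("AM", 1, 60, 0.71), ("RSC", 0, 80, 0.69)]

def design_row(area, layer5, n_neurons):
    # One 0/1 indicator per cortical area, plus a layer indicator and neuron count
    return [1.0 if area == a else 0.0 for a in areas] + [float(layer5), float(n_neurons)]

X = np.array([design_row(a, l, n) for a, l, n, _ in datasets])
y = np.array([acc for *_, acc in datasets])

# Fit the weighted-sum (linear regression) model by least squares
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
```

For example, a layer-5 V1 recording maps to the row (V1=1, AM=0, PM=0, MMA=0, MMP=0, RSC=0, layer=1, n_neurons), as described in the text.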

6) It wasn't completely clear how the time of a particular cue onset was defined. In a real environment the cues would appear small (from afar) and get progressively bigger as the animal advances (at least if they are 3D objects, as depicted in Figure 1). What would be the cue onset in that case, and does the virtual environment work in the same way? This is probably not a serious issue, but it comes across as a bit at odds with the supposed "pulsatile" nature of the sensory stream, and would seem somewhat different from the auditory case with clicks.

We indeed neglected to provide this information while introducing the task, and have added this now as a third paragraph in the Results, as well as details on the following points in the Materials and methods. In summary, the “cue onset” is defined as the instant at which the cue is made visible in the virtual reality display, which is when the mouse approaches within 10cm of a predetermined cue location.

A related question concerns multiple references to cue timing made in the Introduction, as if such timing were very precise. This seems strange given that all time points depend on the running speed of the mice, which is surely variable. So, how exactly is cue position converted to cue time, and why is there an assumption of very low variability? Some of this detail may be in previous reports, but it would be important to make at least a brief, explicit clarification early on.

We have precise experimental control over the onset time of cues as we now explain in the Materials and methods (“Precision of behavioral cue timings”), and the cues were made to disappear from view after 200ms. However, there is some variability in how long any one cue will remain in the visual field of the mouse, which, as the reviewer correctly noted, depends on how it runs down the maze. This variability is small except for one mouse that had a much higher running speed than other mice, and we have added Figure 1—figure supplement 1 to quantify these behavior-induced variations. Regardless, for the above reasons as well as complications due to neurons having limited receptive fields, we had included in the cue-locked response model small timing jitter parameters that allowed the model to flexibly account for some timing uncertainty in neural responses with regard to the assumed cue onset times. As far as we can tell, only the cue-locked response model depends on knowing the precise timings of cues, and the distribution of jitter parameters across the model fits for cue-locked neurons had a standard deviation of about 50ms, which is in the ballpark of what we expected.

Reviewer #3:

This manuscript describes measurements of neuronal activity in mice performing a discrimination task, and a new model that links these data to psychophysical performance. The key element of the new model is that sensory neurons are subject to gain modulations that evolve during each trial. They show that the model can produce pure sensory integration, Weber-Fechner performance, or intermediate states that nicely replicate the behavioral observations. This is an interesting and valuable contribution.

My only significant comment relates to the Discussion, which should do more to make sure the reader understands how very different the sensory representation is in this study compared with the great majority of earlier related work in the primate:

We have added a last paragraph to the Discussion regarding some overall points where we believe our findings could be surprising compared to the primate work: (a) the small fractions of cue-locked neural activity even in V1; (b) choice modulations that are not lateralized by brain hemisphere; (c) the prevalence of many types of cue-locked amplitude modulations.

First, choice related signals are not systematically related to stimulus preferences (no Choice Probability). This is mentioned, but only very briefly.

To the best of our knowledge of the primate literature, choice probability (CP) values that correspond to higher firing rates in trials where the subject will make a choice opposite to the stimulus preference (CP < 0.5) could also have been detected as significant, albeit we have not been able to find reports of this other than in the recent re-analysis of monkey data by Zhao et al., 2020. If we guess correctly that CP < 0.5 is what the reviewer meant by “choice related signals [that] are not systematically related to stimulus preferences”, then we referred to such phenomena in the Discussion as cue-locked cells that have negative choice modulations, as opposed to “no CP” (since these choice modulations were consistent across trials). In other words, rather than individual cue-locked cells having no CP, we observed that a substantial fraction of them had highly significant CP. Where our results differ from the primate work is in the statistics across the population of cue-locked cells, which had comparable fractions with positive (CP > 0.5) and negative (CP < 0.5) choice modulations, as opposed to the primate work where mostly CP > 0.5 results have been reported. We have added a conceptual Figure 5 as well as expanded upon these differences between our and previous work in the second-to-last paragraph of the Discussion. However, it is possible that we have misunderstood the reviewer’s comment, in which case we ask for some more clarification.

Second, there appears to be no relationship between stimulus preference (visual field in this case) and noise correlation. Unfortunately, this emerges from the model fits, not an analysis of data. But is an important difference with profound implications for how the coding of information is organized. It really needs a discussion. It should also be supported by an analysis of correlations in the data. I know some people argue that 2 photon measures make this difficult, but if that's true then surely they can’t be used to support a model in which correlations are a key component.

We hope that we have not erroneously claimed any results about noise correlations in the paper, and would like to know where this was implied so that we can be more careful about the wording. We had in the past wished to perform direct analyses of noise correlations, but then realized that this was extremely difficult because our behavioral task has no exactly repeated trials that we could use to remove the effect of signal correlations. In particular, we did have in the task design multiple trials per session with exactly the same spatial configuration of cues, but unfortunately, since the mice could run down the T-maze in different ways in each trial, we fear that differences in running speed, view angle, etc. could result in signal-induced variability across these trials. As far as we can tell, we could try to subtract signal variance using a computational model, but then of course any results we obtain for noise correlations would be contingent on how well the model captures signal effects at a timepoint-by-timepoint level. We therefore feel that the soundness of any claims that we could try to make on noise correlations would be under question.

Revisions expected in follow-up work:

For details, see comments 1-3 from reviewer 2 and comment 1 from reviewer 1.

3) The prediction about the Fano Factor (FF) is problematic in a couple of ways. First, it seems to come out of the blue because Figure 5 is described before any discussion of the variability in the model is presented (except for the dice in the model schematic).

And second, the FF prediction itself is verified for a very small fraction of neurons even when an unusual p-value of 0.1 is used. Furthermore, the mathematical derivation relies on a Taylor series around NR ≈ 0? (In most of the paper, CLT is invoked on the assumption NR is "large"). Due to the lack of transparency of this prediction and the mild support, the authors could consider dropping it, at least from the main manuscript.

The FF measurement was indeed very difficult to perform because of insufficient statistics (that’s why we used a p-value threshold of 0.1 for testing effect sizes). This result is more of a consistency check in the sense that we didn’t observe a phenomenon that contradicted predictions of the theoretical model, but, as noted, neither do we have strong statistical support for predictions of the model. We will move the FF analysis to the supplement and reword the text to indicate the difficulty of this measurement.

4) The first prediction in Figure 5 seems very unspecific. In particular, it would seem like any "open loop" modulation of the cue-locked response which depended on time, or on location along the track, would induce a trend like the one assessed in Figure 5C-E. It is not clear this is a prediction specific to the multiplicative-feedback model the authors are advocating for.

The reviewer is correct that it is always possible to write down models with ad hoc time- or location-dependent scaling of cue-locked responses (in fact, we included ad hoc location-dependent scaling in all models for the mouse data). However, what we wished to do with the feedback-loop model was to propose a neural circuit origin for the observed amplitude scaling trends; moreover, the specific prediction is that the cue-response amplitudes should depend on the accumulated number of cues, not time or location. We will explain more clearly in the text that this is one of the reasons why the Figure 5C-E trends were made using only neural responses to the last cue in a trial and only including trials where that occurred in the last third of the cue region, so that the spatial location of the cue along the track is kept as similar as possible. We should also add a supplementary figure where the time at which the last cue occurs (which is highly correlated with its location along the track given the stereotypical running patterns of mice) is similarly restricted.

5) The number of cells showing responses consistent with the model (Figure 5E) seems very small (~15% of 5-10% of cells with cue-locked responses). Could they really underlie the behavioral effects? The authors could perhaps comment on this.

We agree with the reviewer that these are important points to discuss in the upcoming paper. Regarding the small fraction of active cells with cue-locked responses, it is indeed intriguing that only a small fraction of neural activity even in V1 was cue-locked, but since each of our recordings only includes a very small piece of brain tissue, and 98% of recordings had at least 1 cue-locked cell (despite our having selected imaging locations agnostic to any neural analyses), the total amount of signal in the cortex can be large. Our finding of cue-locked cells in many posterior cortical regions as well as both layers 2/3 and 5 also implies that somehow the wiring of the brain allows for this small fraction of sensory-like information to be transmitted in a widespread manner (e.g. found in the retrosplenial cortex), and we might perhaps speculate that this need not have been the case for neural signals that are too weak to drive behavior.

On the small fraction of evidence-modulated cue-locked cells, we should discuss the statistical power of our analysis as well as neural circuit considerations brought up by reviewer 2’s main comments 2 and 3.

Our fitting of accumulator models to behavioral data favored the multiplicative feedback-loop (fdbk) model, in which the fitted feedback-loop gain u*w has mean close to zero. This behavioral prediction is compatible with the amplitude-vs.-cue-counts slopes of cue-locked cells (dA/dN) having a distribution with more cells having slopes closer to zero (albeit this is not necessary to generate small u*w). Given limited, noisy data we only have the statistical power to find as significant those slopes that have large magnitudes, which may be why we only found 18% of cells with significant slopes. We should also note that this analysis was performed separately using trials of a fixed choice, i.e. testing for a dependence on cue-counts beyond that which can be accounted for by choice. However, noisy cells that receive weak accumulator feedback can more easily pass statistical tests for being modulated by choice (assuming that the accumulator drives and is therefore correlated with choice) than for having count modulation beyond that explainable by choice. In these ways, we believe that our neural observations are consistent with, albeit not a necessary implication of, the behavioral model fits.

We also note that small dA/dN does not necessarily correspond to a small neural signal. Because dA/dN is a change in cue-response amplitudes per accumulated cue, if we consider the net change after accumulating ~8 cues (the average number of majority-side cues in the behavioral task), the responses of count-modulated cue-locked cells can increase/decrease in amplitude by a factor of about 2 compared to their responses to the first cue. As further discussed in the reply to reviewer 2, a feedback-loop neural circuit with very large magnitudes of dA/dN can have runaway excitation or complete suppression of cue responses after accumulating many cues. We therefore think that small values of dA/dN are more physiologically reasonable and, at least according to our accumulator circuit modeling results, can still have a behavioral effect.

Reviewer #1:

[…] 1) Figure 6 and the accompanying section of the manuscript investigate a variety of models with different architectures (feedback vs. purely feedforward) and noise sources. Here, if I understood correctly, the actual cue-driven responses are substituted with variables that are affected by different types of noise. It is this part that I found a bit disconnected from the rest, and somewhat confusing.

Here, there's a jump from the actual cells to model responses. I think this needs an earlier and more explicit introduction. It is clear what the objective of the modeling effort is; what's unclear are the elements that initially go into it. This is partly because the section jumps off with a discussion about accumulator noise, but the modeling involves many more assumptions (i.e., simplifications about the inputs to the accumulators).

What I wondered here was, what happened to all the variance that was carefully peeled away from the cue-driven responses in the earlier part of the manuscript? Were the dependencies on running speed, viewing angle, contra versus ipsi sensitivity, etc. still in play, or were the modeled cue-driven responses considering just the sensory noise from the impulse responses? I apologize if I missed this. I guess the broader question is how exactly the noise sources in the model relate to all the dependencies of the cue cells exposed in the earlier analyses.

Overall, my general impression is that this section requires more unpacking; perhaps it should become an independent report.

We think that the suggested splitting off of the accumulator modeling work to a second paper is an excellent way to more cleanly separate the more complicated neurophysiological findings from the simplifications that we made in the accumulator modeling work for reasons of conceptual clarity. The modeling paper can therefore start out with an explicit list of assumptions made, as follows.

There were three major simplifications made in going from the experimentally observed cue-locked neural responses to the computational accumulator model. First, we assumed that the sensory units in the computational accumulator models only responded to one laterality of cues, because in the neural data fewer than 5% of cells responded to both lateralities, and even for these cells the responses still had a strong cue-laterality preference. Second, while the cue-locked neurons had impulse responses of duration ~100ms to the pulsatile visual cues, in the computational model we simplified the sensory inputs to the accumulators to have instantaneous responses to the visual cues. The motivation for this was to make the model analytically solvable, which then allowed us to mathematically understand its phenomenology. Third, we did not separately model the other sources of cue-locked response variability mentioned by the reviewer, because they act as behavioral sources of sensory and/or accumulator-level noise and were thus conceptually lumped into the two (sensory and accumulator) noise sources in the models. For example, variability that is due to the mouse viewing different cues at different running speeds can be thought of as adding a different random number to a sensory unit’s response to each cue, i.e. exactly what we defined as the per-pulse sensory noise in the model. In general, sources of noise that are fast (changing from cue to cue) and have no systematic relationship to the running tally of cue counts or choice would all contribute some part of the models’ sensory noise variance. Other sources of noise that are slow (changing from trial to trial, e.g. variability that is correlated with the mouse’s eventual choice) would contribute to the models’ accumulator noise variance, because they affect every cue response within the trial in the same way and are therefore accumulated in the same way as the cue responses.
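The fast-versus-slow distinction above can be illustrated with a minimal simulation (the noise scales and cue counts below are assumed values, purely for illustration): per-cue noise averages away as the number of cues N grows, whereas a per-trial factor that scales every cue response keeps a constant relative impact, which is what makes it "non-integrable":

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, N = 20_000, 16   # simulated trials and cues per trial (illustrative)

# Fast ("sensory") noise: an independent perturbation of each cue response
fast = 1.0 + rng.normal(0.0, 1.0, size=(n_trials, N))
acc_fast = fast.sum(axis=1)

# Slow ("accumulator-level") noise: one factor per trial, applied
# identically to every cue response within that trial
slow = 1.0 + rng.normal(0.0, 0.3, size=(n_trials, 1))
acc_slow = (slow * np.ones(N)).sum(axis=1)

# Relative variability of the accumulated total:
cv_fast = acc_fast.std() / acc_fast.mean()   # shrinks like 1/sqrt(N), here ~0.25
cv_slow = acc_slow.std() / acc_slow.mean()   # stays ~0.3 regardless of N
```

This is why, in the text, fast noise sources are lumped into the models' sensory noise while slow, trial-wise sources are lumped into the accumulator noise.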

Reviewer #2:

[…] 1) Non-integrable variability. In addition to 'sensory noise' (independent variability in the magnitude of each pulse), it is critical in the model to include a source of variability whose impact does not decay through temporal averaging (to recover Weber-Fechner asymptotically for large N). This is achieved in the model by positing trial-to-trial variability (but not within-trial) in the dot product of the feedforward (w) and feedback (u) directions. But the way this is done seems to me problematic:

The authors model variability in w*u as LogNormal (subsection “Sources of noise in various accumulator architectures”). First, the justification for this choice is incorrect as far as I can tell. The authors write: "We model m̂R with a lognormal distribution, which is the limiting case of a product of many positive random variables". But neither is the dot product of w and u a product (it's a sum of many products), nor are the elements of this sum positive variables (the vector u has near zero mean and both positive and negative elements allowing different neurons to have opposite preferences on choice – see e.g., in the subsection “Cue-locked amplitude modulations motivate a multiplicative feedback-loop circuit model” where it is stated that ui<0 for some cells), nor would it have a LogNormal distribution even if the elements of the sum were indeed positive. Without further assumptions, the dot product w*u will have a normal distribution with mean and variance dependent on the (chosen) statistics of u and w.

Two conditions seem to be necessary for u*w: it should have a positive mean close to zero (if it's too large, a(t) will explode), and it should have enough variability to make non-integrable noise have an impact in practice. For a normal distribution, this would imply that for approximately half of the trials, w*u would need to be negative, meaning a decaying accumulator and effectively no feedback. This does not seem like a sensible strategy that the brain would use.

The authors should clarify how this LogNormality is justified and whether it is a critical modelling choice (as an aside, although LogNormality in u*w allows non-negativity, low mean and large variability, the fact that it has very long tails sometimes leads to instability in the values of a(t)).

We agree with the reviewer that the description of the lognormal distribution for u*w is confusing. Specifically, we wrote “limiting case of a product of many positive random variables” only as a statement about the lognormal distribution, and did not think to explain why we made that modeling choice. In hindsight, this way of writing it was unintentionally misleading. The justification that we had in mind when designing the model was related to what the reviewer mentioned: if the distribution of u*w can have both negative and positive tails, either the variance of this distribution must be very small relative to its (positive) mean, or else there will be a substantial fraction of trials in which there is negative accumulator feedback modulating the sensory unit responses. However, as reasoned by the reviewer, our assumption of strictly positive u*w was not very natural and would seem to require some kind of careful rectification by neural circuits, for which we have no proposed mechanism.

To address this and the other related comments below, we have extended our work to include a more thorough exploration of noise distribution modeling options, specifically the choices of (1) the sensory response distribution, which we had previously assumed to be gaussian; and (2) the feedback/modulatory noise distribution. As a reminder, the feedforward accumulator models had an accumulator state equal to n*m, where n is a stochastic sensory response drawn from (1), which depends on the true stimulus counts N, whereas m is a per-trial modulatory noise drawn from (2). The feedback accumulator state is instead proportional to [exp(n m) – 1]/m, with n as for the feedforward models and m = u*w now interpreted as a feedback-related source of noise. We sketch our results so far below.
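One way to see where the [exp(n m) − 1]/m form can come from is a per-pulse recursion in which each incoming cue response is scaled by (1 + m·a), with a the current accumulator state. This is a numerical sketch with assumed values for the gain m and the per-cue responses n_k (not the fitted parameters); the recursion unrolls exactly to a = [∏(1 + m·n_k) − 1]/m, which for small m approaches the exponential form quoted above:

```python
import numpy as np

rng = np.random.default_rng(1)
m = 0.1                              # feedback-loop gain u*w (assumed small, positive)
n = rng.normal(1.0, 0.2, size=12)   # stochastic per-cue sensory responses (assumed)

# Multiplicative feedback: each cue response is scaled by (1 + m*a) before being added
a = 0.0
for nk in n:
    a = a + nk * (1.0 + m * a)

# Unrolling the recursion gives 1 + m*a_N = prod_k (1 + m*n_k) exactly
closed_form = (np.prod(1.0 + m * n) - 1.0) / m

# For small m, log(1 + m*n_k) ~ m*n_k, so the product approaches exp(m * sum(n))
exp_form = (np.exp(m * n.sum()) - 1.0) / m
```

The same recursion also makes the gain-restriction problem visible: large positive m inflates a exponentially (runaway excitation), while negative m makes earlier counts decay.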

Our previous choice of drawing n from a gaussian distribution with mean proportional to N means that the sensory response n can sometimes be negative, which we can interpret as a nonzero probability for cues of one laterality to be confused for cues of the opposite laterality. This is a modeling assumption that can be tested by alternatively drawing n from a Gamma distribution, which is strictly non-negative; interestingly, this alternative seems to produce a better fit for the rat data, specifically on trials containing cues of only one laterality.
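The distinction between these two options for the sensory response distribution can be sketched numerically: with mean and variance matched, the gaussian assigns nonzero probability to negative responses (laterality confusions) while the Gamma does not. All parameter values below are illustrative, not fitted.

```python
import numpy as np

rng = np.random.default_rng(2)
N, gain, sd = 3, 1.0, 1.0  # illustrative: few cues, sizeable sensory noise

# Gaussian sensory response: mean proportional to N, but can go negative,
# interpretable as confusing cues of one laterality for the other.
n_gauss = rng.normal(gain * N, sd, size=100_000)
print("P(n < 0 | gaussian):", np.mean(n_gauss < 0))

# Gamma alternative with matched mean (shape*scale) and variance
# (shape*scale^2): strictly non-negative by construction.
shape = (gain * N) ** 2 / sd ** 2
scale = sd ** 2 / (gain * N)
n_gamma = rng.gamma(shape, scale, size=100_000)
print("P(n < 0 | gamma):", np.mean(n_gamma < 0))
```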

For the feedback/modulatory noise distribution (2), we tried a spectrum of options ranging from the symmetric gaussian distribution to skewed distributions with progressively longer tails, including the lognormal but also other distributions that have both negative and positive tails. Our preliminary findings are that gaussian-distributed modulatory noise actually produces a significantly better behavioral prediction for the mouse data than our previous choice of the lognormal distribution, and gaussian-distributed u*w may be more compatible with our neural observations of comparable proportions of cue-locked cells with positive vs. negative count-modulations. To answer the reviewer’s question about which aspects of the lognormal distribution were critical for the good fit to the rat data, we compared its behavioral prediction to that of a similar model in which the distribution of u*w had a truncated positive tail as well as a (smaller) negative tail; both models predicted the behavior equally well. This suggests that the strict positivity and extreme tails of the lognormal distribution were not necessary to explain behavior; rather, other features, such as a mode close to zero and a positively skewed tail, were the important ones.
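The spectrum of candidate u*w distributions can be summarized by a few shape properties (fraction of negative mass, location of the bulk, skewness). The sketch below contrasts the three regimes discussed; the parameter values are arbitrary, chosen only to exhibit the contrasting shapes, not the fitted values.

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials = 100_000

# Candidate distributions for the modulatory/feedback noise u*w
# (parameters illustrative, chosen only to contrast shape properties).
samples = {
    "lognormal": rng.lognormal(mean=-2.0, sigma=1.0, size=n_trials),
    "gaussian": rng.normal(loc=0.15, scale=0.15, size=n_trials),
}
# Truncated variant: positive tail clipped and a (smaller) negative tail
# retained, keeping a mode close to zero and positive skew -- the features
# that appear to matter for predicting behavior.
trunc = rng.normal(loc=0.05, scale=0.15, size=n_trials)
samples["truncated"] = np.clip(trunc, -0.1, 0.6)

for name, x in samples.items():
    skew = np.mean(((x - x.mean()) / x.std()) ** 3)
    print(f"{name:10s} frac<0={np.mean(x < 0):.3f} "
          f"median={np.median(x):.2f} skew={skew:.2f}")
```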

2) Related to this point, it would be helpful to have more clarity on exactly what is being assumed about the feedback vector u. The neural data suggests u has close to zero mean (across neurons). At the same time, it is posited that u varies across trials ("accumulator feedback is noisy") and that this variability is significant and important (previous comment). However, it would seem that neurons keep their choice preference across trials, meaning the trial-to-trial variability in each element of u has to be smaller than the mean. The authors only describe variability in u*w (lognormal), but, in addition to the issues just mentioned about this choice, what implications does this have for the variability in u? The logic of the approach would be greatly strengthened if the authors made assumptions about the statistics of u consistent with the neural data, and then derived the statistics of u*w.

We first admit that the result mentioned by the reviewer, that cells have consistent choice preferences across trials, is unfortunately not sufficient to show that the postulated feedback strength u is fairly consistent across trials. This is because the choice- and count-modulation models that we constructed for cue-locked cell amplitudes assumed across-trial consistency; e.g. cells that had differently signed choice modulations from one trial to the next would not have passed significance tests for being choice modulated. As far as we can tell, the only way to measure the distribution of u across trials directly from the neural data is to fit a potentially different value of u per trial, using the responses of a cue-locked cell to the multiple cue presentations within a given trial. When we attempted this, using a simple linear model a_k = a_0 + u*k where a_k is a given cell’s response amplitude to the kth cue in a trial, we roughly found three categories of cells, with u having symmetric, positively skewed, and negatively skewed distributions respectively; all three categories had means close to zero but large variances relative to the mean. Unfortunately, we feel that these results have too many interpretational caveats to be claimed with confidence, e.g. the many other modulatory factors that influence cue-locked activities, as well as independent cue-to-cue variability (intrinsic to the sensory response), which can inflate the variance of the estimated u but seems very difficult to dissociate using this method of estimating u.
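The per-trial estimation idea can be sketched as an ordinary least-squares fit of the linear model a_k = a_0 + u*k to one trial's cue-locked amplitudes. This is only an illustration of the estimation procedure (the function name is ours); as noted above, the resulting per-trial u absorbs other modulatory factors and intrinsic cue-to-cue variability in addition to any feedback strength.

```python
import numpy as np

def fit_u_per_trial(amplitudes):
    """Least-squares fit of a_k = a_0 + u*k for one trial's cue-locked
    response amplitudes a_1..a_K, returning (a_0, u). The slope u is the
    per-trial estimate whose across-trial distribution one would inspect
    for symmetry vs. positive/negative skew."""
    a = np.asarray(amplitudes, dtype=float)
    k = np.arange(1, len(a) + 1)
    # Design matrix with columns [1, k]; lstsq returns the coefficients
    X = np.column_stack([np.ones_like(k, dtype=float), k])
    (a0, u), *_ = np.linalg.lstsq(X, a, rcond=None)
    return a0, u
```

Pooling the fitted u across trials (one value per trial, per cell) would then give the empirical across-trial distribution of u, with the caveats about confounded variability discussed above.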

We therefore think that the best we can do to answer this question is to try various modeling options for the u*w distribution, as outlined in the answer to comment 1. The general idea is for the different distribution options to probe different properties, such as strict positivity and long tails, so that we can determine which regimes of u*w distribution shapes produce good fits to the behavior, and then, more speculatively, discuss these fits with regard to neural circuit considerations as suggested by the reviewer in comment 3.

3) Overall, it seems like there is an intrinsically hard problem to be solved here, which is not acknowledged: how to obtain large variability in the effective gain of a feedback loop while at the same time keeping the gain "sufficiently restricted", i.e., neither too large and positive (runaway excitation) nor negative (counts are forgotten). While the authors avoid worrying about model parameters by fitting their values from data (with the caveats discussed above), their case would become much stronger if they studied the phenomenology of the model itself, exposing clearly the computational challenges faced and whether robust solutions to these problems exist.

We will include in the second paper a study of the phenomenology of the model in terms of different sensory response and feedback/modulatory noise distributions, as mentioned in the reply to comment 1. Our preliminary findings are that many combinations of distribution shapes can produce comparably good predictions of the behavior. This may hint at robustness, in the sense that the behavioral prediction depends on gross properties rather than details of the various noise distributions, and that multiple hypothesized neural implementations can produce the same behavioral outcomes.

https://doi.org/10.7554/eLife.60628.sa2

Article and author information

Author details

  1. Sue Ann Koay

    Princeton Neuroscience Institute, Princeton University, Princeton, United States
    Contribution
    Conceptualization, Data curation, Software, Formal analysis, Investigation, Methodology, Writing - original draft
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-9648-2475
  2. Stephan Thiberge

    Bezos Center for Neural Circuit Dynamics, Princeton University, Princeton, United States
    Contribution
    Resources, Supervision
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-6583-6613
  3. Carlos D Brody

    1. Princeton Neuroscience Institute, Princeton University, Princeton, United States
    2. Howard Hughes Medical Institute, Princeton University, Princeton, United States
    Contribution
    Conceptualization, Supervision, Writing - review and editing
    For correspondence
    brody@princeton.edu
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-4201-561X
  4. David W Tank

    1. Princeton Neuroscience Institute, Princeton University, Princeton, United States
    2. Bezos Center for Neural Circuit Dynamics, Princeton University, Princeton, United States
    Contribution
    Conceptualization, Resources, Supervision, Writing - review and editing
    For correspondence
    dwtank@princeton.edu
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-9423-4267

Funding

National Institutes of Health (5U01NS090541)

  • Sue Ann Koay
  • Stephan Thiberge
  • Carlos D Brody
  • David W Tank

National Institutes of Health (1U19NS104648)

  • Sue Ann Koay
  • Stephan Thiberge
  • Carlos D Brody
  • David W Tank

Simons Foundation (Simons Collaboration on the Global Brain-328057)

  • Carlos D Brody
  • David W Tank

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank BB Scott for brainstorming and feedback on the concept of this paper, as well as L Pinto, CM Constantinople, AG Bondy, M Aoi, and B Deverett for useful and interesting discussions. B Engelhard and L Pinto built rigs for the high-throughput training of mice, and S Stein helped in the training of mice in this study. B Engelhard and L Pinto contributed behavioral data from the mouse evidence accumulation task. We additionally thank all members of the BRAIN COGS team, Tank and Brody labs. This work was supported by the NIH grants 5U01NS090541 and 1U19NS104648, and the Simons Collaboration on the Global Brain (SCGB).

Ethics

Animal experimentation: All procedures were approved by the Institutional Animal Care and Use Committee at Princeton University (Protocol 1910) and were performed in accordance with the Guide for the Care and Use of Laboratory Animals (National Research Council et al. 2011). All surgeries were performed under isoflurane anesthesia, every effort was made to minimize suffering, and all experimental animals were group housed in enriched environments.

Senior Editor

  1. Michael J Frank, Brown University, United States

Reviewing Editor

  1. Emilio Salinas, Wake Forest School of Medicine, United States

Publication history

  1. Received: July 1, 2020
  2. Accepted: November 30, 2020
  3. Accepted Manuscript published: December 2, 2020 (version 1)
  4. Version of Record published: January 15, 2021 (version 2)

Copyright

© 2020, Koay et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
