Multiple timescales of sensory-evidence accumulation across the dorsal cortex

  1. Lucas Pinto
  2. David W Tank (corresponding author)
  3. Carlos D Brody (corresponding author)
  1. Department of Neuroscience, Northwestern University, United States
  2. Princeton Neuroscience Institute, Princeton University, United States

Abstract

Cortical areas seem to form a hierarchy of intrinsic timescales, but the relevance of this organization for cognitive behavior remains unknown. In particular, decisions requiring the gradual accrual of sensory evidence over time recruit widespread areas across this hierarchy. Here, we tested the hypothesis that this recruitment is related to the intrinsic integration timescales of these widespread areas. We trained mice to accumulate evidence over seconds while navigating in virtual reality and optogenetically silenced the activity of many cortical areas during different brief trial epochs. We found that the inactivation of all tested areas affected the evidence-accumulation computation. Specifically, we observed distinct changes in the weighting of sensory evidence occurring during and before silencing, such that frontal inactivations led to stronger deficits on long timescales than posterior cortical ones. Inactivation of a subset of frontal areas also led to moderate effects on behavioral processes beyond evidence accumulation. Moreover, large-scale cortical Ca2+ activity during task performance displayed different temporal integration windows. Our findings suggest that the intrinsic timescale hierarchy of distributed cortical areas is an important component of evidence-accumulation mechanisms.

Editor's evaluation

Pinto and colleagues used brief optogenetic silencing to study the contributions of different cortical areas to an evidence-accumulation task in mice. The authors show that silencing frontal regions affected evidence accumulation on a longer timescale than silencing posterior regions, providing evidence for a relation between cortical function and intrinsic timescales.

https://doi.org/10.7554/eLife.70263.sa0

Introduction

The cerebral cortex of both rodents and primates appears to be organized in a hierarchy of intrinsic integration timescales, whereby frontal areas integrate input over longer time windows than sensory areas (Cavanagh et al., 2020; Chaudhuri et al., 2015; Gao et al., 2020; Hasson et al., 2008; Ito et al., 2020; Kiebel et al., 2008; Murray et al., 2014; Runyan et al., 2017; Soltani et al., 2021; Spitmaan et al., 2020). Although this idea has received increasing attention, the role of such timescale hierarchy in cognitive behavior remains unclear.

In particular, the decisions we make in our daily lives often unfold over time as we deliberate between competing choices. This raises the possibility that decisions co-opt the cortical timescale hierarchy such that different cortical areas integrate decision-related information on distinct timescales. A commonly studied type of time-extended decision-making happens under perceptual uncertainty, which requires the gradual accrual of sensory evidence. This involves remembering a running tally of evidence for or against a decision, updating that tally when new evidence becomes available, and making a choice based on the predominant evidence (Bogacz et al., 2006; Brody and Hanks, 2016; Brunton et al., 2013; Carandini and Churchland, 2013; Gold and Shadlen, 2007; Morcos and Harvey, 2016; Newsome et al., 1989; Odoemene et al., 2018; Stine et al., 2020; Sun and Landy, 2016; Tsetsos et al., 2012; Waskom and Kiani, 2018). Neural correlates of decisions relying on evidence accumulation have been found in a number of cortical and subcortical structures, in both primates and rodents (Brincat et al., 2018; Ding and Gold, 2010; Erlich et al., 2015; Hanks et al., 2015; Horwitz and Newsome, 1999; Kim and Shadlen, 1999; Koay et al., 2020; Krueger et al., 2017; Murphy et al., 2021; Orsolic et al., 2021; Scott et al., 2017; Shadlen and Newsome, 2001; Wilming et al., 2020; Yartsev et al., 2018). Likewise, we have previously shown that, when mice must accumulate evidence over several seconds to make a navigational decision, the inactivation of widespread dorsal cortical areas leads to behavioral deficits, and that these areas encode multiple behavioral variables, including evidence (Pinto et al., 2019). However, we do not understand which aspects of these decisions lead to such widespread recruitment of brain structures.
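The accumulate-compare-choose scheme described above (a running tally of evidence, updated with each new pulse, with the choice going to the predominant side) can be caricatured in a few lines. This is a toy sketch only; the function name and the pulse encoding are invented for illustration:

```python
import numpy as np

def accumulate_and_choose(right_pulses, left_pulses):
    """Perfect accumulator caricature: keep a running tally of net
    evidence and choose the side favored at the end of the trial."""
    tally = np.cumsum(right_pulses - left_pulses)   # running tally over time
    return "right" if tally[-1] > 0 else "left"     # choice = predominant side

# Three right towers vs. one left tower -> net evidence favors "right"
print(accumulate_and_choose(np.array([1, 0, 1, 1]), np.array([0, 1, 0, 0])))
```

Real accumulators deviate from this idealization (leak, noise, bounds; see Brunton et al., 2013), but the perfect-integration caricature is the baseline against which those deviations are measured.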

Here, we hypothesized that the pattern of widespread recruitment of cortical areas during prolonged evidence accumulation can be in part explained by their underlying timescale hierarchy. To test this, we trained mice to accumulate evidence over seconds toward navigational decisions and used brief optogenetic inactivation of single or combined cortical areas, restricted to one of six epochs of the behavioral trials. We show that the inactivation of widespread areas in the dorsal cortex strongly affects the processing and memory of sensory evidence, and that inactivating a subset of frontal areas also results in prospective behavioral deficits. Further, the inactivation of different areas affects accumulation over distinct timescales, such that, to an approximation, frontal areas contribute to evidence memory over longer temporal windows than posterior areas. In agreement with this, we show that cortical activity during the accumulation task displays a gradient of intrinsic timescales, which are longer in frontal areas. Our findings thus suggest the existing cortical hierarchy of temporal integration windows is important for evidence-accumulation computations.

Results

Brief inactivation of different cortical areas leads to accumulation deficits on distinct timescales

We trained mice to accumulate evidence over relatively long timescales while navigating in VR (Figure 1A; Pinto et al., 2018). The mice navigated a 3-m long virtual T-maze and during the first 2 m (~4 s) they encountered salient objects, or towers, along the walls on either side, and after a delay of 1 m (~2 s) turned into the arm corresponding to the highest perceived tower count. The towers were visible for 200 ms and appeared at different positions in each trial, obeying spatial Poisson processes of different underlying rates on the rewarded and non-rewarded sides. Compatible with our previous reports (Koay et al., 2020; Pinto et al., 2018), task performance was modulated by the difference in tower counts between the right and left sides (Figure 1B, n=28). Crucially, beyond allowing us to probe sensitivity to sensory evidence, the task design decorrelated the position of individual towers from the animals’ position in the maze across trials. This allowed us to build a logistic regression model that used the net sensory evidence (∆ towers, or #R – #L) from each of four equally spaced bins from the cue region to predict the choice the mice made. In other words, we inferred the weight of sensory evidence from different positions in the maze on the final decision. While individual mice showed different evidence-weighting profiles, fitting the model on aggregate data yielded a flat evidence-weighting curve (Figure 1C, n=108,940 trials), indicating that on average the mice weighted evidence equally from throughout the maze (Pinto et al., 2018).
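The trial statistics and the binned-evidence regression described above can be sketched as follows. This is a minimal simulation with made-up tower rates and a uniform true evidence weight; `LogisticRegression` stands in for the paper's actual fitting procedure:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical task statistics (illustrative values, not the paper's exact
# rates): tower counts on each side follow Poisson draws per cue-region bin,
# with a higher mean rate on the rewarded side.
n_trials, n_bins = 20000, 4            # 4 equally spaced cue-region bins
rate_high, rate_low = 1.9, 0.8         # mean towers per bin per side

high_is_right = rng.random(n_trials) < 0.5
r_rate = np.where(high_is_right[:, None], rate_high, rate_low)
l_rate = np.where(high_is_right[:, None], rate_low, rate_high)
r_towers = rng.poisson(np.broadcast_to(r_rate, (n_trials, n_bins)))
l_towers = rng.poisson(np.broadcast_to(l_rate, (n_trials, n_bins)))

# Net evidence (#R - #L) per bin; the simulated mouse weights all bins equally
delta = r_towers - l_towers
p_right = 1 / (1 + np.exp(-0.5 * delta.sum(axis=1)))
choice_right = rng.random(n_trials) < p_right

# Logistic regression recovers per-bin evidence weights on the final choice
model = LogisticRegression().fit(delta, choice_right)
print(model.coef_.round(2))   # roughly equal weights -> a flat weighting curve
```

Because the towers' positions are decorrelated from maze position across trials, the per-bin coefficients are separately identifiable, which is what licenses reading them as an evidence-weighting curve.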

Figure 1 with 3 supplements
Temporally specific inactivation of multiple dorsal cortical regions during performance of a virtual reality-based evidence-accumulation task.

(A) Schematics of the experimental setup. (B) Psychometric functions for control trials, showing the probability of right-side choice as a function of the strength of right sensory evidence, ∆ towers (#R – #L). Thin gray lines: best fitting psychometric functions for each individual mouse (n=28). Black circles: aggregate data (n=108,940 trials), black line: fit to aggregate data, error bars: binomial confidence intervals. (C) Logistic regression curves for the weight of sensory evidence from four equally spaced bins on the final decision, from control trials. Thin gray lines: individual animals, thick black line: aggregate data, error bars: ± SD from 200 bootstrapping iterations. (D) Experimental design. We bilaterally inactivated seven dorsal cortical areas, alone or in combination, yielding a total of nine area sets, while mice performed the accumulating towers task. Bilateral inactivation happened during one of six regions in the maze spanning different parts of the cue region or delay. We thus tested a total of 54 area-epoch combinations. (E) Effects of sub-trial inactivation on overall performance during all 54 area-epoch combinations. Each panel shows inactivation-induced change in overall % correct performance for each inactivation epoch, for data pooled across mice. Error bars: SD across 10,000 bootstrapping iterations. Black circles indicate significance according to the captions on the leftmost panel. Light gray circles indicate data from individual mice (n=4–11, see Figure 1—source data 1 for details about n per condition).

Figure 1—source data 1

Numbers of mice, sessions, and trials for each of the 54 experimental conditions.

Last line shows the number of unique mice and trials across all experiments, as conditions were partially overlapping for a given mouse and behavioral session.

https://cdn.elifesciences.org/articles/70263/elife-70263-fig1-data1-v1.docx
Figure 1—source data 2

Source data for plots in Figure 1.

https://cdn.elifesciences.org/articles/70263/elife-70263-fig1-data2-v1.zip

Our previous results have shown that cortical contributions to the performance of this task are widespread (Pinto et al., 2019), but our whole-trial inactivation did not allow us to tease apart the nature of the contributions of different areas. Here, we addressed this by using sub-trial optogenetic inactivations to ask how different dorsal cortical regions contribute to the temporal weighting of sensory evidence in order to make a perceptual decision. To do this, we cleared the intact skull of mice expressing Channelrhodopsin-2 (ChR2) in inhibitory interneurons (VGAT-ChR2-EYFP, n=28) and used a scanning laser system to bilaterally silence different cortical regions, by activating inhibitory cells (Figure 1D; Guo et al., 2014; Pinto et al., 2019). We targeted seven different areas – primary visual cortex (V1), medial secondary visual cortex (mV2, roughly corresponding to area AM), posterior parietal cortex (PPC), retrosplenial cortex (RSC), the posteromedial portion of the premotor cortex (mM2), the anterior portion of the premotor cortex (aM2), and the primary motor cortex (M1) – as well as two combinations of these individual areas, namely posterior cortex (V1, mV2, PPC, and RSC) and frontal cortex (mM2, aM2, and M1). Cortical silencing occurred in one of six trial epochs: first, second, or third quarter of the cue region (0–50 cm, 50–100 cm, or 100–150 cm, respectively), first or second half of the cue region (0–100 cm or 100–200 cm, respectively), or delay region (200–300 cm). We tested all 54 possible area-epoch combinations (Figure 1—source data 1). This large number of experimental conditions allowed us to assess how the inactivation of different areas affects different aspects of the decision-making process.

Compatible with our previous whole-trial inactivation experiments (Pinto et al., 2019), we found that the inactivation of all tested cortical areas significantly affected behavioral performance, though to varying degrees (Figure 1E, Figure 1—figure supplement 1). Furthermore, we observed a variety of effect profiles across regions and inactivation epochs, as assessed by the difference between the evidence-weighting curves separately calculated for ‘laser off’ and ‘laser on’ trials (Figure 1—figure supplement 2). Although our previous measurements indicate inactivation spreads of at least 2 mm (Pinto et al., 2019), we observed different effects even comparing regions that were in close physical proximity (e.g. V1 and mV2). Additionally, all tested areas had significant effects in at least a subset of conditions (Figure 1—figure supplement 2, p<0.05, bootstrapping). Finally, in agreement with our previous results (Pinto et al., 2019), inactivation resulted in minor changes in running speed in a subset of conditions (average overall increase of ~8%, Figure 1—figure supplement 3). Importantly, we have previously shown that these effects are specific to mice expressing ChR2, ruling out a non-specific light effect (Pinto et al., 2019).

We next assessed directly whether the inactivation of different cortical areas led to changes in how much the mice based their final decision on evidence from different times in the trial with respect to inactivation. We reasoned that changes in the weighting of sensory evidence occurring before laser onset would primarily reflect effects on the memory of past evidence, while changes in evidence occurring while the laser was on would reflect disruption of processing and/or very short-term memory of the evidence. Finally, changes in evidence weighting following laser offset would potentially indicate effects on processes beyond accumulation per se, such as commitment to a decision. For example, a perturbation that caused a premature commitment to a decision would lead to towers that appeared subsequent to the perturbation having no weight on the animal’s choice. Although our inactivation epochs were defined in terms of spatial position within the maze, small variations in running speed across trials, along with the moderate increases in running speed during inactivation, could have introduced confounds in the analysis of evidence as a function of maze location (Figure 1—figure supplement 2). Thus, we repeated the analysis of Figure 1C but now with logistic regression models, built to describe inactivation effects for each area, in which net sensory evidence was binned in time instead of space. Further, to account for the inter-animal variability we observed, we used a mixed-effects logistic regression approach, with mice as random effects (see Materials and methods for details), thus allowing each mouse to contribute its own source of variability to overall side bias and sensitivity to evidence at each time point, with or without the inactivations. We first fit these models separately to inactivation epochs occurring in the early or late parts of the cue region, or in the delay (y≤100 cm, 100<y≤200 cm, y>200 cm, respectively). 
We again observed a variety of effect patterns, with similar overall laser-induced changes in evidence weighting across epochs for some but not all tested areas (Figure 2—figure supplement 1). Such differences across epochs could reflect dynamic computational contributions of a given area across a behavioral trial. However, an important confound is the fact that we were not able to use the same mice across all experiments due to the large number of conditions (Figure 1—source data 1), such that epoch differences (where epoch is defined as time period relative to trial start) could also simply reflect variability across subjects. To address this, for each area we combined all inactivation epochs in the same model, adding them as additional random effects, thus allowing for the possibility that inactivation of each brain region at each epoch would contribute its own source of variability to side bias; different biases from mice perturbed at different epochs would then be absorbed by this random-effects parameter. We then aligned the timing of evidence pulses to laser onset and offset within the same models, as opposed to aligning with respect to trial start. This alignment combined data from mice inactivated at different epochs together, further ameliorating potential confounds from any mouse × epoch-specific differences. Each fixed-effects data point in figures below (Figures 2 and 3, solid colors) thus reflects tendencies common across mice, not individual mouse effects; the latter are shown as the random effects (faded colors). This approach allowed us to extract the common underlying patterns of inactivation effects on the use of sensory evidence toward choice, while simultaneously accounting for inter-subject and inter-condition variability. 
These models confirmed that, in control conditions, evidence use is fairly constant across time (Figure 2—figure supplement 2), allowing us to compare the inactivation-induced deficits across time points in the trials (Figure 2A). Overall, these models accurately predicted ~70% of behavioral choices in trials not used to fit the data (Figure 2B).
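A rough approximation of this mixed-effects setup is sketched below. A true mixed-effects logistic regression requires a dedicated fitter (e.g., lme4 in R); here, per-mouse random intercepts are approximated by L2-penalized mouse dummy variables, and all sample sizes and parameter values are invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Hypothetical cohort: each mouse contributes trials with its own side bias
# (the "random effect"), while evidence weights are shared (the "fixed effects")
n_mice, trials_per_mouse, n_time_bins = 8, 2000, 3
mouse_id = np.repeat(np.arange(n_mice), trials_per_mouse)
n_total = n_mice * trials_per_mouse
evidence = (rng.poisson(1.5, (n_total, n_time_bins))
            - rng.poisson(1.5, (n_total, n_time_bins)))   # net towers per bin

true_w = np.array([0.4, 0.4, 0.4])               # shared evidence weights
mouse_bias = rng.normal(0, 0.3, n_mice)          # per-mouse random intercepts
logit = evidence @ true_w + mouse_bias[mouse_id]
choice = rng.random(n_total) < 1 / (1 + np.exp(-logit))

# Design matrix: evidence bins (fixed) + one-hot mouse dummies (penalized,
# which shrinks them toward zero much like random effects)
dummies = np.eye(n_mice)[mouse_id]
X = np.hstack([evidence, dummies])
fit = LogisticRegression(C=1.0, fit_intercept=False).fit(X, choice)
w_fixed = fit.coef_[0, :n_time_bins]   # evidence weights, shared across mice
w_mouse = fit.coef_[0, n_time_bins:]   # per-mouse bias estimates
```

The key property this sketch preserves is the partition of variance: idiosyncratic per-mouse biases are absorbed by their own (shrunk) parameters, so the shared evidence weights reflect tendencies common across animals.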

Figure 2 with 2 supplements
Inactivating different cortical areas leads to evidence-accumulation deficits on distinct timescales.

(A) Results from mixed-effects logistic regression models fit to inactivation data from different areas, combined across mice and inactivation epochs, with 10-fold cross-validation. For each area, we plot normalized evidence weights for inactivation trials, such that 0 means no difference from control, and –1 indicates complete lack of evidence weight on decision. Net evidence (#R – #L towers) was binned in time (0.5 s time bins) and aligned to either laser onset or offset. Coefficients were extracted from the model with highest cross-validated accuracy. Thin gray lines and crosses, mouse random effects (primary visual cortex [V1]: n=15, medial secondary visual cortex [mV2]: n=14, posterior parietal cortex [PPC]: n=20, retrosplenial cortex [RSC]: n=21, posteromedial portion of the premotor cortex [mM2]: n=15, anterior portion of the premotor cortex [aM2]: n=15, primary motor cortex [M1]: n=11; note that we omitted 10/666 random effect outliers outside the 1st–99th percentile range for clarity, but they were still considered in the analysis). The zero-mean random effects were added to the fixed effects for display only. Error bars, ± SEM on coefficient estimates. Black circles below the data points indicate statistical significance from t-tests, with false discovery rate correction. We also imposed an additional significance constraint that the coefficient ± SEM does not overlap with ± SD intervals for coefficients for models fit to shuffled data (gray shaded areas, see Materials and methods for details). (B) Distribution of model prediction accuracy, defined as the proportion of correctly predicted choices in 10% of the data not used to fit the model (n=9 areas × 10 cross-validation runs). (C) Comparison of the inactivation effects between areas in the posterior (V1, mV2, PPC, RSC) and frontal cortex (mM2, aM2, M1). Thin lines with crosses, individual areas. Thick lines and error bars, mean ± SEM across areas. 
p-values are from a two-way ANOVA with repeated measures, with time bins and area group as factors.

Figure 3 with 1 supplement
Simultaneous inactivation of frontal or posterior cortical areas confirms distinct contributions to sensory-evidence-based decisions.

Comparison of the effect of simultaneous posterior or frontal cortical inactivation on the use of sensory evidence as recovered from the mixed-effects logistic regression models combined across mice and inactivation epochs, with 10-fold cross-validation. For each area, we plot normalized evidence weights for inactivation trials as in Figure 2. Thin gray lines and crosses, mouse random effects (posterior: n=11, frontal: n=11). Error bars, ± SEM on coefficient estimates. Black circles below the data points indicate p-values from z-tests on the coefficients, with false discovery rate correction (captions on top, see Materials and methods for details).

This modeling approach revealed that inactivation of different areas led to deficits in the use of sensory evidence from distinct time points within the behavioral trials. For instance, PPC inactivation only led to significant decreases in the use of sensory evidence occurring during inactivation or immediately preceding it (Figure 2A, p<0.001, t-test), indicating a role in processing and memory of very recent sensory evidence (≤0.5 s). Similarly, we observed deficits in the use of sensory evidence during the inactivation period for all posterior areas (p<0.001, t-test), except for RSC (p>0.05). However, their role went beyond pure sensory processing, as their inactivation decreased the use of sensory evidence occurring prior to laser onset, albeit on different timescales. Inactivation of mV2 only affected evidence memory on intermediate timescales (0.5–1.0 s, p<0.001, t-test). Conversely, V1 and RSC inactivation led to non-monotonic changes in evidence memory, affecting recent (≤0.5 s, p<0.001) and long-past evidence (1.0–1.5 s, p<0.001), but not evidence occurring in between (0.5–1.0 s, p>0.05). Lastly, for all posterior cortical areas, significant changes in the evidence-weighting curves happened exclusively for evidence concomitant with or preceding laser onset, indicating that the manipulations primarily affected the processing and/or memory of the evidence, i.e., the accumulation process itself.

The inactivation of all tested frontal areas also led to deficits in evidence memory, but with different temporal profiles. Notably, inactivation of all three frontal areas (M1, mM2, aM2) led to profound deficits in the use of long-past evidence (1.0–1.5 s, p<0.001, t-test), but only aM2 inactivation led to changes in the use of evidence occurring while the laser was on (p<0.001). Interestingly, inactivation of mM2 and M1 also led to significant changes in the use of sensory evidence occurring after laser offset (p<0.001), although to a lesser extent than pre-laser evidence (Figure 2A). This could indicate that, in addition to evidence accumulation, these frontal areas also have a role in other decision processes, such as post-stimulus categorization (Hanks et al., 2015) or commitment to a decision. In this case, silencing these regions could result in a premature decision. However, the possibility remains that these effects are related to lingering effects of inactivation on population dynamics in frontal regions, which we have found to evolve on slower timescales (see below). Although we have previously verified in an identical preparation that our laser parameters lead to near-immediate recovery of pre-laser firing rates of single units, with little to no rebound (Pinto et al., 2019), these measurements were not done during the task, such that we cannot completely rule out this possibility.

Thus, while we observed diverse temporal profiles of evidence-weighting deficits resulting from the inactivation of different areas of the dorsal cortex, to an approximation they could be broadly divided according to whether they belonged to the frontal or posterior cortex. Indeed, frontal and posterior areas differed significantly in terms of the magnitude and time course of evidence-weighting deficits induced by their inactivation (Figure 2C, two-way repeated-measures ANOVA with time bin and area group as factors; F[time]5,15 = 3.09, p[time]=0.047, F[area]1,3 = 33.93, p[area]=0.010, F[interaction]5,15 = 3.60, p[interaction]=0.025).

To further explore the different contributions of posterior and frontal cortical areas to the decision-making process, we next analyzed the effect of inactivating these two groups of areas simultaneously, using the same mixed-effects modeling approach as above. Compatible with our previous analysis, we found significant differences in how these two manipulations impacted the use of sensory evidence (Figure 3). In particular, compared to posterior areas, frontal inactivation resulted in a significantly larger decrease in the use of sensory evidence occurring long before laser onset (1.0–1.5 s, p=0.006, z-test). Moreover, it led to decreases in the use of sensory evidence occurring after inactivation (p<0.001, z-test).

Finally, we wondered whether evidence information from different areas is evenly combined, at least from a behavioral standpoint. To test this, we compared the effects of simultaneously inactivating all frontal or posterior areas to those expected from an even combination of the effects of inactivating each area individually (i.e. their average). Both posterior and frontal inactivations significantly deviated from this even-combination prediction (Figure 3—figure supplement 1, p<0.05, z-test). This could suggest that signals from the different dorsal cortical areas are combined with different weights toward a final decision.

A hierarchy of timescales in large-scale cortical activity during evidence accumulation

Our inactivation results thus suggest that different regions of the dorsal cortex contribute to distinct aspects of evidence-accumulation-based decisions. In particular, while all areas we tested appear to have a role in evidence accumulation, they do so on distinct timescales. This is reminiscent of the findings that cortical areas display a hierarchy of intrinsic timescales, such that primary sensory areas tend to integrate over shorter time windows than frontal and other association areas (Chaudhuri et al., 2015; Hasson et al., 2008; Murray et al., 2014; Runyan et al., 2017). While these are thought to arise in part from intrinsic cellular and circuit properties such as channel and receptor expression, amount of recurrent connectivity, and relative proportions of inhibitory interneuron subtypes (Chaudhuri et al., 2015; Duarte et al., 2017; Fulcher et al., 2019; Gao et al., 2020; Wang, 2020), they appear to be modulated by task demands (Gao et al., 2020; Ito et al., 2020; Zeraati et al., 2021). Thus, to confirm whether this timescale hierarchy exists in the mouse cortex during performance of the accumulating-towers task, we reanalyzed previously published data consisting of mesoscale widefield Ca2+ imaging of the dorsal cortex through the intact cleared skull of mice expressing the Ca2+ indicator GCaMP6f in excitatory neurons (Figure 4A, Emx1-Ai93 triple transgenics, n=6, 25 sessions) (Pinto et al., 2019). To do this, we enhanced our previous linear encoding model of the average activity of anatomically defined regions of interest (ROIs) (Pinto et al., 2019) by including two sets of predictors in addition to task events. First, for each ROI we added the zero-lag activity of other simultaneously imaged ROIs as coupling predictors, similar to previous work (Pillow et al., 2008; Runyan et al., 2017; Figure 4—figure supplement 1). 
Crucially, we also included auto-regressive predictors to capture intrinsic activity autocorrelations that are not locked to behavioral events. In other words, this approach allowed us to estimate within-task autocorrelations while separately accounting for task-induced temporal structure in cortical dynamics (Spitmaan et al., 2020). Adding these new sets of predictors resulted in a large and significant increase in cross-validated model accuracy, as measured by the linear correlation coefficient between the model predictions and a test dataset not used to fit the model (Figure 4B and C; ~0.95 vs. ~0.3, Fmodel (6,2,12) = 1994.85, p=6.2 × 10–13, two-way ANOVA with repeated measures).
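The structure of such an encoding model, with task-event, auto-regressive, and coupling predictors entering a single linear fit, might be sketched as follows. The data are synthetic, and the lag count, ridge penalty, and generative parameters are illustrative choices, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy sketch: one target ROI predicted from a task-event regressor, its own
# lagged activity (auto-regressive terms), and zero-lag activity of other
# ROIs (coupling terms)
T, n_lags, n_other = 5000, 10, 3
other = rng.normal(size=(T, n_other))            # simultaneously imaged ROIs
task = (rng.random(T) < 0.05).astype(float)      # sparse task-event regressor

# Generate the target with a true exponential AR kernel (tau ~ 3 frames)
true_kernel = 0.25 * np.exp(-np.arange(1, n_lags + 1) / 3.0)
noise = rng.normal(0, 0.1, T)
y = np.zeros(T)
for t in range(n_lags, T):
    y[t] = (y[t - n_lags:t][::-1] @ true_kernel
            + 0.5 * task[t] + 0.2 * other[t, 0] + noise[t])

# Design matrix: task event + AR lags + coupling, fit by ridge regression
lagged = np.column_stack([y[n_lags - k:T - k] for k in range(1, n_lags + 1)])
X = np.column_stack([task[n_lags:], lagged, other[n_lags:]])
Y = y[n_lags:]
lam = 1e-3
beta = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)
ar_coefs = beta[1:1 + n_lags]    # auto-regressive coefficients vs. lag
```

Because the task regressor is fit jointly with the lags, the auto-regressive coefficients capture autocorrelation beyond what task events induce, which is the separation the model in the text relies on.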

Figure 4 with 1 supplement
A hierarchy of activity timescales during evidence accumulation.

(A) Top: Example widefield imaging field of view showing GCaMP6f fluorescence across the dorsal cortex. Bottom: Approximate correspondence between the field of view and regions of interest (ROIs) defined from the Allen Brain Atlas, ccv3. (B) Distribution of cross-validated accuracies across mice (n=6, sessions for each mouse are averaged) and ROIs (n=7, averaged across hemispheres). (C) Example of actual ∆F/F (gray) and model predictions (colored lines) for the first 5 s of the same held-out single trial, and four simultaneously imaged ROIs. Traces are convolved with a 1-SD Gaussian kernel for display only. (D) Auto-regressive model coefficients as a function of time lags for an example imaging session and four example ROIs. Gray, coefficient values. Colored lines, best fitting exponential decay functions. (E) Distribution of R2 values for the exponential fits across mice (n=6, sessions for each mouse are averaged) and ROIs (n=7, averaged across hemispheres). (F) Exponential decay functions for all seven cortical areas, fitted to the average across mice (n=6). (G) Median time constants extracted from the exponential decay fits, for each area. Error bars, interquartile range across mice (n=6). p-value is from a one-way ANOVA with repeated measures with ROIs as factors. Light gray circles indicate data from individual mice.

We first wondered whether different timescales would be reflected in model coefficients related to sensory evidence. Interestingly, however, we did not observe any significant differences across areas in the time course of coefficients for contralateral tower stimuli or cumulative sensory evidence (Figure 4—figure supplement 1). Thus, we next focused our analysis on the auto-regressive coefficients of the model. We observed that across animals the rate of decay of these coefficients over lags slowed systematically from visual to premotor areas, with intermediate values for M1, PPC, and RSC (Figure 4D). To quantify this, we fitted exponential decay functions to the auto-regressive coefficients averaged across hemispheres (Figure 4D–F) and extracted decay time constants (τ, Figure 4G). Compatible with our observations, τ differed significantly across cortical areas (F6,30 = 4.49, p=0.006, one-way ANOVA with repeated measures), being larger for frontal than posterior areas. Note that, while it is possible that these coefficients capture autocorrelations introduced by intrinsic GCaMP6f dynamics, there is no reason to believe that this affects our conclusions, as indicator dynamics should be similar across regions. Thus, during the evidence-accumulation task, cortical regions display increasing intrinsic timescales going from visual to frontal areas. This is consistent with previous reports for spontaneous activity and other behavioral tasks (Chaudhuri et al., 2015; Hasson et al., 2008; Murray et al., 2014; Runyan et al., 2017) and is compatible with our inactivation findings (Figures 2 and 3). Nevertheless, a caveat here is that the auto-regressive coefficients of the encoding model could conceivably be spuriously capturing variance attributable to other behavioral variables not included in the model. For example, our model parameterization implicitly assumes that evidence encoding would be linearly related to the side difference in the number of towers. 
Although this is a common assumption in evidence-accumulation models (e.g. Bogacz et al., 2006; Brunton et al., 2013), it might not apply to our case. At face value, however, our findings could suggest that the different intrinsic timescales across the cortex are important for evidence-accumulation computations.
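The timescale extraction itself, i.e., fitting an exponential decay to the auto-regressive coefficients as a function of lag and reading off τ, can be sketched with synthetic coefficients (the lag spacing and generative values below are invented, not the paper's data):

```python
import numpy as np
from scipy.optimize import curve_fit

def exp_decay(lag, a, tau):
    """Exponential decay of AR coefficients with lag; tau is the timescale."""
    return a * np.exp(-lag / tau)

# Synthetic AR coefficients: generative tau of 0.35 s plus a little noise
rng = np.random.default_rng(3)
lags = np.arange(1, 11) * 0.1        # lags in seconds (assuming ~10 Hz frames)
coefs = exp_decay(lags, 0.5, 0.35) + rng.normal(0, 0.01, lags.size)

# Least-squares fit recovers the decay time constant
(a_hat, tau_hat), _ = curve_fit(exp_decay, lags, coefs, p0=(0.5, 0.2))
print(round(tau_hat, 2))             # tau close to the generative 0.35 s
```

Repeating this per area and comparing the fitted τ values (as in Figure 4G) is what operationalizes "longer intrinsic timescales in frontal than posterior areas."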

Discussion

Taken together, our results suggest that distributed cortical areas contribute to sensory-evidence accrual on different timescales. Specifically, brief sub-trial inactivations during performance of a decision-making task requiring seconds-long evidence accumulation resulted in distinct deficits in the weighting of sensory evidence from different points in the stimulus stream. On average, the inactivation of frontal cortical areas reduced the weighting of evidence from further in the past relative to laser onset than did the inactivation of posterior regions (Figures 2 and 3). Compatible with this, using an encoding model of large-scale cortical dynamics, we found that activity timescales vary systematically across the cortex in a way that mirrors the inactivation results (Figure 4).

Our results add to a growing body of literature that has revealed that the cortex of rodents and primates appears to be organized in a hierarchy of temporal processing windows across regions (Chaudhuri et al., 2015; Gao et al., 2020; Hasson et al., 2008; Ito et al., 2020; Murray et al., 2014; Runyan et al., 2017; Spitmaan et al., 2020). Specifically, they suggest that the contributions of different cortical areas to decision-making computations are similarly arranged in a temporal hierarchy. A caveat here is that our inactivation findings did not exactly match the area ordering of integration windows derived from the widefield imaging data, nor were all the inactivation effects monotonic. This could be in part due to technical limitations of the experiments. First, the laser powers we used result in large inactivation spreads, potentially encompassing neighboring regions. Moreover, local inactivation could result in changes in the activity of interconnected regions (Young et al., 2000), a possibility that should be evaluated in future studies using simultaneous inactivation and large-scale recordings across the dorsal cortex. At face value, however, the findings could be a reflection of the fact that diverse timescales exist at the level of individual neurons within each region (Bernacchia et al., 2011; Cavanagh et al., 2020; Scott et al., 2017; Spitmaan et al., 2020; Wasmuht et al., 2018). For example, inactivating an area with multimodal distributions of intrinsic timescales across its neurons could conceivably result in non-monotonic effects of inactivation. In any case, our results point to accrual timescale hierarchies being a significant factor contributing to the widespread recruitment of cortical dynamics during evidence-based decisions, with areas potentially being progressively recruited as integration-timescale demands increase.
In the future, this hypothesis should be further probed using tasks that explicitly manipulate the timescales of these decision processes.

Our findings also suggest the possibility that the logic of widespread recruitment of cortical regions in complex, time-extended decisions may in part rely on intrinsic temporal integration properties of local cortical circuits, rather than specific evidence-accumulation mechanisms. For instance, it is possible that simple perceptual decisions primarily engage only the relevant sensory areas because they can be made on the fast intrinsic timescales displayed by these regions (Zatka-Haas et al., 2021). Along the same lines, it is conceivable that discrepancies in the literature regarding the effects of perturbing different cortical areas during evidence accumulation stem in part from differences in the timescales of the various tasks (Erlich et al., 2015; Fetsch et al., 2018; Hanks et al., 2015; Katz et al., 2016; Pinto et al., 2019).

An important remaining question is whether evidence from the different time windows is accumulated in parallel or as a feedforward computation going from areas with short to those with long integration time constants. The parallel scheme would be compatible with recent psychophysical findings in humans reporting confidence of their evidence-based decisions (Ganupuru et al., 2019). Conversely, a feedforward transformation would be in agreement with human fMRI findings during language processing (Yeshurun et al., 2017) and with a previously published model whereby successive (feedforward) convolution operations lead to progressively longer-lasting responses to sensory evidence (Scott et al., 2017). Interestingly, the oculomotor integrator of both fish and monkeys appears to be organized as largely feedforward chains of integration leading to systematically increasing time constants (Joshua and Lisberger, 2015; Miri et al., 2011), perhaps suggesting that this architecture is universal to neural integrators.

Finally, it is also unclear how and where evidence information from different timescales is combined to yield a final decision. Our simultaneous inactivation experiments (Figure 3) suggest that dorsal cortical activity is unevenly weighted, potentially by downstream structures. Candidate regions include the medial prefrontal cortex or subcortical structures such as the striatum and the cerebellum, which have been shown to be causally involved in evidence accumulation (Bolkan et al., 2022; Deverett et al., 2019; Yartsev et al., 2018). Other subcortical candidates are midbrain regions shown to have a high incidence of choice signals in a contrast discrimination task (Steinmetz et al., 2019).

Much work remains before obtaining a complete circuit understanding of gradually evolving decisions. Our findings highlight the fact that, much like in memory systems (Jeneson and Squire, 2012), the timescale of decision processes is an important feature governing their underlying neural mechanisms, a notion which should be incorporated into both experimental and theoretical accounts of decision making.

Materials and methods

Key resources table
Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information
Mouse line (Mus musculus) | B6.Cg-Tg(Slc32a1-COP4*H134R/EYFP)8Gfng/J | The Jackson Laboratory | JAX: 014548 | A.k.a. VGAT-ChR2-EYFP
Mouse line (Mus musculus) | IgS6tm93.1(tetO-GCaMP6f)Hze Tg(Camk2a-tTA)1Mmay/J | The Jackson Laboratory | JAX: 024108 | A.k.a. Ai93-D;CaMKIIα-tTA
Mouse line (Mus musculus) | B6.129S2-Emx1tm1(cre)Krj/J | The Jackson Laboratory | JAX: 005628 | A.k.a. Emx1-IRES-Cre
Software, algorithm | Matlab 2015b, 2016b, 2017b | Mathworks | https://www.mathworks.com/products/matlab.html
Software, algorithm | ViRMEn | Aronov and Tank, 2014 | https://pni.princeton.edu/pni-software-tools/virmen
Software, algorithm | NI DAQmx 9.5.1 | National Instruments | https://www.ni.com/en-us/support/downloads/drivers/download.ni-daqmx.html
Software, algorithm | HCImage | Hamamatsu | https://hcimage.com
Software, algorithm | Scanning laser control software | Pinto et al., 2019; Pinto, 2019 | https://github.com/BrainCOGS/laserGalvoControl
Software, algorithm | Widefield data analysis software | Pinto et al., 2019; Pinto, 2020 | https://github.com/BrainCOGS/widefieldImaging
Software, algorithm | Inactivation analysis | This paper; Pinto, 2022 | https://github.com/BrainCOGS/PintoEtAl2020_sub-trial_inact
Software, algorithm | Python 3.8 | Python | https://www.python.org
Software, algorithm | Numpy 1.18.1 | van der Walt et al., 2011 | https://www.numpy.org
Software, algorithm | Scipy 1.4.1 | Virtanen et al., 2020 | https://www.scipy.org
Software, algorithm | Deepdish 0.3.4 | University of Chicago; Larsson, 2022 | https://github.com/uchicago-cs/deepdish
Software, algorithm | Statsmodels 0.11.0 | Skipper, 2010 | https://www.statsmodels.org
Software, algorithm | Matplotlib 3.1.3 | Hunter, 2007 | https://www.matplotlib.org
Software, algorithm | Pandas 1.0.1 | McKinney, 2010 | https://www.pandas.pydata.org
Software, algorithm | Pingouin 0.3.8 | Vallat, 2018 | https://pingouin-stats.org
Software, algorithm | Mat7.3 | Simon Kern; Kern, 2022 | https://github.com/skjerns/mat7.3
Software, algorithm | Pymer4 | Jolly, 2018

Animals and surgery

Request a detailed protocol

All procedures were approved by the Institutional Animal Care and Use Committee at Princeton University (protocols 1910–15 and 1910–18) and were performed in accordance with the Guide for the Care and Use of Laboratory Animals (National Research Council, 2011). We used both male and female VGAT-ChR2-EYFP mice aged 2–16 months (B6.Cg-Tg[Slc32a1-COP4*H134R/EYFP]8Gfng/J, Jackson Laboratories, stock # 014548, n=28). Part of the inactivation data from some of these animals was collected in the context of previous work (Pinto et al., 2019), but the analyses reported here are completely novel. The mice underwent sterile surgery to implant a custom titanium headplate and optically clear their intact skulls, following a procedure described in detail elsewhere (Pinto et al., 2019). Briefly, after exposing the skull and removing the periosteum, successive layers of cyanoacrylate glue (Krazy Glue, Elmer's, Columbus, OH) and diluted clear Metabond (Parkell, Brentwood, NY) were applied evenly to the dorsal surface of the skull and polished after curing using a dental polishing kit (Pearson Dental, Sylmar, CA). The headplate was attached to the cleared skull using Metabond, and a layer of transparent nail polish (Electron Microscopy Sciences, Hatfield, PA) was applied and allowed to cure for 10–15 min. The procedure was done under isoflurane anesthesia (2.5% for induction, 1.5% for maintenance). The animals received two doses of meloxicam for analgesia (1 mg/kg I.P. or S.C.), given at the time of surgery and 24 hr later, as well as peri-operative I.P. injections of body-temperature saline to maintain hydration. Body temperature was kept constant using a homeothermic control system (Harvard Apparatus, Holliston, MA). The mice were allowed to recover for at least 5 days before starting behavioral training. After recovery they were restricted to 1–2 mL of water per day and extensively handled for another 5 days, or until they no longer showed signs of stress.
We started behavioral training after their weights were stable and they accepted handling. During training, the full allotted fluid volume was typically delivered within the behavioral session, but supplemented if necessary. The mice were weighed and monitored daily for signs of dehydration. If these were present or their body mass fell below 80% of the initial value, they received supplemental water until recovering. They were group housed throughout the experiment and had daily access to an enriched environment (Pinto et al., 2018). The animals were trained 5–7 days/week.

The analysis reported in Figure 4 (widefield Ca2+ imaging) is from data collected in the context of a previous study (Pinto et al., 2019), although the analysis is novel. The data was from six male and female mice from triple transgenic crosses expressing GCaMP6f under the CaMKIIα promoter from the following two lines: Ai93-D; CaMKIIα-tTA [IgS6tm93.1(tetO-GCaMP6f)Hze Tg[Camk2a-tTA]1Mmay/J, Jackson Laboratories, stock # 024108] and Emx1-IRES-Cre [B6.129S2-Emx1tm1(cre)Krj/J, Jackson Laboratories, stock # 005628]. These animals also underwent the surgical procedure described above.

Virtual reality apparatus

Request a detailed protocol

The mice were trained in a VR environment (Figure 1A) described in detail elsewhere (Pinto et al., 2018). Briefly, they sat on an 8-inch hollow Styrofoam ball that was suspended by compressed air at ~60 p.s.i., after passing through a laminar flow nozzle to reduce noise (600.326.5 K.BC, Lechler, St. Charles, IL). They were head-fixed such that their snouts were aligned to the ball equator and at a height such that they could run comfortably without hunching, while still being able to touch the ball with their full paw pads (corresponding to a headplate-to-ball height of ~1 inch for a 25 g animal). Ball movements were measured using optical flow sensors (ADNS-3080 APM2.6) and transformed into virtual world displacements using custom code running on an Arduino Due (https://github.com/sakoay/AccumTowersTools/tree/master/OpticalSensorPackage, Koay, 2018). The ball sat on a custom 3D-printed cup that contained both the air outlet and the movement sensor. The VR environment was projected onto a custom-built toroidal Styrofoam screen using a DLP projector (Optoma HD141X, Fremont, CA) at a refresh rate of 120 Hz and a pixel resolution of 1024 × 768. The screen spanned ~270° of azimuth and ~80° of elevation in the mouse's visual field. The whole setup was enclosed in a custom-built sound-attenuating chamber. The VR environment was programmed and controlled using ViRMEn (Aronov and Tank, 2014) (https://pni.princeton.edu/pni-software-tools/virmen), running on Matlab (Mathworks, Natick, MA) on a PC.

Behavioral task

Request a detailed protocol

We trained the mice in the accumulating towers task (Pinto et al., 2018). The mice ran down a virtual T-maze that was 3.3 m in length (y), 5 cm in height, and a nominal 10 cm in width (x, though they were restricted to the central 1 cm). The length of the maze consisted of a 30 cm start region to which they were teleported at the start of each trial, followed by a 200 cm cue region and a 100 cm delay region. The cue and the delay region had the same wallpaper designed to provide optical flow. During the cue region, the mice encountered tall white objects (2 × 6 cm, width × height), or towers, that appeared at random locations in each trial at a Poisson rate of 7.7 m–1 and 2.3 m–1 on the rewarded and non-rewarded sides, respectively (or 8.0 and 1.6 m–1 in some sessions), with a 12 cm refractory period and an overall density of 5 m–1. The towers appeared when the mice were 10 cm away from their drawn locations and disappeared 200 ms later (roughly corresponding to the time over which the tower sweeps across the visual field given average running speeds). After the maze stem, the mice turned into one of the two arms (10.5 × 11 × 5 cm, length × width × height), and received a reward if they turned to the arm corresponding to the highest tower count (4–8 µL of 10% v/v sweet condensed milk). This was followed by a 3 s inter-trial interval, consisting of 1 s of a frozen frame of the VR environment and 2 s of a black screen. An erroneous turn resulted in a loud sound and a 12 s timeout.
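The tower statistics described above (Poisson placement at side-specific rates with a 12 cm refractory period) can be sketched as follows. This is an illustrative Python reimplementation, not the actual ViRMEn/Matlab task code; the sequential exponential-interval draw is one simple way to honor the refractory constraint, and function names are hypothetical.

```python
import numpy as np

def draw_tower_positions(rate_per_m, cue_length_cm=200.0, refractory_cm=12.0,
                         start_cm=10.0, rng=None):
    """Draw tower y-positions (cm) at a Poisson rate with a refractory period.
    Illustrative sketch only, not the task's actual drawing algorithm."""
    if rng is None:
        rng = np.random.default_rng()
    rate_per_cm = rate_per_m / 100.0
    positions, y = [], start_cm
    while True:
        y += rng.exponential(1.0 / rate_per_cm)        # exponential waiting distance
        if positions:
            y = max(y, positions[-1] + refractory_cm)  # enforce the 12 cm refractory gap
        if y > cue_length_cm:
            break
        positions.append(y)
    return np.array(positions)

# rewarded vs. non-rewarded side rates from the text (7.7 and 2.3 towers/m)
rng = np.random.default_rng(0)
rewarded = draw_tower_positions(7.7, rng=rng)
non_rewarded = draw_tower_positions(2.3, rng=rng)
```

On each trial, the two sides would be drawn independently at their respective rates, with the correct side defined by the larger final count.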

Each daily behavioral session (~1 hr, ~200–250 trials) started with warm-up trials of a visually guided task in the same maze, in which towers appeared only on the rewarded side and additionally a 30 cm tall visual guide visible from the start of the trial was placed in the arm corresponding to the reward location. The animals progressed to the main task when they achieved at least 85% correct trials over a running window of 10 trials in the warm-up task. During the accumulating-towers task, performance was evaluated over a 40-trial running window, both to assess side biases and correct them using an algorithm described elsewhere (Pinto et al., 2018) and to trigger a transition into a 10-trial block of easy trials if performance fell below 55% correct. These blocks consisted of towers only on the rewarded side and were introduced to increase motivation but were not included in the analyses. No optogenetic inactivation was performed during either warm-up or easy-block trials. In the widefield imaging experiments, the behavioral sessions contained several visually guided (warm-up) blocks (Pinto et al., 2019). These were excluded from the present analyses.

Laser-scanning optogenetic inactivation

Request a detailed protocol

We used a scanning laser setup described in detail elsewhere (Pinto et al., 2019). Briefly, a 473 nm laser beam (OBIS, Coherent, Santa Clara, CA) was directed to 2D galvanometers through a 125-µm single-mode optical fiber (Thorlabs, Newton, NJ) and reached the cortical surface after passing through an f-theta scanning lens (LINOS, Waltham, MA). We used a 40 Hz square wave with an 80% duty cycle and a power of 6 mW measured at the level of the skull. This corresponds to an inactivation spread of ~2 mm (Pinto et al., 2019). While this may introduce confounds regarding ascribing exact functions to specific cortical areas, we have previously shown that the effects of whole-trial inactivations at much lower powers (corresponding to smaller spatial spreads) are consistent with those obtained at 6 mW. To minimize post-inactivation rebounds, the last 100 ms of the laser pulse consisted of a linear ramp-down of power (Guo et al., 2014; Pinto et al., 2019). We performed inactivations during the following trial epochs: first, second, or third quarter of the cue region (0–50 cm, 50–100 cm, or 100–150 cm, respectively), first or second half of the cue region (0–100 cm or 100–200 cm, respectively), or the delay region (200–300 cm). Thus, the epochs were defined according to the animals' y position in the maze. Because of this, the onset time of the power ramp-down was calculated in each trial based on the current speed and the expected time at which the mouse would reach the laser offset location. The system was controlled using custom-written code in Matlab running on a PC, which sent analog command voltages to the laser and galvanometers through NI DAQ cards. This PC received instructions for laser onset, offset, and galvanometer position from the ViRMEn PC through digital lines.
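As an illustration only, the command waveform described above (40 Hz square wave, 80% duty cycle, 6 mW, with a linear 100 ms ramp-down at the end) could be generated as follows; the sampling rate and the exact shape of the ramp are assumptions, not details taken from the actual Matlab control code.

```python
import numpy as np

def laser_command(duration_s, power_mw=6.0, freq_hz=40.0, duty=0.8,
                  ramp_s=0.1, fs=10000):
    """Illustrative laser command: a square wave at freq_hz with the given
    duty cycle, whose amplitude ramps linearly to zero over the final
    ramp_s seconds. The sampling rate fs is an assumption."""
    t = np.arange(int(duration_s * fs)) / fs
    square = ((t * freq_hz) % 1.0 < duty).astype(float)       # 0/1 square wave
    envelope = np.ones_like(t)
    ramping = t >= duration_s - ramp_s
    envelope[ramping] = (duration_s - t[ramping]) / ramp_s    # linear ramp-down
    return power_mw * square * envelope
```

In the actual experiments the ramp onset was computed on the fly from running speed, since epochs were defined in maze position rather than time.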

We targeted a total of nine area combinations, consisting of either homotopic bilateral pairs or multiple bilateral locations. The galvanometers alternated between locations at 200 Hz (20 mm travel time: ~250 µs), and in the case of more than two locations, the sequence of visited locations was chosen to minimize travel distance. The inactivated locations were defined based on stereotaxic coordinates using bregma as reference, as follows:

  • Primary visual cortex (V1): –3.5 AP, 3 ML

  • Medial secondary visual cortex (mV2, ~area AM): –2.5 AP, 2.5 ML

  • Posterior parietal cortex (PPC): –2 AP, 1.75 ML

  • Retrosplenial cortex (RSC): –2.5 AP, 0.5 ML

  • Posteromedial portion of the premotor cortex (mM2): 0.0 AP, 0.5 ML

  • Anterior portion of the premotor cortex (aM2): +3 AP, 1 ML

  • Primary motor cortex (M1): +1 AP, 2 ML

  • Posterior cortex: V1, mV2, PPC, and RSC

  • Frontal cortex: mM2, aM2, and M1

To ensure consistency in bregma location across behavioral sessions, the experimenter set bregma on a reference image and for each session the current image of the mouse’s skull was registered to this reference using rigid transformations. Different sessions contained different combinations of areas and inactivation epochs, resulting in partially overlapping mice and sessions for each condition. The probability of inactivation trials, therefore, varied across sessions, ranging from a total of 0.15–0.35 across conditions, and from 0.02 to 0.15 per condition. In our experience, capping the probability at ~0.35 is important to maintain motivation throughout the behavioral session.

Widefield Ca2+ imaging

Request a detailed protocol

Details on the experimental setup and data preprocessing can be found elsewhere (Pinto et al., 2019). Briefly, we used a tandem-lens macroscope (1×–0.63× planapo, Leica M series, Wetzlar, Germany) with alternating 410 nm and 470 nm LED epifluorescence illumination for isosbestic hemodynamic correction and collected 525 nm emission at 20 Hz, using an sCMOS (OrcaFlash4.0, Hamamatsu, Hamamatsu City, Japan), with an image size of 512 × 512 pixels (pixel size of ~17 µm). Images were acquired with HCImage (Hamamatsu) running on a PC and synchronized to the behavior using a data acquisition-triggering TTL pulse from another PC running ViRMEn, which in turn received analog frame exposure voltage traces acquired through a DAQ card (National Instruments, Austin, TX) and saved in the behavioral log file. The image stacks were motion-corrected by applying the x-y shift that maximized the correlation between successive frames, and then were spatially binned to a 128 × 128 pixel image (~68 × 68 µm). The fluorescence values from pixels belonging to different anatomical ROIs were averaged into a single trace, separately for 410 nm (Fv) and 470 nm excitation (Fb). After applying a heuristic correction to Fv (Pinto et al., 2019), we calculated fractional fluorescence changes as R=F/F0, where F0 for each excitation wavelength was calculated as the mode of all F values over a 30 s sliding window with single-frame steps. The final ∆F/F was calculated using a divisive correction, ∆F/F=Rb/Rv – 1. ROIs were defined based on the Allen Brain Mouse Atlas (ccv3). We first performed retinotopic mapping to define visual areas and used the obtained maps to find, for each mouse, the optimal affine transformation to the Allen framework.
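A simplified sketch of the fluorescence correction described above (R = F/F0 per excitation wavelength, followed by the divisive correction ∆F/F = Rb/Rv – 1) might look like the following; the histogram-based sliding-window mode and the handling of window edges are simplifications of the published procedure, and the function name is hypothetical.

```python
import numpy as np

def dff_dual_wavelength(f470, f410, fs=20.0, window_s=30.0):
    """Sketch of the divisive hemodynamic correction described in the text:
    R = F/F0 per excitation wavelength (F0 = mode of F over a sliding window),
    then dF/F = R_470 / R_410 - 1. Simplified, not the published code."""
    half = int(window_s * fs / 2)

    def sliding_mode(f):
        f0 = np.empty_like(f)
        for i in range(len(f)):
            win = f[max(0, i - half): i + half + 1]
            counts, edges = np.histogram(win, bins=50)  # coarse mode estimate
            k = np.argmax(counts)
            f0[i] = 0.5 * (edges[k] + edges[k + 1])
        return f0

    r_b = f470 / sliding_mode(f470)   # 470 nm: Ca2+-dependent excitation
    r_v = f410 / sliding_mode(f410)   # 410 nm: isosbestic excitation
    return r_b / r_v - 1.0
```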

Data analysis

Request a detailed protocol

All analyses of the behavioral effects of cortical inactivations were performed in Python 3.8. Linear encoding model fitting of widefield data was performed in Matlab, and the results were analyzed in Python.

Behavioral data selection

Request a detailed protocol

Because of the warm-up and easy-block trials, the sessions are naturally organized into a block structure, such that each block of the accumulating-towers task lasts at least 40 trials (see above). We selected all trials from blocks in which the control (laser off) performance was at least 60% correct, collapsed over all levels of sensory evidence. After block selection, we excluded trials in which the animals failed to reach the end of the maze, or in which the total traveled distance exceeded the nominal maze length by more than 10% (Pinto et al., 2018; Pinto et al., 2019). These selection criteria yielded a total of 929 optogenetic inactivation sessions from 28 mice (average ~33/mouse), corresponding to 108,940 control (laser off) trials and 29,825 inactivation trials (average ~552/condition, see Figure 1—source data 1). Twenty-five sessions from six mice were selected for widefield imaging data analysis.

Analysis of behavioral data

Overall performance

Request a detailed protocol

We calculated overall performance as the percentage of trials in which the mice turned to the side with the highest tower counts, separately for control and inactivation trials.

Running speed

Request a detailed protocol

Speed was calculated for each inactivation segment using the total x-y displacement. We compared laser-induced changes in speed to control trials from the same maze segment.

Psychometric curves

Request a detailed protocol

We computed psychometric curves separately for control and inactivation trials by plotting the percentage of right-choice trials as a function of the difference in the number of right and left towers (#R – #L, or ∆). ∆ was binned in increments of five between –15 and 15, and its value defined as the average ∆ weighted by the number of trials. We fitted the psychometric curves using a four-parameter sigmoid:

pR = b + a / (1 + exp(−(Δ − Δ0)/λ))
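A minimal sketch of this four-parameter sigmoid fit, using scipy's curve_fit on synthetic choice data (not the published dataset; the fitting routine actually used is not specified beyond the functional form):

```python
import numpy as np
from scipy.optimize import curve_fit

def psychometric(delta, b, a, delta0, lam):
    """Four-parameter sigmoid: p(right) = b + a / (1 + exp(-(delta - delta0)/lam))."""
    return b + a / (1.0 + np.exp(-(delta - delta0) / lam))

# synthetic example: evidence bins from -15 to 15 in steps of 5
delta = np.arange(-15.0, 16.0, 5.0)
p_right = psychometric(delta, 0.05, 0.9, 0.0, 3.0)   # noiseless "data"
params, _ = curve_fit(psychometric, delta, p_right, p0=[0.0, 1.0, 0.0, 2.0])
```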

Evidence-weighting curves in space

Request a detailed protocol

To assess how mice weighted sensory evidence from different segments of the cue region, we performed a logistic regression analysis in which the probability of a right choice was predicted from a logistic function of the weighted sum of the net amount of sensory evidence from each of four equally spaced segments (10–200 cm, since no towers can occur before y=10):

pR = 1 / (1 + exp(−(β0 + Σ_{i=1..4} βi Δi)))

where ∆ = # right – # left towers calculated separately for each segment. These weighting functions were calculated separately for ‘laser on’ and ‘laser off’ trials. To quantify the laser-induced changes in evidence weighting, we simply subtracted the ‘laser on’ from the ‘laser off’ curves, such that negative values indicate smaller evidence weights in the ‘laser on’ condition. Bin sizes were chosen to match the resolution of our inactivation epochs.
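A minimal sketch of this per-segment logistic regression on synthetic data; plain gradient ascent is used here for self-containment, and the solver and any regularization used in the actual analysis are not specified in the text.

```python
import numpy as np

def fit_evidence_weights(delta_segments, choices, lr=0.1, n_iter=5000):
    """Logistic-regression sketch of the evidence-weighting analysis:
    choices (1 = right) predicted from the net evidence (#R - #L) in each
    of four cue-region segments. Plain gradient ascent, illustrative only."""
    X = np.column_stack([np.ones(len(choices)), delta_segments])  # prepend bias
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w += lr * X.T @ (choices - p) / len(choices)
    return w  # w[0] = bias, w[1:] = per-segment evidence weights

# synthetic example: early segments weighted more strongly than late ones
rng = np.random.default_rng(0)
delta = rng.integers(-3, 4, size=(2000, 4)).astype(float)
true_w = np.array([0.0, 1.0, 0.8, 0.5, 0.2])
p = 1.0 / (1.0 + np.exp(-(true_w[0] + delta @ true_w[1:])))
choices = (rng.random(2000) < p).astype(float)
w_hat = fit_evidence_weights(delta, choices)
```

The laser-induced change in weighting would then be the difference between weights fitted separately to 'laser on' and 'laser off' trials.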

Evidence-weighting curves in time

Request a detailed protocol

To directly quantify how our manipulations impacted the weighting of sensory evidence in time relative to inactivation onset and offset, while accounting for variability introduced by different mice and epochs in the trial, we fitted a mixed-effects logistic regression model to the mice’s choices using the Pymer4 package for Python (Jolly, 2018):

(1) pR = 1 / (1 + exp(−(β0C + EC + β0L + EL + R)))

where pR is the probability of making a right-side choice, β0C and β0L are fixed-effects bias terms for control and laser trials, respectively, R denotes random effects (see below), and EX is the sensory-evidence fixed-effects term for control (X=C) and laser-on (X=L) trials:

(2) EX = Σ_{i=1..3} βiPre ΔiPre + βDuring ΔDuring + Σ_{i=1..2} βiPost ΔiPost

where βiY are the evidence weights and ΔiY = # right – # left towers calculated separately for each 0.5 s time bin i, aligned by the laser onset or offset depending on superscript Y. Y=Pre captures evidence occurring before inactivation; thus, evidence time is aligned to laser onset and binned at increasingly negative time values. Y=During captures evidence occurring while the laser is on; thus, evidence time is aligned to laser offset and binned at negative time values. Finally, Y=Post captures evidence occurring after the laser is off; thus, evidence time is aligned to laser offset and binned at positive time values. For control trials, we used dummy laser onset times defined as the time in the control trial corresponding to when the animal crossed the y position value where the laser was on in the nearest inactivation trial. ΔiY values were z-scored. The normalized coefficients in Figures 2 and 3, and Figure 2—figure supplement 2 were calculated as,

(EL − EC) / EC

We included two classes of random effects, mouse identity, and inactivation epoch. (Here, ‘epoch’ is not defined relative to the laser timing but is relative to the start of the maze, as in the schematic of Figure 1D.) Mouse identity had both a single random intercept (i.e. side bias) and a random slope for each model coefficient, whereas epoch only had a random intercept corresponding to side bias.

(3) R = Σ_{e=1..6} β0,e + Σ_{m=1..n} β0,mC + Σ_{m=1..n} β0,mL + Σ_{m=1..n} EmC + Σ_{m=1..n} EmL

where e is the index over inactivation epochs for a given area, m is the index over mice, and n is the number of mice for a given area. β0,e are the bias (random intercept) terms for the eth epoch, β0,mX are the bias (random intercept) terms for the mth mouse in either control (C) or laser (L) trials, and EmX are the per-mouse evidence random effects (slopes), with each term as described above in equation (2). R is modeled as a multivariate Gaussian distribution with mean = 0. In other words, the per-mouse random effects are modeled as a noise sample from this zero-mean Gaussian, with the fixed effects of equations (1) and (2) capturing tendencies that are common across the mice. The model is fitted using maximum likelihood estimation as described in detail elsewhere (Bates et al., 2015).

All models were fitted using 10-fold cross-validation. We analyzed the best of the 10 models for each condition, defined as the one for which we obtained the highest choice prediction accuracy in the 10% of trials not used to fit the data. For the models in Figure 2, we also computed coefficients for shuffled data, where we randomized the laser-on labels 30 times while keeping the mouse and condition labels constant, such that we maintained the underlying statistics for these sources of variability. This allowed us to estimate the empirical null distributions for the laser-induced changes in evidence weighting terms.
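The label shuffle described above (randomizing laser-on labels while keeping mouse and condition labels constant) amounts to permuting laser labels within each mouse-condition group, for example:

```python
import numpy as np

def shuffled_laser_labels(laser_on, group_id, rng):
    """Permute laser-on labels within each group (e.g., one group per
    mouse-condition combination), preserving per-group label counts.
    A sketch of the shuffle control described in the text."""
    shuffled = laser_on.copy()
    for g in np.unique(group_id):
        idx = np.where(group_id == g)[0]
        shuffled[idx] = rng.permutation(laser_on[idx])
    return shuffled
```

Refitting the model on such shuffled labels (here, 30 times) yields the empirical null distribution for the laser-induced changes in evidence weighting.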

Statistics of inactivation effects

Request a detailed protocol

Error estimates and statistics for general performance, running speed, and logistic regression weights in space were generated by bootstrapping 10,000 times, where in each iteration we sampled trials with replacement. p-values for regression weights and general performance were calculated as the fraction of bootstrapping iterations in which the control-subtracted inactivation value was above zero. In other words, we performed a one-sided test of the hypothesis that inactivation decreases performance and evidence weights. For speed, we performed a two-sided test by computing the proportion of iterations where the sign of the laser-induced change in speed differed from that of the empirical data average. The significance of the coefficients in the mixed-effects model of evidence in time was calculated using a t-test based on the coefficient estimate and its standard error. Additionally, for the models in Figure 2, we only considered coefficients to be significant if their standard error did not overlap the ±1 SD intervals of the coefficients extracted from the shuffled models. To compare two coefficients from different models (Figure 3—figure supplement 1), we used a z-test, calculating the z statistic as follows (Clogg et al., 1995):

z = |β1 − β2| / √(SEMβ1² + SEMβ2²)
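In code, this coefficient comparison (Clogg et al., 1995) reduces to a few lines; the conversion of z to a two-sided p-value via the normal distribution is an assumption for illustration:

```python
import numpy as np
from scipy import stats

def coef_z_test(b1, se1, b2, se2):
    """z-test for the difference between two regression coefficients
    (Clogg et al., 1995): z = |b1 - b2| / sqrt(se1^2 + se2^2)."""
    z = abs(b1 - b2) / np.sqrt(se1 ** 2 + se2 ** 2)
    p = 2.0 * (1.0 - stats.norm.cdf(z))  # two-sided normal p-value (assumption)
    return z, p
```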

To estimate statistical power, we performed a bootstrapping-based power analysis based on the one described by Guo et al., 2014. We randomly subsampled the full dataset containing all inactivation conditions. In each subsample, we selected different numbers of inactivation trials regardless of area-epoch combination (50<n<1000, in steps of 25) and added a random subset of control trials such that the relative proportion of control to laser trials was preserved in the subsampled dataset. We then ran the bootstrapping procedure described above to compute laser-induced changes in overall performance combined across all inactivation conditions, extracting p-values for each of the values of n subsamples. We repeated this procedure 10 times. Power was defined as the minimum number of trials required to observe p<0.05 at the empirical effect size pooled across conditions, as defined by the first n where the 2× SEM across the 10 repeats is below 0.05. We obtained an aggregate power of n=250.
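The core bootstrap loop can be sketched as follows; this illustrates the one-sided test on overall performance only, with hypothetical input arrays of per-trial outcomes (1 = correct):

```python
import numpy as np

def bootstrap_effect_pvalue(ctrl_correct, laser_correct, n_boot=10000, rng=None):
    """Sketch of the bootstrap test described in the text: resample trials
    with replacement, recompute the laser-minus-control accuracy difference,
    and take p as the fraction of iterations with a difference >= 0
    (one-sided test that inactivation decreases performance)."""
    if rng is None:
        rng = np.random.default_rng()
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        c = rng.choice(ctrl_correct, size=len(ctrl_correct), replace=True)
        l = rng.choice(laser_correct, size=len(laser_correct), replace=True)
        diffs[i] = l.mean() - c.mean()
    return float(np.mean(diffs >= 0.0))
```

The power analysis then repeats this computation on subsamples of increasing size to find the smallest n at which p < 0.05 is reliably observed.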

Linear encoding model of widefield data

Request a detailed protocol

We fitted Ca2+ activity averaged over each anatomically defined ROI with a linear model (Pinto et al., 2019; Pinto and Dan, 2015; Scott et al., 2017). For each trial and y position in the maze, we extracted ∆F/F (with native 10 Hz sampling frequency) limited to 0≤y≤ 300 cm (i.e. trial start, outcome, and inter-trial periods were not included). Activity was then z-scored across all trials. ∆F/F of each area was modeled as a linear combination of different predictors at different time lags. In addition to the previously used task-event predictors (Pinto et al., 2019), we added coupling terms, i.e., the zero-lag activity of the other simultaneously imaged ROIs (Pillow et al., 2008; Runyan et al., 2017), as well as auto-regressive terms to capture activity auto-correlations that were independent of task events (Spitmaan et al., 2020). Finally, we added a term to penalize the L2 norm of the coefficients, i.e., we performed ridge regression. The full model was thus defined as:

ΔF/F(t) = β0 + A + C + T + λ||B||

where β0 is an offset term, λ is the penalty term, and ||B|| is the L2 norm of the weight vector. Additionally, A, C, and T are the auto-regressive, coupling, and task terms, respectively:

A = Σ_{i=0.1..2} βi^autoregr ΔF/F(t−i)
C = Σ_{j=1..15} βj^coupling ΔF/Fj(t)
T = Σ_{i=0..2} βi^tR E^tR_{t−i} + Σ_{i=0..2} βi^tL E^tL_{t−i} + Σ_{i=0..2} βi^Δ E^Δ_{t−i} + Σ_{i=−0.3..0.3} βi^θ E^θ_{t−i} + Σ_{i=−0.3..0.3} βi^(dθ/dt) E^(dθ/dt)_{t−i} + Σ_{i=−0.3..0.3} βi^sp E^sp_{t−i} + βy y + βch ch + βpch pch + βprw prw

In the above equations, βi^x is the encoding weight for predictor x at time lag i (in steps of 0.1 s), where x is either a task event or the activity of the ROI at a previous time point, and βj^coupling is the weight for the zero-lag activity of simultaneously imaged ROI j (we had a total of 16 ROIs across the 2 hemispheres). In the task term, E^x_{t−i} is a delta function indicating the occurrence of event x at time t−i. Specifically, tR indicates the occurrence of a right tower, tL of a left tower, Δ = cumulative #R – #L towers, θ is the view angle, dθ/dt is the virtual view angle velocity, sp is the running speed, y is the spatial position in the maze stem (no lags), and ch, pch, and prw are constant offsets for a given trial, indicating upcoming choice, previous choice (+1 for right and –1 for left) and previous reward (1 for reward and –1 otherwise), respectively.
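A stripped-down sketch of the model's core machinery, namely a lagged design matrix for one event stream and a closed-form ridge solve; the full model's many predictors and its Matlab implementation are not reproduced, and the synthetic example below is illustrative only.

```python
import numpy as np

def build_lagged_design(events, n_lags):
    """Design-matrix columns for one event stream at lags 0..n_lags-1 bins."""
    T = len(events)
    X = np.zeros((T, n_lags))
    for i in range(n_lags):
        X[i:, i] = events[:T - i]  # column i holds the event shifted by i bins
    return X

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: w = (X'X + lam*I)^-1 X'y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# synthetic check: recover a known 4-lag kernel from a sparse event stream
rng = np.random.default_rng(0)
events = (rng.random(3000) < 0.05).astype(float)
kernel = np.array([1.0, 0.6, 0.3, 0.1])
dff = np.convolve(events, kernel)[:3000] + 0.01 * rng.standard_normal(3000)
w = ridge_fit(build_lagged_design(events, 4), dff, lam=1.0)
```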

Cross-validation

Request a detailed protocol

The model was fitted using threefold cross-validation. For each of 20 values of the penalty term λ, we trained the model using two-thirds of the trials (both correct and wrong choices) and tested it on the remaining one-third of trials. We picked the value of λ that maximized accuracy and used median accuracy and weight values across all 10 × 3 runs for that λ. Model accuracy was defined as the linear correlation coefficient between actual ∆F/F and that predicted by the model in the test set.

Model comparison

Request a detailed protocol

We tested three versions of the encoding model: one with just the task term T, another adding the auto-regressive term A, and a third with the coupling term C in addition to A and T. All versions were fitted using exactly the same cross-validation data partitioning to allow direct comparison. We averaged cross-validated predictions over hemispheres and sessions for each mouse, performing the comparison on mouse-level data. Statistical significance of the differences in accuracy between models was computed using a two-way repeated-measures ANOVA with factors ROI and model type, and individual model comparisons were made using Tukey's post hoc test. The coefficient analysis in Figure 4 is from the full model, which had the highest performance.

Quantification of timescales from the model coefficients

Request a detailed protocol

To quantify the timescales from the fitted auto-regressive coefficients, for each behavioral session we fitted an exponential decay function to the coefficients between 0.1 and 2 s in the past, normalized to the coefficient at 0.1 s (first bin):

B + A exp(−x/τ)

where B is the offset term, A controls the amplitude of the curve, x is the time lag of the normalized coefficients, and τ is the decay time constant. Fits were performed using the non-linear least squares algorithm. The extracted time constants (τ) were first averaged over hemispheres and sessions for each mouse, and statistics were performed on mouse averages. The significance of the differences in time constants across regions was assessed using a one-way repeated-measures ANOVA with cortical region as the factor.
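A minimal sketch of this exponential fit with scipy's non-linear least squares (curve_fit), on synthetic coefficients; the initial-guess values are assumptions:

```python
import numpy as np
from scipy.optimize import curve_fit

def decay(x, B, A, tau):
    """Exponential decay fitted to the normalized auto-regressive
    coefficients: B + A * exp(-x / tau), with x the time lag in seconds."""
    return B + A * np.exp(-x / tau)

# synthetic coefficients at lags 0.1-2.0 s, normalized to the first bin
lags = np.arange(0.1, 2.05, 0.1)
coefs = decay(lags, 0.05, 1.0, 0.4)
(B_hat, A_hat, tau_hat), _ = curve_fit(decay, lags, coefs, p0=[0.0, 1.0, 0.5])
```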

False discovery rate correction

Request a detailed protocol

We corrected for multiple comparisons using a previously described method for false discovery rate correction (Benjamini and Hochberg, 1995; Guo et al., 2014; Pinto et al., 2019). Briefly, p-values were ranked in ascending order, and the ith ranked p-value, Pi, was deemed significant if it satisfied Pi ≤ (αi)/n, where n is the number of comparisons and α is the significance level. In our case, α=0.050 and 0.025 for one-sided and two-sided tests, respectively.
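The procedure can be written compactly as follows; this sketch implements the standard step-up variant, in which all p-values up to the largest rank satisfying Pi ≤ (αi)/n are deemed significant:

```python
import numpy as np

def fdr_bh(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure: rank p-values ascending and
    reject all hypotheses up to the largest rank i with p_(i) <= alpha*i/n."""
    p = np.asarray(pvals, dtype=float)
    n = len(p)
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, n + 1) / n
    below = p[order] <= thresholds
    significant = np.zeros(n, dtype=bool)
    if below.any():
        k = np.max(np.where(below)[0])        # largest rank passing the threshold
        significant[order[:k + 1]] = True     # all smaller p-values also pass
    return significant
```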

Data and code availability

Request a detailed protocol

Data analysis code and source code for figures are available at https://github.com/BrainCOGS/PintoEtAl2020_sub-trial_inact (copy archived at swh:1:rev:de1261fff8f39a8aa14cde34da032384fe3b9144, Pinto, 2022). Behavioral data from inactivation experiments are publicly available on figshare.com, doi: 10.6084/m9.figshare.19543948.

Data availability

Data analysis code and source code for figures are available at https://github.com/BrainCOGS/PintoEtAl2020_subtrial_inact (copy archived at swh:1:rev:de1261fff8f39a8aa14cde34da032384fe3b9144). Each figure has associated source data containing the numerical data used to generate it. Behavioral data from all inactivation experiments are publicly available on figshare.com, https://doi.org/10.6084/m9.figshare.19543948.

The following data sets were generated
    1. Pinto L
    2. Tank DW
    3. Brody CD
    (2022) figshare
    Behavioral data in the accumulating-towers task with optogenetic inactivation of 9 sets of cortical regions during six different trial epochs.
    https://doi.org/10.6084/m9.figshare.19543948

References

  1. Conference
    1. Seabold S
    (2010)
    Statsmodels: Econometric and Statistical Modeling with Python
    In: Proceedings of the 9th Python in Science Conference.

Decision letter

  1. Naoshige Uchida
    Reviewing Editor; Harvard University, United States
  2. Michael J Frank
    Senior Editor; Brown University, United States
  3. Gidon Felsen
    Reviewer; University of Colorado School of Medicine, United States

Our editorial process produces two outputs: i) public reviews designed to be posted alongside the preprint for the benefit of readers; ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "Multiple timescales of sensory-evidence accumulation across the dorsal cortex" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Michael Frank as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Gidon Felsen (Reviewer #3).

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this letter to help you prepare a revised submission.

Essential revisions:

Previous studies have indicated that neurons in different cortical areas have different intrinsic timescales. In this study, Pinto and colleagues aimed at establishing the functional significance of intrinsic timescales across cortical regions by performing optogenetic silencing of cortical areas in an evidence accumulation task in mice. The authors observed that optogenetic silencing reduced the weight of sensory evidence primarily during silencing, but also in preceding time windows in some cases, suggesting that inactivation of frontal cortical regions had longer-lasting effects than that of posterior cortical regions. This study provides important results addressing the relation between cortical functions and intrinsic timescales.

The reviewers agreed that this study addresses an important question, and that the authors performed sophisticated experiments and collected a large amount of data. The results are presented clearly and the manuscript is well-written. All the reviewers thought that the results are potentially of great interest to a wide audience. However, the reviewers found several substantive issues which reduced their confidence in the authors' conclusions. These issues need to be addressed before publication of this study in eLife.

In particular, the following points have been identified as essential issues:

1. The presented analysis does not consider the large variability that exists at the level of individual animals. There is also some variability across conditions (e.g. photoinhibition of different epochs). Furthermore, the statistical analyses presented in the manuscript often rely on a small number of samples, and the sample size is not equal across the conditions (n = 6, 4, 3 for y = 0, 50, 100, respectively). Because of these issues, we felt that the main conclusion needs to be supported by further analysis investigating these variabilities, and careful discussion of these potential caveats.

2. The authors claim that the optogenetic silencing primarily affected the evidence-accumulation computation, but not other decision-related processes. The reviewers found this claim to be not strongly supported by the data. From the presented data, whether silencing specifically affected the evidence-accumulation process, rather than just the passing of evidence to an accumulation process, remains unclear. Furthermore, silencing affects running speed (thus indicating effects beyond the accumulation process). Also, the reviewers thought that alternative possibilities have not been fully examined.

3. Optogenetic silencing sometimes increased the running speed. This can potentially reduce the time spent at each location, and may affect the acquisition of sensory information. It is important to show that the reduced regression weight is not a side effect of reduced time spent at each location. Furthermore, some analysis based on time, not just location, would be very helpful.

More detailed comments and suggestions on the above issues are included in the individual reviewers' comments.

Reviewer #2 (Recommendations for the authors):

1) Overall, the inactivation effect is highly variable across brain regions and conditions. For example, in Figure 1-Supp 2, silencing mV2 and RSC during the 3rd quarter of the cue region reduces weighting 100 cm back, but the effect is not replicated when silencing is extended in time (2nd half of the cue region). The effect is yet different when silencing the posterior cortical regions, which cover mV2 and RSC. There are many cases like this. What is this variability due to? Is this degree of variability expected from behavioral variability? It is difficult to evaluate how robust the behavioral deficits are without an estimate of the expected variability and false positive rate.

2) The conclusion that inactivation primarily affects evidence accumulation is based on weights from the logistic regression. A drop in the weights of the sensory evidence presumably means the stimulus information is lost. However, there could be other reasons weights could drop. For example, if mice stop engaging in the task after photostimulation, this could presumably lower the weights since mice no longer base their choice on the sensory stimulus. The analysis of weights after photostimulation provides a nice control (Figure 2-Supp2). However, several areas do show prospective deficits in weighting of future evidence, although this is not observed in all areas. Prospective deficits could be consistent with mice ceasing to perform the task. This possibility should be ruled out.

3) Some additional analyses could further corroborate the interpretation that the deficit is specifically in evidence accumulation. For example, if the inactivation selectively abolishes the memory of prior evidence, stimuli presented thereafter should still be integrated, and a model based on the evidence after the photostimulus should predict choice. If so, this could strengthen the interpretation that the deficits are specific to the accumulated evidence. Otherwise, it could suggest inactivation is degrading performance for other reasons.

4) In general, I could not find information on how well the logistic regression predicts choice.

5) The main result of the paper (Figure 2) is based on effects averaged across different inactivation conditions (different epochs). However, I wonder if it makes sense to combine conditions like this. One, I wonder if this could hide areas that are involved during specific epochs of the task. The text states that "…aligned curves from different epochs were fairly consistent (Figure 2B)", but it is not clear how this is quantified and compared to what reference. Two, I wonder if this pooling would violate assumptions of statistical tests given data now comes from distinct sources, rather than being repeated observations.

6) The analysis of calcium dynamics is based on the autoregressive component of the GLM model. This is counterintuitive because that component is not related to the stimulus or the task. If the claim is that evidence accumulation is related to the timescale of neural dynamics, shouldn't the analysis focus on the coefficients for E_δ (cumulative #R – #L towers), i.e. the component of the dynamics that encodes the stimulus?

7) In a couple of places in the text, I feel the claims should be weakened as they go beyond the data. For example,

a. Intro: "… provide the first causal demonstration that this hierarchy [of timescale] is important for cognitive behavior." A similar statement is in the 2nd paragraph of the discussion. I suggest changing the framing. The experiments do not manipulate the timescale of cortical regions. The relationship with the observed behavioral deficit is correlative.

b. Page 11, "This suggests that signals from the different dorsal cortical areas could be combined by downstream regions in a near-linear fashion. Candidate regions include … " The following paragraph is perhaps more suitable for discussion since the experiments do not probe subcortical regions. Also see comment 8 below. The effects of combined-area inactivation in fact appear to be qualitatively different from the average of single area silencing.

c. Page 13, "…the different intrinsic timescales across the cortex support evidence integration over time windows of different durations." For the same reason as in comment (a) above, I suggest rephrasing or removing this framing.

d. Abstract and intro, "inactivation of different areas primarily affected the evidence-accumulation per se, rather than other decision-related process". It seems the results do not examine other decision-related process besides the weighting of sensory evidence.

e. The text claims the spatial resolution of inactivation is 1.5-2mm. This is somewhat misleading. In Figure S2 of Pinto 2019, 60% of neurons are silenced at this light intensity at 2mm from light center. This broad inactivation is also consistent with the characterization from the Svoboda lab (Li et al., eLife 2019), which suggests that the spread of inactivation at 6 mW extends well beyond 2 mm in radius.

8) In Figure 2-Supp 3, the effects of posterior vs frontal cortex inactivation do not appear to be very different from each other. This is somewhat different from the averages of single area effects. In general, the statistical tests in the paper do not directly compare the effects of posterior cortex inactivation vs. frontal cortex inactivation. A more appropriate test for the key conclusion should be an interaction of y-position dependence with cortical regions.

9) The explanation of power analysis is not very clear (page 26-27). How are the control trials subsampled at different number of inactivation trials? What does it mean to bootstrap all the inactivation conditions together? At what effect size is n=250 sufficient to detect the effect?

10) The non-monotonic effect of cluster 3 (V1 and RSC) in Figure 2c is counterintuitive. The effect seems to be present in several individual conditions in Figure 1-Supp 2. However, other conditions don't show this (e.g. delay epoch inactivation). The text states that the effect is potentially compatible with findings that multiple timescales exist in a single region. Please explain this notion more clearly and how it could lead to no deficit for recent stimulus information but deficits for distant stimulus memory.

11) Mice speed up during photostimulation in nearly all conditions (Figure 2-Supp 1). Are mice responding to the light? Ideally, a negative control could be included to show there are no non-specific effects of photostimulation when analyzed in the logistic regression. This could be done by photostimulation in GFP mice or by inactivating a cortical region not involved in the behavior.

Reviewer #3 (Recommendations for the authors):

Related to the above comment on aggregating data across mice, the presentation of the data would be more transparent if mouse-by-mouse results were shown, where possible (like they are in Figure 1B,C; Figure 1-table S1 is also helpful). For example, symbols for individual mice could be shown in Figure 1E instead of (or in addition to) the mean across mice. Presumably change in performance was calculated within mice and then averaged, rather than averaging laser on and laser off performance across mice and then taking the difference between the two. But the description ("inactivation-induced change in overall % correct performance for each inactivation epoch, for data combined across mice", line 119) could apply to either analysis.

https://doi.org/10.7554/eLife.70263.sa1

Author response

Essential revisions:

Previous studies have indicated that neurons in different cortical areas have different intrinsic timescales. In this study, Pinto and colleagues aimed at establishing the functional significance of intrinsic timescales across cortical regions by performing optogenetic silencing of cortical areas in an evidence accumulation task in mice. The authors observed that optogenetic silencing reduced the weight of sensory evidence primarily during silencing, but also in preceding time windows in some cases, suggesting that inactivation of frontal cortical regions had longer-lasting effects than that of posterior cortical regions. This study provides important results addressing the relation between cortical functions and intrinsic timescales.

The reviewers agreed that this study addresses an important question, and that the authors performed sophisticated experiments and collected a large amount of data. The results are presented clearly and the manuscript is well-written. All the reviewers thought that the results are potentially of great interest to a wide audience. However, the reviewers found several substantive issues which reduced their confidence in the authors' conclusions. These issues need to be addressed before publication of this study in eLife.

We are glad the reviewers found our work to be of potentially great interest, and we thank them for their valuable, thorough and constructive critique of our manuscript. We believe they raised a number of important issues, which prompted us to significantly revise our analytical approaches and writing. Specifically, we have made the following major changes to the manuscript:

– We have replaced the analysis of inactivation effects on evidence weighting with a mixed-effects logistic regression model that is parameterized in time instead of space. This new parameterization better separates the effects on sensory evidence occurring before, during, or after the inactivation, all within the same model. Moreover, variability from individual mice is now explicitly accounted for as random effects. Further, we also combine different epochs in the same model by adding them as additional random effects. These changes simultaneously address major concerns about inter-mouse variability, low-n statistics on conditions rather than mice, and potential confounds from laser-induced changes and inter-trial variability in running speed. Overall, the results from the new analysis largely match our previous analysis, but the increased statistical power and better parameterization allowed us to clarify some findings.

– Individual mouse data are now displayed where relevant.

– We have added analyses of stimulus-related coefficients in the linear encoding models of widefield data.

– We have removed our claims of causality and exclusive inactivation effects on evidence accumulation, and added more thorough discussions of the caveats around the interpretation of our findings.

We hope the reviewers and editors will agree that we have satisfactorily addressed their concerns, and as a result our manuscript is much improved. We provide detailed responses to the individual points below.

In particular, the following points have been identified as essential issues:

1. The presented analysis does not consider the large variability that exists at the level of individual animals. There is also some variability across conditions (e.g. photoinhibition of different epochs). Furthermore, the statistical analyses presented in the manuscript often rely on a small number of samples, and the sample size is not equal across the conditions (n = 6, 4, 3 for y = 0, 50, 100, respectively). Because of these issues, we felt that the main conclusion needs to be supported by further analysis investigating these variabilities, and careful discussion of these potential caveats.

We have entirely replaced this analysis with a mixed-effects logistic regression approach in which the fixed effects are weights of sensory evidence in time, and the random effects are mice and conditions. Thus, we now explicitly model the variability introduced by these two factors, which allows us to focus on the effects that are common across mice and conditions. Additionally, we now perform statistical analyses on coefficients using metrics based on their error estimates from the model fitting procedure, such that all estimates come from the same sample size and take into account the full data (t- and z-tests, as explained in more detail in Materials and methods, line 665). Finally, to further estimate variability introduced by mice and conditions (epochs), we have devised a shuffling procedure where laser-on labels are shuffled while maintaining mouse and condition statistics. We use this procedure to estimate the empirical null distributions of inactivation-induced changes in evidence weights, which we use to further assess statistical significance of the effects. These new analyses are presented in Figures 2, 3 and corresponding supplements. We have also added text to be more explicit about these sources of variability (line 169):

“(…) to account for the inter-animal variability we observed, we used a mixed-effects logistic regression approach, with mice as random effects (see Materials and methods for details), thus allowing each mouse to contribute its own source of variability to overall side bias and sensitivity to evidence at each time point, with or without the inactivations. We first fit these models separately to inactivation epochs occurring in the early or late parts of the cue region, or in the delay (y ≤ 100 cm, 100 < y ≤ 200 cm, y > 200 cm, respectively). We again observed a variety of effect patterns, with similar overall laser-induced changes in evidence weighting across epochs for some but not all tested areas (Figure 2—figure supplement 1). Such differences across epochs could reflect dynamic computational contributions of a given area across a behavioral trial. However, an important confound is the fact that we were not able to use the same mice across all experiments due to the large number of conditions (Figure 1–table supplement 1), such that epoch differences (where epoch is defined as time period relative to trial start) could also simply reflect variability across subjects. To address this, for each area we combined all inactivation epochs in the same model, adding them as additional random effects, thus allowing for the possibility that inactivation of each brain region at each epoch would contribute its own source of variability to side bias; different biases from mice perturbed at different epochs would then be absorbed by this random-effects parameter. We then aligned the timing of evidence pulses to laser onset and offset within the same models, as opposed to aligning with respect to trial start. This alignment combined data from mice inactivated at different epochs together, further ameliorating potential confounds from any mouse x epoch-specific differences. 
Each fixed-effects data point in figures below (Figures 2, 3, solid colors) thus reflects tendencies common across mice, not individual mouse effects; the latter are shown as the random effects (faded colors). This approach allowed us to extract the common underlying patterns of inactivation effects on the use of sensory evidence towards choice, while simultaneously accounting for inter-subject and inter-condition variability.”

2. The authors claim that the optogenetic silencing primarily affected the evidence-accumulation computation, but not other decision-related processes. The reviewers found this claim to be not strongly supported by the data. From the presented data, whether silencing specifically affected the evidence-accumulation process, rather than just the passing of evidence to an accumulation process, remains unclear. Furthermore, silencing affects running speed (thus indicating effects beyond the accumulation process). Also, the reviewers thought that alternative possibilities have not been fully examined.

We agree with the reviewers that our previous modeling approach did not allow us to adequately separate these different processes, both because it confounded sensory processing and sensory memory and because effects on post-laser sensory evidence were part of a separate analysis that did not take into account pre-laser evidence weights. To address these shortcomings, not only did we fit a new model in time rather than space, but we also separated evidence occurring before, during, or after inactivation within the same model. We did so by aligning evidence time to either laser onset or offset (see Materials and methods, page 28, line 614, for details). As we now explain in the main text (line 156):

“We reasoned that changes in the weighting of sensory evidence occurring before laser onset would primarily reflect effects on the memory of past evidence, while changes in evidence occurring while the laser was on would reflect disruption of processing and/or very short-term memory of the evidence. Finally, changes in evidence weighting following laser offset would potentially indicate effects on processes beyond accumulation per se, such as commitment to a decision. For example, a perturbation that caused a premature commitment to a decision would lead to towers that appeared subsequent to the perturbation having no weight on the animal’s choice. Although our inactivation epochs were defined in terms of spatial position within the maze, small variations in running speed across trials, along with the moderate increases in running speed during inactivation, could have introduced confounds in the analysis of evidence as a function of maze location (Figure 1—figure supplement 2). Thus, we repeated the analysis of Figure 1C but now with logistic regression models, built to describe inactivation effects for each area, in which net sensory evidence was binned in time instead of space. (…) We then aligned the timing of evidence pulses to laser onset and offset within the same models, as opposed to aligning with respect to trial start.”
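As a simplified illustration of this time-binned parameterization (omitting the mixed-effects structure, i.e., the mouse and epoch random effects the paper adds, and using entirely synthetic data), one can fit a logistic regression in which choice depends on per-time-bin net evidence plus laser × evidence interaction terms; a negative interaction coefficient then corresponds to a laser-induced reduction in the weighting of that bin's evidence:

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials, n_bins = 2000, 4

# Synthetic net evidence (#R - #L towers) per time bin, plus a laser-on
# indicator; on laser trials the weight of bin-2 evidence is reduced.
E = rng.normal(size=(n_trials, n_bins))
laser = rng.integers(0, 2, n_trials)
w = np.ones(n_bins)                      # baseline evidence weights
dw = np.array([0.0, -0.8, 0.0, 0.0])     # laser-induced change (bin 2 only)
logit = E @ w + (E * laser[:, None]) @ dw
choice = (rng.random(n_trials) < 1 / (1 + np.exp(-logit))).astype(float)

# Design matrix: evidence terms plus laser x evidence interactions.
X = np.hstack([E, E * laser[:, None]])

def fit_logistic(X, y, lr=1.0, steps=3000):
    """Plain maximum-likelihood logistic regression by gradient ascent."""
    b = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(X @ b)))
        b += lr * X.T @ (y - p) / len(y)
    return b

beta = fit_logistic(X, choice)
evidence_w, laser_delta = beta[:n_bins], beta[n_bins:]
```

In the actual analysis the interaction terms are further split by whether each evidence bin falls before, during, or after the laser, and mouse and epoch enter as random effects rather than being pooled as here.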

We would also like to emphasize that we consider the evidence-accumulation process to involve both “passing evidence to the accumulator” and “accumulating / remembering it.” To make this more explicit, we have added the following sentence to the introduction (line 48):

“[evidence accumulation] involves remembering a running tally of evidence for or against a decision, updating that tally when new evidence becomes available, and making a choice based on the predominant evidence.”

Throughout our description of results, we now more carefully outline whether the findings support a role in sensory-evidence processing, memory, or both, as well as post-accumulation processes manifesting as decreases in the weight of sensory evidence after laser offset. For example, our new analyses have more clearly shown prospective changes in evidence use when M1 and mM2 were silenced, compatible with the latter. We also agree with the reviewers that we cannot completely rule out other untested sources of behavioral deficits beyond the aforementioned decision processes. Thus, we have removed all statements to the effect that only evidence accumulation per se was affected. Importantly, though, we believe the new analyses do support the claims that the inactivation of all tested areas strongly affects the accumulation process, even if not exclusively.

3. Optogenetic silencing sometimes increased the running speed. This can potentially reduce the time spent at each location, and may affect the acquisition of sensory information. It is important to show that the reduced regression weight is not a side effect of reduced time spent at each location. Furthermore, some analysis based on time, not just location, would be very helpful.

The increases in running speed are small in magnitude, averaging only ~8% above control levels across conditions (we have changed the units in the speed figure from cm/s to % of control to highlight this). However, we do agree with the reviewers that this potentially introduces confounds when analyzing the effects in space rather than time. Thus, along with the changes described above, we have replaced the spatial analysis with a single model that parametrizes evidence in time, aligned to either laser onset or offset. These analyses are now presented in Figures 2 and 3, and corresponding supplements, and largely confirm our main findings.

More detailed comments and suggestions on the above issues are included in the individual reviewers' comments.

Reviewer #2 (Recommendations for the authors):

1) Overall, the inactivation effect is highly variable across brain regions and conditions. For example, in Figure 1-Supp 2, silencing mV2 and RSC during the 3rd quarter of the cue region reduce weighting 100 cm back, but the effect is not replicated when silencing is extended in time (2nd half of the cue region). The effect is yet different when silencing the posterior cortical regions, which covers mV2 and RSC. There are many cases like this. What is this variability due to? Is this degree of variability expected from behavioral variability? It is difficult to evaluate how robust the behavioral deficits are without an estimate of the expected variability and false positive rate.

We have now changed our modeling approach to better account for the variability across mice and conditions present in the data. We now use a single mixed-effects model to fit all conditions and mice for a given area, using evidence weights as fixed effects and mice and conditions as random effects. Because of the large number of different conditions (54), not all mice were exposed to all area-epoch combinations. Thus, it is unfortunately not possible to dissociate these two sources of variability. For example, it could be the case that inter-condition variability genuinely reflects some dynamic process, such that the effect of inactivating a given area is different depending on the inactivation epoch. While this possibility is interesting, we cannot probe it conclusively because inter-condition variability could trivially arise from differences between mice exposed to different conditions. We now discuss this explicitly in the Results section (line 174):

“(…) We again observed a variety of effect patterns, with similar overall laser-induced changes in evidence weighting across epochs for some but not all tested areas (Figure 2—figure supplement 1). Such differences across epochs could reflect dynamic computational contributions of a given area across a behavioral trial. However, an important confound is the fact that we were not able to use the same mice across all experiments due to the large number of conditions (Figure 1–table supplement 1), such that epoch differences (where epoch is defined as time period relative to trial start) could also simply reflect variability across subjects. To address this, for each area we combined all inactivation epochs in the same model, adding them as additional random effects, thus allowing for the possibility that inactivation of each brain region at each epoch would contribute its own source of variability to side bias; different biases from mice perturbed at different epochs would then be absorbed by this random-effects parameter. We then aligned the timing of evidence pulses to laser onset and offset within the same models, as opposed to aligning with respect to trial start. This alignment combined data from mice inactivated at different epochs together, further ameliorating potential confounds from any mouse x epoch-specific differences. (…) Thus, this approach allowed us to extract the common underlying patterns of inactivation effects on the use of sensory evidence towards choice, while simultaneously accounting for inter-subject and inter-condition variability.”

Following the reviewer's excellent suggestion, we have also devised a procedure to estimate the empirical null distribution of laser effects, taking into account the variability in our data. We then used that distribution to further constrain effect significance. This is explained in Materials and methods (lines 652, 667):

“For the models in Figure 2, we also computed coefficients for shuffled data, where we randomized the laser-on labels 30 times while keeping the mouse and condition labels constant, such that we maintained the underlying statistics for these sources of variability. This allowed us to estimate the empirical null distributions for the laser-induced changes in evidence weighting terms. (…) Significance of the coefficients in the mixed-effects model of evidence in time were calculated using a t-test based on the coefficient estimate and its standard error. Additionally, for the models in Figure 2 we only considered coefficients to be significant if their standard error did not overlap the ± 1 SD intervals from the coefficients extracted from the shuffled models.”
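A minimal sketch of this shuffling procedure, assuming trial-level arrays of mouse and condition labels (variable names are hypothetical): laser-on labels are permuted separately within each mouse × condition group, so each group keeps its original number of laser trials and the mouse/condition statistics are preserved under the null:

```python
import numpy as np

def shuffle_laser_within_groups(laser_on, mouse, condition, rng):
    """Permute laser-on labels within each mouse x condition group,
    preserving each group's count of laser trials."""
    laser_on = np.asarray(laser_on).copy()
    keys = np.stack([np.asarray(mouse), np.asarray(condition)], axis=1)
    for key in np.unique(keys, axis=0):
        idx = np.nonzero((keys == key).all(axis=1))[0]
        laser_on[idx] = rng.permutation(laser_on[idx])
    return laser_on
```

Refitting the evidence-weighting model on, say, 30 such shuffles yields the empirical null distribution against which the real laser-induced coefficient changes can be compared.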

Finally, we note that all of our p-values are in fact corrected for false discovery rate using Benjamini and Hochberg's method (1995), as described in Materials and methods (line 739):

"We corrected for multiple comparisons using a previously described method for false discovery rate (FDR) correction (Benjamini and Hochberg, 1995; Guo et al., 2014; Pinto et al., 2019). Briefly, p-values were ranked in ascending order, and the ith ranked p-value, Pi, was deemed significant if it satisfied π ≤ (αi)/n, where n is the number of comparisons and α is the significance level. In our case, α = 0.050 and 0.025 for one-sided and two-sided tests, respectively."

2) The conclusion that inactivation primarily affects evidence accumulation is based on weights from the logistic regression. A drop in the weights of the sensory evidence presumably means the stimulus information is lost. However, there could be other reasons weights could drop. For example, if mice stop engaging in the task after photostimulation, this could presumably lower the weights since mice no longer base their choice on the sensory stimulus. The analysis of weights after photostimulation provides a nice control (Figure 2-Supp2). However, several areas do show prospective deficits in weighting of future evidence, although this is not observed in all areas. Prospective deficits could be consistent with mice ceasing to perform the task. This possibility should be ruled out.

3) Some additional analyses could further corroborate the interpretation that the deficit is specifically in evidence accumulation. For example, if the inactivation selectively abolishes the memory of prior evidence, stimuli presented thereafter should still be integrated, and a model based on the evidence after the photostimulus should predict choice. If so, this could strengthen the interpretation that the deficits are specific to the accumulated evidence. Otherwise, it could suggest inactivation is degrading performance for other reasons.

We will address points 2 and 3 together. We agree with the reviewer that our previous modeling approach did not allow us to adequately separate these different processes, both because it confounded sensory processing and sensory memory, and because effects on post-laser sensory evidence were part of a separate analysis that did not take pre-laser evidence weights into account. However, we believe that our new modeling approach addresses these shortcomings by separating evidence occurring before, during, or after inactivation within the same model. Thus, because choice can be predicted from both pre- and post-laser evidence, we can now compare these effects directly (Figures 2, 2-S1, 2-S2, 3). With two exceptions, we did not observe any significant deficits in prospective evidence weighting. As the reviewer points out, this suggests that the mice perform the task normally after the inactivation in these cases. We did observe post-inactivation changes in evidence use for M1 and mM2, although these were milder than the decreases in pre-laser evidence weighting (Figure 2). We believe that this suggests a role for these areas in both accumulation and post-accumulation processes. Of course, it also implies that evidence accumulation is not the only affected computation. We also agree that we have not completely ruled out other potential sources of behavioral deficits. Thus, we have removed all statements to the effect that only evidence accumulation per se was affected. Importantly, though, we believe the new analyses do support the claim that the inactivation of all tested areas strongly affects the accumulation process, even if not exclusively.

4) In general, I could not find information on how well the logistic regression predicts choice.

Thank you for catching this oversight. We have added cross-validated choice-prediction accuracy distributions to Figure 2B (~70% accuracy across models).
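To illustrate what such a cross-validated accuracy measure involves, here is a hypothetical numpy sketch on simulated data. The gradient-descent logistic fit and the per-bin evidence regressors are stand-ins for the actual mixed-effects model, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Plain gradient-ascent logistic regression (illustrative stand-in for
    the mixed-effects model); returns weights plus intercept."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-Xb @ w))          # predicted P(choose right)
        w += lr * Xb.T @ (y - p) / len(y)      # log-likelihood gradient step
    return w

def cv_accuracy(X, y, k=5):
    """k-fold cross-validated fraction of choices predicted correctly."""
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    accs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        w = fit_logistic(X[train], y[train])
        Xb = np.hstack([X[test], np.ones((len(test), 1))])
        pred = (Xb @ w) > 0                    # decision boundary at P = 0.5
        accs.append((pred == y[test].astype(bool)).mean())
    return float(np.mean(accs))
```

On choices simulated from known per-bin evidence weights, this kind of held-out estimate typically lands in the 70–80% range, comparable to the ~70% reported above.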

5) The main result of the paper (Figure 2) is based on effects averaged across different inactivation conditions (different epochs). However, I wonder if it makes sense to combine conditions like this. One, I wonder if this could hide areas that are involved during specific epochs of the task. The text states that "…aligned curves from different epochs were fairly consistent (Figure 2B)", but it is not clear how this is quantified and compared to what reference. Two, I wonder if this pooling would violate assumptions of statistical tests given data now comes from distinct sources, rather than being repeated observations.

The reviewer makes a good point. However, as detailed in our response to comment 1, we cannot be certain that those are legitimate differences between epochs, rather than just an artifact of inter-animal variability. Thus, we would rather make the more conservative choice of presenting them together, while still showing more granular per-condition analyses in the supplements. We also justify these choices more explicitly in the main text now. Finally, while our previous method of pooling data may have violated statistical assumptions, the fact that we are now fitting all data in the same model and explicitly modeling that variability as random effects should address that concern.

6) The analysis of calcium dynamics is based on the autoregressive component of the GLM. This is counterintuitive because that component is not related to the stimulus or the task. If the claim is that evidence accumulation is related to the timescale of neural dynamics, shouldn't the analysis focus on the coefficients for E_δ (cumulative #R – #L towers), i.e., the component of the dynamics that encodes the stimulus?

We have now added the analysis suggested by the reviewer to Figure 4—figure supplement 1. Interestingly, we did not observe systematic timescale differences in E_δ, or in the responses locked to sensory-evidence pulses. This is now described in the Results section (line 315):

“We first wondered whether different timescales would be reflected in model coefficients related to sensory evidence. Interestingly, however, we did not observe any significant differences across areas in the time course of coefficients for contralateral tower stimuli or cumulative sensory evidence (Figure 4—figure supplement 1). Thus, we next focused our analysis on the auto-regressive coefficients of the model.”

At face value, this would suggest that the auto-regressive coefficients capture temporal components of neural dynamics that are not locked to any task events. Of course, the possibility remains that, despite our extensive parameterization of behavioral events, we failed to capture some task component that would display timescale differences across areas. We have added a discussion to acknowledge this possibility (line 332):

"Nevertheless, a caveat here is that the auto-regressive coefficients of the encoding model could conceivably be spuriously capturing variance attributable to other behavioral variables not included in the model. For example, our model parameterization implicitly assumes that evidence encoding would be linearly related to the side difference in the number of towers. Although this is a common assumption in evidence-accumulation models (e.g., Bogacz et al., 2006; Brunton et al., 2013), it may not apply in our case."

7) In a couple of places in the text, I feel the claims should be weakened as they go beyond the data. For example,

a. Intro: "… provide the first causal demonstration that this hierarchy [of timescales] is important for cognitive behavior." A similar statement is in the 2nd paragraph of the discussion. I suggest changing the framing. The experiments do not manipulate the timescale of cortical regions. The relationship with the observed behavioral deficit is correlative.

Following the reviewer's suggestion, we have removed these claims altogether.

b. Page 11, "This suggests that signals from the different dorsal cortical areas could be combined by downstream regions in a near-linear fashion. Candidate regions include … " The following paragraph is perhaps more suitable for discussion since the experiments do not probe subcortical regions. Also see comment 8 below. The effects of combined-area inactivation in fact appear to be qualitatively different from the average of single area silencing.

This has been moved to the discussion as suggested. Also please note that, as we elaborate in the response to comment 8 below, our new analysis methods did reveal significant differences between simultaneous silencing and single-area averages. The discussion has been updated accordingly.

c. Page 13, "…the different intrinsic timescales across the cortex support evidence integration over time windows of different durations." For the same reason as in comment (a) above, I suggest rephrasing or removing this framing.

We have rewritten this sentence as: “our findings could suggest that the different intrinsic timescales across the cortex are important for evidence-accumulation computations.” Note that this now also immediately follows our discussion on the caveats about the auto-regressive coefficients (see response to comment 6).

d. Abstract and intro, "inactivation of different areas primarily affected the evidence accumulation per se, rather than other decision-related processes". It seems the results do not examine other decision-related processes besides the weighting of sensory evidence.

Following the reviewer’s suggestion, we have changed this framing throughout. While we do believe that our revised analyses indicate disruptions to the accumulation process (see response to comments 2 and 3 above), we agree that we did not fully examine other alternatives, so we removed our claims that evidence accumulation is the only affected process.

e. The text claims the spatial resolution of inactivation is 1.5-2mm. This is somewhat misleading. In Figure S2 of Pinto 2019, 60% of neurons are silenced at this light intensity at 2mm from light center. This broad inactivation is also consistent with the characterization from the Svoboda lab (Li et al., eLife 2019), which suggests that the spread of inactivation at 6 mW extends well beyond 2 mm in radius.

The original estimate referred to full inactivation, but the reviewer is of course correct that we still saw partial inactivation at 2 mm. We have therefore replaced these statements in the text with "≥ 2 mm" (lines 146, 517).

8) In Figure 2-Supp 3, the effects of posterior vs frontal cortex inactivation do not appear to be very different from each other. This is somewhat different from the averages of single area effects. In general, the statistical tests in the paper do not directly compare the effects of posterior cortex inactivation vs. frontal cortex inactivation. A more appropriate test for the key conclusion should be an interaction of y-position dependence with cortical regions.

The reviewer is correct that simultaneous inactivation appears to yield qualitatively different results than individual area averages. Our previous statistical procedures did not capture significant differences given their low power. However, our new analysis indeed revealed those differences to be significant (Figure 3—figure supplement 1). Despite this, it remains the case that frontal inactivation caused larger deficits on longer timescales, either when comparing single-area averages (Figure 2C) or simultaneous inactivation (Figure 3). We have expanded the description of these findings in the Results section, copied below for the reviewer's convenience (line 232):

"(…) Indeed, frontal and posterior areas differed significantly in terms of the magnitude and time course of evidence-weighting deficits induced by their inactivation (Figure 2C, 2-way repeated measure ANOVA with factors time bin and area group; F(time)5,15 = 3.09, p(time) = 0.047, F(area)1,3 = 33.93, p(area) = 0.010, F(interaction)5,15 = 3.60, p(interaction) = 0.025).

To further explore the different contributions of posterior and frontal cortical areas to the decision-making process, we next analyzed the effect of inactivating these two groups of areas simultaneously, using the same mixed-effects modeling approach as above. Compatible with our previous analysis, we found significant differences in how these two manipulations impacted the use of sensory evidence (Figure 3). In particular, compared to posterior areas, frontal inactivation resulted in a significantly larger decrease in the use of sensory evidence occurring long before laser onset (1.0 – 1.5 s, p = 0.006, z-test). Moreover, it led to decreases in the use of sensory evidence occurring after inactivation (p < 0.001, z-test), lending further support for a role of these regions in post-accumulation processes.

Finally, we wondered whether evidence information from different areas is evenly combined, at least from a behavioral standpoint. To do this, we compared the effects of simultaneously inactivating all frontal or posterior areas to those expected from an even combination of the effects of inactivating each area individually (i.e., their average). Both posterior and frontal inactivations significantly deviated from the even-combination prediction (Figure 3—figure supplement 1, p < 0.05, z-test). This could suggest that signals from the different dorsal cortical areas are combined with different weights towards a final decision."

9) The explanation of power analysis is not very clear (page 26-27). How are the control trials subsampled at different number of inactivation trials? What does it mean to bootstrap all the inactivation conditions together? At what effect size is n=250 sufficient to detect the effect?

We regret that this section was not written clearly enough. We have rewritten it to clarify the reviewer’s questions (line 672):

“To estimate statistical power, we performed a bootstrapping-based power analysis following the one described by Guo et al. (2014). We randomly subsampled the full dataset containing all inactivation conditions. In each subsample, we selected different numbers of inactivation trials regardless of area-epoch combination (50 < n < 1000, in steps of 25), and added a random subset of control trials such that the relative proportion of control to laser trials was preserved in the subsampled dataset. We then ran the bootstrapping procedure described above to compute laser-induced changes in overall performance combined across all inactivation conditions, extracting a p-value for each value of n. We repeated this procedure ten times. Power was defined as the minimum number of trials required to observe p < 0.05 at the empirical effect size pooled across conditions, i.e., the first n at which the mean p-value plus 2 × SEM across the 10 repeats fell below 0.05. We obtained an aggregate power of n = 250.”
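As an illustration of the subsampling logic described above, the following hypothetical numpy sketch computes a bootstrap p-value for one subsample of simulated trials. The function name `subsample_pvalue`, the trial arrays, and the bootstrap settings are assumptions made for the example, not the authors' actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

def subsample_pvalue(laser_correct, ctrl_correct, n_laser, n_boot=1000):
    """Draw n_laser inactivation trials plus a proportionally sized set of
    control trials, then bootstrap the laser-induced change in fraction
    correct to obtain a two-sided p-value for that subsample."""
    frac = n_laser / len(laser_correct)
    n_ctrl = int(round(frac * len(ctrl_correct)))   # preserve control:laser ratio
    laser = rng.choice(laser_correct, size=n_laser, replace=False)
    ctrl = rng.choice(ctrl_correct, size=n_ctrl, replace=False)
    deltas = np.array([
        rng.choice(laser, size=n_laser, replace=True).mean()
        - rng.choice(ctrl, size=n_ctrl, replace=True).mean()
        for _ in range(n_boot)
    ])
    # two-sided p-value: fraction of bootstrap deltas crossing zero
    tail = min((deltas >= 0).mean(), (deltas <= 0).mean())
    return min(1.0, 2 * tail)
```

Sweeping `n_laser` over a grid (e.g. 50–1000 in steps of 25) and repeating the call several times per value would then give the p-value-versus-n curve from which the minimum adequate trial count is read off.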

10) The non-monotonic effect of cluster 3 (V1 and RSC) in Figure 2c is counterintuitive. The effect seems to be present in several individual conditions in Figure 1-Supp 2. However, other conditions don't show this (e.g. delay epoch inactivation). The text states that the effect is potentially compatible with findings that multiple timescales exist in a single region. Please explain this notion more clearly and how it could lead to no deficit for recent stimulus information but deficits for distant stimulus memory.

We share the reviewer's puzzlement about these findings. However, they have remained true even after we accounted for inter-subject and inter-condition variability. Following the reviewer's suggestion, we have expanded the discussion of the findings, including possible technical artifacts leading to them (Discussion, line 375):

"This could be in part due to technical limitations of the experiments. First, the laser powers we used result in large inactivation spreads, potentially encompassing neighboring regions. Moreover, local inactivation could result in changes in the activity of interconnected regions (Young et al., 2000), a possibility that should be evaluated in future studies using simultaneous inactivation and large-scale recordings across the dorsal cortex. At face value, however, the findings could be a reflection of the fact that diverse timescales exist at the level of individual neurons within each region (Bernacchia et al., 2011; Cavanagh et al., 2020; Scott et al., 2017; Spitmaan et al., 2020; Wasmuht et al., 2018). For example, inactivating an area with multimodal distributions of intrinsic timescales across its neurons could conceivably result in non-monotonic effects of inactivation."

11) Mice speed up during photostimulation in nearly all conditions (Figure 2-Supp 1). Are mice responding to the light? Ideally, a negative control could be included to show there are no non-specific effects of photostimulation when analyzed in the logistic regression. This could be done by photostimulation in GFP mice or by inactivating a cortical region not involved in the behavior.

We performed the suggested controls in the context of a recent paper in which we used identical photostimulation parameters (Pinto et al., 2019). When we performed the experiments in mice not expressing channelrhodopsin, we observed no significant effects on running speed at the locations tested here (and only a minor effect in 1 of 29 tested locations). We have added a statement to this effect when describing these results (line 151):

“Importantly, we have previously shown that these effects are specific to mice expressing ChR2, ruling out a non-specific light effect (Pinto et al., 2019).”

Reviewer #3 (Recommendations for the authors):

Related to the above comment on aggregating data across mice, the presentation of the data would be more transparent if mouse-by-mouse results were shown, where possible (like they are in Figure 1B,C; Figure 1-table S1 is also helpful). For example, symbols for individual mice could be shown in Figure 1E instead of (or in addition to) the mean across mice. Presumably change in performance was calculated within mice and then averaged, rather than averaging laser on and laser off performance across mice and then taking the difference between the two. But the description ("inactivation-induced change in overall % correct performance for each inactivation epoch, for data combined across mice", line 119) could apply to either analysis.

We have now added individual mouse data where applicable (Figures 1E, 2A, 3A, 4G, Figure 1—figure supplement 3, Figure 2—figure supplement 1, Figure 2—figure supplement 2, Figure 4—figure supplement 1B,D). However, performance was in fact calculated on data pooled across mice and averaged across bootstrapping iterations (see Materials and methods) to account for different numbers of trials across mice. Thus, the displayed averages are closer to trial-count-weighted averages across mice. We now realize that the wording was unclear, and have changed “combined across mice” to “pooled across mice” where relevant.

https://doi.org/10.7554/eLife.70263.sa2

Article and author information

Author details

  1. Lucas Pinto

    1. Department of Neuroscience, Northwestern University, Chicago, United States
    2. Princeton Neuroscience Institute, Princeton University, Princeton, United States
    Contribution
    Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Software, Validation, Visualization, Writing – original draft, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0002-0471-9317
  2. David W Tank

    Princeton Neuroscience Institute, Princeton University, Princeton, United States
    Contribution
    Conceptualization, Funding acquisition, Project administration, Supervision, Writing – review and editing
    Contributed equally with
    Carlos D Brody
    For correspondence
    dwtank@princeton.edu
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0002-9423-4267
  3. Carlos D Brody

    Princeton Neuroscience Institute, Princeton University, Princeton, United States
    Contribution
    Conceptualization, Funding acquisition, Project administration, Supervision, Writing – review and editing
    Contributed equally with
    David W Tank
    For correspondence
    brody@princeton.edu
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0002-4201-561X

Funding

National Institutes of Health (U01NS090541)

  • Lucas Pinto
  • David W Tank
  • Carlos D Brody

National Institutes of Health (U19NS104648)

  • Lucas Pinto
  • David W Tank
  • Carlos D Brody

National Institutes of Health (F32NS101871)

  • Lucas Pinto

National Institutes of Health (K99MH120047)

  • Lucas Pinto

National Institutes of Health (R00MH120047)

  • Lucas Pinto

Simons Foundation (872599SPI)

  • Lucas Pinto

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Sue Ann Koay and Kanaka Rajan for discussions, Abigail Russo and E Mika Diamanti for comments on the manuscript, and Samantha Stein and Scott Baptista for technical assistance.

Ethics

All procedures were approved by the Institutional Animal Care and Use Committee at Princeton University (protocols 1910-15 and 1910-18) and were performed in accordance with the Guide for the Care and Use of Laboratory Animals. Surgical procedures were done under isoflurane anesthesia. The animals received two doses of meloxicam for analgesia, given at the time of surgery and 24 h later, as well as peri-operative I.P. injections of body-temperature saline to maintain hydration. Body temperature was maintained constant using a homeothermic control system. The mice were allowed to recover for at least 5 days before starting behavioral training. After recovery they were restricted to 1 - 2 mL of water per day and extensively handled for another 5 days, or until they no longer showed signs of stress. We started behavioral training after their weights were stable and they accepted handling. During training, the full allotted fluid volume was typically delivered within the behavioral session, but supplemented if necessary. The mice were weighed and monitored daily for signs of dehydration. If these were present or their body mass fell below 80% of the initial value, they received supplemental water until recovering. They were group housed throughout the experiment, and had daily access to an enriched environment. The animals were trained 5 - 7 days/week.

Senior Editor

  1. Michael J Frank, Brown University, United States

Reviewing Editor

  1. Naoshige Uchida, Harvard University, United States

Reviewer

  1. Gidon Felsen, University of Colorado School of Medicine, United States

Publication history

  1. Preprint posted: December 29, 2020 (view preprint)
  2. Received: May 11, 2021
  3. Accepted: May 27, 2022
  4. Version of Record published: June 16, 2022 (version 1)

Copyright

© 2022, Pinto et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.


Cite this article

  1. Lucas Pinto
  2. David W Tank
  3. Carlos D Brody
(2022)
Multiple timescales of sensory-evidence accumulation across the dorsal cortex
eLife 11:e70263.
https://doi.org/10.7554/eLife.70263
