Autocorrelation structure at rest predicts value correlates of single neurons during reward-guided choice

  1. Sean E Cavanagh
  2. Joni D Wallis
  3. Steven W Kennerley (corresponding author)
  4. Laurence T Hunt (corresponding author)
  1. University College London, United Kingdom
  2. University of California, Berkeley, United States
Research Advance
Cite as: eLife 2016;5:e18937 doi: 10.7554/eLife.18937

Abstract

Correlates of value are routinely observed in the prefrontal cortex (PFC) during reward-guided decision making. In previous work (Hunt et al., 2015), we argued that PFC correlates of chosen value are a consequence of varying rates of a dynamical evidence accumulation process. Yet within PFC, there is substantial variability in chosen value correlates across individual neurons. Here we show that this variability is explained by neurons having different temporal receptive fields of integration, indexed by examining neuronal spike rate autocorrelation structure whilst at rest. We find that neurons with protracted resting temporal receptive fields exhibit stronger chosen value correlates during choice. Within orbitofrontal cortex, these neurons also sustain coding of chosen value from choice through the delivery of reward, providing a potential neural mechanism for maintaining predictions and updating stored values during learning. These findings reveal that within PFC, variability in temporal specialisation across neurons predicts involvement in specific decision-making computations.

https://doi.org/10.7554/eLife.18937.001

Introduction

Theoretical models of decision making emphasise the importance of evidence accumulation across time until a categorical choice is reached (Bogacz et al., 2006; Gold and Shadlen, 2007). One widely studied class of evidence accumulation models are cortical attractor networks, originally derived from studies of working memory (Amit and Brunel, 1997; Wang, 1999, 2002). These rely upon strong recurrent connections between similarly tuned neurons to integrate evidence across time, and exhibit temporally extended persistent activity that stores the outcome of the decision process in memory (Wang, 2002; Wong and Wang, 2006). In value-guided decision making tasks, attractor network models predict the emergence of correlates of chosen value during choice (Hunt et al., 2012; Rustichini and Padoa-Schioppa, 2015). These value correlates result from varying speeds of decision formation across different trials, an issue we explored closely in our previous paper (Hunt et al., 2015). However, in contrast to the relative homogeneity of chosen value correlates within such models, it is known that decision correlates are highly heterogeneous across different cells within a given region (Kennerley et al., 2009; Wallis and Kennerley, 2010; Meister et al., 2013). The source and functional significance of this neuronal heterogeneity remains unclear.

Neurons also exhibit heterogeneity in their temporal receptive fields of integration (Chen et al., 2015). The temporal receptive field of a neuron can be established by examining its spike-count autocorrelation function (ACF) at rest (Ogawa and Komatsu, 2010). A slowly decaying ACF whilst at rest reflects temporal stability in firing, suggesting that the neuron integrates information across long periods of time; by contrast, a fast-decaying ACF reflects temporal variability in firing. Recently, this approach was used to demonstrate a hierarchy of temporal receptive fields across areas of cortex (Murray et al., 2014), with populations of neurons in lower and higher cortical areas exhibiting brief and extended temporal receptive fields, respectively. Those areas with temporally extended receptive fields thus appear intrinsically adapted to cognitive tasks involving extended integration of information across time, such as working memory and decision making (Mazurek et al., 2003; Gold and Shadlen, 2007; Wang, 2012; Chaudhuri et al., 2015; Chen et al., 2015). Yet in addition to the heterogeneity of temporal fields across regions, similar heterogeneity is also evident within cortical areas (Ogawa and Komatsu, 2010; Nishida et al., 2014). It remains unknown whether this intra-regional heterogeneity in temporal specialisation might predict the computations served by different neurons in decision-making tasks.

In our previous study of reward-guided decision making (Hunt et al., 2015), we provided evidence that correlates of chosen value may emerge as a consequence of varying rates of evidence accumulation. A corollary of this idea is that neurons functionally specialised to perform temporally extended computations (such as evidence accumulation) might exhibit stronger chosen value correlates during choice. We hypothesised that this would be indexed by measuring individual neurons’ temporal receptive fields whilst at rest. We also hypothesised that this functional specialisation might support other temporally extended computations during reward-guided choice, such as the maintenance of value coding until reward delivery. This could be one component of a mechanism for credit assignment in learning, which is known to rely upon PFC and in particular orbitofrontal cortex (Walton et al., 2010; Takahashi et al., 2011; Chau et al., 2015; Jocham et al., 2016), with the other component being a representation of the chosen stimulus identity, which is also encoded by OFC neurons (Raghuraman and Padoa-Schioppa, 2014; Lopatina et al., 2015). We therefore sought to link variability in spike-rate autocorrelation at rest with the variability of neuronal responses during reward-guided choices.

Results

We re-examined the neural correlates of chosen value during choice within rhesus macaque prefrontal cortex (PFC) (Hosokawa et al., 2013; Hunt et al., 2015), and extended our analysis to the time of reward delivery (Figure 1, Figure 1—figure supplement 1). During choice, chosen value correlates were remarkably similar across all three PFC brain regions (dorsolateral prefrontal cortex (DLPFC), orbitofrontal cortex (OFC) and anterior cingulate cortex (ACC)) at the population level (Figure 1A). However, this was not the case at the time of outcome, where the chosen value correlates predominated in OFC (Figure 1B). This value signal at outcome contained information about both the chosen benefit and chosen cost (Figure 1—figure supplement 2). As well as variability in value correlates across time, there was a large degree of variability at the level of single neurons constituting the population averages, both at choice and outcome (Figure 1C–D). Within each region there were some neurons with strong chosen value correlates, but other neurons with weak or non-selective responses to chosen value.

Figure 1 (with 2 supplements)
Homogeneity and heterogeneity of chosen value correlates.

(A) At decision time, chosen value correlates appeared homogeneous across regions in their expression. The coefficient of partial determination (CPD) for chosen value averaged across populations of DLPFC (n = 310), OFC (n = 214) and ACC (n = 333) neurons (lines denote mean ± SE for each region). CPD was calculated by regressing chosen value onto firing rate during the choice period of a cost-benefit decision making task (see Materials and methods). Chosen value correlates were not significantly different between any pair of brain regions (permutation tests; DLPFC v OFC, no cluster survived thresholding, DLPFC v ACC, p=0.2706, OFC v ACC, no cluster survived thresholding; see Materials and methods). Dashed lines mark the null hypothesis level for CPD in each cortical area (see Materials and methods). (B) Population averages when chosen value was regressed onto firing rate during reward delivery. OFC showed stronger chosen value correlates following reward onset than ACC and DLPFC (permutation tests; OFC v DLPFC, p=0.0010, OFC v ACC, p=0.0028; see Materials and methods). (C and D) Within each region, chosen value correlates were heterogeneous across neurons. Chosen value correlates of the individual neurons contributing to the population averages in A and B respectively. Within each matrix: each row is a neuron (sorted by maximum CPD within the corresponding epoch and area), each column is a 10 ms time bin. Hence, neurons are sorted in a different order in C and D. Chosen value coding at reward delivery was weaker than at choice. Figure 1—figure supplement 1 shows the fraction of neurons with reliable coding of chosen value at choice and at the outcome. Figure 1—figure supplement 2 shows that OFC codes chosen value, as opposed to chosen benefit alone, at the time of reward delivery.

https://doi.org/10.7554/eLife.18937.002

We hypothesised that this variability might be accounted for by intrinsic firing properties of the neurons at rest, reflecting different neurons’ temporal specialisation. We characterised resting properties of neuronal firing by examining their spike rate autocorrelation during pre-trial fixation. The decay of the autocorrelation function (ACF) provides a metric of each neuron’s temporal stability in firing rate. Careful inspection of ACFs at the level of single neurons demonstrated marked heterogeneity across individual neurons (Figure 2), complementing previous descriptions that have examined average population responses (Murray et al., 2014; Chaudhuri et al., 2015). We fitted an exponential decay function (Murray et al., 2014) to all neurons that could be described by such an equation, yielding a single decay time constant, τ, for each neuron (446 of 857 neurons, see Figure 2—figure supplement 1 and Materials and methods). We found a large degree of heterogeneity in time constants across neurons, both within and between cortical areas (Figure 2C). Time constants were larger in the DLPFC and ACC populations (Kruskal-Wallis test, p=0.0007), but most variable within OFC and ACC populations (Bartlett’s Statistic = 11.913, p=0.0026). Averaging across the ACFs of individual neurons prior to fitting the exponential equation yielded similar qualitative results to the population averages reported in Murray et al. (2014) (Figure 2—figure supplements 2 and 3).

Figure 2 (with 4 supplements)
Single neurons show variability in resting autocorrelation structure.

(A) Autocorrelation matrix and structure of an example low time constant single OFC neuron. (B) Autocorrelation matrix and structure of an example high time constant single OFC neuron. This neuron has a stable autocorrelation maintained across time. Fitting of time constants was only performed on cells that showed an exponentially decaying autocorrelation. See Figure 2—figure supplement 1 for single neuron examples of excluded cells. (C) Histograms of the time constants within the three PFC brain regions. Time constants are highly variable across neurons, with the greatest heterogeneity present within OFC and ACC populations. Solid and dashed vertical lines represent mean(Log(τ)) and mean(Log(τ)) ± SD(Log(τ)) respectively. See Figure 2—figure supplement 2 for autocorrelation structure at the population level, and Figure 2—figure supplement 3 for population autocorrelation when trials are filtered for fluctuations in firing rate. Figure 2—figure supplement 4 shows the population autocorrelation across trial time.

https://doi.org/10.7554/eLife.18937.005

Our main question pertained to whether the observed variability in single-cell resting activity within PFC may determine different functional computations during a cost-benefit decision making task (Hosokawa et al., 2013; Hunt et al., 2015). We first sought to visually identify a potential relationship with chosen value by sorting the matrices in Figure 1C by time constant. To maximise our sensitivity, and because of the similarity in chosen value correlates across PFC brain regions at choice (Figure 1A), we collapsed this analysis across all three PFC regions (n = 446 neurons). We found that neurons with a high chosen value coefficient of partial determination (CPD) were more apparent at the bottom of the sorted matrix than at the top (Figure 3A), implying a relationship between chosen value coding and resting τ. To test this relationship statistically, neurons were subdivided into high and low time constant populations using a median split (Figure 3B). The population with a higher τ (more stable activity at rest) had more variance explained by chosen value during choice (permutation test (see Materials and methods), p=0.0298). We further demonstrated this relationship by performing a rank correlation between each neuron’s CPD at the time of the maximum population-average CPD and its time constant (Correlation Coefficient = 0.148, p=0.0018; 95% CI [0.0556, 0.2373], Figure 3—figure supplement 1). This relationship was also present when controlling for the baseline firing rate and brain area using multiple regression (see Materials and methods, β = 0.3315, p=0.0254; 95% CI [0.0878, 0.5751]).
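The two tests described above can be illustrated with a minimal sketch on simulated data. It is not the analysis code used in the study: the function names, the toy data and the single-time-point permutation test are assumptions made for illustration, whereas the reported permutation test is corrected for multiple comparisons across time (see Materials and methods).

```python
import numpy as np

def rank(x):
    """Ranks 0..n-1 (ties broken by order, adequate for continuous data)."""
    order = np.argsort(x)
    ranks = np.empty(len(x))
    ranks[order] = np.arange(len(x))
    return ranks

def rank_correlation(x, y):
    """Spearman-style rank correlation: Pearson correlation of the ranks."""
    return float(np.corrcoef(rank(x), rank(y))[0, 1])

def median_split_permutation(tau, coding, n_perm=2000, seed=0):
    """One-sided permutation test of mean coding strength, high vs low tau."""
    rng = np.random.default_rng(seed)
    high = tau > np.median(tau)
    observed = coding[high].mean() - coding[~high].mean()
    null = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = rng.permutation(high)  # reassign high/low labels at random
        null[i] = coding[shuffled].mean() - coding[~shuffled].mean()
    p_value = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, p_value

# Hypothetical population: coding strength (arbitrary units) rises with log(tau).
rng = np.random.default_rng(1)
n_neurons = 446
log_tau = rng.normal(loc=np.log(200.0), scale=0.8, size=n_neurons)
tau = np.exp(log_tau)
z = (log_tau - log_tau.mean()) / log_tau.std()
coding = 0.5 * z + rng.normal(size=n_neurons)

rho = rank_correlation(tau, coding)
difference, p_value = median_split_permutation(tau, coding)
```

In this toy example the effect size is deliberately exaggerated so that both tests detect it; the values are not intended to resemble the reported statistics.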

Figure 3 (with 1 supplement)
Resting time constant predicts chosen value correlates during decision phase.

(A) Strong chosen value correlates were more prevalent in neurons with higher time constants. Coefficient of partial determination (CPD) for chosen value across time for each PFC neuron (n = 446) was stacked into a matrix. The rows of the matrix (i.e. each individual neuron) were sorted by increasing time constant, and then convolved with a Gaussian function (see Materials and methods). The white dashed line indicates a median split by time constant; high time constant neurons are beneath the line, low time constant neurons are above. The graph to the right of this matrix shows the individual decay time constant for each neuron (row) in the matrix. (B) When all neurons are subdivided by a median split of time constant, those with a higher time constant exhibit stronger chosen value correlates. Black trace indicates a significant cluster of bins, corrected for multiple comparisons across time (see Materials and methods, p=0.0298). CPD (mean ± SE) for chosen value was calculated by multiple linear regression analysis (see Materials and methods). Figure 3—figure supplement 1 shows a rank correlation of resting time constant with chosen value coding across time.

https://doi.org/10.7554/eLife.18937.010

We then repeated the analysis in Figure 3B separately within each of the three regions. We found that the relationship between high τ and chosen value coding was particularly prominent in OFC and ACC, but observed no significant difference in the chosen value coding between populations with high/low τ in DLPFC (Figure 4). If the chosen value correlates were purely related to the dynamics of choice processes, we might expect them to return to baseline levels after the choice had been executed. Although this was largely the case, a degree of chosen value coding persisted until reward outcome, particularly within OFC (Figure 1B). Within OFC, we found that persistent coding of chosen value from choice to outcome was more evident within the high τ neuronal population than within the low τ population, particularly during the experience of reward delivery (Figure 4; permutation test at time of outcome (see Materials and methods), p=0.0082). Such sustained coding of chosen value from choice through outcome was not present in ACC and DLPFC. This implies a unique neuronal signature within OFC which could contribute to the linking of choices to outcomes, a process critical for learning.

Figure 4 (with 1 supplement)
Orbitofrontal neurons with higher resting time constant maintain a representation of chosen value from choice through the experience of reward delivery.

As in Figure 3B, a median split of neurons by their resting time constant was performed within each PFC area. The coefficient of partial determination (CPD) for chosen value in high time constant (blue) and low time constant (red) neurons is plotted timelocked to both choice and reward onset. Chosen value explained more of the variance in neuronal firing in the OFC neurons with a higher time constant both at choice (p=0.0066) and shortly after reward delivery (p=0.0082). Chosen value is therefore maintained across the trial within OFC, but returns to baseline before the next trial begins. CPD (mean ± SE) for chosen value was calculated by multiple linear regression analysis (see Materials and methods). Figure 4—figure supplement 1 shows a rank correlation of resting time constant with chosen value coding during the decision phase and reward delivery.

https://doi.org/10.7554/eLife.18937.012

Given the above result, we sought to address whether the same OFC neurons were signalling chosen value at the time of both choice and outcome. We performed a cross-temporal pattern analysis on data from the OFC (Kennerley et al., 2011; Stokes et al., 2013). This involves cross-correlating the chosen value regression coefficients of the entire neuronal population at all of the different time bins. If the same neurons encode chosen value at timepoints t and t+δt, one would expect a high correlation between these two timepoints; conversely, coding of chosen value by different neural ensembles would yield a far smaller, or zero, correlation. By examining the matrix of correlation coefficients at all possible timelags, different types of population neural coding can be revealed (such as transient, reactivation, or sustained coding; see Figure 5A). To avoid this analysis being confounded by noise correlations, we performed a ‘split half’ cross correlation analysis, calculating the regression coefficients for chosen value separately for odd and even trials.
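The split-half cross-temporal analysis described above can be sketched as follows. This is an illustrative reconstruction rather than the authors' code: it assumes a simple one-regressor slope in place of the full regression model, equal trial counts for every neuron, and simulated data in which one tuning pattern drives the first four time bins and an independent pattern drives the last four.

```python
import numpy as np

def chosen_value_betas(rates, values):
    """Least-squares slope of firing rate on chosen value, per neuron and bin.

    rates: (n_neurons, n_trials, n_bins); values: (n_neurons, n_trials).
    """
    v = values - values.mean(axis=1, keepdims=True)
    r = rates - rates.mean(axis=1, keepdims=True)
    return np.einsum("nt,ntb->nb", v, r) / np.sum(v ** 2, axis=1, keepdims=True)

def split_half_cross_temporal(rates, values):
    """Correlate odd-trial betas at bin i with even-trial betas at bin j."""
    odd = chosen_value_betas(rates[:, 0::2], values[:, 0::2])
    even = chosen_value_betas(rates[:, 1::2], values[:, 1::2])
    n_bins = odd.shape[1]
    # Rows of odd.T / even.T are time bins; observations are neurons.
    full = np.corrcoef(odd.T, even.T)
    return full[:n_bins, n_bins:]

# Hypothetical population with two independent coding patterns.
rng = np.random.default_rng(0)
n_neurons, n_trials, n_bins = 200, 100, 8
values = rng.normal(size=(n_neurons, n_trials))
w_early = rng.normal(size=n_neurons)
w_late = rng.normal(size=n_neurons)
tuning = np.repeat(np.column_stack([w_early, w_late]), 4, axis=1)
noise = rng.normal(size=(n_neurons, n_trials, n_bins))
rates = values[:, :, None] * tuning[:, None, :] + noise

ctc = split_half_cross_temporal(rates, values)
```

Because the two halves use disjoint trials, trial-by-trial noise shared across time bins cannot inflate the correlation. In the toy data the within-epoch blocks of `ctc` are strongly positive while the block linking the two epochs is near zero, which corresponds to the absence of a sustained code; a population that kept the same tuning throughout would fill the whole matrix.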

Figure 5 (with 2 supplements)
The same OFC neurons correlate strongly with chosen value at both choice and during reward delivery, but only those with high time constants.

(A) Schematic representing the cross-temporal pattern analysis. Each pixel represents a correlation coefficient between two population vectors. Entries into the vectors contain each neuron’s chosen value regression coefficient at Time T and at Time T + δt. If the chosen value correlates are consistent across the neuronal population at the two distinct time points, there will be a strong cross-temporal correlation (red colour). At two points close in time, chosen value correlates of each neuron will inevitably be similar. If these correlates are consistent for only a short period of time, there will be a transient population code; whereas if each neuron’s chosen value correlate is consistent for a prolonged period, there will be a sustained population code. If each neuron within a population correlates with chosen value at two separate points of a trial (e.g. choice and outcome), in the absence of sustained coding bridging the two, there is a reactivation population code. (B) Cross-temporal pattern analysis of OFC neurons (n = 214). There is clear evidence for sustained coding of the chosen value at choice (top left), as well as before and throughout outcome (bottom right), reflected by strong correlations extending off the diagonal of the plot. Blue lines indicate a significant area of cross-correlation (p<0.05, see Materials and methods). There is also sustained coding of the chosen value signal from choice through outcome, shown by a strong cross-temporal correlation both prior to (grey dashed box) and during reward (black dashed box). Within the dashed areas, blue lines indicate a significant area of cross-correlation (p<0.05, see Materials and methods). (C and D) The analysis within the black dashed inset (bottom left quadrant in B) is then repeated in high (C) and low (D) time constant OFC neurons separately. The sustained coding is present specifically in high time constant cells (largest cluster of cross correlation, p=0.0002), but absent in low time constant cells (p=0.2248; permutation test, see Materials and methods). See also Figure 5—figure supplements 1 and 2: sustained chosen value correlates are present at choice and outcome within DLPFC and ACC, but sustained coding from choice through outcome is absent. Sustained coding between choice and outcome was much stronger in OFC than in DLPFC or ACC (permutation tests; OFC v DLPFC, p=0.0008, OFC v ACC, p<0.0001, see Materials and methods).

https://doi.org/10.7554/eLife.18937.014

During the choice epoch, there was, unsurprisingly, evidence for on-diagonal coding (top left quadrant of Figure 5B). The OFC neuronal population code was also persistent across time during this epoch (warm off-diagonal elements in top left quadrant of Figure 5B), and even more so during the outcome epoch (warm off-diagonal elements in bottom right quadrant of Figure 5B). This sustained activity reflects the notion that dynamical decision processes within the OFC population may take place over several hundreds of milliseconds. Crucially, however, there was also evidence for sustained coding across epochs: the same neuronal population in OFC at choice encoded chosen value from at least 1000 ms before outcome through to 1000 ms after outcome (warm colours in Figure 5B, grey and black dashed boxes, permutation tests (see Materials and methods), largest clusters p<0.0001); such sustained coding of value from choice through outcome was absent within DLPFC (Figure 5—figure supplement 1A) and ACC (Figure 5—figure supplement 2A) neuronal populations. Within OFC, this sustained population code appeared most prominent in the neurons with a high resting time constant τ (Figure 5C), but absent in those with a low τ (Figure 5D). Note, however, that this difference should be interpreted cautiously, as a formal comparison of cluster size within the high and low τ populations (using a non-parametric permutation test, see Materials and methods) was not significant (p=0.59). Nonetheless, the sustained population code from choice through outcome was much stronger in OFC (Figure 5B–C) than in both the ACC and DLPFC populations (Figure 5—figure supplements 1 and 2; permutation tests, OFC v DLPFC, p=0.0008; OFC v ACC, p<0.0001; see Materials and methods). This demonstrates that OFC neurons with persistent activity at rest encode a 'sustained' representation of chosen value until an expected outcome is experienced, and that this neural signature appears unique to OFC.

Discussion

We have shown that characterising the temporal receptive field of integration of individual PFC neurons based upon their resting activity has significant predictive power for describing their role in decision-making computations. These include the accumulation of evidence during choice, and the persistence of value encoding until the experience of outcome delivery.

Circuits within the prefrontal cortex are endowed with several features that may support persistent activity. These include complex pyramidal cell morphology, strong reciprocal connections, slow-decaying NMDA-Receptor transmission and augmenting synapses (Wang, 2001; Elston, 2003; Wang et al., 2006, 2008; Freeman, 1995; Wang et al., 2013). These factors may account for both the prolonged resting stability within PFC, and the ability of its neurons to support computations that subserve flexible cognition (Miller et al., 1996). However, there are different cell-classes within the PFC, with substantial heterogeneity in their morphology, synapses and expression of slow-decaying NMDA-Receptors (Wong and Wang, 2006; Zaitsev et al., 2009; Wang et al., 2013). When randomly sampling neurons within the macaque PFC, the morphology, cell-type, cortical layer and synaptic features are unknown. Recorded neurons are therefore likely sampled from separate subnetworks with differing resting stabilities and distinct roles in cognitive processing (Wang et al., 2013). This may explain the heterogeneity we observed across PFC neurons in both resting activity and involvement in decision-making computations. Recent evidence has shown diversity in functional responses of PFC neurons dependent upon the cell-type and cortical layer in which they were located (Zhou et al., 2012; Pinto and Dan, 2015).

Most importantly in this study, we demonstrated that neurons with higher resting time constants had strong chosen value correlates at choice. Following on from our previous work (Hunt et al., 2015) – where we demonstrated that chosen value correlates can arise indirectly from the dynamics of decision processes – our result implies that neurons with more persistent resting activity are more involved in value-based choice. This provides new experimental evidence to support computational theories which attribute evidence integration to strongly recurrent attractor networks (Wang, 2002; Wong and Wang, 2006). Neurons located within these reverberant PFC subnetworks would be expected to have both higher time constants and stronger value correlates. It also indicates that such models need refinement if they are to encompass the heterogeneous correlates of decision variables that we and others have observed (Kennerley et al., 2009; Meister et al., 2013). Our findings facilitate several testable predictions for research into single-neuron mechanisms of decision making. For perceptual decisions, such as the random dot-motion task, which involve the integration of evidence over time more explicitly than our cost-benefit decision paradigm, we would predict task-related neurons would also have high time constants (Gold and Shadlen, 2007).

Cross-temporal pattern analysis (Stokes et al., 2015) provides a powerful tool to allow for the interrogation of maintained activity within neuronal populations. In addition to decision-making, computational models of working memory also rely upon stable, persistent activity within richly reverberant networks for the retention of information across delays (Wang, 1999). Our data showing that evidence maintenance is indeed fulfilled by neurons with higher time constants concurs with this hypothesis. The ability to maintain a representation of chosen value across delays may explain why OFC is essential for delay-based decision making (Rudebeck et al., 2006) and why OFC damage causes decision-making and credit assignment deficits (Rudebeck et al., 2008; Noonan et al., 2010; Walton et al., 2010; Camille et al., 2011; Chau et al., 2015).

Our data on single neuron time constants have provided new insights into potential credit assignment mechanisms within the orbitofrontal cortex. Several imaging and lesion studies have argued that the OFC is involved in the assignment of credit during learning and decision-making (Walton et al., 2010; Takahashi et al., 2011; Chau et al., 2015; Akaishi et al., 2016). Single neuron studies have demonstrated OFC cells encode the reward identity across delays (Lara et al., 2009), encode specific outcome features during learning (Raghuraman and Padoa-Schioppa, 2014; Lopatina et al., 2015), and in some cases the same neurons are involved in both choice and outcome processes (Kennerley and Wallis, 2009). Indeed, there is a large body of evidence suggesting OFC signals outcome expectancies (Rangel and Hare, 2010; Schoenbaum et al., 2010). However, despite ideas that OFC is critical for credit assignment during learning, we are not aware of any study that has demonstrated what a neuronal signature of credit assignment might resemble. Here we show that OFC neurons with high time constants not only encode an integrated chosen value signal during choice, but that the same OFC neurons maintain this representation through to the experience of an outcome. This neural signature, when combined with a representation of the chosen stimulus identity, which is also encoded in OFC (Raghuraman and Padoa-Schioppa, 2014; Lopatina et al., 2015), could be a key computation for credit assignment processes.

As well as our findings at the single-neuron level, our results reiterate the value of assigning timescales at the level of a cortical area (Murray et al., 2014). We replicated the findings of Murray et al. (2014) showing that the anterior cingulate cortex (ACC) had the longest timescale within the PFC regions studied. It is possible the ACC may be supporting extended cognitive processes that our experimental paradigm was not designed to capture. These include the encoding or integrating of reward, planning and/or choice information across multiple trials (Matsumoto et al., 2007; Seo and Lee, 2007; Bernacchia et al., 2011; Hayden et al., 2011; Kennerley et al., 2011; Stoll et al., 2016). Future studies might explore the timescales of other prefrontal regions proposed to have unique roles in storing information across multiple trials, such as frontal polar cortex (Boorman et al., 2009; Donoso et al., 2014).

We demonstrate that calculating the decay in a neuron’s intrinsic resting-state autocorrelation can provide a powerful tool for predicting functional properties during cognitive tasks. Our findings therefore have important implications for how neurophysiological datasets are collected and analysed. One current method of avoiding variability in neuronal responses during cognitive tasks is pre-selection of neurons based upon their response properties; neurons with stable, persistent responses on memory guided saccade tasks are preferentially selected for analysis in decision-making tasks (Roitman and Shadlen, 2002; Huk and Shadlen, 2005; Yang and Shadlen, 2007; Mante et al., 2013; Kira et al., 2015). This method may lead investigators to record from neurons with longer temporal receptive fields, as evidenced (Murray et al., 2014) by the higher population-level time constants within the lateral intraparietal area (LIP) when neurons are screened prior to recording (Freedman and Assad, 2006) versus when they are not (Seo et al., 2009). A more unbiased characterisation of the heterogeneity of neuronal responses may be obtained by recording from all encountered neurons and categorising them post-hoc, as is more common practice in PFC studies (Freedman et al., 2001; Padoa-Schioppa and Assad, 2006; Kim et al., 2008; Kennerley et al., 2009; Hanks et al., 2015). In the context of decision-making, this has highlighted several ‘non-classical’ neuronal response profiles in regions such as LIP (Meister et al., 2013). Indeed, even in spite of pre-screening neurons prior to the task, substantial heterogeneity in task-related responses can nonetheless remain (Premereur et al., 2011). A more complete understanding of decision-making computations requires us to understand the roles of all of the neurons in these decision processes.

In summary, we have shown that functional specialisation for temporally extended computations predicts the involvement of PFC neurons in specific aspects of value-guided decision making. We anticipate that this approach may become significant in predicting the role of neurons in many other temporally extended computations dependent upon prefrontal cortex. These might include working memory (Wang et al., 2013), strategic (Seo et al., 2014) and rule-based reasoning (Buschman et al., 2012), and foraging behaviours (Hayden et al., 2011).

Materials and methods

Neurophysiological procedures, task structure and regression analysis of single-neuron responses have been reported previously (Hosokawa et al., 2013; Hunt et al., 2015). In brief, four male rhesus macaques served as subjects. Recordings were taken from dorsolateral prefrontal cortex (DLPFC), orbitofrontal cortex (OFC) and the anterior cingulate cortex (ACC). The sample size of neurons recorded was therefore predetermined from this pre-existing dataset. The number of neurons and cortical areas recorded from each of the four subjects have been reported previously (Hosokawa et al., 2013). The regression model for analysing correlates of chosen value was the same as defined previously (Hunt et al., 2015).

Null hypothesis for coefficient of partial determination (Figure 1)

A ‘null hypothesis’ test for the coefficient of partial determination (CPD) was developed to aid interpretation of the results. For each behavioural session, a single regressor of interest (e.g. chosen value, Figure 1A and B; chosen benefit, Figure 1—figure supplement 2A; chosen cost, Figure 1—figure supplement 2B) was shuffled across trials and a ‘permuted’ CPD calculated. This procedure was repeated 1000 times. For each neuron, at each time point, the permuted CPD was averaged across all of the permutations. The null hypothesis CPD for a cortical area was set at the upper bound of the 95% confidence interval across the population.
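The shuffling procedure for a single neuron at a single time point can be sketched as below. The sketch assumes the standard definition of CPD, (SSE_reduced − SSE_full) / SSE_reduced, and a hypothetical three-column design matrix; the regression model actually used is defined in Hunt et al. (2015) and is not reproduced here. The final step, taking the upper bound of the 95% confidence interval across a cortical area's population, is omitted.

```python
import numpy as np

def sse(X, y):
    """Residual sum of squares of an ordinary least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def cpd(X, y, col):
    """Coefficient of partial determination for column `col` of X."""
    sse_reduced = sse(np.delete(X, col, axis=1), y)
    return (sse_reduced - sse(X, y)) / sse_reduced

def permuted_cpd(X, y, col, n_perm=1000, seed=0):
    """Mean CPD after shuffling the regressor of interest across trials."""
    rng = np.random.default_rng(seed)
    values = np.empty(n_perm)
    for i in range(n_perm):
        X_shuffled = X.copy()
        X_shuffled[:, col] = rng.permutation(X[:, col])
        values[i] = cpd(X_shuffled, y, col)
    return float(values.mean())

# Hypothetical neuron: 200 trials, intercept + chosen value + a nuisance regressor.
rng = np.random.default_rng(1)
n_trials = 200
chosen_value = rng.normal(size=n_trials)
nuisance = rng.normal(size=n_trials)
X = np.column_stack([np.ones(n_trials), chosen_value, nuisance])
firing_rate = 5.0 + 2.0 * chosen_value + rng.normal(size=n_trials)

observed = cpd(X, firing_rate, col=1)
null_mean = permuted_cpd(X, firing_rate, col=1, n_perm=200)
```

Shuffling only the column of interest leaves the other regressors intact, so the permuted CPD estimates how much variance that regressor would appear to explain by chance given the trial count and model size.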

Calculation of autocorrelograms (Figure 2)

Single-neuron activity during a 1 s fixation period was used to assign time constants. Single unit responses were time locked to the onset of the fixation period of successfully completed trials to create rasters (lasting 1 s from the onset of fixation). The rasters were divided into 20 separate, successive 50 ms bins. The spike count for each neuron within each bin was computed for each trial. We calculated the across-trial correlation of spike counts between all of the bins using Pearson’s correlation coefficient. For each individual neuron, this produced an autocorrelation matrix when plotted as a function of trial time (e.g. Figure 2A left side), or an exponential decay when plotted as a function of time lag between bins (e.g. Figure 2A right side).
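As a rough sketch of the binning and across-trial correlation steps just described (an illustration under stated assumptions, not the published MATLAB scripts): spike times are assumed to be given in seconds from fixation onset, and averaging the matrix entries that share a lag is an illustrative way of collapsing the matrix onto a time-lag axis. All function names are hypothetical.

```python
import math


def bin_spike_counts(spike_times, n_bins=20, bin_width=0.05):
    """Count spikes (seconds from fixation onset) in successive bins."""
    counts = [0] * n_bins
    for t in spike_times:
        idx = int(t // bin_width)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


def pearson(x, y):
    """Pearson correlation; NaN if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def autocorrelation_matrix(counts):
    """counts[trial][bin] -> matrix[i][j] of across-trial correlations."""
    n_bins = len(counts[0])
    cols = [[trial[b] for trial in counts] for b in range(n_bins)]
    return [[pearson(cols[i], cols[j]) for j in range(n_bins)]
            for i in range(n_bins)]


def autocorrelation_by_lag(matrix):
    """Average entries sharing the same lag (in bins), skipping NaNs."""
    n = len(matrix)
    out = {}
    for lag in range(1, n):
        vals = [matrix[i][i + lag] for i in range(n - lag)
                if not math.isnan(matrix[i][i + lag])]
        if vals:
            out[lag] = sum(vals) / len(vals)
    return out
```

Multiplying each lag (in bins) by the 50 ms bin width gives the time lag used on the x-axis of the decay plots.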

The decay of the autocorrelation with increasing time lag between bins was fitted with an exponential decay function (Murray et al., 2014):

(1) $R(k\Delta) = A\left[\exp\left(-\frac{k\Delta}{\tau}\right) + B\right]$

Here kΔ is the time lag between time bins (50 to 950 ms), τ is the time constant of the cortical area, A is an amplitude and B is an offset. Neurons from all areas, particularly ACC, showed evidence of lower correlation values at the shortest time lag (50 ms; Figure 2—figure supplement 2). This may reflect refractoriness or negative adaptation (Murray et al., 2014). To overcome this, fitting started from the largest reduction in autocorrelation (between two consecutive time bins) onwards.
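One way to fit Equation 1 is sketched below. It is an assumption-laden illustration rather than the optimiser used in the study: for a fixed τ the model is linear in A and A·B, so τ is found by a simple grid search and the other two parameters by least squares. The helper `fit_start_index` implements one reading of "the largest reduction in autocorrelation", namely that fitting starts at the lag immediately before the largest single-step drop. Both function names are hypothetical.

```python
import math


def fit_exponential_decay(lags_ms, r_values, tau_grid=None):
    """Least-squares fit of R = A * (exp(-lag / tau) + B).

    tau is chosen by grid search (illustrative choice); for each tau,
    A and A*B follow from a simple linear regression. Returns (tau, A, B).
    """
    if tau_grid is None:
        tau_grid = range(10, 2001, 5)  # candidate time constants in ms
    n = len(lags_ms)
    my = sum(r_values) / n
    best = None
    for tau in tau_grid:
        e = [math.exp(-lag / tau) for lag in lags_ms]
        me = sum(e) / n
        see = sum((v - me) ** 2 for v in e)
        if see == 0:
            continue
        a = sum((v - me) * (r - my) for v, r in zip(e, r_values)) / see
        c = my - a * me  # c corresponds to A * B
        sse = sum((a * v + c - r) ** 2 for v, r in zip(e, r_values))
        if best is None or sse < best[0]:
            b = c / a if a != 0 else float("nan")
            best = (sse, tau, a, b)
    return best[1], best[2], best[3]


def fit_start_index(r_values):
    """Index of the lag immediately before the largest single-step drop."""
    drops = [r_values[i] - r_values[i + 1] for i in range(len(r_values) - 1)]
    return drops.index(max(drops))
```

A typical use would slice both the lags and the autocorrelation values from `fit_start_index(r_values)` onwards before calling `fit_exponential_decay`.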

Assigning a time constant to single neurons (Figure 2A–B, Figure 2—figure supplement 1)

For most of the key analyses, individual parameters of the autocorrelation decay function in Equation 1 were estimated for each neuron. Cells with an autocorrelation function poorly fitted by an exponential decay were excluded from the analysis (see Figure 2—figure supplement 1 for examples). Initially neurons failing to meet a set of objective criteria were removed (176/857). These criteria were as follows:

  1. Fixation firing rate of greater than 1 Hz

  2. Decline in the autocorrelation function in the first 250 ms of time lags

  3. No 50 ms time-bin within the fixation period with zero spikes across all recorded trials

  4. A and B parameters from Equation 1 cannot both be positive when the autocorrelation function is fitted.

This was followed by a process of visual inspection by two blinded independent observers, where a further set of neurons were considered to possess autocorrelation functions poorly characterised by an exponential decay (235/857 neurons). The autocorrelation functions of all included / excluded neurons are available as supplementary material.

The remaining 446 neurons were assigned a time constant using expectation maximisation in a hierarchical (random effects) fitting procedure. The decay of their resting autocorrelation was fitted using Equation 1, with log(τ), A and B being estimated as a multivariate Gaussian across the neuronal population. Fitting started after the first reduction in autocorrelation between time bins. Neurons from each PFC area were fitted separately.

Comparing single neuron time constants across cortical areas

Single neuron time constants were log-transformed and grouped by cortical area (DLPFC; OFC; ACC). The variance of these groups was compared using Bartlett’s Test. Single neuron time constants were also grouped by cortical area and compared using a Kruskal-Wallis test.

Assigning time constant at the population level (Figure 2—figure supplements 2–4)

Autocorrelation as a function of trial-time and time lag can also be averaged across a population of neurons, prior to fitting Equation 1 (see Figure 2—figure supplements 2–4). In addition to the data lost due to incomplete trials, previous investigators have excluded a further proportion of trials due to the drifting resting firing rate of neurons over the course of a session (Murray et al., 2014; Nishida et al., 2014). As we intended to assign time constants to individual neurons, we decided that estimating autocorrelation from a restricted trial number would not provide the best estimation of spike-count autocorrelation. However, it is possible that our method artificially inflated autocorrelation due to drifting firing rates throughout a session. Therefore, as a control analysis, we filtered trials when firing rates drifted, using the same approach as Nishida et al. (2014). For each neuron, the total spike count during the fixation period of each trial was calculated. A sliding window of these spike counts for 100 trials was subdivided into 4 groups of 25 trials and entered into a Kruskal-Wallis test. By shifting this sliding window from the 1st to the last trial within a session, we obtained the longest sequence of trials in which activity did not differ significantly (p>0.005). This procedure reduced the number of trials used for estimating the autocorrelation function on average by 38.4%. When comparing the population level fits to data using the method reported above, very similar time constants were obtained (compare Figure 2—figure supplement 2 versus Figure 2—figure supplement 3).

Display matrix of chosen value correlates sorted by time constant (Figure 3A)

The coefficient of partial determination (CPD) for chosen value (see Hunt et al., 2015) across time for each PFC neuron (n = 446) was stacked into a matrix. The rows of the matrix (i.e. each individual neuron) were sorted by increasing time constant, and then smoothed across neurons with a Gaussian kernel, Full Width at Half Maximum=4.5 neurons (S.D. = 2).
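The sorting and smoothing step can be illustrated with the sketch below. It assumes a Gaussian kernel parameterised by its standard deviation in units of neurons, truncated at three standard deviations, with edge rows renormalised over the available neighbours. The truncation and edge handling are assumptions, since the text does not specify them, and the function names are hypothetical.

```python
import math


def gaussian_kernel(sd, radius=None):
    """Normalised Gaussian weights from -radius to +radius (in rows)."""
    if radius is None:
        radius = int(math.ceil(3 * sd))  # assumed truncation at 3 S.D.
    w = [math.exp(-0.5 * (k / sd) ** 2) for k in range(-radius, radius + 1)]
    total = sum(w)
    return [v / total for v in w]


def sort_and_smooth(cpd, taus, sd=2.0):
    """Sort cpd[neuron][time] by time constant, then smooth across neurons.

    Returns (order, smoothed), where order lists the original row indices
    in increasing order of time constant.
    """
    order = sorted(range(len(taus)), key=lambda i: taus[i])
    rows = [cpd[i] for i in order]
    kernel = gaussian_kernel(sd)
    radius = len(kernel) // 2
    n, n_time = len(rows), len(rows[0])
    smoothed = []
    for i in range(n):
        acc = [0.0] * n_time
        wsum = 0.0
        for k, w in enumerate(kernel):
            j = i + k - radius
            if 0 <= j < n:  # renormalise over rows that exist at the edges
                wsum += w
                for t in range(n_time):
                    acc[t] += w * rows[j][t]
        smoothed.append([v / wsum for v in acc])
    return order, smoothed
```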

Significance testing using cluster-based permutation tests (Figure 1A, Figure 1B, Figure 3B and Figure 4)

To identify significant clusters of chosen value coding whilst correcting for multiple comparisons across time, cluster based permutation tests were used (Nichols and Holmes, 2002).

In Figure 1A and B, a two-sample T-test compared the chosen value coefficient of partial determination (CPD) at each time bin between two given cortical areas. In Figure 3B and Figure 4, a two-sample T-test compared the chosen value CPD at each time bin between the median split of neurons with high and low time constants. The longest window of consecutive bins using an uncorrected (cluster-forming) threshold of p<0.01 within a pre-specified time window was then identified. The pre-specified time windows were as follows:

| Figure | Time window onset | Time window offset |
|---|---|---|
| Figure 1A, Figure 3B and 4 | Choice epoch onset | Choice epoch offset (1 s after choice epoch onset) |
| Figure 1B, Figure 4 | Reward onset | 1 s after reward onset |

The size of this cluster was compared to a null distribution constructed using a permutation test. Neurons assigned to either each cortical area (Figure 1), or high and low time constant groups (Figure 3B and Figure 4) were randomly permuted 10,000 times and the cluster analysis was repeated for each permutation. The length of the longest cluster for each permutation was entered into the null distribution. The true cluster size was significant at the p<0.05 or p<0.01 level (corrected) if the true cluster length exceeded the 97.5th percentile or 99.5th percentile of the null distribution, respectively.
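A compact sketch of this cluster-length permutation test is given below, under several labelled assumptions: a pooled-variance two-sample t statistic; a fixed critical value `t_crit` standing in for the p<0.01 cluster-forming threshold (2.58 is the large-sample approximation, not an exact t quantile); and a permutation p-value returned in place of the percentile comparison described in the text. Function names are hypothetical.

```python
import math
import random


def t_stat(a, b):
    """Pooled-variance two-sample t statistic."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ssa = sum((v - ma) ** 2 for v in a)
    ssb = sum((v - mb) ** 2 for v in b)
    sp2 = (ssa + ssb) / (na + nb - 2)
    if sp2 == 0:
        return 0.0 if ma == mb else math.copysign(math.inf, ma - mb)
    return (ma - mb) / math.sqrt(sp2 * (1 / na + 1 / nb))


def longest_cluster(group_a, group_b, t_crit):
    """Longest run of consecutive time bins with |t| above t_crit."""
    n_bins = len(group_a[0])
    best = run = 0
    for b in range(n_bins):
        t = t_stat([row[b] for row in group_a], [row[b] for row in group_b])
        run = run + 1 if abs(t) > t_crit else 0
        best = max(best, run)
    return best


def cluster_permutation_p(group_a, group_b, t_crit=2.58, n_perm=10000, seed=0):
    """Observed longest cluster and its permutation p-value.

    Each group is a list of neurons; each neuron is a list of CPD values
    across time bins. Group labels are shuffled across neurons.
    """
    rng = random.Random(seed)
    observed = longest_cluster(group_a, group_b, t_crit)
    pooled = list(group_a) + list(group_b)
    na = len(group_a)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if longest_cluster(pooled[:na], pooled[na:], t_crit) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```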

Multiple-linear regression

To further test the relationship of chosen value coefficient of partial determination (CPD) with resting time constant, the log-transformed time constant and log-transformed fixation firing rate, along with additional regressors to control for brain area, were regressed onto the log-transformed chosen value CPD of each neuron at the time of the maximal across-area population CPD (410 ms, see Figure 3B).

Cross-temporal pattern analysis (Figure 5)

To assess the maintenance and re-emergence of chosen value correlates throughout a trial, we performed a population cross-temporal pattern analysis (Kennerley et al., 2011; Stokes et al., 2013). This used the same regression model as before (Hunt et al., 2015), except that the regression coefficient (Z-score) for each neuron’s chosen value coding was calculated separately for odd and even trials. This ‘split-half’ method was utilised to prevent the analysis being confounded by noise correlations.

A population vector (V), whose n entries were the chosen value correlates of the n cells, was produced for each time point. The population vectors at all of the different time points were then cross-correlated to produce a matrix of correlation coefficients (Figure 5A). Each matrix of correlation coefficients was averaged across the diagonal in order for the data to reflect both odd-to-even and even-to-odd trial projections. This analysis was performed on all of the cells within a cortical area, with the analysis also performed separately following a median split of within-area time constant (Figure 5B–D, Figure 5—figure supplements 1–2). The consistency of the population code between choice and outcome within OFC cells was of particular interest. Therefore, data from the 1 s choice epoch were correlated against two 1 s periods directly preceding and following reward onset, with the results displayed within the grey and black dashed boxes respectively in Figure 5B and Figure 5—figure supplements 1A and 2A for all cells, and separately for high and low time constants in Figure 5C–D and Figure 5—figure supplements 1B–C and 2B–C.
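The split-half cross-temporal correlation can be sketched as follows. The inputs `beta_odd` and `beta_even` are hypothetical arrays of chosen value regression coefficients, indexed as [neuron][time], estimated from odd and even trials respectively. "Averaged across the diagonal" is interpreted here as averaging the matrix with its transpose.

```python
import math


def pearson(x, y):
    """Pearson correlation; NaN if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def cross_temporal_matrix(beta_odd, beta_even):
    """Symmetrised time-by-time correlation of population vectors."""
    n_time = len(beta_odd[0])
    # Population vector at each time point: one entry per neuron.
    odd = [[row[t] for row in beta_odd] for t in range(n_time)]
    even = [[row[t] for row in beta_even] for t in range(n_time)]
    c = [[pearson(odd[i], even[j]) for j in range(n_time)]
         for i in range(n_time)]
    # Average odd-to-even and even-to-odd projections.
    return [[0.5 * (c[i][j] + c[j][i]) for j in range(n_time)]
            for i in range(n_time)]
```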

To demonstrate sustained population coding of chosen value correlates during choice, a cluster-based permutation test was used (Nichols and Holmes, 2002). All correlation coefficients with an uncorrected p<0.01 were highlighted. Any area of interconnecting pixels was defined as a true cluster. The ordering of all of the population vectors was then randomised and the analysis repeated. This permutation occurred 10,000 times and produced a null distribution of cluster sizes. True clusters were significant to the p<0.05 (0.01) level if the area of interconnecting pixels exceeded the 97.5th percentile (99.5th) of those in the null distribution.
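Identifying "any area of interconnecting pixels" amounts to connected-component labelling on the thresholded matrix. The sketch below assumes 4-connectivity (pixels sharing an edge); the text does not state which connectivity was used, and the function name is hypothetical.

```python
def largest_cluster(mask):
    """Size of the largest 4-connected region of True pixels."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    best = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Depth-first flood fill from this unvisited significant pixel.
            stack, size = [(r, c)], 0
            seen[r][c] = True
            while stack:
                y, x = stack.pop()
                size += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            best = max(best, size)
    return best
```

Here `mask[i][j]` would be True wherever the correlation coefficient at that pair of time points passes the uncorrected threshold; the same function can be applied to each permuted matrix to build the null distribution of cluster sizes.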

This analysis was also repeated for the 1 by 1 s periods of reward onset versus reward onset; choice onset versus 1 s prior to reward onset; choice onset versus 1 s following reward onset; 1 s pre-reward onset versus 1 s following reward onset.

Comparing sustained coding between cortical areas and high/low time constant neurons (Figure 5)

To compare the sustained coding present from choice through reward delivery between different cortical areas, a permutation test was performed. The black dashed area of the cross-temporal population correlation matrices of Figure 5B, Figure 5—figure supplement 1A and Figure 5—figure supplement 2A were extracted. For each pair of brain areas, the cross-temporal correlation coefficients at each corresponding pixel were compared using Fisher’s r-to-Z transformation. All pixels which had correlation coefficients which were significantly different between brain areas (with an uncorrected p<0.01) were highlighted. Any area of interconnecting pixels was defined as a true cluster. The largest area of interconnecting pixels was identified and defined as the ‘Largest True Difference Cluster’. The assignment of neurons to brain areas was then shuffled and the analysis repeated 10,000 times to produce a null distribution of Difference Cluster sizes, against which the true cluster size was compared. The test was performed independently for OFC v DLPFC, OFC v ACC and DLPFC v ACC.
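The pixel-wise comparison of two correlation coefficients can be illustrated with the standard Fisher r-to-Z test for independent correlations. In this sketch `n1` and `n2` are assumed to be the numbers of neurons contributing to each population vector, and the function name is hypothetical.

```python
import math


def fisher_z_difference(r1, n1, r2, n2):
    """Two-tailed Fisher r-to-Z test for two independent correlations.

    Returns (z, p), where p comes from the standard normal distribution.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p
```

Pixels with p below the uncorrected threshold would then form the mask from which the largest difference cluster is taken.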

A similar test was performed to compare high vs. low time constant neurons (i.e. to compare Figure 5C vs. Figure 5D); except in the permuted data, neurons were shuffled between high/low groups - as opposed to between different brain areas.

Data availability

Data (and MATLAB scripts to reproduce the analyses shown in this paper) are available from the Dryad Digital Repository: http://dx.doi.org/10.5061/dryad.5b331 (Cavanagh et al., 2016)

References

  36. Miller EK, Erickson CA, Desimone R (1996) Neural mechanisms of visual working memory in prefrontal cortex of the macaque. Journal of Neuroscience 16:5154–5167.
  47. Roitman JD, Shadlen MN (2002) Response of neurons in the lateral intraparietal area during a combined visual discrimination reaction time task. Journal of Neuroscience 22:9475–9489.
  63. Wang XJ (1999) Synaptic basis of cortical persistent activity: the importance of NMDA receptors to working memory. Journal of Neuroscience 19:9587–9603.

Decision letter

  1. Michael J Frank
    Reviewing Editor; Brown University, United States

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "Autocorrelation structure at rest predicts value correlates of single neurons during reward-guided choice" for consideration by eLife. Your article has been reviewed by two peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Eve Marder as the Senior Editor. The following individual involved in review of your submission has agreed to reveal his identity: Daeyeol Lee (Reviewer #1).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

This paper builds on the authors' earlier analysis of the single-neuron data previously collected in three different regions of the primate prefrontal cortex (DLPFC, ACC, and OFC), now focusing on the relationship between the intrinsic timescale (time constant) of spiking activity and their role in value-based decision making. In particular, they found that neurons in the ACC and OFC with longer time constants during fixation (as measured by spike rate autocorrelation) are more likely to encode chosen value signals. Moreover, OFC neurons tended to encode the chosen value signals more strongly during the outcome period compared to the neurons in the DLPFC and ACC. The cross temporal correlation analysis also revealed that the chosen value coding in the OFC during the choice and outcome epochs was consistent. The authors suggested that this might be important for the proposed role of the OFC in resolving temporal credit assignment.

Essential revisions:

Overall, the findings reported in this manuscript are very timely. They are nice additions to the previous work from the authors, and provide important new insights into the neural mechanism of decision making. Also, the manuscript is written clearly and easy to follow. Nevertheless, both reviewers identified some important issues that need to be clarified.

1) It is not clear how the persistent coding of chosen value signals can be used for, or reflect, the resolution of the temporal credit assignment problem. For resolving the temporal credit assignment problem, the brain must recognize one of the previous actions or previously visited states that is related to a particular outcome after a delay. How can the chosen value signals, rather than the signals directly related to previous actions or states, be used for this purpose? The focus on the temporal credit assignment problem also seems a bit inconsistent with the prediction of the authors from the Wang model about the relationship between the timescale and value coding. In other words, their model of mutual inhibition predicts this relationship, even though it is not clear how that model resolves the temporal credit assignment problem.

2) Similarly, the Discussion focuses mostly on the function of the OFC, but the results from this manuscript and Murray et al. showed that the longest time scale is seen in the ACC. Therefore, it might be helpful to include some discussion about the possible function of the long time scale of ACC activity. One possibility is that ACC might play a more important role in integrating signals across multiple trials, as suggested by Seo and Lee (2007) and Bernacchia et al. (2011).

3) Tests of the association between time constant and chosen-value coding mainly use a median split on the time constant tau. However, it doesn't look like tau values fall into discrete high and low clusters (there's no apparent discontinuity at the median in Figure 3A, right-hand side). The rank correlation test mentioned in the third paragraph of the Results seems like a much more natural approach. What's the justification for not using the rank correlation for all the analyses, i.e. the tests of the entire time course, of individual brain regions, and of the outcome phase? As it stands the correlation analysis is confined to a single time point, and the criterion for choosing this time point is vague ("maximal population response"). Does this mean (1) maximum spike rate, (2) maximum population-average CPD, or (3) maximum high-versus-low effect in Figure 3B? #1 seems like the most natural reading, but #2 is what I would guess given the context (#3 would be circular).

4) In a few places the paper infers differences because an effect reaches significance in one condition but not another. A direct contrast between the two conditions is generally more appropriate in such cases. This applies to: (1) greater outcome-related value coding in OFC than in DLPFC/ACC (Results, fourth paragraph); (2) greater reactivation coding in high-tau than low-tau neurons (Results, last paragraph); (3) greater reactivation coding in OFC than in DLPFC/ACC (Figure 5—figure supplements 1–2).

5) For the cross-temporal correlation analysis, the authors draw a distinction between "sustained" and "reactivation" coding. But this distinction sometimes gets blurry. The evidence mainly supports reactivation coding, but the conclusion is that OFC is "maintaining a representation of chosen value until an expected outcome is experienced" (Results, last paragraph), which sounds more like sustained coding. Similarly, earlier (Results, fourth paragraph) the paper concludes that OFC codes value through the choice-outcome interval, but only a post-outcome epoch is actually tested.

6) The paper should at least briefly address the distinction between "chosen value" and what one might call "outcome value" – the size of the juice reward. These aren't identical, since chosen value also incorporates an effort/delay requirement. But they may be correlated. Can the authors rule out that OFC is merely encoding the juice magnitude in the outcome phase? That is, is there direct evidence it also encodes the (already completed) effort or delay requirement?

https://doi.org/10.7554/eLife.18937.022

Author response

Essential revisions:

Overall, the findings reported in this manuscript are very timely. They are nice additions to the previous work from the authors, and provide important new insights into the neural mechanism of decision making. Also, the manuscript is written clearly and easy to follow. Nevertheless, both reviewers identified some important issues that need to be clarified.

1) It is not clear how the persistent coding of chosen value signals can be used for, or reflect, the resolution of the temporal credit assignment problem. For resolving the temporal credit assignment problem, the brain must recognize one of the previous actions or previously visited states that is related to a particular outcome after a delay. How can the chosen value signals, rather than the signals directly related to previous actions or states, be used for this purpose? The focus on the temporal credit assignment problem also seems a bit inconsistent with the prediction of the authors from the Wang model about the relationship between the timescale and value coding. In other words, their model of mutual inhibition predicts this relationship, even though it is not clear how that model resolves the temporal credit assignment problem.

This comment prompted us to consider more carefully the implications of a sustained chosen value signal from choice through outcome. We agree that a chosen value signal could only play an important part in resolving the temporal credit assignment problem in combination with a representation of the chosen stimuli/action to which credit must be assigned.

In our original paper (Hunt et al., 2015), we found that a signal for chosen action was present in dorsolateral prefrontal cortex (DLPFC) in the latter part of the choice epoch. We therefore tested whether chosen action coding was also present at the time of reward delivery. Just as is evident during the choice epoch, chosen action signals during the outcome epoch are predominantly found in DLPFC:

Author response image 1
Population averages when chosen action was regressed onto firing rate during reward delivery.

DLPFC showed stronger chosen action correlates following reward onset than ACC and OFC (permutation tests; DLPFC v OFC, p = 0.0006, DLPFC v ACC, p = 0.0002; see Methods). Dashed lines mark the null hypothesis level for CPD in each cortical area (see Methods).

https://doi.org/10.7554/eLife.18937.017

We also tested whether the presence of this chosen action signal, which might contribute to the resolution of the temporal credit assignment problem, was predominant within a subset of neurons with a particular resting time constant. We found that the strongest coding of chosen action during outcome was found within the high time constant neurons within DLPFC:

Author response image 2
Dorsolateral prefrontal cortex neurons with higher resting time constant code chosen action more strongly around reward onset.

(A) As in Figure 4, a median split of neurons by their resting time constant was performed within DLPFC. The coefficient of partial determination (CPD) for chosen action in high time constant (blue) and low time constant (red) neurons is plotted timelocked to reward onset. CPD (mean ± SE) for chosen action was calculated by multiple linear regression analysis (see Methods). (B) As in Figure 4—figure supplement 1, a rank correlation between resting time constant and chosen action coding is plotted. There was a positive correlation between resting time constant and the coefficient of partial determination (CPD) for chosen action at the time of the maximum population-average CPD during outcome (vertical purple line and asterisk, correlation coefficient = 0.1607, p = 0.0249).

https://doi.org/10.7554/eLife.18937.018

This provides some evidence that it may indeed be high time-constant cells that carry the most information about signals relevant for credit assignment at outcome: chosen value (in OFC), and chosen action (in DLPFC).

However, it is important to note that this task requires choices between different stimuli based on their values, rather than action values per se. Although there may have been some credit assignment of outcomes to actions, optimal credit assignment in this task would involve the assignment of values to the visual stimuli. It is possible that firing rates of OFC neurons could encode information about individual stimuli using a linear relationship that also reflected a value code. In our study, such a ‘stimulus identity’ code would be inherently confounded with value, so we do not believe it could be isolated from this dataset. Nevertheless, recent studies (Lopatina et al., 2015) have demonstrated that single OFC neurons can encode specific stimuli or stimulus-outcome relationships, so it appears plausible that both a stimulus-specific code and a sustained chosen value code could be utilized across the OFC population to support both choice and credit assignment processes.

Nonetheless, we accept the reviewer’s point and have therefore toned down the references to credit assignment throughout the manuscript (except when referencing other papers that address the question of credit assignment more directly). Instead, we refer to this signal as an important component of a system to update stored values.

For example, in the Abstract, we now say:

“Within orbitofrontal cortex, these neurons also sustain coding of chosen value from choice through the delivery of reward, providing a potential neural mechanism for maintaining predictions and updating stored values during learning.”

In the Introduction, we now say:

“This could be one component of a mechanism for credit assignment in learning, which is known to rely upon PFC and in particular orbitofrontal cortex (Walton et al., 2010; Takahashi et al., 2011; Chau et al., 2015; Jocham et al., 2016), with the other component being a representation of the chosen stimulus identity, which is also encoded by OFC neurons (Lopatina et al., 2015).”

In the Results, we now say:

“This implies a unique neuronal signature within OFC which could contribute to the linking of choices to outcomes, a process critical for learning.”

In the Discussion we now add:

“This neural signature – when combined with a representation of the chosen stimulus identity, which is also encoded in OFC (Lopatina et al., 2015) – could be a key computation for credit assignment processes.”

2) Similarly, the Discussion focuses mostly on the function of the OFC, but the results from this manuscript and Murray et al. showed that the longest time scale is seen in the ACC. Therefore, it might be helpful to include some discussion about the possible function of the long time scale of ACC activity. One possibility is that ACC might play a more important role in integrating signals across multiple trials, as suggested by Seo and Lee (2007) and Bernacchia et al. (2011).

Thank you for raising this comment. We agree that this is a point worth discussing. We have added the following paragraph to the Discussion:

“As well as our findings at the single-neuron level, our results reiterate the value of assigning timescales at the level of a cortical area (Murray et al., 2014). […] Future studies might explore the timescales of other prefrontal regions proposed to have unique roles in storing information across multiple trials, such as frontal polar cortex (Boorman et al., 2009; Donoso et al., 2014).”

3) Tests of the association between time constant and chosen-value coding mainly use a median split on the time constant tau. However, it doesn't look like tau values fall into discrete high and low clusters (there's no apparent discontinuity at the median in Figure 3A, right-hand side). The rank correlation test mentioned in the third paragraph of the Results seems like a much more natural approach. What's the justification for not using the rank correlation for all the analyses, i.e. the tests of the entire time course, of individual brain regions, and of the outcome phase? As it stands the correlation analysis is confined to a single time point, and the criterion for choosing this time point is vague ("maximal population response"). Does this mean (1) maximum spike rate, (2) maximum population-average CPD, or (3) maximum high-versus-low effect in Figure 3B? #1 seems like the most natural reading, but #2 is what I would guess given the context (#3 would be circular).

We thank the reviewers for raising this important point. As mentioned by the reviewers below, we felt that the median split was the most straightforward way to visualise the difference between chosen value correlates in both low and high time constant cells. We also tried to show a representation of chosen value correlates across the entire population in Figure 3A, sorted by time constant. We felt that it was important to show both of these, as they demonstrate that there are some cells with comparatively low time constants that do have some variance explained by chosen value – but they are fewer in number and weaker than those with high time constants.

In response to this comment, however, we have added sliding rank correlation tests for both choice and outcome periods for both Figure 3 and Figure 4 as supplementary figures. In short, these supplementary analyses essentially recapitulate the results shown using the median split approach.

In new Figure 3—figure supplement 1 (collapsed across regions), there is a positive correlation between time constant and chosen value coefficient of partial determination (CPD) around the maximum-population average CPD. This time course corresponds to the greatest separation between the high and low time constant median split in Figure 3.

With respect to the reviewer’s final comment: on each plot we have now marked the maximum population-average CPD with a vertical blue line (i.e. the reviewer is correct that we meant #2). We recognise that this was previously vague within the text. We have clarified we meant #2 within the Results section:

“We further demonstrated this relationship by performing a rank correlation between each neuron’s coefficient of partial determination (CPD) at the time of the maximum population-average CPD with its time constant (Correlation Coefficient = 0.148, p = 0.0018; 95% CI [0.0556, 0.2373], Figure 3—figure supplement 1).”

Note that the ‘dip’ in correlation between chosen value correlates and time constants (at the very start of the trial) is also observed in the median split analysis (Figure 3). This result may seem surprising – it appears very early for such a factor to explain variance in neural firing. However, we interpret this finding in a similar way to the ‘prescient’ pre-stimulus activity observed in (Padoa-Schioppa, 2013). In particular, the firing rate of neurons before trial onset may affect the success of the network in making decisions, and so induce correlations with chosen value pre-stimulus. This is indeed apparent in the analyses of Figure 1 and Figure 1—figure supplement 2, where there is chosen value coding that is slightly higher than chance, pre-stimulus, across all three regions. The negative correlation with time constants suggests that this is in a different set of cells to the high time constant ‘temporal integrators’ that express chosen value most strongly during choice. However, as this point is rather orthogonal to the main point of the paper (and there is far less variance explained by chosen value at this time point anyway), we do not focus on this explicitly in the main text.

We also perform the sliding rank correlation separated by regions, in new Figure 4—figure supplement 1. Here there is a positive correlation between time constant and chosen value coefficient of partial determination (CPD) around the maximum-population average CPD within OFC during choice. The positive correlation between time constant and CPD emerges later within ACC, similar to the median split method in Figure 4. At the time of outcome, there is only a relationship between time constant and chosen value coding within OFC; this is in the form of a strong positive correlation both at the maximum-population CPD around 900ms into the outcome period, and in the period shortly after reward onset.

We are in agreement with the view expressed by the reviewers below, however, that the median split is overall the clearer way to visualise the results. This is especially true for the cross-temporal pattern analysis in Figure 5, which relies upon correlating the regression coefficients of distinct groups of neurons. We therefore feel that it is best to focus predominantly on this median-split method in the main manuscript, and include the sliding rank correlation figures shown above as supplementary analyses.

4) In a few places the paper infers differences because an effect reaches significance in one condition but not another. A direct contrast between the two conditions is generally more appropriate in such cases. This applies to: (1) greater outcome-related value coding in OFC than in DLPFC/ACC (Results, fourth paragraph); (2) greater reactivation coding in high-tau than low-tau neurons (Results, last paragraph); (3) greater reactivation coding in OFC than in DLPFC/ACC (Figure 5—figure supplement 12).

This is a useful comment which has helped to make our conclusions more thorough. To address it, we used non-parametric permutation tests to perform the direct contrasts between the conditions. We did this for all three of the suggestions raised by the reviewer.

For (1), we have added a permutation test to compare chosen value coding across brain regions to Figure 1B. Details of this test have been added under the “Significance Testing using Cluster-Based Permutation Tests” subheading in the Methods section:

“Significance Testing using Cluster-Based Permutation Tests (Figure 1A, Figure 1B, Figure 3B and Figure 4).

To identify significant clusters of chosen value coding whilst correcting for multiple comparisons across time, cluster based permutation tests were used (Nichols & Holmes, 2002).

[…] The length of the longest cluster for each permutation was entered into the null distribution. The true cluster size was significant at the p<0.05 or p<0.01 level (corrected) if the true cluster length exceeded the 97.5th percentile or 99.5th percentile of the null distribution, respectively.”
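The final step quoted above (comparing the true longest cluster against a null distribution of longest clusters) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' code: the elided part of the Methods describes how the permuted data were generated, so the sketch simply takes permuted statistic time courses as given. The function names, the cluster-forming `threshold` and the toy values are all hypothetical.

```python
def longest_cluster(stats, threshold):
    """Length of the longest run of consecutive time bins above threshold."""
    best = run = 0
    for s in stats:
        run = run + 1 if s > threshold else 0
        best = max(best, run)
    return best

def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a sorted list."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac

def cluster_significant(true_stats, permuted_stats, threshold, q=97.5):
    """Compare the true longest cluster with the q-th percentile of the null.

    permuted_stats is assumed to hold one statistic time course per permutation.
    """
    true_len = longest_cluster(true_stats, threshold)
    null = sorted(longest_cluster(p, threshold) for p in permuted_stats)
    return true_len > percentile(null, q), true_len, null

# Toy example with made-up statistic time courses.
true_stats = [0.1, 0.9, 0.8, 0.7, 0.95, 0.2]
permuted_stats = [
    [0.1, 0.2, 0.3, 0.1, 0.2, 0.4],  # no bin above threshold
    [0.6, 0.2, 0.3, 0.1, 0.2, 0.4],  # one isolated bin
    [0.1, 0.7, 0.8, 0.1, 0.2, 0.4],  # run of two bins
    [0.1, 0.2, 0.3, 0.9, 0.2, 0.4],  # one isolated bin
]
significant, true_len, null = cluster_significant(true_stats, permuted_stats, 0.5)
```

A real analysis would use many more permutations than the four shown here; the tiny null distribution only keeps the example easy to follow.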

This test highlighted that chosen value coding following reward onset was significantly stronger in OFC than DLPFC or ACC, which we now describe in the Figure 1 legend:

“OFC showed stronger chosen value correlates following reward onset than ACC and DLPFC (permutation tests; OFC v DLPFC, p = 0.0010, OFC v ACC, p = 0.0028; see Methods).”

For (2), we developed a similar permutation test to compare sustained coding between high and low time constant neurons. Details of this test have been included within the new section of the methods entitled “Comparing sustained coding between cortical areas and high/low time constant neurons (Figure 5)”.

“Comparing sustained coding between cortical areas and high/low time constant neurons (Figure 5)

To compare the sustained coding present from choice through reward delivery between different cortical areas, a permutation test was performed. […] A similar test was performed to compare high vs. low time constant neurons (i.e. to compare Figure 5C vs. Figure 5D); except in the permuted data, neurons were shuffled between high/low groups – as opposed to between different brain areas.”
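The label-shuffling idea in the quoted passage can be illustrated with a generic two-group permutation test. This is a sketch under assumptions: the paper compares cluster sizes of sustained coding, whereas the example below uses a simple mean as a stand-in statistic, and `permutation_group_test`, its parameters and the toy data are hypothetical.

```python
import random
from statistics import mean

def permutation_group_test(group_a, group_b, statistic, n_perm=2000, seed=0):
    """Two-sided permutation test on statistic(group_a) - statistic(group_b).

    Units (e.g. neurons) are shuffled between the two groups while group
    sizes are kept fixed, giving a null distribution of differences.
    """
    rng = random.Random(seed)
    observed = statistic(group_a) - statistic(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = statistic(pooled[:n_a]) - statistic(pooled[n_a:])
        if abs(diff) >= abs(observed):
            count += 1
    # Add one to numerator and denominator so p is never exactly zero.
    p_value = (count + 1) / (n_perm + 1)
    return observed, p_value

# Toy per-neuron measure for hypothetical high and low time constant groups.
high_tau = [0.9, 1.1, 1.0, 1.2, 0.95, 1.05]
low_tau = [0.1, 0.2, 0.15, 0.05, 0.12, 0.18]
observed, p_value = permutation_group_test(high_tau, low_tau, mean)
```

Swapping `mean` for a function that returns a group's largest cluster size would bring the sketch closer to the comparison described in the text, with the same shuffling logic.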

Note that this direct comparison did not produce a statistically significant difference. We now report this in the text:

“Within OFC, this sustained population code appeared most prominent in the neurons with a high resting time constant τ (Figure 5C), but absent in those with a low τ (Figure 5D). Note, however, that this difference should be interpreted cautiously, as a formal comparison of cluster size within the high and low τ populations (using a non-parametric permutation test, see Methods) was not significant (p=0.59).”

For (3), we developed a similar permutation test to compare reactivation coding across cortical areas. Details of this test were also included within the new section of the methods entitled “Comparing sustained coding between cortical areas and high/low time constant neurons (Figure 5)”. This test highlighted that sustained coding was stronger in OFC than in DLPFC or ACC, which we now report in the Results section:

“Nonetheless, the sustained population code from choice through outcome was much stronger in OFC (Figure 5B-C) than in both the ACC and DLPFC populations (Figure 5—figure supplement 1 and 2; permutation tests, OFC v DLPFC, p = 0.0008; OFC v ACC, p < 0.0001; see Methods).”

5) For the cross-temporal correlation analysis, the authors draw a distinction between "sustained" and "reactivation" coding. But this distinction sometimes gets blurry. The evidence mainly supports reactivation coding, but the conclusion is that OFC is "maintaining a representation of chosen value until an expected outcome is experienced" (Results, last paragraph), which sounds more like sustained coding. Similarly, earlier (Results, fourth paragraph) the paper concludes that OFC codes value through the choice-outcome interval, but only a post-outcome epoch is actually tested.

On reflection, we agree with the reviewers that our use of the terms “reactivation” and “sustained” coding became confusing during the Results section. To clarify whether orbitofrontal cortex activity is better characterised as ‘sustained’ coding from choice through reward delivery, or alternatively a ‘reactivation’ code which re-emerges only in response to reward delivery, we extended our cross-temporal pattern analysis backwards in time to begin 1500ms prior to reward onset. We also extended our permutation tests to incorporate a ‘pre-outcome’ epoch (-1000ms to 0ms after reward onset), in addition to our existing ‘outcome’ epoch (0 to 1000ms after reward onset). We chose these epoch lengths to ensure the analysis was not contaminated by choice activity given the differing lengths of the cost (effort/delay requirement) epochs on different trials.
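As a rough illustration of what a cross-temporal pattern analysis computes, the sketch below correlates, across neurons, the chosen value regression coefficients at one time bin with those at another. It is a simplified stand-in, not the paper's pipeline: the data layout `betas[t][n]`, the function names and the toy coefficients are assumptions.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def cross_temporal_matrix(betas):
    """betas[t][n] is the coefficient of neuron n at time bin t (assumed).

    Returns C where C[t1][t2] is the across-neuron correlation between the
    population pattern at bin t1 and the pattern at bin t2.
    """
    n_bins = len(betas)
    return [[pearson(betas[i], betas[j]) for j in range(n_bins)]
            for i in range(n_bins)]

# Toy coefficients for four hypothetical neurons at three time bins.
betas = [
    [1.0, 2.0, 3.0, 4.0],  # e.g. a choice-epoch pattern
    [2.0, 4.0, 6.0, 8.5],  # similar pattern later: positive correlation
    [4.0, 3.0, 2.0, 1.0],  # reversed pattern: negative correlation
]
C = cross_temporal_matrix(betas)
```

Under this picture, a "sustained" code appears as positive off-diagonal correlations that persist across every bin between choice and outcome, whereas a "reactivation" code would show positive correlations only between the choice bins and the post-outcome bins.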

We found a strong cross-correlation within OFC activity that was present for the whole of our extended analysis epoch. There were large significant clusters of activity within the “Pre-Outcome” epoch (grey dashed area) which extended all the way until around 1500ms after reward onset (which is the end of the reward delivery period) in the “Outcome” epoch (black dashed area). We show these new results in the updated Figure 5.

Such sustained coding from pre- to post-outcome was absent in a similar analysis within the ACC and DLPFC populations (Figure 5—figure supplement 1 and 2). These results suggest that OFC exhibits “sustained” coding from choice through outcome, rather than “reactivation” coding, and that this sustained coding is unique to OFC. The Results section has been updated accordingly:

“Crucially, however, there was also evidence for sustained coding: the same neuronal population in OFC at choice encoded chosen value from at least 1000ms before outcome through to 1000ms after outcome (warm colours in Figure 5B, grey and black dashed boxes, permutation tests (see Methods), largest clusters p < 0.0001); such sustained coding of value from choice through outcome was absent within DLPFC (Figure 5—figure supplement 1A) and ACC (Figure 5—figure supplement 2A) neuronal populations. […] This demonstrates that OFC neurons with persistent activity at rest encode a “sustained” representation of chosen value until an expected outcome is experienced, and that this neural signature appears unique to OFC.”

6) The paper should at least briefly address the distinction between "chosen value" and what one might call "outcome value" – the size of the juice reward. These aren't identical, since chosen value also incorporates an effort/delay requirement. But they may be correlated. Can the authors rule out that OFC is merely encoding the juice magnitude in the outcome phase? That is, is there direct evidence it also encodes the (already completed) effort or delay requirement?

Thanks for raising this point; this crucial distinction was not addressed in the manuscript. We have now included evidence to show that OFC is encoding the chosen cost (i.e. the effort / delay requirement already completed) at the time of outcome – see new Figure 1—figure supplement 2. We used the same regression model as before, but split chosen value into a benefit and cost component.

“However, this was not the case at the time of outcome, where chosen value correlates predominated in OFC (Figure 1B). This value signal at outcome contained information about both the chosen benefit and chosen cost (Figure 1—figure supplement 2).”

https://doi.org/10.7554/eLife.18937.023

Article and author information

Author details

  1. Sean E Cavanagh

    Sobell Department of Motor Neuroscience, University College London, London, United Kingdom
    Contribution
    SEC, Analysis and interpretation of data, Drafting or revising the article
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0001-9275-2725
  2. Joni D Wallis

    1. Department of Psychology, University of California, Berkeley, Berkeley, United States
    2. Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, United States
    Contribution
    JDW, Conception and design, Acquisition of data
    Competing interests
    The authors declare that no competing interests exist.
  3. Steven W Kennerley

    1. Sobell Department of Motor Neuroscience, University College London, London, United Kingdom
    2. Department of Psychology, University of California, Berkeley, Berkeley, United States
    3. Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, United States
    Contribution
    SWK, Conception and design, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article
    Contributed equally with
    Laurence T Hunt
    For correspondence
    s.kennerley@ucl.ac.uk
    Competing interests
    The authors declare that no competing interests exist.
  4. Laurence T Hunt

    1. Sobell Department of Motor Neuroscience, University College London, London, United Kingdom
    2. Wellcome Trust Centre for Neuroimaging, University College London, London, United Kingdom
    Contribution
    LTH, Conception and design, Analysis and interpretation of data, Drafting or revising the article
    Contributed equally with
    Steven W Kennerley
    For correspondence
    laurence.hunt@ucl.ac.uk
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0002-8393-8533

Funding

Middlesex Hospital Medical School General Charitable Trust (Graduate Student Fellowship)

  • Sean E Cavanagh

National Institute on Drug Abuse (R21-DA035209)

  • Joni D Wallis

National Institute of Mental Health (R01-MH097990)

  • Joni D Wallis

Wellcome Trust (096689/Z/11/Z)

  • Steven W Kennerley

National Institute of Mental Health (F32MH081521)

  • Steven W Kennerley

Wellcome Trust (098830/Z/12/Z)

  • Laurence T Hunt

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

LTH was supported by a Sir Henry Wellcome Fellowship from the Wellcome Trust (098830/Z/12/Z). JDW was supported by funding from NIMH R01-MH097990 and NIDA R21-DA035209. SWK was supported by NIMH (F32MH081521) and by a Wellcome Trust New Investigator Award (096689/Z/11/Z). SEC was supported by the Middlesex Hospital Medical School General Charitable Trust.

Ethics

Animal experimentation: Ethical approval was obtained for this study. All procedures were in accord with the National Institutes of Health guidelines (Assurance Number A3084-01) and the recommendations of the U.C. Berkeley Animal Care and Use Committee (Protocol Number R283).

Reviewing Editor

  1. Michael J Frank, Reviewing Editor, Brown University, United States

Publication history

  1. Received: June 20, 2016
  2. Accepted: September 15, 2016
  3. Version of Record published: October 5, 2016 (version 1)

Copyright

© 2016, Cavanagh et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

