Capturing the temporal evolution of choice across prefrontal cortex

  1. Laurence T Hunt (corresponding author)
  2. Timothy EJ Behrens
  3. Takayuki Hosokawa
  4. Jonathan D Wallis
  5. Steven W Kennerley
  1. University College London, United Kingdom
  2. Oxford University, John Radcliffe Hospital, United Kingdom
  3. University of California, Berkeley, United States
  4. Tohoku University, Japan
Research Article
Cite this article as: eLife 2015;4:e11945 doi: 10.7554/eLife.11945

Abstract

Activity in prefrontal cortex (PFC) has been richly described using economic models of choice. Yet such descriptions fail to capture the dynamics of decision formation. Describing dynamic neural processes has proven challenging due to the problem of indexing the internal state of PFC and its trial-by-trial variation. Using primate neurophysiology and human magnetoencephalography, we here recover a single-trial index of PFC internal states from multiple simultaneously recorded PFC subregions. This index can explain the origins of neural representations of economic variables in PFC. It describes the relationship between neural dynamics and behaviour in both human and monkey PFC, directly bridging between human neuroimaging data and underlying neuronal activity. Moreover, it reveals a functionally dissociable interaction between orbitofrontal cortex, anterior cingulate cortex and dorsolateral PFC in guiding cost-benefit decisions. We cast our observations in terms of a recurrent neural network model of choice, providing formal links to mechanistic dynamical accounts of decision-making.

https://doi.org/10.7554/eLife.11945.001

eLife digest

In 1848, a railroad worker named Phineas Gage suffered an accident that was to secure him a place in neuroscience lore. While constructing a new railway line, a mistimed explosion propelled an iron bar into the base of his skull, where it passed behind his left eye before exiting through the top of his head. Gage survived the accident, but those who knew him reported significant changes in his personality and behaviour.

Gage’s ability to make decisions was particularly impaired by his injury. Decision-making involves weighing up the costs and benefits associated with alternative courses of action. It entails looking into the future to decide whether an anticipated reward will justify the effort or expense necessary to obtain it. This process is dependent on a region of the brain called the prefrontal cortex, the area that sustained the most damage in Phineas Gage.

While many studies have shown correlations between activity in particular parts of prefrontal cortex and the outcome of decisions, little is known about how this activity evolves over time as a decision is made. To explore this process, Hunt et al. trained macaque monkeys to choose between pairs of images that were associated with specific rewards (quantities of fruit juice) and costs (either amounts of work or fixed delays).

Electrode recordings revealed changes in prefrontal activity that varied over time as the monkeys deliberated over each pair of images, choosing for example between a large reward after a long delay versus a smaller reward immediately. This activity was consistent with a mathematical model of decision-making, which also explains data from brain imaging experiments in humans. This provides an important link between human data and electrode recordings in animals.

However, some of the patterns of activity observed in both macaques and humans appeared to reflect the speed at which decisions were made, rather than the outcome of the decisions themselves. By extracting information about decision speed on each decision from each region, it was shown that communication between regions of prefrontal cortex changes when choices are between two different amounts of work, as opposed to two different delays. Further experiments are needed to explore this phenomenon and to determine how other brain regions interact with the prefrontal cortex to support the decision-making process.

https://doi.org/10.7554/eLife.11945.002

Introduction

Correlates of decision variables are routinely found in prefrontal cortex (PFC) during value-guided decision making (Clithero and Rangel, 2014; Kennerley and Walton, 2011). They have been richly described using static, economic models of choice (Glimcher and Fehr, 2014). Neuroeconomic accounts explain firing rates of single neurons or human neuroimaging data in terms of experimental variables that motivate choice behaviour. These include the magnitude or likelihood of available reward, and the costs involved in obtaining those rewards. By forming neural representations of such quantities, it is argued that the brain can use these representations to guide selection of the most valuable alternative. Our present understanding of economic decision formation has been founded upon careful study of neural representations of value and how they differ across PFC subregions (Glimcher and Fehr, 2014; Rushworth et al., 2012; Rangel and Hare, 2010).

However, an alternative perspective on the origins of neural representations during decision making stems from choice models from mathematical psychology (Busemeyer and Townsend, 1993). Such models seek to explain decisions in terms of their temporal evolution or dynamics. Whilst originally accounting for temporally varying features of behaviour, such as reaction times and eye gaze (Busemeyer and Townsend, 1993; Ratcliff and Rouder, 1998; Krajbich et al., 2010), they have also successfully captured temporally varying features of neural activity, most notably in the lateral intraparietal cortex (LIP) during perceptual choice (Shadlen and Newsome, 2001; Gold and Shadlen, 2007). These psychological models can also be related to other dynamical models (Bogacz et al., 2006) such as nonlinear attractor network models firmly rooted in neurobiology (Wang, 2002). The common feature of both psychological and neurobiological accounts is that they focus on the temporal evolution of a decision signal across time, rather than the strength with which it represents decision variables at a particular fixed point in time.

Contrasting the two perspectives, it becomes apparent that even if neuronal activity underwent the exact same physiological process on different trials (such as a ramp-to-threshold), it might nevertheless appear to represent certain economic features of a decision. Put simply, as the decision unfolds, neural activity will be higher on trials that ramp quickly than on those that ramp slowly. It will therefore correlate with any variable that affects the decision speed, and this effect will appear most prominent at timepoints in the trial when the ramping process, or rate of change, is maximal. Even in more complex dynamical situations, it is easy to see how economic variables that predict the rate of change would appear to be represented in neural activity if analysed without knowledge of the underlying dynamics. This might particularly be true of the value of the chosen item (‘chosen value’), which strongly affects behavioural reaction times in economic choice tasks.
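The argument above can be illustrated with a small simulation. This is a minimal sketch under assumed conditions, not an analysis from the study: every trial follows the same sigmoidal ramp, a hypothetical `value` variable only shifts the ramp's midpoint, and all numerical settings (ramp width, latency range, noise level) are illustrative choices.

```python
import numpy as np

def simulate_ramps(n_trials=2000, seed=0):
    """Simulate identical sigmoidal ramps whose latency depends on 'value'."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, 201)            # hypothetical 1 s epoch
    value = rng.uniform(0.0, 1.0, n_trials)   # hypothetical chosen value
    # Higher value -> earlier ramp midpoint; the ramp shape never changes.
    midpoint = 0.6 - 0.2 * value + rng.normal(0.0, 0.03, n_trials)
    activity = 1.0 / (1.0 + np.exp(-(t[None, :] - midpoint[:, None]) / 0.05))
    return t, value, activity

def value_slope_over_time(value, activity):
    """Slope of activity regressed onto value, separately at each timepoint."""
    v = value - value.mean()
    a = activity - activity.mean(axis=0)
    return v @ a / (v @ v)

t, value, activity = simulate_ramps()
beta = value_slope_over_time(value, activity)
mean_ramp = activity.mean(axis=0)
derivative = np.gradient(mean_ramp, t)

t_peak_beta = t[np.argmax(beta)]         # when 'value coding' is strongest
t_peak_deriv = t[np.argmax(derivative)]  # when the mean ramp is steepest
t_peak_amp = t[np.argmax(mean_ramp)]     # when the mean ramp is highest
```

In this toy setting the apparent value correlate peaks close to the time of maximal slope and well before the time of maximal amplitude, even though value never alters the shape of the ramp itself.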

This should then influence how we interpret the meaning of such activity (Hunt, 2014; O’Doherty, 2014). The neuroeconomic perspective often labels chosen value representations as a ‘post-decision’ signal, arguing that chosen value signals do not reflect the decision competition itself but instead the outcome of the comparison process (Cai and Padoa-Schioppa, 2012; Blanchard and Hayden, 2014). One interpretation of such representations is that they are needed for subsequent computation of a reward prediction error (Rangel and Hare, 2010). Yet the dynamical perspective proposes that they may in fact originate as a consequence of time evolving decision processes. Rather than casting a chosen value representation as signaling ‘pre-decision’ or ‘post-decision’ variables to downstream brain areas, it argues that such correlates inevitably emerge as a decision is made. In its most extreme form, this hypothesis might contend that chosen value signals would be fully accounted for by variation in underlying decision rates.

To assess this proposal more carefully, it becomes critical to index how neural dynamics unfold on individual trials. In PFC, tackling this problem provides unique challenges. Several recent studies have recovered single-trial dynamical information in structures close to motor output, based upon neuronal spiking data (Thura and Cisek, 2014; Kaufman et al., 2015; Murakami et al., 2014; Bollimunta et al., 2012; Kiani et al., 2014; Carnevale et al., 2015). Yet the distance of PFC from either sensory input or motor output produces neural activity that is poorly aligned with simple features of the experimental task (Rigotti et al., 2013). Attempts to understand PFC activity should therefore extract its dynamical state based on neural information alone, rather than first sorting spike trains by motor output or experimental variables as is common in other approaches. It is also unclear whether neuronal firing rates are in fact the most reliable source of single-trial information. The esotericism of PFC neuronal responses is compounded by a high degree of stochasticity in neuronal spike trains, delivering statistical challenges to obtaining single-trial information (Churchland et al., 2007; Park et al., 2014). Current techniques sometimes overcome this problem by examining tens or hundreds of simultaneously recorded single neurons (Churchland et al., 2007). Whilst this can be fruitful, such data is rarely available in humans, where the contribution of PFC to value-guided choice is most critical.

In the present study, we therefore sought an alternative index of single-trial neural dynamics that overcame these limitations. We focussed on observations at the mesoscopic scale of the local field potential (LFP). One advantage of this approach is that the LFP can be easily related to simultaneously recorded neuronal spike trains on a trial-by-trial basis in animals, and also underlies the magnetoencephalography (MEG) signal observable in humans (Buzsáki et al., 2012). As such, we could recover dynamical information from electrophysiology recordings in macaque monkeys and also from non-invasive MEG recordings in humans. This provides an important bridge between observations at microscopic (cellular) and macroscopic (whole-brain) scales. Importantly, our index provides information about the speed of a local internal neural decision process that goes beyond a simple behavioural measure of reaction time.

Using this index, we could draw several new conclusions about PFC activity during choice. We first demonstrate a temporal evolution from action value difference to chosen action signals that is selectively present in dorsolateral prefrontal cortex (DLPFC) neurons, but not in anterior cingulate cortex (ACC) or orbitofrontal cortex (OFC). This contrasts with the ubiquitous encoding of ‘chosen value’ across all three subregions. However, we then demonstrate that correlates of chosen value may arise as a consequence of dynamical processing, as opposed to being purely represented as a neuroeconomic post-decision variable. Next, by simultaneously indexing decision formation across multiple PFC subregions, we show that functional interactions can be reshaped in a task-dependent manner. We find that OFC and ACC dynamics selectively influence DLPFC activity on delay- and effort-based decisions respectively. Finally, our observations can be related to predictions from a dynamical neural network model of choice. This provides formal links to established dynamical mechanisms underlying perceptual decision-making.

Results

We examined a study of cost-benefit decision making in which single neuron firing and local field potentials (LFPs) were recorded simultaneously from three PFC subregions fundamental to value-guided choice in four macaque monkeys: orbitofrontal cortex (OFC), anterior cingulate cortex (ACC) and dorsolateral prefrontal cortex (DLPFC). On each trial, subjects chose between two pictures that led to a certain reward magnitude (quantity of fruit juice) after paying a certain cost (Figure 1A). On half of trials this cost was physical effort needed to obtain reward, and on half of trials it was delay to reward. Subjects were overtrained on the cost and benefit paired with each picture. Their choices were well described by a linear trade-off between reward and cost combined with a softmax choice function (see Hosokawa et al., 2013 for detailed behavioural modelling). This allows a straightforward definition of ‘value’ used in the analyses below (see Methods). Throughout the paper, we analyse neuronal firing and LFP during a 1s choice phase, when subjects fixated centrally whilst both pictures were on screen. After this 1s period, a go cue appeared, instructing subjects to saccade to their preferred picture to indicate their choice.
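The behavioural model mentioned above (a linear trade-off between reward and cost, passed through a softmax choice function) can be sketched as follows. This is an illustration only: the function names, the weights `w_reward` and `w_cost`, and the `inverse_temperature` value are hypothetical placeholders, not the fitted parameters reported by Hosokawa et al. (2013).

```python
import math

def option_value(reward, cost, w_reward=1.0, w_cost=0.5):
    """Linear trade-off: value rises with reward and falls with cost.
    The weights are hypothetical, not fitted values from the study."""
    return w_reward * reward - w_cost * cost

def p_choose_left(v_left, v_right, inverse_temperature=2.0):
    """Two-option softmax: a logistic function of the value difference."""
    return 1.0 / (1.0 + math.exp(-inverse_temperature * (v_left - v_right)))

# Example: a larger reward at the same cost makes the left option more likely.
v_left = option_value(reward=3.0, cost=1.0)
v_right = option_value(reward=1.0, cost=1.0)
p_left = p_choose_left(v_left, v_right)
```

Under such a model, ‘chosen value’ is the value of whichever option was selected, and an ‘error trial’ is one on which the option with the lower modelled value was chosen.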

Figure 1 (with 3 supplements)
Time-varying value correlates in single units and LFP during choice.

(A) Subjects chose between two pictures of differing value (reward and cost (physical effort, or delay)) by saccade. Neural activity was examined during the 1s choice epoch, whilst monkeys held fixation. (B) Correlates of decision variables in DLPFC single units (n=303). Coefficient of partial determination for multiple regression of left minus right action value (blue), chosen value (green) and left minus right choice (red) onto single neuron activity. Lines show mean ± s.e. across all recorded neurons. See also Figure 1—figure supplement 1. (C) Early action value difference coding predicts late chosen action coding. Z-scored regression coefficients for left minus right action value at 300 ms (ordinate) are plotted against Z-scored regression coefficients for left minus right choice at 700 ms (abscissa), for each DLPFC neuron (R=0.42; p=3.6×10^-14). (D) Baseline-corrected ERP during choice epoch, in example subject, split by region. Lines denote mean ± s.e. across n = 125 (DLPFC)/n = 43 (OFC)/n = 85 (ACC) electrodes. Other subjects plotted in Figure 1—figure supplement 2. Vertical dashed lines are added to allow latency comparison with (E) and (F). (E) Z-statistic of regression of LFP data from example subject onto chosen and unchosen value. Note that at timepoint (i), overall value = chosen + unchosen; at timepoint (ii), value difference = chosen − unchosen. Lines denote mean ± s.e. across electrodes. In Figure 1—figure supplement 3, figure is split into different brain regions and cost/benefit. (F) Temporal derivative of evoked potential in part (D) (averaged across regions). Comparing parts (D), (E) and (F) shows that value correlates at timepoints (i) and (ii) occur when the LFP is ramping (derivative is non-zero), rather than peaking (derivative is near or at zero).

https://doi.org/10.7554/eLife.11945.003

Value correlates vary across time in both single units and LFPs

We first adopted a classical approach to analysing our data, examining features of the task represented by neural activity. Using multiple regression on single neuron firing rates, we observed a dynamic (time-varying) signature of value correlates in the DLPFC (n=303 neurons) (Figure 1B). Early in the trial, value correlates were found in the reference frame of action value difference (i.e. left minus right saccade value influenced neural firing; Figure 1B, blue), but late in the trial these evolved into correlates of the eventual chosen saccade (left vs. right chosen saccade influenced neural firing; Figure 1B, red) (Kim et al., 2008; Louie and Glimcher, 2010). DLPFC neurons that showed strong selectivity for left minus right saccade option value 300 ms after decision onset also showed strong selectivity for left minus right choices 700 ms after decision onset (Figure 1C). This temporal evolution from action value difference to categorical choice was selectively present in DLPFC, but not ACC (n = 321 neurons) or OFC (n = 212 neurons) (Figure 1—figure supplement 1). Notably, however, all three regions showed correlates of chosen value with a similar timecourse (Figure 1B, Figure 1—figure supplement 1).

We then applied the same approach to consider dynamics of LFP value correlates (electrodes: DLPFC, n=208; ACC, n=207; OFC, n=146). The overall shape of the choice phase evoked LFP was surprisingly consistent across PFC subregions (Figure 1D) and subjects (Figure 1—figure supplement 2). All areas contained a fast biphasic component shortly after sensory input, and a slower component lasting several hundred milliseconds after this. Because of this similarity, we collapsed across regions (see Figure 1—figure supplement 3 for regions separately), and regressed LFP amplitude onto both chosen value and unchosen value across time. This revealed two timepoints at which correlates of value emerged (Figure 1E). Early in the trial (~250 ms) value correlates were positive for both chosen and unchosen value, but later in the trial (~450 ms) value correlates for unchosen value flipped to become negatively signed. This implies a temporal evolution from an early representation of value sum (=chosen+unchosen value) to a later representation of value difference (=chosen-unchosen value). This progression is notably similar to observations made in human PFC during value-guided choice using MEG (Hunt et al., 2012). Crucially, these value correlates were maximal when the event-related LFP was ramping, not peaking (that is, when the temporal derivative of the LFP was large [Figure 1F]). This suggests chosen and unchosen value influence the rate at which LFP dynamics unfold on each trial, rather than being explicitly represented by the amplitude of an evoked response.
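The logic of reading value sum versus value difference from the signs of the two regression weights can be sketched on synthetic data. This is a minimal illustration, not the analysis pipeline used here: it returns raw regression coefficients rather than Z-statistics, and the function names and the two-timepoint synthetic ‘LFP’ are assumptions made for the example.

```python
import numpy as np

def regress_over_time(lfp, chosen, unchosen):
    """Regress LFP amplitude (trials x timepoints) onto chosen and unchosen
    value at every timepoint. Returns a 2 x timepoints array of coefficients."""
    design = np.column_stack([np.ones_like(chosen), chosen, unchosen])
    betas, *_ = np.linalg.lstsq(design, lfp, rcond=None)
    return betas[1:]  # drop the intercept row

def classify_coding(beta_chosen, beta_unchosen):
    """Same-signed weights imply value sum; opposite-signed, value difference."""
    same_sign = np.sign(beta_chosen) == np.sign(beta_unchosen)
    return np.where(same_sign, "sum", "difference")

# Synthetic data: timepoint 0 carries chosen+unchosen, timepoint 1 chosen-unchosen.
rng = np.random.default_rng(0)
n_trials = 400
chosen = rng.uniform(0.0, 1.0, n_trials)
unchosen = rng.uniform(0.0, 1.0, n_trials)
noise = rng.normal(0.0, 0.1, (n_trials, 2))
lfp = np.column_stack([chosen + unchosen, chosen - unchosen]) + noise

betas = regress_over_time(lfp, chosen, unchosen)
labels = classify_coding(betas[0], betas[1])
```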

As revealed by these classical analyses, value correlates vary across time at both microscopic (single unit) and mesoscopic (LFP) scales. Yet it remains unclear how we might bridge these dynamics at different scales. Below, we demonstrate that forming such bridges is of fundamental importance, especially if it can be achieved at the level of individual trials. It allows us to interrogate the relationship between neural dynamics and behaviour, the encoding of neuroeconomic variables, and how brain regions interact during choice. Moreover, by applying the same approach to human data, it links observations in human MEG recordings with their underlying neuronal counterparts.

Extracting single-trial information about choice dynamics from event-related LFP

One approach to bridging across data scales would be to extract single-trial information about choice-related dynamics from one scale, and use it to explain variance in the other. The evoked LFP appeared consistent across different recording electrodes and sessions (Figure 2—figure supplement 1), and possessed a high single-trial signal to noise ratio (Figure 2—figure supplement 2). We therefore pursued a data-driven index of single-trial dynamics from LFP data.

The approach we adopted was to apply principal components analysis (PCA) across trials (see Methods). For each subject, the input data for macaque PCA was decision-locked raw LFP data, stacked across all electrodes from all recording sessions to form a large matrix X. X has dimensions nSingleTrials by nTimebins (Figure 2A). PCA decomposes X into temporal principal components V, with dimensions nTimebins by nComponents, and component weights U, with dimensions nSingleTrials by nComponents (Figure 2A). The principal components (PCs) in V provide a set of temporal basis functions (Figure 2B) that capture the principal modes of variation in the shape of the waveform across trials. The single-trial weights in U tell us how much of each PC is present in each single-trial response. They are returned separately for each electrode. Because data are stacked, however, the decomposition has the same meaning across different electrodes and recording sessions.
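The decomposition described above can be sketched with a singular value decomposition. This is a minimal sketch under stated assumptions rather than the procedure in Methods: the function name `temporal_pca`, the optional mean-centring across trials, and the synthetic Gaussian-bump trials are all illustrative choices that this section does not specify.

```python
import numpy as np

def temporal_pca(X, n_components=2, center=True):
    """Decompose X (nSingleTrials x nTimebins) so that X - mean ~= U @ V.T.
    V holds temporal components; U holds single-trial weights.
    Whether to centre across trials is an assumption of this sketch."""
    mean = X.mean(axis=0) if center else np.zeros(X.shape[1])
    u, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    V = vt[:n_components].T                     # nTimebins x nComponents
    U = u[:, :n_components] * s[:n_components]  # nSingleTrials x nComponents
    return U, V, mean

# Synthetic example: trials from two 'electrodes' that vary in amplitude and latency.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 100)

def make_trial(amplitude, latency):
    return amplitude * np.exp(-((t - latency) ** 2) / (2 * 0.1 ** 2))

def make_electrode(n_trials):
    return np.array([make_trial(rng.normal(1.0, 0.2), rng.normal(0.5, 0.03))
                     for _ in range(n_trials)])

# Stacking trials from all electrodes gives one shared set of components in V,
# while each row of U still belongs to one trial on one electrode.
X = np.vstack([make_electrode(150), make_electrode(150)])
U, V, mean = temporal_pca(X)
```

Because every electrode's trials are projected onto the same columns of V, a given weight in U has the same interpretation regardless of which electrode or session it came from.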

Figure 2 (with 5 supplements)
Extraction of internal dynamics from LFP data via principal components analysis (PCA).

(A) A large matrix of single-trial data is formed by stacking data across all recordings. This was performed separately within each subject. Consistent evoked potentials were found across recording electrodes (Figure 2—figure supplement 1) and trials (Figure 2—figure supplement 2). The single-trial weights U are returned separately for each trial on each electrode, but their interpretation is determined by the shape of the components in V, and so is common to all electrodes. (B) The first and second principal components of this matrix, for example macaque subject N. (C) The shapes of PC1 and PC2 are consistent across subjects. (D) Left panel: the effect of adding/subtracting PC1 to the main ERP shape in example subject N, modulating its amplitude; right panel: the effect of adding/subtracting PC2 to PC1, modulating its latency. As shown in Figure 2—figure supplement 3, this shift in latency can also be related to changes in low-frequency oscillatory phase during the decision period. (E) The influence of chosen value, unchosen value and error trials on PC2 scores (i.e. the second column of matrix U), estimated via multiple regression. Bars show regression coefficients (a.u.), mean ± s.e. across electrodes (macaque). ** denotes p<0.01, one-sample T-test. Effects are shown separately for monkey K (who saccaded freely during choice epoch, and indicated response using joystick) in Figure 2—figure supplement 4. Influence of variables on PC1 scores shown in Figure 2—figure supplement 5.

https://doi.org/10.7554/eLife.11945.007

The top two PCs were strikingly consistent across all four macaque subjects (Figure 2C). Importantly, they were readily interpretable in terms of their effects on LFP dynamics. PC1 (Figure 2B, left panel) resembled the basic shape of the event-related field potential (Figure 1D). Adding or subtracting PC1 therefore captures variation across trials in response amplitude (Figure 2D, left panel). More notable, however, was PC2, which was relatively flat at the beginning of the decision, but from 200 ms onwards resembled the temporal derivative of PC1 (Figure 2B, right panel). Adding or subtracting the temporal derivative of a waveform controls the latency at which it peaks (Figure 2D, right panel) (Friston et al., 1998; Mayhew et al., 2006). Knowledge about PC2 weights therefore provides a parsimonious description of single-trial ERP latencies, a key feature of the dynamical ERP response. Consistent with this idea, PC2 weights were also found to correlate with the phase of low frequency (theta frequency [4–8 Hz]) oscillations during the decision period (Figure 2—figure supplement 3).
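The link between a derivative-shaped component and latency follows from a first-order Taylor expansion, f(t + d) ≈ f(t) + d·f′(t): adding a small multiple d of the derivative approximates the same waveform shifted earlier by d. The sketch below checks this numerically on a hypothetical Gaussian-shaped waveform; the waveform and the weight of 0.03 are assumptions for illustration, and because the sign of a principal component is arbitrary, which direction counts as ‘earlier’ depends on the sign convention adopted.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 1001)
# Hypothetical ERP-like waveform peaking at t = 0.5 s.
waveform = np.exp(-((t - 0.5) ** 2) / (2 * 0.1 ** 2))
derivative = np.gradient(waveform, t)

def with_derivative(weight):
    """Waveform plus a weighted copy of its own temporal derivative."""
    return waveform + weight * derivative

# Positive weight: the peak moves earlier; negative weight: the peak moves later.
peak_plus = t[np.argmax(with_derivative(+0.03))]
peak_minus = t[np.argmax(with_derivative(-0.03))]
```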

In light of this, we hypothesised that factors modulating reaction time in value-based choice studies (Busemeyer and Townsend, 1993) would affect the weight of PC2, controlling the latency of the LFP waveform. We found chosen value had a strong and consistent effect on PC2 across all regions studied. When chosen value was higher, PC2 was more positive, implying the waveform peaked earlier in time (Figure 2E; Figure 2—figure supplement 4). On ‘error trials’, where the subject chose the less valuable option, PC2 weights were more positive than on correct trials (Figure 2E). Under the assumption that higher value trials, and error trials, were associated with faster decision dynamics (Busemeyer and Townsend, 1993; Ratcliff and Rouder, 1998), PC2 weights therefore provide a neurally-derived index of the speed of the decision on each trial. Crucially, however, they are obtained separately for each individual electrode. They therefore provide a local measurement of neural dynamics, beyond a simple behavioural measure of reaction time. PC1 weights, capturing response amplitude rather than latency, were primarily influenced by value sum (same sign for both chosen and unchosen value, with the exception of ACC) (Figure 2—figure supplement 5).

Internal single-trial dynamics explain neural activity over and above external variables

We next considered whether our internal LFP-derived index might explain features of neural firing that we could not previously explain using external experimentally-derived variables. We explored this idea using multiple regression. We regressed experimental factors onto neural firing rates (see Supplementary file 1 for full list) to capture variance explained by these factors, as in Figure 1B. However, we also included as a coregressor the single-trial PC2 weight for that trial, estimated from the decomposition of the LFP. This allows us to examine the influence of PC2 weights on neuronal firing, having controlled for the contribution of all external decision variables. To avoid contamination between spikes and LFP, we used LFP data recorded simultaneously from a neighbouring electrode in the same cortical area. (Note that reaction time is not included as an additional measure of trial-by-trial decision dynamics in our task, as the imposed 1s choice delay prior to response led to a floor effect in subjects’ response times.)
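The coefficient of partial determination (CPD) used to quantify these effects can be sketched as follows. The sketch assumes the standard definition, in which the CPD of a regressor is the fractional drop in residual sum of squares when that regressor is added to a model containing all the others; the authors' exact formulation is given in Methods. The regressors and effect sizes in the synthetic example are hypothetical.

```python
import numpy as np

def sse(design, y):
    """Residual sum of squares of an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)

def cpd(design, y, column):
    """Variance explained by one column over and above all other columns."""
    reduced = np.delete(design, column, axis=1)
    sse_reduced = sse(reduced, y)
    return (sse_reduced - sse(design, y)) / sse_reduced

# Synthetic firing rates driven by value and by an LFP-derived PC2 weight,
# but not by choice (all effect sizes are illustrative).
rng = np.random.default_rng(0)
n_trials = 500
value = rng.normal(size=n_trials)
choice = rng.choice([-1.0, 1.0], size=n_trials)
pc2 = rng.normal(size=n_trials)
firing = 0.5 * value + 1.0 * pc2 + rng.normal(size=n_trials)

design = np.column_stack([np.ones(n_trials), value, choice, pc2])
cpd_pc2 = cpd(design, firing, column=3)     # PC2, controlling for value and choice
cpd_choice = cpd(design, firing, column=2)  # choice has no true effect here
```

Applying `cpd` at each timebin of a firing-rate timecourse would give a CPD timecourse of the kind plotted in the figures.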

Figure 3A and B show two individual neurons from DLPFC that exemplify the action value-to-choice transformation depicted in Figure 1B/C. As can be seen, variance in their firing is explained by the action value difference early (~200–400 ms), and the chosen action late (~600–800 ms) in the decision process. In both cases, the LFP-derived PC2 single-trial weights explain additional variance as this value-to-choice transformation occurs (Figure 3A/B). Across the population, PC2 had a similar effect at this point in time (Figure 3C). 44.9% of neurons in DLPFC were found to have a significant modulation by LFP-derived PC2 weights during the choice epoch (23.4% positively modulated, 21.5% negatively modulated, p<0.01 in a 250 ms–750 ms window, corrected for multiple comparisons across time). This finding forms a link between choice dynamics at mesoscopic and microscopic scales, over and above that which can be obtained from examining time-varying value correlates (as in Figure 1). Additional analyses confirmed that this relationship could not be explained as a simple consequence of first-order correlations between LFP amplitude and firing rate (Whittingstall and Logothetis, 2009).

Single unit firing in DLPFC explained by simultaneously recorded LFP dynamics, over and above contribution from experimental variables.

(A)/(B) Two example neurons from DLPFC each showing a transition from encoding action value difference (blue) to later encoding the selected action (red), and influenced by the LFP-derived PC2 single trial weights (cyan) in the intervening period. The coefficient of partial determination (CPD) is plotted for all three factors (see Methods for full list of other task-related variables included in regression model). (C) Timecourse of population CPD explained by LFP-derived PC2 weights (for n=205 DLPFC neurons that had a simultaneously recorded LFP on a separate DLPFC electrode). Lines show mean ± s.e. across neurons.

https://doi.org/10.7554/eLife.11945.013

Having formed this link between our measure of single-trial LFP dynamics and neural firing, we could then address several questions concerning the roles of these dynamics in value-guided decision making.

Influence of single-trial LFP components on single unit chosen value correlates

The representation of chosen value is found in many neural structures during choice, but its interpretation has remained unclear. It has been pointed out that this is a ‘post-decision’ variable, and several suggestions have been offered for why this might need to be encoded (Rangel and Hare, 2010). In our study, the timecourse of this signal (found across all three cortical areas, Figure 1—figure supplement 1) appears similar to the timecourse of influence of the LFP PC2 weights on neuronal firing (Figure 3C). We sought to explore the relationship between chosen value correlates identified in single unit firing and our single-trial indices of ERP amplitude (PC1) and latency (PC2).

To address this, we reanalysed the findings in Figure 1B, but asked whether including the top two LFP principal components in our regression model selectively reduced the variance explained by chosen value, but not by action value difference or chosen action (see Methods). As a control, we compared this model to one where two noise components (PC101 and PC102) from the LFP PCA were included as coregressors instead of PC1 and PC2. We found that including the LFP-derived PC1/PC2 as coregressors caused a significant reduction in chosen value coding, but not of action value difference or chosen action coding (Figure 4; Figure 4—figure supplement 1). Similar results could also be obtained via an alternative approach, in which instead of using noise components, principal components were shuffled across trials in a fashion that preserved their underlying correlation with chosen value, or by examining the contribution of PC1 or PC2 alone (Figure 4—figure supplement 2).
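The logic of this control comparison can be illustrated on synthetic data. In the sketch below, firing depends on a latent decision speed that is itself correlated with chosen value; a stand-in for PC2 tracks that speed, whereas a stand-in for a noise component does not. All variable names, the generative model and the noise levels are assumptions made for the illustration, not quantities taken from the recordings.

```python
import numpy as np

def sse(design, y):
    """Residual sum of squares of an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)

def cpd(design, y, column):
    """Coefficient of partial determination for one design-matrix column."""
    reduced = np.delete(design, column, axis=1)
    sse_reduced = sse(reduced, y)
    return (sse_reduced - sse(design, y)) / sse_reduced

rng = np.random.default_rng(1)
n_trials = 1000
chosen_value = rng.normal(size=n_trials)
speed = chosen_value + rng.normal(size=n_trials)   # latent decision speed
firing = speed + rng.normal(size=n_trials)         # firing follows speed only
pc2 = speed + 0.3 * rng.normal(size=n_trials)      # stand-in for LFP PC2 weight
noise_pc = rng.normal(size=n_trials)               # stand-in for a noise component

ones = np.ones(n_trials)
control_model = np.column_stack([ones, chosen_value, noise_pc])
dynamics_model = np.column_stack([ones, chosen_value, pc2])

cpd_control = cpd(control_model, firing, column=1)
cpd_with_pc = cpd(dynamics_model, firing, column=1)
reduction = cpd_control - cpd_with_pc  # chosen value 'explained away' by dynamics
```

In this toy case chosen value explains firing only through its relationship with decision speed, so its CPD falls sharply once the speed index is included but is left essentially unchanged by the noise regressor.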

Figure 4 (with 3 supplements)
Chosen value, but not action value difference or chosen action, is explained away by neural dynamics.

CPD for chosen value (green) is reduced by including PC1/PC2 from the LFP decomposition as coregressors in the decision model. The reduction in CPD (n=205 DLPFC neurons) for each of the decision variables in Figure 1C as a consequence of including PC1/2 is shown, by subtracting it from a control model that included two noise components (PC101/102). Note that the maximal value that this reduction can take is determined by the CPD for each regressor at each point in time (shown in Figure 1B). Lines show mean ± s.e. across neurons. Dots denote timepoints with a significant (p<0.05, permutation test) change in CPD across the DLPFC population. See also Figure 4—figure supplement 1 for OFC/ACC, and Figure 4—figure supplement 2 for the contribution of PC2 alone. Figure 4—figure supplement 3 compares the effect of local DLPFC PC weights with respect to those from a distal, simultaneously recorded brain region (i.e. either OFC or ACC).

https://doi.org/10.7554/eLife.11945.014

We then asked whether chosen value coding was reduced more by local (within-region) neural dynamics than by global (whole-brain) dynamics. We found that a smaller but still significant reduction remained even when these ‘local’ principal components (i.e. from the same brain region) were orthogonalised with respect to those of another, simultaneously recorded brain region, when compared to performing the same analysis in reverse (Figure 4—figure supplement 3). This implies that some of the neuronal variance attributed to chosen value correlates originates as a consequence of the speed and amplitude at which dynamics unfold locally within a particular cortical area, and demonstrates the utility of our single-trial index for capturing these local dynamics.
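Orthogonalising one region's component weights with respect to another's can be sketched as a least-squares residualisation. This is one common way to implement the step and is offered as an assumption, since the exact procedure is described in Methods rather than here; the function name and the synthetic shared/private signals are hypothetical.

```python
import numpy as np

def orthogonalise(local, other):
    """Remove from `local` whatever is linearly predictable from `other`
    (plus an intercept), leaving the region-specific residual."""
    design = np.column_stack([np.ones(len(local)), other])
    beta, *_ = np.linalg.lstsq(design, local, rcond=None)
    return local - design @ beta

# Synthetic single-trial weights: a shared (global) part plus private parts.
rng = np.random.default_rng(2)
n_trials = 500
shared = rng.normal(size=n_trials)
local_pc2 = shared + rng.normal(size=n_trials)   # e.g. weights from one region
other_pc2 = shared + rng.normal(size=n_trials)   # weights from a second region

local_only = orthogonalise(local_pc2, other_pc2)
corr_before = np.corrcoef(local_pc2, other_pc2)[0, 1]
corr_after = np.corrcoef(local_only, other_pc2)[0, 1]
```

The residual retains only the trial-by-trial variation that the other region's weights cannot account for, so any variance in firing that it still explains is attributable to dynamics specific to the local region.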

OFC and ACC dynamics have distinct influences on delay- and effort-based decisions respectively

ACC and OFC have established roles in effort- and delay-based decisions respectively (Rudebeck et al., 2006; Prevost et al., 2010; Croxson et al., 2009; Kurniawan et al., 2013; Kable and Glimcher, 2007; Parvizi et al., 2013). Because our LFP decomposition indexes decision dynamics locally, we hypothesised that the internal dynamics of ACC and OFC might preferentially influence activity in DLPFC on different trial types. Specifically, we predicted that firing rates in DLPFC would be preferentially affected by ACC internal dynamics on effort-based trials, but by OFC internal dynamics on delay-based trials. We could test this hypothesis by examining sessions in which DLPFC, ACC and OFC were all recorded simultaneously. We built a regression model in which LFP-derived PC2 weights from OFC and ACC competed for variance in explaining neural firing in DLPFC (see Methods). We estimated this separately on delay and effort trials. Both areas were found to influence DLPFC activity (Figure 5—figure supplement 1), but strikingly, ACC explained more variance in DLPFC firing on effort trials than delay trials (Figure 5A, magenta), whereas the converse was true for OFC (Figure 5A, black). This was not found to be true for analyses performed in the reverse direction (using DLPFC/ACC PC2 weights to explain OFC firing, or DLPFC/OFC PC2 weights to explain ACC firing).

Figure 5 with 1 supplement
ACC and OFC local dynamics explain greater DLPFC neural firing on effort-based and delay-based decisions, respectively.

(A) The relative CPD for DLPFC firing explained by PC2 on effort trials minus delay trials, where PC2 is derived from simultaneously recorded ACC LFP (magenta) and OFC LFP (black) (n=124 DLPFC units which had simultaneous recordings in both OFC and ACC). ACC explains more variance on effort than delay trials (positive-going values), whereas the converse is true for OFC (negative-going values). See also Figure 5—figure supplement 1. Lines show mean +/- s.e. across neurons. Dots denote timepoints with a significant (p<0.05, permutation test) change in CPD across the DLPFC population. (B) The same analysis as in Figure 5A, having first performed a median split for chosen value selectivity. Neurons with high chosen value selectivity (top panel, n=62 DLPFC units) show the effect found in Figure 5A, whereas neurons with low chosen value selectivity (bottom panel, n=62 DLPFC units) do not.

https://doi.org/10.7554/eLife.11945.018

Notably, OFC and ACC PC2 weights modulated DLPFC neuron firing from around 200 ms after choice onset (Figure 5—figure supplement 1), consistent with the time at which DLPFC neurons first begin to encode choice values (Figure 1B). These effects on DLPFC firing were maintained for several hundred milliseconds. Critically however, just prior to when response coding in DLPFC was peaking (around 800 ms, see Figure 1B), the contribution of ACC and OFC PC2 to DLPFC neuron firing changed in a cost-specific way. OFC PC2 began to selectively explain spiking variance on delay trials, whilst ACC began to explain spiking variance on effort trials (Figure 5—figure supplement 1).

We also found that when we performed a median split on DLPFC neurons by the degree to which they encoded chosen value in the analysis shown in Figure 1B, neurons with high chosen value selectivity were also those preferentially influenced by other regions’ dynamics (Figure 5B). If DLPFC chosen value coding is interpreted as reflecting the speed at which a decision process unfolds, then these findings imply that the influence of one region’s internal dynamics on another region’s dynamics can be flexibly reshaped according to current task demands. This was not true, by contrast, when a median split was performed based upon the response selectivity of the DLPFC neuronal population.

Similar single-trial choice dynamics can be obtained from human MEG data

The LFP value correlates in Figure 1E showed similar temporal profiles to a previous study in which human subjects made binary value-guided choices whilst undergoing MEG (Hunt et al., 2012). We therefore asked whether the MEG signal from this study, in the PFC subregion where this temporal profile had been observed (ventromedial prefrontal cortex, MNI 6, 28, –8 mm), could also be subjected to a similar single-trial PCA decomposition as our LFP data. Each subject provides a single ‘virtual electrode’ from which observations are made, and so data were stacked to form the matrix X with dimensions nSingleTrials ([=nTrials*nSubjects]) by nTimebins (see Methods). This was then decomposed as for the macaque data. We found a similar relationship between PC1 and PC2 (Figure 6A) controlling waveform amplitude and latency (Figure 6B). Moreover, PC2 had a similar relationship to chosen value and error trials as in macaques (Figure 6C). Finally, in this experiment, we also had a direct behavioural readout of decision formation on every trial as subjects could respond at any time after decision onset. Subject reaction times were strongly negatively predictive of PC2, over and above any contribution of chosen value, unchosen value or errors (Figure 6C). Our observations here provide a link between the dynamics of choice at the mesoscopic scale in macaques and its macroscopic counterpart in humans.

PCA decomposition of human MEG data shows similar characteristics to decomposition of monkey LFP data.

(A) Averaged evoked response from data beamformed to ventromedial prefrontal cortex (MNI coordinate (6, 28, -8 mm)), from a previous study of value-guided decision making (Hunt et al., 2012; 2013). Lines show mean +/- s.e. across trials. (B) PCA decomposition of human MEG data yields two principal components similar to those found in macaque LFP data (cf. Figure 2C). (C) The influence of chosen value, unchosen value and error trials on PC2 scores, estimated via multiple regression, in humans. Effects are similar to those found in macaque PFC (cf. Figure 2E). Also shown, in yellow, is the additional effect of reaction time (orthogonalised with respect to chosen value, unchosen value and error trials). Bars show mean +/- s.e. across trials; ** denotes p<0.01, one-sample T-test.

https://doi.org/10.7554/eLife.11945.020

Several features of the data are explained by competition via mutual inhibition

The transition in value correlates observed at the macroscopic scale in human MEG can be explained with reference to a class of recurrent neural network models displaying competition via mutual inhibition (Hunt et al., 2012). Such models have had success in explaining many features of neural data in perceptual choice (Shadlen and Newsome, 2001; Wang, 2002), focusing on how a decision might be implemented locally within a single cortical area of interest. We therefore asked whether these models might explain some of our observations at the microscopic (single-neuron) level, and whether the decomposition approach applied to our data can also be applied to mesoscopic (LFP) predictions from the network model.

We simulated a spiking attractor network model of choice, configured with the same connectivity structure and parameters as had been used in a previous study of perceptual decision-making (Wang, 2002) (Figure 7A; see also ‘Supplementary details of network modelling’). We analysed the model predictions as we had the data, regressing action value difference, chosen value and chosen action onto single-neuron firing rates. Strikingly, value correlates in the network model possessed the same temporal profile as in DLPFC neurons (compare Figure 1B with Figure 7B). One discrepancy was the more sustained chosen value signal in the network model than in the data (which may result from inputs to the model persisting even after an attractor basin has been reached). We then ran a similar dimensionality reduction on summed network activity, a proxy for the model’s LFP predictions, as we had done on both macaque LFP and human MEG data. By contrast with the LFP data, we obtained a single principal component that controlled both waveform amplitude and latency (see ‘Supplementary details of network modelling’, and Figure 7—figure supplement 1). This suggests that within the network model there is covariation between these two features of the simulated ERP waveform, whereas in the data they are orthogonal. However, this principal component correlated with value in a similar fashion to both macaque and human data (compare Figure 2E/6C with Figure 7C). We then regressed the principal component back onto single unit firing rates, as in Figure 3. We found a similar timecourse of influence of the model’s principal component on single unit firing to that found in the data (compare Figure 3C with Figure 7D, cyan). Moreover, we found that the internal dynamics of the model extracted from the PCA explained chosen value coding in a similar fashion to that observed in data (Figure 4), causing a marked reduction (compare Figure 7B with Figure 7D, green). 
Our observations in this study may therefore be explained mechanistically via a simple attractor network model of choice, which similarly captured LFP dynamics in our previous human MEG study (Hunt et al., 2012). Further details of network modelling are provided in Supplementary Information.

Figure 7 with 1 supplement
Relationship between value, LFP and single units explained by competition via mutual inhibition.

(A) Model schematic. A and B units receive value-related inputs, integrate these via recurrent excitation, and compete via mutual inhibition. See ‘Supplementary details of network modelling’ for details. (B) Correlates of decision variables in attractor network model single unit activity. Regression coefficient for option value difference (blue line), chosen value (green line) and chosen option (red line) onto firing rates of ‘A’ selective units in the network model. Compare with Figure 1B: in DLPFC, ‘A’ selective units are hypothesised to correspond to left selective units, and ‘B’ units to right selective units. (C) Regression of decision variables onto PC1 from PCA-decomposed LFP model predictions, as in Figure 2E/F. Note that in the model, PC1 captures variability in decision latencies, not PC2 as in the data (see Figure 7—figure supplement 1). (D) Correlates of decision variables when model LFP PC1 is included as coregressor. LFP PC1 from the model explains firing rates with a similar timecourse to LFP PC2 from the data (Figure 3C), but explains away much of the contribution of chosen value to single unit firing (Figure 4).

https://doi.org/10.7554/eLife.11945.021

Discussion

Many previous studies (Kim et al., 2008; Daw et al., 2006; Padoa-Schioppa, 2013; Hampton et al., 2006; Padoa-Schioppa and Assad, 2006), including our own (Kennerley et al., 2011), note the representation of chosen value in multiple brain structures during choice. Why this quantity is encoded so widely has been a matter of debate (Rangel and Hare, 2010). It is important to remember that this encoding comes from the viewpoint of the experimenter, seeking to describe variability across trials at a particular timepoint in the trial or decision process. As we have shown here, and as would be predicted from previous behavioural studies (Busemeyer and Townsend, 1993; Ratcliff and Rouder, 1998; Krajbich et al., 2010), decision processes are dynamic and decision formation inherently exhibits variability across trials. This means that what appears to the experimenter as encoding of a decision-related signal may be a consequence of another ongoing process. To paraphrase a recent observation (Cisek, 2006), “the role of a decision-making system is to produce decisions, not to describe them.”

Decision-making systems accumulate evidence in favour of different alternatives across time, and generate a categorical choice. This has been most carefully considered in integrate-to-bound models of perceptual choice and their neural correlates (Gold and Shadlen, 2007). Crucially, the time at which sensory evidence is maximally encoded in a neural integrator is not at the beginning of the decision (when no integration has occurred), nor at the end (when the bound has been reached on all trials), but in the middle of the decision process – when on some trials the bound has been reached, whereas on others little net evidence has been accumulated for each alternative. It can be very difficult in practice to infer when a decision begins or ends, and this poses difficulties when labelling a neural correlate as ‘pre-decision’ or ‘post-decision’. The dynamical perspective instead explains certain correlates as emerging as the decision is formed. These signals may still remain of functional significance for other computations (such as subsequent computation of the reward prediction error [Rangel and Hare, 2010]), but our perspective on how they are generated is changed.

Value-guided choices are, like perceptual decisions, a dynamical process (Busemeyer and Townsend, 1993; Krajbich et al., 2010; Summerfield and Tsetsos, 2012). This brings into focus the potentially important role played by decision dynamics in generating correlates of chosen value. In the present study, we found that in DLPFC correlates of chosen value occurred maximally during the transformation from an initial representation of evidence (action value difference) to an eventual representation of choice (chosen action) (Figure 1B). A similar temporal progression could also be found in the network model of competition via mutual inhibition (Figure 7B). Crucially, however, a portion of the variance captured by chosen value was then explained away by including the speed and amplitude of the local LFP response on that trial as a coregressor, estimated via PCA decomposition of the LFP (Figure 4).

It is important to note that there are some caveats to this interpretation. Firstly, only some, not all, of the variance was removed by including PCA components as coregressors. As such, it may be the case that there is coding of chosen value during the decision that is not explained by the amplitude and speed of the evoked response during decision formation. However, despite the reasonable signal-to-noise ratio observed in LFP data, single-trial PC weights will likely contain a degree of observation noise. Hence our estimates likely form a lower bound on how much of the chosen value signal might be explained away by LFP dynamics. To address this question further, future studies might seek to estimate the degree of observation noise in estimating dynamics, and the theoretical limit that this imposes on how much variance could be explained in neural firing. A further important caveat is that chosen value correlates may be explained by different mechanisms at different points in the trial. LFPs are often interpreted as measurements of local synaptic input, and this may imply that our principal components mediate chosen value representations at the level of single units. It is also clear from Figure 1—figure supplement 1 that correlates of chosen value are present in single unit firing until the end of the decision epoch. Future work may investigate the origin and functional significance of this persistent chosen value coding, perhaps for use at later task stages.

Our single-trial index also provides critical information about how dynamics simultaneously unfold at different speeds in different brain areas. This was evidenced most clearly by examining the interaction between OFC and ACC principal component weights, and DLPFC neuronal firing. Previous work has suggested a functional specialisation of OFC and ACC for delay- and effort-based decisions, respectively. In rats, lesions made to OFC lead to impulsive choices (of a small, immediate reward over a larger, delayed reward) whereas lesions made to ACC lead to less effortful choices (of a small, easily obtained reward over a larger reward demanding more work) (Rudebeck et al., 2006). In humans, functional imaging activations yield a similar double dissociation between effortful and delay-based decisions (Prevost et al., 2010; Croxson et al., 2009; Kurniawan et al., 2013; Kable and Glimcher, 2007), and further evidence can be drawn from the effects of electrical stimulation of ACC (Parvizi et al., 2013). Our results extend these findings by suggesting that the rate at which dynamics unfold in OFC preferentially influences a decision signal in DLPFC on delay-based trials relative to effort-based trials, whereas the converse is true for ACC’s influence on DLPFC (Figure 5). This supports the view that the effective influence of one brain region on another is modulated by the type of decision currently being made (Hunt et al., 2014; Tauste Campo et al., 2015). We note, however, that our analysis does not test the possibility that a third (unobserved) variable could jointly affect both regions, doing so differentially on effort vs. delay trials.

Variation in ERP latency also corresponds to a shift in the phase of an oscillation (at low frequencies, present in the evoked response). Consistent with this, we found PC2 weights correlated with the phase of oscillations in the theta range (4–8 Hz) as the decision was made (Figure 2—figure supplement 3). This suggests a possible way in which future studies may link our measure of LFP latency to spike-LFP coupling (Koralek et al., 2013; Canolty et al., 2010) or inter-regional LFP coherence (Nacher et al., 2013) during cognitive tasks.

The network model attempts to capture the dynamics of a single cortical region, but not interactions between regions. In our study, competitive dynamics similar to the model (saccadic action value difference transforming into chosen action) were clearly visible in DLPFC (Figure 1B/7B), but not in other regions (Figure 1—figure supplement 1). However, at the mesoscopic scale, dynamics of chosen value correlates were relatively indistinguishable across regions (Figure 2E, Figure 1—figure supplement 3). One explanation for these phenomena is that competition proceeds in a distributed fashion across multiple areas (Rushworth et al., 2012; Hunt et al., 2014; Cisek, 2012). Competition in DLPFC might occur selectively in saccadic action space, but this might be complemented by parallel competitions in other regions in other reference frames. Examples of such competitions might include those over particular decision attributes (e.g. OFC for delay, ACC for effort/physical action value), abstract goods (Padoa-Schioppa, 2013; Padoa-Schioppa and Assad, 2006), internal state variables (Bouret and Richmond, 2010), attentional allocation (Lim et al., 2011), goal or task-relevant prioritisation (Hunt et al., 2014; Hare et al., 2009) or other reference frames critical for the decision at hand (Hunt et al., 2013). Though lesion evidence clearly implicates areas like ACC, OFC and ventromedial PFC as critical and dissociable in value-based decision-making (Kennerley et al., 2006; Rudebeck et al., 2008; Camille et al., 2011; Noonan et al., 2010), our results indicate that identifying ‘chosen value’ signals is not sufficient to understand the functional dissociations of these areas in the decision-making process. 
Further studies are needed to fully understand the reference frame in which different decision areas contribute to decision-making (Hunt et al., 2013; 2014; Boorman et al., 2013), with a careful consideration of the relationship between underlying choice dynamics and neuronal correlates of value. Future recurrent network models of choice involving hierarchical competitions across multiple areas (Hunt et al., 2014; Chaudhuri et al., 2015) might also capture this distributed approach, and seek to explain our observations concerning the selective effort- and delay-based interactions across areas (Figure 5). A distributed account would predict that in other experimental paradigms, single neurons would show similar competitive dynamics to our DLPFC neurons but in complementary frames of reference (Rustichini and Padoa-Schioppa, 2015; Strait et al., 2014).

PCA is one of many possible approaches to obtaining a useful set of temporal basis functions to describe variation in ERP waveforms, and it may be improved upon by future investigations. We selected it for its simplicity, and its known properties of returning a temporal derivative when capturing evoked responses of variable durations (Friston et al., 1998; Mayhew et al., 2006; Woolrich et al., 2004). Directly computing the ERP waveform and its temporal derivative would also be a valid approach to derive temporal basis functions. However, this approach may face problems if different components of the evoked response do not covary with each other. For example, the large, fast evoked response observed across all regions within 200 ms of stimulus onset (Figure 1D) was comparatively small in the first two principal components (Figure 2B). This implies that cross-trial variation in this early sensory-evoked component occurred largely orthogonal to cross-trial variation in the later (putatively decision-related) component. This feature of the data is identified automatically using PCA.
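The link between latency variation and a temporal derivative can be made explicit with a first-order Taylor expansion. As a simplifying assumption (the symbols here are introduced for illustration only and do not appear in the original analysis), suppose the waveform on trial k is a template f scaled by an amplitude a_k and shifted by a small latency offset τ_k:

```latex
x_k(t) \;=\; a_k\, f(t - \tau_k) \;\approx\; a_k\, f(t) \;-\; a_k \tau_k\, f'(t),
\qquad |\tau_k| \text{ small relative to the width of } f .
```

Under this assumption, cross-trial variation in amplitude loads onto a component shaped like f(t), whereas cross-trial variation in latency loads onto a component shaped like the temporal derivative f'(t).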

The LFP decomposition returned by PCA shows a notable similarity to that observed in decompositions of trial-averaged waveforms from multi-electrode recordings in motor cortex during movement (Churchland et al., 2012). This provides links to dynamical systems perspectives on such activity. Unlike previous studies, however, our approach leverages mesoscopic dynamics containing high signal-to-noise ratio and reproducibility to extract single-trial information. This dispenses with any requirement to first relate neural activity to experimental variables by averaging or regression (Churchland et al., 2012; Mante et al., 2013). We contend that this is particularly important in studying PFC and other areas distant from sensory input or motor output, whose relationship to experimental variables may often be considerably more complex than the experimenter envisaged (Rigotti et al., 2013).

Materials and methods

Neurophysiological procedures (monkey)

Full details of neurophysiological recording procedures are provided in (Hosokawa et al., 2013) and (Kennerley et al., 2009), and precise recording locations in (Hosokawa et al., 2013). In brief, four male rhesus macaques served as subjects. Arrays of 10–24 tungsten microelectrodes (FHC Instruments) were lowered acutely each day. Recordings were made from dorsolateral prefrontal cortex (DLPFC; primarily the dorsal bank of the principal sulcus), orbitofrontal cortex (OFC; primarily areas 11 and 13 between the medial and lateral orbital sulci) and anterior cingulate cortex (ACC; primarily area 24c in the dorsal bank of the cingulate sulcus). Recordings were also made from two subjects in the cingulate motor area, but this region is not considered in the present study. We excluded 5 sessions in monkey B that contained exclusively effort-based or delay-based decisions, 1 further session in monkey B where LFP data were not recorded, and 2 sessions in monkey E where the LFP data were heavily artifact-contaminated. Any electrode which did not contain a well-isolated neuron, and so might not lie within grey matter, was not included in the LFP analysis. From visual inspection of the evoked data, we excluded from further analysis ~4% of individual electrodes that contained a high degree of artifact in the LFP recording. The numbers of recording electrodes included from each subject, after all exclusion criteria were applied, are shown in Figure 1—figure supplement 2. (As multiple units could occasionally be isolated from the same electrode, the total number of neurons exceeded the total number of electrodes for each region.) All procedures were in accord with the National Institutes of Health guidelines and the recommendations of the University of California Berkeley Animal Care and Use Committee.

Experimental task (monkey)

Full details of the experimental task are provided in (Hosokawa et al., 2013). In brief, subjects were well-trained on the expected value of a set of 32 pictures, 16 of which predicted a quantity of reward (fruit juice) and associated effort to obtain reward, and 16 of which predicted a quantity of reward and an associated delay to reward (Figure 1A). The costs and benefits were titrated such that choice probabilities were approximately equally (and linearly) affected by both cost and benefit (Hosokawa et al., 2013). Subjects performed a cost-benefit decision task where they chose between two pictures on each trial. On half of the trials (‘effort’ trials) they chose between a pseudorandomly selected pair of the ‘effort’-associated pictures. On the other half (‘delay’ trials) they chose between a pseudorandomly selected pair of the ‘delay’-associated pictures. Effort and delay trials were interleaved.

On each trial, subjects fixated a central fixation point for 1 s, followed by the appearance of the two pictures on left and right sides of the screen for 1 s (‘choice phase’). Throughout this choice phase, the monkey held fixation. Importantly, this meant that the time period when the monkey could respond (after the onset of the go cue, at 1000 ms) was outside of the window of our analyses, and so there was no potential behavioural (motoric) confound in neural analyses. This was with the exception of monkey K, who could not be sufficiently well-trained to fixate and so was free to saccade during the 1 s period. After the appearance of the go cue, the monkey saccaded to the preferred picture (except monkey K, who made a joystick response to the side with the preferred picture). Subjects received juice reward after the ‘cost’ (effort/delay) was delivered. Only successfully completed trials are included in our analysis.

MEG recordings and experimental task (human)

Full details of the experimental task, MEG data acquisition and analysis protocols are provided in (Hunt et al., 2013). Briefly, 18 subjects chose between two risky prospects consisting of differential levels of monetary reward magnitude and probability. Subjective values for each option were estimated using Prospect theory. We analysed data from ‘comparison’ trials (where both options appeared simultaneously, and subjects were free to respond at any time after decision onset). The analysed data were beamformed to a region of ventromedial prefrontal cortex (MNI coordinates = 6,28,−6 mm) previously found to contain dynamics analogous to those of the biophysical network model (Hunt et al., 2012). All subjects provided informed consent in accordance with local ethical guidelines.

Regression analysis of macaque single neurons (Figure 1B/C, Figure 1—figure supplement 1)

Well-isolated single units (see [Hosokawa et al., 2013] for details) were timelocked to the onset of the choice epoch of successfully completed trials to create rasters (lasting from 1 s prior to choice onset to 2 s after choice onset). Each trial’s data was then convolved with a boxcar to estimate the local average firing rate of the neuron on each trial in 200 ms sliding bins. These binned data were then regressed, across trials, against five variables using ordinary linear regression: a constant term for effort trials, a constant term for delay trials, the value difference between left and right options, whether the subject chose left or right, and the value of the chosen option. For each neuron, we calculated the coefficient of partial determination for each factor as in (Kennerley et al., 2011):

CPD(Xi) = [SSE(X~i) - SSE(X)] / SSE(X~i)

where SSE(X) refers to the sum of squared errors in a regression model that includes a set of regressors X, and X~i is a set of all the regressors included in the full model except Xi. In Figure 1B/Figure 1—figure supplement 1, we plot the mean +/- s.e. across all neurons recorded in a given brain area (for sessions that were also included in the LFP analysis). In Figure 1C, we calculated the Z-statistic for action value difference at 300 ms and chosen action at 700 ms for each neuron from the regression model, and plotted the relationship between these Z-statistics across all DLPFC neurons.
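The CPD calculation can be illustrated with a minimal Python/NumPy sketch that fits the full and reduced ordinary least squares models and compares their sums of squared errors. The function names (`sse`, `cpd`) and the simulated data are hypothetical and are not taken from the original analysis code.

```python
import numpy as np

def sse(X, y):
    """Sum of squared errors of an ordinary least squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def cpd(X, y, i):
    """Coefficient of partial determination for column i of design matrix X."""
    X_reduced = np.delete(X, i, axis=1)   # all regressors except X_i
    sse_reduced = sse(X_reduced, y)
    sse_full = sse(X, y)
    return (sse_reduced - sse_full) / sse_reduced

# Simulated example (illustrative values only): y depends on regressor 1 but not 2.
rng = np.random.default_rng(0)
n_trials = 500
X = np.column_stack([np.ones(n_trials), rng.normal(size=(n_trials, 2))])
y = 2.0 + 1.5 * X[:, 1] + rng.normal(size=n_trials)

cpd_signal = cpd(X, y, 1)   # regressor that drives y
cpd_null = cpd(X, y, 2)     # regressor unrelated to y
```

In this toy example the CPD for the driving regressor should be large, whereas the CPD for the unrelated regressor should be close to zero. In the analysis described above, the same quantity would be computed separately for each neuron and each 200 ms time bin.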

Regression analysis of macaque LFP (Figure 1, Figure 1—figure supplement 3)

LFP data were downsampled from 1 kHz to 100 Hz, and timelocked from 500 ms before to 1000 ms after the onset of the choice phase. The average event-related waveform was computed for each electrode, and the grand mean +/- s.e. of all electrodes within each brain region was plotted (Figure 1D). Based on previous behavioural modelling of the task (Hosokawa et al., 2013), value for each option was defined as reward level (scaled from 1 to 4) minus cost level (scaled from 1 to 4). For Figure 1E, we estimated the influence of chosen value and unchosen value on the evoked response at each timepoint using linear regression across trials, plotting the mean +/- s.e. of the Z-scored regression coefficient across electrodes. In Figure 1—figure supplement 3, we split this regression into chosen and unchosen reward and cost. For Figure 1F, we estimated the local temporal derivative of the grand mean event-related potential, averaged over a local sliding window of 80 ms.

Dimensionality reduction of LFP/MEG data via principal components analysis (Figure 2)

To extract single trial information from the local field potential (LFP), we used principal components analysis (PCA) of event-related data. Single-trial LFP data were timelocked from 200 ms before to 1000 ms after the onset of the choice phase. Separately for each subject, all data from all electrodes in OFC, ACC and DLPFC were stacked to form a large matrix X. The dimensions of X are [nRecordedElectrodes*nTrials] by nTimepoints. Each row of X corresponds to the LFP data collected from a single trial, on a single electrode.

To extract single trial information from the human data, a similar ‘stacked’ matrix was formed, but here single-trial data across all subjects were stacked, to make a matrix X with dimensions [nSubjects*nTrials] by nTimepoints. Each row of X corresponds to the beamformed data collected from a single trial, in a single subject. (One difficulty faced by this approach is that beamforming contains a singular value decomposition (SVD) step to determine the orientation of sources, and when beamforming is run separately on each subject, then the meaning of a positive-going and negative-going deflection can be different across subjects because of an arbitrary sign flip induced by SVD. To overcome this, we used an iterated procedure that selected a sign for each subject that maximised the correlation coefficient in the evoked response, between subjects, and then multiplied each subject’s data by the appropriate sign. Note that this sign-flipping correction does not produce changes in the PCA decomposition of the data, but instead simply ensures that the principal component weights can be interpreted in a consistent manner across subjects. It would not be a necessary step were a similar analysis to be run on a scalp potential, for example, whose sign is already consistent across subjects.)
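The exact iterated sign-selection procedure is not specified here, so the following Python/NumPy sketch shows only one plausible variant, under stated assumptions: each subject's evoked response is correlated with the sign-corrected mean of all other subjects, and its sign is flipped if that correlation is negative, repeating until no sign changes. The function name `align_signs`, the stopping rule and the simulated waveforms are all hypothetical.

```python
import numpy as np

def align_signs(erps, max_iter=20):
    """Choose a sign (+1 or -1) per subject so that evoked responses agree.

    erps: array of shape (n_subjects, n_timepoints), one evoked response per subject.
    """
    n_subjects = erps.shape[0]
    signs = np.ones(n_subjects)
    for _ in range(max_iter):
        changed = False
        for s in range(n_subjects):
            others = np.delete(np.arange(n_subjects), s)
            # Sign-corrected mean evoked response of all other subjects
            reference = (signs[others, None] * erps[others]).mean(axis=0)
            r = np.corrcoef(erps[s], reference)[0, 1]
            new_sign = 1.0 if r >= 0 else -1.0
            if new_sign != signs[s]:
                signs[s] = new_sign
                changed = True
        if not changed:   # stop once a full pass leaves every sign unchanged
            break
    return signs

# Simulated example: one template waveform, randomly sign-flipped per subject.
rng = np.random.default_rng(3)
t = np.arange(-200, 1001, 10)                        # time in ms
template = np.exp(-(t - 400.0) ** 2 / (2 * 100.0 ** 2))
true_signs = rng.choice([-1.0, 1.0], size=18)
erps = true_signs[:, None] * template + 0.1 * rng.normal(size=(18, t.size))
signs = align_signs(erps)
```

Because the overall polarity is arbitrary, the recovered signs can match the simulated flips either exactly or with every sign inverted; both outcomes give a consistent interpretation across subjects.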

The mean timecourse of the data was removed prior to running PCA, such that the principal components then captured cross-trial variability in the shape of the event-related waveform. We also removed trials that potentially contained non-neuronal (e.g. movement, electrical) artefacts. To achieve this, within-trial variability was indexed by taking the square root of the standard deviation of the event-related waveform across time. We excluded any trials whose within-trial variability index lay more than 2.32 standard deviations above the mean (>99th percentile).
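As a minimal sketch of the trial-exclusion rule described above, the following Python/NumPy snippet computes the within-trial variability index (square root of the standard deviation across time) and drops rows lying more than 2.32 standard deviations above the mean index. The function name `reject_noisy_trials` and the simulated data are hypothetical.

```python
import numpy as np

def reject_noisy_trials(X, n_sd=2.32):
    """Drop rows of X whose variability index is more than n_sd SDs above the mean.

    X has one row per single-trial waveform and one column per timepoint.
    Returns the retained rows and the boolean mask of retained trials.
    """
    index = np.sqrt(np.std(X, axis=1))            # within-trial variability index
    threshold = index.mean() + n_sd * index.std()
    keep = index <= threshold
    return X[keep], keep

# Simulated example: 200 ordinary trials plus 2 high-amplitude artefact trials.
rng = np.random.default_rng(1)
X = rng.normal(size=(202, 120))
X[-2:] *= 50.0                                    # hypothetical artefacts
X_clean, keep = reject_noisy_trials(X)
```

In this toy example only the two inflated trials exceed the threshold and are removed.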

In macaque data, most trials occur more than once in X, as in each recording session multiple LFP electrodes are recorded simultaneously and so there is a separate row for each electrode. Simultaneous multielectrode recording is not fundamental to the analysis approach of extracting single-trial dynamics. However, it becomes important when relating LFP and cellular data, below. In human data, each trial occurs only once, as there is one ‘virtual electrode’ per subject.

PCA was performed using the svd function in MATLAB. To ensure components had similar interpretations across subjects, we constrained PC1 to be positive 190 ms after stimulus onset, and PC2 to be negative 530 ms after stimulus onset. This constraint can be reasonably imposed by flipping the sign of both the principal component and its corresponding PC weights where necessary (as the sign of components returned by singular value decomposition is arbitrary). After this constraint was applied, highly similar components were observed across subjects (Figure 2C). In all subjects, PC1 controlled waveform amplitude, and PC2 controlled waveform latency. However, the timecourses of monkey K’s principal components were notably different in latency from other subjects’ waveforms (Figure 2C), and also had somewhat different value correlates, particularly in PC1, likely due to an inability to sustain fixation. For this reason, Figure 2E and Figure 2—figure supplement 4 exclude data from monkey K, and monkey K’s PCA value correlates are plotted separately in Figure 2—figure supplement 3. Importantly, this does not affect the interpretation of PC2, whose effects in monkey K remain similar to those in other monkeys.
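The decomposition described above can be sketched in Python/NumPy (a stand-in for the MATLAB svd call; not the original code). The sketch removes the mean timecourse, runs SVD on the stacked matrix, and flips component signs to satisfy a constraint at chosen timepoints. The function name `stacked_pca`, the simulated waveforms and the constraint times used in the example (400 ms and 500 ms, chosen to suit the simulated bump rather than the 190 ms and 530 ms used for the real data) are assumptions for illustration.

```python
import numpy as np

def stacked_pca(X, t, pos_time, neg_time, n_components=2):
    """PCA of stacked single-trial waveforms via SVD, with sign constraints.

    X: (n_rows, n_timepoints), one row per trial (and electrode or subject).
    t: timepoints in ms. PC1 is made positive at pos_time, PC2 negative at neg_time.
    """
    Xc = X - X.mean(axis=0, keepdims=True)             # remove the mean timecourse
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components].copy()              # temporal basis functions
    weights = U[:, :n_components] * S[:n_components]   # single-trial PC weights
    i_pos = np.argmin(np.abs(t - pos_time))
    i_neg = np.argmin(np.abs(t - neg_time))
    # SVD signs are arbitrary, so flip a component and its weights together.
    if components[0, i_pos] < 0:
        components[0] *= -1
        weights[:, 0] *= -1
    if components[1, i_neg] > 0:
        components[1] *= -1
        weights[:, 1] *= -1
    return components, weights

# Simulated example: a bump whose amplitude and latency vary across trials.
rng = np.random.default_rng(2)
t = np.arange(-200, 1001, 10)                          # ms, 100 Hz sampling
n_rows = 1000
amp = 1.0 + 0.3 * rng.normal(size=n_rows)              # hypothetical amplitudes
lat = 20.0 * rng.normal(size=n_rows)                   # hypothetical latency shifts (ms)
X = amp[:, None] * np.exp(-(t[None, :] - 400.0 - lat[:, None]) ** 2 / (2 * 100.0 ** 2))
X += 0.05 * rng.normal(size=X.shape)

components, weights = stacked_pca(X, t, pos_time=400, neg_time=500)
r_amp = np.corrcoef(weights[:, 0], amp)[0, 1]          # PC1 weight vs amplitude
r_lat = np.corrcoef(weights[:, 1], lat)[0, 1]          # PC2 weight vs latency
```

With these simulation settings, amplitude variation dominates the cross-trial variance, so PC1 weights track amplitude and PC2 weights track latency (negatively, given the sign constraint used here). Which of the two features ends up in PC1 depends on their relative variance in the data.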

We note that it would have equally been possible to run dimensionality reduction on the data from a single electrode, in a single session. However, the difficulty faced with such an approach is how then to match components between different electrodes and sessions, such that the components from the different PCAs have the same interpretation. The stacked PCA approach here is analogous to that used in 'dual regression' of functional MRI data, where resting state networks are matched across subjects by stacking all subjects into a large matrix before running dimensionality reduction (Filippini et al., 2009). It is appropriate due to the high degree of consistency of ERP waveforms (Figure 1—figure supplement 2 and Figure 2—figure supplements 1/2).

Regression analysis of single neuron data, using principal components derived from simultaneously recorded LFP (Figures 3–5 and associated figure supplements)

Single neuron data were rasterised and binned as for the regression analysis of decision variables. For main Figure 3C, a regression model was estimated containing 19 coregressors to model out the effect of all experimental variables (see Supplementary file 1 for a full list), plus the regressor of interest (trial-by-trial weights of PC2, derived from PCA decomposition of LFP from a simultaneously recorded electrode in another part of DLPFC). The CPD for the LFP-derived PC2 on single neuron firing was then estimated from this model. To calculate the percentage of individual significant neurons (in main text), we estimated a significance criterion that controlled for multiple comparisons across time empirically from the data, by repeating the regression model but permuting the design matrix (Nichols and Holmes, 2002). Within a time window of 250–750 ms after choice onset, this yielded a p<0.01 significance criterion of Z>2.99 for significant positive responses, and Z<-3.05 for significant negative responses. Any neurons exceeding these thresholds at any point during this time window are reported as significant. We found the CPD for PC2 remained similar when a reduced model of experimental variables was used, containing 6 coregressors rather than 19 – a constant term for each trial type, an indicator variable for the chosen action (L-R), the action value difference between left and right options, and the chosen value. For Figure 3A/B, we used this model and plotted CPD for action value difference, LFP-derived PC2 and chosen action in two single-neuron examples. For Figure 4 and associated figure supplements, we repeated this model, but subtracted the CPD for chosen value, action value difference and chosen action for a model that included PC1 and PC2 as coregressors, versus a model that included PC101 and PC102 (to account for any potential general reduction in CPD by including coregressors).
Positive values on this figure therefore indicate a reduction in explained variance by experimental factors when PC1 and PC2 are included as coregressors in the model. Note that these analyses required simultaneously recorded LFP from another electrode in the same brain region, but this was only available for some recording sessions. The number of recorded neurons in each analysis was therefore smaller than in Figure 1/Figure 1—figure supplement 1 (DLPFC: n=205 neurons; OFC: n=114; ACC: n=194). For Figure 4—figure supplement 3, we sought to explore whether local LFP principal components provide greater contributions to the reduction in chosen value variance than those recorded from other areas. We therefore repeated the same analysis as in main Figure 4, but first orthogonalised local (DLPFC) PC1/2 weights with respect to PC1/2 weights from another simultaneously recorded area (either OFC/ACC). However, because both electrodes will contain observation noise on every trial, this analysis alone may not fully control for distal brain region effects. To provide a fairer comparison, we therefore performed the same analysis in reverse: examining the effect of the distal OFC/ACC PC1/2 weights, orthogonalised with respect to the local DLPFC PC1/2 weights. Figure 4—figure supplement 3B shows the latter analysis (distal orthogonalised with respect to local) subtracted from the former (local orthogonalised with respect to distal).
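The CPD computations referred to above can be sketched as follows, assuming the usual definition of the coefficient of partial determination: the percentage of variance left unexplained by the reduced model that is accounted for once the regressor of interest is added. The function names `sse` and `cpd` and the synthetic data are hypothetical, and the authors' exact implementation may differ.

```python
import numpy as np

def sse(X, y):
    """Residual sum of squares of an ordinary least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def cpd(X, y, col):
    """Coefficient of partial determination (in %) for column `col` of X."""
    X_reduced = np.delete(X, col, axis=1)   # model without the regressor of interest
    sse_full = sse(X, y)
    sse_reduced = sse(X_reduced, y)
    return 100.0 * (sse_reduced - sse_full) / sse_reduced

# Synthetic single-bin example: firing driven by PC2 weights, not by chosen value
rng = np.random.default_rng(0)
n_trials = 500
chosen_value = rng.normal(size=n_trials)
pc2_weight = rng.normal(size=n_trials)
firing = 2.0 + 1.0 * pc2_weight + rng.normal(size=n_trials)
X_demo = np.column_stack([np.ones(n_trials), chosen_value, pc2_weight])
cpd_chosen_value = cpd(X_demo, firing, col=1)
cpd_pc2 = cpd(X_demo, firing, col=2)
```

Comparing the CPD of a task variable between two models (for example with PC1/PC2 versus PC101/PC102 as coregressors) then amounts to calling `cpd` on two design matrices that differ only in those columns.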

For Figure 5 and associated figure supplement, we examined the 40 recorded sessions where DLPFC, OFC and ACC recordings were made simultaneously. Single neuron firing rates from the 124 DLPFC units were explained using a model that contained 6 separate terms: separate constant terms for effort and delay trials, separate LFP-derived PC2 weights from OFC for delay and effort trials, and separate LFP-derived PC2 weights from ACC for delay and effort trials. Figure 5—figure supplement 1 shows the separate effects of PC2 from ACC and OFC for delay and effort trials respectively, on DLPFC firing; Figure 5 shows the difference in CPD for effort minus delay trials for each region.

Significance for neuronal populations in Figures 4 and 5 was assessed using a non-parametric permutation test (Nichols and Holmes, 2002). At each timepoint, for each neuron, we compared the CPD for the relevant variable of interest to the CPD for the same variable in a 500 ms pre-choice baseline period. Across the neuronal population, we tested whether this difference was significantly greater than zero, by comparing the mean difference to that of a null distribution, generated by randomly sign-flipping each neuron’s effect and averaging across this permuted data. 10,000 permutations were used. We elected to use a non-parametric test as the difference in CPD will yield a non-Gaussian distribution; in practice, however, similar results could be obtained with parametric statistics (Student’s t-test).
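A sign-flipping permutation test of this kind might look like the sketch below. The function name `sign_flip_permutation_test`, the fixed random seed and the "+1" correction in the p-value are assumptions made for illustration; only the sign-flipping scheme, the one-sided comparison of the mean against zero, and the 10,000 permutations come from the text.

```python
import numpy as np

def sign_flip_permutation_test(diffs, n_perm=10000, seed=0):
    """One-sided test of whether mean(diffs) is greater than zero.

    diffs: per-neuron difference (CPD at a timepoint minus baseline CPD).
    Returns the observed mean and a permutation p-value.
    """
    diffs = np.asarray(diffs, dtype=float)
    rng = np.random.default_rng(seed)
    observed = diffs.mean()
    # Randomly flip the sign of each neuron's effect in every permutation
    signs = rng.choice([-1.0, 1.0], size=(n_perm, diffs.size))
    null = (signs * diffs).mean(axis=1)
    # Assumption: add-one correction so that the p-value is never exactly zero
    p_value = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, p_value

# Hypothetical examples: a consistent positive effect and a balanced null effect
obs_pos, p_pos = sign_flip_permutation_test(np.ones(50))
obs_null, p_null = sign_flip_permutation_test(np.tile([1.0, -1.0], 25))
```

The test makes no Gaussian assumption about the CPD differences; it assumes only that, under the null hypothesis, each neuron's difference is symmetrically distributed around zero.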

Supplementary details of network modeling

Methods

We adopted the same model as developed by (Wang, 2002) to make predictions of single neuron firing rates and regressions. The network consists of 2000 neurons, of which 400 are inhibitory interneurons and 1600 are excitatory pyramidal cells (see main Figure 7A for model schematic). The excitatory pyramidal cells fall into three categories: those selective for option A (240 cells), those selective for option B (240 cells), and the remaining, non-selective population (1120 cells).

Structure of network connectivity

As in (Wang, 2002), the network possesses all-to-all connectivity but the strength of connections between cell types from different populations varies. The majority of connections have a synaptic strength $w=1$. Within the selective populations (‘recurrent’ connections from ‘A’ selective to ‘A’ selective and from ‘B’ selective to ‘B’ selective cells), however, excitatory connections are stronger, with a synaptic strength $w_{+}=1.7$. Between the selective populations (i.e. from ‘A’ selective to ‘B’ selective cells and vice versa), excitatory connections are weaker, with a synaptic strength $w_{-} = 1 - f(w_{+}-1)/(1-f) = 0.8765$, where $f=0.15$ is the fraction of cells in each excitatory pool. (Note that these weights are based on a Hebbian principle, as cells with similar selectivity are endowed with stronger connections, whereas those with different selectivity have weaker connections.) This network structure provides the network with its properties of integration via recurrent excitation, and competition via mutual inhibition (Wang, 2002).
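As a quick arithmetic check of the population sizes and the weaker between-pool weight quoted above, the following lines recompute them from the stated values. The variable names (`w_plus`, `w_minus` and so on) are chosen here for illustration only.

```python
# Population sizes stated in the text
n_total = 2000
n_inhibitory = 400
n_excitatory = n_total - n_inhibitory            # 1600 pyramidal cells
n_selective = 240                                # per selective pool (A or B)
n_nonselective = n_excitatory - 2 * n_selective  # 1120 cells

# Fraction of excitatory cells in each selective pool
f = n_selective / n_excitatory                   # 0.15

# Stronger within-pool weight and the resulting weaker between-pool weight
w_plus = 1.7
w_minus = 1 - f * (w_plus - 1) / (1 - f)

# With this choice the weighted mean f*w_plus + (1 - f)*w_minus equals 1
mean_weight = f * w_plus + (1 - f) * w_minus

print(round(w_minus, 4), n_nonselective)
```

The last quantity shows why the between-pool weight takes this particular value: strengthening within-pool connections is exactly offset by weakening the others, so the average excitatory weight stays at 1.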

Selective inputs

Selective excitatory cells receive input, IA and IB, whose firing rate is proportional to the value of option A or option B on each trial. For simulation purposes, the mean input rates for options A and B were independently drawn on each trial from a uniform distribution from 20 to 60 Hz, giving a fixed ‘value’ for each trial for each option, µA and µB. Note that this differs from the model of (Wang, 2002) in that option A and option B are not anticorrelated with each other (appropriate to the task that we are modelling here). As in (Wang, 2002), every 50 ms the Poisson input rates to each population were independently resampled from Gaussian distributions with means µA and µB, and standard deviation σ=10. Hence the two inputs vary stochastically in time (if σ were to be set to 0, the two inputs would become constant across time).
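The input scheme can be sketched in NumPy as below. This only generates the time-varying input rates, not the Poisson spike trains or the network itself. The function name `selective_input_rates`, the 2500 ms default duration used here, and the clipping of negative rates at zero are assumptions for illustration.

```python
import numpy as np

def selective_input_rates(n_trials, duration_ms=2500, resample_ms=50,
                          low=20.0, high=60.0, sigma=10.0, seed=0):
    """Return per-trial mean rates (mu) and resampled input rates.

    mu:    (n_trials, 2) mean rates in Hz; columns are options A and B.
    rates: (n_trials, 2, n_steps) rate used in each 50 ms step.
    """
    rng = np.random.default_rng(seed)
    # Option values drawn independently, so A and B are not anticorrelated
    mu = rng.uniform(low, high, size=(n_trials, 2))
    n_steps = duration_ms // resample_ms
    # Resample the rate every resample_ms from a Gaussian centred on mu
    rates = rng.normal(loc=mu[:, :, None], scale=sigma,
                       size=(n_trials, 2, n_steps))
    # Assumption: a firing rate cannot be negative, so clip at zero
    rates = np.clip(rates, 0.0, None)
    return mu, rates

mu_demo, rates_demo = selective_input_rates(400)
mu_const, rates_const = selective_input_rates(10, sigma=0.0)
```

With `sigma=0.0` the resampled rates equal the per-trial means at every step, matching the remark that the inputs would then be constant across time.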

Background inputs; implementation of simulations

In addition to selective inputs, all neurons receive AMPA currents from background ‘noise’ inputs generated from a Poisson process with rate 2.4 kHz that varies independently from cell to cell. Both pyramidal cells and interneurons are described by leaky integrate-and-fire neurons using the exact same physiological parameters and dynamical equations as specified in (Wang, 2002). The network was initialised for 500 ms without selective inputs, and the decision was presented to the network for the subsequent 2500 ms (total simulation time of 3 s). The ‘chosen option’ was determined by the selective population whose average firing rate first exceeded 25 Hz, with the additional condition that the average firing rate of that population had to be 15 Hz greater than that of the other population. LFP predictions were generated by summing the firing rates of all cells (both selective and non-selective) in the network model. (Note that although LFP is thought to be more closely related to synaptic inputs than firing rates, input conductances and firing rates are essentially collinear within the model.)
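The decision rule described above can be written as a small function operating on the two population-average firing rate timecourses. The name `chosen_option`, the return convention, and the reading that both conditions must hold at the same timepoint (with the 15 Hz margin treated as "at least 15 Hz") are assumptions.

```python
import numpy as np

def chosen_option(rate_a, rate_b, threshold=25.0, margin=15.0):
    """Return (choice, index) for the first timepoint at which one selective
    population exceeds `threshold` Hz and leads the other by `margin` Hz.

    Returns (None, None) if neither population meets the criterion.
    """
    rate_a = np.asarray(rate_a, dtype=float)
    rate_b = np.asarray(rate_b, dtype=float)
    a_wins = (rate_a > threshold) & (rate_a - rate_b >= margin)
    b_wins = (rate_b > threshold) & (rate_b - rate_a >= margin)
    crossed = a_wins | b_wins
    if not crossed.any():
        return None, None
    t = int(np.argmax(crossed))   # first timepoint satisfying the criterion
    return ("A" if a_wins[t] else "B"), t

# Hypothetical rate timecourses (Hz), one value per time bin
choice, t_choice = chosen_option([5, 10, 30, 40, 50], [5, 10, 20, 20, 10])
no_choice = chosen_option([10, 10, 10], [10, 10, 10])
```

In the first example population A crosses 25 Hz in the third bin but only leads B by 10 Hz there, so the choice is registered one bin later.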

Simulations and regression of single-neuron firing rates

We implemented the model in MATLAB and simulated 400 trials of data. The local firing rate of each population was estimated by applying a sliding window of 50 ms, and averaging across the firing rate of all cell types within a given population. To obtain the results in Figure 7B, we then regressed each population’s firing rates against a model that contained a constant term, a term for whether the model chose A on each trial, the difference in value between option A and B (=(µA−µB)/80), and the value of chosen option (=µA/80 on ‘chose A’ trials, µB/80 on ‘chose B’ trials). We plot the regression coefficient of the ‘A’ selective cells as a function of time.
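The regression model for the simulated firing rates can be sketched as follows. The four regressors and the scaling by 80 follow the text; the function names `design_matrix` and `timecourse_betas`, the array shapes, and the noiseless synthetic firing rates are assumptions for illustration (the original analysis was run in MATLAB).

```python
import numpy as np

def design_matrix(mu_a, mu_b, chose_a):
    """Columns: constant, chose A (0/1), value difference, chosen value."""
    mu_a = np.asarray(mu_a, dtype=float)
    mu_b = np.asarray(mu_b, dtype=float)
    chose_a = np.asarray(chose_a, dtype=bool)
    value_diff = (mu_a - mu_b) / 80.0
    chosen_value = np.where(chose_a, mu_a, mu_b) / 80.0
    return np.column_stack([np.ones_like(mu_a), chose_a.astype(float),
                            value_diff, chosen_value])

def timecourse_betas(X, rates):
    """rates: (n_trials, n_timebins). Returns (n_regressors, n_timebins)."""
    betas, *_ = np.linalg.lstsq(X, rates, rcond=None)
    return betas

# Synthetic 'A'-selective population: value difference early, choice late
rng = np.random.default_rng(0)
mu_a = rng.uniform(20, 60, size=400)
mu_b = rng.uniform(20, 60, size=400)
chose_a = mu_a > mu_b                      # hypothetical noise-free choices
X_demo = design_matrix(mu_a, mu_b, chose_a)
true_betas = np.array([[5.0, 5.0],         # constant
                       [0.0, 20.0],        # chose A: late bin only
                       [10.0, 0.0],        # value difference: early bin only
                       [3.0, 3.0]])        # chosen value
rates_demo = X_demo @ true_betas           # noiseless, for illustration
betas_demo = timecourse_betas(X_demo, rates_demo)
```

Each column of `betas_demo` holds the coefficients for one time bin, so plotting a row against time gives the kind of timecourse shown for the ‘A’ selective cells.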

Results

Main Figure 7A shows the structure of the network model. We first tried to replicate the transitions in value correlates that occur at the level of single neuron firing in DLPFC (Figure 1B). Figure 7B shows that by applying multiple regression, we reveal that in the selective populations of neurons, there is a variation across time in factors that influence their firing rates. Option values have a strong influence on firing rates of the selective neurons early in the trial (Figure 7B, blue), but later in the trial, as the network selects an option by approaching a stable attractor state, firing rates become better explained by the network’s choice (Figure 7B, red). At the same time, the chosen value also influences neuronal firing (Figure 7B, green). Note that this coding of chosen value happens across all neurons in the network (i.e. both selective and non-selective), whereas the transition from option value difference to chosen option only occurs in the selective cells.

Value correlates occurring at the level of the LFP across all three areas (Figure 1E; Figure 1—figure supplement 3) transition from initially reflecting value sum (chosen + unchosen value) to later reflecting value difference (chosen − unchosen value). In a previous study, we showed that this transition is predicted to occur at the level of the LFP from the network model, and also that these predictions explained observations in human MEG data (Hunt et al., 2012). We therefore investigated whether we could apply a similar PCA decomposition to the model to that adopted to study our macaque LFP and human MEG data (main Figure 2), and also whether variability in LFP dynamics would similarly explain away chosen value coding in single neuron firing rates, as observed in Figure 4.

We ran a similar PCA decomposition on single-trial LFP predictions from the model to that run on data. The shape of the evoked LFP response from the model is shown in Figure 7—figure supplement 1A. Note that the evoked response from the model differs somewhat from the data. First, at the end of the trial, the model remains in a stable (working memory) attractor state with high activity, whereas the evoked response appears to return gradually towards its baseline level. This could easily be explained by the prepared action plan not being stored in prefrontal cortex, but in premotor regions. A second difference is that there is little to no variation in the single-trial amplitude of the response. Instead, the principal source of variability is in the reaction time, or rate of rise, of the network model. The only variation in amplitude is when the model reaches its final attractor state, but crucially this covaries with the speed at which the attractor basin is approached, and so with the rate of rise of the model. As such, the top two principal components (shown in Figure 7—figure supplement 1B) do not straightforwardly resemble the principal components in the data: it is PC1 that captures the primary variation in response latency, playing a similar role to PC2 in the data (see Figure 7—figure supplement 1C). We therefore regressed single-trial model PC1 weights onto chosen and unchosen value, errors and reaction times, and found a similar pattern of results to that found in the data – a predominant influence of chosen value on single trial PC1 weights; a smaller, but positive influence of unchosen value on single trial PC1 weights, not found in the current dataset; a positive influence of error trials on network PC1 weights, and a strong negative influence of reaction times on network PC1 weights (Figure 7C).

Finally, we asked whether in the network model, chosen value variability is explained away by the internal dynamics of the model, as estimated by the LFP decomposition. We asked the same question of the model as we did of the data: does including PC1/2 from the model LFP decomposition cause a reduction in the regression for chosen value in model single neuron firing, but not for option value difference or for chosen option (Figure 4)? We found that it does (compare Figure 7D with Figure 7B).

References

  23. P Glimcher, E Fehr, editors (2014). Neuroeconomics: Decision-Making and the Brain. San Diego: Academic Press.
  48. V Nacher, A Ledberg, G Deco, R Romo (2013). Coherent delta-band oscillations between cortical areas correlate with decision making. Proceedings of the National Academy of Sciences of the United States of America 110:15085–15090. https://doi.org/10.1073/pnas.1314681110
  51. JP O'Doherty (2014). The problem with value. Neuroscience and Biobehavioral Reviews 43:259–268. https://doi.org/10.1016/j.neubiorev.2014.03.027
  64. MN Shadlen, WT Newsome (2001). Neural basis of a perceptual decision in the parietal cortex (area LIP) of the rhesus monkey. Journal of Neurophysiology 86:1916–1936.

Decision letter

  1. Michael J Frank
    Reviewing Editor; Brown University, United States

eLife posts the editorial decision letter and author response on a selection of the published articles (subject to the approval of the authors). An edited version of the letter sent to the authors after peer review is shown, indicating the substantive concerns or comments; minor concerns are not usually shown. Reviewers have the opportunity to discuss the decision before the letter is sent (see review process). Similarly, the author response typically shows only responses to the major concerns raised by the reviewers.

[Editors’ note: a previous version of this study was rejected after peer review, but the authors submitted for reconsideration. The first decision letter after peer review is shown below.]

Thank you for choosing to send your work entitled "Capturing the Temporal Evolution of Choice in Prefrontal Cortex" for consideration at eLife. Your full submission has been evaluated by Eve Marder (Senior editor), Michael Frank (Reviewing editor), and two peer reviewers, and the decision was reached after discussions between the reviewers. Based on our discussions and the individual reviews below, we regret to inform you that your work will not be considered further for publication in eLife in its current form.

We should note first that all involved expressed interest in the findings and the high quality work, but questioned the significance and insight afforded by the work over and above what your group has published in the Nature Neuroscience article, which showed how the time course of various value-related signals are replicated using the attractor models. The main issues that came up in the review discussion (some of which are reiterated by the individual reviewer comments below) are as follows. First, both reviewers thought that the conclusions were overstated. They felt that while the PCA analysis was interesting, just because some of the variance in chosen value signals could be accounted for by a latency-based PC, that does not itself make the chosen value signal functionally irrelevant (epiphenomenal) (and even then, some variance remained). They also weren't compelled by the implication that the signals are pre-decisional rather than post-decisional, given that it is very difficult to infer when a decision was made (i.e. to show that the signal emerges when LFP is still ramping does not necessarily imply that it was prior to the decision). It was felt that it is equally plausible and perhaps more likely that CV signals are emerging as decisions are formed, and that CV signals have multiple functions, depending on the time point.

That said, we agree that the single-trial metric of LFP ERP latency does seem like a nice innovation. And its relationship to task variables (chosen value and error commission) provides some validation. But the reviewers were equally concerned about the relationship with effort vs delay signals given the timing of the observed variables, and the significance of the LFP latency metric's relationship to spiking activity seems murkier.

Thus we are rejecting the paper on the grounds that the main new analyses are not as compelling as would be needed for eLife. If you feel strongly that you can address these criticisms with new analyses and reframe the paper so that it is in-line with what can be concluded from the data given the concerns, we could consider a new submission, but again the new insights afforded beyond your previous work (beside the discovery of a PC component that relates to ERP latency) would have to be quite clear.

Reviewer #1:

This manuscript by Hunt et al. is based on the analyses of several different types of data collected from multiple experiments that have been described in previous papers. The most important piece of data was the single-neuron activity recorded from 3 different cortical areas, including the dorsolateral prefrontal cortex (DLPFC), anterior cingulate cortex (ACC), and orbitofrontal cortex (OFC). The authors have also analyzed the local field potential (LFP) data collected from these areas in the same experiment. The results obtained from these analyses were then compared to the results from the human MEG experiment and their network model, which is based on an influential model of Wang (2002). Through a set of clever analyses, the authors draw their main conclusion that a trial-by-trial index of decision dynamics, derived from the PCA analysis of LFP data, was largely correlated with the chosen value. Most interestingly, this index was used to demonstrate that the functional interactions between different cortical areas reflect the type of information considered during decision making (e.g., delay vs. effort). However, there are several major weaknesses in this manuscript.

1) The authors argue that the chosen value signals are epiphenomenal, based on the fact that the chosen value signals encoded by individual neurons can be substantially diminished by the PCA-based trial-by-trial index of the decision process measured from the local activity. This argument is not convincing. It appears that the same logic, if applied to other types of signals encoded by any neurons, would make virtually any kind of neural signal epiphenomenal. In other words, the fact that a certain type of signal encoded by a given neuron can be accounted for by the local dynamics of its surrounding network does not mean that the encoded signal has no functional significance.

2) Although the dynamic changes in the functional connectivity identified by this analysis related to effort vs. delay condition are interesting, the time course of this difference is not consistent with the authors' conclusion. The results shown in Figure 5 indicate that the difference does not arise until 600 ms after the onset of the choice period. However, the chosen action signals in the DLPFC begin to emerge within 200 ms from the onset of the choice period, and reach their first asymptotic level before 400 ms. Therefore, the communication between DLPFC and ACC/OFC identified in this analysis is unlikely to be the main driver for the choice signals in the DLPFC.

3) It is not clear whether the use of PCA (i.e., PC2) is really necessary to derive the main conclusion of this study, because an alternative possibility is that the explanatory power of PC2 might be largely due to the similarity of its temporal profile with that of chosen value signals (shown in Figure 1B). If this is the case, then the fact that PC2 is correlated with the chosen value signal (Figure 2E) is not too surprising.

Reviewer #2:

This paper reports a monkey electrophysiology study of two-alternative value-based decision making, focusing on the temporal evolution of decision-related neural signals in DLPFC, ACC, and OFC. Main findings include the following:

a) DLPFC spiking activity appeared to progress from a graded encoding of the relative action values to a discrete encoding of the selected action.

b) LFPs in all 3 regions positively encoded the chosen option's value throughout the choice epoch, but shifted from a positive-signed to a negative-signed encoding of the unchosen option's value. This can be interpreted as a progression from encoding "value sum" to "value difference".

c) Single-trial LFPs were summarized using two PCA-based indices that reflected the overall amplitude and latency of the waveform, respectively. These indices scaled with the value of the chosen option, and also showed an association with spiking activity.

d) DLPFC spiking was associated with LFP latency in other regions. The association was stronger with OFC when delay-related costs were at stake, and stronger with ACC when effort-related costs were at stake.

Parallel analyses were applied both to a human MEG data set and to the behaviour of a neural network model, and some of the same effects were obtained. The application of directly parallel analyses to multiple modalities of neural data is a significant strength of the paper. The development of trial-wise indices of LFP latency is likewise a potentially important contribution.

Comments:

1) The paper has a very cogent Introduction pointing out the challenge of reading out static decision variables from dynamic processes. But some of the rhetoric around this topic seems to go beyond what the data support. The authors express skepticism that neural activity represents economic variables (the word "represent" frequently appears in scare-quotes). But the idea that neural signals evolve dynamically isn't incompatible with the idea that they represent economic variables. This paper's conclusions really seem to pertain to how, rather than whether, economic variables are neurally represented. The results suggest that chosen value, for example, is encoded in the LFP amplitude in a specific time window (e.g. around 400 ms where PC2 peaks, indicative of the latency of the underlying waveform), rather than, say, the overall peak of the evoked LFP.

2) Trial-by-trial LFP-derived indices are shown to be related to spiking activity, but it's not entirely clear how to relate this finding to what is already known about spike/LFP coupling. Could this effect be accounted for in terms of straightforward first-order correlations between the spiking and LFP time courses? What makes this seem plausible is that the time course of PC2 weights (Figure 2B) looks similar to the time course of PC2 effects on spiking (Figure 3). Spikes and LFPs were recorded from different electrodes to avoid contamination, but it's still possible that a genuine relationship holds between the two (to mention just one previous example, a general negative correlation between multi-unit activity and LFP is shown in Whittingstall & Logothetis, 2009, Figure 1A).

3) I don't follow the logic of the analysis in the subsection “Chosen value correlates as an epiphenomenon of varying decision dynamics”. In a regression model of spike rates, predictors for LFP-derived PC1 and PC2 steal variance away from a "chosen value" predictor but not from other task-related predictors. This is taken to imply that "chosen value" effects are mainly picking up on the latency of the neural response. But this result actually seems consistent with these effects being related to either amplitude (PC1) or latency (PC2). The latency interpretation would be on stronger footing if it could be shown that chosen value effects varied with the addition/removal of PC2 only.

4) I'm concerned about whether it's appropriate to treat LFPs from simultaneously recorded electrodes as independent observations. Panel A of Figure 4—figure supplement 2 shows that LFP features are highly correlated across electrodes, even in different brain regions. This probably isn't a problem for the PCA decomposition, but it does seem like an issue for inferential tests of associations between LFP and task variables. For example, Figure 1D and 1E plot "mean +/- s.e. across electrodes", but since multiple electrodes were recorded per trial, the nominal number of electrodes may be larger than the true number of independent observations. The same concern applies to the results in Figure 2E and related figures. It would be helpful for the paper to clarify how interdependence among electrodes was dealt with.

[Editors’ note: what now follows is the decision letter after the authors submitted for further consideration.]

Thank you for resubmitting your work entitled "Capturing the temporal evolution of choice in prefrontal cortex" for further consideration at eLife. Your revised article has been favorably evaluated by Eve Marder (Senior editor), Michael Frank (Reviewing editor), and three reviewers, including a new referee who was not involved in the previous round of review. They all agree that the manuscript has been improved and the novel contributions are clear and worthwhile relative to the previous work. However, there are some remaining issues that need to be addressed, as outlined below.

1) The authors claim that the PC2 from LFP provides a measure of local dynamics that is more specific (and relevant to unit activity) than larger scale global dynamics. The analysis provided by the authors (effect of local LFP after orthogonalizing on LFP from another region) is encouraging in this regard, however supporting the claim would also require showing that the opposite is not true (i.e. that running the analysis in reverse does not yield the same result). The correlation coefficients in the histograms of Figure 4—figure supplement 3 suggest that this may be the case, although there does not seem to be much of a within region advantage for DLPFC in particular. Also, as far as I can tell, the authors did not incorporate reaction time, which is typically used as a trial-by-trial indicator of decision dynamics, into these analyses. I suspect that this is because the required delay is sufficiently long to create a floor effect on RTs, however if this is the case it deserves mention in the text. Otherwise the authors should test whether LFP measures relate to the "chosen value" aspects of neural activity after accounting for RT.

2) The authors show that PC2 captures variance in neural firing that is typically ascribed to chosen value. I agree with the authors that this is an important point. Yet I found the interpretation of this point to be somewhat overly specific: "We therefore questioned whether chosen value coding might simply be a consequence of the same neural dynamics occurring at varying latencies across trials." This is certainly one interpretation of the authors' results, but not the only one. Another (more standard) interpretation of this finding is that the LFP components mediate the chosen value representations at the level of single units. Given that the LFPs are often interpreted as measurements of local synaptic input, I find this interpretation to be fairly reasonable and consistent with the data provided. The finding that PC1 (which does not contain timing information) also steals variance from chosen value may even provide some specific evidence for this interpretation. As I understand it, the main reason that the authors interpret the data in terms of dynamics is provided in the final analysis showing that in an attractor network of decision-making the same sorts of signals emerge as a byproduct of competition. I think in order to improve the clarity it would be useful to avoid specific interpretations of causality until the Discussion section where both possibilities should be discussed.

3) Relatedly, another reviewer noted that the most direct evidence for the notion that chosen-value effects emerge from "the same choice dynamics occurring at different rates on different trials" is derived from the analysis on p. 10, but I'm afraid I'm still not onboard with the logic of this. The authors foreground an analysis showing that the chosen-value effect in spike rates is reduced by jointly including PC1 (indexing LFP amplitude) and PC2 (indexing LFP latency) in the regression model (Figure 4, Figure 4—figure supplement 1, and Figure 4—figure supplement 3). I still don't see how this is germane to their conclusion about latency. The specific effect of the latency-related PC2 is now shown as a supplement, and is considerably weaker albeit nonzero. The original chosen-value effect, which peaked at a % CPD of around 0.9 (Figure 1B), is reduced by up to about 0.1 by both PCs together (Figure 4), or 0.03 by PC2 alone (Figure 4—figure supplement 2). So wouldn't it be at least as accurate to conclude that chosen-value effects emerge from the same neural dynamics occurring at different amplitudes across trials? (The predictions of the attractor network are also unclear in this regard, i.e. in the predictions for combination of PCs, and so it seems important that the authors clarify the predictions of the attractor model.) The same concern applies to the local-versus-distal result in Figure 4—figure supplement 3, which is interpreted solely in terms of neural latency but results for PC2 alone are not shown.

It seems to me an easier conclusion to square with the data would be that chosen-value effects originate from a combination of the amplitude and speed of the dynamically unfolding neural response. This may not be so starkly different from existing perspectives, although it's of course still valuable to see the details of how it plays out. The authors could also easily reframe the result that LFP measures partially mediate the chosen value effect and tone down the oppositional framing of decision dynamics versus static chosen value representation. They could use the network model results as a podium to say that chosen value representations need not emerge for the purpose of encoding chosen value, which would be entirely accurate from my standpoint.

4) A similar issue comes up in the section on neural network modeling, which shows that chosen-value effects in the simulation can be explained away by a principal component capturing variability in neural dynamics, but doesn't state clearly enough that "dynamics" here refers, not just to latency, but to a mixture of amplitude and latency (Figure 7—figure supplement 1, panel C). (Related to this, the sentence “we obtained principal components that controlled waveform amplitude and latency, as in the ERPs" seems inaccurate, since what was actually extracted was a single component that mixed amplitude and latency, unlike the earlier analysis.)

5) The authors use the first principal component of the attractor network model decomposition rather than PC2. While I agree with the authors that this component seems to contain some temporal information, it does seem to mainly capture amplitude. Given that the authors interpret PC2 as if it were the derivative of the ERP, it seems that it would be a bit easier to make the connection between computational model and biology if model signals were decomposed using separate components to capture overall amplitude and derivative. This could be done simply by creating weight vectors based on the average model response (ERP) and the derivative of that signal.
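The reviewer's suggestion of fixed weight vectors built from the average response and its derivative can be illustrated with a minimal sketch. It is not the authors' analysis: the function name `amplitude_latency_weights`, the synthetic Gaussian waveform and all parameter values are assumptions chosen for illustration. Each trial is fitted by least squares as a weighted sum of the mean waveform and the (orthogonalised) derivative of that mean.

```python
import numpy as np

def amplitude_latency_weights(trials):
    """Fit each trial as a weighted sum of the mean waveform and its derivative.

    trials: array of shape (n_trials, n_timepoints).
    Returns (amp_w, lat_w), each of shape (n_trials,).
    """
    trials = np.asarray(trials, dtype=float)
    mean_wave = trials.mean(axis=0)
    deriv = np.gradient(mean_wave)  # derivative per sample
    # Orthogonalise the derivative against the mean so the two weights separate.
    deriv = deriv - (deriv @ mean_wave) / (mean_wave @ mean_wave) * mean_wave
    design = np.column_stack([mean_wave, deriv])  # (n_timepoints, 2)
    weights, *_ = np.linalg.lstsq(design, trials.T, rcond=None)  # (2, n_trials)
    return weights[0], weights[1]

# Hypothetical synthetic data: a Gaussian bump with trial-wise gain and latency.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
n_trials = 300
shifts = rng.uniform(-0.03, 0.03, n_trials)          # latency shift per trial
gains = 1.0 + 0.1 * rng.standard_normal(n_trials)    # amplitude gain per trial
trials = gains[:, None] * np.exp(
    -(t[None, :] - 0.5 - shifts[:, None]) ** 2 / (2 * 0.1 ** 2)
)
trials += 0.02 * rng.standard_normal(trials.shape)   # additive noise

amp_w, lat_w = amplitude_latency_weights(trials)
```

Under these assumptions, `amp_w` tracks the simulated gain, while `lat_w` is negatively related to the latency shift, because a later waveform is approximately the mean waveform minus the shift times its derivative.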

6) The authors note that PC1 relates primarily to value sum, however this does not seem to be the case in ACC, where PC1 takes positive coefficients for chosen value and negative coefficients for unchosen value. This makes me wonder whether the added contributions of PC1 in explaining single unit chosen value effects relate primarily to the inclusion of PC1 from ACC electrodes. In my opinion, such a finding would not necessarily detract from the dynamics idea, as overall ACC signal amplitude may play a role in adjusting decision threshold, which directly affects dynamics. Either way, the authors should make the heterogeneity across recording locations clear in this regard.

7) The section on inter-region correlations (“OFC and ACC dynamics have distinct influences on delay- and effort-based decisions respectively”) is quite specific about causal directionality, concluding that DLPFC activity is influenced by OFC and ACC. It isn't apparent to me that there's any support for this. (A third-variable explanation would be a plausible alternative.)

https://doi.org/10.7554/eLife.11945.027

Author response

[Editors’ note: the author responses to the first round of peer review follow.]

We should note first that all involved expressed interest in the findings and the high quality work, but questioned the significance and insight afforded by the work over and above what your group has published in the Nature Neuroscience article, which showed how the time course of various value-related signals are replicated using the attractor models.

[…] We could consider a new submission, but again the new insights afforded beyond your previous work (beside the discovery of a PC component that relates to ERP latency) would have to be quite clear.

We thank both reviewers and the Reviewing editor for their careful reading of the paper and their enthusiasm for certain aspects of our results. Below, we address each of the reviewers’ comments point-by-point. First, however, we wish to briefly address the key insights and points of significance afforded by the present study that go beyond our previous work.

The main contribution of the paper is to explain the origins of some of the most commonly found signals during value‐guided choice. Uniquely to this study, we have attempted to do so in both single-unit physiology and human neuroimaging data.

Our paper provides two perspectives on value correlates found in these data. First, we establish the relationship between mesoscopic PCA-derived latency components and correlates of value in single-neuron firing, exploiting the single-trial nature of the measurements obtained from the PCA. Second, we test predictions of different value-related signals via the attractor network model in both forms of data.

Although different in approach, these two perspectives remain complementary to one another, as they both explain neural correlates of value in terms of variation in neural dynamics across trials. In both cases, we believe there to be substantial novelty and significance over and above what we have published previously.

With regard to the single-trial PCA approach, both reviewers appreciated that this was a useful and potentially important innovation. The key advantage of our approach is that it provides single-trial estimates of the speed of decision formation from a neural measurement in multiple simultaneously recorded areas. As Reviewer #2 noted, it is a significant strength that this allows the same analyses to be performed on both human and monkey data, and that in the monkey data, the measure can then be related to spiking activity. As Reviewer #1 noted, the application of this approach to study functional interactions between OFC, ACC and DLPFC is an interesting and significant novel result for our understanding of cost-benefit decision formation. In short, this PCA approach provides a reliable internal measure of the speed at which a decision is made which could be useful for understanding the neural processes that regulate the speed of decision or action in a broad array of behavioural neuroscience research.

Whilst the novelty of this aspect of the paper appears not to be in question, both reviewers felt we had overstated some of the claims made. We agree with the reviewers on this point. We have therefore reframed the paper in line with their comments. Moreover, both reviewers had technical concerns about some of the analyses shown, and asked us to rule out alternative explanations of the PCA-derived effects. We have sought to address each of their concerns with several additional new analyses, which we detail in the point-by-point response to their reviews below.

Regarding the attractor network modeling of LFP and single-unit data, one of the overarching claims in both the present paper and our Nature Neuroscience (2012) paper is that the attractor network model of Wang (2002) is a useful framework for explaining the temporal evolution of signals during value-guided choice. A transformation seen in our previous paper (from signaling overall value to value difference in human MEG) is also seen in the LFP data in our current paper across all three brain areas. There are, therefore, some similarities between the ideas motivating both papers.

However, there are several novel findings in the present paper that make a substantial contribution beyond that of our previous paper. In particular:

a) In DLPFC, predictions of the model are borne out at the single‐unit level as well as at the mesoscopic scale of LFP signals (compare Figure 1B and Figure 7B).

b) In DLPFC, the transformation from signaling action value difference to signaling the chosen action happens within the same neurons (Figure 1C), matching the model’s prediction that pools of neurons selective for different options compete across time to become the selected option. Such a finding is consistent with value comparison being realized within DLPFC in an action frame of reference.

c) Chosen value signals are ubiquitous throughout many regions with similar latencies in both single-unit and LFP data, whereas only DLPFC shows action value difference to chosen action transformations (Figure 1—figure supplement 1). Such a finding is consistent with a model of distributed competition via mutual inhibition, as outlined in the discussion.

d) LFP value signals emerge at the time when the derivative of the LFP signal is non-zero (Figure 1E/F), implying that value affects the rate of change of the mesoscopic signal, not the amplitude of the evoked response.

In short, we cannot accept that the present findings do not add additional significance and insight over what was in our previous Nature Neuroscience article. There is a substantial qualitative difference in the evidence being provided in the two papers.

Importantly, however, in the revised manuscript we have provided further additional analyses in support of our conclusions, which we hope address the reviewers’ concerns. These are outlined in the point-by-point responses below.

[…] The main issues that came up in the review discussion (some of which are reiterated by the individual reviewer comments below) are as follows. First, both reviewers thought that the conclusions were overstated. They felt that while the PCA analysis was interesting, just because some of the variance in chosen value signals could be accounted for by a latency-based PC, that does not itself make the chosen value signal functionally irrelevant (epiphenomenal) (and even then, some variance remained). They also weren't compelled by the implication that the signals are pre-decisional rather than post-decisional, given that it is very difficult to infer when a decision was made (i.e. to show that the signal emerges when LFP is still ramping does not necessarily imply that it was prior to the decision). It was felt that it is equally plausible and perhaps more likely that CV signals are emerging as decisions are formed, and that CV signals have multiple functions, depending on the time point.

We agree with both reviewers that some of the conclusions were overstated, and that we made the arguments of the paper too rhetorical. We have therefore reframed the tone of the paper throughout. In particular:

i) We strongly agree that the key idea is that chosen value signals are emerging ‘as’ a decision is formed. We did not intend to imply that the signals are pre-decisional rather than post-decisional. To clarify this, we have added to the Introduction:

“[…] the dynamical perspective proposes that they may in fact originate as a consequence of time evolving decision processes. Rather than casting a chosen value representation as signaling ‘pre-decision’ or ‘post-decision’ variables to downstream brain areas, it argues that such correlates inevitably emerge as a decision is made.”

To the Discussion, we have added:

“It can be very difficult in practice to infer when a decision begins or ends, and this poses difficulties when labelling a neural correlate as ‘pre-decision’ or ‘post-decision’. The dynamical perspective instead explains certain correlates as emerging as the decision is formed.”

ii) We agree with Reviewer #1 that there may still be functional relevance to chosen value signals, and that they may play different roles at different points in the trial. To make this more explicit in the Discussion, we have added:

“[…] These signals may still remain of functional significance for other computations (such as subsequent computation of the reward prediction error (Rangel and Hare, 2010)).”

And:

“[…] A further important caveat is that chosen value correlates may be explained by different mechanisms at different points in the trial.”

iii) As Reviewer #2 correctly surmised, the paper’s conclusions really pertain to how certain value signals might originate, rather than whether they must then be functionally relevant or not (as implied by the term ‘epiphenomenal’).

We have therefore removed the claim that because the chosen value signal is reduced by the PCA-derived estimates, it is epiphenomenal. To give examples of the kinds of changes that have been made (there are many such changes throughout the paper):

“This implies that chosen value correlates are epiphenomena of the speed at which dynamics unfold locally within a particular cortical area […].”

has been replaced by:

“This implies that some of the neuronal variance attributed to chosen value correlates can be explained as a consequence of the speed at which dynamics unfold locally within a particular cortical area […].”

iv) We also note in the Discussion that it is important to remember that only some of the chosen value variance is explained using the single-trial PCA approach:

“It is important to note that only a portion, not all, of the variance was removed by this step. As such, it may be the case that there is coding of chosen value during the decision that is not explained by the dynamics of decision formation.”

v) We have removed the usage of quotation marks around the word represent, which concerned Reviewer #2.

We hope that by reframing the paper in this way, we have made its tone less rhetorical and also made clearer what can and cannot be concluded from the data.

That said, we agree that the single-trial metric of LFP ERP latency does seem like a nice innovation. And its relationship to task variables (chosen value and error commission) provides some validation. But the reviewers were equally concerned about the relationship with effort vs delay signals given the timing of the observed variables, and the significance of the LFP latency metric's relationship to spiking activity seems murkier.

Reviewer #1 noted the potential interest of the PCA-based analyses between brain regions for effort vs. delay trials, but had concerns about the late timing of these effects. However, these were addressable concerns, and we have provided a response to them in the response to Reviewer #1. Reviewer #2 was concerned that the relationship between the LFP PCA and spiking could be more clearly related to existing knowledge about spike-LFP relationships. We have now addressed this point below.

Reviewer #1:

This manuscript by Hunt et al. is based on the analyses of several different types of data collected from multiple experiments that have been described in previous papers. The most important piece of data was the single-neuron activity recorded from 3 different cortical areas, including the dorsolateral prefrontal cortex (DLPFC), anterior cingulate cortex (ACC), and orbitofrontal cortex (OFC). The authors have also analyzed the local field potential (LFP) data collected from these areas in the same experiment. The results obtained from these analyses were then compared to the results from the human MEG experiment and their network model, which is based on an influential model of Wang (2002). Through a set of clever analyses, the authors draw their main conclusion that a trial-by-trial index of decision dynamics, derived from the PCA analysis of LFP data, was largely correlated with the chosen value. Most interestingly, this index was used to demonstrate that the functional interactions between different cortical areas reflect the type of information considered during decision making (e.g., delay vs. effort).

We thank the reviewer for these comments. It is clear that he/she understood well the close relationship between our current study, our previous MEG study (Hunt et al., 2012) and the Wang (2002) model. In considering the relationship between these different studies, we would like to reiterate the four key findings outlined above concerning the novelty of the present results over and above our previous findings (see point 2 in response to cover letter, above). We are unaware of any study that has used this model to explain the dynamics of single unit data in DLPFC, shown that the action value to chosen action transformation occurs within the same neurons, shown that chosen value single neuron correlates are found with similar strengths and time-courses across multiple simultaneously recorded brain regions, and shown that the LFP value signals emerge as the signal is ramping (when the derivative of the signal is high). We believe these empirical findings to be of considerable importance when evaluating the hypothesis that the Wang (2002) model explains single-neuron dynamics during value‐guided choice.

We are also pleased that the reviewer considered the PCA-based analyses of the LFP data to be both novel and interesting. However, it is clear that he or she felt we overstated our interpretation of the reduction in chosen value signals by the PCA-based single-trial index, and had two other technical concerns about our findings using this approach. As we outline below, we agree with the reviewer’s point 1 and have tempered our conclusions in line with his/her argument. We have also sought to address his/her points 2 and 3 below.

1) The authors argue that the chosen value signals are epiphenomenon based on the fact that the chosen value signals encoded by individual neurons can be substantially diminished by the PCA-based trial-by-trial index of the decision process measured from the local activity. This argument is not convincing. It appears that the same logic, if applied to other types of signals encoded by any neurons, would make virtually any kind of neural signals epiphenomenon. In other words, the fact that a certain type of signal encoded by a given neuron can be accounted for by the local dynamics of its surrounding network does not mean that the encoded signal has no functional significance.

Both reviewers felt that we had overstated the claim that could be made with our results, and we agree with Reviewer #1 that there are difficulties with labeling the chosen value signal as ‘epiphenomenal’. We have sought to address both Reviewer #1 and Reviewer #2’s concerns by changing the tone and conclusions of the paper throughout.

However, we disagree with the reviewer that the same logic would apply to signals encoded by any neurons. The PCA analysis causes a reduction in variance explained by chosen value, but the same reduction in variance is not found for either action value coding or chosen action coding, as is shown in Figure 4. This demonstrates that it is selectively chosen value coding which can be explained as a consequence of the speed at which dynamics unfold locally within a given area.

We refer the reviewers to our responses i) – v) above.

2) Although the dynamic changes in the functional connectivity identified by this analysis related to effort vs. delay condition are interesting, the time course of this difference is not consistent with the authors' conclusion. The results shown in Figure 5 indicate that the difference does not arise until 600 ms after the onset of the choice period. However, the chosen action signals in the DLPFC begin to emerge within 200 ms from the onset of the choice period, and reach their first asymptotic level before 400 ms. Therefore, the communication between DLPFC and ACC/OFC identified in this analysis is unlikely to be the main driver for the choice signals in the DLPFC.

We agree with the reviewer that our original interpretation of this result was problematic. Based on subsequent analysis of the data, we have removed our original interpretation of this finding, and rewritten this section in a way that more reasonably reflects the timings of the different effects shown.

In summary, there is good evidence that OFC and ACC both influence DLPFC at the times relevant for the decision process. However, we now remain more agnostic about why the difference between trial types emerges late in the decision phase than in our initial submission. This late timing is robust, replicating across two different groups of neurons. But in a further analysis, we find that it is particularly strong in DLPFC neurons that correlate with chosen value. We now offer one potential explanation for the late timing of the effect in the Discussion.

Firstly, we think it is important to consider Figure 5—figure supplement 1. This shows that although the difference between effort and delay trials for ACC and OFC only emerges late in the decision process, both regions have a significant influence on DLPFC firing far earlier in the trial.

We have now made this point more explicit in the revised manuscript:

“[…] Notably, OFC and ACC PC2 weights both modulated DLPFC neuron firing from around 200 ms after choice onset on both trial types (Figure 5—figure supplement 1), consistent with the time at which DLPFC neurons first begin to encode choice values (Figure 1B). These effects on DLPFC firing were maintained for several hundred milliseconds.”

However, we previously concentrated on the late timing of the difference effect in Figure 5 (subtracting one trial type from another):

“[…] towards the end of the decision period, the contribution of ACC and OFC PC2 to DLPFC neuron firing changed in a cost-specific way. By subtracting one trial type from the other, this revealed that ACC explained more variance in DLPFC firing on effort trials than delay trials (Figure 5A, magenta), whereas the converse was true for OFC (Figure 5A, black).”

As the reviewer correctly surmised, our previous interpretation was that this reflected other regions’ influence on response-selective neurons, as this coincided with the largest CPD for chosen response in Figure 1B. However, the reviewer points out that this effect first appears earlier in the trial. We therefore sought to test this hypothesis more explicitly. We repeated the ‘effort minus delay trial’ difference analysis, having first performed a median split of DLPFC neurons into those with high and low response-selectivity (Author response image 1).

Author response image 1
Same analysis as in Figure 5A, having first performed a median split for response-selectivity.

Both neurons with high response selectivity (left panel, n=62 DLPFC units) and neurons with low response selectivity (right panel, n=62 DLPFC units) show the effect found in Figure 5A.

https://doi.org/10.7554/eLife.11945.024

The effect can be seen in both sets of neurons (including its late timing), suggesting it is a robust finding. However, it is not noticeably stronger in the response-selective neurons, contradicting our original hypothesis. We have therefore removed from the paper the claim that it is linked to the emergence of the response. We thank the reviewer for provoking this reanalysis of the data.

Instead, we considered whether the effect might be particularly prominent in other specific classes of neuron. We found that when we instead performed a median split of DLPFC cells by chosen value encoding, there was a clear difference in the degree to which neurons showed the between-region effect in Figure 5.

In light of these additional results, we have added this median split by chosen value as a separate panel to Figure 5:

“We also found that when we performed a median split on DLPFC neurons by the degree to which they encoded chosen value in the analysis shown in Figure 1B, neurons with high chosen value selectivity were also those preferentially influenced by other regions’ dynamics (Figure 5B). This was not true, by contrast, when a median split was performed based upon the response selectivity of the DLPFC neuronal population.”

Although this effect is maximal between 600 and 800 ms, there is evidence for it at earlier time-points in the trial (between 400 and 600 ms). This still leaves some ambiguity as to exactly why the effect is strongest towards the end of the decision period (Figure 5A), and why before this time OFC and ACC both have an influence on DLPFC firing (Figure 5—figure supplement 1).

One possible explanation is that if local PC2 weights reflect the speed of OFC and ACC attractor dynamics, then it is only once these regions reach attractor basins that they begin to selectively influence neural activity in DLPFC. There may also be a delay in the time-course of this interaction between regions. We now put this forward as a potential hypothesis in the Discussion, but we make clear that it is a hypothesis requiring further investigation (please see: “Further, future spiking network models of choice involving hierarchical competitions across multiple areas […] the time-course of these inter-regional effects more explicitly.”).

3) It is not clear whether the use of PCA (i.e., PC2) is really necessary to derive the main conclusion of this study, because an alternative possibility is that the explanatory power of PC2 might be largely due to the similarity of its temporal profile with that of chosen value signals (shown in Figure 1B). If this is the case, then the fact that PC2 is correlated with the chosen value signal (Figure 2E) is not too surprising.

We disagree with the reviewer’s comment here. The key point about the obtained principal component is that addition or subtraction of a single-trial estimate of PC2 weights tells us about trial-by-trial variation in dynamics. This is understood most clearly in the right-hand panel of Figure 2D, which shows that adding or subtracting PC2 captures variation in the waveform latency. It therefore provides an index of decision speed. The clearest evidence of this is the strong negative correlation with reaction time in the human MEG data, where subjects were free to respond at any point in the trial (Figure 6). Crucially, the measure provides a local measure of decision formation that otherwise we would not have access to, and it does so on a single-trial basis. It is only because the PCA provides access to these single-trial estimates that we are able to deduce that the chosen value signal is related to internal decision speeds. But more importantly, these single-trial estimates are critical for allowing for subsequent analysis of how this relates to other measures, as can be performed in the between-region analysis (Figure 5) or the coupling to ongoing phase of an oscillation (Figure 2—figure supplement 3, see response to Reviewer #2 question 2).
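The link between a principal component and waveform latency can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reproduction of the analysis in the paper: the waveform is a hypothetical Gaussian bump, trials differ only in latency, and the mean is removed before PCA, so the leading component here plays the role that PC2 plays in the recorded data.

```python
import numpy as np

# Hypothetical evoked responses that differ only in latency from trial to trial.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 200)
latency = rng.uniform(-0.03, 0.03, 400)
erps = np.exp(-(t[None, :] - 0.5 - latency[:, None]) ** 2 / (2 * 0.1 ** 2))
erps += 0.01 * rng.standard_normal(erps.shape)

# PCA via singular value decomposition of the mean-centred trials.
mean_erp = erps.mean(axis=0)
centred = erps - mean_erp
_, _, vt = np.linalg.svd(centred, full_matrices=False)
pc = vt[0]                    # leading temporal component
pc_weights = centred @ pc     # single-trial weight on that component

# The component should resemble the temporal derivative of the mean waveform,
# and its single-trial weights should track the simulated latency.
# Absolute values are used because the sign of a principal component is arbitrary.
shape_match = abs(np.corrcoef(pc, np.gradient(mean_erp))[0, 1])
latency_match = abs(np.corrcoef(pc_weights, latency)[0, 1])
```

In this toy setting, adding or subtracting the component shifts the waveform earlier or later, which is why a single-trial weight on it can serve as a latency index.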

Reviewer #2:

1) The paper has a very cogent Introduction pointing out the challenge of reading out static decision variables from dynamic processes. But some of the rhetoric around this topic seems to go beyond what the data support. The authors express skepticism that neural activity represents economic variables (the word "represent" frequently appears in scare-quotes). But the idea that neural signals evolve dynamically isn't incompatible with the idea that they represent economic variables. This paper's conclusions really seem to pertain to how, rather than whether, economic variables are neurally represented. The results suggest that chosen value, for example, is encoded in the LFP amplitude in a specific time window (e.g. around 400 ms where PC2 peaks, indicative of the latency of the underlying waveform), rather than, say, the overall peak of the evoked LFP.

Both reviewers felt that we had overstated the claim that could be made with our results, and we agree with Reviewer #2 that some of our conclusions were overly rhetorical and went beyond what was supported by the data. We have sought to address both Reviewer #1 and Reviewer #2’s concerns by changing the tone and conclusions of the paper throughout.

Please see our response i)-v) above.

2) Trial-by-trial LFP-derived indices are shown to be related to spiking activity, but it's not entirely clear how to relate this finding to what is already known about spike/LFP coupling. Could this effect be accounted for in terms of straightforward first-order correlations between the spiking and LFP time courses? What makes this seem plausible is that the time course of PC2 weights (Figure 2B) looks similar to the time course of PC2 effects on spiking (Figure 3). Spikes and LFPs were recorded from different electrodes to avoid contamination, but it's still possible that a genuine relationship holds between the two (to mention just one previous example, a general negative correlation between multi-unit activity and LFP is shown in Whittingstall & Logothetis, 2009, Figure 1A).

This is an important point. A first concern is that the effects may be driven by underlying correlations between spiking and LFP, similar to the Whittingstall and Logothetis paper. We addressed this by first regressing the LFP amplitude out of the (temporally smoothed) firing rate time-course on each trial. Then we asked whether the effects shown in Figure 3 hold on the residual firing rate, after LFP amplitude had been regressed out.

As can be seen in Figure 3C, the effect does not appear to be removed by accounting for the first-order relationship between spiking and LFP. We have added the following comment to the manuscript:

“Additional analyses confirmed that this relationship could not be explained as a simple consequence of first-order correlations between LFP amplitude and firing rate.”
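The control described in this response, regressing LFP amplitude out of the firing rate before re-testing the effect, might look roughly like the sketch below. It assumes that the regression is run separately within each trial across time points with an intercept, which is one reading of the description above; the function name `regress_out` and the simulated data are hypothetical.

```python
import numpy as np

def regress_out(firing, lfp):
    """Remove the within-trial linear dependence of firing rate on LFP amplitude.

    firing, lfp: arrays of shape (n_trials, n_timepoints).
    Returns residual firing rates with the same shape.
    """
    firing = np.asarray(firing, dtype=float)
    lfp = np.asarray(lfp, dtype=float)
    resid = np.empty_like(firing)
    for i, (y, x) in enumerate(zip(firing, lfp)):
        design = np.column_stack([np.ones_like(x), x])  # intercept + LFP
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid[i] = y - design @ beta
    return resid

# Hypothetical data with a negative first-order spike/LFP relationship.
rng = np.random.default_rng(2)
lfp = rng.standard_normal((50, 300))
firing = 5.0 - 0.8 * lfp + 0.5 * rng.standard_normal((50, 300))
resid = regress_out(firing, lfp)
```

Any subsequent regression of `resid` on PC2 weights then tests for a relationship that cannot be attributed to the first-order coupling removed here.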

However, we also suspect that the reviewer was driving at a deeper point, which is that many researchers approach spike/LFP relationships in different ways to those addressed by our paper, and it is important to draw links between this work and our own. For instance, spike-LFP coupling is often considered in terms of the relationship between spike timing relative to the phase of an oscillation. It might therefore be helpful to address how trial-wise variation in PC2 weights correlated with trial-wise variation in the phase and amplitude of underlying oscillations.

To address this, we computed a time-frequency decomposition of each trial’s LFP response. We then correlated the power and phase of these spectrograms with the underlying PC2 weights. In the case of phase, we computed the circular-linear correlation coefficient that tests the relationship between a linear variable (PC2 weights) and a circular one (phase of oscillation) (Berens P, J Stat Soft, 2009). The results of this are shown below. They indicate that, as might be expected from a shift in the latency of an evoked response, PC2 correlates with the phase of low-frequency components of the time-frequency decomposition. There was also an unpredicted relationship between PC2 weights and high beta/low gamma power relatively late into the trial. We have added Figure 2—figure supplement 3 as an additional figure supplement.

We have also added the following to the Results:

“Knowledge about PC2 weights therefore provides a parsimonious description of single-trial ERP latencies, a key feature of the dynamical ERP response. Consistent with this idea, PC2 weights were also found to correlate with the phase of theta frequency (4–8 Hz) oscillations during the decision period (Figure 2—figure supplement 3).”

To the Discussion, we have added:

“Variation in ERP latency corresponds to a shift in the phase of an oscillation at low frequencies that make up the evoked response. […] This suggests a way in which future studies may link our measure of LFP latency to spike-LFP coupling (Hunt, Dolan and Behrens, 2014) or inter-regional LFP coherence (Tauste Campo et al., 2015) during cognitive tasks.”
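For readers unfamiliar with the circular-linear correlation mentioned in this response, a minimal sketch follows. It uses the standard formula that combines the correlations of the linear variable with the sine and cosine of the angle, with a chi-squared approximation (2 degrees of freedom) for the p-value. The function name `circ_lin_corr` and the simulated phases are assumptions for illustration, and this is not a copy of any particular toolbox.

```python
import numpy as np

def circ_lin_corr(phase, x):
    """Correlation between a circular variable (radians) and a linear variable.

    Returns (rho, p), where rho lies in [0, 1] and p uses the approximation
    n * rho**2 ~ chi-squared with 2 degrees of freedom.
    """
    phase = np.asarray(phase, dtype=float)
    x = np.asarray(x, dtype=float)
    rxs = np.corrcoef(x, np.sin(phase))[0, 1]
    rxc = np.corrcoef(x, np.cos(phase))[0, 1]
    rcs = np.corrcoef(np.sin(phase), np.cos(phase))[0, 1]
    rho = np.sqrt((rxc**2 + rxs**2 - 2 * rxc * rxs * rcs) / (1 - rcs**2))
    # Survival function of chi-squared with 2 d.o.f. is exp(-x / 2).
    p = np.exp(-len(x) * rho**2 / 2)
    return rho, p

# Hypothetical example: a weight that depends on phase versus one that does not.
rng = np.random.default_rng(3)
phase = rng.uniform(-np.pi, np.pi, 500)
weights_dep = np.cos(phase - 1.0) + 0.1 * rng.standard_normal(500)
weights_indep = rng.standard_normal(500)

rho_dep, p_dep = circ_lin_corr(phase, weights_dep)
rho_indep, p_indep = circ_lin_corr(phase, weights_indep)
```

Applied per time-frequency bin, with phase taken from the single-trial decomposition and the linear variable set to the PC2 weights, this yields a map of where phase covaries with the latency index.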

3) I don't follow the logic of the analysis in the subsection “Chosen value correlates as an epiphenomenon of varying decision dynamics”. In a regression model of spike rates, predictors for LFP-derived PC1 and PC2 steal variance away from a "chosen value" predictor but not from other task-related predictors. This is taken to imply that "chosen value" effects are mainly picking up on the latency of the neural response. But this result actually seems consistent with these effects being related to either amplitude (PC1) or latency (PC2). The latency interpretation would be on stronger footing if it could be shown that chosen value effects varied with the addition/removal of PC2 only.

We included both PC1 and PC2 as we found that they both independently could quench some of the variance in the chosen value signal (and combined together explained more). However, we can see that it would be useful to show the effects of PC2 alone. We have now added this as the reviewer suggested as Figure 4—figure supplement 2.
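The logic of a regressor "quenching" chosen-value variance can be made concrete with the coefficient of partial determination (CPD), computed as the fractional reduction in residual sum of squares when a regressor is added to the model. The sketch below uses simulated data in which firing depends on a latent PC2-like variable that is itself correlated with chosen value; all names and effect sizes are hypothetical, and the actual regression model in the paper contains additional task regressors.

```python
import numpy as np

def sse(y, design):
    """Residual sum of squares of an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    r = y - design @ beta
    return r @ r

def cpd(y, design, col):
    """CPD of column `col`: (SSE without it - SSE with it) / SSE without it."""
    full = sse(y, design)
    reduced = sse(y, np.delete(design, col, axis=1))
    return (reduced - full) / reduced

# Hypothetical trials: firing follows pc2, and chosen value correlates with pc2.
rng = np.random.default_rng(4)
n = 2000
pc2 = rng.standard_normal(n)
chosen_value = 0.7 * pc2 + np.sqrt(1 - 0.7**2) * rng.standard_normal(n)
firing = pc2 + rng.standard_normal(n)

ones = np.ones(n)
base = np.column_stack([ones, chosen_value])           # chosen value only
with_pc = np.column_stack([ones, chosen_value, pc2])   # chosen value + PC2

cpd_without = cpd(firing, base, 1)     # chosen-value CPD, PC2 not in the model
cpd_with = cpd(firing, with_pc, 1)     # chosen-value CPD, PC2 in the model
```

In this simulation the chosen-value CPD falls to near zero once `pc2` is included, because chosen value has no influence on firing beyond what it shares with `pc2`. A partial reduction, as reported in the paper, would indicate that only part of the chosen-value variance is shared.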

4) I'm concerned about whether it's appropriate to treat LFPs from simultaneously recorded electrodes as independent observations. Panel A of Figure 4—figure supplement 2 shows that LFP features are highly correlated across electrodes, even in different brain regions. This probably isn't a problem for the PCA decomposition, but it does seem like an issue for inferential tests of associations between LFP and task variables. For example, Figure 1D and 1E plot "mean +/- s.e. across electrodes", but since multiple electrodes were recorded per trial, the nominal number of electrodes may be larger than the true number of independent observations. The same concern applies to the results in Figure 2E and related figures. It would be helpful for the paper to clarify how interdependence among electrodes was dealt with.

This is a well thought-out concern, and one that we had not considered in our original statistical inference across recording electrodes. In practice, however, we do not believe it to be of any real practical concern, simply due to the robustness and reproducibility of the results across recording sessions (which are, of course, independent from one another). To give an example, here are the individual electrode Z-statistic time-series for chosen value, which comprise the findings for the DLPFC, OFC and ACC electrodes in Figure 1D and Figure 1—figure supplement 3. Each of the 40 dashed lines represents the start of a different recording session. Hence electrodes between each pair of dashed lines were those recorded simultaneously (Author response image 2).

Author response image 2
Chosen value Z-statistics for individual electrodes across the 40 recording sessions used to create Figure 1D and Figure 1—figure supplement 3.

Each dashed line reflects the beginning of a new recording session.

https://doi.org/10.7554/eLife.11945.025

As can be seen, the effects are very reproducible across different recording sessions. But perhaps more importantly, they are not noticeably more similar/consistent within recording sessions than across recording sessions.

There are, of course, more sophisticated statistical ways of accounting for interdependence between simultaneously recorded electrodes. But such methods are not particularly widely applied in electrophysiology at present (e.g. when collapsing across simultaneously recorded neurons), and a thorough treatment of this topic might warrant a separate paper by itself. For the purposes of the present study, it wouldn’t really impact upon the conclusions we are trying to draw.

[Editors' note: the author responses to the re-review follow.]

1) The authors claim that the PC2 from LFP provides a measure of local dynamics that is more specific (and relevant to unit activity) than larger scale global dynamics. The analysis provided by the authors (effect of local LFP after orthogonalizing on LFP from another region) is encouraging in this regard, however supporting the claim would also require showing that the opposite is not true (i.e. that running the analysis in reverse does not yield the same result). The correlation coefficients in the histograms of Figure 4—figure supplement 3 suggest that this may be the case, although there does not seem to be much of a within region advantage for DLPFC in particular.

The reviewer is correct in this regard, and this point was of real concern for us in designing this local vs. global analysis. Specifically, the principal component estimates will contain observation noise on both local and distal electrodes. Because of this, running an analysis of the effect of local LFP after orthogonalising on LFP from another region may indeed be insufficient to make the claim that local activity is more specific than global dynamics.

However, due to these concerns, the analysis we originally performed in fact includes the opposite analysis, as requested by the reviewer, within it. To generate Figure 4—figure supplement 3, we first test the effect of local LFP after orthogonalising on another region; then we run the opposite analysis; then we subtract the latter from the former. Testing for an effect greater than zero therefore allows us to address whether local dynamics influence neuronal firing more than global dynamics, and more so than the opposite direction. Unless we’re misunderstanding what the reviewer is asking, we believe that this deals with the concern raised (to run the opposite analysis to the one that we’ve performed would actually just flip the y-axis of Figure 4—figure supplement 3).

We understand why the reviewer may have missed this detail of our analysis: it was buried in the figure legend of Figure 4—figure supplement 3, and not explicitly referred to in the main text. We apologise for this, and have amended the figure legend, main text and Methods as follows to make this more explicit.

Main text:

“We found a smaller but significant reduction could also still be found even if these ‘local’ principal components (i.e. from the same brain region) were orthogonalised with respect to those of another, simultaneously recorded brain region, when compared to performing the same analysis in reverse (Figure 4—figure supplement 3).”

Methods:

“For Figure 4—figure supplement 3, we sought to explore whether local LFP principal components provide greater contributions to the reduction in chosen value variance than those recorded from other areas. […]To provide a fairer comparison, we therefore performed the same analysis in reverse: examining the effect of the distal OFC/ACC PC1/2 weights, orthogonalised with respect to the local DLPFC PC1/2 weights. Figure 4—figure supplement 3B shows the latter analysis (distal orthogonalised with respect to local) subtracted from the former (local orthogonalised with respect to distal).”

Legend to Figure 4—figure supplement 3:

“Local PC1/2 (controlling for larger-scale global influences) reduces chosen value CPD more than global PC1/2 (controlling for local influences).”
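The forward-minus-reverse comparison described above can be illustrated with a small simulation. This is a sketch under stated assumptions: a single principal component per region stands in for PC1/2, the shared and region-specific signals are simulated, and the helper names (`orthogonalise`, `cpd_reduction`) are hypothetical rather than taken from the authors' code.

```python
import numpy as np


def sse(X, y):
    """Residual sum of squares of an ordinary least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def cpd(X, y, col):
    """Coefficient of partial determination for column `col` of X."""
    sse_reduced = sse(np.delete(X, col, axis=1), y)
    return (sse_reduced - sse(X, y)) / sse_reduced


def orthogonalise(a, b):
    """Residual of `a` after regressing out `b` and an intercept."""
    B = np.column_stack([np.ones(len(b)), b])
    beta, *_ = np.linalg.lstsq(B, a, rcond=None)
    return a - B @ beta


def cpd_reduction(cv, coreg, firing):
    """Drop in chosen-value CPD when `coreg` is added to the regression."""
    ones = np.ones_like(cv)
    base = cpd(np.column_stack([ones, cv]), firing, col=1)
    full = cpd(np.column_stack([ones, cv, coreg]), firing, col=1)
    return base - full


rng = np.random.default_rng(0)
n = 4000
cv = rng.standard_normal(n)                               # chosen value
shared = rng.standard_normal(n)                           # global signal seen by both regions
local_only = 0.6 * cv + 0.8 * rng.standard_normal(n)      # region-specific dynamics
local_pc = shared + local_only                            # e.g. DLPFC component weight
distal_pc = shared + rng.standard_normal(n)               # e.g. OFC/ACC component weight
firing = 0.5 * local_only + 0.1 * cv + rng.standard_normal(n)

forward = cpd_reduction(cv, orthogonalise(local_pc, distal_pc), firing)
reverse = cpd_reduction(cv, orthogonalise(distal_pc, local_pc), firing)
difference = forward - reverse  # > 0 indicates a local advantage
print(forward, reverse, difference)
```

Note that the reverse analysis yields a non-zero reduction even though the distal signal has no direct effect on firing in this simulation, because orthogonalising against a noisy local estimate leaves a residual that still covaries with local dynamics. This is why the subtraction, rather than the forward analysis alone, is the informative quantity.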

Also, as far as I can tell, the authors did not incorporate reaction time, which is typically used as a trial-by-trial indicator of decision dynamics, into these analyses. I suspect that this is because the required delay is sufficiently long to create a floor effect on RTs, however if this is the case it deserves mention in the text. Otherwise the authors should test whether LFP measures relate to the "chosen value" aspects of neural activity after accounting for RT.

As the reviewer suspected, the required delay was indeed sufficiently long to create a floor effect on RTs. We have added a comment to the main text to acknowledge this point:

“Note that reaction time is not included as an additional measure of trial-by-trial decision dynamics in our task, as the imposed 1000 ms choice delay prior to response led to a floor effect in subjects’ response times.”

2) The authors show that PC2 captures variance in neural firing that is typically ascribed to chosen value. I agree with the authors that this is an important point. Yet I found the interpretation of this point to be somewhat overly specific: "We therefore questioned whether chosen value coding might simply be a consequence of the same neural dynamics occurring at varying latencies across trials." This is certainly one interpretation of the authors' results, but not the only one. Another (more standard) interpretation of this finding is that the LFP components mediate the chosen value representations at the level of single units. Given that the LFPs are often interpreted as measurements of local synaptic input, I find this interpretation to be fairly reasonable and consistent with the data provided. The finding that PC1 (which does not contain timing information) also steals variance from chosen value may even provide some specific evidence for this interpretation. As I understand it, the main reason that the authors interpret the data in terms of dynamics is provided in the final analysis showing that in an attractor network of decision-making the same sorts of signals emerge as a byproduct of competition. I think in order to improve the clarity it would be useful to avoid specific interpretations of causality until the Discussion section where both possibilities should be discussed.

This is a balanced and fair appraisal of our findings. In response to this, and to point 3 below, we have rewritten the Results section on chosen value correlates and their relationship to PC1/2 in a more neutral tone. We agree that it makes sense to move the dynamical interpretation out of the results section as suggested. We have mentioned in the Discussion that there is more than one plausible interpretation of these findings.

In the Results section:

Previous title: “Chosen value correlates as a consequence of varying decision dynamics” has become: “Influence of single-trial LFP components on single unit chosen value correlates”.

Previous text: “We therefore questioned whether chosen value coding might simply be a consequence of the same neural dynamics occurring at varying latencies across trials” has been replaced by: “We sought to explore the relationship between chosen value correlates identified in single unit firing and our single-trial indices of ERP amplitude (PC1) and latency (PC2).”

Previous text: “This implies that chosen value coding is, at least in part, a consequence of the same choice dynamics occurring at different rates on different trials” has been removed.

In response to point 3 below, the joint importance of speed and amplitude has been emphasised: “Similar results could also be obtained […] by examining the contribution of PC1 or PC2 alone”; and “the neuronal variance attributed to chosen value correlates originates as a consequence of the speed and amplitude at which dynamics unfold.”

In the Discussion section:

Previous text: “Crucially, however, a portion of the variance captured by chosen value was then explained away by including the dynamics occurred on that trial as a coregressor, estimated via PCA decomposition of the LFP (Figure 4)” has become: “Moreover, a portion of the variance captured by chosen value was then explained away by including the speed and amplitude of the local LFP response on that trial as a coregressor, estimated via PCA decomposition of the LFP (Figure 4).”

Previous text: “As such, it may be the case that there is coding of chosen value during the decision that is not explained by the dynamics of decision formation” has become: “As such, it may be the case that there is coding of chosen value during the decision that is not explained by the amplitude and speed of the evoked response during decision formation.”

The following text has been added: “A further important caveat is that chosen value correlates may be explained by different mechanisms at different points in the trial. […] Future work may investigate the origin and functional significance of this persistent chosen value coding, perhaps for use at later task stages.”

3) Relatedly, another reviewer noted that the most direct evidence for the notion that chosen-value effects emerge from "the same choice dynamics occurring at different rates on different trials" is derived from the analysis on p. 10, but I'm afraid I'm still not onboard with the logic of this. The authors foreground an analysis showing that the chosen-value effect in spike rates is reduced by jointly including PC1 (indexing LFP amplitude) and PC2 (indexing LFP latency) in the regression model (Figure 4, Figure 4—figure supplement 1, and Figure 4—figure supplement 3). I still don't see how this is germane to their conclusion about latency. The specific effect of the latency-related PC2 is now shown as a supplement, and is considerably weaker albeit nonzero. The original chosen-value effect, which peaked at a % CPD of around 0.9 (Figure 1B), is reduced by up to about 0.1 by both PCs together (Figure 4), or 0.03 by PC2 alone (Figure 4—figure supplement 2). So wouldn't it be at least as accurate to conclude that chosen-value effects emerge from the same neural dynamics occurring at different amplitudes across trials? (The predictions of the attractor network are also unclear in this regard, i.e. in the predictions for combination of PCs, and so it seems important that the authors clarify the predictions of the attractor model.) The same concern applies to the local-versus-distal result in Figure 4—figure supplement 3, which is interpreted solely in terms of neural latency but results for PC2 alone are not shown. It seems to me an easier conclusion to square with the data would be that chosen-value effects originate from a combination of the amplitude and speed of the dynamically unfolding neural response. This may not be so starkly different from existing perspectives, although it's of course still valuable to see the details of how it plays out.
The authors could also easily reframe the result that LFP measures partially mediate the chosen value effect and tone down the oppositional framing of decision dynamics versus static chosen value representation. They could use the network model results as a podium to say that chosen value representations need not emerge for the purpose of encoding chosen value, which would be entirely accurate from my standpoint.

Thanks for these comments. In response to them (and also point 2 above), we have sought to make our results section on chosen value correlates and their relationship with PC1/2 much more balanced. We now save the interpretation for the Discussion section (i.e. once the modeling results have been introduced also), and have sought to clarify the contributions of both LFP amplitude (PC1) and latency (PC2) to the results in this section. We have made several textual edits to the Results and Discussion section, described above in response to point 2.

In addition (and in response to point 6 below), we have amended Figure 4—figure supplement 2 to now show the contribution of PC1 as well as PC2 alone, separately across all three regions.

4) A similar issue comes up in the section on neural network modeling, which shows that chosen-value effects in the simulation can be explained away by a principal component capturing variability in neural dynamics, but doesn't state clearly enough that "dynamics" here refers, not just to latency, but to a mixture of amplitude and latency (Figure 7—figure supplement 1, panel C). (Related to this, the sentence “we obtained principal components that controlled waveform amplitude and latency, as in the ERPs" seems inaccurate, since what was actually extracted was a single component that mixed amplitude and latency, unlike the earlier analysis.)

In the model, there is greater covariation between the amplitude and the latency of the evoked response than in the data. This is likely to be because the inputs to the network are sustained even after an attractor state has been reached. We have amended this section of the Results to reflect these details more accurately, as the reviewer suggests:

Previous text: “We then ran a similar dimensionality reduction on summed network activity, a proxy for the model’s LFP predictions, as we had done on both macaque LFP and human MEG data. […] We then regressed these principal components back onto single unit firing rates, as in Figure 3. We found a similar time-course of influence of the model’s principal components on single unit firing to that found in the data (compare Figure 3C with Figure 7D, cyan).”

Now reads: “We then ran a similar dimensionality reduction on summed network activity, a proxy for the model’s LFP predictions, as we had done on both macaque LFP and human MEG data. […] We then regressed the principal component back onto single unit firing rates, as in Figure 3. We found a similar time-course of influence of the model’s principal component on single unit firing to that found in the data (compare Figure 3C with Figure 7D, cyan).”

5) The authors use the first principal component of the attractor network model decomposition rather than PC2. While I agree with the authors that this component seems to contain some temporal information, it does seem to mainly capture amplitude. Given that the authors interpret PC2 as if it were the derivative of the ERP, it seems that it would be a bit easier to make the connection between computational model and biology if model signals were decomposed using separate components to capture overall amplitude and derivative. This could be done simply by creating weight vectors based on the average model response (ERP) and the derivative of that signal.

Thanks for this suggestion. We prefer to keep the PCA rather than introduce a different way to perform the analysis. Despite the differences in the components obtained (now discussed in response to point 4), we think it aids clarity to retain a direct relationship between our approaches to analyzing both data and the model. In the model, there is greater covariation between the amplitude and the latency of the evoked response than in the data. This is likely to be because the inputs to the network are sustained even after an attractor state has been reached.

However, the reviewer’s suggestion of using the shape of the waveform and its derivative as basis functions is potentially a helpful one for future studies. We’ve added a comment to the Discussion to highlight this idea.
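As an illustration of the reviewer's suggestion (not of the PCA approach retained in the paper), the sketch below builds two basis functions from the trial-averaged waveform and its temporal derivative, then regresses each single trial onto them. It relies on the first-order approximation that a small latency shift tau of a waveform f(t) gives f(t - tau) ≈ f(t) - tau f'(t). The Gaussian-shaped evoked response, the noise level and the function names are assumptions made for the example.

```python
import numpy as np


def simulate_erps(n_trials=300, n_time=400, seed=0):
    """Hypothetical single-trial ERPs varying in amplitude and latency."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n_time)                    # time in seconds
    amp = 1.0 + 0.2 * rng.standard_normal(n_trials)      # trial-wise amplitude
    tau = 0.02 * rng.standard_normal(n_trials)           # trial-wise latency shift (s)
    centre = 0.4 + tau[:, None]
    erps = amp[:, None] * np.exp(-0.5 * ((t[None, :] - centre) / 0.08) ** 2)
    erps += 0.05 * rng.standard_normal(erps.shape)       # sensor noise
    return t, erps, amp, tau


def mean_and_derivative_weights(t, erps):
    """Regress each trial onto the mean ERP and its temporal derivative."""
    mean_erp = erps.mean(axis=0)
    d_erp = np.gradient(mean_erp, t)
    basis = np.column_stack([mean_erp, d_erp])           # (n_time, 2)
    weights, *_ = np.linalg.lstsq(basis, erps.T, rcond=None)
    return weights[0], weights[1]                        # amplitude-like, latency-like


t, erps, amp, tau = simulate_erps()
amp_w, lat_w = mean_and_derivative_weights(t, erps)
# A later response (positive tau) loads negatively on the derivative basis.
r_amp = np.corrcoef(amp_w, amp)[0, 1]
r_lat = np.corrcoef(lat_w, -tau)[0, 1]
print(r_amp, r_lat)
```

Unlike PCA, this construction fixes the two basis functions in advance, so the resulting weights separate amplitude from latency by design rather than by how variance happens to be distributed in the data.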

In the Discussion, we now state:

“PCA is one of many possible approaches to obtaining a useful set of temporal basis functions to describe variation in ERP waveforms, and it may be improved upon by future investigations. […] This feature of the data is identified automatically using PCA.”

6) The authors note that PC1 relates primarily to value sum, however this does not seem to be the case in ACC, where PC1 takes positive coefficients for chosen value and negative coefficients for unchosen value. This makes me wonder whether the added contributions of PC1 in explaining single unit chosen value effects relate primarily to the inclusion of PC1 from ACC electrodes. In my opinion, such a finding would not necessarily detract from the dynamics idea, as overall ACC signal amplitude may play a role in adjusting decision threshold, which directly affects dynamics. Either way, the authors should make the heterogeneity across recording locations clear in this regard.

We have added a comment to the Results section to make clear that in ACC, PC1 has an opposite sign for chosen and unchosen value:

“PC1 weights, capturing response amplitude rather than latency, were primarily influenced by value sum (same sign for both chosen and unchosen value, with the exception of ACC)[…].”

To make clearer how the contributions of PC1 and PC2 in explaining chosen value effects vary across regions, we have now extended Figure 4—figure supplement 2 to include all three regions, subdivided into PC1 and PC2 reduction in explained variance. (As noted above in response to points 2 and 3, we have also modified the main text in this section to more faithfully reflect the joint contributions of PC1 and PC2 to reducing chosen value CPD).

In ACC, the contribution of PC1 appears particularly strong, a point that we now note in the figure legend. We therefore also repeated the inter-regional analysis (Figure 5) using PC1 instead of PC2. We found that PC1 did not have any differential influence on effort vs. delay trials.

7) The section on inter-region correlations (“OFC and ACC dynamics have distinct influences on delay- and effort-based decisions respectively”) is quite specific about causal directionality, concluding that DLPFC activity is influenced by OFC and ACC. It isn't apparent to me that there's any support for this. (A third-variable explanation would be a plausible alternative.)

This section of the manuscript was particularly hypothesis-driven, given our knowledge about the dissociable effects of lesions to ACC and OFC (e.g. Rudebeck et al., 2006) and imaging activations (e.g. Prevost et al., 2010) in delay- and effort-based decisions. It is for this reason that we adopted a position about the direction in which we believed that causality might arise. However, to test this more explicitly, we also performed the same analyses in reverse (using DLPFC/OFC to explain ACC firing, and DLPFC/ACC to explain OFC firing). As shown below (Author response image 3), neither of these analyses produced significant effects.

Author response image 3
The same analysis as in main Figure 5, repeated with ACC firing being the dependent variable (left) and OFC firing being the dependent variable (right).
https://doi.org/10.7554/eLife.11945.026

We now note these two null results in the main text:

“Both areas were found to influence DLPFC activity (Figure 5—figure supplement 1), but strikingly, ACC explained more variance in DLPFC firing on effort trials than delay trials (Figure 5A, magenta), whereas the converse was true for OFC (Figure 5A, black). This was not found to be true for analyses performed in the reverse direction (using DLPFC/ACC PC2 weights to explain OFC firing, or DLPFC/OFC PC2 weights to explain ACC firing).”

However, we agree with the reviewer that there is always the possibility of a third-variable explanation whenever considering functional connectivity analyses between two regions. We have amended the Discussion to acknowledge this point:

“We note, however, that our analysis does not test the possibility that a third (unobserved) variable could jointly affect both regions, doing so differentially on effort vs. delay trials.”

https://doi.org/10.7554/eLife.11945.028

Article and author information

Author details

  1. Laurence T Hunt

    1. Sobell Department of Motor Neuroscience, University College London, London, United Kingdom
    2. Wellcome Trust Centre for Neuroimaging, University College London, London, United Kingdom
    Contribution
    LTH, Conception and design, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article
    For correspondence
    laurence.hunt@ucl.ac.uk
    Competing interests
    No competing interests declared.
    ORCID: 0000-0002-8393-8533
  2. Timothy EJ Behrens

    1. Wellcome Trust Centre for Neuroimaging, University College London, London, United Kingdom
    2. Oxford Centre for Functional MRI of the Brain, Nuffield Department of Clinical Neuroscience, Oxford University, John Radcliffe Hospital, Oxford, United Kingdom
    Contribution
    TEJB, Conception and design, Analysis and interpretation of data, Drafting or revising the article
    Competing interests
    TEJB: Senior editor, eLife.
  3. Takayuki Hosokawa

    1. Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, United States
    2. Department of Psychology, University of California, Berkeley, Berkeley, United States
    3. Laboratory of Systems Neuroscience, Graduate School of Life Sciences, Tohoku University, Sendai, Japan
    Contribution
    TH, Acquisition of data, Drafting or revising the article
    Competing interests
    No competing interests declared.
  4. Jonathan D Wallis

    1. Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, United States
    2. Department of Psychology, University of California, Berkeley, Berkeley, United States
    Contribution
    JDW, Conception and design, Drafting or revising the article
    Competing interests
    No competing interests declared.
  5. Steven W Kennerley

    1. Sobell Department of Motor Neuroscience, University College London, London, United Kingdom
    2. Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, United States
    3. Department of Psychology, University of California, Berkeley, Berkeley, United States
    Contribution
    SWK, Conception and design, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article
    Competing interests
    No competing interests declared.

Funding

Wellcome Trust (098830/Z/12/Z)

  • Laurence T Hunt

National Institute of Mental Health (R01-MH097990)

  • Jonathan D Wallis

National Institute on Drug Abuse (R21-DA035209)

  • Jonathan D Wallis

James S. McDonnell Foundation (JSMF220020372)

  • Timothy EJ Behrens

Wellcome Trust (WT104765MA)

  • Timothy EJ Behrens

Wellcome Trust (096689/Z/11/Z)

  • Steven W Kennerley

National Institute of Mental Health (F32MH081521)

  • Steven W Kennerley

Wellcome Trust (WT088312MA)

  • Timothy EJ Behrens

Wellcome Trust (WT080540MA)

  • Laurence T Hunt

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank H Barron, P Dayan, R Dolan, K Harris, M Rushworth, P Smittenaar and M Woolrich for comments on an earlier draft of this manuscript.

Ethics

Human subjects: All human subjects provided informed consent, including consent to publish. Ethical approval for this study was obtained from NHS Oxfordshire Research Ethics Committee C, approval reference 08/H0606/46.

Animal experimentation: Ethical approval was obtained for this study. All procedures were in accord with the National Institutes of Health guidelines (Assurance Number A3084-01) and the recommendations of the U.C. Berkeley Animal Care and Use Committee (Protocol Number R283).

Reviewing Editor

  1. Michael J Frank, Brown University, United States

Publication history

  1. Received: September 29, 2015
  2. Accepted: November 18, 2015
  3. Accepted Manuscript published: December 11, 2015 (version 1)
  4. Version of Record published: January 11, 2016 (version 2)

Copyright

© 2015, Hunt et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
