Breaking down hierarchies of decision-making in primates

  1. Alexandre Hyafil (corresponding author)
  2. Rubén Moreno-Bote
  1. CBC, DTIC, Universitat Pompeu Fabra, Spain
  2. Universitat Pompeu Fabra, Spain
Short Report
Cite this article as: eLife 2017;6:e16650 doi: 10.7554/eLife.16650

Abstract

Possible options in a decision often organize as a hierarchy of subdecisions. A recent study concluded that perceptual processes in primates mimic this hierarchical structure and perform subdecisions in parallel. We argue that a flat model that directly selects between final choices accounts more parsimoniously for the reported behavioral and neural data. Critically, a flat model is characterized by decision signals integrating evidence at different hierarchical levels, in agreement with neural recordings showing this integration in localized neural populations. Our results point to the role of experience for building integrated perceptual categories where sensory evidence is merged prior to decision.

https://doi.org/10.7554/eLife.16650.001

eLife digest

Should you go for coffee with Jules, or go to the movie theater with Jim? Both options require you to make additional decisions, for example, which café would you go to, or what movie could you see? Many of our day-to-day decisions have multiple layers of sub-decisions embedded within them that are not necessarily independent. Our opinions of the cafés in town and the movies showing at the theater may influence our decision over whom to spend the afternoon with.

In 2015, researchers at the Netherlands Institute for Neuroscience performed experiments in macaques to try to work out how the brain makes these decisions. The monkeys learned to choose between two visual stimuli (decision 1). The outcome of decision 1 determined whether the animals then had to make decision 2 or decision 3. The results suggested that the monkeys initially made all three comparisons independently and in parallel, before combining the evidence to select their overall strategy. This process is referred to as hierarchical decision-making. In the analogy above, one would compare the relative merits of Jules versus Jim, café A versus café B, and a horror movie versus a comedy at the same time before deciding what to do.

Hyafil and Moreno-Bote have now reanalyzed the data published in 2015 using new computer simulations. This second analysis suggests the results are in fact more consistent with an alternative model of decision-making called a flat model, in which the brain compares all of the final options simultaneously (Jules + café A; Jules + café B; Jim + horror movie; Jim + comedy) before choosing between them.

Making decisions by comparing the final outcomes becomes easier as the brain learns through experience to associate stimuli that often occur together. Hyafil and Moreno-Bote hypothesize that in response to a new situation, the brain may sometimes start off by using hierarchical decision-making before switching to a more accurate flat model as experience allows.

In response to the findings of Hyafil and Moreno-Bote, the researchers who conducted the work reported in 2015 have also reanalyzed the original data, and carried out a new experiment in human volunteers. They argue that the flat model provides a poor fit to the original data and struggles to explain the new data. Future studies can build on these conflicting findings by further exploring the limits of parallel decision-making, which may help us to understand how the brain is able to make multiple decisions while keeping the future consequences in mind.

https://doi.org/10.7554/eLife.16650.002

Introduction

Coffee with Jules (if so, which café?) or cinema with Jim (if so, which movie?)? A recent study by Lorteije and colleagues investigated how perceptual mechanisms implement the hierarchically structured decisions that fill our daily lives (Lorteije et al., 2015). Monkeys made saccades to one of four possible targets based on the information provided at the primary branching point and at the secondary branching points (the 'correct' and 'incorrect' branching points) leading to the targets (Figure 1A). Patterns of responses convincingly indicated that monkeys can integrate information from all branching points in parallel. But does the decision space mimic the hierarchical organization, with parallel decisions going on (about Jules/Jim, coffee place, movie) (Figure 1B), or does it directly compare final options (with Jules at Moe's vs. Sicario with Jim, etc.) (Figure 1C)? Lorteije and colleagues report two behavioral and two neural effects that they argue speak unanimously in favor of the former hierarchical model and against the latter non-hierarchical ‘flat model’ of decision-making. We show in contrast that all four effects can equally (and more parsimoniously) be explained by the flat model of decision-making, which is also perfectly compatible with the new effects from the same dataset described in the companion paper by Zylberberg and colleagues (Zylberberg et al., 2017).

Hierarchical vs. flat models of perceptual decision during a hierarchically-structured visual task.

(A) Structure of the task. On each trial, monkeys must detect the correct option out of four possible responses based on the visual information provided at the primary branching point L1 and at two secondary branching points L2 and L2’. Visual information consists of segments of flickering luminosity at the start of each branch (color segments in our depiction; visual samples changing every 50 ms for a total period of 1000 ms). Animals must make a saccade towards the final point that passes through branches of maximal luminosity; that is, they must decide based on information scattered across the visual field. Each of the four responses is categorized as TT, TD, DT or DD depending on whether it corresponds to a target T (correct branch) or distractor D (incorrect branch) at the first and second branching points. A detailed description of the task can be found in [Lorteije et al., 2015]. (B) Hierarchical decision model of perception. In the hierarchical model, parallel decision processes run at each branching point (L1, L2 and L2’) and are integrated into a motor response at a later stage. It can be implemented as a race model composed of three races, each with two possible sub-choices. (C) Flat decision model of perception. In the flat model, the decision space is composed of the four possible final responses, so for each response the animal must sum the information provided at the corresponding primary and secondary branches (here depicted by the sum of the two luminosity signals). (D) Implementation of the flat decision model. (Left panel) Four units coding for the four possible responses integrate information from both L1 and L2 branches leading to that response, as represented by the pattern of connections from sensory units (coding for instantaneous sensory value). Connectivity includes self-excitation as well as homogeneous cross-inhibition between all units. Black and red arrows indicate excitatory and inhibitory connections, respectively.
(Right panel) Simulation for one trial, depicting the activity of each unit across time during perceptual integration. Activity is bounded to positive values (rectification). When activity of one unit reaches the decision threshold, the related response is selected.

https://doi.org/10.7554/eLife.16650.003

Results

First, stimulus difficulty at the primary branch (L1) was shown to have no influence on the performance at the secondary branch (L2), a result that was replicated by a race-model implementation of hierarchical decision-making, but not of flat decision-making. However, this result seems to be a particular feature of their specific implementation of the race model (Vickers, 1979; Drugowitsch et al., 2014), which included neither inhibition between option representations nor activation rectification (i.e. enforcing non-negative unit activation), two classical ingredients of race models (Usher and McClelland, 2001; Tsetsos et al., 2012; Churchland and Ditterich, 2012). Inhibition between option representations and rectification may underlie the well-known reduction of choice-related neural activity when more choice alternatives are provided (Churchland and Ditterich, 2012). We simulated a race-model implementation of the flat model with rectified activation, cross-inhibition and self-excitation, whereby each of the four options competed during accumulation of evidence (Figure 1D, see Methods). Self-excitation models the instability present at the initial point of the race in attractor models of decision-making (Roxin and Ledberg, 2008) and can be at the heart of well-known urgency signals to speed up decisions (Drugowitsch et al., 2012). Noise in the model did not scale with stimulus intensity, in line with recent results suggesting that noise in perceptual accumulation tasks is associated with the accumulation rather than with the sensory process (Drugowitsch et al., 2016), and as is typically assumed in drift-diffusion models of decision making (Gold and Shadlen, 2007). We used a non-absorbing decision threshold (i.e. activity after the decision was not bounded), but found the same results for simulations with an absorbing bound.
Parameters were tuned to reproduce qualitatively the proportion for each of the four possible response types, as well as the impact of each sample in the stream and the psychometric curve for both L1 and L2 decision (Figure 2—figure supplement 2A–D). The impact of L1 difficulty onto L2 for the flat model indeed disappeared when we used this common implementation of the race model (Figure 2—figure supplement 2E).

In the companion paper (Zylberberg et al., 2017), Zylberberg and colleagues question the generality of this finding, arguing that it may not be compatible with the pattern of short reaction times they provide in a new analysis of the same monkey dataset. However, our mathematical analysis (Appendix 1) shows that a clear influence of L1 difficulty on L2 performance emerges principally when L1 difficulty strongly modulates reaction times, and thus the time of integration of L2 evidence. Short reaction times with low modulation by task difficulty, as described in the companion article, are thus perfectly compatible with a lack of influence of L1 difficulty on L2 performance. This result was indeed reproduced in a simulation where the threshold was lowered to produce shorter reaction times, and the boundary was eliminated for the first 10 samples (Figure 2A–F). This choice of the threshold implements a form of time-dependent, collapsing bound (Churchland and Ditterich, 2012; Drugowitsch et al., 2012) and is consistent with the minimum viewing time imposed on monkeys (Figure 2). Without such a time-dependent bound, the model can produce either a large proportion of premature responses or very large mean reaction times. The discrepancy between the results of these simulations and those performed in the companion paper (which do find a dependency of L2 performance on L1 difficulty with short reaction times) may stem from early responses (<500 ms) in easy trials, which are present in their simulations but prohibited in ours by the absence of a decision boundary for the first 10 samples.

Figure 2 (with 2 supplements)
Behavioral properties of the flat race model reproduce original monkey data.

The figures presented here correspond to a parameter set adjusted to reproduce reaction times (see Methods). (A) Response type for the flat model. We simulated a simple race model with four possible options that compete for selection, based on a standard implementation of race models [Drugowitsch et al., 2014]. The histogram shows the distribution of each of the four response types (TT, TD, DT, DD) for each level of trial difficulty from simulations of the race implementation of the flat model. (B) Influence of luminance fluctuations on the L1 decisions. Weights for each of the target (orange) and distractor (blue) samples across a trial for L1 decisions, as measured with logistic regression. The model reproduces the primacy effect observed in monkey behavioural data, whereby earlier samples have a larger influence on L1 choices than later samples. Shaded areas indicating 95% confidence intervals are barely visible because they nearly collapse onto the main line. Weights have been normalized. (C) Influence of luminance fluctuations on the L2 decisions. Legend same as B. (D) Psychometric curve. The curve represents the probability that the right segment was selected at the first (L1, red curve) and second (L2, light blue curve) level depending on the strength of the evidence in favor of the right vs. left path at the corresponding branching point. The steeper curve indicates better performance at the primary than at the secondary branches. (E) Influence of L1 difficulty on L2 performance. The psychometric curve for the L2 decision was computed separately for easy and difficult L1 trials. Unlike the flat model implemented in the original study, we find no difference in L2 performance (permutation test; p>0.4), in accordance with observed animal behavior. The null effect emerges because the minimum accumulation time before reaching a decision impedes early responses even in the presence of strong evidence in L1 (see Supp. Material).
When inhibition was removed, a significant interaction was recovered (Figure 2—figure supplement 1A). (F) Mean reaction times. Reaction times for the 3 difficulty levels (E: easy, I: intermediate, D: difficult) reproduce those reported in the companion paper by Zylberberg and colleagues (Zylberberg et al., 2017). (G) Influence of L2 and L2’ difficulty on L1 choice. The psychometric curve is plotted separately for trials where the decision is easier at the left than at the right secondary branch (green curve), and where the decision is easier at the right than at the left secondary branch (red curve). The inset represents the distribution of the difference in evidence between the two branches. The effect emerges because strong evidence at one secondary branch will bias the race towards selecting the corresponding final option, thus appearing as a bias in the L1 decision towards selecting the primary branch leading to this secondary branch. Information provided at secondary branches biases choice at L1 towards selecting the branch that leads to the easier secondary branch, as observed in animal behavior. The lower panel represents the difference between the two psychometric curves. (Right panel) Influence of L2 difficulty on the L1 decision across time, as estimated from the logistic regression. Grey shades indicate 95% confidence intervals. L1 choice is affected by the difference in perceptual difficulty between the L2 and L2’ branches in early visual samples.

https://doi.org/10.7554/eLife.16650.004

The second interesting observation from monkey behavior reported by Lorteije and colleagues was that L1 decisions were biased towards the branch that leads to the easier L2 decision. The effect could only be explained in the hierarchical model by invoking an extra modulatory signal passed on from L2 to L1 that must be carefully tuned (and was not implemented in the race model of [Lorteije et al., 2015]). By contrast, such effect is readily accounted for by the flat model, as a strong signal at a secondary branch will boost the chances of selecting the corresponding final option, and thus bias towards selecting the L1 path leading to this option. Indeed, the effect was reproduced in the simulations (Figure 2G). This relates to one important advantage of the flat model over the hierarchical one: the flat model, by integrating the strength of evidence from each level, takes optimally into account uncertainty in each of the decisions, leading to more accurate responses. By contrast, the hierarchical model integrates decisions at each level independently of the level of evidence in support of each decision. This can only be amended by ad hoc mechanisms such as that proposed by Lorteije and colleagues, that probably do not scale up adequately when more than two options are available at each level.

At the physiological level, neural groups in visual cortex integrated information conveyed at the primary and secondary branching points leading to the path in their receptive field. Selection signals were shown to first differentiate options at the level of secondary branches irrespective of whether the branch is the correct or incorrect one (L2 or L2' branching), and subsequently grow larger for the L2 than for L2' branching. In the hierarchical model, this can only be explained by referring to a modulating signal once decision is reached in L1 that differentially modulates the L2 race signals corresponding to the selected and non-selected L1 branches. In the flat model, such dynamics emerges naturally with activation rectification, because activation corresponding to the two incorrect options in the incorrect L1 branch vanish as L1 signals favor the two alternative options, and so their difference is also reduced (Figure 3A). Indeed, activation rectification plays a role in the dynamics of recorded neural responses, that reach a floor value (0) in the final part of the integration period in non-selected branches (Figure 4 of [Lorteije et al., 2015]). Finally, an interaction between L2 and L2' selection signals was reported: clear evidence in favor of one of the two options in L2 reduced the selection signal in L2', and vice versa. In the hierarchical model, this would require complex modulatory signals passing on from L2 to L1 to L2'. By contrast, both effects are observed in the flat model (Figure 3B): strong evidence for one option will decrease the activation of all other three options through inhibition, increase their chance of collapsing at the zero boundary, and thus reduce the selection signal in the opposite secondary branch.
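
The selection signals discussed above can be written as simple differences of unit activities. The following minimal sketch, in which the function names and the numerical activations are purely illustrative assumptions and not values from the simulations, shows how rectification alone shrinks the selection signal of the non-selected branch once both of its units are pushed to the floor.

```python
def rectify(activity):
    """Clip every unit activation at zero (activation rectification)."""
    return {unit: max(value, 0.0) for unit, value in activity.items()}


def selection_signals(activity):
    """Return (L2 signal, L2' signal, their difference).

    L2  signal: TT - TD (units in the correct L1 branch)
    L2' signal: DT - DD (units in the incorrect L1 branch)
    """
    s_l2 = activity["TT"] - activity["TD"]
    s_l2_prime = activity["DT"] - activity["DD"]
    return s_l2, s_l2_prime, s_l2 - s_l2_prime


# Hypothetical unrectified activations late in a trial: L1 evidence has
# driven both units of the incorrect branch below zero.
raw = {"TT": 12.0, "TD": 4.0, "DT": -1.0, "DD": -3.0}

print(selection_signals(raw))           # L2' signal still non-zero
print(selection_signals(rectify(raw)))  # L2' signal vanishes at the floor
```

With these made-up numbers the L2' signal drops from 2 to 0 after rectification, while the L2 signal is untouched, which is the qualitative pattern described in the paragraph above.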

Signal properties of the flat race model reproduce original monkey neural data.

(A) Amplification of L2 signals by L1 choice. Decision signals for L2 (resp. L2’), depicted here in cyan (resp. dark blue) curves, correspond to the difference between the activation of the units coding for the target and distractor in that branch, i.e. TT-TD (resp. DT-DD). Lower panel represents the difference between the two selection signals, i.e. (TT-TD)-(DT-DD). While the two decision signals first increase in parallel, the selection signal rapidly turns much larger for the correct than for the incorrect branches, due to the inhibition between activation of options across different branches. Dynamics closely resembles that observed in neural multi-unit activity in V4. S.e.m. are too small to be visualized. (B) Cross-talk between L2 and L2’ selection signals. Selection signal for L2 was reduced when the first visual samples in L2’ provide clear information in favour of one option (left panel, blue curve for lowerL2' evidence, green curve for higher L2' evidence). The effect readily emerges in the flat model because of inhibition between options across the different branches. The converse effect of L2 difficulty impacting L2’ selection signal was also observed (right panel), and both effects reproduce observation from monkey visual cortex. S.e.m. are too small to be visualized. The horizontal bar indicates samples with significant activity difference (t-test, p<0.05). (C) Selection signals integrate L1 level of evidence. Unit activity is modulated by evidence at both L1 and L2 levels. Mean activity in each four units (classified as TT, TD, DT, DD) for 4 quartiles of strength of L1 evidence. TT and TD units show positive modulations with L1 evidence, while DT and DD units display negative modulations, consistent with the fact that larger L1 evidence biases competition towards the two options in the correct L1 branch. Right panel displays absolute value of Spearman’s rho correlation between strength of evidence and mean activity for all four branches.

https://doi.org/10.7554/eLife.16650.007

Overall we show that all four observations that were taken as evidence in favor of the hierarchical model could be accounted for with a standard race-model implementation of the flat model. These observations were robust and did not rely on fine tuning of the parameters. Table 1 summarizes how each of these observed properties depend on the features of the model. Note that all these features are classical constituents of race models and not ad hoc mechanisms. Overall the flat model provides a more parsimonious explanation of the data, as it does not require appending modulatory signals between parallel decisions as the hierarchical model does.

Table 1

Each of reported behavioral and neural effects and the associated features from the flat decision model required to display such effects.

https://doi.org/10.7554/eLife.16650.008
Effect | Required model features
Independence of L2 from L1 difficulty | Signal-independent noise, low dependence of RTs on L1 difficulty
Bias of L1 choice by L2 difficulty | None
Amplification of L2 signals by L1 choice | Activation rectification
Cross-talk between L2 and L2' signals | Activation rectification

These behavioral and neural effects reported in the original study, however, provide no unambiguous evidence in favor of either model. One fundamental difference between the two is that selection signals in the flat model mix evidence from both L1 and L2 branches, while the hierarchical model predicts unmixed selection signals (influence across branches can only occur after the decision). [At this point, an important distinction has to be made in the flat model between localized activations, which indeed mix evidence from both branches, and selection signals at L2, which are extracted as the difference between two units in the same L1 branch and which, as shown above, are largely insensitive to L1 signals]. In the companion paper, Zylberberg et al. now provide data showing that selection signals in at least 3 out of 4 L2 branches mix evidence from both L1 and L2 branches (their Figure 4). We believe this observation is most compatible with the flat model and by itself rules out the hierarchical model, which relies on complete neural segregation of the integration of L1 and L2 evidence (although the possibility remains that these level-mixing integrative neural signals are completely non-causal in monkey decisions). Our simulations reproduce these effects: signals in TT are larger for stronger L1 evidence, while signals in DT and DD are weaker ($p<10^{-9}$, Figure 3C; see also Appendix 2). As pointed out by Zylberberg and colleagues, our simulations also display a positive modulation in TD signals, unlike the null effect found in monkey data. However, despite a single source of inhibition, the modulation was not equally strong in all four branches: indeed, it was by far the weakest precisely in the TD branch (Figure 3C). This weaker effect may explain the lack of significance in monkey data.
Moreover, modulation in TD could be reduced or abolished if the flat model implied stronger inhibition between options related to the same L1 branch (TT-TD and DT-DD) than between options related to distinct L1 branches (e.g. TT-DT, TT-DD). Stronger inhibition between local circuits is indeed a general pattern of cortical connectivity (Douglas and Martin, 2004; Lund et al., 2003).

Zylberberg and colleagues produced a last analysis of the original dataset, showing that in the flat model errors in L1 are associated with higher sensitivity at L2, whereas monkeys display no such effect. While we acknowledge that this result challenges the current implementation of the flat model, it is at this point equally unknown whether the hierarchical model proposed by the authors could avoid this feature (the reward maximization introduced in the hierarchical model to account for the influence of L2 bias onto L1 choice may produce the same interaction).

Discussion

It is unclear whether selection signals recorded in visual cortex are generated locally or reflect feedback process from higher regions where integration of evidence takes place. The flat model is arguably more consistent with the latter hypothesis, as it implies integration from and inhibition across distant locations in visual field, which are more typically associated with higher cortical regions (Wimmer et al., 2015). In any case, these selection signals certainly represent neural markers for a specific integration process along one subbranch.

One important limitation of flat models is that they require learning high-order representations of the environment that can appropriately integrate evidence from all relevant sensory sources, i.e. the structure of connectivity described in Figure 1D may take time to be acquired (Garrard et al., 1997; McClelland and Rumelhart, 1985). Here, monkeys performed ~60,000 trials each, possibly enough to learn appropriate global representations linking all visual cues relevant to each response. When such time is not afforded, the system may rely on less efficient strategies such as a hierarchical model. Indeed, the comparison of flat and hierarchical models in this visual task sheds new light on the long-standing debate of whether information from distinct sources is integrated before or after the decision stage, analogous respectively to the flat and hierarchical strategies. In multimodal integration such as object localization or motion detection, different modalities provide complementary cues about common objects or features of the environment (object motion, phoneme identity, etc.), so over the lifetime the brain can learn the appropriate crossmodal representations and integrate bimodal information directly over those representations. Indeed, in such contexts both sources of information are merged prior to decision (Körding and Wolpert, 2004). By contrast, when subjects must detect the presence of either a visual or an auditory cue, which requires arbitrarily mapping two distinct unimodal signals onto a single response, the relevant crossmodal representations are not formed, and integration can therefore by default only take place following unimodal decisions, as has been found experimentally (Otto and Mamassian, 2012). One important hypothesis we make is that the level of integration could strongly depend on experience, gradually switching from post-decision (i.e. hierarchical) to pre-decision (i.e. flat) integration.

Finally, an important distinction has to be made between the hierarchical vs. flat nature of the integration process, and the serial vs. parallel nature of the integration process. While the former has to deal with the level and the structure of the representations at which integration takes place before reaching a decision, the latter corresponds to whether evidence from different sources can be integrated at the same time in these representations. The flat model is perfectly compatible with serial integration of evidence, for example if attentional constraints limit the capacity to process evidence from distant visual locations, such as when subjects must integrate from three distinct levels (Zylberberg et al., 2012).

In summary, despite premature conclusions, we think that the experimental framework developed by Lorteije and colleagues opens a promising new avenue for understanding the level at which evidence is accumulated and individual perceptual decisions are taken in ecological settings.

Methods

We present here the equations governing the flat race model of decision-making used for simulations. Our implementation of the flat model departs from the one in [Lorteije et al., 2015] in the following aspects, which are included in many standard implementations: (1) added noise is constant instead of being proportional to signal strength, (2) there is a positive loop (i.e. negative leak) in the integration process and mutual inhibition between competitors, and (3) activities cannot be negative (i.e. activation rectification). The variable $x_t^i$ represents the activity of unit $i$ (from 1 to 4) at sample $t$, and evolves according to:

$$\Delta x_t^i = I_0 + \alpha\, x_{t-1}^i - \beta \sum_{j \neq i} x_{t-1}^j + I_t^i + \mathcal{N}(0,\sigma)$$

where $I_0$ is the constant input, $\alpha$ is the auto-excitation term (negative if leaking, positive for a positive loop providing bistability), $\beta$ is the inhibition strength, $I_t^i$ is the sensory evidence at sample $t$, and the last term represents white noise of variance $\sigma$. Inhibition is homogeneous and all-to-all between activation units, that is, each unit inhibits equally the other unit from the same branch and the two units from the alternative branch. Sensory evidence integrates information from the primary and secondary branching points:

$$I_t^i = k_1 a_t^i + k_2 b_t^i$$

$k_1$ and $k_2$ represent the sensitivity to evidence at levels 1 and 2 respectively, $a_t^i$ represents the information provided at level 1 in favor of the corresponding branch (i.e. the difference between instantaneous luminosity in that branch and luminosity in the alternative branch at sample $t$), and $b_t^i$ represents the information provided at level 2 in favor of the corresponding branch.

We enforce non-negative values for unit activity by using rectification:

$$x_t^i = \max\left(x_{t-1}^i + \Delta x_t^i,\ 0\right)$$

All units are initialized at the same starting point $x_0^i = 0.5$, and a decision is reached whenever any of the units reaches threshold $K$ (Figure 1C). If no decision is reached after presentation of all samples, the response corresponding to the unit with the highest final activity is selected. Parameters are: input strength at the primary branching point $k_1 = 0.016$ and at secondary branching points $k_2 = 0.012$, auto-excitation $\alpha = 0.1$, inhibition $\beta = 0.07$, constant input $I_0 = 0.5$, decision threshold $K = 50$, initial values $x_0^i = 0.5$, white noise variance $\sigma = 1$. We chose parameters to reproduce qualitatively the monkey behavioral and neuronal data. We simulated 100,000 trials, which is the same order of magnitude as used in the original experiment. All analyses performed on simulated data strictly reproduced those realized in the original experiment. Results from these analyses are presented in Figure 2—figure supplement 2.
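
The update rule and parameters above can be sketched in a few lines of Python. This is only an illustrative reading of the Methods, not the authors' MATLAB script: the function names, the trial length of 20 samples (1000 ms at 50 ms per sample) and, in particular, the scale and shape of the evidence produced by `make_evidence` are assumptions introduced for this example.

```python
import random

# Values quoted from the Methods text.
K1, K2 = 0.016, 0.012      # sensitivity to L1 and L2 evidence
ALPHA, BETA = 0.1, 0.07    # auto-excitation, cross-inhibition
I0, X0 = 0.5, 0.5          # constant input, starting activity
THRESHOLD = 50.0           # decision threshold K
SIGMA = 1.0                # white-noise variance


def simulate_trial(a, b, rng):
    """Run one trial of the flat race model.

    a[t][i] and b[t][i] hold the L1 and L2 evidence favouring unit i at
    sample t (their scale is an assumption of this sketch).
    Returns (choice, decision_sample, trajectory).
    """
    x = [X0] * 4
    trajectory = [list(x)]
    for t in range(len(a)):
        total = sum(x)
        new_x = []
        for i in range(4):
            sensory = K1 * a[t][i] + K2 * b[t][i]
            noise = rng.gauss(0.0, SIGMA ** 0.5)  # std = sqrt(variance)
            dx = I0 + ALPHA * x[i] - BETA * (total - x[i]) + sensory + noise
            new_x.append(max(x[i] + dx, 0.0))     # rectification
        x = new_x
        trajectory.append(list(x))
        if max(x) >= THRESHOLD:                   # first unit to reach K wins
            return x.index(max(x)), t + 1, trajectory
    # No unit reached threshold: pick the most active unit.
    return x.index(max(x)), len(a), trajectory


def make_evidence(n_samples=20, strength=100.0):
    """Hypothetical noiseless evidence favouring unit 0 (the TT option).

    Units 0 and 1 share the correct L1 branch; units 2 and 3 the other one.
    """
    a = [[strength, strength, -strength, -strength] for _ in range(n_samples)]
    b = [[strength, -strength, strength, -strength] for _ in range(n_samples)]
    return a, b


if __name__ == "__main__":
    a, b = make_evidence()
    choice, n_samples, _ = simulate_trial(a, b, random.Random(0))
    print("chosen unit:", choice, "after", n_samples, "samples")
```

All four units are updated from the activities of the previous sample before any of them is overwritten, which matches the $x_{t-1}$ terms in the update equation.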

In the simulation set of Figure 2, designed to reproduce the pattern of reaction times described in the companion paper, we lowered the decision threshold to 30 to produce shorter reaction times. We introduced a minimum integration time of 10 samples (i.e. no boundary for the first 10 samples), consistent with the minimal time of 500 ms after stimulus onset that monkeys had to wait before performing a response saccade. We also changed inhibition to 0.05 and auto-excitation to 0.12. All other parameters were unchanged.
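
The two parameter sets can be kept side by side as follows. This is a hedged sketch: the class name `FlatModelParams`, the field names and the helper `bound_active` are hypothetical, and only the numerical values come from the text.

```python
from dataclasses import dataclass, replace

SAMPLE_MS = 50  # each visual sample lasts 50 ms


@dataclass(frozen=True)
class FlatModelParams:
    k1: float = 0.016            # sensitivity to L1 evidence
    k2: float = 0.012            # sensitivity to L2 evidence
    alpha: float = 0.1           # auto-excitation
    beta: float = 0.07           # cross-inhibition
    i0: float = 0.5              # constant input
    x0: float = 0.5              # starting activity
    threshold: float = 50.0      # decision threshold
    noise_variance: float = 1.0
    min_samples: int = 0         # samples during which no bound applies

    def bound_active(self, n_seen: int) -> bool:
        """True once more than `min_samples` samples have been integrated."""
        return n_seen > self.min_samples


# Parameter set of the main Methods paragraph.
BASELINE = FlatModelParams()

# Parameter set of Figure 2: lower threshold, no bound for 10 samples,
# and slightly different inhibition and auto-excitation.
SHORT_RT = replace(BASELINE, threshold=30.0, min_samples=10,
                   beta=0.05, alpha=0.12)

print(SHORT_RT.min_samples * SAMPLE_MS, "ms without a decision bound")
```

Treating the missing bound as a simple sample counter is one way to read "no boundary for the first 10 samples"; a decision can then first be triggered on the 11th sample.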

A MATLAB script for all simulations and analyses is available at http://bit.ly/hyafilmoreno2017.

Appendix 1

Dependence of L2 performance on L1

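Lorteije and colleagues argue that a flat model would necessarily yield a dependence of L2 performance on L1 difficulty. In the flat race model described above, the selection signals at branches L2 ($s_t^{2} = x_t^1 - x_t^2$) and L2' ($s_t^{2'} = x_t^3 - x_t^4$) evolve according to:

$$\Delta s_t^i = (\beta-\alpha)\, s_{t-1}^i + J_t^i + \mathcal{N}(0, 2\sigma),$$

with $J_t^{2} = I_t^1 - I_t^2 = 2 k_2 b_t^{2}$ and $J_t^{2'} = I_t^3 - I_t^4 = 2 k_2 b_t^{2'}$.

The impact of each sample on the L2 selection signal can be directly derived as

$$s_t^i = 2 k_2 \sum_{u=1}^{t} (1+\beta-\alpha)^{t-u}\, b_u^i + \mathcal{N}(0, \gamma_t \sigma),$$

where $\gamma_t = 2\,\dfrac{1-(1+\beta-\alpha)^{t}}{\alpha-\beta}$.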

In other words, the selection signals at level 2 are only influenced by information provided at the level 2 branches, $b_t^i$, and not by information provided at level 1, $a_t^i$. It is thus expected as a general result that L2 selection depends neither on L1 difficulty nor on any other L1-related variable, contrary to the predictions of Lorteije and colleagues. We detail below four factors that could in principle go against this conclusion. We found however that in practice none induced a significant bias in our simulations:

  1. the above formula is only valid when activation remain above the non-negative boundary. Nevertheless we observed in our simulations that the independence of L2 performance on L1 difficulty still held, even when the zero boundary played an active role (as is the case for the chosen parameter set).

  2. the decision boundary biases towards earlier choices when the L1 task is easy compared to when it is hard. In turn, this would mean a lower average integration time for the L2 choice and thus a decrement in L2 performance. This is exactly what is observed in the original simulation of the flat race model by Lorteije and colleagues. Evidence for a relatively small decision boundary was taken from the primacy effect, whereby earlier samples had a larger impact on the decision than later samples in the trial (their Figure 2E). However, such a primacy effect can alternatively emerge from inhibition, as soon as it outweighs leak: indeed, the weight of each sample, $(1+\beta-\alpha)^{t-u}$, decreases when sample position u increases (Suppl. Figure 1A). We can thus replicate the primacy effect and the independence of L2 from L1 difficulty by implementing a relatively high decision threshold, consistent with findings from another fixed-duration perceptual accumulation task (Brunton et al., 2013). Indeed, very similar results were observed when we used an infinite threshold.

  3. whether the L2 decision corresponds to the selection signal at L2 or at L2' depends ultimately on which unit wins the race. Thus, while both L2 decisions are independent of L1, the observed L2 selection signal can be influenced by L1. In practice, we found this had no impact for the parameters reported above and in a large part of the parameter space.

  4. additive noise in the units may be stimulus-dependent, i.e. grow larger when the intensity of the stimulus is larger (our simulations used stimulus-independent noise). In such a case, since easier L1 trials correspond to stronger L1 signals, easier L1 trials also add more noise to the integration process, which could deteriorate the L2 selection process.
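The closed-form expression for the L2 selection signal and the sample weights discussed in point 2 can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, noise is omitted, the selection signal is assumed to start at zero, and the values of the leak and inhibition terms are arbitrary illustrative choices, not the parameter set reported in the methods.

```python
import numpy as np

def sample_weights(n_samples, alpha, beta):
    """Weight (1 + beta - alpha)**(t - u) of sample u = 1..t, evaluated at t = n_samples."""
    u = np.arange(1, n_samples + 1)
    return (1.0 + beta - alpha) ** (n_samples - u)

def selection_signal_closed_form(b, k2, alpha, beta):
    """Noise-free s_t = 2*k2 * sum_u (1 + beta - alpha)**(t - u) * b_u at the last sample."""
    b = np.asarray(b, dtype=float)
    return 2.0 * k2 * np.dot(sample_weights(len(b), alpha, beta), b)

def selection_signal_recursive(b, k2, alpha, beta):
    """Same quantity, obtained by iterating delta_s = (beta - alpha)*s + 2*k2*b_t from s = 0."""
    s = 0.0
    for b_t in b:
        s += (beta - alpha) * s + 2.0 * k2 * b_t
    return s

def gamma(t, alpha, beta):
    """Noise scaling factor 2 * (1 - (1 + beta - alpha)**t) / (alpha - beta); needs alpha != beta."""
    return 2.0 * (1.0 - (1.0 + beta - alpha) ** t) / (alpha - beta)

# Illustrative values only (hypothetical).
rng = np.random.default_rng(0)
b = rng.normal(size=20)          # L2 evidence samples for one trial
k2 = 0.012

w_primacy = sample_weights(5, alpha=0.05, beta=0.07)   # inhibition outweighs leak
w_recency = sample_weights(5, alpha=0.07, beta=0.05)   # leak outweighs inhibition

closed = selection_signal_closed_form(b, k2, alpha=0.05, beta=0.07)
iterated = selection_signal_recursive(b, k2, alpha=0.05, beta=0.07)
```

When the inhibition term exceeds the leak term the weights fall with sample position (primacy); in the opposite case they rise (recency). In both cases the L1 evidence never enters the expression.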

Appendix 2

Dependence of selection signals on L1 evidence

When the zero bound for unit activity is hypothesized to play no role, activity for each of the 4 units (TT, TD, DT, DD) at each time step can be decomposed as the sum of a term relative to integration of L1 information, a term relative to integration of L2 information, and a noise term:

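$$x_t^{TT} = s_t^1 + s_t^2 + N(0, \gamma_t \sigma)$$
$$x_t^{TD} = s_t^1 - s_t^2 + N(0, \gamma_t \sigma)$$
$$x_t^{DT} = -s_t^1 + s_t^{2'} + N(0, \gamma_t \sigma)$$
$$x_t^{DD} = -s_t^1 - s_t^{2'} + N(0, \gamma_t \sigma)$$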

The first term in each equation induces a positive correlation between evidence at L1 and activity at TT and TD units, and a negative one between evidence at L1 and activity at DT and DD units. Furthermore, the design of the experiment with three different difficulty levels for trials creates correlations between evidence at the different levels, as each difficulty level d is associated with a given mean $\mu_d$ and standard deviation $\zeta_d$ for the intensity of the individual samples. The correlation across trials between sensory evidence at two different levels (L1 and L2/L2') and at two different sample positions (t and t') is:

$$\rho(a_t^i, b_{t'}^i) = \frac{\mathrm{Var}(\mu_d)}{\mathrm{Var}(\mu_d) + \langle \zeta_d \rangle}$$

where Var and <.> represent the variance and mean over the different difficulty levels, respectively. Such correlation between samples increases the correlation between TT activity and L1 evidence, but decreases the correlation between TD and L1 evidence, hence the difference observed in Figure 3C (in the other branch, the amplitude of the negative correlation with L1 evidence is enhanced for DD units and decreased for DT units). The other source of heterogeneity of the impact of L1 evidence is the rectification, which impacts more the activity of DD, DT and TD units. Overall, the flat model predicts different levels of modulation of unit activity by evidence in L1 for the four different branches, and an especially low (positive) correlation for the case of the TD unit.
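How a shared difficulty level induces a correlation between L1 and L2 evidence can be illustrated with a small Monte Carlo sketch. Everything in it is an assumption made for illustration: the function names, the three means and spreads, equiprobable difficulty levels, Gaussian samples, and in particular the reading that the within-level spread enters the denominator as a variance (the square of the standard deviation).

```python
import numpy as np

def predicted_correlation(mu, zeta):
    """Var(mu_d) / (Var(mu_d) + mean within-level variance), levels taken as equiprobable.

    Assumption: the within-level term is the mean of zeta_d**2.
    """
    mu = np.asarray(mu, dtype=float)
    zeta = np.asarray(zeta, dtype=float)
    var_mu = mu.var()                     # variance of the level means across levels
    return var_mu / (var_mu + np.mean(zeta ** 2))

def simulated_correlation(mu, zeta, n_trials=200_000, seed=0):
    """Across-trial correlation between one L1 sample and one L2 sample sharing a difficulty level."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    zeta = np.asarray(zeta, dtype=float)
    d = rng.integers(0, len(mu), size=n_trials)            # difficulty level of each trial
    a = mu[d] + zeta[d] * rng.standard_normal(n_trials)    # L1 evidence sample
    b = mu[d] + zeta[d] * rng.standard_normal(n_trials)    # L2 evidence sample
    return np.corrcoef(a, b)[0, 1]

# Hypothetical means and standard deviations for three difficulty levels.
mu_levels = [0.2, 0.5, 0.8]
zeta_levels = [0.5, 0.5, 0.5]

rho_pred = predicted_correlation(mu_levels, zeta_levels)
rho_sim = simulated_correlation(mu_levels, zeta_levels)
```

Under these assumptions the two samples are independent within a difficulty level, so the across-trial correlation comes entirely from the spread of the level means.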

References

  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
    Disorders of semantic memory
    1. P Garrard
    2. R Perry
    3. JR Hodges
    (1997)
    Journal of Neurology, Neurosurgery & Psychiatry 62:431–435.
    https://doi.org/10.1136/jnnp.62.5.431
  8. 8
  9. 9
  10. 10
  11. 11
  12. 12
  13. 13
  14. 14
  15. 15
  16. 16
  17. 17
    Decision Processes in Visual Perception
    1. D Vickers
    (1979)
    Academic Press.
  18. 18
  19. 19
  20. 20

Decision letter

  1. Joshua I Gold
    Reviewing Editor; University of Pennsylvania, United States

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "Breaking down hierarchies in perceptual decision-making" for consideration by eLife. Your article has been favorably evaluated by Timothy Behrens (Senior Editor) and three reviewers, one of whom is a member of our Board of Reviewing Editors. The reviewers have opted to remain anonymous.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

These two papers present an enlightened and useful discussion about the interpretation of results previously published by Lorteije and colleagues. In that prior study, monkeys performed a task that required them to make a saccadic eye movement to the appropriate endpoint of a visual branching pattern with three bifurcations: one at the top, then two others under each branch, resulting in four distinct endpoints. Each bifurcation had a modulating luminance cue at each branch that determined the correct path: always choose the brighter (on average) branch. Through analyses of behavior and the activity of neurons in cortical areas V1 and V4 whose receptive fields corresponded to the locations of the luminance cues, a primary conclusion from that study was that the monkeys solved the task in a hierarchical manner, with decisions about the top- and lower-level branches first occurring in parallel, then combined for a final choice.

That landmark paper spawned several interesting discussions in the field. Hyafil and Moreno-Bote's technical comment encapsulates one of the critical lines of discussion about whether it is possible to truly distinguish a hierarchical decision-making process from a flat process based on the experimental data of Lorteije and colleagues. It is a critical and complex question. The reply by Zylberberg and colleagues adds further to the discussion, presenting several counter-arguments to the claims of Hyafil and Moreno-Bote.

All three reviewers were impressed by the tone and content of the submissions and agree that both represent worthwhile contributions to the literature. As noted by one of the reviewers, this kind of debate is "very valuable and generally underappreciated."

The above comments are included, verbatim, in our decision letters to both groups of submitting authors. We also will now make both initial submissions available to both groups, so that any potential revisions can fully take into account the claims made in the other submission. We will then allow for one more iteration: if and when we receive revised submissions and deem them appropriate, we will then again make each available to the other group for further revisions and clarifications.

Below are summaries of the discussions among the reviewers that are specific to your submission.

Essential revisions:

The reviewers agreed with the key point, that a "flat" mechanism that includes mutual inhibition can account for many features of the data and generally act like a hierarchical process. However, they also raised several concerns that should be addressed:

1) It would be useful to provide more intuitions about the specific assumptions and parameter values in the flat model that give rise to key types of output. It is critical to note that only some instantiations of the flat model are compatible with the data. The explanation in the legend of Figure 2E is useful but inadequate. Dedicating one or two paragraphs of the main text to model parameters and predictions will make the technical comment much more accessible. For example, why and under what parameter regimes inhibition in a flat model makes the accuracy of L2 decision independent of L1 stimulus strength? Likewise, can you provide better intuitions for the effect of scaling of noise with stimulus strength and rectification of the accumulators? Finally, the flat model explains the late influence of L1 on L2 and L2' choices, removing the need for complex interactions imposed by the hierarchical model. However, it also predicts that different L1 stimulus strengths should cause different magnitudes of suppression on TD neural responses, a prediction that does not match the data, as pointed out by Lorteije et al. It will be useful for readers to know that this prediction stems from Hyafil and Moreno-Bote's assumption that each accumulator equally suppresses the other accumulators. It is likely that suppression is not uniform, causing a TT choice to suppress TD less than DD because TD is a closer option to TT in the decision space than DD. Non-uniform suppression has been reported in various sensory processes and is quite likely to apply to multi-choice decisions.

2) A figure that recapitulates the task and the flat race model would also be useful.

3) The authors should be clearer on how they think their model relates to sensory and/or decision circuits. A race-model with mutual inhibition seems plausible in a decision-making area, but in a sensory area? Or do the authors assume the dynamics to play out in higher area and then feed back to V1 and V4? Making their argument explicit in the context of a model like that by Wimmer et al. 2015 would be very helpful. This point also seems strongly related to their claim that Figure 4 in Lorteije et al. argues against the hierarchical model because "localized selection signals merge information provided at different levels." Perhaps these sensory neurons are getting decision-related feedback?

4) The conclusion that the flat model is a "more parsimonious" explanation for the data than the model proposed by Lorteije et al. might be reconsidered, or at least clarified in the context of other lines of evidence from other studies that relate to flat versus hierarchical processing.

5) Zylberberg et al. argue that the RTs produced by the flat model presented by Hyafil and Moreno-Bote were unrealistic (although note that the task did not have a true RT design, and the measured RTs had only a weak dependence on signal strength). Can any version of the flat model (e.g., with collapsing bounds) produce the reported RTs?

6) It would be useful if both model code and relevant data files could be made available.

[Editors' note: further revisions were requested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Breaking down hierarchies of decision-making in visual cortex" for further consideration at eLife. Your revised article has been favorably evaluated by Timothy Behrens (Senior Editor) and Joshua Gold (Reviewing Editor).

The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below. We also note that our plan is as follows: once we receive an acceptable response to these issues, we will send the updated manuscript to Zylberberg and colleagues, so they have an opportunity to revise their manuscript accordingly. We then will share both papers with both groups, but at that point, if any further changes are desired, they have to be essential and very well justified.

1) Both papers discuss the new analysis by Zylberberg et al. showing that the strength of evidence at L1 affects V4 activity elicited by the L2 and L2' branches. They are in agreement with the finding but disagree with the interpretation. Part of the disagreement appears to stem from misunderstandings from some comments in the Hyafil and Moreno Bote manuscript. Specifically, as quoted by Zylberberg et al., Hyafil and Moreno Bote state in Appendix A that the flat model predicts that "selection signals at levels 2 are only influenced by information provided at level 2 branches […] and not by information provided at level 1." The new analyses by Zylberberg et al. clearly contradict this prediction. However, in their main text, Hyafil and Moreno Bote state that "this observation is most compatible with the flat model and by itself rules out the hierarchical model that relies on complete neural segregation of integration of L1 and L2 evidence." At the very least, please reconcile their statements in the Appendix and the main text.

2) Figure 2 vs. Figure 2—figure supplement 2: why not just show the Figure 2—figure supplement 2 panels in the main figure? Panels H and I could be relegated to the supplement.

https://doi.org/10.7554/eLife.16650.009

Author response

Essential revisions:

The reviewers agreed with the key point, that a "flat" mechanism that includes mutual inhibition can account for many features of the data and generally act like a hierarchical process. However, they also raised several concerns that should be addressed:

1) It would be useful to provide more intuitions about the specific assumptions and parameter values in the flat model that give rise to key types of output. It is critical to note that only some instantiations of the flat model are compatible with the data. The explanation in the legend of Figure 2E is useful but inadequate. Dedicating one or two paragraphs of the main text to model parameters and predictions will make the technical comment much more accessible. For example, why and under what parameter regimes inhibition in a flat model makes the accuracy of L2 decision independent of L1 stimulus strength? Likewise, can you provide better intuitions for the effect of scaling of noise with stimulus strength and rectification of the accumulators?

We agree with the reviewers that the relationship between the observed properties of the modelled network and its underlying ingredients was not clearly presented in the manuscript. In the revised version, we now indicate for each observed property which ingredients are necessary for it to occur: inhibition, rectification, thresholds, etc. Some of this material was present in the supplementary data and has been moved to the main text (the rest is in the new Appendix A). Importantly, we have added a table that precisely summarizes this correspondence between features of the model and observed behavioral/neural properties.

Finally, the flat model explains the late influence of L1 on L2 and L2' choices, removing the need for complex interactions imposed by the hierarchical model. However, it also predicts that different L1 stimulus strengths should cause different magnitudes of suppression on TD neural responses, a prediction that does not match the data, as pointed out by Lorteije et al. It will be useful for readers to know that this prediction stems from Hyafil and Moreno-Bote's assumption that each accumulator equally suppresses the other accumulators. It is likely that suppression is not uniform, causing a TT choice to suppress TD less than DD because TD is a closer option to TT in the decision space than DD. Non-uniform suppression has been reported in various sensory processes and is quite likely to apply to multi-choice decisions.

We thank the reviewers for the suggestion that non-uniform inhibition in the flat model may allow us to reproduce this observed null effect in the fourth branch (TD). Non-uniform inhibition is indeed a key component of cortical networks that could play a decisive role in this null effect. Specifically, we think that stronger local inhibition (from TT to TD) may reduce activity in TD for stronger L1 evidence (because L1 and L2 evidence are correlated, see below), and cancel the positive modulation with L1 evidence observed in the current model.

Even in the current model, we observed that the modulation was not uniform across all branches but was weakest precisely in the TD branch (2.5-4 times weaker than in other branches according to rho’s). As explained in new Appendix B, this effect was caused by the correlation between the strength of L1 and L2 evidence in the design structure of the sensory samples, as a result of the 3 levels of difficulty. As a result, the null statistical effect in TD branch could simply be due to the relative weakness of the modulation in this branch.

This interesting discussion has been added in a new paragraph in main text, complemented by new Appendix B and Figure 3C.

Beyond this point, our interpretation of the new analysis in Figure 4 of Zylberberg et al. radically diverges from that of its authors. It is indeed the clearest signature of the flat model that each unit integrates information from all available sources prior to decision, and so its activity should be sensitive to both L1 and L2 evidence. The new analysis by Zylberberg and colleagues reports effects from single localized populations, analogous to single unit activity from the model, and is thus perfectly compatible with modulation by L1 information. Thus, we would like to stress that the fact that selection signals in (at least) 3 of the 4 branches are modulated by evidence provided at both L1 and L2 branches is strongly suggestive of the flat model, and directly contradicts the hierarchical model. Indeed, the most essential element of the hierarchical model is that evidence from different branches is not integrated within the same neural populations; integration and modulatory effects only occur prior to decision. This discussion has also been added to our manuscript.

2) A figure that recapitulates the task and the flat race model would also be useful.

Thanks for the suggestion. We have added to Figure 1 a representation of the implementation of the flat race model as well as one example simulation (Figure 1C). In the figure legend, we now point to the original manuscript by Lorteije et al. for further description of the task.

3) The authors should be clearer on how they think their model relates to sensory and/or decision circuits. A race-model with mutual inhibition seems plausible in a decision-making area, but in a sensory area? Or do the authors assume the dynamics to play out in higher area and then feed back to V1 and V4? Making their argument explicit in the context of a model like that by Wimmer et al. 2015 would be very helpful. This point also seems strongly related to their claim that Figure 4 in Lorteije et al. argues against the hierarchical model because "localized selection signals merge information provided at different levels." Perhaps these sensory neurons are getting decision-related feedback?

We agree that the point about localized selection signals merging information provided at different levels was not a good one, mostly because it was not clear from that figure whether what is integrated is perceptual evidence or local decisions. The new figure from Zylberberg points to the latter process, as discussed above. The sentence was accordingly removed.

We thank the reviewers for pointing to the Wimmer et al. paper. As we now briefly discuss in the manuscript, it is not clear whether selection signals recorded in visual cortex emerge locally or reflect feedback from higher areas (this is clearly discussed in Lorteije et al. paper). The flat model assumes integration of evidence from across the visual field, and inhibition between the associated representations, and is thus more compatible with the view that integration takes place at a higher cortical area.

4) The conclusion that the flat model is a "more parsimonious" explanation for the data than the model proposed by Lorteije et al. might be reconsidered, or at least clarified in the context of other lines of evidence from other studies that relate to flat versus hierarchical processing.

Thanks for the suggestions. We do not mean that the flat model offers a more parsimonious account of all sorts of hierarchical tasks, and we have made sure that we do not make such a claim in our manuscript. However, we think that the flat model offers a more parsimonious explanation for the data of the presented experiment, in that all described effects can be explained by classical ingredients of the race model rather than by ad hoc mechanisms. Following the reviewers' suggestion, and in light of this comment and the manuscript by Zylberberg et al., we have clarified in the final paragraphs of the manuscript the comparative advantages of the flat vs. hierarchical models, and under what circumstances we hypothesize that each model will prevail.

In particular, we now stress what we believe is an important distinction between the hierarchical vs. flat nature of the integration process, and the serial vs. parallel nature of the timing of integration. The evidence from previous studies related in the first part of Zylberberg et al.'s manuscript only addresses the second question (whether integration is serial or parallel), but speaks very little to the underlying structure of the representations that integrate evidence. Thus, we do not rule out that a flat model integrating evidence serially is also present in these other tasks.

5) Zylberberg et al. argue that the RTs produced by the flat model presented by Hyafil and Moreno-Bote were unrealistic (although note that the task did not have a true RT design, and the measured RTs had only a weak dependence on signal strength). Can any version of the flat model (e.g., with collapsing bounds) produce the reported RTs?

We had no access to RTs beforehand, so we did not pay much attention to them in our first simulations. Actually, as pointed out by the reviewers, this is not a true RT design but rather a mixed RT/fixed-duration one, as monkeys had to wait for at least 500 ms (10 samples) before responding, but could wait longer to accumulate more evidence. We have now taken this into account by removing the decision boundary for the first 10 samples, and applying a fixed (finite) threshold thereafter (Figure 2—figure supplement 2). The value of the finite threshold was adapted to yield approximately the same mean reaction times depending on task difficulty as reported by Zylberberg et al. Apart from the reported RTs, all previously reported behavioral and neural effects were reproduced, including the lack of modulation of L2 performance by L1 difficulty (Figure 2—figure supplement 2).

We would like to add that Zylberberg et al. report that after a simple modulation of a fixed threshold in the flat model (unlike the double threshold strategy we implemented), L2 performance was no longer independent of L1 difficulty. We believe that the discrepancy between their simulations and ours stems from the distribution of early responses in the two models. As commented in point 1, modulation of L2 performance by L1 difficulty emerges mostly when L1 difficulty has a large impact on reaction times, and hence on the timing of L2 integration. With a single fixed threshold as used in the simulation by Zylberberg and colleagues, there is a significant portion of easy L1 trials with early responses (RT<500 ms), which robustly degrades L2 performance. By contrast, by ensuring that decisions integrate at least the first 10 samples, we make sure that there is a minimum integration window for the L2 decision. Note that this time-dependent bound is a form of collapsing bound, so we agree with the reviewer that this is indeed a plausible mechanism that can explain the observed RTs. We would like to conclude by saying that we have provided the results for the modified version of the network as a supplementary figure, and left the original as the main figure. We think this presentation better reflects the discussion that has taken place between the two manuscripts. We could also provide the new results in the main figures in place of the former ones, if judged more adequate by editors and reviewers.

6) It would be useful if both model code and relevant data files could be made available.

This is a good point. We are working on adapting the code to be readily usable by any MATLAB user. It will comprise both generation of synthetic data and its subsequent analysis (there are no intermediate data files). It will be accessible within the next few days at the following link: bit.ly/flatdecision

[Editors' note: further revisions were requested prior to acceptance, as described below.]

[…] 1) Both papers discuss the new analysis by Zylberberg et al. showing that the strength of evidence at L1 affects V4 activity elicited by the L2 and L2' branches. They are in agreement with the finding but disagree with the interpretation. Part of the disagreement appears to stem from misunderstandings from some comments in the Hyafil and Moreno Bote manuscript. Specifically, as quoted by Zylberberg et al., Hyafil and Moreno Bote state in Appendix A that the flat model predicts that "selection signals at levels 2 are only influenced by information provided at level 2 branches […] and not by information provided at level 1." The new analyses by Zylberberg et al. clearly contradict this prediction. However, in their main text, Hyafil and Moreno Bote state that "this observation is most compatible with the flat model and by itself rules out the hierarchical model that relies on complete neural segregation of integration of L1 and L2 evidence." At the very least, please reconcile their statements in the Appendix and the main text.

We would like to thank the reviewers for making this point, as indeed there has been some confusion between our two manuscripts.

Specifically, we believe that Zylberberg et al. conflated two things: “unit activity”, i.e. the localized selection signal in each of the four units; and “selection signals for L2”, which we compute by taking the difference between the activities of two units within the same branch. While in the latter the influence of L1 evidence is effectively removed (as described in Appendix A), in the former the influence of L1 is essential: it is indeed the clearest signature of the flat model that each unit integrates information from all available sources prior to decision, and so its activity should be sensitive to both L1 and L2 evidence. The new analysis by Zylberberg and colleagues reports effects from single localized populations, analogous to single unit activity from the model, in which we indeed predict modulation by L1 information.

We have now made this important distinction clearer in the manuscript, by adding the following note in the corresponding section:

“At this point, an important distinction has to be made in the hierarchical model between localized activations, which indeed mix evidence from both branches, and selection signals at L2, extracted by looking at the difference between two units in the same L1 branch, and which as shown above are largely insensitive to L1 signals.”

2) Figure 2 vs. Figure 2—figure supplement 2: why not just show the Figure 2—figure supplement 2 panels in the main figure? Panels H and I could be relegated to the supplement.

Actually, we could not decide in our previous submission which was the better presentation: to put either the original analysis (with an incorrect reaction time distribution) or the new analysis in the main figure. We thought that the former solution would allow us to present the very data that Zylberberg et al. are commenting on in the companion paper. But we agree that the alternative option that the reviewers present is more straightforward. Changes were made accordingly.

https://doi.org/10.7554/eLife.16650.010

Article and author information

Author details

  1. Alexandre Hyafil

    CBC, DTIC, Universitat Pompeu Fabra, Barcelona, Spain
    Contribution
    AH, Conceptualization, Software, Formal analysis, Funding acquisition, Investigation, Visualization, Methodology, Writing—original draft, Project administration, Writing—review and editing
    For correspondence
    alexandre.hyafil@gmail.com
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0002-0566-651X
  2. Rubén Moreno-Bote

    1. CBC, DTIC, Universitat Pompeu Fabra, Barcelona, Spain
    2. Serra Húnter Fellow Programme, Universitat Pompeu Fabra, Barcelona, Spain
    Contribution
    RM-B, Supervision, Investigation, Writing—original draft, Project administration, Writing—review and editing
    Competing interests
    The authors declare that no competing interests exist.

Funding

European Commission (Marie Curie IEF - CEMNET (agreement 629613))

  • Alexandre Hyafil

Ministerio de Economía y Competitividad (PSI2013-44811-P)

  • Rubén Moreno-Bote

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

The authors thank Jan Drugowitsch for his comments on the manuscript.

Reviewing Editor

  1. Joshua I Gold, University of Pennsylvania, United States

Publication history

  1. Received: April 5, 2016
  2. Accepted: March 24, 2017
  3. Version of Record published: June 26, 2017 (version 1)

Copyright

© 2017, Hyafil et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

