Binocular rivalry reveals an out-of-equilibrium neural dynamics suited for decision-making

  1. Robin Cao
  2. Alexander Pastukhov
  3. Stepan Aleshin
  4. Maurizio Mattia
  5. Jochen Braun (corresponding author)
  1. Cognitive Biology, Center for Behavioral Brain Sciences, Germany
  2. Gatsby Computational Neuroscience Unit, United Kingdom
  3. Istituto Superiore di Sanità, Italy

Abstract

In ambiguous or conflicting sensory situations, perception is often ‘multistable’ in that it perpetually changes at irregular intervals, shifting abruptly between distinct alternatives. The interval statistics of these alternations exhibits quasi-universal characteristics, suggesting a general mechanism. Using binocular rivalry, we show that many aspects of this perceptual dynamics are reproduced by a hierarchical model operating out of equilibrium. The constitutive elements of this model idealize the metastability of cortical networks. Independent elements accumulate visual evidence at one level, while groups of coupled elements compete for dominance at another level. As soon as one group dominates perception, feedback inhibition suppresses supporting evidence. Previously unreported features in the serial dependencies of perceptual alternations compellingly corroborate this mechanism. Moreover, the proposed out-of-equilibrium dynamics satisfies normative constraints of continuous decision-making. Thus, multistable perception may reflect decision-making in a volatile world: integrating evidence over space and time, choosing categorically between hypotheses, while concurrently evaluating alternatives.

Introduction

In deducing the likely physical causes of sensations, perception goes beyond the immediate sensory evidence and draws heavily on context and prior experience (von Helmholtz, 1867; Barlow et al., 1972; Gregory, 1980; Rock, 1983). Numerous illusions in visual, auditory, and tactile perception – all subjectively compelling, but objectively false – attest to this extrapolation beyond the evidence. In natural settings, perception explores alternative plausible causes of sensory evidence by active readjustment of sensors (‘active perception,’ Mirza et al., 2016; Yang et al., 2018; Parr and Friston, 2017a). In general, perception is thought to actively select plausible explanatory hypotheses, to predict the sensory evidence expected for each hypothesis from prior experience, and to compare the observed sensory evidence at multiple levels of scale or abstraction (‘analysis by synthesis,’ ‘predictive coding,’ ‘hierarchical Bayesian inference,’ Yuille and Kersten, 2006, Rao and Ballard, 1999, Parr and Friston, 2017b, Pezzulo et al., 2018). Active inference engages the entire hierarchy of cortical areas involved in sensory processing, including both feedforward and feedback projections (Bar, 2009; Larkum, 2013; Shipp, 2016; Funamizu et al., 2016; Parr et al., 2019).

The dynamics of active inference becomes experimentally observable when perceptual illusions are ‘multistable’ (Leopold and Logothetis, 1999). In numerous ambiguous or conflicting situations, phenomenal experience switches at irregular intervals between discrete alternatives, even though the sensory scene is stable (Necker, 2009; Wheatstone, 1838; Rubin, 1958; Attneave, 1971; Ramachandran and Anstis, 2016; Pressnitzer and Hupe, 2006; Schwartz et al., 2012). Multistable illusions are enormously diverse, involving visibility or audibility, perceptual grouping, visual depth or motion, and many kinds of sensory scenes, from schematic to naturalistic. Average switching rates differ greatly and range over at least two orders of magnitude (Cao et al., 2016), depending on sensory scene, perceptual grouping (Wertheimer, 1912; Koffka, 1935; Ternus, 1926), continuous or intermittent presentation (Leopold and Logothetis, 2002; Maier et al., 2003), attentional condition (Pastukhov and Braun, 2007), individual observer (Pastukhov et al., 2013c; Denham et al., 2018; Brascamp et al., 2019), and many other factors.

In spite of this diversity, the stochastic properties of multistable phenomena appear to be quasi-universal, suggesting that the underlying mechanisms may be general. Firstly, average dominance duration depends in a characteristic and counterintuitive manner on the strength of dominant and suppressed evidence (‘Levelt’s propositions I–IV,’ Levelt, 1965; Brascamp et al., 2006; Klink et al., 2016; Kang, 2009; Brascamp et al., 2015; Moreno-Bote et al., 2010). Secondly, the statistical distribution of dominance durations shows a stereotypical shape, resembling a gamma distribution with shape parameter $r \approx 3$–$4$ (‘scaling property,’ Cao et al., 2016; Fox and Herrmann, 1967; Blake et al., 1971; Borsellino et al., 1972; Walker, 1975; De Marco et al., 1977; Murata et al., 2003; Brascamp et al., 2005; Pastukhov and Braun, 2007; Denham et al., 2018; Darki and Rankin, 2021). Thirdly, the durations of successive dominance periods are correlated positively, over at least two or three periods (Fox and Herrmann, 1967; Walker, 1975; Van Ee, 2005; Denham et al., 2018).

Here, we show that these quasi-universal characteristics are comprehensively and quantitatively reproduced, indeed guaranteed, by an interacting hierarchy of birth-death processes operating out of equilibrium. While the proposed mechanism combines some of the key features of previous models, it far surpasses their explanatory power.

Several possible mechanisms have been proposed for perceptual dominance, the triggering of reversals, and the stochastic timing of reversals. That a single, coherent interpretation typically dominates phenomenal experience is thought to reflect competition (explicit or implicit) at the level of explanatory hypotheses (e.g., Dayan, 1998), sensory inputs (e.g., Lehky, 1988), or both (e.g., Wilson, 2003). That a dominant interpretation is occasionally supplanted by a distinct alternative has been attributed to fatigue processes (e.g., neural adaptation, synaptic depression, Laing and Chow, 2002), spontaneous fluctuations (‘noise,’ e.g., Wilson, 2007, Kim et al., 2006), stochastic sampling (e.g., Schrater and Sundareswara, 2006), or combinations of these (e.g., adaptation and noise, Shpiro et al., 2009; Seely and Chow, 2011; Pastukhov et al., 2013c). The characteristic stochasticity (gamma-like distribution) of dominance durations has been attributed to Poisson counting processes (e.g., birth-death processes, Taylor and Ladridge, 1974; Gigante et al., 2009; Cao et al., 2016) or stochastic accumulation of discrete samples (Murata et al., 2003; Schrater and Sundareswara, 2006; Sundareswara and Schrater, 2008; Weilnhammer et al., 2017).

‘Dynamical’ models combining competition, adaptation, and noise capture well the characteristic dependence of dominance durations on input strength (‘Levelt’s propositions’) (Laing and Chow, 2002; Wilson, 2007; Ashwin and Aureliu, 2010), especially when inputs are normalized (Moreno-Bote et al., 2007; Moreno-Bote et al., 2010; Cohen et al., 2019), and when the dynamics emphasize noise (Shpiro et al., 2009; Seely and Chow, 2011; Pastukhov et al., 2013c). However, such models do not preserve distribution shape over the full range of input strengths (Cao et al., 2016; Cohen et al., 2019). On the other hand, ‘sampling’ models based on discrete random processes preserve distribution shape (Taylor and Ladridge, 1974; Murata et al., 2003; Schrater and Sundareswara, 2006; Sundareswara and Schrater, 2008; Cao et al., 2016; Weilnhammer et al., 2017), but fail to reproduce the dependence on input strength. Neither type of model accounts for the sequential dependence of dominance durations (Laing and Chow, 2002).

Here, we reconcile ‘dynamical’ and ‘sampling’ approaches to multistable perception, extending an earlier effort (Gigante et al., 2009). Importantly, every part of the proposed mechanism appears to be justified normatively in that it may serve to optimize perceptual choices in a general behavioral situation, namely, continuous inference in uncertain and volatile environments (Bogacz, 2007; Veliz-Cuba et al., 2016). We propose that sensory inputs are represented by birth-death processes in order to accumulate sensory information over time and in a format suited for Bayesian inference (Ma et al., 2006; Pouget et al., 2013). Further, we suggest that explanatory hypotheses are evaluated competitively, with a hypothesis attaining dominance (over phenomenal experience) when its support exceeds the alternatives by a certain finite amount, consistent with optimal decision-making between multiple alternatives (Bogacz, 2007). Finally, we assume that a dominant hypothesis suppresses its supporting evidence, as required by ‘predictive coding’ implementations of hierarchical Bayesian inference (Pearl, 1988; Rao and Ballard, 1999; Hohwy et al., 2008). In contrast to many previous models, we do not require local mechanisms of fatigue, adaptation, or decay.

Based on these assumptions, the proposed mechanism reproduces dependence on input strength, as well as distribution of dominance durations and positive sequential dependence. Additionally, it predicts novel and unsuspected dynamical features confirmed by experiment.

Results

Below we introduce each component of the mechanism and its possible normative justification, before describing out-of-equilibrium dynamics resulting from the interaction of all components. Subsequently, we compare model predictions with multistable perception of human observers, specifically, the dominance statistics of binocular rivalry (BR) at various combinations of left- and right-eye contrasts (Figure 1a).

Proposed mechanism of binocular rivalry.

(a) When the left and right eyes see incompatible images in the visual field, phenomenal appearance reverses at irregular intervals, sometimes being dominated by one image and sometimes by the other (gray and white regions). Sir Charles Wheatstone studied this multistable percept with a mirror stereoscope (not as shown!). (b) Spiking neural network implementation of a ‘local attractor.’ An assembly of 150 neurons (schematic, dark gray circle) interacts competitively with multiple other assemblies (light gray circles). Population activity of the assembly explores an effective energy landscape (right) with two distinct steady states (circles), separated by a ridge (diamond). Driven by noise, activity transitions occasionally between ‘on’ and ‘off’ states (bottom), with transition rates $\nu_\pm$ depending sensitively on external input to the assembly (not shown). Here, $\nu_+ = \nu_- \approx 1$ Hz. Spike raster shows 10 representative neurons. (c) Nested attractor dynamics (central schematic) that quantitatively reproduces the dynamics of binocular rivalry (left and right columns). Independently bistable variables (‘local attractors,’ small circles) respond probabilistically to input, transitioning stochastically between on- and off-states (red/blue and white, respectively). The entire system comprises four pools, with 25 variables each, linked by excitatory and inhibitory projections. Phenomenal appearance is decided by competition between decision pools $R$ and $R'$ forming ‘non-local attractors’ (cross-inhibition $w_{comp}$ and self-excitation $w_{coop}$). Visual input $c$ and $c'$ accumulates, respectively, in evidence pools $E$ and $E'$ and propagates to decision pools (feedforward selective excitation $w_{exc}$ and indiscriminate inhibition $w_{inh}$). Decision pools suppress associated evidence pools (feedback selective suppression $w_{supp}$).
The time course of the number of active variables (active count) is shown for decision pools (top left and right) and evidence pools (bottom left and right), representing the left eye (red traces) and the right eye image (blue traces). The state of individual variables (black horizontal traces in left and middle columns) and of perceptual dominance (gray and white regions) is also shown. In decision pools, almost all variables become active (black trace) or inactive (no trace) simultaneously. In evidence pools, only a small fraction of variables is active at any given time. (d) Fractional activity dynamics of decision pools $R$ and $R'$ (top, red and blue traces) and evidence pools $E$ and $E'$ (bottom, red and blue traces). Reversals of phenomenal appearance are also indicated (gray and white regions).

Hierarchical dynamics

Bistable assemblies: ‘local attractors’

As operative units of sensory representation, we postulate neuronal assemblies with bistable ‘attractor’ dynamics. Effectively, assembly activity moves in an energy landscape with two distinct quasi-stable states – dubbed ‘on’ and ‘off’ – separated by a ridge (Figure 1b). Driven by noise, assembly activity mostly remains near one quasi-stable state (‘on’ or ‘off’), but occasionally ‘escapes’ to the other state (Kramers, 1940; Hanggi et al., 1990; Deco and Hugues, 2012; Litwin-Kumar and Doiron, 2012; Huang and Doiron, 2017).

An important feature of ‘attractor’ dynamics is that the energy of quasi-stable states depends sensitively on external input. Net positive input destabilizes (i.e., raises the potential of) the ‘off’ state and stabilizes (i.e., lowers the potential of) the ‘on’ state. Transition rates ν± are even more sensitive to external input as they depend approximately exponentially on the height of the energy ridge (‘activation energy’).

Figure 1b illustrates ‘attractor’ dynamics for an assembly of 150 spiking neurons with activity levels of approximately 7Hz and 21Hz per neuron in the ‘off’ and ‘on’ states, respectively. Full details are provided in Appendix 1, section Metastable population dynamics, and Appendix 1—figure 2.

Binary stochastic variables

Our model is independent of neural details and relies exclusively on an idealized description of ‘attractor’ dynamics. Specifically, we reduce bistable assemblies to discretely stochastic, binary activity variables $x(t) \in \{0,1\}$, which activate and inactivate with Poisson rates $\nu_+$ and $\nu_-$, respectively. These rates $\nu_\pm(s)$ vary exponentially and anti-symmetrically with increments or decrements of activation energy $\Delta u = u(s) + u_0$:

(1) $\nu_+ = \frac{\nu}{2}\exp\left(+\frac{\Delta u}{2}\right), \quad \nu_- = \frac{\nu}{2}\exp\left(-\frac{\Delta u}{2}\right)$

where $u_0$ and $\nu$ are baseline potential and baseline rate, respectively, and where the input-dependent part $u(s) = w\,s$ varies linearly with input $s$, with synaptic coupling constant $w$ (see Appendix 1, section Metastable population dynamics and Appendix 1—figure 2e).
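As a minimal sketch of Equation 1, the function below computes both transition rates for a given input. It assumes the prefactor ν/2 and the anti-symmetric exponents ±Δu/2; the function name and the default values of `w`, `u0`, and `nu` are placeholders chosen for illustration, not values taken from the paper.

```python
import math

def transition_rates(s, w=1.0, u0=0.0, nu=1.0):
    """Return (nu_plus, nu_minus) for input s, following Equation 1.

    Assumption: rates are (nu / 2) * exp(+-delta_u / 2), with
    delta_u = w * s + u0. Default parameter values are hypothetical.
    """
    delta_u = w * s + u0                               # activation energy u(s) + u0
    nu_plus = 0.5 * nu * math.exp(+0.5 * delta_u)      # activation ('off' -> 'on')
    nu_minus = 0.5 * nu * math.exp(-0.5 * delta_u)     # inactivation ('on' -> 'off')
    return nu_plus, nu_minus
```

Under these assumptions, the two rates are equal when Δu = 0, and their ratio ν+/ν− equals exp(Δu) for any input.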

Pool of N binary variables

An extended network, containing $N$ individually bistable assemblies with shared input $s$, reduces to a ‘pool’ of $N$ binary activity variables $x_i(t) \in \{0,1\}$ with identical rates $\nu_\pm(s)$. Although all variables are independently stochastic, they are coupled through their shared input $s$. The number of active variables $n(t) = \sum_i x_i(t)$ or, equivalently, the active fraction $x(t) = n(t)/N$, forms a discretely stochastic process (‘birth-death’ or ‘Ehrenfest’ process; Karlin and McGregor, 1965).
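One way to make the birth-death description concrete is an event-driven simulation of the active count for fixed rates. The sketch below is an illustration under stated assumptions, not the authors' implementation: the event-driven (Gillespie-style) update is a choice made here, and the function name and arguments are hypothetical. With n active variables, the pool gains one at total rate (N − n)·ν+ and loses one at total rate n·ν−.

```python
import random

def simulate_pool(N, nu_plus, nu_minus, t_max, n0=0, seed=0):
    """Simulate the active count n(t) of a pool of N independent binary variables.

    Returns (times, counts): event times and the count just after each event.
    Rates are held fixed, i.e. the shared input s is assumed constant.
    """
    rng = random.Random(seed)
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        rate_up = (N - n) * nu_plus      # any inactive variable may switch on
        rate_down = n * nu_minus         # any active variable may switch off
        total = rate_up + rate_down
        if total == 0.0:
            break                        # nothing can happen (both rates zero)
        t += rng.expovariate(total)      # exponential waiting time to next event
        if t > t_max:
            break
        n += 1 if rng.random() * total < rate_up else -1
        times.append(t)
        counts.append(n)
    return times, counts
```

For constant rates, the time-averaged count should settle near N·ν+/(ν+ + ν−), which offers a quick sanity check of such a simulation.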

Relaxation dynamics

While activity $x(t)$ develops discretely and stochastically according to Equation 5 (Materials and methods), its expectation $\langle x(t) \rangle$ develops continuously and deterministically,

(2) $\langle \dot{x} \rangle = (1 - \langle x \rangle)\,\nu_+ - \langle x \rangle\,\nu_-$

relaxing with characteristic time $\tau_x = \frac{1}{\nu_+ + \nu_-}$ towards asymptotic value $x_\infty = \frac{\nu_+}{\nu_+ + \nu_-}$. As rates $\nu_\pm$ change with input $s$ (Equation 1), we can define the functions $\tau_x = \Upsilon(s)$ and $x_\infty = \Phi(s)$ (see Materials and methods). Characteristic time $\tau_x$ is longest for small input $s \approx 0$ and shortens for larger positive or negative input $|s| \gg 0$. The asymptotic value $x_\infty$ ranges over the interval (0, 1) and varies sigmoidally with input $s$, reaching half-activation for $s = -u_0/w$.
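Because Equation 2 is linear in the expected activity, it has a closed-form exponential solution for constant rates. The sketch below states that solution and cross-checks it against a simple forward-Euler integration; the function names and the step size `dt` are illustrative choices, not part of the model description.

```python
import math

def relaxation_parameters(nu_plus, nu_minus):
    """Characteristic time and asymptotic value for constant rates."""
    tau = 1.0 / (nu_plus + nu_minus)
    x_inf = nu_plus / (nu_plus + nu_minus)
    return tau, x_inf

def expected_activity(t, x0, nu_plus, nu_minus):
    """Closed-form solution of Equation 2 starting from expected activity x0."""
    tau, x_inf = relaxation_parameters(nu_plus, nu_minus)
    return x_inf + (x0 - x_inf) * math.exp(-t / tau)

def euler_expected_activity(t, x0, nu_plus, nu_minus, dt=1e-4):
    """Forward-Euler integration of Equation 2, for comparison only."""
    x = x0
    for _ in range(int(round(t / dt))):
        x += dt * ((1.0 - x) * nu_plus - x * nu_minus)
    return x
```

For example, with ν+ = 2 and ν− = 1 (arbitrary units), the expected activity relaxes towards 2/3 with a characteristic time of 1/3.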

Quality of representation

Pools of bistable variables belong to a class of neural representations particularly suited for Bayesian integration of sensory information (Beck et al., 2008; Pouget et al., 2013). In general, summation of activity is equivalent to optimal integration of information, provided that response variability is Poisson-like, and response tuning differs only multiplicatively (Ma et al., 2006; Ma et al., 2008). Pools of bistable variables closely approximate these properties (see Appendix 1, section Quality of representation: Suitability for inference).

The representational accuracy of even a comparatively small number of bistable variables can be surprisingly high. For example, if normally distributed inputs drive the activity of initially inactive pools of bistable variables, pools as used in the present model (N=25, w=2.5) readily capture 90% of the Fisher information (see Appendix 1, section Quality of representation: Integration of noisy samples).

Conflicting evidence

Any model of BR must represent the conflicting evidence from both eyes (e.g., different visual orientations), which supports alternative perceptual hypotheses (e.g., distinct grating patterns). We assume that conflicting evidence accumulates in two separate pools of $N = 25$ bistable variables, $E$ and $E'$ (‘evidence pools,’ Figure 1c). Fractional activations $e(t)$ and $e'(t)$ develop stochastically following Equation 5 (Materials and methods). Transition rates $\nu_e^\pm$ and $\nu_{e'}^\pm$ vary exponentially with activation energy (Equation 1), with baseline potential $u_e^0$ and baseline rate $\nu_e$. The variable components of activation energy, $u_e$ and $u_{e'}$, are synaptically modulated by image contrasts, $c$ and $c'$:

(3) $u_e = w_{vis}\, I, \quad u_{e'} = w_{vis}\, I'$

where $w_{vis}$ is a coupling constant and $I = f(c) \in [0,1]$ is a monotonic function of image contrast $c$ (see Materials and methods).

Competing hypotheses: ‘non-local attractors’

Once evidence for, and against, alternative perceptual hypotheses (e.g., distinct grating patterns) has been accumulated, reaching a decision requires a sensitive and reliable mechanism for identifying the best supported hypothesis and amplifying the result into a categorical read-out. Such a winner-take-all decision (Koch and Ullman, 1985) is readily accomplished by a dynamical version of biased competition (Deco and Rolls, 2005; Wang, 2002; Deco et al., 2007; Wang, 2008).

We assume that alternative perceptual hypotheses are represented by two further pools of $N = 25$ bistable variables, $R$ and $R'$, forming two ‘non-local attractors’ (‘decision pools,’ Figure 1c). Similar to previous models of decision-making and attentional selection (Deco and Rolls, 2005; Wang, 2002; Deco et al., 2007; Wang, 2008), we postulate recurrent excitation within pools, but recurrent inhibition between pools, to obtain a ‘winner-take-all’ dynamics. Importantly, we assume that ‘evidence pools’ project to ‘decision pools’ not only in the form of selective excitation (targeted at the corresponding decision pool), but also in the form of indiscriminate inhibition (targeting both decision pools), as suggested previously (Ditterich et al., 2003; Bogacz et al., 2006).

Specifically, fractional activations $r(t)$ and $r'(t)$ develop stochastically according to Equation 5 (Materials and methods). Transition rates $\nu_r^\pm$ and $\nu_{r'}^\pm$ vary exponentially with activation energy (Equation 1), with baseline difference $u_r^0$ and baseline rate $\nu_r$. The variable components of activation energy, $u_r$ and $u_{r'}$, are synaptically modulated by evidence and decision activities:

(4) $u_r = w_{exc}\, e - w_{inh}\,(e + e') + w_{coop}\, r - w_{comp}\, r', \quad u_{r'} = w_{exc}\, e' - w_{inh}\,(e + e') + w_{coop}\, r' - w_{comp}\, r$

where coupling constants $w_{exc}$, $w_{inh}$, $w_{coop}$, $w_{comp}$ reflect feedforward excitation, feedforward inhibition, lateral cooperation within decision pools, and lateral competition between decision pools, respectively.
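The structure of Equation 4 can be sketched as a small function. It assumes the sign convention in which feedforward excitation and within-pool cooperation enter positively, while indiscriminate inhibition and between-pool competition enter negatively; the function and argument names are hypothetical.

```python
def decision_energies(e, e_prime, r, r_prime, w_exc, w_inh, w_coop, w_comp):
    """Variable part of the activation energy of decision pools R and R'.

    e, e_prime: fractional activities of the evidence pools (0..1)
    r, r_prime: fractional activities of the decision pools (0..1)
    """
    shared_inhibition = w_inh * (e + e_prime)   # targets both decision pools alike
    u_r = w_exc * e - shared_inhibition + w_coop * r - w_comp * r_prime
    u_r_prime = w_exc * e_prime - shared_inhibition + w_coop * r_prime - w_comp * r
    return u_r, u_r_prime
```

In this form, the indiscriminate inhibition cancels in the difference between the two energies, which reduces to w_exc·(e − e′) + (w_coop + w_comp)·(r − r′). The competition is therefore biased by differential evidence and reinforced by the current differential decision activity.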

This biased competition circuit expresses a categorical decision by either raising $r$ towards unity (and lowering $r'$ towards zero) or vice versa. The choice is random when visual input is ambiguous, $I \approx I'$, but becomes deterministic with growing input bias $|I - I'| > 0$. This probabilistic sensitivity to input bias is reliable and robust under arbitrary initial conditions of $e$, $e'$, $r$ and $r'$ (see Appendix 1, section Categorical choice with Appendix 1—figure 3).

Feedback suppression

Finally, we assume feedback suppression, with each decision pool selectively targeting the corresponding evidence pool. A functional motivation for this systematic bias against the currently dominant appearance is given momentarily. Its effects include curtailment of dominance durations and ensuring that reversals occur from time to time. Specifically, we modify Equation 3 to

(3a) $u_e = w_{vis}\, f(c) - w_{supp}\, r, \quad u_{e'} = w_{vis}\, f(c') - w_{supp}\, r'$

where $w_{supp}$ is a coupling constant.

Previous models of BR (Dayan, 1998; Hohwy et al., 2008) have justified selective feedback suppression of the evidence supporting a winning hypothesis in terms of ‘predictive coding’ and ‘hierarchical Bayesian inference’ (Rao and Ballard, 1999; Lee and Mumford, 2003). An alternative normative justification is that, in volatile environments, where the sensory situation changes frequently (‘volatility prior’), optimal inference requires an exponentially growing bias against evidence for the most likely hypothesis (Veliz-Cuba et al., 2016). Note that feedback suppression applies selectively to evidence for a winning hypothesis and is thus materially different from visual adaptation (Wark et al., 2009), which applies indiscriminately to all evidence present.

Reversal dynamics

A representative example of the joint dynamics of evidence and decision pools is illustrated in Figure 1c,d, both at the level of pool activities $e(t)$, $e'(t)$, $r(t)$, $r'(t)$, and at the level of individual bistable variables $x(t)$. The top row shows decision pools $R$ and $R'$, with instantaneous active counts $N \cdot r(t)$ and $N \cdot r'(t)$, and active/inactive states of individual variables $x(t)$. The bottom row shows evidence pools $E$ and $E'$, with instantaneous active counts $N \cdot e(t)$ and $N \cdot e'(t)$, and active/inactive states of individual variables $x(t)$. Only a small fraction of evidence variables is active at any one time.

Phenomenal appearance reverses when the differential activity $\Delta e = e - e'$ of evidence pools, $E$ and $E'$, contradicts sufficiently strongly the differential activity $\Delta r = r - r'$ of decision pools, $R$ and $R'$, such that the steady state of decision pools is destabilized (see further below and Figure 4). As soon as the reversal has been effected at the decision level, feedback suppression lifts from the newly non-dominant evidence and descends upon the newly dominant evidence. Due to this asymmetric suppression, the newly non-dominant evidence recovers, whereas the newly dominant evidence habituates. This opponent dynamics progresses, past the point of equality $s \approx s'$, until differential evidence activity $\Delta e$ once again contradicts differential decision activity $\Delta r$. Whereas the activity of decision pools varies in phase (or counterphase) with perceptual appearance, the activity of evidence pools changes in quarterphase (or negative quarterphase) with perceptual appearance (e.g., Figures 1c,d and 2a), consistent with some previous models (Gigante et al., 2009; Albert et al., 2017; Weilnhammer et al., 2017).
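The interplay of the four pools can be sketched as a single event-driven simulation that combines the exponential rates (Equation 1), feedback suppression of evidence (Equation 3a), and biased competition between decision pools (Equation 4). This is a rough illustration under stated assumptions, not the authors' code: the rate prefactor ν/2, the event-driven update, the direct use of `I` and `I_prime` in place of f(c), and all names are choices made here. In particular, every value in `DEFAULTS` is a placeholder rather than a fitted parameter, so whether this sketch alternates the way the model in the paper does is not guaranteed.

```python
import math
import random

# Placeholder parameters for illustration only; NOT the fitted values of the paper.
DEFAULTS = {
    "N": 25,
    "nu_e": 1.0, "u0_e": -1.0,     # evidence pools: baseline rate and potential
    "nu_r": 5.0, "u0_r": -2.0,     # decision pools: baseline rate and potential
    "w_vis": 2.0, "w_supp": 3.0,
    "w_exc": 8.0, "w_inh": 4.0, "w_coop": 6.0, "w_comp": 6.0,
}

def rates(u, u0, nu):
    """Equation 1, assuming the prefactor nu/2 and exponents +-(u + u0)/2."""
    du = u + u0
    return 0.5 * nu * math.exp(0.5 * du), 0.5 * nu * math.exp(-0.5 * du)

def simulate_rivalry(I, I_prime, t_max, params=None, seed=0):
    """Return a list of (t, e, e', r, r') tuples, one per simulated event."""
    p = dict(DEFAULTS, **(params or {}))
    N = p["N"]
    rng = random.Random(seed)
    n = [0, 0, 0, 0]                        # active counts of E, E', R, R'
    t = 0.0
    trace = [(t, 0.0, 0.0, 0.0, 0.0)]
    while True:
        e, e2, r, r2 = (k / N for k in n)
        # Equation 3a: visual drive minus suppression by the matching decision pool.
        u_e = p["w_vis"] * I - p["w_supp"] * r
        u_e2 = p["w_vis"] * I_prime - p["w_supp"] * r2
        # Equation 4: selective excitation, shared inhibition, cooperation, competition.
        inh = p["w_inh"] * (e + e2)
        u_r = p["w_exc"] * e - inh + p["w_coop"] * r - p["w_comp"] * r2
        u_r2 = p["w_exc"] * e2 - inh + p["w_coop"] * r2 - p["w_comp"] * r
        pool_rates = [rates(u_e, p["u0_e"], p["nu_e"]), rates(u_e2, p["u0_e"], p["nu_e"]),
                      rates(u_r, p["u0_r"], p["nu_r"]), rates(u_r2, p["u0_r"], p["nu_r"])]
        # One activation and one inactivation channel per pool.
        events = []
        for k, (up, down) in enumerate(pool_rates):
            events.append((k, +1, (N - n[k]) * up))
            events.append((k, -1, n[k] * down))
        total = sum(rate for _, _, rate in events)
        t += rng.expovariate(total)         # waiting time to the next transition
        if t > t_max:
            break
        pick = rng.random() * total         # choose a channel in proportion to its rate
        for k, step, rate in events:
            if pick < rate:
                n[k] += step
                break
            pick -= rate
        trace.append((t,) + tuple(k / N for k in n))
    return trace
```

A call such as `simulate_rivalry(I=0.8, I_prime=0.8, t_max=60.0)` returns the event-by-event fractional activities, from which the sign of r − r′ can be read off as the currently dominant percept.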

Joint dynamics of evidence habituation and recovery.

Exponential development of evidence activities is governed by input-dependent asymptotic values and characteristic times. (a) Fractional activities $e$ (blue traces) and $e'$ (red traces) of evidence pools $E$ and $E'$, respectively, over several dominance periods for unequal stimulus contrast ($c = 7/8$, $c' = 1/8$). Stochastic reversals of finite system ($N = 25$ units per pool, left) and deterministic reversals of infinite system ($N \to \infty$, right). Perceptual dominance (decision activity) is indicated along the upper margin (red or blue stripe). Dominant evidence habituates (dom), and non-dominant evidence recovers (sup), until evidence contradicts perception sufficiently (black vertical lines) to trigger a reversal (gray and white regions). (b) Development of stronger-input evidence $e$ (blue) and weaker-input evidence $e'$ (red) over two successive dominance periods ($c = 15/16$, $c' = 1/16$). Activities recover, or habituate, exponentially until reversal threshold $\Delta_{rev}$ is reached. Thin curves extrapolate to the respective asymptotic values, $e_\infty$ and $e'_\infty$. Dominance durations depend on distance $\Delta_\infty$ and on characteristic times $\tau_e$ and $\tau_{e'}$. Left: incrementing non-dominant evidence $e$ (dashed curve) raises upper asymptotic value $e_\infty$ and shortens dominance $T$ by $\Delta T$. Right: incrementing dominant evidence $e$ (dashed curve) raises lower asymptotic value $e_\infty$ and lengthens dominance $T$ by $\Delta T$. (c) Increasing asymptotic activity difference $\Delta_\infty$ accelerates the development of differential activity and curtails dominance periods $T$, $T'$ (and vice versa). As the dependence is hyperbolic, any change to $\Delta_\infty$ disproportionately affects longer dominance periods. If $T > T'$, then $\Delta T > \Delta T'$ (and vice versa).

Binocular rivalry

To compare predictions of the model described above to experimental observations, we measured spontaneous reversals of BR for different combinations of image contrast. BR is a particularly well-studied instance of multistable perception (Wheatstone, 1838; Diaz-Caneja, 1928; Levelt, 1965; Leopold and Logothetis, 1999; Brascamp et al., 2015). When conflicting images are presented to each eye (e.g., by means of a mirror stereoscope or of colored glasses, see Materials and methods), the phenomenal appearance reverses from time to time between the two images (Figure 1a). Importantly, the perceptual conflict involves also representations of coherent (binocular) patterns and is not restricted to eye-specific (monocular) representations (Logothetis et al., 1996; Kovács et al., 1996; Bonneh et al., 2001; Blake and Logothetis, 2002).

Specifically, our experimental observations established reversal sequences for 5×5 combinations of image contrast, $c_{dom}, c_{sup} \in \{1/16, 1/8, 1/4, 1/2, 1\}$. During any given dominance period, $c_{dom}$ is the contrast of the phenomenally dominant image and $c_{sup}$ the contrast of the other, phenomenally suppressed image (see Materials and methods). We analyzed these observations in terms of mean dominance durations $T$, higher moments $c_V$ and $\gamma_1/c_V$ of the distribution of dominance durations, and sequential correlation $cc_1$ of successive dominance durations.

Additional aspects of serial dependence are discussed further below.

As described in Materials and methods, we fitted 11 model parameters to reproduce observations with more than 50 degrees of freedom: 5×5 mean dominance durations $T$, 5×5 coefficients of variation $c_V$, one value of skewness $\gamma_1/c_V = 2$, and one correlation coefficient $cc_1 = 0.06$. The latter two values were obtained by averaging over 5×5 contrast combinations and rounding. Importantly, minimization of the fit error, by random sampling of parameter space with a stochastic gradient descent, resulted in a three-dimensional manifold of suboptimal solutions. This revealed a high degree of redundancy among the 11 model parameters (see Materials and methods). Accordingly, we estimate that the effective number of degrees of freedom needed to reproduce the desired out-of-equilibrium dynamics was between 3 and 4. Model predictions and experimental observations are juxtaposed in Figures 3 and 4.

Dependence of mean dominance duration on dominant and suppressed image contrast (‘Levelt’s propositions’).

(a) Mean dominance duration $T$ (color scale), as a function of dominant contrast $c_{dom}$ and suppressed contrast $c_{sup}$, in model (left) and experiment (right). (b) Model prediction (solid traces) and experimental observation (dashed traces and symbols) compared. Levelt I and II: weak increase of $T$ with $c_{dom}$ when $c_{sup} = 1$ (red traces and symbols), and strong decrease with $c_{sup}$ when $c_{dom} = 1$ (brown traces and symbols). Levelt III: symmetric increase of $T$ with $c_{dom}$ (orange traces and symbols) and decrease with $c_{sup}$ (brown traces and symbols), when $c_{dom} + c_{sup} = 1$. Alternation rate (green traces and symbols) peaks at equidominance and decreases symmetrically to either side. (c) Levelt IV: decrease of $T$ with image contrast, when $c_{sup} = c_{dom}$. (d) Predicted dependence of sequential correlation $cc_1$ (color scale) on $c_{dom}$ and $c_{sup}$. (e) Model prediction (black trace, $N = 25$) and experimental observation (blue trace and symbols, mean ± SEM, Spearman’s rank correlation $\rho$), when $c_{sup} = c_{dom}$. Also shown is a second model prediction (red trace, $N = 40$).

Shape of dominance distribution depends only weakly on image contrast (‘scaling property’).

Distribution shape is parametrized by coefficient of variation $c_V$ and relative skewness $\gamma_1/c_V$. (a) Coefficient of variation $c_V$ (color scale), as a function of dominant contrast $c_{dom}$ and suppressed contrast $c_{sup}$, in model (left) and experiment (right). (b) Model prediction (solid traces) and experimental observation (dashed traces and symbols) compared. Left: increase of $c_V$ with $c_{dom}$ (red traces and symbols), and symmetric decrease with $c_{sup}$ (brown traces and symbols), when $c_{sup} = 1$. Right: weak dependence when $c_{dom} = c_{sup}$ (black traces and symbols). (c) Predicted dependence of relative skewness $\gamma_1/c_V$ (gray scale) on $c_{dom}$ and $c_{sup}$. (d) Model prediction (solid traces), when $c_{dom} = c_{sup}$ (black) and $c_{dom} = 1 - c_{sup}$ (orange and brown) and experimental observation when $c_{dom} = c_{sup}$ (blue dashed trace and symbols, mean ± SEM).

The complex and asymmetric dependence of mean dominance durations on image contrast, aptly summarized by Levelt’s ‘propositions’ I to IV (Levelt, 1965; Brascamp et al., 2015), is fully reproduced by the model (Figure 3). Here, we use the updated definition of Brascamp et al., 2015: increasing the contrast of one image increases the fraction of time during which this image dominates appearance (‘predominance,’ Levelt I). Counterintuitively, this is due more to shortening dominance of the unchanged image than to lengthening dominance of the changed image (Levelt II, Figure 3b, left panel). Mean dominance durations grow (and alternation rates decline) symmetrically around equal predominance as contrast difference $c_{dom} - c_{sup}$ increases (Levelt III, Figure 3b, right panel). Mean dominance durations shorten when both image contrasts $c_{dom} = c_{sup}$ increase (Levelt IV, Figure 3c).

Successive dominance durations are typically correlated positively (Fox and Herrmann, 1967; Walker, 1975; Pastukhov et al., 2013c). Averaging over all contrast combinations, observed and fitted correlation coefficients were comparable with $cc_1 = 0.06 \pm 0.06$ (mean and standard deviation). Unexpectedly, both observed and fitted correlation coefficients increased systematically with image contrast ($\rho = 0.9$, $p < .01$), growing from $cc_1 = 0.02 \pm 0.05$ at $c_{dom} = c_{sup} = 1/16$ to $0.21 \pm 0.06$ at $c_{dom} = c_{sup} = 1$ (Figure 3e, blue symbols). It is important to note that this dependence was not fitted. Rather, this previously unreported dependence constitutes a model prediction that is confirmed by observation.

The distribution of dominance durations typically takes a characteristic shape (Cao et al., 2016; Fox and Herrmann, 1967; Blake et al., 1971; Borsellino et al., 1972; Walker, 1975; De Marco et al., 1977; Murata et al., 2003; Brascamp et al., 2005; Pastukhov and Braun, 2007; Denham et al., 2018), approximating a gamma distribution with shape parameter $r \approx 3$–$4$, or coefficient of variation $c_V = 1/\sqrt{r} \approx 0.5$–$0.6$. The fitted model fully reproduces this ‘scaling property’ (Figure 4). The observed coefficient of variation remained in the range $c_V \approx 0.5$–$0.6$ for nearly all contrast combinations (Figure 4b). Unexpectedly, both observed and fitted values increased above, or decreased below, this range at extreme contrast combinations (Figure 4b, left panel). Along the main diagonal $c_{dom} = c_{sup}$, where observed values had smaller error bars, both observed and fitted values of skewness were $\gamma_1/c_V \approx 2$ and thus approximated a gamma distribution (Figure 4d, blue symbols).
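The correspondence between shape parameter, coefficient of variation, and relative skewness follows from the standard moments of a gamma distribution. In the short derivation below, the scale parameter $\theta$ is notation introduced here for illustration; it cancels from both shape statistics.

```latex
% Gamma distribution with shape r and scale \theta (standard results)
\begin{align}
  \langle T \rangle &= r\,\theta, &
  \mathrm{Var}(T) &= r\,\theta^{2}, \\
  c_V &= \frac{\sqrt{\mathrm{Var}(T)}}{\langle T \rangle} = \frac{1}{\sqrt{r}}, &
  \gamma_1 &= \frac{2}{\sqrt{r}}, \\
  \frac{\gamma_1}{c_V} &= 2 \quad \text{for any } r, &
  r = 3 &\Rightarrow c_V \approx 0.58, \qquad r = 4 \Rightarrow c_V = 0.5 .
\end{align}
```

A relative skewness close to 2 is thus a signature of gamma-like shape that does not depend on the particular value of the shape parameter.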

Specific contribution of evidence and decision levels

What are the reasons for the surprising success of the model in reproducing universal characteristics of multistable phenomena, including the counterintuitive input dependence (‘Levelt’s propositions’), the stereotypical distribution shape (‘scaling property’), and the positive sequential correlation (as detailed in Figures 3 and 4)? Which level of model dynamics is responsible for reproducing different aspects of BR dynamics?

Below, we describe the specific contributions of different model components. Specifically, we show that the evidence level of the model reproduces ‘Levelt’s propositions I–III’ and the ‘scaling property,’ whereas the decision level reproduces ‘Levelt’s proposition IV.’ A non-trivial interaction between evidence and decision levels reproduces serial dependencies. Additionally, we show that this interaction predicts further aspects of serial dependencies – such as sensitivity to image contrast – that were not reported previously, but are confirmed by our experimental observations.

Levelt’s propositions I, II, and III

The characteristic input dependence of average dominance durations emerges in two steps (as in Gigante et al., 2009). First, inputs and feedback suppression shape the birth-death dynamics of evidence pools E and E′ (by setting disparate transition rates ν±, following Equation 3’ and Equation 1). Second, this sets in motion two opponent developments (habituation of dominant evidence activity and recovery of non-dominant evidence activity, both following Equation 2) that jointly determine dominance duration.

To elucidate this mechanism, it is helpful to consider the limit of large pools (N → ∞) and its deterministic dynamics (Figure 2), which corresponds to the average stochastic dynamics. In this limit, periods of dominant evidence E or E′ start and end at the same levels (estart = e′start and eend = e′end), because reversal thresholds Δrev are the same for evidence differences e − e′ and e′ − e (see section Levelt IV below).

The rates at which evidence habituates or recovers depend, in the first instance, on asymptotic levels e∞ and e′∞ (Equations 1 and 2, Figure 2b and Appendix 1—figure 4). In general, dominance durations depend on distance Δ between asymptotic levels: the further apart these are, the faster the development and the shorter the duration. As feedback suppression inverts the sign of the opponent developments, dominant evidence decreases (habituates) while non-dominant evidence increases (recovers). Due to this inversion, Δ is roughly proportional to enondom − edom + wsupp. It follows that the distance Δ is smaller and the reversal dynamics slower when dominant input is stronger, and vice versa. It further follows that incrementing one input (and raising the corresponding asymptotic level) speeds up recovery or slows down habituation, shortening or lengthening periods of non-dominance and dominance, respectively (Levelt I).

In the second instance, rates of habituation or recovery depend on characteristic times τe and τe′ (Equations 1 and 2). When these rates are unequal, dominance durations depend more sensitively on the slower process. This is why dominance durations depend more sensitively on non-dominant input (Levelt II): recovery of non-dominant evidence is generally slower than habituation of dominant evidence, independently of which input is weaker or stronger. The reason is that the respective effects of characteristic times τe and τe′ and asymptotic levels e∞ and e′∞ are synergistic for weaker-input evidence (in both directions), whereas they are antagonistic for stronger-input evidence (see Appendix 1, section Deterministic dynamics: Evidence pools and Appendix 1—figure 4).

In general, dominance durations depend hyperbolically on Δ (Figure 2c and Equation 7 in Appendix 1). Dominance durations become infinite (and reversals cease) when Δ falls below the reversal threshold Δrev. This hyperbolic dependence is also why alternation rate peaks at equidominance (Levelt III): increasing the difference between inputs always lengthens longer durations more than it shortens shorter durations, thus lowering alternation rate.
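The argument for Levelt III can be made concrete with a toy calculation. The Python sketch below assumes a purely hypothetical hyperbolic form T = k / (Δ − Δrev), with Δ = wsupp ∓ input bias for the two percepts. It is not Equation 7 of Appendix 1, and all parameter values (k, delta_rev, w_supp) are illustrative assumptions.

```python
def dominance_duration(delta, delta_rev=0.1, k=1.0):
    """Toy hyperbolic duration (assumed form); infinite if delta does not exceed delta_rev."""
    if delta <= delta_rev:
        return float("inf")
    return k / (delta - delta_rev)

def alternation_rate(input_bias, w_supp=0.5):
    """Reversals per unit time, given the input difference between the two percepts."""
    t_strong = dominance_duration(w_supp - input_bias)  # percept with stronger input
    t_weak = dominance_duration(w_supp + input_bias)    # percept with weaker input
    return 2.0 / (t_strong + t_weak)

for bias in (0.0, 0.1, 0.2, 0.3):
    t_strong = dominance_duration(0.5 - bias)
    t_weak = dominance_duration(0.5 + bias)
    print(f"bias = {bias:.1f}: T_strong = {t_strong:.2f}, T_weak = {t_weak:.2f}, "
          f"rate = {alternation_rate(bias):.3f}")
```

Under these assumptions, raising the input bias lengthens the longer duration by more than it shortens the shorter one, so the alternation rate is highest at zero bias (equidominance).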

Distribution of dominance durations

For all combinations of image contrast, the mechanism accurately predicts the experimentally observed distributions of dominance durations. This is owed to the stochastic activity of pools of bistable variables.

Firstly, dominance distributions retain nearly the same shape, even though average durations vary more than threefold with image contrast (see also Appendix 1—figure 6a,b). This ‘scaling property’ is due to the Poisson-like variability of birth-death processes (see Appendix 1, section Stochastic dynamics). Generally, when a stochastic accumulation approaches threshold, the rates of both accumulation and dispersion of activity affect the distribution of first-passage-times (Cao et al., 2014; Cao et al., 2016). In the special case of Poisson-like variability, the two rates vary proportionally and preserve distribution shape (see also Appendix 1—figure 6c,d).

Secondly, predicted distributions approximate gamma distributions with shape parameter r ≃ 3–4. As shown previously (Cao et al., 2014; Cao et al., 2016), this is due to birth-death processes accumulating activity within a narrow range (i.e., evidence difference Δe ≃ 0.2). In this low-threshold regime, the first-passage-times of birth-death processes are both highly variable and gamma distributed, consistent with experimental observations.

Thirdly, the predicted variability (coefficients of variation) of dominance periods varies along the c + c′ = 1 axis, being larger for longer than for shorter dominance durations (Figure 4a,b). The reason is that stochastic development becomes noise-dominated. For longer durations, stronger-input evidence habituates rapidly into a regime where random fluctuations gain importance (see also Appendix 1—figure 4a,b).
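The first-passage-time statistics of a birth-death process, invoked in the three points above, can be explored with a small Gillespie-style simulation. The Python sketch below simulates a single pool of bistable units rather than the differential activity of two pools, and all values (pool size 25, start and threshold counts, transition rates) are illustrative assumptions, not fitted model parameters.

```python
import random
import statistics

def first_passage_time(n_units, n_start, n_threshold, rate_on, rate_off, rng):
    """Time until the number of active units first reaches n_threshold."""
    n_active, t = n_start, 0.0
    while n_active < n_threshold:
        up = rate_on * (n_units - n_active)   # total activation rate
        down = rate_off * n_active            # total deactivation rate
        t += rng.expovariate(up + down)       # waiting time to the next transition
        if rng.random() * (up + down) < up:
            n_active += 1                     # one inactive unit activates
        else:
            n_active -= 1                     # one active unit deactivates
    return t

rng = random.Random(0)
# Threshold lies 5 units above the start, i.e. 0.2 of a pool of 25 units (assumed values).
times = [first_passage_time(25, 5, 10, 1.0, 0.5, rng) for _ in range(2000)]
mean_t = statistics.mean(times)
cv = statistics.stdev(times) / mean_t
print(f"mean first-passage time = {mean_t:.3f}, cV = {cv:.3f}")
```

Varying rate_on and rate_off in this sketch is one way to inspect how the mean and the coefficient of variation of first-passage times change with the transition rates.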

Levelt’s proposition IV

The model accurately predicts how dominance durations shorten with higher image contrast c = c′ (Levelt IV). Surprisingly, this reflects the dynamics of decision pools R and R′ (Figure 5).

Competitive dynamics of decision pools ensures Levelt IV.

(a) The joint stable state of decision pools (here r ≃ 1 and r′ ≃ 0) can be destabilized by sufficiently contradictory evidence, e′ > e. (b) Effective potential U(e, e′, r, r′) (colored curves) and steady states r (colored dots) for different levels of contradictory input, Δe = e′ − e. Increasing Δe destabilizes the steady state and shifts r rightward (curved arrow). The critical value rcrit (dotted vertical line), at which the steady state turns unstable, is reached when Δe reaches the reversal threshold Δrev. At this point, a reversal ensues with r′ ≃ 1 and r ≃ 0. (c) The reversal threshold Δrev diminishes with combined evidence e + e′. In the deterministic limit, Δrev decreases linearly with ē = (e + e′)/2 (dashed red line). In the stochastic system, the average evidence bias Δe at the time of reversals decreases similarly with the average evidence mean ē (black dots). Actual values of Δe at the time of reversals are distributed around these average values (gray shading). (d) Average evidence mean ē (left) and average evidence bias Δe (middle) at the time of reversals as a function of image contrast c and c′. Decrease of average evidence bias Δe with contrast shortens dominance durations (Levelt IV). At low contrast (blue dot), higher reversal thresholds Δrev result in less frequent reversals (bottom right, gray and white regions) whereas, at high contrast (red dot), lower reversal thresholds lead to more frequent reversals (top right).

Here again it is helpful to consider the deterministic limit of large pools (N → ∞). In this limit, a dominant decision state r ≃ 1 is destabilized when a contradictory evidence difference Δe = e′ − e exceeds a certain threshold value Δrev (Figure 5b and Appendix 1, section Deterministic dynamics: Decision pools). Due to the combined effect of excitatory and inhibitory feedforward projections, wexc and winh (Equation 4 and Figure 5a), this average reversal threshold decreases with mean evidence activity ē = (e + e′)/2. Simulations of the fully stochastic model (N = 25) confirm this analysis (Figure 5c). As average evidence activity ē increases with image contrast, the average evidence bias Δe at the time of reversals decreases, resulting in shorter dominance periods (Figure 5d).

Serial dependence

The proposed mechanism predicts positive correlations between successive dominance durations, a well-known characteristic of multistable phenomena (Fox and Herrmann, 1967; Walker, 1975; Van Ee, 2005; Denham et al., 2018). In addition, it predicts further aspects of serial dependence not reported previously.

In both model and experimental observations, a long dominance period tends to be followed by another long period, and a short dominance period by another short period (Figure 6). In the model, this is due to mean evidence activity ē = (e + e′)/2 fluctuating stochastically above and below its long-term average. The autocorrelation time of these fluctuations increases monotonically with image contrast and, for high contrast, spans multiple dominance periods (see Appendix 1, section Characteristic times and Appendix 1—figure 7). Note that fluctuations of ē diminish as the number of bistable variables increases and vanish in the deterministic limit N → ∞.

Serial dependency predicted by model and confirmed by experimental observations.

(a) Conditional expectation of dominance duration T±n (top) and of average mean evidence activity, ē±n (bottom), in model simulations with maximal stimulus contrast (c = c′ = 1). Dominance periods T0 were grouped into octiles, from longest (yellow) to shortest (black). For each octile, the average duration T±n of preceding and following dominance periods, as well as the average mean evidence activity ē±n at the end of each period, is shown. All times in multiples of the overall average duration, T, and activities in multiples of the overall average activity ē. (b) Example reversal sequence from model. Bottom: stochastic development of evidence activities e and e′ (red and blue traces), with large, joint fluctuations raising or lowering mean activity ē = (e + e′)/2 above or below long-term average (dashed line). Top left: episode with ē above average, lower Δrev, and shorter dominance periods. Top right: episode with ē below average, higher Δrev, and longer dominance durations. (c) Examples of reversal sequences from human observers (c = c′ = 1 and c = c′ = 1/2). (d) Positive lagged correlations predicted by model (mean, middle) and confirmed by experimental observations (mean ± std, top). Alternative model (Laing and Chow, 2002) with adaptation and noise (mean, bottom), fitted to reproduce the values of T, cV, γ1, and cc1 predicted by the present model (blue stars).

Crucially, fluctuations of mean evidence ē modulate both reversal threshold Δrev and dominance durations T, as illustrated in Figure 6a,b. To obtain Figure 6a, dominance durations were grouped into quantiles and the average duration T0 of each quantile was compared to the conditional expectation of preceding and following durations T±n (upper graph). For the same quantiles (compare color coding), average evidence activity ē0 was compared to the conditional expectation ē±n at the end of preceding and following periods (lower graph). Both the inverse relation between T±n and ē±n and the autocorrelation over multiple dominance periods are evident.
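The quantile-based analysis just described can be sketched in a few lines of Python. The function below is an illustrative reconstruction under simple assumptions (equal-sized groups, leftover periods dropped), not the analysis code used for Figure 6a, and the toy sequence consists of made-up values.

```python
def conditional_means(durations, lag, n_groups=8):
    """Average of T[i + lag], with periods i grouped by quantile of T[i] (shortest first)."""
    first = max(0, -lag)
    last = len(durations) - max(0, lag)
    order = sorted(range(first, last), key=lambda i: durations[i])
    size = len(order) // n_groups  # leftover periods (the longest few) are dropped
    means = []
    for g in range(n_groups):
        group = order[g * size:(g + 1) * size]
        means.append(sum(durations[i + lag] for i in group) / len(group))
    return means

# Toy sequence with an episode structure (hypothetical durations, in seconds).
toy = [1.0, 1.2, 0.9, 1.1, 3.0, 2.8, 3.2, 2.9,
       1.0, 1.1, 0.8, 1.2, 3.1, 2.7, 3.3, 3.0]
print(conditional_means(toy, lag=0, n_groups=2))  # group averages of T0
print(conditional_means(toy, lag=1, n_groups=2))  # averages of the following period
```

Dividing the returned values by the overall mean duration expresses them in multiples of the average, as in the figure.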

This source of serial dependency – comparatively slow fluctuations of ē and Δrev – predicts several qualitative characteristics not reported previously and now confirmed by experimental observations. First, sequential correlations are predicted (and observed) to be strictly positive at all lags (next period, one-after-next period, and so on) (Figure 6d). In other words, it predicts that several successive dominance periods are shorter (or longer) than average.

Second, due to the contrast dependence of autocorrelation time, sequential correlations are predicted (and observed) to increase with image contrast (Figure 6d). The experimentally observed degree of contrast dependence is broadly consistent with pool sizes between N=25 and N=40 (black and red curves in Figure 3e). Larger pools with hundreds of bistable variables do not express the observed dependence on contrast (not shown).

Third, for high image contrast, reversal sequences are predicted (and observed) to contain extended episodes with dominance periods that are short or extended episodes with periods that are long (Figure 6c). When quantified in terms of a ‘burstiness index,’ the degree of inhomogeneity in predicted and observed reversal sequences is comparable (see Appendix 1, section Burstiness and Appendix 1—figure 8).

Many previous models of BR (e.g., Laing and Chow, 2002) postulated selective adaptation of competing representations to account for serial dependency. However, selective adaptation is an opponent process that favors positive correlations between different dominance periods, but negative correlations between same dominance periods. To demonstrate this point, we fitted such a model to reproduce our experimental observations (T, cV, γ1, and cc1) for five image contrasts c = c′. As expected, the alternative model predicts negative correlations cc2 for same dominance periods (Figure 6d, right panel), contrary to what is observed.
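Lagged correlations of the kind compared here are straightforward to compute from a sequence of dominance durations. The Python sketch below uses a plain Pearson correlation and a synthetic sequence in which gamma-distributed durations are scaled by a slowly fluctuating factor. This factor is a hypothetical stand-in for slow fluctuations of mean evidence activity, not a simulation of the model, and its parameters are assumptions.

```python
import math
import random
import statistics

def lagged_correlation(durations, lag):
    """Pearson correlation between durations T[i] and T[i + lag] (lag >= 1)."""
    x, y = durations[:-lag], durations[lag:]
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(1)
slow, durations = 0.0, []
for _ in range(5000):
    # Slowly varying log-scale factor (first-order autoregressive process, assumed).
    slow = 0.9 * slow + 0.3 * rng.gauss(0.0, 1.0)
    durations.append(rng.gammavariate(4.0, 0.25) * math.exp(-slow))

for lag in (1, 2, 3):
    print(f"lag {lag}: correlation = {lagged_correlation(durations, lag):.3f}")
```

In a rivalry sequence, odd lags compare periods of different percepts and even lags compare periods of the same percept, so the sign of the lag-2 correlation is what distinguishes the two accounts discussed above.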

Discussion

We have shown that many well-known features of BR are reproduced, and indeed guaranteed, by a particular dynamical mechanism. Specifically, this mechanism reproduces the counterintuitive input dependence of dominance durations (‘Levelt’s propositions’), the stereotypical shape of dominance distributions (‘scaling property’), and the positive sequential correlation of dominance periods. The explanatory power of the proposed mechanism is considerably higher than that of previous models. Indeed, the observations explained exhibited more effective degrees of freedom (approximately 14) than the mechanism itself (between 3 and 4).

The proposed mechanism is biophysically plausible in terms of the out-of-equilibrium dynamics of a modular and hierarchical network of spiking neurons (see also further below). Individual modules idealize the input dependence of attractor transitions in assemblies of spiking neurons. All synaptic effects superimpose linearly, consistent with extended mean-field theory for neuronal networks (Amit and Brunel, 1997; Van Vreeswijk and Sompolinsky, 1996). The interaction between ‘rivaling’ sets of modules (‘pools’) results in divisive normalization, which is consistent with many cortical models (Carandini and Heeger, 2011; Miller, 2016).

It has long been suspected that multistable phenomena in visual, auditory, and tactile perception may share a similar mechanistic origin. As the features of BR explained here are in fact universal features of multistable phenomena in different modalities, we hypothesize that similar out-of-equilibrium dynamics of modular networks may underlie all multistable phenomena in all sensory modalities. In other words, we hypothesize that this may be a general mechanism operating in many perceptual representations.

Dynamical mechanism

Two principal alternatives have been considered for the dynamical mechanism of perceptual decision-making: drift-diffusion models (Luce, 1986; Ratcliff and Smith, 2004) and recurrent network models (Wang, 2008; Wang, 2012). The mechanism proposed here combines both alternatives: at its evidence level, sensory information is integrated, over both space and time, by ‘local attractors’ in a discrete version of a drift-diffusion process. At its decision level, the population dynamics of a recurrent network implements a winner-take-all competition between ‘non-local attractors.’ Together, the two levels form a ‘nested attractor’ system (Braun and Mattia, 2010) operating perpetually out of equilibrium.

A recurrent network with strong competition typically ‘normalizes’ individual responses relative to the total response (Miller, 2016). Divisive normalization is considered a canonical cortical computation (Carandini and Heeger, 2011), for which multiple rationales can be found. Here, divisive normalization is augmented by indiscriminate feedforward inhibition. This combination ensures that decision activity rapidly and reliably categorizes differential input strength, largely independently of total input strength.

Another key feature of the proposed mechanism is that a ‘dominant’ decision pool applies feedback suppression to the associated evidence pool. Selective suppression of evidence for a winning hypothesis features in computational theories of ‘hierarchical inference’ (Rao and Ballard, 1999; Lee and Mumford, 2003; Parr and Friston, 2017b; Pezzulo et al., 2018), as well as in accounts of multistable perception inspired by such theories (Dayan, 1998; Hohwy et al., 2008; Weilnhammer et al., 2017). A normative reason for feedback suppression arises during continuous inference in uncertain and volatile environments, where the accumulation of sensory information is ongoing and cannot be restricted to appropriate intervals (Veliz-Cuba et al., 2016). Here, optimal change detection requires an exponentially rising bias against evidence for the most likely state, ensuring that even weak changes are detected, albeit with some delay.

The pivotal feature of the proposed mechanism is its pools of bistable variables or ‘local attractors.’ Encoding sensory inputs in terms of persistent ‘activations’ of local attractor assemblies (rather than in terms of transient neuronal spikes) creates an intrinsically retentive representation: sites that respond are also sites that retain information (for a limited time). Our results are consistent with a few tens of bistable variables in each pool. In the proposed mechanism, differential activity of two pools accumulates evidence against the dominant appearance until a threshold is reached and a reversal ensues (see also Barniv and Nelken, 2015; Nguyen et al., 2020). Conceivably, this discrete non-equilibrium dynamics might instantiate a variational principle of inference such as ‘maximum caliber’ (Pressé et al., 2013; Dixit et al., 2018).

Emergent features

The components of the proposed mechanism interact to guarantee the statistical features that characterize BR and other multistable phenomena. Discretely stochastic accumulation of differential evidence against the dominant appearance ensures sensitivity of dominance durations to non-dominant input. It also ensures the invariance of relative variability (‘scaling property’) and gamma-like distribution shape of dominance durations. Due to a non-trivial interaction with the competitive decision, discretely stochastic fluctuations of evidence-level activity express themselves in a serial dependency of dominance durations. Several features of this dependency were unexpected and not reported previously, for example, the sensitivity to image contrast and the ‘burstiness’ of dominance reversals (i.e., extended episodes in which dominance periods are consistently longer or shorter than average). The fact that these predictions are confirmed by our experimental observations provides further support for the proposed mechanism.

Relation to previous models

How does the proposed mechanism compare to previous ‘dynamical’ models of multistable phenomena? It is similar in complexity to previous minimal models (Laing and Chow, 2002; Wilson, 2007; Moreno-Bote et al., 2010) in that it assumes four state variables at two dynamical levels, one slow (accumulation) and one fast (winner-take-all competition). It differs in reversing their ordering: visual input impinges first on the slow level, which then drives the fast level. It also differs in that stochasticity dominates the slow dynamics (as suggested by van Ee, 2009), not the fast dynamics. However, the most fundamental difference is discreteness (pools of bistable variables), which shapes all key dynamical properties.

Unlike many previous models (e.g., Laing and Chow, 2002; Wilson, 2007; Moreno-Bote et al., 2007; Moreno-Bote et al., 2010; Cohen et al., 2019), the proposed mechanism does not include adaptation (stimulation-driven weakening of evidence), but a phenomenologically similar feedback suppression (perception-driven weakening of evidence). Evidence from perceptual aftereffects supports the existence of both stimulation- and perception-driven adaptation, albeit at different levels of representation. Aftereffects in the perception of simple visual features – such as orientation, spatial frequency, or direction of motion (Blake and Fox, 1974; Lehmkuhle and Fox, 1975; Wade and Wenderoth, 1978) – are driven by stimulation rather than by perceived dominance, whereas aftereffects in complex features – such as spiral motion, subjective contours, rotation in depth (Wiesenfelder and Blake, 1990; Van der Zwan and Wenderoth, 1994; Pastukhov et al., 2014a) – typically depend on perceived dominance. Several experimental observations related to BR have been attributed to stimulation-driven adaptation (e.g., negative priming, flash suppression, generalized flash suppression; Tsuchiya et al., 2006). The extent to which a perception-driven adaptation could also explain these observations remains an open question for future work.

Multistable perception induces a positive priming or ‘sensory memory’ (Pearson and Clifford, 2005; Pastukhov and Braun, 2008; Pastukhov et al., 2013a), which can stabilize a dominant appearance during intermittent presentation (Leopold et al., 2003; Maier et al., 2003; Sandberg et al., 2014). This positive priming exhibits rather different characteristics (e.g., shape-, size- and motion-specificity, inducement period, persistence period) than the negative priming/adaptation of rivaling representations (de Jong et al., 2012; Pastukhov et al., 2013a; Pastukhov and Braun, 2013b; Pastukhov et al., 2014a; Pastukhov et al., 2014b; Pastukhov, 2016). To our mind, this evidence suggests that sensory memory is mediated by additional levels of representation and not by self-stabilization of rivaling representations, as has been suggested (Noest et al., 2007; Leptourgos, 2020). To incorporate sensory memory, the present model would have to be extended to include three hierarchical levels (evidence, decision, and memory), as previously proposed by Gigante et al., 2009.

BR arises within local regions of the visual field, measuring approximately 0.25° to 0.5° in the fovea (Leopold, 1997; Logothetis, 1998). No rivalry ensues when the stimulated locations in the left and right eye are more distant from each other. The computational model presented here encompasses only one such local region, and therefore cannot reproduce spatially extended phenomena such as piecemeal rivalry (Blake et al., 1992) or traveling waves (Wilson et al., 2001). To account for these phenomena, the visual field would have to be tiled with replicant models linked by grouping interactions (Knapen et al., 2007; Bressloff and Webber, 2012).

A particularly intriguing previous model (Wilson, 2003) postulated a hierarchy with competing and adapting representations in eight state variables at two separate levels, one lower (monocular) and another higher (binocular) level. This ‘stacked’ architecture could explain the fascinating experimental observation that one image can continue to dominate (dominance durations ≃ 2 s) even when images are rapidly swapped between eyes (period 1/3 s) (Kovács et al., 1996; Logothetis et al., 1996). We expect that our hierarchical model could also account for this phenomenon if it were to be replicated at two successive levels. It is tempting to speculate that such ‘stacking’ might have a normative justification in that it might subserve hierarchical inference (Yuille and Kersten, 2006; Hohwy et al., 2008; Friston, 2010).

Another previous model (Li et al., 2017) used a hierarchy with 24 state variables at three separate levels to show that a stabilizing influence of selective visual attention could also explain slow rivalry when images are swapped rapidly. Additionally, this rather complex model reproduced the main features of Levelt’s propositions, but did not consider scaling property and sequential dependency. The model shared some of the key features of the present model (divisive inhibition, differential excitation-inhibition), but added a multiplicative attentional modulation. As the present model already incorporates the ‘biased competition’ that is widely thought to underlie selective attention (Sabine and Ungerleider, 2000; Reynolds and Heeger, 2009), we expect that it could reproduce attentional effects by means of additive modulations.

Continuous inference

The notion that multistable phenomena such as BR reflect active exploration of explanatory hypotheses for sensory evidence has a venerable history (von Helmholtz, 1867; Barlow et al., 1972; Gregory, 1980; Leopold and Logothetis, 1999). The mechanism proposed here is in keeping with that notion: higher-level ‘explanations’ compete for control (‘dominance’) of phenomenal appearance in terms of their correspondence to lower-level ‘evidence.’ An ‘explanation’ takes control if its correspondence is sufficiently superior to that of rival ‘explanations.’ The greater the superiority, the longer control is retained. Eventually, alternative ‘explanations’ seize control, if only briefly. This manner of operation is also consistent with computational theories of ‘analysis by synthesis’ or ‘hierarchical inference,’ although there are many differences in detail (Rao and Ballard, 1999; Parr and Friston, 2017b; Pezzulo et al., 2018).

Interacting with an uncertain and volatile world necessitates continuous and concurrent evaluation of sensory evidence and selection of motor action (Cisek and Kalaska, 2010; Gold and Stocker, 2017). Multistable phenomena exemplify continuous decision-making without external prompting (Braun and Mattia, 2010). Sensory decision-making has been studied extensively, mostly in episodic choice tasks, and the neural circuits and activity dynamics underlying episodic decision-making – including representations of potential choices, sensory evidence, and behavioral goals – have been traced in detail (Cisek and Kalaska, 2010; Gold and Shadlen, 2007; Wang, 2012; Krug, 2020). Interestingly, there seems to be substantial overlap between choice representations in decision-making and in multistable situations (Braun and Mattia, 2010).

Continuous inference has been studied extensively in auditory streaming paradigms (Winkler et al., 2012; Denham et al., 2014). The auditory system seems to continually update expectations for sound patterns on the basis of recent experience. Compatible patterns are grouped together in auditory awareness, and incompatible patterns result in spontaneous reversals between alternatives. Many aspects of this rich phenomenology are reproduced by computational models driven by some kind of ‘prediction error’ (Mill et al., 2013). The dynamics of two recent auditory models (Barniv and Nelken, 2015; Nguyen et al., 2020) are rather similar to the model presented here: while one sound pattern dominates awareness, evidence against this pattern is accumulated at a subliminal level.

Relation to neural substrate

What might be the neural basis of the bistable variables/‘local attractors’ proposed here? Ongoing activity in sensory cortex appears to be low-dimensional, in the sense that the activity of neurons with similar response properties varies concomitantly (‘shared variability,’ ‘noise correlations,’ Ponce-Alvarez et al., 2012, Mazzucato et al., 2015, Engel et al., 2016, Rich and Wallis, 2016, Mazzucato et al., 2019). This shared variability reflects the spatial clustering of intracortical connectivity (Muir and Douglas, 2011; Okun et al., 2015; Cossell et al., 2015; Lee et al., 2016; Rosenbaum et al., 2017) and unfolds over moderately slow time scales (in the range of 100 ms to 500 ms) both in primates and rodents (Ponce-Alvarez et al., 2012; Mazzucato et al., 2015; Cui et al., 2016; Engel et al., 2016; Rich and Wallis, 2016; Mazzucato et al., 2019).

Possible dynamical origins of shared and moderately slow variability have been studied extensively in theory and simulation (for reviews, see Miller, 2016; Huang and Doiron, 2017; La Camera et al., 2019). Networks with weakly clustered connectivity (e.g., 3% rewiring) can express a metastable attractor dynamics with moderately long time scales (Litwin-Kumar and Doiron, 2012; Doiron and Litwin-Kumar, 2014; Schaub et al., 2015; Rosenbaum et al., 2017). In a metastable dynamics, individual (connectivity-defined) clusters transition spontaneously between distinct and quasi-stationary activity levels (‘attractor states’) (Tsuda, 2001; Stern et al., 2014).

Evidence for metastable attractor dynamics in cortical activity is accumulating steadily (Mattia et al., 2013; Mazzucato et al., 2015; Rich and Wallis, 2016; Engel et al., 2016; Marcos et al., 2019; Mazzucato et al., 2019). Distinct activity states with exponentially distributed durations have been reported in sensory cortex (Mazzucato et al., 2015; Engel et al., 2016), consistent with noise-driven escape transitions (Doiron and Litwin-Kumar, 2014; Huang and Doiron, 2017). And several reports are consistent with external input modulating cortical activity mostly indirectly, via the rate of state transitions (Fiser et al., 2004; Churchland et al., 2010; Mazzucato et al., 2015; Engel et al., 2016; Mazzucato et al., 2019).

The proposed mechanism assumes bistable variables with noise-driven escape transitions, with transition rates modulated exponentially by external synaptic drive. Following previous work (Cao et al., 2016), we show this to be an accurate reduction of the population dynamics of metastable networks of spiking neurons.

Unfortunately, the spatial structure of the ‘shared variability’ or ‘noise correlations’ in cortical activity described above is poorly understood. However, we estimate that the cortical representation of our rivaling display involves approximately 400 mm² and 200 mm² of cortical surface in cortical areas V1 and V4, respectively (Winawer and Witthoft, 2015; Winawer and Benson, 2021). Accordingly, in each of these two cortical areas, the neural representation of rivaling stimulation can comfortably accommodate several thousand recurrent local assemblies, each capable of expressing independent collective dynamics (i.e., ‘classic columns’ comprising several ‘minicolumns’ with distinct stimulus selectivity; Nieuwenhuys, 1994; Kaas, 2012). Thus, our model assumes that the representation of two rivaling images engages approximately 1–2% of the available number of recurrent local assemblies.

Neurophysiological correlates of BR

Neurophysiological correlates of BR have been studied extensively, often by comparing reversals of phenomenal appearance during binocular stimulation with physical alternation (PA) of monocular stimulation (e.g., Leopold and Logothetis, 1996; Scheinberg and Logothetis, 1997; Logothetis, 1998; Wilke et al., 2006; Aura et al., 2008; Keliris et al., 2010; Panagiotaropoulos et al., 2012; Bahmani et al., 2014; Xu et al., 2016; Kapoor et al., 2020; Dwarakanath et al., 2020). At higher cortical levels, such as inferior temporal cortex (Scheinberg and Logothetis, 1997) or prefrontal cortex (Panagiotaropoulos et al., 2012; Kapoor et al., 2020; Dwarakanath et al., 2020), BR and PA elicit broadly comparable neurophysiological responses that mirror perceptual appearance. Specifically, activity crosses its average level at the time of each reversal, roughly in phase with perceptual appearance (Scheinberg and Logothetis, 1997; Kapoor et al., 2020). In primary visual cortex (area V1), where many neurons are dominated by input from one eye, neurophysiological correlates of BR and PA diverge in an interesting way: whereas modulation of spiking activity is weaker during BR than PA (Leopold and Logothetis, 1996; Logothetis, 1998; Wilke et al., 2006; Aura et al., 2008; Keliris et al., 2010), measures thought to record dendritic inputs are modulated comparably under both conditions (Aura et al., 2008; Keliris et al., 2010; Bahmani et al., 2014; Yang et al., 2015; Xu et al., 2016). A stronger divergence is observed at an intermediate cortical level (visual area V4), where neurons respond to both eyes. Whereas some units modulate their spiking activity comparably during BR and PA (i.e., increased activity when preferred stimulus becomes dominant), other units exhibit the opposite modulation during BR (i.e., reduced activity when preferred stimulus gains dominance) (Leopold and Logothetis, 1996; Logothetis, 1998; Wilke et al., 2006). 
Importantly, at this intermediate cortical level, activity crosses its average level well before and after each reversal (Leopold and Logothetis, 1996; Logothetis, 1998), roughly in quarter phase with perceptual appearance.

Some of these neurophysiological observations are directly interpretable in terms of the model proposed here. Specifically, activity modulation at higher cortical levels (inferotemporal cortex, prefrontal cortex) could correspond to ‘decision activity,’ predicted to vary in phase with perceptual appearance. Similarly, activity modulation at intermediate cortical levels (area V4) could correspond to ‘evidence activity,’ which is predicted to vary in quarter phase with perceptual appearance. This identification would also be consistent with the neurophysiological evidence for attractor dynamics in columns of area V4 (Engel et al., 2016). The subpopulation of area V4 with opposite modulation could mediate feedback suppression from decision levels. If so, our model would predict this subpopulation to vary in counterphase with perceptual appearance. Finally, the fascinating interactions observed within primary visual cortex (area V1) are well beyond the scope of our simple model. Presumably, a ‘stacked’ model with two successive levels of competitive interactions at monocular and binocular levels of representation (Wilson, 2003; Li et al., 2017) would be required to account for these phenomena.

Conclusion

As multistable phenomena and their characteristics are ubiquitous in visual, auditory, and tactile perception, the mechanism we propose may form a general part of sensory processing. It bridges neural, perceptual, and normative levels of description and potentially offers a ‘comprehensive task-performing model’ (Kriegeskorte and Douglas, 2018) for sensory decision-making.

Materials and methods

Psychophysics


Six practiced observers participated in the experiment (four males, two females). Informed consent, and consent to publish, was obtained from all observers, and ethical approval Z22/16 was obtained from the Ethics Commission of the Faculty of Medicine of the Otto-von-Guericke University, Magdeburg. Stimuli were displayed on an LCD screen (EIZO ColorEdge CG303W, resolution 2560×1600 pixels, viewing distance 104 cm, single pixel subtending 0.014°, refresh rate 60 Hz) and were viewed through a mirror stereoscope, with viewing position being stabilized by chin and head rests. Display luminance was gamma-corrected and average luminance was 50 cd/m².

Two grayscale circular orthogonally oriented gratings (+45° and −45°) were presented foveally to each eye. Gratings had a diameter of 1.6° and a spatial frequency of 2 cyc/deg. To avoid a sharp outer edge, grating contrast was modulated with a Gaussian envelope (inner radius 0.6°, σ=0.2°). Tilt and phase of gratings were randomized for each block. Five contrast levels were used: 6.25, 12.5, 25, 50, and 100%. Contrast of each grating was systematically manipulated, so that each contrast pair was presented in two blocks (50 blocks in total). Blocks were 120 s long and separated by a compulsory 1 min break. Observers reported on the tilt of the visible grating by continuously pressing one of two arrow keys. They were instructed to press only during exclusive visibility of one of the gratings, so that mixed percepts were indicated by neither key being pressed (25% of total presentation time). To facilitate binocular fusion, gratings were surrounded by a dichoptically presented square frame (outer size 9.8°, inner size 2.8°).

Dominance periods of ‘clear visibility’ were extracted in sequence from the final 90 s of each block and the mean linear trend was subtracted from all values. Values from the initial 30 s were discarded. To make the dominance periods of different observers comparable, values were rescaled by the ratio of the all-condition-all-observer average (2.5 s) and the all-condition average of each observer (2.5±1.3 s). Finally, dominance periods from symmetric conditions $(c_\mathrm{left},c_\mathrm{right})$ with $c_\mathrm{left}=c_\mathrm{right}$ were combined into a single category $(c_\mathrm{dom},c_\mathrm{sup})$, where $c_\mathrm{dom}$ ($c_\mathrm{sup}$) was the contrast viewed by the dominant (suppressed) eye. The number of observed dominance periods ranged from 900 to 1700 per contrast combination (1300±240).

For the dominance periods T observed in each condition, first, second, and third central moments were computed, as well as coefficient of variation cV and skewness γ1 relative to coefficient of variation:

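$$\mu_1=\langle T\rangle,\qquad \mu_2=\langle T^2\rangle-\langle T\rangle^2,\qquad \mu_3=\langle T^3\rangle-3\langle T\rangle\langle T^2\rangle+2\langle T\rangle^3$$

$$c_V=\frac{\sqrt{\mu_2}}{\mu_1},\qquad \frac{\gamma_1}{c_V}=\frac{\mu_3\,\mu_1}{\mu_2^{2}}$$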

The expected standard error of the mean for distribution moments is 2% for the mean, 3% for the coefficient of variation, and 12% for skewness relative to coefficient of variation, assuming 1000 gamma-distributed samples.
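The moment definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: the function name `dominance_moments` and the synthetic gamma-distributed durations are assumptions made for the example. For a gamma distribution the ratio of skewness to coefficient of variation equals 2 regardless of shape, which makes it a convenient sanity check.

```python
import numpy as np

def dominance_moments(T):
    """Mean, variance, third central moment, c_V, and gamma_1 / c_V of durations T."""
    T = np.asarray(T, dtype=float)
    m1 = T.mean()
    m2_raw = (T**2).mean()
    m3_raw = (T**3).mean()
    mu2 = m2_raw - m1**2                              # second central moment
    mu3 = m3_raw - 3.0 * m1 * m2_raw + 2.0 * m1**3    # third central moment
    cv = np.sqrt(mu2) / m1                            # coefficient of variation
    rel_skew = mu3 * m1 / mu2**2                      # skewness relative to c_V
    return m1, mu2, mu3, cv, rel_skew

# Synthetic durations for illustration only (not the experimental data).
rng = np.random.default_rng(0)
durations = rng.gamma(shape=4.0, scale=2.5 / 4.0, size=1000)
mean_T, var_T, mu3_T, cv_T, rel_skew_T = dominance_moments(durations)
print(f"mean={mean_T:.2f} s, c_V={cv_T:.2f}, gamma_1/c_V={rel_skew_T:.2f}")
```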

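Coefficients of sequential correlations were computed from pairs of periods $(T_i,T_j)$ with opposite dominance (first and next: ‘lag’ $j-i=1$), pairs of periods with same dominance (first and next but one: ‘lag’ $j-i=2$), and so on,

$$cc_k=\frac{\big\langle\left(T_i-\langle T_i\rangle\right)\left(T_j-\langle T_j\rangle\right)\big\rangle}{\sqrt{\left(\langle T_i^2\rangle-\langle T_i\rangle^2\right)\left(\langle T_j^2\rangle-\langle T_j\rangle^2\right)}},\qquad k=j-i$$

where $\langle T\rangle$ and $\langle T^2\rangle$ are mean duration and mean square duration, respectively. The expected standard deviation of the coefficient of correlation is 0.03, assuming 1000 gamma-distributed samples.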
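As a hedged sketch of this computation, the snippet below pairs each period with the period `lag` positions later in a single alternating sequence. The function name `sequential_correlation` is hypothetical, and the assumption that durations are stored in order of occurrence (so that odd lags pair opposite percepts) is ours.

```python
import numpy as np

def sequential_correlation(T, lag):
    """Correlation coefficient between durations T_i and T_{i+lag}."""
    T = np.asarray(T, dtype=float)
    a, b = T[:-lag], T[lag:]                       # paired periods (T_i, T_j)
    cov = np.mean((a - a.mean()) * (b - b.mean()))
    return cov / np.sqrt(a.var() * b.var())        # var() is <T^2> - <T>^2

# Toy sequence: strictly alternating short and long periods.
alternating = np.tile([1.0, 2.0], 10)
cc1 = sequential_correlation(alternating, 1)   # opposite dominance
cc2 = sequential_correlation(alternating, 2)   # same dominance
```

In this toy sequence successive periods are perfectly anti-correlated and next-but-one periods perfectly correlated, which brackets the range of values the coefficient can take.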

To analyze ‘burstiness,’ we adapted a statistical measure used in neurophysiology (Compte et al., 2003). First, sequences of dominance periods were divided into all possible subsets of $k\in\{2,3,\ldots,16\}$ successive periods and mean durations computed for each subset. Second, heterogeneity was assessed by computing, for each size $k$, the coefficient of variation $c_V$ over mean durations, compared to the mean and variance of the corresponding coefficient of variation for randomly shuffled sequences of dominance periods. Specifically, a ‘burstiness index’ was defined for each subset size $k$ as

$$\mathrm{BI}(k)=\frac{c_V-\langle c_V\rangle_\mathrm{shuffle}}{\sqrt{\langle c_V^2\rangle_\mathrm{shuffle}-\langle c_V\rangle_\mathrm{shuffle}^2}}$$

where $c_V$ is the coefficient of variation over subsets of size $k$ and where $\langle c_V\rangle_\mathrm{shuffle}$ and $\langle c_V^2\rangle_\mathrm{shuffle}$ are, respectively, mean and mean square of the coefficients of variation from shuffled sequences.
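The burstiness index can be illustrated with the following sketch. It assumes that ‘all possible subsets of k successive periods’ means all overlapping windows of length k; the function names, the number of shuffles, and the toy sequence are illustrative choices, not taken from the original analysis.

```python
import numpy as np

def window_cv(T, k):
    """c_V of the mean durations over all runs of k successive periods."""
    cs = np.concatenate(([0.0], np.cumsum(T)))
    means = (cs[k:] - cs[:-k]) / k          # moving averages of length k
    return means.std() / means.mean()

def burstiness_index(T, k, n_shuffles=500, seed=0):
    """Observed c_V as a z-score relative to randomly shuffled sequences."""
    T = np.asarray(T, dtype=float)
    rng = np.random.default_rng(seed)
    observed = window_cv(T, k)
    shuffled = np.array([window_cv(rng.permutation(T), k)
                         for _ in range(n_shuffles)])
    return (observed - shuffled.mean()) / shuffled.std()

# Toy sequence with an extended episode of short periods followed by long ones.
clustered = np.concatenate((np.ones(100), 5.0 * np.ones(100)))
bi_profile = {k: burstiness_index(clustered, k) for k in range(2, 17)}
```

A sequence in which long and short periods cluster in time yields a large positive index, whereas a sequence of independent durations should yield values near zero.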

Model


The proposed mechanism for BR dynamics relies on discretely stochastic processes (‘birth-death’ or generalized Ehrenfest processes). Bistable variables $x\in\{0,1\}$ transition between active and inactive states with time-varying Poisson rates $\nu_+(t)$ (activation) and $\nu_-(t)$ (inactivation). Two ‘evidence pools’ of $N$ such variables, $E$ and $E'$, represent two kinds of visual evidence (e.g., for two visual orientations), whereas two ‘decision pools,’ $R$ and $R'$, represent alternative perceptual hypotheses (e.g., two grating patterns) (see also Appendix 1—figure 1). Thus, the instantaneous dynamical state is represented by four active counts $n_e,n_{e'},n_r,n_{r'}\in[0,N]$ or, equivalently, by four active fractions $e,e',r,r'\in[0,1]$.

The development of pool activity over time is described by a master equation for the probability $P_n(t)$ of the number $n(t)\in[0,N]$ of active variables.

$$\partial_t P_n(t)=(N-n+1)\,\nu_+\,P_{n-1}(t)+(n+1)\,\nu_-\,P_{n+1}(t)-\left[(N-n)\,\nu_++n\,\nu_-\right]P_n(t)\tag{5}$$

For constant $\nu_\pm$, the distribution $P_n(t)$ is binomial at all times (Karlin and McGregor, 1965; van Kampen, 1981). The time development of the number of active units $n_X(t)$ in pool $X$ is an inhomogeneous Ehrenfest process and corresponds to the count of activations, minus the count of deactivations,

$$\Delta n_X(t)=\underbrace{B(N-n_X,\nu_+\Delta t)}_{\text{activations}}-\underbrace{B(n_X,\nu_-\Delta t)}_{\text{inactivations}}$$

where $B(n,\nu\Delta t)$ is a discrete random variable drawn from a binomial distribution with trial number $n$ and success probability $\nu\Delta t$.
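The binomial update rule can be sketched as a short simulation. This is a minimal illustration with constant rates; in the full model the rates vary in time. The function name `simulate_pool`, the time step, and the rate values are assumptions made for the example.

```python
import numpy as np

def simulate_pool(N, nu_plus, nu_minus, dt, n_steps, n0=0, seed=0):
    """Active count n(t) of one pool of N bistable variables (constant rates)."""
    rng = np.random.default_rng(seed)
    n = np.empty(n_steps + 1, dtype=int)
    n[0] = n0
    for t in range(n_steps):
        # Inactive variables activate, active variables inactivate.
        activations = rng.binomial(N - n[t], nu_plus * dt)
        inactivations = rng.binomial(n[t], nu_minus * dt)
        n[t + 1] = n[t] + activations - inactivations
    return n

# Hypothetical rates (per second); dt must keep nu * dt well below 1.
trace = simulate_pool(N=25, nu_plus=2.0, nu_minus=1.0, dt=0.01, n_steps=20000)
mean_fraction = trace[2000:].mean() / 25   # compare with nu_plus / (nu_plus + nu_minus)
```

With constant rates, the long-run active fraction should fluctuate around $\nu_+/(\nu_++\nu_-)$, consistent with the binomial steady state of the master equation.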

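All variables of a pool have identical transition rates, which depend exponentially on the ‘potential difference’ $\Delta u=u+u^0$ between states, with an input-dependent component $u$ and a baseline component $u^0$:

$$\begin{aligned}\nu_e^\pm&=\frac{\nu_e}{2}\,e^{\pm(u_e+u_e^0)/2},&\nu_{e'}^\pm&=\frac{\nu_e}{2}\,e^{\pm(u_{e'}+u_e^0)/2}\\ \nu_r^\pm&=\frac{\nu_r}{2}\,e^{\pm(u_r+u_r^0)/2},&\nu_{r'}^\pm&=\frac{\nu_r}{2}\,e^{\pm(u_{r'}+u_r^0)/2}\end{aligned}$$

where $\nu_e$ and $\nu_r$ are baseline rates and $u_e^0$ and $u_r^0$ baseline components. The input-dependent components of effective potentials are modulated linearly by synaptic couplings

$$\begin{aligned}u_e&=w_\mathrm{vis}\,f(c)-w_\mathrm{supp}\,r\\ u_{e'}&=w_\mathrm{vis}\,f(c')-w_\mathrm{supp}\,r'\\ u_r&=w_\mathrm{exc}\,e-w_\mathrm{inh}\,(e+e')+w_\mathrm{coop}\,r-w_\mathrm{comp}\,r'\\ u_{r'}&=w_\mathrm{exc}\,e'-w_\mathrm{inh}\,(e+e')+w_\mathrm{coop}\,r'-w_\mathrm{comp}\,r\end{aligned}$$

Visual inputs are $I=f(c)$ and $I'=f(c')$, respectively, where

$$f(c)=\frac{\ln(1+c/\gamma)}{\ln(1+1/\gamma)}\in[0,1]$$

is a monotonically increasing, logarithmic function of image contrast, with parameter $\gamma$.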
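To make the dependence of rates on state and input concrete, the following sketch evaluates the contrast function, the four potentials, and the resulting transition rates. Parameter values are copied from the table in ‘Degrees of freedom’; the function names and the convention of marking primed pools with a trailing `2` are assumptions of this example.

```python
import numpy as np

# Fitted values quoted from the parameter table of this section.
W = dict(vis=1.780, exc=152.2, inh=32.10, comp=33.4, coop=15.21, supp=2.34)
NU_E, NU_R = 1.0 / 1.95, 1.0 / 0.018   # baseline rates (1/s)
U0_E, U0_R = -1.65, -4.94              # baseline potentials
GAMMA = 0.071                          # contrast nonlinearity

def contrast_response(c, gamma=GAMMA):
    """Logarithmic contrast function f(c), with f(0)=0 and f(1)=1."""
    return np.log1p(c / gamma) / np.log1p(1.0 / gamma)

def potentials(e, e2, r, r2, c, c2, w=W):
    """Input-dependent potentials (u_e, u_e', u_r, u_r'); '2' marks primed pools."""
    u_e = w["vis"] * contrast_response(c) - w["supp"] * r
    u_e2 = w["vis"] * contrast_response(c2) - w["supp"] * r2
    u_r = w["exc"] * e - w["inh"] * (e + e2) + w["coop"] * r - w["comp"] * r2
    u_r2 = w["exc"] * e2 - w["inh"] * (e + e2) + w["coop"] * r2 - w["comp"] * r
    return u_e, u_e2, u_r, u_r2

def transition_rates(u, u0, nu):
    """Activation and inactivation rates (nu / 2) * exp(+-(u + u0) / 2)."""
    du = u + u0
    return 0.5 * nu * np.exp(du / 2.0), 0.5 * nu * np.exp(-du / 2.0)

# Example state: equal evidence, pool R dominant, full contrast in both eyes.
u_e, u_e2, u_r, u_r2 = potentials(0.5, 0.5, 1.0, 0.0, 1.0, 1.0)
rate_on, rate_off = transition_rates(u_e, U0_E, NU_E)
```

In this example state, feedback suppression lowers the potential of the evidence pool supporting the dominant percept by exactly $w_\mathrm{supp}$ relative to its rival.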

Degrees of freedom


The proposed mechanism has 11 independent parameters – 6 synaptic couplings, 2 baseline rates, 2 baseline potentials, 1 contrast nonlinearity – which were fitted to experimental observations. A 12th parameter – pool size – remained fixed.

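| Symbol | Description | Value |
|---|---|---|
| $N$ | Pool size | 25 |
| $1/\nu_e$ | Baseline rate, evidence | 1.95 ± 0.10 s |
| $1/\nu_r$ | Baseline rate, decision | 0.018 ± 0.010 s |
| $u_e^0$ | Baseline potential, evidence | -1.65 ± 0.24 |
| $u_r^0$ | Baseline potential, decision | -4.94 ± 0.67 |
| $w_\mathrm{vis}$ | Visual input coupling | 1.780 ± 0.092 |
| $w_\mathrm{exc}$ | Feedforward excitation | 152.2 ± 3.7 |
| $w_\mathrm{inh}$ | Feedforward inhibition | 32.10 ± 2.3 |
| $w_\mathrm{comp}$ | Lateral competition | 33.4 ± 1.2 |
| $w_\mathrm{coop}$ | Lateral cooperation | 15.21 ± 0.59 |
| $w_\mathrm{supp}$ | Feedback suppression | 2.34 ± 0.14 |
| $\gamma$ | Contrast nonlinearity | 0.071 ± 0.011 |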

Fitting procedure


The experimental dataset consisted of two 5 × 5 arrays $X_i^\mathrm{exp}$ for mean $\langle T\rangle$ and coefficient of variation $c_V$, plus two scalar values for skewness $\gamma_1=2$ and correlation coefficient $cc_1=0.06$. The two scalar values corresponded to the (rounded) average values observed over the 5 × 5 combinations of image contrast. In other words, the fitting procedure prescribed contrast dependencies for the first two distribution moments, but not for correlation coefficients.

The fit error $E_\mathrm{fit}$ was computed as a weighted sum of relative errors

$$E_\mathrm{fit}=\sum_{i=1}^{4}w_i\,\delta_i\Big/\sum_{i=1}^{4}w_i,\qquad \delta_i=\left|\frac{X_i^\mathrm{mod}-X_i^\mathrm{exp}}{\bar{X}_i^\mathrm{exp}}\right|$$

with weighting $w=[1,1,1,1/4]$ emphasizing distribution moments.
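The weighted fit error can be sketched as follows. How the relative error of a 5 × 5 array is reduced to a single number is not spelled out above; this example assumes the mean absolute deviation divided by the mean of the experimental array. The function names and the numerical values are purely illustrative.

```python
import numpy as np

def relative_error(model, data):
    """Mean absolute deviation, relative to the mean of the data (assumed reduction)."""
    model = np.asarray(model, dtype=float)
    data = np.asarray(data, dtype=float)
    return np.mean(np.abs(model - data)) / np.abs(np.mean(data))

def fit_error(deltas, weights=(1.0, 1.0, 1.0, 0.25)):
    """Weighted average of relative errors, de-emphasizing the last observable."""
    deltas = np.asarray(deltas, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return np.sum(weights * deltas) / np.sum(weights)

# Illustrative relative errors for <T>, c_V, skewness, and cc_1 (made-up numbers).
example_error = fit_error([0.1, 0.2, 0.0, 0.4])
```

With the weighting $[1,1,1,1/4]$, a large relative error in the correlation coefficient contributes only a quarter as much as the same error in one of the distribution moments.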

Approximately 400 minimization runs were performed, starting from random initial configurations of model parameters. For the optimal parameter set, the resulting fit error for the mean observer dataset was approximately 13%. More specifically, the fit errors for mean dominance T, coefficient of variation cV, relative skewness γ1/cV, and correlation coefficients cc1 and cc2 were 9.8, 7.9, 8.7, 70, and 46%, respectively. Here, fit errors for relative skewness and correlation coefficients were computed for the isocontrast conditions, where experimental observations were least noisy.

To confirm that the resulting fit was indeed optimal and could not be further improved, we studied the behavior of the fit error in the vicinity of the optimal parameter set. For each parameter $\alpha_i$, 30 values $\alpha_i^{(j)}$ were picked in the direct vicinity of the optimal parameter $\alpha_i^\mathrm{opt}$ (Appendix 1—figure 9). The resulting scatter plot of value pairs $\alpha_i^{(j)}$ and fit error $E_\mathrm{fit}^{(j)}$ was approximated by a quadratic function, which provided 95% confidence intervals for $\alpha_i^{(j)}$. For all parameters except $\nu_r$, the estimated quadratic function was convex and the coefficient of the Hessian matrix associated with the fit error was positive. Additionally, the estimated extremum of each parabola was close to the corresponding optimal parameter, confirming that the parameter set was indeed optimal (Appendix 1—figure 9).

To minimize fit error, we repeated a stochastic gradient descent from randomly chosen initial parameters. Interestingly, the ensemble of suboptimal solutions found by this procedure populated a low-dimensional manifold of the parameter space, in which three principal components accounted for 95% of the positional variance. Thus, models that reproduce experimental observations with varying degrees of freedom exhibit only 3–4 effective degrees of freedom. We surmise that this is due, on the one hand, to the severe constraints imposed by our model architecture (e.g., discrete elements, exponential input dependence of transition rates) and, on the other hand, to the requirement that the dynamical operating regime behaves as a relaxation oscillator.

In support of this interpretation, we note that our 5 × 5 experimental measurements of $\langle T\rangle$ and $c_V$ were accurately described by ‘quadric surfaces’ ($z=a_1+a_2x+a_3y+a_4x^2+a_5xy+a_6y^2$) with six coefficients each. Together with the two further measurements of $\gamma_1/c_V$ and $cc_1$, our experimental observations accordingly exhibited approximately $6\times2+2=14$ effective degrees of freedom. This number was sufficient to constrain the 3–4 dimensional manifold of parameters, where the model operated as a relaxation oscillator with a particular dynamics, specifically, a slow-fast dynamics associated, respectively, with the accumulation and reversal phases of BR.

Alternative model


As an alternative model (Laing and Chow, 2002), a combination of competition, adaptation, and image-contrast-dependent noise was fitted to reproduce four 5 × 5 arrays $X_i^\mathrm{exp}$ for mean $\langle T\rangle$, coefficient of variation $c_V$, skewness $\gamma_1$, and correlation coefficient $cc_1$. Fit error $E_\mathrm{fit}$ was computed as the average of relative errors

$$E_\mathrm{fit}=\frac{1}{n}\sum_{i=1}^{n}\delta_i,\qquad \delta_i=\left|\frac{X_i^\mathrm{mod}-X_i^\mathrm{exp}}{\bar{X}_i^\mathrm{exp}}\right|$$

For purposes of comparison, a weighted fit error with weighting w=[1,1,1,1/4] was computed, as well.

The model comprised four state variables and independent colored noise:

$$\begin{aligned}\tau_r\,\dot{r}_{1,2}&=-r_{1,2}+F\!\left(-\beta\,r_{2,1}-\phi_a\,a_{1,2}+I_{1,2}+n_{1,2}\right)\\ \tau_a\,\dot{a}_{1,2}&=-a_{1,2}+r_{1,2}\\ \tau_n\,\dot{n}_{1,2}&=-n_{1,2}+\sigma_{1,2}\sqrt{2\tau_n}\,\xi(t)\end{aligned}$$

where $F(x)=[1+\exp(-x/\kappa)]^{-1}$ is a nonlinear activation function and $\xi(t)$ is white noise.

Additionally, both input $I_{1,2}$ and noise amplitude $\sigma_{1,2}$ were assumed to depend nonlinearly on image contrast $c_{1,2}$:

$$I_{1,2}=f(c_{1,2})=b_I\,c_{1,2}^{\,k_I},\qquad \sigma_{1,2}=g(c_{1,2})=b_\sigma\,c_{1,2}^{\,k_\sigma}$$

This coupling between input and noise amplitude served to stabilize the shape of dominance distributions over different image contrasts (‘scaling property’).

Parameters for competition $\beta=10$, activity time constant $\tau_r=50$ ms, noise time constant $\tau_n=500$ ms, and activation function $\kappa=0.1$ were fixed. Parameters for adaptation strength $\phi_a\in[1,100]$, adaptation time constant $\tau_a\in[1,00]$, contrast dependence of input $b_I\in[1,5]$, $k_I\in[0.1,5]$, and contrast dependence of noise amplitude $b_\sigma\in[0.1,1]$, $k_\sigma\in[0.1,1]$ were explored within the ranges indicated.

The best fit (determined with a genetic algorithm) was as follows: ϕa=18.39, τa=22.78, kI=1.52, bI=2.92, kσ=0.57, bσ=0.19. The fit errors for mean dominance T, coefficient of variation cV, skewness γ1, and correlation coefficient cc1 were, respectively, 11.3, 8.3, 20, and 55%. The fit error for correlation coefficient cc2 was 180% (because the model predicted negative values). The combined average for T, cV, and γ1 was 13.2%. The fit error obtained with weighting w=(1,1,1,1/4) was 16.4%.

For Figure 6d, the alternative model was fitted only to observations at equal image contrast, $c=c'$: mean dominance $\langle T\rangle$, coefficient of variation $c_V$, skewness $\gamma_1$, and correlation coefficient $cc_1$. The combined average fit error for $\langle T\rangle$, $c_V$, and $\gamma_1$ was 11.2%. The combined average for all four observables was 22%.

Spiking network simulation


To illustrate a possible neural realization of ‘local attractors,’ we simulated a competitive network with eight identical assemblies of excitatory and inhibitory neurons, which collectively expresses a spontaneous and metastable dynamics (Mattia et al., 2013). One assembly (denoted as ‘foreground’) comprised 150 excitatory leaky-integrate-and-fire neurons, which were weakly coupled to the 1050 excitatory neurons of the other assemblies (denoted as ‘background’), as well as 300 inhibitory neurons. Note that background assemblies are not strictly necessary and are included only for the sake of verisimilitude. The connection probability between any two neurons was c=2/3. Excitatory synaptic efficacy between neurons in the same assembly and in two different assemblies was Jintra=0.612mV and Jinter=0.403mV, respectively. Inhibitory synaptic efficacy was JI=1.50mV, and the efficacy of excitatory synapses onto inhibitory neurons was JIE=0.560mV. Finally, ‘foreground’ neurons, ‘background neurons,’ and ‘inhibitory neurons’ each received independent Poisson spike trains of 2400Hz, 2280Hz and 2400Hz, respectively. Other settings were as in Mattia et al., 2013. As a result of these settings, ‘foreground’ activity transitioned spontaneously between an ‘off’ state of approximately 4Hz and an ‘on’ state of approximately 40Hz.

Appendix 1

Model schematics

Appendix 1—figure 1
Proposed mechanism of binocular rivalry dynamics (schematic).

Bistable variables are represented by white (inactive) or red (active) circles. Four pools, each with $N=25$ variables, are shown: two evidence pools $E$ and $E'$, with active counts $n_e(t)$ and $n_{e'}(t)$, and two decision pools, $R$ and $R'$, with active counts $n_r(t)$ and $n_{r'}(t)$. Excitatory and inhibitory synaptic couplings include selective feedforward excitation $w_\mathrm{exc}$, indiscriminate feedforward inhibition $w_\mathrm{inh}$, recurrent excitation $w_\mathrm{coop}$, and mutual inhibition $w_\mathrm{comp}$ of decision pools, as well as selective feedback suppression $w_\mathrm{supp}$ of evidence pools. Visual input to evidence pools $f(c)$ and $f(c')$ is a function of image contrast $c$ and $c'$.

Metastable attractor dynamics

Appendix 1—figure 2
Metastable dynamics of spiking neural network.

(a) Eight assemblies of excitatory neurons (schematic, light and dark gray disks) and one pool of inhibitory neurons (white disk) interact competitively with recurrent random connectivity. We focus on one ‘foreground’ assembly (dark gray), with firing rate $\nu_\mathrm{fore}$ and selective external input $\Delta\nu_\mathrm{ext}$. (b) ‘Foreground’ activity explores an effective energy landscape with two distinct steady states (circles), separated by ridge points (diamonds). As this landscape changes with external input $\Delta\nu_\mathrm{ext}$, transition rates $\nu_\pm$ between ‘on’ and ‘off’ states also change with external input. (c) Simulation to establish transition rates $\nu_\pm$ of foreground assembly. External input $\Delta\nu_\mathrm{ext}$ is stepped periodically between 44 Hz and 4 Hz. Spiking activity of 10 representative excitatory neurons in a single trial, population activity over 25 trials, thresholded population activity over 25 trials, and activation probability (fraction of ‘on’ states). (d) Relaxation dynamics in response to step change of $\Delta\nu_\mathrm{ext}$, with ‘on’ transitions (left) and ‘off’ transitions (right). (e) Average state transition rates $\nu_\pm$ vary anti-symmetrically and exponentially with external input: $\nu_+\approx2.2\ \mathrm{Hz}\cdot\exp(+0.85\ \mathrm{s}\cdot\Delta\nu_\mathrm{ext})$ and $\nu_-\approx0.5\ \mathrm{Hz}\cdot\exp(-0.79\ \mathrm{s}\cdot\Delta\nu_\mathrm{ext})$ (red and blue lines).

We postulate assemblies or clusters of neurons with recurrent random connectivity as operative units of sensory representations. In our model, such assemblies are reduced to binary variables with Poisson transitions. Our key assumption is that the rates ν± of activation and inactivation events are modulated exponentially by synaptic input (Equation 1):

$$\nu_\pm=\nu\,e^{\pm(ws+u^0)}$$

Here, we show that these assumptions are a plausible reduction of recurrently connected assemblies of spiking neurons.

Following earlier work, we simulated a competitive network with eight identical assemblies of excitatory and inhibitory neurons (Appendix 1—figure 2a), configured to collectively express a metastable activity dynamics (Mattia et al., 2013). Here, we are interested particularly in the activity dynamics of one excitatory assembly (dubbed ‘foreground’), which expresses two quasi-stable ‘attractor’ states: an ‘on’ state with high activity and an ‘off’ state with low activity. In the context of the metastable network, the ‘foreground’ assembly is bistable in that it transitions spontaneously between ‘on’ and ‘off’ states. Such state transitions are noise-driven escape events from an energy well and therefore occur with Poisson-like rates $\nu_+$ (activation) and $\nu_-$ (inactivation). Figure 1b and Appendix 1—figure 2b illustrate this energy landscape for the ‘diffusion limit’ of very large assemblies, where quasi-stable activity levels are $\nu_\mathrm{fore}\approx45$ Hz for the ‘on’ state and $\nu_\mathrm{fore}\approx4$ Hz for the ‘off’ state. For small assemblies with fewer neurons, the difference between ‘on’ and ‘off’ states is less pronounced.

To establish the dependence of transition rates on external input to the ‘foreground’ assembly, we stepped external input rate $\Delta\nu_\mathrm{ext}$ between two values selected from a range $\Delta\nu_\mathrm{ext}\in[-120\ \mathrm{Hz},50\ \mathrm{Hz}]$ and monitored the resulting spiking activity in individual neurons, as well as activity $\nu_\mathrm{fore}$ of the entire population (Appendix 1—figure 2c, upper and middle panels). Comparing population activity to a suitable threshold, we identified ‘on’ and ‘off’ states of the ‘foreground’ assembly (Appendix 1—figure 2c, lower panel), as well as the probability of ‘on’ or ‘off’ states at different points in time following a step in $\Delta\nu_\mathrm{ext}$ (Appendix 1—figure 2d). From the hazard rate (temporal derivative of probability), we then estimated the rates $\nu_\pm$ of state transitions shown in Appendix 1—figure 2d. Transition rates $\nu_\pm$ vary approximately anti-symmetrically and exponentially with external input $\Delta\nu_\mathrm{ext}$. In the present example, $\nu_+\approx2.2\ \mathrm{Hz}\cdot\exp(+0.85\ \mathrm{s}\cdot\Delta\nu_\mathrm{ext})$ and $\nu_-\approx0.5\ \mathrm{Hz}\cdot\exp(-0.79\ \mathrm{s}\cdot\Delta\nu_\mathrm{ext})$ (Appendix 1—figure 2e, red and blue lines). This Arrhenius–Van’t-Hoff-like dependence of escape rates is a consequence of the approximately linear dependence of activation energy on external input. Escape kinetics is typical for attractor systems and motivates Equation 1.

Quality of representation

Accumulation of information

A birth-death process – defined as $N$ bistable variables with transition rates $\nu_\pm=\nu\,e^{\pm ws}$, where $\nu$ is a baseline rate and $w$ a coupling constant – accumulates and retains information about input $s$, performing as a ‘leaky integrator’ with a characteristic time scale [Braun and Mattia, 2010]. Specifically, the value of $s$ may be inferred from fractional activity $x(t)$ at time $t$, if coupling $w$ and baseline rate $\nu$ are known. The inverse variance of the maximum likelihood estimate is given by the Fisher information

$$J_x(s,t)=\frac{N\left[\partial_s\langle x\rangle\right]^2}{\langle x\rangle\left(1-\langle x\rangle\right)}\tag{1}$$

Its value grows with time, approaching $J_x^\infty=Nw^2/\cosh^2(ws/2)$ for $t\to\infty$. For small inputs $s\approx0$, the Fisher information increases monotonically as $J_x(t)\approx(Nw^2/4)\tanh(\nu t/2)$. Surprisingly, the upper bound of $J_x\le Nw^2/4$ depends linearly on pool size $N$, but quadratically on coupling $w$. Thus, stronger coupling substantially improves encoding accuracy (of input $s$).

The rate at which Fisher information is accumulated by a pool is set by the baseline transition rate $\nu$. An initially inactive pool, with $n_0=0$, accumulates Fisher information at an initial rate of $\partial_tJ_x|_{t=0}=(\nu Nw^2/4)\,e^{ws/2}$. Thus, any desired rate of gaining Fisher information may be obtained by choosing an appropriate value for $\nu$. However, unavoidably, after an input $s$ has ceased (and was replaced by another), information about $s$ is lost at the same rate.
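The small-input growth law quoted above can be checked numerically. The sketch below assumes the rate convention used in Materials and methods, $\nu_\pm=(\nu/2)\,e^{\pm ws/2}$, and an initially inactive pool; both the convention and the function names are assumptions of this example. The derivative with respect to $s$ is taken by central finite differences.

```python
import numpy as np

def expected_activity(s, t, w, nu):
    """Expected active fraction of an initially inactive pool at time t."""
    nu_plus = 0.5 * nu * np.exp(+w * s / 2.0)
    nu_minus = 0.5 * nu * np.exp(-w * s / 2.0)
    x_inf = nu_plus / (nu_plus + nu_minus)
    return x_inf * (1.0 - np.exp(-(nu_plus + nu_minus) * t))

def fisher_information(s, t, N, w, nu, h=1e-5):
    """N * (d<x>/ds)^2 / (<x> (1 - <x>)), with a finite-difference derivative."""
    x = expected_activity(s, t, w, nu)
    dx_ds = (expected_activity(s + h, t, w, nu)
             - expected_activity(s - h, t, w, nu)) / (2.0 * h)
    return N * dx_ds**2 / (x * (1.0 - x))

N, w, nu = 25, 2.5, 1.0                     # illustrative values
times = np.array([0.5, 1.0, 2.0, 5.0])      # t > 0 to avoid 0/0 at t = 0
numeric = fisher_information(0.0, times, N, w, nu)
closed_form = (N * w**2 / 4.0) * np.tanh(nu * times / 2.0)
```

Under these assumptions the numerical values agree with $(Nw^2/4)\tanh(\nu t/2)$ and saturate at $Nw^2/4$, which illustrates the linear dependence on $N$ and the quadratic dependence on $w$.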

Appendix 1—figure 3
Information retained by stochastic pool activity from normally distributed inputs.

Inputs $s\sim\mathcal{N}(\mu,\sigma)$ provide Fisher information $J_s=1/\sigma^2$ about mean $\mu$. Stochastic activity $n(t)$ of a birth-death process ($N\in\{10,20,40,80\}$ and $w=2.5$) driven by such inputs accumulates Fisher information $J_n(t)$ about mean $\mu$. (a) Accumulation over input interval $t=[0,1]$ of fractional information $J_\mathrm{rel}(t)=J_n(t)\,\sigma^2$ by an initially inactive pool of size $N$. (b) Information about $\mu$ retained by summed activity $\hat{n}=n_1+\ldots+n_4$ of four independent pools (all initially inactive and of size $N$) receiving concurrently four independent inputs ($s\sim\mathcal{N}(\mu,\sigma)$) over an interval $t=[0,1]$. Retained fraction $J_\mathrm{rel}=J_{\hat{n}}(1)\,\sigma^2/4$ depends on pool size $N$ and input variance $\sigma^2$. (c) Information about $\mu$ retained by activity $n$ of one pool (initially inactive and of size $N$) receiving successively four independent inputs ($s\sim\mathcal{N}(\mu,\sigma)$) over an interval $t=[0,4]$. Retained fraction $J_\mathrm{rel}=J_n(4)\,\sigma^2/4$ depends on pool size $N$ and input variance $\sigma^2$.

Integration of noisy samples

Birth-death processes are also able to encode noisy sensory inputs, capturing much of the information provided. When an initially inactive pool receives an input $s$ over time $t$, stochastic activity $n(t)$ gradually accumulates information about the value of $s$. Normally distributed inputs $s\sim\mathcal{N}(\mu,\sigma)$ provide Fisher information $J_s=1/\sigma^2$ about mean $\mu$. Pool activity $n(t)$ accumulates Fisher information $J_n(t)$ about input mean $\mu$, which may be compared to $J_s$. Comparatively small pools with strong coupling (e.g., $N=25$, $w=2.5$) readily capture 90% of the information provided (Appendix 1—figure 3a).

Moreover, pools readily permit information from multiple independent inputs to be combined over space and/or time. For example, the combined activity of four pools ($N=25$, $w=2.5$), which receive concurrently four independent samples, captures approximately 80% of the information provided, and a single pool receiving four samples in succession still retains approximately 60% of the information provided (Appendix 1—figure 3b,c). In the latter case, retention is compromised by the ‘leaky’ nature of stochastic integration. Whether signals are being integrated over space or time, the retained fraction of information is highest for inputs of moderate and larger variance $\sigma^2$ (Appendix 1—figure 3b,c). This is because inputs with smaller variance are degraded more severely by the internal noise of a birth-death process (i.e., stochastic activations and inactivations).

Suitability for inference

Summation of heterogeneous neural responses can be equivalent to Bayesian integration of sensory information [Beck et al., 2008; Pouget et al., 2013]. In general, this is the case when response variability is ‘Poisson-like’ and response tuning differs only multiplicatively [Ma et al., 2006; Ma et al., 2008]. We now show that bistable stochastic variables $x_i(t)$, with heterogeneous transition rates $\nu_i^\pm(s)$, satisfy these conditions as long as synaptic coupling $w$ is uniform.

Assuming initially inactive variables, $x_i(0)=0$, incremental responses $x_i(\Delta t)$ after a short interval $\Delta t$ are binomially distributed about mean $\langle x_i(\Delta t)\rangle$, which is approximately

$$\langle x_i(\Delta t)\rangle\approx\Delta t\,\frac{d\langle x_i\rangle}{dt}\bigg|_{t=0}=\underbrace{\nu_i\,\Delta t/2}_{\phi_i}\;\underbrace{e^{ws/2}}_{f(s)}$$

where $\phi_i=\nu_i\,\Delta t/2$ reflects (possibly heterogeneous) response tuning and $f(s)=e^{ws/2}$ represents a common response function which depends only on synaptic coupling $w$. The Fisher information, about $s$, of individual responses is

$$J_i(s)=\frac{\left[\partial_s\langle x_i\rangle\right]^2}{\langle x_i\rangle\left[1-\langle x_i\rangle\right]}\approx\phi_i\,\frac{f'^2(s)}{f(s)},\qquad\langle x_i\rangle\ll1$$

as long as expected activation $\langle x_i\rangle$ is small. The Fisher information of summed responses $\sum_ix_i$ is

$$J_\mathrm{sum}\approx\frac{\left[f'(s)\sum_i\phi_i\right]^2}{f(s)\sum_i\phi_i}=\frac{f'^2(s)}{f(s)}\sum_i\phi_i=\sum_iJ_i(s)$$

and equals the combined Fisher information of individual responses. Accordingly, the summation of bistable activities with heterogeneous transition rates $\nu_i$ optimally integrates information, provided expected activations remain small, $\langle x_i\rangle\ll1$, and synaptic coupling $w$ is uniform.

Categorical choice

The ‘biased competition’ circuit proposed here expresses a categorical decision by either raising $r$ towards unity (and lowering $r'$ towards zero) or vice versa. Here, we describe its stochastic steady-state response to constant visual inputs $I=f(c)$ and $I'=f(c')$ and for arbitrary initial conditions of $e$, $e'$, $r$ and $r'$ (Appendix 1—figure 4). Note that, for purposes of this analysis, evidence activity $e$, $e'$ was not subject to feedback suppression.

The choice is random when the input is ambiguous, $I\approx I'$, but quickly becomes deterministic with growing input bias $|I-I'|>0$. Importantly, the choice is consistently determined by visual input for all initial conditions. The 75% performance level is reached for biases $|I-I'|\approx0.04$ to $0.06$.

Mutual inhibition $w_\mathrm{comp}$ controls the width of the ambiguous region around $I=I'$, and self-excitation $w_\mathrm{coop}$ ensures a categorical decision even for small $I,I'\approx0$. The balance between feedforward excitation $w_\mathrm{exc}$ and inhibition $w_\mathrm{inh}$ eliminates decision failures for all but the largest values of $I,I'>0.7$ and reduces the degree to which sensitivity to differential input $|I-I'|$ varies with total input $I+I'$.

For particularly high values of input $I,I'>0.7$, no categorical decision is reached and activities of both $r$ and $r'$ grow above 0.5. In the full model, such inconclusive outcomes are eliminated by feedback suppression.

Deterministic dynamics

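In the deterministic limit of $N\to\infty$, fractional pool activity $x$ equals its expectation $\langle x\rangle$ and the relaxation dynamics of Equation 2 becomes

$$\tau_x\frac{dx}{dt}=-x+x_\infty$$

with characteristic time $\tau_x=\frac{1}{\nu_++\nu_-}=\Upsilon(\Delta u)$ and asymptotic values $x_\infty=\frac{\nu_+}{\nu_++\nu_-}=\Phi(\Delta u)$, where $\Delta u$ is the potential difference. Input dependencies of characteristic time and of asymptotic value follow from Equation 1:

$$\Upsilon(\Delta u)=\frac{1}{\nu}\,\mathrm{sech}\frac{\Delta u}{2},\qquad\Phi(\Delta u)=\left[1+e^{-\Delta u}\right]^{-1}$$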
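The two functions can be checked against the transition rates directly. The sketch below assumes the rate convention of Materials and methods, $\nu_\pm=(\nu/2)\,e^{\pm\Delta u/2}$; the function names and the numerical values are illustrative.

```python
import numpy as np

def upsilon(du, nu):
    """Characteristic time (1 / nu) * sech(du / 2)."""
    return 1.0 / (nu * np.cosh(du / 2.0))

def phi(du):
    """Asymptotic activity [1 + exp(-du)]^-1."""
    return 1.0 / (1.0 + np.exp(-du))

def relax(x0, du, nu, t):
    """Closed-form solution of tau dx/dt = -x + x_inf, starting from x0."""
    x_inf = phi(du)
    return x_inf + (x0 - x_inf) * np.exp(-t / upsilon(du, nu))

# Consistency with the rates nu_+- = (nu / 2) exp(+-du / 2); values are illustrative.
du, nu = 0.8, 0.5
nu_plus = 0.5 * nu * np.exp(+du / 2.0)
nu_minus = 0.5 * nu * np.exp(-du / 2.0)
tau_from_rates = 1.0 / (nu_plus + nu_minus)
x_inf_from_rates = nu_plus / (nu_plus + nu_minus)
```

Relaxation is slowest at $\Delta u=0$, where the characteristic time equals $1/\nu$, and becomes faster for strongly positive or negative potential differences.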

Evidence pools

The relaxation dynamics of evidence pools is given by Equation 2 and Equation 3′. As shown in the next section, reversals occur when evidence difference $|e-e'|$ reaches a reversal threshold $\Delta_\mathrm{rev}$. For example, a dominance period of evidence $e$ begins with $e_\mathrm{start}=e'_\mathrm{start}+\Delta_\mathrm{rev}$ and ends when the concurrent habituation of $e$ and recovery of $e'$ have inverted the situation to $e'_\mathrm{end}=e_\mathrm{end}+\Delta_\mathrm{rev}$ (Appendix 1—figure 5). Once the deterministic limit has settled into a limit cycle, all dominance periods start from, and end at, the same evidence levels.

Appendix 1—figure 4
Decision response to fixed input $I$, $I'$, for random initial conditions of $e$, $e'$, $r$, $r'$.

(a) Expected differential steady-state activation $|r-r'|$ of decision level. Steady-state activity $r+r'\approx1$ implies a categorical decision with activity $\approx1$ of one pool and activity $\approx0$ of the other. (b) Probability that decision correctly reflects input bias ($r>r'$ if $I>I'$, and vice versa).

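If pool $R$ has just become dominant, so that $r\approx1$ and $r'\approx0$, the state-dependent potential differences are

$$u_{e'}\approx w_\mathrm{vis}\,f(c'),\qquad u_e\approx w_\mathrm{vis}\,f(c)-w_\mathrm{supp}$$

and the deterministic development is

$$\tau_e\frac{de}{dt}=-e+e_\infty,\qquad\tau_{e'}\frac{de'}{dt}=-e'+e'_\infty$$

with asymptotic values

$$e_\infty=\Phi(u_e+u_e^0),\qquad e'_\infty=\Phi(u_{e'}+u_e^0)$$

and characteristic times

$$\tau_e=\Upsilon(u_e+u_e^0),\qquad\tau_{e'}=\Upsilon(u_{e'}+u_e^0).$$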
Appendix 1—figure 5
Exponential habituation and recovery of evidence activities.

Dominance durations depend on distance between asymptotic values and on characteristic times. (a, b) Development of evidence $e$ (blue) and $e'$ (red), over two successive dominance periods. Input $c=15/16$ is stronger, input $c'=1/16$ weaker. Activities recover, or habituate, exponentially until reversal threshold $\Delta_\mathrm{rev}$ is reached. Thin curves extrapolate to the respective asymptotic values, $e_\infty$ and $e'_\infty$. (a) Evidence $e'$ (with weaker input $c'$) is dominant. Incrementing input $c$ to non-dominant evidence $e$ shortens dominance $T$. (b) Evidence $e$ (with stronger input $c$) is dominant. Incrementing input $c$ to $e$ extends dominance $T$. (c–f) Contrast dependence of relaxation dynamics, as a function of differential contrast $c-c'$, for $c+c'=1$. Values when evidence $e$ is dominant (dom, thick solid curves), and when it is non-dominant (sup, thick dotted curves). Values for $e'$ are mirror symmetric (about vertical midline $c-c'=0$). (c) Effective potential $\Delta u_e+u_e$. (d) Characteristic time $\tau_e$. (e) Relaxation range $e_\mathrm{rev}-e_\infty$ (bottom left, thin curves $e_\mathrm{rev}$, thick curves $e_\infty$). (f) Effective rate $\rho_e$ of development. Symbols and arrows correspond to subfigures (a, b) and represent recovery (up arrow) or habituation (down arrow) of stronger-input evidence (blue) or weaker-input evidence (red). Underlying color patches indicate dominance of stronger-input evidence (blue patches) or of weaker-input evidence (red patches). Dominance durations depend more sensitively on the slower development, with smaller $\rho$, which generally is the recovery of non-dominant evidence (up arrows).

The starting points of the development, $e_\mathrm{rev}$ and $e'_\mathrm{rev}$ (dashed lines in Appendix 1—figure 5a,b), depend mostly on total input $c+c'$ and only little on input difference $c-c'$. Accordingly, for a given level of total input $c+c'$, the situation is governed by the distance between asymptotic evidence levels $\Delta_\infty=e_\infty-e'_\infty$ and by characteristic times $\tau_e$, $\tau_{e'}$.

The dependence on input bias $c-c'$ of effective potential $\Delta u_e+u_e$, characteristic time $\tau_e$, and asymptotic value $e_\infty$ is illustrated in Appendix 1—figure 5c–e. The potential range of relaxation is $e_\mathrm{rev}-e_\infty$ and $e'_\mathrm{rev}-e'_\infty$, where reversal levels $e_\mathrm{rev}$ and $e'_\mathrm{rev}$ can be obtained numerically.

Dominance durations depend more sensitively on the slower of the two concurrent processes as it sets the pace of the combined development. The initial rates $\rho_e$ and $\rho_{e'}$ after a reversal of the two opponent relaxations

$$\rho_e=\frac{d|e|}{dt}=\frac{|e_\mathrm{rev}-e_\infty|}{\tau_e},\qquad\rho_{e'}=\frac{d|e'|}{dt}=\frac{|e'_\mathrm{rev}-e'_\infty|}{\tau_{e'}}$$

provide a convenient proxy for relative rate. As shown in Appendix 1—figure 5f, when stronger-input evidence $e$ dominates, recovery of weaker-input evidence (red up arrow on blue background) is slower than habituation of stronger-input evidence (blue down arrow on blue background). Conversely, when weaker-input evidence $e'$ dominates, recovery of stronger-input evidence (blue up arrow on red background) is slower than habituation of weaker-input evidence (red down arrow on red background). In short, dominance durations always depend more sensitively on the recovery of the currently non-dominant evidence than on the habituation of the currently dominant evidence.

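If the two evidence populations $E$, $E'$ have equal and opposite potential differences, $\Delta u_e=-\Delta u_{e'}$, then they also have equal and opposite activation and inactivation rates (Equation 1)

$$\nu_e^{+}=\nu_{e'}^{-}=\nu_+,\qquad \nu_e^{-}=\nu_{e'}^{+}=\nu_-$$

and identical characteristic times $\tau_e$ (recovery of $E$) and $\tau_{e'}$ (habituation of $E'$). In this special case, the two processes may be combined and the development of evidence difference $\Delta e=e-e'$ is

$$\tau_\Delta\frac{d\Delta e(t)}{dt}=-\Delta e(t)+\Delta_\infty,\qquad \tau_\Delta=\frac{1}{\nu_++\nu_-},\qquad \Delta_\infty=\frac{\nu_+-\nu_-}{\nu_++\nu_-}$$

Starting from $\Delta e(0)=-\Delta_{\mathrm{rev}}$, we consider the first-passage-time of $\Delta e(t)$ through $+\Delta_{\mathrm{rev}}$. If a crossing is certain (i.e., when $\Delta_\infty=\frac{\nu_+-\nu_-}{\nu_++\nu_-}>\Delta_{\mathrm{rev}}$), the first-passage-time $T$ is given by

$$T=\tau_\Delta\ln\left(\frac{\Delta_\infty+\Delta_{\mathrm{rev}}}{\Delta_\infty-\Delta_{\mathrm{rev}}}\right)\tag{2}$$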
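The first-passage-time expression follows in three steps from the linear relaxation of the evidence difference. The derivation below is a sketch that assumes a deterministic relaxation (no noise), a fixed characteristic time $\tau_\Delta$, and an asymptotic distance $\Delta_\infty$ that exceeds the reversal threshold $\Delta_{\mathrm{rev}}$.

```latex
\begin{align}
\Delta e(t) &= \Delta_\infty - (\Delta_\infty + \Delta_{\mathrm{rev}})\, e^{-t/\tau_\Delta}
  && \text{solution with } \Delta e(0) = -\Delta_{\mathrm{rev}} \\
+\Delta_{\mathrm{rev}} &= \Delta_\infty - (\Delta_\infty + \Delta_{\mathrm{rev}})\, e^{-T/\tau_\Delta}
  && \text{crossing condition } \Delta e(T) = +\Delta_{\mathrm{rev}} \\
T &= \tau_\Delta \ln\!\left(\frac{\Delta_\infty + \Delta_{\mathrm{rev}}}{\Delta_\infty - \Delta_{\mathrm{rev}}}\right)
  && \text{solving for } T
\end{align}
```

The logarithm diverges as $\Delta_\infty$ approaches $\Delta_{\mathrm{rev}}$ from above, which is the origin of the steep, hyperbolic-like growth of dominance durations near threshold.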

A similar hyperbolic dependence holds in all other cases as well. When the distance between asymptotic levels $\Delta_\infty$ falls below the reversal threshold $\Delta_{\mathrm{rev}}$, dominance durations become infinite and reversals cease.

The hyperbolic dependence of dominance durations, illustrated in Appendix 1—figure 4d, has an interesting implication. Consider the point of equidominance, at which both dominance durations are equal and of moderate duration. Increasing the difference between image contrasts (e.g., increasing $c\to c+\Delta c$ and decreasing $c'\to c'-\Delta c$) increases $\Delta_\infty$ during the dominance of $e'$ and decreases it during the dominance of $e$. Due to the hyperbolic dependence, longer dominance periods lengthen more ($T\to T+\Delta T$) than shorter dominance periods shorten ($T'\to T'-\Delta T'$), consistent with the contemporary formulation of Levelt III [Brascamp et al., 2015].
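The asymmetry between lengthening and shortening can be checked numerically with the first-passage-time expression of Equation 2. The following is a minimal sketch, not the fitted model: the function name, the threshold, the equidominance distance, and the step size are illustrative assumptions, and the characteristic time is held fixed although in the model it also varies with input.

```python
import math

def dominance_duration(delta_inf, delta_rev, tau=1.0):
    """First-passage time T = tau * ln((D_inf + D_rev) / (D_inf - D_rev)).

    Returns math.inf when the asymptotic distance does not exceed the
    reversal threshold, i.e. when a crossing is not certain.
    """
    if delta_inf <= delta_rev:
        return math.inf
    return tau * math.log((delta_inf + delta_rev) / (delta_inf - delta_rev))

# Illustrative values only (assumptions, not fitted model parameters).
DELTA_REV = 0.2   # reversal threshold
DELTA_EQ = 0.5    # asymptotic distance at equidominance
STEP = 0.1        # change in asymptotic distance caused by a contrast change

t_equal = dominance_duration(DELTA_EQ, DELTA_REV)
t_longer = dominance_duration(DELTA_EQ - STEP, DELTA_REV)   # smaller distance
t_shorter = dominance_duration(DELTA_EQ + STEP, DELTA_REV)  # larger distance

lengthening = t_longer - t_equal
shortening = t_equal - t_shorter
print(f"T at equidominance: {t_equal:.3f}")
print(f"lengthening: {lengthening:.3f}, shortening: {shortening:.3f}")
```

Because the duration is a convex, decreasing function of the asymptotic distance, an equal step in either direction lengthens the longer period by more than it shortens the shorter one.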

Decision pools

We wish to analyze steady-state conditions for decision pools $R$, $R'$, as illustrated in Appendix 1—figure 4a,b. From Equation 4, we can write

$$r=\phi(e,e',r,r'),\qquad r'=\phi(e',e,r',r).$$

Under certain conditions – in particular, for sufficient self-coupling $w_{\mathrm{coop}}$ – the steady-state equations admit more than one solution: a low-activity fixed point with $r\approx 0$, and a high-activity fixed point with $r\approx 1$. Importantly, the low-activity fixed point can be destabilized when evidence activities change.

Consider a non-dominant decision pool $R$ with fractional activity $r=n_r/N\approx 0$ and its dominant rival pool $R'$ with fractional activity $r'=n_{r'}/N\approx 1$. The steady-state condition then becomes

$$r\approx\phi\left[w_{\mathrm{coop}}(r-x_{\mathrm{eff}})\right],\qquad x_{\mathrm{eff}}=\frac{w_{\mathrm{comp}}-w_{\mathrm{exc}}\,e+w_{\mathrm{inh}}(e+e')-u_r}{w_{\mathrm{coop}}}$$

For values $x_{\mathrm{eff}}\le x_{\mathrm{crit}}$, the low-activity fixed point disappears, causing a sudden upward activation of pool $R$ and eventually a perceptual reversal. We call $r_{\mathrm{crit}}$ the steady-state value of $r$ at the point of disappearance.

We can now define a threshold $\Delta_{\mathrm{rev}}$ in terms of the value of evidence bias $\Delta e=e-e'$ which ensures that $x_{\mathrm{eff}}\le x_{\mathrm{crit}}$:

$$\Delta_{\mathrm{rev}}\approx\frac{2}{w_{\mathrm{exc}}}\left(w_{\mathrm{comp}}-x_{\mathrm{crit}}\,w_{\mathrm{coop}}-u_r\right)-\frac{2}{w_{\mathrm{exc}}}\left(w_{\mathrm{exc}}-2w_{\mathrm{inh}}\right)\frac{e+e'}{2}$$

We find that the threshold value $\Delta_{\mathrm{rev}}$ decreases linearly with average evidence $\bar{e}=(e+e')/2$, so that higher evidence activity necessarily entails lower thresholds (dashed red line in Figure 5c).

For $w_{\mathrm{coop}}=15.21$, we find $x_{\mathrm{crit}}=0.24006$, $r_{\mathrm{crit}}=0.0708$, and $\Delta_{\mathrm{rev}}=0.4554-1.1564\,\bar{e}$.
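The critical point can be located in closed form if the steady-state function is taken to be logistic. That choice is an assumption of this sketch (the function is not written out in this section), as are the function names; under it, the point where the low-activity fixed point disappears is where the curve touches the diagonal, so that w_coop · r · (1 − r) = 1. The linear threshold is copied from the text rather than recomputed from the weights.

```python
import math

def logistic(z):
    """Logistic activation, assumed here as the form of phi."""
    return 1.0 / (1.0 + math.exp(-z))

def critical_point(w_coop):
    """Saddle-node of r = phi(w_coop * (r - x_eff)) on the low-activity branch.

    At the critical point the curve phi(...) is tangent to the diagonal:
    w_coop * r * (1 - r) = 1 (using phi' = phi * (1 - phi)).
    """
    if w_coop <= 4.0:
        raise ValueError("bistability requires w_coop > 4 for a logistic phi")
    r_crit = 0.5 * (1.0 - math.sqrt(1.0 - 4.0 / w_coop))
    # Invert r = phi(w_coop * (r - x)) for x at r = r_crit.
    x_crit = r_crit - math.log(r_crit / (1.0 - r_crit)) / w_coop
    return x_crit, r_crit

def reversal_threshold(mean_evidence):
    """Linear threshold reported in the text for w_coop = 15.21."""
    return 0.4554 - 1.1564 * mean_evidence

x_crit, r_crit = critical_point(15.21)
print(f"x_crit = {x_crit:.5f}, r_crit = {r_crit:.4f}")
for e_bar in (0.1, 0.2, 0.3):
    print(f"mean evidence {e_bar:.1f}: threshold {reversal_threshold(e_bar):.4f}")
```

With the logistic assumption, the computed critical values agree with those quoted above to the printed precision, which suggests the assumption is at least compatible with the reported numbers.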

Appendix 1—figure 6
Birth-death dynamics of evidence pools ensures gamma-like distribution and ‘scaling property’ (invariance of distribution shape).

(a) Representative examples for the time development of evidence bias $\Delta e=e-e'$ between reversals (i.e., between $-\Delta_{\mathrm{rev}}$ and approximately $+\Delta_{\mathrm{rev}}$). (b) Dominance distributions for $c=c'=1/16$ (blue), $c=c'=1/4$ (green), and $c=c'=1$ (yellow). Distribution mean $\mu$ changes approximately threefold, but coefficient of variation $c_V$ and skewness $\gamma_1$ are nearly invariant (inset), largely preserving distribution shape. (c) Development of expectation $\langle\Delta x\rangle$ between reversals (schematic). Left: a Poisson variable process, such as the difference $\Delta x$ between two birth-death processes. Mean $\langle\Delta x\rangle$ grows linearly with $t$ (lines, with slopes $\mu$, $\mu'$) and variance $\langle(\Delta x-\langle\Delta x\rangle)^2\rangle$ grows linearly with $t$ (dashed curves, with scaling factors $\sigma$, $\sigma'$). Constants $\mu$ and $\sigma$ change with stimulus contrast (blue and red). Proportionality $\mu\propto\sigma^2$ ensures constant dispersion of $\Delta x$ at threshold ($\delta x=\delta x'$), and, consequently, a dispersion of threshold-crossing times that grows linearly with mean threshold-crossing time ($\delta t/t_{\mathrm{rev}}=\delta t'/t'_{\mathrm{rev}}=\mathrm{const}$), preserving distribution shape. Right: a process with constant variance, $\sigma=\sigma'$. Dispersion of $\Delta x$ at threshold increases with threshold-crossing time ($\delta x>\delta x'$) and dispersion of threshold-crossing times grows supra-linearly with mean threshold-crossing time ($\delta t'/t'_{\mathrm{rev}}<\delta t/t_{\mathrm{rev}}$), broadening distribution shape.

Potential landscape

In Figure 5b, we illustrate the steady-state condition $r=\phi[w_{\mathrm{coop}}(r-x_{\mathrm{eff}})]$ in terms of an effective potential landscape $U(x)$. The functional form of this landscape was obtained by integrating ‘restoring force’ $F(x)$ over activity $x$:

$$F(x)=\phi\left[w_{\mathrm{coop}}(x-x_{\mathrm{eff}})\right]-x,\qquad U(x)=-\int_{x_{\mathrm{eff}}}^{x}F(x')\,dx'$$
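The landscape construction can be sketched numerically. The code below assumes a logistic steady-state function (an assumption of this sketch), for which the integral of the restoring force has a closed form, and takes the potential as the negative integral of the force from x_eff. Function names and the two example values of x_eff are illustrative.

```python
import math

W_COOP = 15.21  # self-coupling quoted in the text

def phi(z):
    """Logistic activation (assumed form of the steady-state function)."""
    return 1.0 / (1.0 + math.exp(-z))

def restoring_force(x, x_eff, w=W_COOP):
    """F(x) = phi[w (x - x_eff)] - x; zero at steady states."""
    return phi(w * (x - x_eff)) - x

def potential(x, x_eff, w=W_COOP):
    """U(x) = -integral of F from x_eff to x, in closed form for logistic phi."""
    softplus = math.log1p(math.exp(w * (x - x_eff)))
    integral_phi = (softplus - math.log(2.0)) / w
    integral_x = 0.5 * (x * x - x_eff * x_eff)
    return -(integral_phi - integral_x)

def stable_states(x_eff, n_grid=2000):
    """Grid estimate of potential minima: F changes sign from + to -."""
    xs = [i / n_grid for i in range(n_grid + 1)]
    minima = []
    for left, right in zip(xs[:-1], xs[1:]):
        if restoring_force(left, x_eff) > 0.0 >= restoring_force(right, x_eff):
            minima.append(0.5 * (left + right))
    return minima

for x_eff in (0.30, 0.20):  # above and below the critical value 0.24006
    wells = stable_states(x_eff)
    depths = [round(potential(x, x_eff), 4) for x in wells]
    print(f"x_eff = {x_eff:.2f}: minima near {wells}, U = {depths}")
```

For x_eff above the critical value the landscape has two wells (low and high activity); below it, only the high-activity well remains, which corresponds to the sudden activation of the non-dominant pool.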

Stochastic dynamics

Poisson-like variability

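The discretely stochastic process $x(t)\in\{0,\tfrac{1}{N},\tfrac{2}{N},\ldots,1\}$ has a continuously stochastic ‘diffusion limit,’ $x_{\mathrm{diff}}(t)$, for $N\to\infty$, with identical mean $\langle x_{\mathrm{diff}}\rangle=\langle x\rangle$ and variance $\langle x_{\mathrm{diff}}^2\rangle-\langle x_{\mathrm{diff}}\rangle^2=\langle x^2\rangle-\langle x\rangle^2$. This diffusion limit is a Cox–Ingersoll process and its dynamical equation

$$\dot{x}_{\mathrm{diff}}=(1-x_{\mathrm{diff}})\,\nu_+-x_{\mathrm{diff}}\,\nu_-+\sqrt{\frac{(1-x_{\mathrm{diff}})\,\nu_++x_{\mathrm{diff}}\,\nu_-}{N}}\;\xi(t),$$

where $\xi(t)$ is white noise, reveals that its increments $N\dot{x}_{\mathrm{diff}}$ (and thus also the increments of the original discrete process) exhibit Poisson-like variability. Specifically, in the low-activity regime, $x_{\mathrm{diff}}\ll 1$, both mean and variance of increments approximate activation rate $N\nu_+$:

$$\langle N\dot{x}\rangle=\langle N\dot{x}_{\mathrm{diff}}\rangle\approx N\nu_+,\qquad \langle N^2\dot{x}^2\rangle-\langle N\dot{x}\rangle^2=\langle N^2\dot{x}_{\mathrm{diff}}^2\rangle-\langle N\dot{x}_{\mathrm{diff}}\rangle^2\approx N\nu_+.$$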
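The Poisson-like character of the increments can be read off directly from the drift and noise terms of the diffusion limit. The sketch below evaluates both for a pool of N bistable units; the function name and the two rate values are illustrative assumptions, and the quantities are per unit time.

```python
def increment_statistics(x, nu_plus, nu_minus, n_units):
    """Mean and variance (per unit time) of the unit-count increments N*dx/dt.

    Drift:    N * [(1 - x) * nu_plus - x * nu_minus]
    Variance: N * [(1 - x) * nu_plus + x * nu_minus]
    """
    mean = n_units * ((1.0 - x) * nu_plus - x * nu_minus)
    variance = n_units * ((1.0 - x) * nu_plus + x * nu_minus)
    return mean, variance

N_UNITS = 25                  # pool size used in the text
NU_PLUS, NU_MINUS = 2.0, 1.0  # illustrative rates (assumptions)

for x in (0.0, 0.02, 0.2, 0.5):
    mean, variance = increment_statistics(x, NU_PLUS, NU_MINUS, N_UNITS)
    ratio = variance / mean
    print(f"x = {x:.2f}: mean = {mean:.2f}, variance = {variance:.2f}, "
          f"variance/mean = {ratio:.2f}")
```

At low activity the variance-to-mean ratio stays close to one, as for a Poisson process with rate N·ν₊; the ratio departs from one as activity grows and inactivation events contribute.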

Gamma-distributed first-passage times

When the input to a pool of bistable variables undergoes a step change, the active fraction $x(t)$ transitions stochastically between old and new steady states, $x_{\mathrm{old}}$ and $x_{\mathrm{new}}$ (set by old and new input values, respectively). The time that elapses until fractional activity crosses an intermediate ‘threshold’ level $\theta$ ($x_{\mathrm{old}}<\theta<x_{\mathrm{new}}$) is termed a ‘first-passage-time.’ In a low-threshold regime, birth-death processes exhibit a particular and highly unusual distribution of first-passage-times.

Specifically, the distribution of first-passage-times assumes a characteristic, gamma-like shape for a wide range of value triplets ($x_{\mathrm{old}}$, $\theta$, $x_{\mathrm{new}}$) [Cao et al., 2014]: skewness $\gamma_1$ takes a stereotypical value $\gamma_1\approx 2c_V$, the coefficient of variation $c_V$ remains constant (as long as the distance between $x_{\mathrm{old}}$ and $\theta$ remains the same), whereas the distribution mean may assume widely different values. This gamma-like distribution shape is maintained even when shared input changes during the transition (e.g., when bistable variables are coupled to each other) [Cao et al., 2014].

Importantly, only a birth-death process (e.g., a pool of bistable variables) guarantees a gamma-like distribution of first-passage-times under different input conditions [25]. Many other discretely stochastic processes (e.g., Poisson process) and continuously stochastic processes (e.g., Wiener, Ornstein–Uhlenbeck, Cox–Ingersoll) produce inverse Gaussian distributions with $\gamma_1\approx 3c_V$. Models combining competition, adaptation, and noise can produce gamma-like distributions, but require different parameter values for every input condition (see Materials and methods: Alternative model).
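The two skewness signatures follow from the standard moment formulas of the gamma and inverse Gaussian distributions. The sketch below uses only these textbook formulas; the function names are illustrative, and the common coefficient of variation of 0.6 is chosen to match the value quoted for the model.

```python
import math

def gamma_stats(shape):
    """Coefficient of variation and skewness of a gamma distribution."""
    cv = 1.0 / math.sqrt(shape)
    skewness = 2.0 / math.sqrt(shape)
    return cv, skewness

def inverse_gaussian_stats(mean, lam):
    """Coefficient of variation and skewness of an inverse Gaussian
    with mean `mean` and shape parameter `lam` (variance = mean**3 / lam)."""
    cv = math.sqrt(mean / lam)
    skewness = 3.0 * math.sqrt(mean / lam)
    return cv, skewness

TARGET_CV = 0.6  # coefficient of variation quoted for the model

# Choose parameters so that both distributions share the same cv.
gamma_cv, gamma_skew = gamma_stats(shape=1.0 / TARGET_CV**2)
ig_cv, ig_skew = inverse_gaussian_stats(mean=1.0, lam=1.0 / TARGET_CV**2)

print(f"gamma:            cv = {gamma_cv:.2f}, skewness/cv = {gamma_skew / gamma_cv:.2f}")
print(f"inverse Gaussian: cv = {ig_cv:.2f}, skewness/cv = {ig_skew / ig_cv:.2f}")
```

At equal coefficient of variation, the inverse Gaussian is more skewed by a factor of 3/2, which is what makes the skewness-to-variation ratio a useful diagnostic of the underlying process.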

Scaling property

In the present model, first-passage-times reflect the concurrent development of two opponent birth-death processes (pools of $N=25$ binary variables). Dominance periods begin with newly non-dominant evidence $e$ well below newly dominant evidence $e'$, $\Delta e=e-e'\approx-\Delta_{\mathrm{rev}}$, and end with the former well above the latter, $\Delta e\approx+\Delta_{\mathrm{rev}}$ (Appendix 1—figure 6a). The combination of two small pools with $N=25$ approximates a single large pool with $N=50$. When image contrast changes, distribution shape remains nearly the same, with a coefficient of variation $c_V\approx 0.6$ and a gamma-like skewness $\gamma_1\approx 2c_V$, even though mean $\mu$ of first-passage-times changes substantially (Appendix 1—figure 6b).

This ‘scaling property’ (preservation of distribution shape) is owed to the Poisson-like variability of birth-death processes (see above, Appendix 1—figure 6c). Poisson-like variability implies that accumulation rate $\mu$ and dispersion rate $\sigma^2$ are proportional, $\mu\propto\sigma^2$. This proportionality ensures that activity at threshold disperses equally widely for different accumulation rates (i.e., for different input strengths), preserving the shape of first-passage-time distributions [Cao et al., 2016].
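The role of proportionality between accumulation rate and dispersion rate can be illustrated with a simplified stand-in: a drift-diffusion process with constant drift and constant variance rate crossing a fixed threshold. This is not the birth-death model (its passage times are inverse Gaussian rather than gamma-like), and all names and values are assumptions of this sketch; it only shows why proportional variance keeps the coefficient of variation fixed.

```python
import math

def passage_time_stats(drift, variance_rate, threshold):
    """Mean and CV of first-passage times for drift-diffusion to a threshold.

    Uses the inverse-Gaussian result: mean = threshold / drift,
    variance = threshold * variance_rate / drift**3.
    """
    mean = threshold / drift
    cv = math.sqrt(variance_rate / (drift * threshold))
    return mean, cv

THRESHOLD = 1.0
DRIFTS = (0.5, 1.0, 2.0)   # slower to faster accumulation (illustrative)
K = 0.36                   # variance per unit drift (illustrative)

proportional = [passage_time_stats(mu, K * mu, THRESHOLD) for mu in DRIFTS]
constant = [passage_time_stats(mu, K, THRESHOLD) for mu in DRIFTS]

for mu, (m_p, cv_p), (m_c, cv_c) in zip(DRIFTS, proportional, constant):
    print(f"drift {mu:.1f}: mean T = {m_p:.2f}, "
          f"cv (variance ~ drift) = {cv_p:.2f}, cv (constant variance) = {cv_c:.2f}")
```

With variance proportional to drift, the coefficient of variation is the same for all mean passage times; with constant variance, it grows as the mean passage time lengthens, so the distribution broadens.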

Appendix 1—figure 7
Characteristic times of evidence activity.

(a) Characteristic times $\tau_e$, $\tau_{e'}$ for different image contrast $c=c'$, when evidence pool is dominant (dom) and non-dominant (sup). (b) Autocorrelation of evidence activity $e$, $e'$ as a function of image contrast (color) and latency, expressed in multiples of average dominance duration $T$. (c) Autocorrelation of joint evidence activity $\bar{e}=(e+e')/2$ as a function of image contrast (color) and latency. Note that autocorrelation time lengthens substantially for high image contrast.

Characteristic times

As mentioned previously, the characteristic times of pools of bistable variables are not fixed but vary with input (Equation 2). In our model, the characteristic times of evidence activities lengthen with increasing input contrast and shorten with feedback suppression (Appendix 1—figure 7a). Characteristic times are reflected also in the temporal autocorrelation, which averages over periods of dominance and non-dominance alike. Autocorrelation times lengthen with increasing input contrast, both in absolute terms and relative to the average dominance duration (Appendix 1—figure 7b).

Importantly, the autocorrelation time of mean evidence activity $\bar{e}=(e+e')/2$ is even longer, particularly for high input contrast (Appendix 1—figure 7c). The reason is that spontaneous fluctuations of $\bar{e}$ are constrained not only by birth-death dynamics, but additionally by the reversal dynamics that keeps evidence activities $e$ and $e'$ close together (i.e., within reversal threshold $\Delta_{\mathrm{rev}}$). As a result, the characteristic timescale of spontaneous fluctuations of $\bar{e}$ lengthens with input contrast. The amplitude of such fluctuations also grows with contrast (not shown).

The slow fluctuations of $\bar{e}$ induce mirror-image fluctuations of reversal threshold $\Delta_{\mathrm{rev}}$ and thus are responsible for the serial dependency of reversal sequences (see Deterministic dynamics: Decision pools).

Burstiness

Appendix 1—figure 8
Burstiness of reversal sequences predicted by model and confirmed by experimental observations.

(a) Burstiness index (BI) (mean) for n successive dominance periods in experimentally observed reversal sequences, for contrasts 12.5% (green), 50% (yellow), and 100% (red). (b) BI for reversal sequences generated by model (mean ± std).

The proposed mechanism predicts that reversal sequences include episodes with several successive short (or long) dominance periods. It further predicts that this inhomogeneity increases with image contrast. Such an inhomogeneity may be quantified in terms of a ‘burstiness index’ (BI), which compares the variability of the mean for sets of n successive periods to the expected variability for randomly shuffled reversal sequences. In both model and experimental observations, this index rises far above chance (over a broad range of n) for high image contrast (Appendix 1—figure 8). The degree of inhomogeneity expressed by the model at high image contrast is comparable to that observed experimentally, even though the model was neither designed nor fitted to reproduce non-stationary aspects of reversal dynamics. This correspondence between model and experimental observation compellingly corroborates the proposed mechanism.
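One way to turn this description into a number is sketched below. The exact definition used for the figure is not given in this section, so the normalization here is a hypothetical choice: the variance of sliding-window means is divided by the value expected for independent (shuffled) durations, namely the overall variance divided by n, and one is subtracted so that chance level is zero. The function name and test sequences are illustrative.

```python
from statistics import mean, pvariance

def burstiness_index(durations, n):
    """Hypothetical burstiness index for n successive dominance periods.

    Compares the variance of sliding-window means with the value sigma^2 / n
    expected when successive durations are independent (shuffled order).
    Returns about 0 for no excess clustering and > 0 for bursty sequences.
    """
    if n < 1 or n > len(durations):
        raise ValueError("n must lie between 1 and the sequence length")
    window_means = [mean(durations[i:i + n])
                    for i in range(len(durations) - n + 1)]
    expected = pvariance(durations) / n
    if expected == 0:
        raise ValueError("durations must not all be identical")
    return pvariance(window_means) / expected - 1.0

clustered = [1.0] * 50 + [3.0] * 50   # long runs of short, then long periods
alternating = [1.0, 3.0] * 50         # strictly anti-clustered

bi_clustered = burstiness_index(clustered, 5)
bi_alternating = burstiness_index(alternating, 2)
print(f"clustered: {bi_clustered:.2f}, alternating: {bi_alternating:.2f}")
```

A sequence with long runs of similar durations yields a positive index, whereas a strictly alternating sequence yields a negative one. For real data, an empirical shuffle distribution would be a more careful baseline than the analytic value used here.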

Robustness of fit

The parameter values associated with the global minimum of the fit error define the model used throughout the article. As described in Materials and methods, we explored the vicinity of this parameter set by individually varying each parameter within a certain neighborhood. This allowed us to estimate 95% confidence intervals for each parameter value. The results are illustrated in Appendix 1—figure 9.

Appendix 1—figure 9
Dependence of fit error on individual parameter values (with all other parameter values fixed).

30 equally spaced values were tested (blue dots) and fitted by a quadratic function (red solid curve, with 95% confidence intervals indicated by dotted curves). For each parameter, both the optimal value (red cross) and the extremum of the parabolic fit (green circle) are shown.

Note that optimal parameter values (red crosses) are consistently near extrema of the parabolic fits (green circles), indicating the robustness of the fit. Note further that instead of the parameter pair wvis and ue0, we show the related parameter pair α and β, which is defined through the relations wvis=αln((1+γ)/γ) and ue0=αlnγ+β.
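The change of variables between the displayed pair (α, β) and the model pair (w_vis, u_e0) can be written as a pair of inverse functions, taking the two relations exactly as stated above. The function names and the numerical values in the example are illustrative assumptions, not fitted values.

```python
import math

def to_model_parameters(alpha, beta, gamma):
    """Map (alpha, beta) to (w_vis, u_e0) for a given gamma > 0."""
    w_vis = alpha * math.log((1.0 + gamma) / gamma)
    u_e0 = alpha * math.log(gamma) + beta
    return w_vis, u_e0

def to_fit_parameters(w_vis, u_e0, gamma):
    """Inverse map from (w_vis, u_e0) back to (alpha, beta)."""
    alpha = w_vis / math.log((1.0 + gamma) / gamma)
    beta = u_e0 - alpha * math.log(gamma)
    return alpha, beta

# Illustrative values only (assumptions).
ALPHA, BETA, GAMMA = 1.5, -2.0, 0.1

w_vis, u_e0 = to_model_parameters(ALPHA, BETA, GAMMA)
alpha_back, beta_back = to_fit_parameters(w_vis, u_e0, GAMMA)
print(f"w_vis = {w_vis:.4f}, u_e0 = {u_e0:.4f}")
print(f"recovered alpha = {alpha_back:.4f}, beta = {beta_back:.4f}")
```

The round trip recovers the original pair, and the sum w_vis + u_e0 equals α·ln(1 + γ) + β, which no longer depends on ln γ alone.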

The code used to analyze optimization statistics is available in the folder ‘analyzeOptimizationStatistics’ of the Github repository provided with this article (https://github.com/mauriziomattia/2021.BistablePerceptionModel) copy archived at.

Data availability

Source data is provided for Figures 2 and 3. Source code for the binocular rivalry model is provided in a Github repository (https://github.com/mauriziomattia/2021.BistablePerceptionModel) copy archived at https://archive.softwareheritage.org/swh:1:rev:f70e9e45ddb64cef7fc9a3ea57f0b7a04dfc6729.

References

Diaz-Caneja E (1928) Sur L’Alterance Binoculaire. Ann Occul (Paris) 12:721–731.
Gregory RL (1980) Perceptions as hypotheses. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 290:181–197. https://doi.org/10.1098/rstb.1980.0090
Koch C, Ullman S (1985) Shifts in selective visual attention: Towards the underlying neural circuitry. Human Neurobiology 4:219–227.
Koffka K (1935) Principles of Gestalt Psychology. Harcourt Brace.
Lee TS, Mumford D (2003) Hierarchical Bayesian inference in the visual cortex. Journal of the Optical Society of America. A, Optics and Image Science 20:1434–1448. https://doi.org/10.1364/JOSAA.20.001434
Leopold DA (1997) Brain Mechanisms of Visual Awareness Using Perceptual Ambiguity to Investigate the Neural Basis of Image Segmentation and Grouping. Ph.D. Thesis. Houston, Texas: Baylor College of Medicine.
Levelt WJM (1965) On Binocular Rivalry. Leiden: Van Gorkum Comp.
Logothetis NK (1998) Single units and conscious vision. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 353:1801–1818. https://doi.org/10.1098/rstb.1998.0333
Luce RD (1986) Response Times: Their Role in Inferring Elementary Mental Organization. New York: Oxford University Press.
Pearl J (1988) Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann.
Rock I (1983) The Logic of Perception. MIT Press.
Rubin E (1958) Figure and ground. In: Beardslee DC, Wertheimer M, editors. Readings in Perception. Van Nostrand. pp. 194–203.
van Kampen NG (1981) Stochastic Processes in Physics and Chemistry. Amsterdam: North-Holland Physics Publishing.
von Helmholtz H (1867) Handbuch Der Physiologischen Optik. Leopold Voss.
Wertheimer M (1912) Experimentelle Studien Über Das Sehen von Bewegung. Zeitschrift für Psychologie mit Zeitschrift für angewandte Psychologie 61:161–165.
Wheatstone C (1838) Contributions to the physiology of vision: On some remarkable, and hitherto unobserved, phenomena of binocular vision. Philosophical Transactions of the Royal Society A 128:371–394.
Wiesenfelder H, Blake RR (1990) The neural site of binocular rivalry relative to the analysis of motion in the human visual system. The Journal of Neuroscience 10:3880–3888.

Article and author information

Author details

  1. Robin Cao

    1. Cognitive Biology, Center for Behavioral Brain Sciences, Magdeburg, Germany
    2. Gatsby Computational Neuroscience Unit, London, United Kingdom
    3. Istituto Superiore di Sanità, Rome, Italy
    Contribution
    Conceptualization, Formal analysis, Funding acquisition, Investigation, Methodology, Software, Supervision, Visualization, Writing – original draft, Writing – review and editing
    Competing interests
    none
  2. Alexander Pastukhov

    Cognitive Biology, Center for Behavioral Brain Sciences, Magdeburg, Germany
    Contribution
    Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Supervision, Visualization
    Competing interests
    none
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-8738-8591
  3. Stepan Aleshin

    Cognitive Biology, Center for Behavioral Brain Sciences, Magdeburg, Germany
    Contribution
    Formal analysis, Investigation, Methodology, Software
    Competing interests
    none
  4. Maurizio Mattia

    Istituto Superiore di Sanità, Rome, Italy
    Contribution
    Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Supervision, Visualization
    Contributed equally with
    Jochen Braun
    Competing interests
    None
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-2356-4509
  5. Jochen Braun

    Cognitive Biology, Center for Behavioral Brain Sciences, Magdeburg, Germany
    Contribution
    Conceptualization, Funding acquisition, Methodology, Software, Supervision, Visualization, Writing – original draft, Writing – review and editing
    Contributed equally with
    Maurizio Mattia
    For correspondence
    jochen.braun@ovgu.de
    Competing interests
    none
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-8886-078X

Funding

European Commission (FP7-269459)

  • Jochen Braun

Deutsche Forschungsgemeinschaft (BR 987/3-1)

  • Jochen Braun

Deutsche Forschungsgemeinschaft (BR 987/4-1)

  • Jochen Braun

H2020 European Research Council (45539)

  • Maurizio Mattia

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

Funding from EU FP7-269459 Coronet, DFG BR 987/3-1, DFG 987/4-1, and EU Human Brain Project SGA3-945539.

The authors thank Andrew Parker and Maike S Braun for helpful comments.

Ethics

Human subjects: Six practised observers participated in the experiment (4 male, 2 female). Informed consent, and consent to publish, was obtained from all observers and ethical approval Z22/16 was obtained from the Ethics Commission of the Faculty of Medicine of the Otto-von-Guericke University, Magdeburg.

Version history

  1. Received: July 29, 2020
  2. Accepted: May 24, 2021
  3. Version of Record published: August 9, 2021 (version 1)

Copyright

© 2021, Cao et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.


  1. Robin Cao
  2. Alexander Pastukhov
  3. Stepan Aleshin
  4. Maurizio Mattia
  5. Jochen Braun
(2021)
Binocular rivalry reveals an out-of-equilibrium neural dynamics suited for decision-making
eLife 10:e61581.
https://doi.org/10.7554/eLife.61581
