Abstract
In ambiguous or conflicting sensory situations, perception is often ‘multistable’ in that it perpetually changes at irregular intervals, shifting abruptly between distinct alternatives. The interval statistics of these alternations exhibits quasiuniversal characteristics, suggesting a general mechanism. Using binocular rivalry, we show that many aspects of this perceptual dynamics are reproduced by a hierarchical model operating out of equilibrium. The constitutive elements of this model idealize the metastability of cortical networks. Independent elements accumulate visual evidence at one level, while groups of coupled elements compete for dominance at another level. As soon as one group dominates perception, feedback inhibition suppresses supporting evidence. Previously unreported features in the serial dependencies of perceptual alternations compellingly corroborate this mechanism. Moreover, the proposed outofequilibrium dynamics satisfies normative constraints of continuous decisionmaking. Thus, multistable perception may reflect decisionmaking in a volatile world: integrating evidence over space and time, choosing categorically between hypotheses, while concurrently evaluating alternatives.
Introduction
In deducing the likely physical causes of sensations, perception goes beyond the immediate sensory evidence and draws heavily on context and prior experience (von Helmholtz, 1867; Barlow et al., 1972; Gregory, 1980; Rock, 1983). Numerous illusions in visual, auditory, and tactile perception – all subjectively compelling, but objectively false – attest to this extrapolation beyond the evidence. In natural settings, perception explores alternative plausible causes of sensory evidence by active readjustment of sensors (‘active perception,’ Mirza et al., 2016; Yang et al., 2018; Parr and Friston, 2017a). In general, perception is thought to actively select plausible explanatory hypotheses, to predict the sensory evidence expected for each hypothesis from prior experience, and to compare the observed sensory evidence at multiple levels of scale or abstraction (‘analysis by synthesis,’ ‘predictive coding,’ ‘hierarchical Bayesian inference,’ Yuille and Kersten, 2006, Rao and Ballard, 1999, Parr and Friston, 2017b, Pezzulo et al., 2018). Active inference engages the entire hierarchy of cortical areas involved in sensory processing, including both feedforward and feedback projections (Bar, 2009; Larkum, 2013; Shipp, 2016; Funamizu et al., 2016; Parr et al., 2019).
The dynamics of active inference becomes experimentally observable when perceptual illusions are ‘multistable’ (Leopold and Logothetis, 1999). In numerous ambiguous or conflicting situations, phenomenal experience switches at irregular intervals between discrete alternatives, even though the sensory scene is stable (Necker, 2009; Wheatstone, 1838; Rubin, 1958; Attneave, 1971; Ramachandran and Anstis, 2016; Pressnitzer and Hupe, 2006; Schwartz et al., 2012). Multistable illusions are enormously diverse, involving visibility or audibility, perceptual grouping, visual depth or motion, and many kinds of sensory scenes, from schematic to naturalistic. Average switching rates differ greatly and range over at least two orders of magnitude (Cao et al., 2016), depending on sensory scene, perceptual grouping (Wertheimer, 1912; Koffka, 1935; Ternus, 1926), continuous or intermittent presentation (Leopold and Logothetis, 2002; Maier et al., 2003), attentional condition (Pastukhov and Braun, 2007), individual observer (Pastukhov et al., 2013c; Denham et al., 2018; Brascamp et al., 2019), and many other factors.
In spite of this diversity, the stochastic properties of multistable phenomena appear to be quasiuniversal, suggesting that the underlying mechanisms may be general. Firstly, average dominance duration depends in a characteristic and counterintuitive manner on the strength of dominant and suppressed evidence (‘Levelt’s propositions I–IV,’ Levelt, 1965; Brascamp et al., 2006; Klink et al., 2016; Kang, 2009; Brascamp et al., 2015; MorenoBote et al., 2010). Secondly, the statistical distribution of dominance durations shows a stereotypical shape, resembling a gamma distribution with shape parameter $r\simeq 34$ (‘scaling property,’ Cao et al., 2016; Fox and Herrmann, 1967; Blake et al., 1971; Borsellino et al., 1972; Walker, 1975; De Marco et al., 1977; Murata et al., 2003; Brascamp et al., 2005; Pastukhov and Braun, 2007; Denham et al., 2018; Darki and Rankin, 2021). Thirdly, the durations of successive dominance periods are correlated positively, over at least two or three periods (Fox and Herrmann, 1967; Walker, 1975; Van Ee, 2005; Denham et al., 2018).
Here, we show that these quasiuniversal characteristics are comprehensively and quantitatively reproduced, indeed guaranteed, by an interacting hierarchy of birthdeath processes operating out of equilibrium. While the proposed mechanism combines some of the key features of previous models, it far surpasses their explanatory power.
Several possible mechanisms have been proposed for perceptual dominance, the triggering of reversals, and the stochastic timing of reversals. That a single, coherent interpretation typically dominates phenomenal experience is thought to reflect competition (explicit or implicit) at the level of explanatory hypotheses (e.g., Dayan, 1998), sensory inputs (e.g., Lehky, 1988), or both (e.g., Wilson, 2003). That a dominant interpretation is occasionally supplanted by a distinct alternative has been attributed to fatigue processes (e.g., neural adaptation, synaptic depression, Laing and Chow, 2002), spontaneous fluctuations (‘noise,’ e.g., Wilson, 2007, Kim et al., 2006), stochastic sampling (e.g., Schrater and Sundareswara, 2006), or combinations of these (e.g., adaptation and noise, Shpiro et al., 2009; Seely and Chow, 2011; Pastukhov et al., 2013c). The characteristic stochasticity (gammalike distribution) of dominance durations has been attributed to Poisson counting processes (e.g., birthdeath processes, Taylor and Ladridge, 1974; Gigante et al., 2009; Cao et al., 2016) or stochastic accumulation of discrete samples (Murata et al., 2003; Schrater and Sundareswara, 2006; Sundareswara and Schrater, 2008; Weilnhammer et al., 2017).
‘Dynamical’ models combining competition, adaptation, and noise capture well the characteristic dependence of dominance durations on input strength (‘Levelt’s propositions’) (Laing and Chow, 2002; Wilson, 2007; Ashwin and Aureliu, 2010), especially when inputs are normalized (MorenoBote et al., 2007; MorenoBote et al., 2010; Cohen et al., 2019), and when the dynamics emphasize noise (Shpiro et al., 2009; Seely and Chow, 2011; Pastukhov et al., 2013c). However, such models do not preserve distribution shape over the full range of input strengths (Cao et al., 2016; Cohen et al., 2019). On the other hand, ‘sampling’ models based on discrete random processes preserve distribution shape (Taylor and Ladridge, 1974; Murata et al., 2003; Schrater and Sundareswara, 2006; Sundareswara and Schrater, 2008; Cao et al., 2016; Weilnhammer et al., 2017), but fail to reproduce the dependence on input strength. Neither type of model accounts for the sequential dependence of dominance durations (Laing and Chow, 2002).
Here, we reconcile ‘dynamical’ and ‘sampling’ approaches to multistable perception, extending an earlier effort (Gigante et al., 2009). Importantly, every part of the proposed mechanism appears to be justified normatively in that it may serve to optimize perceptual choices in a general behavioral situation, namely, continuous inference in uncertain and volatile environments (Bogacz, 2007; VelizCuba et al., 2016). We propose that sensory inputs are represented by birthdeath processes in order to accumulate sensory information over time and in a format suited for Bayesian inference (Ma et al., 2006; Pouget et al., 2013). Further, we suggest that explanatory hypotheses are evaluated competitively, with a hypothesis attaining dominance (over phenomenal experience) when its support exceeds the alternatives by a certain finite amount, consistent with optimal decisionmaking between multiple alternatives (Bogacz, 2007). Finally, we assume that a dominant hypothesis suppresses its supporting evidence, as required by ‘predictive coding’ implementations of hierarchical Bayesian inference (Pearl, 1988; Rao and Ballard, 1999; Hohwy et al., 2008). In contrast to many previous models, we do not require a local mechanisms of fatigue, adaptation, or decay.
Based on these assumptions, the proposed mechanism reproduces dependence on input strength, as well as distribution of dominance durations and positive sequential dependence. Additionally, it predicts novel and unsuspected dynamical features confirmed by experiment.
Results
Below we introduce each component of the mechanism and its possible normative justification, before describing outofequilibrium dynamics resulting from the interaction of all components. Subsequently, we compare model predictions with multistable perception of human observers, specifically, the dominance statistics of binocular rivalry (BR) at various combinations of left and righteye contrasts (Figure 1a).
Hierarchical dynamics
Bistable assemblies: ‘local attractors’
As operative units of sensory representation, we postulate neuronal assemblies with bistable ‘attractor’ dynamics. Effectively, assembly activity moves in an energy landscape with two distinct quasistable states – dubbed ‘on’ and ‘off’ – separated by a ridge (Figure 1b). Driven by noise, assembly activity mostly remains near one quasistable state (‘on’ or ‘off’), but occasionally ‘escapes’ to the other state (Kramers, 1940; Hanggi et al., 1990; Deco and Hugues, 2012; LitwinKumar and Doiron, 2012; Huang and Doiron, 2017).
An important feature of ‘attractor’ dynamics is that the energy of quasistable states depends sensitively on external input. Net positive input destabilizes (i.e., raises the potential of) the ‘off’ state and stabilizes (i.e., lowers the potential of) the ‘on’ state. Transition rates $\nu}^{\pm$ are even more sensitive to external input as they depend approximately exponentially on the height of the energy ridge (‘activation energy’).
Figure 1b illustrates ‘attractor’ dynamics for an assembly of 150 spiking neurons with activity levels of approximately $7\phantom{\rule{thinmathspace}{0ex}}\mathit{H}\mathit{z}$ and $21\phantom{\rule{thinmathspace}{0ex}}\mathit{H}\mathit{z}$ per neuron in the ‘off’ and ‘on’ states, respectively. Full details are provided in Appendix 1, section Metastable population dynamics, and Appendix 1—figure 2.
Binary stochastic variables
Our model is independent of neural details and relies exclusively on an idealized description of ‘attractor’ dynamics. Specifically, we reduce bistable assemblies to discretely stochastic, binary activity variables $x(t)\in \{0,1\}$, which activate and inactivate with Poisson rates $\nu}^{+$ and $\nu}^{$, respectively. These rates ${\nu}^{\pm}(s)$ vary exponentially and antisymmetrically with increments or decrements of activation energy $\mathrm{\Delta}u=u(s)+{u}^{0}$:
where $u}^{0$ and $\nu$ are baseline potential and baseline rate, respectively, and where the inputdependent part $u(s)=ws$ varies linearly with input $s$, with synaptic coupling constant $w$ (see Appendix 1, section Metastable population dynamics and Appendix 1—figure 2e).
Pool of $N$ binary variables
An extended network, containing $N$ individually bistable assemblies with shared input $s$, reduces to a ‘pool’ of $N$ binary activity variables $x}_{i}(t)\in \{0,1\$ with identical rates ${\nu}_{\pm}(s)$. Although all variables are independently stochastic, they are coupled through their shared input $s$. The number of active variables $n(t)=\sum _{i}{x}_{i}(t)$ or, equivalently, the active fraction $x(t)$=$n(t)/N$, forms a discretely stochastic process (‘birthdeath’ or ‘Ehrenfest’ process; Karlin and McGregor, 1965).
Relaxation dynamics
While activity $x(t)$ develops discretely and stochastically according to Equation 5 (Materials and methods), its expectation $\u27e8x(t)\u27e9$ develops continuously and deterministically,
relaxing with characteristic time $\tau}_{x}=\frac{1}{{\nu}^{+}+{\nu}^{}$ towards asymptotic value $x}_{\mathrm{\infty}}=\frac{{\nu}^{+}}{{\nu}^{+}+{\nu}^{}$. As rates $\nu}^{\pm$ change with input $s$ (Equation 1), we can define the functions ${\tau}_{s}=\mathrm{{\rm Y}}(s)$ and ${x}_{\mathrm{\infty}}=\mathrm{\Phi}(s)$ (see Materials and methods). Characteristic time $\tau}_{x$ is longest for small input $s\simeq 0$ and shortens for larger positive or negative input $s\gg 0$. The asymptotic value $x}_{\mathrm{\infty}$ ranges over the interval (0, 1) and varies sigmoidally with input $s$, reaching halfactivation for $s={u}^{0}/w$.
Quality of representation
Pools of bistable variables belong to a class of neural representations particularly suited for Bayesian integration of sensory information (Beck et al., 2008; Pouget et al., 2013). In general, summation of activity is equivalent to optimal integration of information, provided that response variability is Poissonlike, and response tuning differs only multiplicatively (Ma et al., 2006; Ma et al., 2008). Pools of bistable variables closely approximate these properties (see Appendix 1, section Quality of representation: Suitability for inference).
The representational accuracy of even a comparatively small number of bistable variables can be surprisingly high. For example, if normally distributed inputs drive the activity of initially inactive pools of bistable variables, pools as used in the present model ($N=25$, $w=2.5$) readily capture 90% of the Fisher information (see Appendix 1, section Quality of representation: Integration of noisy samples).
Conflicting evidence
Any model of BR must represent the conflicting evidence from both eyes (e.g., different visual orientations), which supports alternative perceptual hypotheses (e.g., distinct grating patterns). We assume that conflicting evidence accumulates in two separate pools of $N=25$ bistable variables, $E$ and $E}^{\mathrm{\prime}$, (‘evidence pools,’ Figure 1c). Fractional activations $e(t)$ and ${e}^{\mathrm{\prime}}(t)$ develop stochastically following Equation 5 (Materials and methods). Transition rates $\nu}_{e}^{\pm$ and $\nu}_{{e}^{\mathrm{\prime}}}^{\pm$ vary exponentially with activation energy (Equation 1), with baseline potential $u}_{e}^{0$ and baseline rate $\nu}_{e$. The variable components of activation energy, $u}_{e$ and $u}_{{e}^{\mathrm{\prime}}$, are synaptically modulated by image contrasts, $c$ and $c}^{\mathrm{\prime}$:
where $w}_{\mathit{v}\mathit{i}\mathit{s}$ is a coupling constant and $I=f(c)\in [0,1]$ is a monotonic function of image contrast $c$ (see Materials and methods).
Competing hypotheses: ‘nonlocal attractors’
Once evidence for, and against, alternative perceptual hypotheses (e.g., distinct grating patterns) has been accumulated, reaching a decision requires a sensitive and reliable mechanism for identifying the best supported hypothesis and amplifying the result into a categorical readout. Such a winnertakeall decision (Koch and Ullman, 1985) is readily accomplished by a dynamical version of biased competition (Deco and Rolls, 2005; Wang, 2002; Deco et al., 2007; Wang, 2008).
We assume that alternative perceptual hypotheses are represented by two further pools of $N=25$ bistable variables, $R$ and $R}^{\mathrm{\prime}$, forming two ‘nonlocal attractors’ (‘decision pools,’ Figure 1c). Similar to previous models of decisionmaking and attentional selection (Deco and Rolls, 2005; Wang, 2002; Deco et al., 2007; Wang, 2008), we postulate recurrent excitation within pools, but recurrent inhibition between pools, to obtain a ‘winnertakeall’ dynamics. Importantly, we assume that ‘evidence pools’ project to ‘decision pools’ not only in the form of selective excitation (targeted at the corresponding decision pool), but also in the form of indiscriminate inhibition (targeting both decision pools), as suggested previously (Ditterich et al., 2003; Bogacz et al., 2006).
Specifically, fractional activations $r(t)$ and ${r}^{\mathrm{\prime}}(t)$ develop stochastically according to Equation 5 (Materials and methods). Transition rates $\nu}_{s}^{\pm$ and $\nu}_{{s}^{\mathrm{\prime}}}^{\pm$ vary exponentially with activation energy (Equation 1), with baseline difference $u}_{r}^{0$ and baseline rate $\nu}_{r$. The variable components of activation energy, $u}_{r$ and $u}_{{r}^{\mathrm{\prime}}$, are synaptically modulated by evidence and decision activities:
where coupling constants $w}_{\mathit{e}\mathit{x}\mathit{c}$, $w}_{\mathit{i}\mathit{n}\mathit{h}$, $w}_{\mathit{c}\mathit{o}\mathit{o}\mathit{p}$, $w}_{\mathit{c}\mathit{o}\mathit{m}\mathit{p}$ reflect feedforward excitation, feedforward inhibition, lateral cooperation within decision pools, and lateral competition between decision pools, respectively.
This biased competition circuit expresses a categorical decision by either raising $r$ towards unity (and lowering $r}^{\mathrm{\prime}$ towards zero) or vice versa. The choice is random when visual input is ambiguous, $I\simeq {I}^{\prime}$, but becomes deterministic with growing input bias $I{I}^{\mathrm{\prime}}\S gt;0$ . This probabilistic sensitivity to input bias is reliable and robust under arbitrary initial conditions of $e$, $e}^{\mathrm{\prime}$, $r$ and $r}^{\mathrm{\prime}$ (see Appendix 1, section Categorical choice with Appendix 1—figure 3).
Feedback suppression
Finally, we assume feedback suppression, with each decision pool selectively targeting the corresponding evidence pool. A functional motivation for this systematic bias against the currently dominant appearance is given momentarily. Its effects include curtailment of dominance durations and ensuring that reversals occur from time to time. Specifically, we modify Equation 3 to
where $w}_{\mathit{s}\mathit{u}\mathit{p}\mathit{p}$ is a coupling constant.
Previous models of BR (Dayan, 1998; Hohwy et al., 2008) have justified selective feedback suppression of the evidence supporting a winning hypothesis in terms of ‘predictive coding’ and ‘hierarchical Bayesian inference’ (Rao and Ballard, 1999; Lee and Mumford, 2003). An alternative normative justification is that, in volatile environments, where the sensory situation changes frequently (‘volatility prior’), optimal inference requires an exponentially growing bias against evidence for the most likely hypothesis (VelizCuba et al., 2016). Note that feedback suppression applies selectively to evidence for a winning hypothesis and is thus materially different from visual adaptation (Wark et al., 2009), which applies indiscriminately to all evidence present.
Reversal dynamics
A representative example of the joint dynamics of evidence and decision pools is illustrated in Figure 1c,d, both at the level of pool activities $e(t)$, ${e}^{\mathrm{\prime}}(t)$, $r(t)$, ${r}^{\mathrm{\prime}}(t)$, and at the level of individual bistable variables $x(t)$. The top row shows decision pools $R$ and $R}^{\mathrm{\prime}$, with instantaneous active counts, $Nr(t)$ and $N{r}^{\mathrm{\prime}}(t)$ and active/inactive states of individual variables $x(t)$. The bottom row shows evidence pools $E$ and $E}^{\mathrm{\prime}$, with instantaneous active counts, $Ne(t)$ and $N{e}^{\mathrm{\prime}}(t)$ and active/inactive states of individual variables $x(t)$. Only a small fraction of evidence variables is active at any one time.
Phenomenal appearance reverses when the differential activity $\mathrm{\Delta}e=e{e}^{\prime}$ of evidence pools, $E$ and $E}^{\mathrm{\prime}$, contradicts sufficiently strongly the differential activity $\mathrm{\Delta}r=r{r}^{\prime}$ of decision pools, $R$ and $R}^{\mathrm{\prime}$, such that the steady state of decision pools is destabilized (see further below and Figure 4). As soon as the reversal has been effected at the decision level, feedback suppression lifts from the newly nondominant evidence and descends upon the newly dominant evidence. Due to this asymmetric suppression, the newly nondominant evidence recovers, whereas the newly dominant evidence habituates. This opponent dynamics progresses, past the point of equality $s\simeq {s}^{\prime}$, until differential evidence activity $\mathrm{\Delta}e$ once again contradicts differential decision activity $\mathrm{\Delta}r$. Whereas the activity of decision pools varies in phase (or counterphase) with perceptual appearance, the activity of evidence pools changes in quarterphase (or negative quarterphase) with perceptual appearance (e.g., Figures 1c,d,2a), consistent with some previous models (Gigante et al., 2009; Albert et al., 2017; Weilnhammer et al., 2017).
Binocular rivalry
To compare predictions of the model described above to experimental observations, we measured spontaneous reversals of BR for different combinations of image contrast. BR is a particularly wellstudied instance of multistable perception (Wheatstone, 1838; DiazCaneja, 1928; Levelt, 1965; Leopold and Logothetis, 1999; Brascamp et al., 2015). When conflicting images are presented to each eye (e.g., by means of a mirror stereoscope or of colored glasses, see Materials and methods), the phenomenal appearance reverses from time to time between the two images (Figure 1a). Importantly, the perceptual conflict involves also representations of coherent (binocular) patterns and is not restricted to eyespecific (monocular) representations (Logothetis et al., 1996; Kovács et al., 1996; Bonneh et al., 2001; Blake and Logothetis, 2002).
Specifically, our experimental observations established reversal sequences for $5\times 5$ combinations of image contrast, $c}_{dom},{c}_{sup}\in \{\frac{1}{16},\frac{1}{8},\frac{1}{4},\frac{1}{2},1\$. During any given dominance period, $c}_{\mathit{d}\mathit{o}\mathit{m}$ is the contrast of the phenomenally dominant image and $c}_{\mathit{s}\mathit{u}\mathit{p}$ the contrast of the other, phenomenally suppressed image (see Materials and methods). We analyzed these observations in terms of mean dominance durations $\u27e8T\u27e9$, higher moments $c}_{V$ and $\gamma}_{1}/{c}_{V$ of the distribution of dominance durations, and sequential correlation $c{c}_{1}$ of successive dominance durations.
Additional aspects of serial dependence are discussed further below.
As described in Materials and methods, we fitted 11 model parameters to reproduce observations with more than 50 degrees of freedom: $5\times 5$ mean dominance durations $\u27e8T\u27e9$, $5\times 5$ coefficients of variation $c}_{V$, one value of skewness ${\gamma}_{1}/{c}_{V}=2$, and one correlation coefficient $c{c}_{1}=0.06$. The latter two values were obtained by averaging over $5\times 5$ contrast combinations and rounding. Importantly, minimization of the fit error, by random sampling of parameter space with a stochastic gradient descent, resulted in a threedimensional manifold of suboptimal solutions. This revealed a high degree of redundancy among the 11 model parameters (see Materials and methods). Accordingly, we estimate that the effective number of degrees of freedom needed to reproduce the desired outofequilibrium dynamics was between 3 and 4. Model predictions and experimental observations are juxtaposed in Figures 3 and 4.
The complex and asymmetric dependence of mean dominance durations on image contrast — aptly summarized by Levelt’s ‘propositions’ I to IV (Levelt, 1965; Brascamp et al., 2015) — is fully reproduced by the model (Figure 3). Here, we use the updated definition of Brascamp et al., 2015: increasing the contrast of one image increases the fraction of time during which this image dominates appearance (‘predominance,’ Levelt I). Counterintuitively, this is due more to shortening dominance of the unchanged image than to lengthening dominance of the changed image (Levelt II, Figure 3b, left panel). Mean dominance durations grow (and alternation rates decline) symmetrically around equal predominance as contrast difference $c}_{\mathit{d}\mathit{o}\mathit{m}}{c}_{sup$ increases (Levelt III, Figure 3b, right panel). Mean dominance durations shorten when both image contrasts $c}_{\mathit{d}\mathit{o}\mathit{m}}={c}_{\mathit{s}\mathit{u}\mathit{p}$ increase (Levelt IV, Figure 3c).
Successive dominance durations are typically correlated positively (Fox and Herrmann, 1967; Walker, 1975; Pastukhov et al., 2013c). Averaging over all contrast combinations, observed and fitted correlation coefficients were comparable with $c{c}_{1}=0.06\pm 0.06$ (mean and standard deviation). Unexpectedly, both observed and fitted correlations coefficients increased systematically with image contrast ($\rho =0.9$, $p\S lt;.01$), growing from $c{c}_{1}=0.02\pm 0.05$ at $c}_{dom}={c}_{sup}=\frac{1}{16$ to $0.21\pm 0.06$ at ${c}_{\mathit{d}\mathit{o}\mathit{m}}={c}_{\mathit{d}\mathit{o}\mathit{m}}=1$ (Figure 3e, blue symbols). It is important to that this dependence was not fitted. Rather, this previously unreported dependence constitutes a model prediction that is confirmed by observation.
The distribution of dominance durations typically takes a characteristic shape (Cao et al., 2016; Fox and Herrmann, 1967; Blake et al., 1971; Borsellino et al., 1972; Walker, 1975; De Marco et al., 1977; Murata et al., 2003; Brascamp et al., 2005; Pastukhov and Braun, 2007; Denham et al., 2018), approximating a gamma distribution with shape parameter $r\simeq 34$, or coefficient of variation ${c}_{V}=1/\sqrt{r}\simeq 0.50.6$. The fitted model fully reproduces this ‘scaling property’ (Figure 4). The observed coefficient of variation remained in the range ${c}_{V}\simeq 0.050.06$ for nearly all contrast combinations (Figure 4b). Unexpectedly, both observed and fitted values increased above, or decreased below, this range at extreme contrast combinations (Figure 4b, left panel). Along the main diagonal $c}_{\mathit{d}\mathit{o}\mathit{m}}={c}_{\mathit{s}\mathit{u}\mathit{p}$ , where observed values had smaller error bars, both observed and fitted values of skewness were ${\gamma}_{1}/{c}_{V}\simeq 2$ and thus approximated a gamma distribution (Figure 4d, blue symbols).
Specific contribution of evidence and decision levels
What are the reasons for the surprising success of the model in reproducing universal characteristics of multistable phenomena, including the counterintuitive input dependence (‘Levelt’s propositions’), the stereotypical distribution shape (‘scaling property’), and the positive sequential correlation (as detailed in Figures 3 and 4)? Which level of model dynamics is responsible for reproducing different aspects of BR dynamics?
Below, we describe the specific contributions of different model components. Specifically, we show that the evidence level of the model reproduces ‘Levelt’s propositions I–III’ and the ‘scaling property,’ whereas the decision level reproduces ‘Levelt’s proposition IV.’ A nontrivial interaction between evidence and decision levels reproduces serial dependencies. Additionally, we show that this interaction predicts further aspects of serial dependencies – such as sensitivity to image contrast – that were not reported previously, but are confirmed by our experimental observations.
Levelt’s propositions I, II, and III
The characteristic input dependence of average dominance durations emerges in two steps (as in Gigante et al., 2009). First, inputs and feedback suppression shape the birthdeath dynamics of evidence pools $E$ and $E}^{\mathrm{\prime}$ (by setting disparate transition rates $\nu}^{\pm$, following Equation 3’ and Equation 1). Second, this sets in motion two opponent developments (habituation of dominant evidence activity and recovery of nondominant evidence activity, both following Equation 2) that jointly determine dominance duration.
To elucidate this mechanism, it is helpful to consider the limit of large pools ($N\to \mathrm{\infty}$) and its deterministic dynamics (Figure 2), which corresponds to the average stochastic dynamics. In this limit, periods of dominant evidence $E$ or $E}^{\mathrm{\prime}$ start and end at the same levels ($e}_{\mathit{s}\mathit{t}\mathit{a}\mathit{r}\mathit{t}}={e}_{\mathit{s}\mathit{t}\mathit{a}\mathit{r}\mathit{t}}^{\prime$ and $e}_{\mathit{e}\mathit{n}\mathit{d}}={{e}^{\prime}}_{\mathit{e}\mathit{n}\mathit{d}$), because reversal thresholds $\mathrm{\Delta}}_{\mathit{r}\mathit{e}\mathit{v}$ are the same for evidence difference $e{e}^{\prime}$ and ${e}^{\mathrm{\prime}}e$ (see section Levelt IV below).
The rates at which evidence habituates or recovers depend, in the first instance, on asymptotic levels $e}_{\mathrm{\infty}$ and ${e}^{\mathrm{\prime}}}_{\mathrm{\infty}$ (Equation 1 and 2, Figure 2b and Appendix 1—figure 4). In general, dominance durations depend on distance $\mathrm{\Delta}}_{\mathrm{\infty}$ between asymptotic levels: the further apart these are, the faster the development and the shorter the duration. As feedback suppression inverts the sign of the opponent developments, dominant evidence decreases (habituates) while nondominant evidence increases (recovers). Due to this inversion, $\mathrm{\Delta}}_{\mathrm{\infty}$ is roughly proportional to $e}_{\mathrm{\infty}}^{\mathit{n}\mathit{o}\mathit{n}\mathit{}\mathit{d}\mathit{o}\mathit{m}}{e}_{\mathrm{\infty}}^{\mathit{d}\mathit{o}\mathit{m}}+{w}_{\mathit{s}\mathit{u}\mathit{p}\mathit{p}$. It follows that the distance $\mathrm{\Delta}}_{\mathrm{\infty}$ is smaller and the reversal dynamics slower when dominant input is stronger, and vice versa. It further follows that incrementing one input (and raising the corresponding asymptotic level) speeds up recovery or slows down habituation, shortening or lengthening periods of nondominance and dominance, respectively (Levelt I).
In the second instance, rates of habituation or recovery depend on characteristic times $\tau}_{e$ and $\tau}_{{e}^{\mathrm{\prime}}$ (Equation 1 and 2). When these rates are unequal, dominance durations depend more sensitively on the slower process. This is why dominance durations depend more sensitively on nondominant input (Levelt II): recovery of nondominant evidence is generally slower than habituation of dominant evidence, independently of which input is weaker or stronger. The reason is that the respective effects of characteristic times $\tau}_{e$ and $\tau}_{{e}^{\mathrm{\prime}}$ and asymptotic levels $e}_{\mathrm{\infty}$ and ${e}^{\mathrm{\prime}}}_{\mathrm{\infty}$ are synergistic for weakerinput evidence (in both directions), whereas they are antagonistic for strongerinput evidence (see Appendix 1, section Deterministic dynamics: Evidence pools and Appendix 1—figure 4).
In general, dominance durations depend hyperbolically on $\mathrm{\Delta}}_{\mathrm{\infty}$ (Figure 2c and Equation 7 in Appendix 1). Dominance durations become infinite (and reversals cease) when $\mathrm{\Delta}}_{\mathrm{\infty}$ falls below the reversal threshold $\mathrm{\Delta}}_{\mathit{r}\mathit{e}\mathit{v}$. This hyperbolic dependence is also why alternation rate peaks at equidominance (Levelt III): increasing the difference between inputs always lengthens longer durations more than it shortens shorter durations, thus lowering alternation rate.
Distribution of dominance durations
For all combinations of image contrast, the mechanism accurately predicts the experimentally observed distributions of dominance durations. This is owed to the stochastic activity of pools of bistable variables.
Firstly, dominance distributions retain nearly the same shape, even though average durations vary more than threefold with image contrast (see also Appendix 1—figure 6a,b). This ‘scaling property’ is due to the Poissonlike variability of birthdeath processes (see Appendix 1, section Stochastic dynamics). Generally, when a stochastic accumulation approaches threshold, the rates of both accumulation and dispersion of activity affect the distribution of firstpassagetimes (Cao et al., 2014; Cao et al., 2016). In the special case of Poissonlike variability, the two rates vary proportionally and preserve distribution shape (see also Appendix 1—figure 6c,d).
Secondly, predicted distributions approximate gamma distributions with scale factor $r\simeq 34$. As shown previously (Cao et al., 2014; Cao et al., 2016), this is due to birthdeath processes accumulating activity within a narrow range (i.e., evidence difference $\mathrm{\Delta}e\le 0.2$). In this lowthreshold regime, the firstpassagetimes of birthdeath processes are both highly variable and gamma distributed, consistent with experimental observations.
Thirdly, the predicted variability (coefficients of variation) of dominance periods varies along the $c+{c}^{\prime}=1$ axis, being larger for longer than for shorter dominance durations (Figure 4a,b). The reason is that stochastic development becomes noisedominated. For longer durations, strongerinput evidence habituates rapidly into a regime where random fluctuations gain importance (see also Appendix 1—figure 4a,b).
Levelt’s proposition IV
The model accurately predicts how dominance durations shorten with higher image contrast $c={c}^{\mathrm{\prime}}$ (Levelt IV). Surprisingly, this reflects the dynamics of decision pools $R$ and $R}^{\mathrm{\prime}$ (Figure 5).
Here again it is helpful to consider the deterministic limit of large pools ($N\to \mathrm{\infty}$). In this limit, a dominant decision state ${r}^{\mathrm{\prime}}\simeq 1$ is destabilized when a contradictory evidence difference $\mathrm{\Delta}e=e{e}^{\prime}$ exceeds a certain threshold value $\mathrm{\Delta}}_{\mathit{r}\mathit{e}\mathit{v}$ (Figure 5b and Appendix 1, section Deterministic dynamics: Decision pools). Due to the combined effect of excitatory and inhibitory feedforward projections, $w}_{\mathit{e}\mathit{x}\mathit{c}$ and $w}_{\mathit{i}\mathit{n}\mathit{h}$ (Equation 4 and Figure 5a), this average reversal threshold decreases with mean evidence activity $\overline{e}=(e+{e}^{\prime})/2$. Simulations of the fully stochastic model ($N=25$) confirm this analysis (Figure 5c). As average evidence activity $\u27e8\overline{e}\u27e9$ increases with image contrast, the average evidence bias $\u27e8\mathrm{\Delta}e\u27e9$ at the time of reversals decreases, resulting in shorter dominance periods (Figure 5d).
Serial dependence
The proposed mechanism predicts positive correlations between successive dominance durations, a wellknown characteristic of multistable phenomena (Fox and Herrmann, 1967; Walker, 1975; Van Ee, 2005; Denham et al., 2018). In addition, it predicts further aspects of serial dependence not reported previously.
In both model and experimental observations, a long dominance period tends to be followed by another long period, and a short dominance period by another short period (Figure 6). In the model, this is due to mean evidence activity $\overline{e}=(e+{e}^{\mathrm{\prime}})/2$ fluctuating stochastically above and below its longterm average. The autocorrelation time of these fluctuations increases monotonically with image contrast and, for high contrast, spans multiple dominance periods (see Appendix 1, section Characteristic times and Appendix 1—figure 7). Note that fluctuations of $\overline{e}$ diminish as the number of bistable variables increases and vanishe in the deterministic limit $N\to \mathrm{\infty}$.
Crucially, fluctuations of mean evidence $\overline{e}$ modulate both reversal threshold $\mathrm{\Delta}}_{\mathit{r}\mathit{e}\mathit{v}$ and dominance durations $T$, as illustrated in Figure 6a,b. To obtain Figure 6a, dominance durations were grouped into quantiles and the average duration $\u27e8{T}_{0}\u27e9$ of each quantile was compared to the conditional expectation of preceding and following durations $\u27e8{T}_{\pm n}\u27e9$ (upper graph). For the same quantiles (compare color coding), average evidence activity $\u27e8{\overline{e}}_{0}\u27e9$ was compared to the conditional expectation $\u27e8{\overline{e}}_{\pm n}\u27e9$ at the end of preceding and following periods (lower graph). Both the inverse relation between $\u27e8{T}_{\pm n}\u27e9$ and $\u27e8{\overline{e}}_{\pm n}\u27e9$ and the autocorrelation over multiple dominance periods are evident.
This source of serial dependency – comparatively slow fluctuations of $\overline{e}$ and $\mathrm{\Delta}}_{\mathit{r}\mathit{e}\mathit{v}$ – predicts several qualitative characteristics not reported previously and now confirmed by experimental observations. First, sequential correlations are predicted (and observed) to be strictly positive at all lags (next period, oneafternext period, and so on) (Figure 6d). In other words, it predicts that several successive dominance periods are shorter (or longer) than average.
Second, due to the contrast dependence of autocorrelation time, sequential correlations are predicted (and observed) to increase with image contrast (Figure 6d). The experimentally observed degree of contrast dependence is broadly consistent with pool sizes between $N=25$ and $N=40$ (black and red curves in Figure 3e). Larger pools with hundreds of bistable variables do not express the observed dependence on contrast (not shown).
Third, for high image contrast, reversal sequences are predicted (and observed) to contain extended episodes with dominance periods that are short or extended episodes with periods that are long (Figure 6c). When quantified in terms of a ‘burstiness index,’ the degree of inhomogeneity in predicted and observed reversal sequences is comparable (see Appendix 1, section Burstiness and Appendix 1—figure 8).
Many previous models of BR (e.g., Laing and Chow, 2002) postulated selective adaptation of competing representations to account for serial dependency. However, selective adaptation is an opponent process that favors positive correlations between different dominance periods, but negative correlations between same dominance periods. To demonstrate this point, we fitted such a model to reproduce our experimental observations ($T$, $c}_{V$, $\gamma}_{1$, and $c{c}_{1}$) for five image contrasts $c={c}^{\mathrm{\prime}}$. As expected, the alternative model predicts negative correlations $c{c}_{2}$ for same dominance periods (Figure 6d, right panel), contrary to what is observed.
Discussion
We have shown that many wellknown features of BR are reproduced, and indeed guaranteed, by a particular dynamical mechanism. Specifically, this mechanism reproduces the counterintuitive input dependence of dominance durations (‘Levelt’s propositions’), the stereotypical shape of dominance distributions (‘scaling property’), and the positive sequential correlation of dominance periods. The explanatory power of the proposed mechanism is considerably higher than that of previous models. Indeed, the observations explained exhibited more effective degrees of freedom (approximately 14) than the mechanism itself (between 3 and 4).
The proposed mechanism is biophysically plausible in terms of the outofequilibrium dynamics of a modular and hierarchical network of spiking neurons (see also further below). Individual modules idealize the input dependence of attractor transitions in assemblies of spiking neurons. All synaptic effects superimpose linearly, consistent with extended meanfield theory for neuronal networks (Amit and Brunel, 1997; Van Vreeswijk and Sompolinski, 1996). The interaction between ‘rivaling’ sets of modules (‘pools’) results in divisive normalization, which is consistent with many cortical models (Carandini and Heeger, 2011; Miller, 2016).
It has long been suspected that multistable phenomena in visual, auditory, and tactile perception may share a similar mechanistic origin. As the features of BR explained here are in fact universal features of multistable phenomena in different modalities, we hypothesize that similar outofequilibrium dynamics of modular networks may underlie all multistable phenomena in all sensory modalities. In other words, we hypothesize that this may be a general mechanism operating in many perceptual representations.
Dynamical mechanism
Two principal alternatives have been considered for the dynamical mechanism of perceptual decisionmaking: driftdiffusion models (Luce, 1986; Ratcliff and Smith, 2004) and recurrent network models (Wang, 2008; Wang, 2012). The mechanism proposed here combines both alternatives: at its evidence level, sensory information is integrated, over both space and time, by ‘local attractors’ in a discrete version of a driftdiffusion process. At its decision level, the population dynamics of a recurrent network implements a winnertakeall competition between ‘nonlocal attractors.’ Together, the two levels form a ‘nested attractor’ system (Braun and Mattia, 2010) operating perpetually out of equilibrium.
A recurrent network with strong competition typically ‘normalizes’ individual responses relative to the total response (Miller, 2016). Divisive normalization is considered a canonical cortical computation (Carandini and Heeger, 2011), for which multiple rationales can be found. Here, divisive normalization is augmented by indiscriminate feedforward inhibition. This combination ensures that decision activity rapidly and reliably categorizes differential input strength, largely independently of total input strength.
Another key feature of the proposed mechanism is that a ‘dominant’ decision pool applies feedback suppression to the associated evidence pool. Selective suppression of evidence for a winning hypothesis features in computational theories of ‘hierarchical inference’ (Rao and Ballard, 1999; Lee and Mumford, 2003; Parr and Friston, 2017b; Pezzulo et al., 2018), as well as in accounts of multistable perception inspired by such theories (Dayan, 1998; Hohwy et al., 2008; Weilnhammer et al., 2017). A normative reason for feedback suppression arises during continuous inference in uncertain and volatile environments, where the accumulation of sensory information is ongoing and cannot be restricted to appropriate intervals (VelizCuba et al., 2016). Here, optimal change detection requires an exponentially rising bias against evidence for the most likely state, ensuring that even weak changes are detected, albeit with some delay.
The pivotal feature of the proposed mechanism are pools of bistable variables or ‘local attractors.’ Encoding sensory inputs in terms of persistent ‘activations’ of local attractors assemblies (rather than in terms of transient neuronal spikes) creates an intrinsically retentive representation: sites that respond are also sites that retain information (for a limited time). Our results are consistent with a few tens of bistable variables in each pool. In the proposed mechanism, differential activity of two pools accumulates evidence against the dominant appearance until a threshold is reached and a reversal ensues (see also Barniv and Nelken, 2015; Nguyen et al., 2020). Conceivably, this discrete nonequilibrium dynamics might instantiate a variational principle of inference such as ‘maximum caliber’ (Pressé et al., 2013; Dixit et al., 2018).
Emergent features
The components of the proposed mechanism interact to guarantee the statistical features that characterize BR and other multistable phenomena. Discretely stochastic accumulation of differential evidence against the dominant appearance ensures sensitivity of dominance durations to nondominant input. It also ensures the invariance of relative variability (‘scaling property’) and gammalike distribution shape of dominance durations. Due to a nontrivial interaction with the competitive decision, discretely stochastic fluctuations of evidencelevel activity express themselves in a serial dependency of dominance durations. Several features of this dependency were unexpected and not reported previously, for example, the sensitivity to image contrast and the ‘burstiness’ of dominance reversals (i.e., extended episodes in which dominance periods are consistently longer or shorter than average). The fact that these predictions are confirmed by our experimental observations provides further support for the proposed mechanism.
Relation to previous models
How does the proposed mechanism compare to previous ‘dynamical’ models of multistable phenomena? It is of similar complexity as previous minimal models (Laing and Chow, 2002; Wilson, 2007; MorenoBote et al., 2010) in that it assumes four state variables at two dynamical levels, one slow (accumulation) and one fast (winnertakeall competition). It differs in reversing their ordering: visual input impinges first on the slow level, which then drives the fast level. It also differs in that stochasticity dominates the slow dynamics (as suggested by van Ee, 2009), not the fast dynamics. However, the most fundamental difference is discreteness (pools of bistable variables), which shapes all key dynamical properties.
Unlike many previous models (e.g., Laing and Chow, 2002; Wilson, 2007; MorenoBote et al., 2007; MorenoBote et al., 2010; Cohen et al., 2019), the proposed mechanism does not include adaptation (stimulationdriven weakening of evidence), but a phenomenologically similar feedback suppression (perceptiondriven weakening of evidence). Evidence from perceptual aftereffects supports the existence of both stimulation and perceptiondriven adaptation, albeit at different levels of representation. Aftereffects in the perception of simple visual features – such as orientation, spatial frequency, or direction of motion (Blake and Fox, 1974; Lehmkuhle and Fox, 1975; Wade and Wenderoth, 1978) – are driven by stimulation rather than by perceived dominance, whereas aftereffects in complex features – such as spiral motion, subjective contours, rotation in depth (Wiesenfelder and Blake, 1990; Van der Zwan and Wenderoth, 1994; Pastukhov et al., 2014a) – typically depend on perceived dominance. Several experimental observations related to BR have been attributed to stimulationdriven adaptation (e.g., negative priming, flash suppression, generalized flash suppression; Tsuchiya et al., 2006). The extent to which a perceptiondriven adaptation could also explain these observations remains an open question for future work.
Multistable perception induces a positive priming or ‘sensory memory’ (Pearson and Clifford, 2005; Pastukhov and Braun, 2008; Pastukhov et al., 2013a), which can stabilize a dominant appearance during intermittent presentation (Leopold et al., 2003; Maier et al., 2003; Sandberg et al., 2014). This positive priming exhibits rather different characteristics (e.g., shape, size and motionspecificity, inducement period, persistence period) than the negative priming/adaptation of rivaling representations (de Jong et al., 2012; Pastukhov et al., 2013a; Pastukhov and Braun, 2013b; Pastukhov et al., 2014a; Pastukhov et al., 2014b; Pastukhov, 2016). To our mind, this evidence suggest that sensory memory is mediated by additional levels of representation and not by selfstabilization of rivaling representations, as has been suggested (Noest et al., 2007; Leptourgos, 2020). To incorporate sensory memory, the present model would have to be extended to include three hierarchical levels (evidence, decision, and memory), as previously proposed by Gigante et al., 2009.
BR arises within local regions of the visual field, measuring approximately ${0.25}^{\circ}$ to ${0.5}^{\circ}$ in the fovea (Leopold, 1997; Logothetis, 1998). No rivalry ensues when the stimulated locations in the left and right eye are more distant from each other. The computational model presented here encompasses only one such local region, and therefore cannot reproduce spatially extended phenomena such as piecemeal rivalry (Blake et al., 1992) or traveling waves (Wilson et al., 2001). To account for these phenomena, the visual field would have to be tiled with replicant models linked by grouping interactions (Knapen et al., 2007; Bressloff and Webber, 2012).
A particularly intriguing previous model (Wilson, 2003) postulated a hierarchy with competing and adapting representations in eight state variables at two separate levels, one lower (monocular) and another higher (binocular) level. This ‘stacked’ architecture could explain the fascinating experimental observation that one image can continue to dominate (dominance durations $\sim 2\phantom{\rule{thinmathspace}{0ex}}\mathit{s}$) even when images are rapidly swapped between eyes (period $1/3\text{}\mathit{s}$) (Kovács et al., 1996; Logothetis et al., 1996). We expect that our hierarchical model could also account for this phenomenon if it were to be replicated at two successive levels. It is tempting to speculate that such ‘stacking’ might have a normative justification in that it might subserve hierarchical inference (Yuille and Kersten, 2006; Hohwy et al., 2008; Friston, 2010).
Another previous model (Li et al., 2017) used a hierarchy with 24 state variables at three separate levels to show that a stabilizing influence of selective visual attention could also explain slow rivalry when images are swapped rapidly. Additionally, this rather complex model reproduced the main features of Levelt’s propositions, but did not consider scaling property and sequential dependency. The model shared some of the key features of the present model (divisive inhibition, differential excitationinhibition), but added a multiplicative attentional modulation. As the present model already incorporates the ‘biased competition’ that is widely thought to underlie selective attention (Sabine and Ungerleider, 2000; Reynolds and Heeger, 2009), we expect that it could reproduce attentional effects by means of additive modulations.
Continuous inference
The notion that multistable phenomena such as BR reflect active exploration of explanatory hypotheses for sensory evidence has a venerable history (von Helmholtz, 1867; Barlow et al., 1972; Gregory, 1980; Leopold and Logothetis, 1999). The mechanism proposed here is in keeping with that notion: higherlevel ‘explanations’ compete for control (‘dominance’) of phenomenal appearance in terms of their correspondence to lowerlevel ‘evidence.’ An ‘explanation’ takes control if its correspondence is sufficiently superior to that of rival ‘explanations.’ The greater the superiority, the longer control is retained. Eventually, alternative ‘explanations’ seize control, if only briefly. This manner of operation is also consistent with computational theories of ‘analysis by synthesis’ or ‘hierarchical inference,’ although there are many differences in detail (Rao and Ballard, 1999; Parr and Friston, 2017b; Pezzulo et al., 2018).
Interacting with an uncertain and volatile world necessitates continuous and concurrent evaluation of sensory evidence and selection of motor action (Cisek and Kalaska, 2010; Gold and Stocker, 2017). Multistable phenomena exemplify continuous decisionmaking without external prompting (Braun and Mattia, 2010). Sensory decisionmaking has been studied extensively, mostly in episodic choicetask, and the neural circuits and activity dynamics underlying episodic decisionmaking – including representations of potential choices, sensory evidence, and behavioral goals – have been traced in detail (Cisek and Kalaska, 2010; Gold and Shadlen, 2007; Wang, 2012; Krug, 2020). Interestingly, there seems to be substantial overlap between choice representations in decisionmaking and in multistable situations (Braun and Mattia, 2010).
Continuous inference has been studied extensively in auditory streaming paradigms (Winkler et al., 2012; Denham et al., 2014). The auditory system seems to continually update expectations for sound patterns on the basis of recent experience. Compatible patterns are grouped together in auditory awareness, and incompatible patterns result in spontaneous reversals between alternatives. Many aspects of this rich phenomenology are reproduced by computational models driven by some kind of ‘prediction error’ (Mill et al., 2013). The dynamics of two recent auditory models (Barniv and Nelken, 2015; Nguyen et al., 2020) are rather similar to the model presented here: while one sound pattern dominates awareness, evidence against this pattern is accumulated at a subliminal level.
Relation to neural substrate
What might be the neural basis of the bistable variables/‘local attractors’ proposed here? Ongoing activity in sensory cortex appears to be lowdimensional, in the sense that the activity of neurons with similar response properties varies concomitantly (‘shared variability,’ ‘noise correlations,’ PonceAlvarez et al., 2012, Mazzucato et al., 2015, Engel et al., 2016, Rich and Wallis, 2016, Mazzucato et al., 2019). This shared variability reflects the spatial clustering of intracortical connectivity (Muir and Douglas, 2011; Okun et al., 2015; Cossell et al., 2015; Lee et al., 2016; Rosenbaum et al., 2017) and unfolds over moderately slow time scales (in the range of $100\text{}\mathit{m}\mathit{s}$ to $500\text{}\mathit{m}\mathit{s})$ both in primates and rodents (PonceAlvarez et al., 2012; Mazzucato et al., 2015; Cui et al., 2016; Engel et al., 2016; Rich and Wallis, 2016; Mazzucato et al., 2019).
Possible dynamical origins of shared and moderately slow variability have been studied extensively in theory and simulation (for reviews, see Miller, 2016; Huang and Doiron, 2017; La Camera et al., 2019). Networks with weakly clustered connectivity (e.g., 3% rewiring) can express a metastable attractor dynamics with moderately long time scales (LitwinKumar and Doiron, 2012; Doiron and LitwinKumar, 2014; Schaub et al., 2015; Rosenbaum et al., 2017). In a metastable dynamics, individual (connectivitydefined) clusters transition spontaneously between distinct and quasistationary activity levels (‘attractor states’) (Tsuda, 2001; Stern et al., 2014).
Evidence for metastable attractor dynamics in cortical activity is accumulating steadily (Mattia et al., 2013; Mazzucato et al., 2015; Rich and Wallis, 2016; Engel et al., 2016; Marcos et al., 2019; Mazzucato et al., 2019). Distinct activity states with exponentially distributed durations have been reported in sensory cortex (Mazzucato et al., 2015; Engel et al., 2016), consistent with noisedriven escape transitions (Doiron and LitwinKumar, 2014; Huang and Doiron, 2017). And several reports are consistent with external input modulating cortical activity mostly indirectly, via the rate of state transitions (Fiser et al., 2004; Churchland et al., 2010; Mazzucato et al., 2015; Engel et al., 2016; Mazzucato et al., 2019).
The proposed mechanism assumes bistable variables with noisedriven escape transitions, with transition rates modulated exponentially by external synaptic drive. Following previous work (Cao et al., 2016), we show this to be an accurate reduction of the population dynamics of metastable networks of spiking neurons.
Unfortunately, the spatial structure of the ‘shared variability’ or ‘noise correlations’ in cortical activity described above is poorly understood. However, we estimate that the cortical representation of our rivaling display involves approximately $400\text{}{\mathit{m}\mathit{m}}^{2}$ and $200\text{}{\mathit{m}\mathit{m}}^{2}$ of cortical surface in cortical areas V1 and V4, respectively (Winawer and Witthoft, 2015; Winawer and Benson, 2021). Accordingly, in each of these two cortical areas, the neural representation of rivaling stimulation can comfortably accommodate several thousand recurrent local assemblies, each capable of expressing independent collective dynamics (i.e., ‘classic columns’ comprising several ‘minicolumns’ with distinct stimulus selectivity Nieuwenhuys R, 1994, Kaas, 2012). Thus, our model assumes that the representation of two rivaling images engages approximately 1–2% of the available number of recurrent local assemblies.
Neurophysiological correlates of BR
Neurophysiological correlates of BR have been studied extensively, often by comparing reversals of phenomenal appearance during binocular stimulation with physical alternation (PA) of monocular stimulation (e.g., Leopold and Logothetis, 1996; Scheinberg and Logothetis, 1997; Logothetis, 1998; Wilke et al., 2006; Aura et al., 2008; Keliris et al., 2010; Panagiotaropoulos et al., 2012; Bahmani et al., 2014; Xu et al., 2016; Kapoor et al., 2020; Dwarakanath et al., 2020). At higher cortical levels, such as inferior temporal cortex (Scheinberg and Logothetis, 1997) or prefrontal cortex (Panagiotaropoulos et al., 2012; Kapoor et al., 2020; Dwarakanath et al., 2020), BR and PA elicit broadly comparable neurophysiological responses that mirror perceptual appearance. Specifically, activity crosses its average level at the time of each reversal, roughly in phase with perceptual appearance (Scheinberg and Logothetis, 1997; Kapoor et al., 2020). In primary visual cortex (area V1), where many neurons are dominated by input from one eye, neurophysiological correlates of BR and PA diverge in an interesting way: whereas modulation of spiking activity is weaker during BR than PA (Leopold and Logothetis, 1996; Logothetis, 1998; Wilke et al., 2006; Aura et al., 2008; Keliris et al., 2010), measures thought to record dendritic inputs are modulated comparably under both conditions (Aura et al., 2008; Keliris et al., 2010; Bahmani et al., 2014; Yang et al., 2015; Xu et al., 2016). A stronger divergence is observed at an intermediate cortical level (visual area V4), where neurons respond to both eyes. Whereas some units modulate their spiking activity comparably during BR and PA (i.e., increased activity when preferred stimulus becomes dominant), other units exhibit the opposite modulation during BR (i.e., reduced activity when preferred stimulus gains dominance) (Leopold and Logothetis, 1996; Logothetis, 1998; Wilke et al., 2006). Importantly, at this intermediate cortical level, activity crosses its average level well before and after each reversal (Leopold and Logothetis, 1996; Logothetis, 1998), roughly in quarter phase with perceptual appearance.
Some of these neurophysiological observations are directly interpretable in terms of the model proposed here. Specifically, activity modulation at higher cortical levels (inferotemporal cortex, prefrontal cortex) could correspond to ‘decision activity,’ predicted to vary in phase with perceptual appearance. Similarly, activity modulation at intermediate cortical levels (area V4) could correspond to ‘evidence activity,’ which is predicted to vary in quarter phase with perceptual appearance. This identification would also be consistent with the neurophysiological evidence for attractor dynamics in columns of area V4 (Engel et al., 2016). The subpopulation of area V4 with opposite modulation could mediate feedback suppression from decision levels. If so, our model would predict this subpopulation to vary in counterphase with perceptual appearance. Finally, the fascinating interactions observed within primary visual cortex (area V1) are well beyond the scope of our simple model. Presumably, a ‘stacked’ model with two successive levels of competitive interactions at monocular and binocular levels or representation (Wilson, 2003; Li et al., 2017) would be required to account for these phenomena.
Conclusion
As multistable phenomena and their characteristics are ubiquitous in visual, auditory, and tactile perception, the mechanism we propose may form a general part of sensory processing. It bridges neural, perceptual, and normative levels of description and potentially offers a ‘comprehensive taskperforming model’ (Kriegeskorte and Douglas, 2018) for sensory decisionmaking.
Materials and methods
Psychophysics
Request a detailed protocolSix practiced observers participated in the experiment (four males, two females). Informed consent, and consent to publish, was obtained from all observers, and ethical approval Z22/16 was obtained from the Ethics Commission of the Faculty of Medicine of the OttovonGuericke University, Magdeburg. Stimuli were displayed on an LCD screen (EIZO ColorEdge CG303W, resolution $2560\times 1600$ pixels, viewing distance was 104 cm, single pixel subtended $0.014}^{\circ$, refresh rate 60 Hz) and were viewed through a mirror stereoscope, with viewing position being stabilized by chin and head rests. Display luminance was gammacorrected and average luminance was $50\text{}\mathit{c}\mathit{d}/{m}^{2}$.
Two grayscale circular orthogonally oriented gratings ($+{45}^{\circ}$ and ${45}^{\circ}$) were presented foveally to each eye. Gratings had diameter of $1.6}^{\circ$, spatial period $2\text{}\mathit{c}\mathit{y}\mathit{c}/\mathit{d}\mathit{e}\mathit{g}$. To avoid a sharp outer edge, grating contrast was modulated with Gaussian envelope (inner radius $0.6}^{\circ$, $\sigma ={0.2}^{\circ}$). Tilt and phase of gratings was randomized for each block. Five contrast levels were used: 6.25, 12.5, 25, 50, and 100%. Contrast of each grating was systematically manipulated, so that each contrast pair was presented in two blocks (50 blocks in total). Blocks were $120\phantom{\rule{thinmathspace}{0ex}}\mathit{s}$ long and separated by a compulsory 1 min break. Observers reported on the tilt of the visible grating by continuously pressing one of two arrow keys. They were instructed to press only during exclusive visibility of one of the gratings, so that mixed percepts were indicated by neither key being pressed (25% of total presentation time). To facilitate binocular fusion, gratings were surrounded by a dichoptically presented square frame (outer size 9.8°, inner size 2.8°).
Dominance periods of ‘clear visibility’ were extracted in sequence from the final $90\phantom{\rule{thinmathspace}{0ex}}\mathit{s}$ of each block and the mean linear trend was subtracted from all values. Values from the initial $30\phantom{\rule{thinmathspace}{0ex}}\mathit{s}$ were discarded. To make comparable the dominance periods of different observers, values were rescaled by the ratio of the allconditionallobserver average ($2.5\phantom{\rule{thinmathspace}{0ex}}\mathit{s}$) and the allcondition average of each observer ($2.5\pm 1.3\phantom{\rule{thinmathspace}{0ex}}\mathit{s}$). Finally, dominance periods from symmetric conditions $({c}_{\mathit{l}\mathit{e}\mathit{f}\mathit{t}},{c}_{\mathit{r}\mathit{i}\mathit{g}\mathit{h}\mathit{t}})$ with $c}_{\mathit{l}\mathit{e}\mathit{f}\mathit{t}}={c}_{\mathit{r}\mathit{i}\mathit{g}\mathit{h}\mathit{t}$ were combined into a single category $({c}_{\mathit{d}\mathit{o}\mathit{m}},{c}_{\mathit{s}\mathit{u}\mathit{p}})$, where $c}_{\mathit{d}\mathit{o}\mathit{m}$ ($c}_{\mathit{s}\mathit{u}\mathit{p}$) was the contrast viewed by the dominant (suppressed) eye. The number of observed dominance periods ranged from 900 to 1700 per contrast combination ($1300\pm 240$).
For the dominance periods $T$ observed in each condition, first, second, and third central moments were computed, as well as coefficient of variation $c}_{V$ and skewness $\gamma}_{1$ relative to coefficient of variation:
The expected standard error of the mean for distribution moments is 2% for the mean, 3% for the coefficient of variation, and 12% for skewness relative to coefficient of variation, assuming 1000 gammadistributed samples.
Coefficients of sequential correlations were computed from pairs of periods $({T}_{i},{T}_{j})$ with opposite dominance (first and next: ‘lag’ $ji=1$), pairs of periods with same dominance (first and next but one: ‘lag’ $ji=2$), and so on,
where $\u27e8T\u27e9$ and $\u27e8{T}^{2}\u27e9$ are mean duration and mean square duration, respectively. The expected standard deviation of the coefficient of correlation is 0.03, assuming 1000 gammadistributed samples.
To analyze ‘burstiness,’ we adapted a statistical measure used in neurophysiology (Compte et al., 2003). First, sequences of dominance periods were divided into all possible subsets of $k\in \{2,3,\dots ,16\}$ successive periods and mean durations computed for each subset. Second, heterogeneity was assessed by computing, for each size $k$, the coefficient of variation c_{V} over mean durations, compared to the mean and variance of the corresponding coefficient of variation for randomly shuffled sequences of dominance periods. Specifically, a ‘burstiness index’ was defined for each subset size $k$ as.
where $c}_{V$ is the coefficient of variation over subsets of size $k$ and where $\u27e8{c}_{V}{\u27e9}_{\mathit{s}\mathit{h}\mathit{u}\mathit{f}\mathit{f}\mathit{l}\mathit{e}}$ and $\u27e8{c}_{V}^{2}{\u27e9}_{\mathit{s}\mathit{h}\mathit{u}\mathit{f}\mathit{f}\mathit{l}\mathit{e}}$ are, respectively, mean and mean square of the coefficients of variation from shuffled sequences.
Model
Request a detailed protocolThe proposed mechanism for BR dynamics relies on discretely stochastic processes (‘birthdeath’ or generalized Ehrenfest processes). Bistable variables $x\in \{0,1\}$ transition between active and inactive states with timevarying Poisson rates ${\nu}^{+}(t)$ (activation) and ${\nu}^{}(t)$ (inactivation). Two ‘evidence pools’ of $N$ such variables, $E$ and $E}^{\mathrm{\prime}$, represent two kinds visual evidence (e.g., for two visual orientations), whereas two ‘decision pools,’ $R$ and $R}^{\mathrm{\prime}$, represent alternative perceptual hypotheses (e.g., two grating patterns) (see also Appendix 1—figure 1). Thus, instantaneous dynamical state is represented by four active counts ${n}_{e},{n}_{{e}^{\mathrm{\prime}}},{n}_{r},{n}_{{r}^{\mathrm{\prime}}}\in [0,N]$ or, equivalently, by four active fractions $e,{e}^{\mathrm{\prime}},r,{r}^{\mathrm{\prime}}\in [0,1]$.
The development of pool activity over time is described by a master equation for probability ${P}_{n}(t)$ of the number $n(t)\in [0,N]$ active variables.
For constant $\nu}^{\pm$, the distribution ${P}_{n}(t)$ is binomial at all times Karlin and McGregor, 1965, van Kampen, 1981. The time development of the number of active units ${n}_{X}(t)$ in pool $X$ is an inhomogeneous Ehrenfest process and corresponds to the count of activations, minus the count of deactivations,
where $\mathcal{B}\left(n,\nu \mathrm{\Delta}t\right)$ is a discrete random variable drawn from a binomial distribution with trial number $n$ and success probability $\nu \mathrm{\Delta}t$.
All variables of a pool have identical transition rates, which depend exponentially on the ‘potential difference’ $\mathrm{\Delta}u=u+{u}^{0}$ between states, with a inputdependent component $u$ and a baseline component $u}^{0$:
where $\nu}_{e$ and $\nu}_{r$ are baseline rates and $u}_{e}^{0$ and $u}_{r}^{0$ baseline components. The inputdependent components of effective potentials are modulated linearly by synaptic couplings
Visual inputs are $I=f(c)$ and ${I}^{\mathrm{\prime}}=f({c}^{\mathrm{\prime}})$, respectively, where
is a monotonically increasing, logarithmic function of image contrast, with parameter γ.
Degrees of freedom
Request a detailed protocolThe proposed mechanism has 11 independent parameters – 6 synaptic couplings, 2 baseline rates, 2 baseline potentials, 1 contrast nonlinearity – which were fitted to experimental observations. A 12th parameter – pool size – remained fixed.
Symbol  Description  Value 

N  Pool size  25 
1/v_{e}  Baseline rate, evidence  1.95 ± 0.10 s 
1/v_{r}  Baseline rate, decision  0.018 ± 0.010 s 
$u}_{e}^{0$  Baseline potential, evidence  1.65 ± 0.24 
$u}_{r}^{0$  Baseline potential, decision  4.94 ± 0.67 
w_{vis}  Visual input coupling  1.780 ± 0.092 
w_{exc}  Feedforward excitation  152.2 ± 3.7 
w_{inh}  Feedforward inhibition  32.10 ± 2.3 
w_{comp}  Lateral competition  33.4 ± 1.2 
w_{coop}  Lateral cooperation  15.21± 0.59 
w_{supp}  Feedback suppression  2.34 ± 0.14 
γ  Contrast nonlinearity  0.071 ± 0.011 
Fitting procedure
Request a detailed protocolThe experimental dataset consisted of two 5 × 5 arrays $X}_{i}^{\mathit{e}\mathit{x}\mathit{p}$ for mean $\u27e8T\u27e9$ and coefficient of variation $c}_{V$, plus two scalar values for skewness ${\gamma}_{1}=2$ and correlation coefficient $c{c}_{1}=0.06$. The two scalar values corresponded to the (rounded) average values observed over the 5 × 5 combinations of image contrast. In other words, the fitting procedure prescribed contrast dependencies for the first two distribution moments, but not for correlation coefficients.
The fit error $E}_{\mathit{f}\mathit{i}\mathit{t}$ was computed as a weighted sum of relative errors
with weighting $w=[1,1,1,\phantom{\rule{1.0pt}{0ex}}{\displaystyle {\displaystyle 1}}\phantom{\rule{1.0pt}{0ex}}/\phantom{\rule{1.5pt}{0ex}}{\displaystyle {\displaystyle 4}}]$ emphasizing distribution moments.
Approximately 400 minimization runs were performed, starting from random initial configurations of model parameters. For the optimal parameter set, the resulting fit error for the mean observer dataset was approximately 13%. More specifically, the fit errors for mean dominance $\u27e8T\u27e9$, coefficient of variation $c}_{V$, relative skewness $\gamma}_{1}/{c}_{V$, and correlation coefficients $c{c}_{1}$ and $c{c}_{2}$ were 9.8, 7.9, 8.7, 70, and 46%, respectively. Here, fit errors for relative skewness and correlation coefficients were computed for the isocontrast conditions, where experimental observations were least noisy.
To confirm that resulting fit was indeed optimal and could not be further improved, we studied the behavior of the fit error in the vicinity of the optimal parameter set. For each parameter $\alpha}_{i$, 30 values $\alpha}_{i}^{(j)$ were picked in the direct vicinity of the optimal parameter $\alpha}_{i}^{\mathit{o}\mathit{p}\mathit{t}$ (Appendix 1—figure 9). The resulting scatter plot of value pairs $\alpha}_{i}^{(j)$ and fit error $E}_{\mathit{f}\mathit{i}\mathit{t}}^{(j)$ was approximated by a quadratic function, which provided 95% confidence intervals for $\alpha}_{i}^{(j)$. For all parameters except $\nu}_{r$, the estimated quadratic function was convex and the coefficient of the Hessian matrix associated with the fit error was positive. Additionally, the estimated extremum of each parabola was close to the corresponding optimal parameter, confirming that the parameter set was indeed optimal (Appendix 1—figure 9).
To minimize fit error, we repeated a stochastic gradient descent from randomly chosen initial parameter. Interestingly, the ensemble of suboptimal solutions found by this procedure populated a lowdimensional manifold of the parameter space in three principal components accounted for 95% of the positional variance. Thus, models that reproduce experimental observations with varying degrees of freedom exhibit only 3–4 effective degrees of freedom. We surmise that this is due, on the one hand, to the severe constraints imposed by our model architecture (e.g., discrete elements, exponential input dependence of transition rates) and, on the other hand, by the requirement that the dynamical operating regime behaves as a relaxation oscillator.
In support of this interpretation, we note that our 5 × 5 experimental measurements of $\u27e8T\u27e9$ and $c}_{V$ were accurately described by ‘quadric surfaces’ ($z={a}_{1}+{a}_{2}x+{a}_{3}y+{a}_{4}{x}^{2}+{a}_{5}xy+{a}_{6}{y}^{2}$) with six coefficients each. Together with the two further measurements of $\gamma}_{1}/{c}_{V$ and $c{c}_{1}$, our experimental observations accordingly exhibited approximately $6\times 2+2=14$ effective degrees of freedom. This number was sufficient to constrain the 3–4 dimensional manifold of parameters, where the model operated as a relaxation oscillator with a particular dynamics, specifically, a slowfast dynamics associated, respectively, with the accumulation and reversal phases of BR.
Alternative model
Request a detailed protocolAs an alternative model (Laing and Chow, 2002), a combination of competition, adaptation, and imagecontrastdependent noise was fitted to reproduce four 5 × 5 arrays $X}_{i}^{\mathit{e}\mathit{x}\mathit{p}$ for mean $\u27e8T\u27e9$, coefficient of variation $c}_{V$, skewness $\gamma}_{1$, and correlation coefficient $c{c}_{1}$. Fit error $E}_{\mathit{f}\mathit{i}\mathit{t}$ was computed as the average of relative errors
For purposes of comparison, a weighted fit error with weighting $w=[1,1,1,\phantom{\rule{1.0pt}{0ex}}{\displaystyle {\displaystyle 1}}\phantom{\rule{1.0pt}{0ex}}/\phantom{\rule{1.5pt}{0ex}}{\displaystyle {\displaystyle 4}}]$ was computed, as well.
The model comprised four state variables and independent colored noise:
where $F(x)=[1+\mathrm{exp}(x/\kappa ){]}^{1}$ is a nonlinear activation function and $\xi (t)$ is white noise.
Additionally, both input $I}_{1,2$ and noise amplitude $\sigma}_{1,2$ were assumed to depend nonlinearly on image contrast $c}_{1,2$:
This coupling between input and noise amplitude served stabilizes the shape of dominance distributions over different image contrasts (‘scaling property’).
Parameters for competition β = 10, activity time constant $\tau}_{r}=50\text{}\mathit{m}\mathit{s$, noise time constant $\tau}_{n}=500\text{}\mathit{m}\mathit{s$, and activation function $k=0.1$ were fixed. Parameters for adaptation strength ${\varphi}_{a}\in [1,100]$, adaptation time constant ${\tau}_{a}\in [1,00]$, contrast dependence of input ${b}_{I}\in [1,5]$, ${k}_{I}\in [0.1,5]$, and contrast dependence of noise amplitude ${b}_{\sigma}\in [0.1,1]$, ${k}_{\sigma}\in [0.1,1]$ were explored within the ranges indicated.
The best fit (determined with a genetic algorithm) was as follows: ${\varphi}_{a}=18.39$, ${\tau}_{a}=22.78$, ${k}_{I}=1.52$, ${b}_{I}=2.92$, ${k}_{\sigma}=0.57$, ${b}_{\sigma}=0.19$. The fit errors for mean dominance $\u27e8T\u27e9$, coefficient of variation $c}_{V$, skewness $\gamma}_{1$, and correlation coefficient $c{c}_{1}$ were, respectively, 11.3, 8.3, 20, and 55%. The fit error for correlation coefficient $c{c}_{2}$ was 180% (because the model predicted negative values). The combined average for $\u27e8T\u27e9$, $c}_{V$, and $\gamma}_{1$ was 13.2%. The fit error obtained with weighting $w=(1,1,1,\phantom{\rule{1.0pt}{0ex}}{\displaystyle {\displaystyle 1}}\phantom{\rule{1.0pt}{0ex}}/\phantom{\rule{1.5pt}{0ex}}{\displaystyle {\displaystyle 4}})$ was 16.4%.
For Figure 6d, the alternative model was fitted only to observations at equal image contrast, $c={c}^{\mathrm{\prime}}$: mean dominance $\u27e8T\u27e9$, coefficient of variation $c}_{V$, skewness $\gamma}_{1$, and correlation coefficient $c{c}_{1}$. The combined average fit error for $\u27e8T\u27e9$, $c}_{V$, and $\gamma}_{1$ was 11.2%. The combined average for all four observables was 22%.
Spiking network simulation
Request a detailed protocolTo illustrate a possible neural realization of ‘local attractors,’ we simulated a competitive network with eight identical assemblies of excitatory and inhibitory neurons, which collectively expresses a spontaneous and metastable dynamics (Mattia et al., 2013). One assembly (denoted as ‘foreground’) comprised 150 excitatory leakyintegrateandfire neurons, which were weakly coupled to the 1050 excitatory neurons of the other assemblies (denoted as ‘background’), as well as 300 inhibitory neurons. Note that background assemblies are not strictly necessary and are included only for the sake of verisimilitude. The connection probability between any two neurons was $c=2/3$. Excitatory synaptic efficacy between neurons in the same assembly and in two different assemblies was $J}_{\mathrm{i}\mathrm{n}\mathrm{t}\mathrm{r}\mathrm{a}}=0.612\mathit{m}\mathit{V$ and $J}_{\mathrm{i}\mathrm{n}\mathrm{t}\mathrm{e}\mathrm{r}}=0.403\mathit{m}\mathit{V$, respectively. Inhibitory synaptic efficacy was $J}_{I}=1.50\mathit{m}\mathit{V$, and the efficacy of excitatory synapses onto inhibitory neurons was $J}_{IE}=0.560\mathit{m}\mathit{V$. Finally, ‘foreground’ neurons, ‘background neurons,’ and ‘inhibitory neurons’ each received independent Poisson spike trains of $2400\mathit{H}\mathit{z}$, $2280\mathit{H}\mathit{z}$ and $2400\mathit{H}\mathit{z}$, respectively. Other settings were as in Mattia et al., 2013. As a result of these settings, ‘foreground’ activity transitioned spontaneously between an ‘off’ state of approximately $4\phantom{\rule{thinmathspace}{0ex}}\mathit{H}\mathit{z}$ and an ‘on’ state of approximately $40\phantom{\rule{thinmathspace}{0ex}}\mathit{H}\mathit{z}$.
Appendix 1
Model schematics
Metastable attractor dynamics
We postulate assemblies or clusters of neurons with recurrent random connectivity as operative units of sensory representations. In our model, such assemblies are reduced to binary variables with Poisson transitions. Our key assumption is that the rates $\nu}^{\pm$ of activation and inactivation events are modulated exponentially by synaptic input (Equation 1):
Here, we show that these assumptions are a plausible reduction of recurrently connected assemblies of spiking neurons.
Following earlier work, we simulated a competitive network with eight identical assemblies of excitatory and inhibitory neurons (Appendix 1—figure 2a), configured to collectively express a metastable activity dynamics (Mattia et al., 2013). Here, we are interested particularly in the activity dynamics of one excitatory assembly (dubbed ‘foreground’), which expresses two quasistable ‘attractor’ states: an ‘on’ state with high activity. In the context of the metastable network, the ‘foreground’ assembly is bistable in that it transitions spontaneously between ‘on’ and ‘off’ states. Such state transitions are noisedriven escape events from an energy well and therefore occur with Poissonlike rates $\nu}^{+$ (activation) and $\nu}^{$ (inactivation). Figure 1b and Appendix 1—figure 2b illustrate this energy landscape for the ‘diffusion limit’ of very large assemblies, where quasistable activity levels are $\nu}_{\mathit{f}\mathit{o}\mathit{r}\mathit{e}}\simeq 45\phantom{\rule{thinmathspace}{0ex}}\mathit{H}\mathit{z$ for the ‘on’ state and $\nu}_{\mathit{f}\mathit{o}\mathit{r}\mathit{e}}\simeq 4\phantom{\rule{thinmathspace}{0ex}}\mathit{H}\mathit{z$ for the ‘off’ state. For small assemblies with fewer neurons, the difference between ‘on’ and ‘off’ states is less pronounced.
To establish the dependence of transition rates on external input to the ‘foreground’ assembly, we stepped external input rate $\mathrm{\Delta}{\nu}_{\mathit{e}\mathit{x}\mathit{t}}$ between two values selected from a range $\mathrm{\Delta}{\nu}_{\mathit{e}\mathit{x}\mathit{t}}\in [120\phantom{\rule{thinmathspace}{0ex}}\mathit{H}\mathit{z},50\phantom{\rule{thinmathspace}{0ex}}\mathit{H}\mathit{z}]$ and monitored the resulting spiking activity in individual neurons, as well as activity $\nu}_{\mathit{f}\mathit{o}\mathit{r}\mathit{e}$ of the entire population (Appendix 1—figure 2c, upper and middle panels). Comparing population activity to a suitable threshold, we identified ‘on’ and ’off’ states of the ‘foreground’ assembly (Appendix 1—figure 2c, lower panel), as well as the probability of ‘on’ or ‘off’ states at different points in time following a step in $\mathrm{\Delta}{\nu}_{\mathit{e}\mathit{x}\mathit{t}}$ (Appendix 1—figure 2d). From the hazard rate (temporal derivative of probability), we then estimated the rates $\nu}^{\pm$ of state transitions shown in Appendix 1—figure 2d. Transition rates $\nu}^{\pm$ vary approximately antisymmetrically and exponentially with external input $\mathrm{\Delta}{\nu}_{\mathit{e}\mathit{x}\mathit{t}}$. In the present example, ${\nu}^{+}\simeq 2.2\text{}\mathit{H}\mathit{z}\mathrm{exp}\left(+0.85\text{}\mathit{s}\phantom{\rule{thinmathspace}{0ex}}\mathrm{\Delta}{\nu}_{\mathit{e}\mathit{x}\mathit{t}}\right)$ and ${\nu}^{}\simeq 0.5\text{}\mathit{H}\mathit{z}\mathrm{exp}\left(0.79\text{}\mathit{s}\phantom{\rule{thinmathspace}{0ex}}\mathrm{\Delta}{\nu}_{\mathit{e}\mathit{x}\mathit{t}}\right)$ (Appendix 1—figure 2e, red and blue lines). This Arrhenius–Van’tHofflike dependence of escape rates is a consequence of the approximately linear dependence of activation energy on external input. Escape kinetics is typical for attractor systems and motivates Equation 1.
Quality of representation
Accumulation of information
A birthdeath process – defined as $N$ bistable variables with transition rates $\nu}^{\pm}=v{e}^{\pm ws$, where $\nu$ is a baseline rate and $w$ a coupling constant – accumulates and retains information about input $s$, performing as a ‘leaky integrator’ with a characteristic time scale [Braun and Mattia, 2010]. Specifically, the value of $s$ may be inferred from fractional activity $x(t)$ at time $t$, if coupling $w$ and baseline rate $\nu$ are known. The inverse variance of the maximum likelihood estimate is given by the Fisher information
Its value grows with time, approaching ${J}_{x}=N{w}^{2}/cos{h}^{2}(ws/2)$ for $t\to \mathrm{\infty}$. For small inputs $s\simeq 0$, the Fisher information increases monotonically as ${J}_{x}(t)\approx \left(N{w}^{2}/4\right)\mathrm{tanh}\left(\nu t/2\right)$. Surprisingly, the upper bound of ${J}_{x}\le N{w}^{2}/4$ depends linearly on pool size $N$, but quadratically on coupling $w$. Thus, stronger coupling substantially improves encoding accuracy (of input $s$).
The rate at which Fisher information is accumulated by a pool is set by the baseline transition rate $\nu$. An initially inactive pool, with n_{0} = 0, accumulates Fisher information at an initial rate of $\mathrm{\partial}}_{t}{{J}_{x}}_{t=0}=\left(\nu N{w}^{2}/4\right){e}^{ws/2$. Thus, any desired rate of gaining Fisher information may be obtained by choosing an appropriate value for $\nu$. However, unavoidably, after an input $s$ has ceased (and was replaced by another), information about $s$ is lost at the same rate.
Integration of noisy samples
Birthdeath processes are able to encode also noisy sensory inputs, capturing much of the information provided. When an initially inactive pool receives an input $s$ over time $t$, stochastic activity $n(t)$ gradually accumulates information about the value of $s$. Normally distributed inputs $s\in N(\mu ,\sigma )$ provide Fisher information $J}_{s}=1/{\sigma}^{2$ about mean µ. Pool activity $n(t)$ accumulates Fisher information ${J}_{n}(t)$ about input mean µ, which may be compared to $J}_{s$. Comparatively small pools with strong coupling (e.g., $N=25$, $w=2.5$) readily capture 90% of the information provided (Appendix 1—figure 5a).
Moreover, pools readily permit information from multiple independent inputs to be combined over space and/or time. For example, the combined activity of four pools ($N=25$, $w=2.5$), which receive concurrently four independent samples, captures approximately 80% of the information provided, and a single pool receiving four samples in succession still retains approximately 60% of the information provided (Appendix 1—figure 5b,c). In the latter case, retention is compromised by the ‘leaky’ nature of stochastic integration. Whether signals are being integrated over space or time, the retained fraction of information is highest for inputs of moderate and larger variance $\sigma}^{2$ (Appendix 1—figure 5b,c). This is because inputs with smaller variance are degraded more severely by the internal noise of a birthdeath process (i.e., stochastic activations and inactivations).
Suitability for inference
Summation of heterogeneous neural responses can be equivalent to Bayesian integration of sensory information [Beck et al., 2008; Pouget et al., 2013]. In general, this is the case when response variability is ‘Poissonlike’ and response tuning differs only multiplicatively [Ma et al., 2006; Ma et al., 2008]. We now show that bistable stochastic variables ${x}_{i}(t)$, with heterogeneous transition rates ${\nu}_{i}^{\pm}(s)$, satisfy these conditions as long as synaptic coupling $w$ is uniform.
Assuming initially inactive variables, ${x}_{i}(0)=0$, incremental responses ${x}_{i}(\mathrm{\Delta}t)$ after a short interval $\mathrm{\Delta}t$ are binomially distributed about mean $\u27e8{x}_{i}(\mathrm{\Delta}t)\u27e9$, which is approximately
where ${\varphi}_{i}={\nu}_{i}\mathrm{\Delta}t/2$ reflects (possibly heterogeneous) response tuning and $f(s)={e}^{ws/2}$ represents a common response function which depends only on synaptic coupling $w$. The Fisher information, about $s$, of individual responses is
as long as expected activation $\u27e8{x}_{i}\u27e9$ is small. The Fisher information of summed responses $\sum _{i}{x}_{i}$ is
and equals the combined Fisher information of individual responses. Accordingly, the summation of bistable activities with heterogeneous transition rates ${\nu}_{i}$ optimally integrates information, provided expected activations remain small, $\u27e8{x}_{i}\u27e9\ll $1, and synaptic coupling $w$ is uniform.
Categorical choice
The ‘biased competition’ circuit proposed here expresses a categorical decision by either raising $r$ towards unity (and lowering $r}^{\mathrm{\prime}$ towards zero) or vice versa. Here, we describe its stochastic steadystate response to constant visual inputs $I=f(c)$ and ${I}^{\mathrm{\prime}}=f({c}^{\mathrm{\prime}})$ and for arbitrary initial conditions of $e$, $e}^{\mathrm{\prime}$, $r$ and $r}^{\mathrm{\prime}$ (Appendix 1—figure 3). Note that, for purposes of this analysis, evidence activity $e$, $e}^{\mathrm{\prime}$ was not subject to feedback suppression.
The choice is random when the input is ambiguous, $I\simeq {I}^{\prime}$, but quickly becomes deterministic with growing input bias $I{I}^{\mathrm{\prime}}\S gt;0$. Importantly, the choice is consistently determined by visual input for all initial conditions. The 75% performance level is reached for biases $I{I}^{\mathrm{\prime}}\approx 0.04\phantom{\rule{thinmathspace}{0ex}}\mathrm{t}\mathrm{o}\phantom{\rule{thinmathspace}{0ex}}0.06$.
Mutual inhibition $w}_{\mathit{c}\mathit{o}\mathit{m}\mathit{p}$ controls the width of the ambiguous region around $I={I}^{\mathrm{\prime}}$, and selfexcitation $w}_{\mathit{c}\mathit{o}\mathit{o}\mathit{p}$ ensures a categorical decision even for small $I,{I}^{\mathrm{\prime}}\simeq 0$. The balance between feedforward excitation $w}_{\mathit{e}\mathit{x}\mathit{c}$ and inhibition $w}_{\mathit{i}\mathit{n}\mathit{h}$ eliminates decision failures for all but the largest values of $I,{I}^{\mathrm{\prime}}\S gt;0.7$ and reduces the degree to which sensitivity to differential input $I{I}^{\prime}$ varies with total input $I+{I}^{\prime}$.
For particularly high values of input $I,{I}^{\mathrm{\prime}}\S gt;0.7$, no categorical decision is reached and activities of both $r$ and $r}^{\mathrm{\prime}$ grow above 0.5. In the full model, such inconclusive outcomes are eliminated by feedback suppression.
Deterministic dynamics
In the deterministic limit of $N\to \mathrm{\infty}$, fractional pool activity $x$ equals its expectation $\u27e8x\u27e9$ and the relaxation dynamics of Equation 2 becomes
with characteristic time ${\tau}_{x}=\frac{1}{{\nu}^{+}+{\nu}_{}}=\mathrm{{\rm Y}}(\mathrm{\Delta}u)$ and asymptotic values ${x}_{\mathrm{\infty}}=\frac{{\nu}^{+}}{{\nu}^{+}+{\nu}^{}}=\mathrm{\Phi}(\mathrm{\Delta}u)$, where $\mathrm{\Delta}u$ is the potential difference. Input dependencies of characteristic time and of asymptotic value follow from Equation 1:
Evidence pools
The relaxation dynamics of evidence pools is given by Equation 2 and Equation 3′. As shown in the next section, reversals occur when evidence difference $e{e}^{\mathrm{\prime}}$ reaches a reversal threshold $\mathrm{\Delta}}_{\mathit{r}\mathit{e}\mathit{v}$. For example, a dominance period of evidence $e}^{\mathrm{\prime}$ begins with ${e}^{\mathrm{\prime}}}_{\mathit{s}\mathit{t}\mathit{a}\mathit{r}\mathit{t}}={e}_{\mathit{s}\mathit{t}\mathit{a}\mathit{r}\mathit{t}}+{\mathrm{\Delta}}_{\mathit{r}\mathit{e}\mathit{v}$ and ends when the concurrent habituation of $e}^{\mathrm{\prime}$ and recovery of $e$ have inverted the situation to $e}_{\mathit{e}\mathit{n}\mathit{d}}={{e}^{\mathrm{\prime}}}_{\mathit{e}\mathit{n}\mathit{d}}+{\mathrm{\Delta}}_{\mathit{r}\mathit{e}\mathit{v}$ (Appendix 1—figures 4). Once the deterministic limit has settled into a limit cycle, all dominance periods start from, and end at, the same evidence levels.
If pool $R}^{\mathrm{\prime}$ has just become dominant, so that ${r}^{\mathrm{\prime}}\simeq 1$ and $r\simeq 0$, the statedependent potential differences are
and the deterministic development is
with asymptotic values
and characteristic times
The starting points of the development, $e}_{\mathit{r}\mathit{e}\mathit{v}$ and ${e}^{\mathrm{\prime}}}_{\mathit{r}\mathit{e}\mathit{v}$ (dashed lines in Appendix 1—figure 4a,b), depend mostly on total input $c+{c}^{\prime}$ and only little on input difference $c{c}^{\prime}$. Accordingly, for a given level of total input $c+{c}^{\mathrm{\prime}}$, the situation is governed by the distance between asymptotic evidence levels $\mathrm{\Delta}}_{\mathrm{\infty}}={{e}^{\prime}}_{\mathrm{\infty}}{e}_{\mathrm{\infty}$ and by characteristic times $\tau}_{e$, $\tau}_{{e}^{\mathrm{\prime}}$.
The dependence on input bias $c{c}^{\prime}$ of effective potential $\mathrm{\Delta}{u}_{e}+{u}_{e}$, characteristic time $\tau}_{e$, and asymptotic value $e}_{\mathrm{\infty}$ is illustrated in Appendix 1—figure 4c–e. The potential range of relaxation is $e}_{\mathit{r}\mathit{e}\mathit{v}}\to {e}_{\mathrm{\infty}$ and ${e}^{\mathrm{\prime}}}_{\mathit{r}\mathit{e}\mathit{v}}\to {{e}^{\prime}}_{\mathrm{\infty}$, where reversal levels $e}_{\mathit{r}\mathit{e}\mathit{v}$ and ${e}^{\mathrm{\prime}}}_{\mathit{r}\mathit{e}\mathit{v}$ can be obtained numerically.
Dominance durations depend more sensitively on the slower of the two concurrent processes as it sets the pace of the combined development. The initial rates $\rho}_{e$ and $\rho}_{{e}^{\mathrm{\prime}}$ after a reversal of the two opponent relaxations
provide a convenient proxy for relative rate. As shown in app. Figure 4f, when strongerinput evidence $e$ dominates, recovery of weakerinput evidence (red up arrow on blue background) is slower than habituation of strongerinput evidence (blue down arrow on blue background). Conversely, when weakerinput evidence $e}^{\mathrm{\prime}$ dominates, recovery of strongerinput evidence (blue up arrow on red background) is slower than habituation of weakerinput evidence (red down arrow on red background). In short, dominance durations always depend more sensitively on the recovery of the currently nondominant evidence than on the habituation of the currently dominant evidence.
If the two evidence populations $E$, $E}^{\mathrm{\prime}$ have equal and opposite potential differences, $\mathrm{\Delta}{u}_{e}=\mathrm{\Delta}{u}_{{e}^{\mathrm{\prime}}}$, then they also have equal and opposite activation and inactivation rates (Equation 1)
and identical characteristic times $\tau}_{e$ (recovery of $E$) and $\tau}_{{e}^{\mathrm{\prime}}$ (habituation of $E}^{\mathrm{\prime}$). In this special case, the two processes may be combined and the development of evidence difference $\mathrm{\Delta}e=e{e}^{\prime}$ is
Starting from $\mathrm{\Delta}e(0)={\mathrm{\Delta}}_{\mathit{r}\mathit{e}\mathit{v}}$, we consider the firstpassagetime of $\mathrm{\Delta}e(t)$ through $+{\mathrm{\Delta}}_{\mathit{r}\mathit{e}\mathit{v}}$. If a crossing is certain (i.e. when $\frac{{\nu}^{+}}{{\nu}^{+}+{\nu}^{}}>{\mathrm{\Delta}}_{\mathit{r}\mathit{e}\mathit{v}}$), the firstpassagetime $T$ writes
A similar hyperbolic dependence obtains also in all other cases. When the distance between asymptotic levels $\mathrm{\Delta}}_{\mathrm{\infty}$ falls below the reversal threshold $\mathrm{\Delta}}_{\mathit{r}\mathit{e}\mathit{v}$, dominance durations become infinite and reversals cease.
The hyperbolic dependence of dominance durations, illustrated in Appendix 1—figure 4d, has an interesting implication. Consider the point of equidominance, at which both dominance durations are equal and of moderate duration. Increasing the difference between image contrasts (e.g., increasing $c\to c+\mathrm{\Delta}c$ and decreasing ${c}^{\prime}\to {c}^{\prime}\mathrm{\Delta}c$) increases $\mathrm{\Delta}}_{\mathrm{\infty}$ during the dominance of $e$ and decreases it during the dominance of $e}^{\mathrm{\prime}$. Due to the hyperbolic dependence, longer dominance periods lengthen more ($T\to T+\mathrm{\Delta}T$) than shorter dominance periods shorten ($T}^{\prime}\to {T}^{\prime}\mathrm{\Delta}{T}^{\prime$), consistent with the contemporary formulation of Levelt III [Brascamp et al., 2015].
Decision pools
We wish to analyze steadystate conditions for decision pools $R$, ${R}^{\prime}$, as illustrated in Appendix 1—figure 4a,b. From Equation 4, we can write
Under certain conditions – in particular, for sufficient selfcoupling $w}_{\mathit{c}\mathit{o}\mathit{o}\mathit{p}$ – the steadystate equations admit more than one solution: a lowactivity fixed point with ${{r}^{\mathrm{\prime}}}_{\mathrm{\infty}}\simeq 0$, and a highactivity fixed point with ${r}_{\mathrm{\infty}}\simeq 1$. Importantly, the lowactivity fixed point can be destabilized when evidence activities change.
Consider a nondominant decision pool $R$ with fractional activity $r={n}_{r}/N\simeq 0$ and its dominant rival pool ${R}^{\prime}$ with fractional activity ${r}^{\mathrm{\prime}}={n}_{{r}^{\mathrm{\prime}}}/N\simeq 1$. The steadystate condition then becomes
For certain values $x}_{\mathit{e}\mathit{f}\mathit{f}}\le {x}_{crit$, the lowactivity fixed point becomes unstable, causing a sudden upward activation of pool $R$ and eventually a perceptual reversal. We call $r}_{\mathit{c}\mathit{r}\mathit{i}\mathit{t}$ the steadystate value of $r}_{\mathrm{\infty}$ at the point of disappearance.
We can now define a threshold $\mathrm{\Delta}}_{\mathit{r}\mathit{e}\mathit{v}$ in terms of the value of evidence bias $\mathrm{\Delta}e=e{e}^{\prime}$ which ensures that $x}_{\mathit{e}\mathit{f}\mathit{f}}\ge {x}_{crit$:
We find that the threshold value $\mathrm{\Delta}}_{\mathit{r}\mathit{e}\mathit{v}$ decreases linearly with average evidence $\overline{e}=(e+{e}^{\mathrm{\prime}})/2$, so that higher evidence activity necessarily entails lower thresholds (dashed red line in Figure 5c).
For ${w}_{\mathit{c}\mathit{o}\mathit{o}\mathit{p}}=15.21$, we find ${x}_{\mathit{c}\mathit{r}\mathit{i}\mathit{t}}=0.24006$, ${r}_{\mathit{c}\mathit{r}\mathit{i}\mathit{t}}=0.0708$, and $\mathrm{\Delta}}_{\mathit{r}\mathit{e}\mathit{v}}=0.45541.1564\phantom{\rule{thinmathspace}{0ex}}\overline{e$.
Potential landscape
In Figure 5b, we illustrate the steadystate condition ${r}_{\mathrm{\infty}}=\varphi \left[{w}_{\mathit{c}\mathit{o}\mathit{o}\mathit{p}}\left({r}_{\mathrm{\infty}}{x}_{\mathit{e}\mathit{f}\mathit{f}}\right)\right]$ in terms of an effective potential landscape $U(x)$. The functional form of this landscape was obtained by integrating ‘restoring force’ $F(x)$ over activity $x$:
Stochastic dynamics
Poissonlike variability
The discretely stochastic process $x(t)\in \{0,\frac{1}{N},\frac{2}{N},...,1\}$ has a continuously stochastic ‘diffusion limit,’ ${x}_{\mathit{d}\mathit{i}\mathit{f}\mathit{f}}(t)$, for $N\to \mathrm{\infty}$, with identical mean $\u27e8{x}_{\mathit{d}\mathit{i}\mathit{f}\mathit{f}}\u27e9=\u27e8x\u27e9$ and variance $\u27e8{{x}_{\mathit{d}\mathit{i}\mathit{f}\mathit{f}}}^{2}\u27e9\u27e8{x}_{\mathit{d}\mathit{i}\mathit{f}\mathit{f}}{\u27e9}^{2}=\u27e8{x}^{2}\u27e9\u27e8x{\u27e9}^{2}$. This diffusion limit is a Cox–Ingersoll process and its dynamical equation
where $\xi (t)$ is white noise, reveals that its increments $N\text{}{\dot{x}}_{\mathit{d}\mathit{i}\mathit{f}\mathit{f}}$ (and thus also the increments of the original discrete process) exhibit Poissonlike variability. Specifically, in the lowactivity regime, ${x}_{\mathit{d}\mathit{i}\mathit{f}\mathit{f}}\ll 1$, both mean and variance of increments approximate activation rate $N\text{}{\nu}^{+}$:
Gammadistributed firstpassage times
When the input to a pool of bistable variables undergoes a step change, the active fraction $x(t)$ transitions stochastically between old and new steady states, $x}_{\mathrm{\infty}}^{\mathit{o}\mathit{l}\mathit{d}$ and $x}_{\mathrm{\infty}}^{\mathit{n}\mathit{e}\mathit{w}$ (set by old and new input values, respectively). The time that elapses until fractional activity crosses an intermediate ‘threshold’ level $\theta$ ($x}_{\mathrm{\infty}}^{\mathit{o}\mathit{l}\mathit{d}}<\theta <{x}_{\mathrm{\infty}}^{\mathit{n}\mathit{e}\mathit{w}$) is termed a ‘firstpassagetime.’ In a lowthreshold regime, birthdeath processes exhibit a particular and highly unusual distribution of firstpassage times.
Specifically, the distribution of firstpassagetimes assumes a characteristic, gammalike shape for a wide range of value triplets ($x}_{\mathrm{\infty}}^{\mathit{o}\mathit{l}\mathit{d}$, $\theta$, $x}_{\mathrm{\infty}}^{\mathit{n}\mathit{e}\mathit{w}$) [Cao et al., 2014]: skewness $\gamma}_{1$ takes a stereotypical value $\gamma}_{1}\simeq 2{c}_{V$, the coefficient of variation $c}_{V$ remains constant (as long as the distance between $x}_{\mathrm{\infty}}^{\mathit{o}\mathit{l}\mathit{d}$ and $\theta$ remains the same), whereas the distribution mean may assume widely different values. This gammalike distribution shape is maintained even when shared input changes during the transition (e.g., when bistable variables are coupled to each other) [Cao et al., 2014].
Importantly, only a birthdeath process (e.g., a pool of bistable variables) guarantees a gammalike distribution of firstpassagetimes under different input conditions [25]. Many other discretely stochastic processes (e.g., Poisson process) and continuously stochastic processes (e.g., Wiener, Ornstein–Uhlenbeck, Cox–Ingersoll) produce inverse Gaussian distributions with $\gamma}_{1}\simeq 3\text{}{c}_{v$. Models combining competition, adaptation, and noise can produce gammalike distributions, but require different parameter values for every input condition (see Materials and methods: Alternative model).
Scaling property
In the present model, firstpassagetimes reflect the concurrent development of two opponent birthdeath processes (pools of $N=25$ binary variables). Dominance periods begin with newly nondominant evidence $e$ well below newly dominant evidence $e}^{\mathrm{\prime}$, $\mathrm{\Delta}e=e{e}^{\prime}\simeq {\mathrm{\Delta}}_{rev}$, and end with the former well above the latter, $\mathrm{\Delta}e\simeq +{\mathrm{\Delta}}_{rev}$ (Appendix 1—figure 6a). The combination of two small pools with $N=25$ approximates a single large pool with $N=25$. When image contrast changes, distribution shape remains nearly the same, with a coefficient of variation ${c}_{V}\simeq 0.6$ and a gammalike skewness ${\gamma}_{1}\simeq 2$, even though mean $\mu$ of firstpassagetimes changes substantially (Appendix 1—figure 6b).
This ‘scaling property’ (preservation of distribution shape) is owed to the Poissonlike variability of birthdeath processes (see above, Appendix 1—figure 6c). Poissonlike variability implies that accumulation rate $\mu$ and dispersion rate $\sigma}^{2$ are proportional, $\mu \sim {\sigma}^{2}$. This proportionality ensures that activity at threshold disperses equally widely for different accumulation rates (i.e., for different input strengths), preserving the shape of firstpassagetime distributions [Cao et al., 2016].
Characteristic times
As mentioned previously, the characteristic times of pools of bistable variables are not fixed but vary with input (Equation 2). In our model, the characteristic times of evidence activities lengthen with increasing input contrast and shorten with feedback suppression (Appendix 1—figure 7a). Characteristic times are reflected also in the temporal autocorrelation, which averages over periods of dominance and nondominance alike. Autocorrelation times lengthen with increasing input contrast, both in absolute terms and relative to the average dominance duration (Appendix 1—figure 7b).
Importantly, the autocorrelation time of mean evidence activity $\overline{e}=(e+{e}^{\prime})/2$ is even longer, particularly for high input contrast (Appendix 1—figure 7c). The reason is that spontaneous fluctuations of $\overline{e}$ are constrained not only by birthdeath dynamics, but additionally by the reversal dynamics that keeps evidence activities $e$ and $e}^{\mathrm{\prime}$ close together (i.e., within reversal threshold $\mathrm{\Delta}}_{\mathit{r}\mathit{e}\mathit{v}$). As a result, the characteristic timescale of spontaneous fluctuations of $\overline{e}$ lengthens with input contrast. The amplitude of such fluctuations also grows with contrast (not shown).
The slow fluctuations of $\overline{e}$ induce mirrorimage fluctuations of reversal threshold $\mathrm{\Delta}}_{\mathit{r}\mathit{e}\mathit{v}$ and thus are responsible for the serial dependency of reversal sequences (see Deterministic dynamics: Decision pools).
Burstiness
The proposed mechanism predicts that reversal sequences include episodes with several successive short (or long) dominance periods. It further predicts that this inhomogeneity increases with image contrast. Such an inhomogeneity may be quantified in terms of a ‘burstiness index’ (BI), which compares the variability of the mean for sets of $n$ successive periods to the expected variability for randomly shuffled reversal sequences. In both model and experimental observations, this index rises far above chance (over broad range of $n$) for high image contrast (Appendix 1—figure 8). The degree of inhomogeneity expressed by the model at high image contrast is comparable to that observed experimentally, even though the model was neither designed nor fitted to reproduce nonstationary aspects of reversal dynamics. This correspondence between model and experimental observation compellingly corroborates the proposed mechanism.
Robustness of fit
The parameter values associated with the global minimum of the fit error define the model used throughout the article. As described in Materials and methods, we explored the vicinity of this parameter set by individually varying each parameter within a certain neighborhood. This allowed us to estimate 95% confidence intervals for each parameter value. The results are illustrated in Appendix 1—figure 9.
Note that optimal parameter values (red crosses) are consistently near extrema of the parabolic fits (green circles), indicating the robustness of the fit. Note further that instead of the parameter pair $w}_{\mathit{v}\mathit{i}\mathit{s}$ and $u}_{e}^{0$, we show the related parameter pair $\alpha$ and $\beta$, which is defined through the relations ${w}_{\mathit{v}\mathit{i}\mathit{s}}=\alpha \mathrm{ln}((1+\gamma )/\gamma )$ and ${u}_{e}^{0}=\alpha \mathrm{ln}\gamma +\beta$.
The code used to analyze optimization statistics is available in the folder ‘analyzeOptimizationStatistics’ of the Github repository provided with this article (https://github.com/mauriziomattia/2021.BistablePerceptionModel) copy archived at.
Data availability
Source data is provided for Figures 2 and 3. Source code for the binocular rivalry model is provided in a Github repository (https://github.com/mauriziomattia/2021.BistablePerceptionModel) copy archived at https://archive.softwareheritage.org/swh:1:rev:f70e9e45ddb64cef7fc9a3ea57f0b7a04dfc6729.
References

A hierarchical stochastic model for bistable perceptionPLOS Computational Biology 13:e1005856.https://doi.org/10.1371/journal.pcbi.1005856

A lowdimensional model of binocular rivalry using winnerless competitionPhysica D. Nonlinear Phenomena 239:529–538.https://doi.org/10.1016/j.physd.2009.06.018

Multistability in perceptionScientific American 225:63–71.https://doi.org/10.1038/scientificamerican127162

The proactive brain: memory for predictionsPhil. Trans. R. Soc. Lond. B 364:1235–1243.https://doi.org/10.1098/rstb.2008.0310

Stochastic properties of stabilizedimage binocular rivalry alternationsJournal of Experimental Psychology 88:327–332.https://doi.org/10.1037/h0030877

Spatial zones of binocular rivalry in central and peripheral visionVisual Neuroscience 8:469–478.https://doi.org/10.1017/s0952523800004971

The time course of binocular rivalry reveals a fundamental role of noiseJournal of Vision 6:1244–1256.https://doi.org/10.1167/6.11.8

Neural field model of binocular rivalry wavesJournal of Computational Neuroscience 32:233–252.https://doi.org/10.1007/s108270110351y

Collective activity of many bistable assemblies reproduces characteristic dynamics of multistable perceptionThe Journal of Neuroscience 36:6957–6972.https://doi.org/10.1523/JNEUROSCI.462615.2016

Normalization as a canonical neural computNature Reviews. Neuroscience 13:51–62.https://doi.org/10.1038/nrn3136

Stimulus onset quenches neural variability: a widespread cortical phenomenonNature Neuroscience 13:369–378.https://doi.org/10.1038/nn.2501

Neural mechanisms for interacting with a world full of action choicesAnnual Review of Neuroscience 33:269–298.https://doi.org/10.1146/annurev.neuro.051508.135409

Multiscale variability in neuronal competitionCommun. Biol 2:319–330.https://doi.org/10.1038/s4200301905557

Temporally irregular mnemonic persistent activity in prefrontal neurons of monkeys during a delayed response taskJournal of Neurophysiology 90:3441–3454.https://doi.org/10.1152/jn.00949.2002

Inferring cortical variability from local field potentialsThe Journal of Neuroscience 36:4121–4135.https://doi.org/10.1523/JNEUROSCI.250215.2016

Perceptual rivalry with vibrotactile stimuliAtten. Percept. & Psychophys 22:2278.https://doi.org/10.3758/s13414021022781

A hierarchical model of binocular rivalryNeural Computation 10:1119–1135.https://doi.org/10.1162/089976698300017377

Attention, shortterm memory, and action selection: a unifying theoryProgress in Neurobiology 76:236–256.https://doi.org/10.1016/j.pneurobio.2005.08.004

Weber’s law in decision making: integrating behavioral data in humans with a neurophysiological modelThe Journal of Neuroscience 27:11192–11200.https://doi.org/10.1523/JNEUROSCI.107207.2007

Neural network mechanisms underlying stimulus driven variability reductionPLOS Computational Biology 8:e1002395.https://doi.org/10.1371/journal.pcbi.1002395

Microstimulation of visual cortex affects the speed of perceptual decisionsNature Neuroscience 6:891–898.https://doi.org/10.1038/nn1094

Perspective: Maximum caliber is a general variational principle for dynamical systemsThe Journal of Chemical Physics 148:010901.https://doi.org/10.1063/1.5012990

Balanced neural architecture and the idling brainFrontiers in Computational Neuroscience 8:56.https://doi.org/10.3389/fncom.2014.00056

Stochastic properties of binocular rivalry alternationsPerception & Psychophysics 2:432–436.https://doi.org/10.3758/BF03208783

The freeenergy principle: a unified brain theory?Nature Reviews. Neuroscience 11:127–138.https://doi.org/10.1038/nrn2787

Neural substrate of dynamic Bayesian inference in the cerebral cortexNature Neuroscience 19:1682–1689.https://doi.org/10.1038/nn.4390

Bistable perception modeled as competing stochastic integrations at two levelsPLOS Computational Biology 5:e1000430.https://doi.org/10.1371/journal.pcbi.1000430

The neural basis of decision makingAnnual Review of Neuroscience 30:535–574.https://doi.org/10.1146/annurev.neuro.29.051605.113038

Visual decisionmaking in an uncertain and dynamic worldAnnual Review of Vision Science 3:227–250.https://doi.org/10.1146/annurevvision111815114511

Perceptions as hypothesesPhilosophical Transactions of the Royal Society of London. Series B, Biological Sciences 290:181–197.https://doi.org/10.1098/rstb.1980.0090

Reactionrate theory: fifty years after kramersReviews of Modern Physics 62:251–341.https://doi.org/10.1103/RevModPhys.62.251

Once upon a (slow) time in the land of recurrent neuronal networks…Current Opinion in Neurobiology 46:31–38.https://doi.org/10.1016/j.conb.2017.07.003

Size matters: A study of binocular rivalry dynamicsJournal of Vision 9:17.https://doi.org/10.1167/9.1.17

Ehrenfest Urn ModelsJournal of Applied Probability 2:352–376.https://doi.org/10.1017/S0021900200108708

The role of the primary visual cortex in perceptual suppression of salient visual stimuliThe Journal of Neuroscience 30:12353–12365.https://doi.org/10.1523/JNEUROSCI.067710.2010

Stochastic resonance in binocular rivalryVision Research 46:392–406.https://doi.org/10.1016/j.visres.2005.08.009

Shifts in selective visual attention: Towards the underlying neural circuitryHuman Neurobiology 4:219–227.

Cognitive computational neuroscienceNature Neuroscience 21:1148–1160.https://doi.org/10.1038/s4159301802105

Coding perceptual decisions: From single units to emergent signaling properties in cortical circuitsAnnual Review of Vision Science 6:387–409.https://doi.org/10.1146/annurevvision030320041223

Cortical computations via metastable activityCurrent Opinion in Neurobiology 58:37–45.https://doi.org/10.1016/j.conb.2019.06.007

A spiking neuron model for binocular rivalryJournal of Computational Neuroscience 12:39–53.https://doi.org/10.1023/a:1014942129705

Hierarchical Bayesian inference in the visual cortexJournal of the Optical Society of America. A, Optics and Image Science 20:1434–1448.https://doi.org/10.1364/JOSAA.20.001434

An astable multivibrator model of binocular rivalryPerception 17:215–228.https://doi.org/10.1068/p170215

BookBrain Mechanisms of Visual Awareness Using Perceptual Ambiguity to Investigate the Neural Basis of Image Segmentation and Grouping. Ph.D. ThesisHouston, Texas: Baylor College of Medicine.

Multistable phenomena: changing views in perceptionTrends Cogn. Sci 3:254–264.https://doi.org/10.1016/s13646613(99)013327

Stable perception of visually ambiguous patternsNature Neuroscience 5:605–609.https://doi.org/10.1038/nn0602851

A functional theory of bistable perception based on dynamical circular inferencePLOS Computational Biology 16:e1008480.https://doi.org/10.1371/journal.pcbi.1008480

Slow dynamics and high variability in balanced cortical networks with clustered connectionsNature Neuroscience 15:1498–1505.https://doi.org/10.1038/nn.3220

Single units and conscious visionPhilosophical Transactions of the Royal Society of London. Series B, Biological Sciences 353:1801–1818.https://doi.org/10.1098/rstb.1998.0333

BookResponse Times: Their Role in Inferring Elementary Mental OrganizationNew York: Oxford University Press.

Bayesian inference with probabilistic population codesNature Neuroscience 9:1432–1438.https://doi.org/10.1038/nn1790

Spiking networks for Bayesian inference and choiceCurrent Opinion in Neurobiology 18:217–222.https://doi.org/10.1016/j.conb.2008.07.004

Perception of temporally interleaved ambiguous patternsCurrent Biology 13:1076–1085.https://doi.org/10.1016/s09609822(03)004147

Heterogeneous attractor cell assemblies for motor planning in premotor cortexThe Journal of Neuroscience 33:11155–11168.https://doi.org/10.1523/JNEUROSCI.466412.2013

Dynamics of multistable states during ongoing and evoked cortical activityThe Journal of Neuroscience 35:8214–8231.https://doi.org/10.1523/JNEUROSCI.481914.2015

Modelling the emergence and dynamics of perceptual organisation in auditory streamingPLOS Computational Biology 9:e1002925.https://doi.org/10.1371/journal.pcbi.1002925

Itinerancy between attractor states in neural systemsCurrent Opinion in Neurobiology 40:14–22.https://doi.org/10.1016/j.conb.2016.05.005

Scene construction, visual foraging, and active inferenceFrontiers in Computational Neuroscience 10:1–16.https://doi.org/10.3389/fncom.2016.00056

Noiseinduced alternations in an attractor network model of perceptual bistabilityJournal of Neurophysiology 98:1125–1139.https://doi.org/10.1152/jn.00116.2007

LXI. Observations on some remarkable optical phænomena seen in Switzerland; and on an optical phænomenon which occurs on viewing a figure of a crystal or geometrical solidThe London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 1:329–337.https://doi.org/10.1080/14786443208647909

Buildup and bistability in auditory streaming as an evidence accumulation process with saturationPLOS Computational Biology 16:e1008152.https://doi.org/10.1371/journal.pcbi.1008152

The neocortex. An overview of its evolutionary development, structural organization and synaptologyAnatomy and Embryology 190:307–337.https://doi.org/10.1007/BF00187291

The active construction of the visual worldNeuropsychologia 104:92–101.https://doi.org/10.1016/j.neuropsychologia.2017.08.003

Uncertainty, epistemics and active inferenceJournal of the Royal Society, Interface 14:1–10.https://doi.org/10.1098/rsif.2017.0376

Dynamic Causal Modelling of Active VisionThe Journal of Neuroscience 39:6265–6275.https://doi.org/10.1523/JNEUROSCI.245918.2019

Perceptual reversals need no prompting by attentionJournal of Vision 7:5.https://doi.org/10.1167/7.10.5

A shortterm memory of multistable perceptionJournal of Vision 8:7.https://doi.org/10.1167/8.13.7

Sensory memory of structurefrommotion is shapespecificAttention, Perception, & Psychophysics 75:1215–1229.https://doi.org/10.3758/s1341401304718

Multistable perception balances stability and sensitivityFrontiers in Computational Neuroscience 7:17.https://doi.org/10.3389/fncom.2013.00017

Sensory memory of illusory depth in structurefrommotionAtten. Percept. & Psychophys 76:123–132.https://doi.org/10.3758/s1341401305573

Perception and the strongest sensory memory trace of multistable displays both form shortly after stimulus onsetAtten. Percept. & Psychophys 78:674–684.https://doi.org/10.3758/s1341401510044

BookProbabilistic Reasoning in Intelligent Systems: Networks of Plausible InferenceMorgan Kaufmann.

Hierarchical active inference: a theory of motivated controlTrends Cogn. Sci 22:294–306.https://doi.org/10.1016/j.tics.2018.01.009

Dynamics of cortical neuronal ensembles transit from decision making to storage for later reportThe Journal of Neuroscience 32:11956–11969.https://doi.org/10.1523/JNEUROSCI.617611.2012

Probabilistic brains: knowns and unknownsNature Neuroscience 16:1170–1178.https://doi.org/10.1038/nn.3495

Principles of maximum entropy and maximum caliber in statistical physicsRev. Modern Phys 85:1115–1141.https://doi.org/10.1103/RevModPhys.85.1115

Perceptual organization in multistable apparent motionPerception 14:135–143.https://doi.org/10.1068/p140135

A comparison of sequential sampling models for twochoice reaction timePsychological Review 111:333–367.https://doi.org/10.1037/0033295X.111.2.333

The Normalization Model of AttentionJournal of Cognitive Neuroscience 61:168–185.https://doi.org/10.1016/j.neuron.2009.01.002

Decoding subjective decisions from orbitofrontal cortexNature Neuroscience 19:973–980.https://doi.org/10.1038/nn.4320

The spatial structure of correlated neuronal variabilityNature Neuroscience 20:107–114.https://doi.org/10.1038/nn.4433

BookFigure and groundIn: Beardslee DC, Wertheimer M, editors. Readings in Perception. Van Nostrand. pp. 194–203.

Mechanisms of visual attention in the human cortexAnnual Review of Neuroscience 23:315–341.https://doi.org/10.1146/annurev.neuro.23.1.315

Emergence of slowswitching assemblies in structured neuronal networksPLOS Computational Biology 11:e1004196.https://doi.org/10.1371/journal.pcbi.1004196

Theory and dynamics of derceptual bistabilityAdv. Neural Inf. Proc. Sys 19:1217–1224.https://doi.org/10.7551/mitpress/7503.003.0157

Multistability in perception: Binding sensory modalities, an overviewPhilosophical Transactions of the Royal Society of London. Series B, Biological Sciences 367:896–905.https://doi.org/10.1098/rstb.2011.0254

Role of mutual inhibition in binocular rivalryJournal of Neurophysiology 106:2136–2150.https://doi.org/10.1152/jn.00228.2011

Neural Elements for Predictive CodingFrontiers in Psychology 7:1–21.https://doi.org/10.3389/fpsyg.2016.01792

Balance between noise and adaptation in competition models of perceptual bistabilityJournal of Computational Neuroscience 27:37–54.https://doi.org/10.1007/s1082700801253

Dynamics of random neural networks with bistable unitsPhysical Review. E, Statistical, Nonlinear, and Soft Matter Physics 90:062710.https://doi.org/10.1103/PhysRevE.90.062710

Stochastic processes in reversing figure perceptionPercept. & Psychophys 16:9–27.https://doi.org/10.3758/BF03203243

Untersuchungen zur Lehre von der GestaltPsychologische Forschung 7:81–136.https://doi.org/10.1007/BF00410640

Toward an interpretation of dynamic neural activity in terms of chaotic dynamical systemsThe Behavioral and Brain Sciences 24:793–810.https://doi.org/10.1017/s0140525x01000097

Stochastic variations in sensory awareness are driven by noisy neuronal adaptation: evidence from serial correlations in perceptual bistabilityJournal of the Optical Society of America. A, Optics and Image Science 26:2612–2622.https://doi.org/10.1364/JOSAA.26.002612

BookStochastic Processes in Physics and ChemistryAmsterdam: NorthHolland Physics Publishing.

Stochastic properties of binocular rivalry alternationsPerception & Psychophysics 18:467–473.https://doi.org/10.3758/BF03204122

Neural dynamics and circuit mechanisms of decisionmakingCurrent Opinion in Neurobiology 22:1039–1046.https://doi.org/10.1016/j.conb.2012.08.006

a predictive coding account of bistable perception  a modelbased fMRI studyPLOS Computational Biology 13:e1005536.https://doi.org/10.1371/journal.pcbi.1005536

Zeitschrift für Psychologie mit Zeitschrift für angewandte PsychologieExperimentelle Studien Über Das Sehen von Bewegung 61:161–165.

Contributions to the physiology of vision: On some remarkable, and hitherto unobserved, phenomena of binocular visionPhilosophical Transactions of the Royal Society A 128:371–394.

The neural site of binocular rivalry relative to the analysis of motion in the human visual systemThe Journal of Neuroscience 10:3880–3888.

Human V4 and ventral occipital retinotopic mapsVisual Neuroscience 32:E020.https://doi.org/10.1017/S0952523815000176

Multistability in auditory stream segregation: A predictive coding viewPhilosophical Transactions of the Royal Society B 367:1001–1012.https://doi.org/10.1098/rstb.2011.0359

RivalryLike Neural Activity in Primary Visual Cortex in Anesthetized MonkeysThe Journal of Neuroscience 36:3231–3242.https://doi.org/10.1523/JNEUROSCI.366015.2016

Longrange traveling waves of activity triggered by local dichoptic stimulation in V1 of behaving monkeysJournal of Neurophysiology 112:18–22.https://doi.org/10.1152/jn.00610.2013

Theoretical perspectives on active sensingCurrent Opinion in Behavioral Sciences 11:100–108.https://doi.org/10.1016/j.cobeha.2016.06.009

Vision as Bayesian inference: analysis by synthesis?Trends in Cognitive Sciences 10:301–308.https://doi.org/10.1016/j.tics.2006.05.002
Decision letter

John RinzelReviewing Editor; Courant Institute of Mathematical Sciences, New York University, United States

Joshua I GoldSenior Editor; University of Pennsylvania, United States

James RankinReviewer

Peter Christiaan KlinkReviewer; Netherlands Institute for Neuroscience, Netherlands
In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.
Acceptance summary:
In a computational model of perceptual bistability, here binocular rivalry, a novel hierarchical structure includes an evidence accumulation stage and a competition stage that together support continuous decisionmaking and perceptual alternations, and account for the statistics of percept durations and serial dependencies, as well as Levelt's four propositions in the study's behavioral data. Of particular interest, feedback inhibition from the perceptual competition to the sensory evidence accumulation stage provides a gating mechanism on sensory units so that decisionmaking is based on evidence against the current percept.
Decision letter after peer review:
Thank you for submitting your article "Instability of visual perception reveals the dynamics of decisionmaking in a volatile world" for consideration by eLife. Your article has been reviewed by 2 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Joshua Gold as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: James Rankin (Reviewer #1); Chris Klink (Reviewer #2).
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.
We would like to draw your attention to changes in our revision policy that we have made in response to COVID19 (https://elifesciences.org/articles/57162). Specifically, we are asking editors to accept without delay manuscripts, like yours, that they judge can stand as eLife papers without additional data, even if they feel that they would make the manuscript stronger. Thus the revisions requested below address clarity and presentation, including some toning down of overgeneralization.
This paper presents valuable advances to the modeling of perceptual bistability, here applied to binocular rivalry. The novel hierarchical structure includes an evidence accumulation stage and a competition stage that together support continuous decisionmaking and perceptual alternations. Importantly, the study is used to account for the statistics of percept durations and as well as Levelt's four propositions in the behavioral data.
Summary:
This paper presents an original hierarchical model for the perceptual dynamics of binocular rivalry. The model separates a stage of perceptual evidence accumulation from a stage of dominance competition and implements feedback inhibition from the competition to the evidence accumulation stage. This feedback provides a gating mechanism on sensory units so that decisionmaking is based on evidence against the current percept. The study demonstrates a potential neural mechanism that builds on previous models of decisionmaking and rivalry. Decision making in the model is continuous and leads to alternations that mimic statistical properties of behavioral perceptual bistability, including correlations from percept to percept. The authors effectively extract explanatory value from their model and its mechanisms, using analytic approximations based on statistical dynamics.
The study brings together, in the context of binocular rivalry, two existing strands of modelling perceptual bistability: capturing the statistical properties of dominance durations (a universal distribution shape independent of stimulus parameters, correlations etc) and capturing their dependence on input strengths (Levelt's propositions for binocular rivalry). The results presented go substantially beyond a longestablished modelling literature.
The paper is well written and the figures clearly presented. The core results are described in a suitable style so as to keep the paper accessible to a wide readership. The later and supporting figures become more technical but this is necessary to draw out a deeper understanding as to why the model works.
We offer the following suggestions to improve the manuscript.
Essential revisions:
1. The presentation has a tone of overgeneralized claims. Please keep closer to the Results – to what has been done. For example, statements about normative constraints and volatile world can be toned down. The paper does not directly assess the model's performance or autoadjustment in dynamic and unpredictable ('volatile') environments as would be with normative modeling. Such statements do not reflect primary results and should be restricted primarily to the Discussion. They should not be in the paper's title. Please have the title refer to Binocular Rivalry. Also note: the journal eLife discourages 2part titles with ":".
Additional data/experiments are not required if the authors drop the overgeneralizations. However, if they do want to make more general claims, these should be backed up.
2. Re: Robustness of the simulation results. Some indication of model robustness should be provided to demonstrate whether all these nicely fitting dynamics do crucially depend on a precise set of parameters. Presumably in different parameter regions the model could also produce negative lag2 correlations. Is it not just luck that the bestfit fell in a region with positive lag2 correlations as in the data. On the other hand, are the authors able to show that if the lag2 data is included in the fit function for the alternative model it is incapable of producing the positive correlations? These issues could be addressed in the Discussion.
Additional data/experiments are not required if the authors drop the overgeneralizations. However, if they do want to make more general claims, these should be backed up..
3. Re: perceptual bistability is restricted here to binocular rivalry. Topics for the Discussion:
a. The occurrence of piecemeal or mixed percepts are not addressed. In the experiments, such perhaps are identified (with released keys) but they're not analyzed, nor modeled. Instead, bistable assemblies are assumed binary. Phenomenologically, this is definitely not the case and given existing research on mixed percepts, this should at least be discussed.
b. How does the model deal with other known binocular rivalry phenomena like flash suppression, or priming, intermittent presentation, or different timescales? These are phenomena that were previously modeled with a strong role for adaptation. Are there any unexpected prediction the model makes beyond currently known binocular rivalry dynamics that can be tested experimentally?
c. Can the authors discuss their framework in the context of the visual system and competition between binocular and feature representations of the stimuli (in the vein of Wilson 2003)?
d. Another hierarchical model (Li, H.H.; Rankin, J.; Rinzel, J.; Carrasco, M. and Heeger, D. Attention model of binocular rivalry P Natl Acad Sci, 2017, 114), building on the framework proposed by the cited Wilson 2003 study, proposes a descending featurebased excitation as a proxy for exogenous attention (in contrast with feedback suppression proposed here). Can the authors comment on the features of their data that this alternative mechanism might fail to capture. How could the effects of attention (or its absence) be explained by the model presented here?
4. Modeling aspects:
a. Perceptual bistability for perceptual grouping (as for ambiguous auditory stimuli) has been modeled explicitly as evidence accumulation against the current percept; please discuss and relate your approach to these: Barniv and Nelken (2015), Nguyen, Rinzel and Curtu (2020)
b. The model assumes two sets of independent bistable feature detectors. Please relate this assumption to potential brain areas and evidence that units with similar receptive fields may operate independently, as uncoupled units. (c) What other experiments on decision making (perceptual or otherwise) might this framework be able to speak to?
5. Re: specific conclusions: From one reviewer: Whilst I clearly understand what has been achieved I found it hard to pin down the specific conclusions from this study. State them explicitly, at the end of the discussion? Do not omit more specific conclusions based on this model and binocular rivalry before any other more general conclusions that draw from a wider literature.
6. The authors should post/link their model codes to a welldocumented software repository. Interested readers could then see the model in action and inspect what happens internally.
https://doi.org/10.7554/eLife.61581.sa1Author response
Essential revisions:
1. The presentation has a tone of overgeneralized claims. Please keep closer to the Results – to what has been done. For example, statements about normative constraints and volatile world can be toned down. The paper does not directly assess the model's performance or autoadjustment in dynamic and unpredictable ('volatile') environments as would be with normative modeling. Such statements do not reflect primary results and should be restricted primarily to the Discussion. They should not be in the paper's title. Please have the title refer to Binocular Rivalry. Also note: the journal eLife discourages 2part titles with ":".
Additional data/experiments are not required if the authors drop the overgeneralizations. However, if they do want to make more general claims, these should be backed up.
Title and abstract were revised as follows:
“Binocular rivalry reveals an outof equilibrium neuronal dynamics suited for decisionmaking”.
“In ambiguous or conflicting sensory situations, perception is often ‘multistable’ in that it perpetually changes at irregular intervals, shifting abruptly between distinct alternatives. […] Thus, multistable perception may reflect decisionmaking in a volatile world: integrating evidence over space and time, choosing categorically between hypotheses, while concurrently evaluating alternatives.”
2. Re: Robustness of the simulation results. Some indication of model robustness should be provided to demonstrate whether all these nicely fitting dynamics do crucially depend on a precise set of parameters.
For each parameter value, we now provide confidence intervals (—Appendix—figure 9), which we obtained by establishing the dependence of fit error on individual parameter values (while other parameters remained fixed). Typically, fit error increases parabolically around the optimal parameter value.
This is now described in the new section "Robustness of fit" in the Supplementary Methods.
Red crosses represent the optimal model parameters while green circles are associated to the extrema of the quadratic fit of the computed fit error obtained by varying in a suited range the single eleven parameters. Dotted lines are the 95% confidence intervals of the fit. For each parameter, 30 nearby equally spaced values have been tested. This allowed us to estimate the error of the parameter values found from the original fitting procedure. Note that instead of the parameters $w}_{\text{vis}$ and $u}_{0\text{}}^{e$ we show α and β which are directly related parameters: $w}_{\text{vis}}=\text{}\alpha \mathrm{ln}\frac{1+\gamma}{\gamma$ and ${u}_{0\text{}}^{e}=\text{}\alpha \text{}ln\text{}\gamma +\text{}\beta$.
Presumably in different parameter regions the model could also produce negative lag2 correlations. Is it not just luck that the bestfit fell in a region with positive lag2 correlations as in the data.
The model consistently predicts positive lag2 correlations, at least in parameter regimes that reproduce Levelt's propositions (dependence of dominance durations on image contrast). As explained in Results under 'Serial dependence', these correlations result from an interaction between evidence and decision level dynamics.
Specifically, dominance periods reflect the current reversal threshold (defined as the expected value of at reversal time), because a longer (shorter) dominance period is needed to reach a higher (lower) threshold. In turn, reversal threshold tracks the slow fluctuations of combined evidence activity: $\overline{e}=(e+{e}^{\prime})/2$ more (less) evidence activity entails a lower (higher) reversal threshold$\mathrm{\Delta}}_{\mathit{r}\mathit{e}\mathit{v}$
As shown in Supplementary Figure 13, the autocorrelation times of $\overline{e}$ (and ) are substantially longer than mean dominance periods, especially for large image contrasts. Due to this long autocorrelation, both reversal thresholds and dominance durations are strictly positively correlated over successive reversals.
On the other hand, are the authors able to show that if the lag2 data is included in the fit function for the alternative model it is incapable of producing the positive correlations? These issues could be addressed in the Discussion.
Additional data/experiments are not required if the authors drop the overgeneralizations. However, if they do want to make more general claims, these should be backed up.
The alternative model, which includes two independent adaptive states and ' consistently produced positive lag1 correlations and negative lag2 correlations for all combinations of image contrast fitted to reproduce the moments of dominance distributions (mean, CV, skewness).
In a Laing and Chowtype model, states and ' develop deterministically in an opponent fashion, in that an increase of one state necessarily entails a decrease of the other. Adaptive bias ' represents a negative memory of recent dominance history. At the end of a long period of dominance, from this bias for the new percept. During the new dominance period, bias will diminish but will not necessarily change direction. Negative lag2 correlations result when the direction of persists to the next reversal, so that it once again disfavours the old (original) percept.
In our experience, the prediction of positive lag1 and negative lag2 correlations is robust and statistically significant, provided sufficiently long reversal sequences are generated. Parenthetically, we note that we gained extensive experience in reproducing moments of observed dominance distributions with Laing and Chowtype models in the context of other studies (102; Ziman et al., 2021: https://www.biorxiv.org/content/10.1101/2021.02.11.430816v1).
In fact, Laing and Chow (2002) predicted positive and negative correlations at lag1 and lag2, respectively (see their Figure 5), but chose to simulate only a small number of reversals, so that these correlations failed to reach significance.
However, we do not consider the present paper a good venue for an in depth evaluation of Laing and Chowtype models.
3. Re: perceptual bistability is restricted here to binocular rivalry. Topics for the Discussion:
a. The occurrence of piecemeal or mixed percepts are not addressed. In the experiments, such perhaps are identified (with released keys) but they're not analyzed, nor modeled. Instead, bistable assemblies are assumed binary. Phenomenologically, this is definitely not the case and given existing research on mixed percepts, this should at least be discussed.
Discussion, page 25:
“Binocular rivalry arises within local regions of the visual field, measuring approximately 0.25 deg in the fovea [Leopold, 1997; Logothetis, 1998]. […] To account for these phenomena, the visual field would have to be tiled with replicant models linked by grouping interactions [Knapen et al., 2007; Bressloff et al., 2012].”
b. How does the model deal with other known binocular rivalry phenomena like flash suppression, or priming, intermittent presentation, or different timescales? These are phenomena that were previously modeled with a strong role for adaptation.
In response, the discussion was extended by the following sections:
Discussion, page 24:
“Unlike many previous models [e.g. Laing and Chow, 2002; Wilson, 2007; MorenoBote et al., 2007; MorenoBote et al., 2010; Cohen et al., 2019], the proposed mechanism does not include adaptation (stimulationdriven weakening of evidence), but a phenomenologically similar feedback suppression (perceptiondriven weakening of evidence). […] The extent to which a perceptiondriven suppression could also explain these observations remains an open question for future work.”
Discussion, page 25:
“Multistable perception induces a positive priming or ‘sensory memory’ [Pearson et al. 2005; Pastukhov et al., 2008, Pastukhov et al., 2013a], which can stabilize a dominant appearance during intermittent presentation [Leopold et al., 2003; Maier et al., 2003; Sandberg et al., 2014]. This positive priming exhibits rather different characteristics (e.g., shape, size and motion specificity, inducement period, persistence period) than the negative priming / adaptation of rivalling representations [De Jong et al., 2012; Pastukhov et al., 2013a, 2013b, 2014a, 2014b, 2016]. […] To incorporate sensory memory, the present model would have to be extended to include three hierarchical levels (evidence, decision, and memory), as previously proposed by [Gigante et al., 2009].”
Are there any unexpected prediction the model makes beyond currently known binocular rivalry dynamics that can be tested experimentally?
Due to a highly nontrivial interaction with the competitive decision, discretely stochastic fluctuations of evidence level activity express themselves in a serial dependency of dominance durations. Several features of this dependency were unexpected and not reported previously, for example, the sensitivity to image contrast and also the `burstiness' of dominance reversals (i.e., extended episodes in which dominance periods are consistently longer or shorter than average). The fact that these predictions are confirmed by our experimental observations supports the proposed mechanism.
In addition, the hierarchical and modular architecture of the present model lends itself to a number of extensions, for example, to multistable phenomena with more than two competing appearances, or to situations with volatile and unpredictable input. We are currently working to develop additional predictions that we believe to be beyond the scope of the present manuscript.
c. Can the authors discuss their framework in the context of the visual system and competition between binocular and feature representations of the stimuli (in the vein of Wilson 2003)?
Discussion, page 25:
“A particularly intriguing previous model [Wilson, 2003] postulated a hierarchy with competing and adapting representations at two separate levels (8 state variables), one lower (monocular) and another higher (binocular) level. […] It is tempting to speculate that such ‘stacking’ might have a normative justification in that it might subserve hierarchical inference [Yuille et al., 2006; Hohwy and Friston, 2008, Friston, 2010].”
d. Another hierarchical model (Li, H.H.; Rankin, J.; Rinzel, J.; Carrasco, M. and Heeger, D. Attention model of binocular rivalry P Natl Acad Sci, 2017, 114.), building on the framework proposed by the cited Wilson 2003 study, proposes a descending featurebased excitation as a proxy for exogenous attention (in contrast with feedback suppression proposed here). Can the authors comment on the features of their data that this alternative mechanism might fail to capture. How could the effects of attention (or its absence) be explained by the model presented here?
Discussion, page 25/26:
“Another previous model [Li et al., 2017] used a hierarchy with three separate levels (and 24 state variables) to show that a stabilizing influence of selective visual attention could also explain slow rivalry when images are swapped rapidly. […] As the present model already incorporates the ‘biased competition’ that is widely thought to underlie selective attention [Kastner et al., 2000; Reynolds and Heeger, 2009], we expect that it could reproduce attentional effects by means of additive modulations.”
4. Modeling aspects:
a. Perceptual bistability for perceptual grouping (as for ambiguous auditory stimuli) has been modeled explicitly as evidence accumulation against the current percept; please discuss and relate your approach to these: Barniv and Nelken (2015), Nguyen, Rinzel and Curtu (2020)
Discussion, page 26/27:
“Continuous inference has been studied extensively in auditory streaming paradigms [Winkler et al., 2012; Denham et al., 2014]. […] The dynamics of two recent auditory models [Barniv and Nelken, 2015; Nguyen et al., 2020] are rather similar to the model presented here: while one sound pattern dominates awareness, evidence against this pattern is accumulated at a subliminal level.”
b. The model assumes two sets of independent bistable feature detectors. Please relate this assumption to potential brain areas and evidence that units with similar receptive fields may operate independently, as uncoupled units.
Discussion, page 27/28:
“Unfortunately, the spatial structure of the "shared variability" and "noise correlations" described above is poorly understood. […] Additionally, a new section was added on the neurophysiology of rivalry and its relation to our model.”
Discussion, page 27/28:
“Neurophysiological correlates of binocular rivalry
Neurophysiological correlates of binocular rivalry have been studied extensively, often by comparing reversals of phenomenal appearance during binocular stimulation with physical alternation of monocular stimulation [e.g., Leopold and Logothetis, 1996; Scheinberg and Logothetis, 1997; Wilke et al., 2006; Maier et al., 2008; Keliris et al., 2010; Panagiotaropoulos et al., 2012; Bahmani et al., 2014; Xu et al., 2016; Kapoor et al., 2020; Dwarakanath et al., 2020]. At higher cortical levels, such as inferior temporal cortex [Scheinbert and Logothetis, 1997] or prefrontal cortex [Panagiotaropoulos et al., 2012; Kapoor et al., 2020; Dwarakanth et al., 2020], binocular rivalry (BR) and physical alternation (PA) elicit broadly comparable neurophysiological responses that mirror perceptual appearance. […] Presumably, a “stacked” model with two successive levels of competitive interactions at monocular and binocular levels or representation [Wilson, 2003; Li et al., 2017] would be needed to account for these phenomena.”
c. What other experiments on decision making (perceptual or otherwise) might this framework be able to speak to?
The hierarchical and modular architecture of the present model lends itself to a number of extensions, for example, to multistable phenomena with more than two competing appearances, or to continuous detection of alternative stimuli in volatile and unpredictable environments. We are currently working to develop additional predictions that we believe to be beyond the scope of the present manuscript.
5. Re: specific conclusions: From one reviewer: Whilst I clearly understand what has been achieved I found it hard to pin down the specific conclusions from this study. State them explicitly, at the end of the discussion? Do not omit more specific conclusions based on this model and binocular rivalry before any other more general conclusions that draw from a wider literature.
Discussion, page 22:
“We have shown that many wellknown features of binocular rivalry are reproduced, and indeed guaranteed, by a particular dynamical mechanism. […] Indeed, the phenomena explained exhibited more effective degrees of freedom (approximately 14) than the mechanism itself (between 3 and 4).”
6. The authors should post/link their model codes to a welldocumented software repository. Interested readers could then see the model in action and inspect what happens internally.
We are of course happy to make available Matlab code for interested readers and have created a GitHub repository for this purpose: https://github.com/mauriziomattia/2021.BistablePerceptionModel
The folder 'plotExampleNetworkDynamics' generates and illustrates realizations of the hierarchical model dynamics for different levels of image contrast.
The folder 'analyzeOptimizationStatistics' displays our analysis of fit error in the vicinity of the optimal parameter set, as explained in the new section "Robustness of fit" in the Supplementary Materials and methods.
https://doi.org/10.7554/eLife.61581.sa2Article and author information
Author details
Funding
European Commission (FP7269459)
 Jochen Braun
Deutsche Forschungsgemeinschaft (BR 987/31)
 Jochen Braun
Deutsche Forschungsgemeinschaft (BR 987/41)
 Jochen Braun
H2020 European Research Council (45539)
 Maurizio Mattia
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
Funding from EU FP7269459 Coronet, DFG BR 987/31, DFG 987/41, and
EU Human Brain Project SGA3945539.
The authors thank Andrew Parker and Maike S Braun for helpful comments.
Ethics
Human subjects: Six practised observers participated in the experiment (4 male, 2 female). Informed consent, and consent to publish, was obtained from all observers and ethical approval Z22/16 was obtained from the Ethics Commisson of the Faculty of Medicine of the OttovonGuericke University, Magdeburg.
Senior Editor
 Joshua I Gold, University of Pennsylvania, United States
Reviewing Editor
 John Rinzel, Courant Institute of Mathematical Sciences, New York University, United States
Reviewers
 James Rankin
 Peter Christiaan Klink, Netherlands Institute for Neuroscience, Netherlands
Publication history
 Received: July 29, 2020
 Accepted: May 24, 2021
 Version of Record published: August 9, 2021 (version 1)
Copyright
© 2021, Cao et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics

 521
 Page views

 88
 Downloads

 0
 Citations
Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.