Binocular rivalry reveals an out-of-equilibrium neural dynamics suited for decision-making
Abstract
In ambiguous or conflicting sensory situations, perception is often ‘multistable’ in that it perpetually changes at irregular intervals, shifting abruptly between distinct alternatives. The interval statistics of these alternations exhibits quasi-universal characteristics, suggesting a general mechanism. Using binocular rivalry, we show that many aspects of this perceptual dynamics are reproduced by a hierarchical model operating out of equilibrium. The constitutive elements of this model idealize the metastability of cortical networks. Independent elements accumulate visual evidence at one level, while groups of coupled elements compete for dominance at another level. As soon as one group dominates perception, feedback inhibition suppresses supporting evidence. Previously unreported features in the serial dependencies of perceptual alternations compellingly corroborate this mechanism. Moreover, the proposed out-of-equilibrium dynamics satisfies normative constraints of continuous decision-making. Thus, multistable perception may reflect decision-making in a volatile world: integrating evidence over space and time, choosing categorically between hypotheses, while concurrently evaluating alternatives.
Introduction
In deducing the likely physical causes of sensations, perception goes beyond the immediate sensory evidence and draws heavily on context and prior experience (von Helmholtz, 1867; Barlow et al., 1972; Gregory, 1980; Rock, 1983). Numerous illusions in visual, auditory, and tactile perception – all subjectively compelling, but objectively false – attest to this extrapolation beyond the evidence. In natural settings, perception explores alternative plausible causes of sensory evidence by active readjustment of sensors (‘active perception,’ Mirza et al., 2016; Yang et al., 2018; Parr and Friston, 2017a). In general, perception is thought to actively select plausible explanatory hypotheses, to predict the sensory evidence expected for each hypothesis from prior experience, and to compare predicted and observed sensory evidence at multiple levels of scale or abstraction (‘analysis by synthesis,’ ‘predictive coding,’ ‘hierarchical Bayesian inference,’ Yuille and Kersten, 2006, Rao and Ballard, 1999, Parr and Friston, 2017b, Pezzulo et al., 2018). Active inference engages the entire hierarchy of cortical areas involved in sensory processing, including both feedforward and feedback projections (Bar, 2009; Larkum, 2013; Shipp, 2016; Funamizu et al., 2016; Parr et al., 2019).
The dynamics of active inference becomes experimentally observable when perceptual illusions are ‘multistable’ (Leopold and Logothetis, 1999). In numerous ambiguous or conflicting situations, phenomenal experience switches at irregular intervals between discrete alternatives, even though the sensory scene is stable (Necker, 2009; Wheatstone, 1838; Rubin, 1958; Attneave, 1971; Ramachandran and Anstis, 2016; Pressnitzer and Hupe, 2006; Schwartz et al., 2012). Multistable illusions are enormously diverse, involving visibility or audibility, perceptual grouping, visual depth or motion, and many kinds of sensory scenes, from schematic to naturalistic. Average switching rates differ greatly and range over at least two orders of magnitude (Cao et al., 2016), depending on sensory scene, perceptual grouping (Wertheimer, 1912; Koffka, 1935; Ternus, 1926), continuous or intermittent presentation (Leopold and Logothetis, 2002; Maier et al., 2003), attentional condition (Pastukhov and Braun, 2007), individual observer (Pastukhov et al., 2013c; Denham et al., 2018; Brascamp et al., 2019), and many other factors.
In spite of this diversity, the stochastic properties of multistable phenomena appear to be quasi-universal, suggesting that the underlying mechanisms may be general. Firstly, average dominance duration depends in a characteristic and counterintuitive manner on the strength of dominant and suppressed evidence (‘Levelt’s propositions I–IV,’ Levelt, 1965; Brascamp et al., 2006; Klink et al., 2016; Kang, 2009; Brascamp et al., 2015; Moreno-Bote et al., 2010). Secondly, the statistical distribution of dominance durations shows a stereotypical shape, resembling a gamma distribution with an approximately constant shape parameter (‘scaling property,’ Cao et al., 2016; Fox and Herrmann, 1967; Blake et al., 1971; Borsellino et al., 1972; Walker, 1975; De Marco et al., 1977; Murata et al., 2003; Brascamp et al., 2005; Pastukhov and Braun, 2007; Denham et al., 2018; Darki and Rankin, 2021). Thirdly, the durations of successive dominance periods are correlated positively, over at least two or three periods (Fox and Herrmann, 1967; Walker, 1975; Van Ee, 2005; Denham et al., 2018).
Here, we show that these quasi-universal characteristics are comprehensively and quantitatively reproduced, indeed guaranteed, by an interacting hierarchy of birth-death processes operating out of equilibrium. While the proposed mechanism combines some of the key features of previous models, it far surpasses their explanatory power.
Several possible mechanisms have been proposed for perceptual dominance, the triggering of reversals, and the stochastic timing of reversals. That a single, coherent interpretation typically dominates phenomenal experience is thought to reflect competition (explicit or implicit) at the level of explanatory hypotheses (e.g., Dayan, 1998), sensory inputs (e.g., Lehky, 1988), or both (e.g., Wilson, 2003). That a dominant interpretation is occasionally supplanted by a distinct alternative has been attributed to fatigue processes (e.g., neural adaptation, synaptic depression, Laing and Chow, 2002), spontaneous fluctuations (‘noise,’ e.g., Wilson, 2007, Kim et al., 2006), stochastic sampling (e.g., Schrater and Sundareswara, 2006), or combinations of these (e.g., adaptation and noise, Shpiro et al., 2009; Seely and Chow, 2011; Pastukhov et al., 2013c). The characteristic stochasticity (gamma-like distribution) of dominance durations has been attributed to Poisson counting processes (e.g., birth-death processes, Taylor and Ladridge, 1974; Gigante et al., 2009; Cao et al., 2016) or stochastic accumulation of discrete samples (Murata et al., 2003; Schrater and Sundareswara, 2006; Sundareswara and Schrater, 2008; Weilnhammer et al., 2017).
‘Dynamical’ models combining competition, adaptation, and noise capture well the characteristic dependence of dominance durations on input strength (‘Levelt’s propositions’) (Laing and Chow, 2002; Wilson, 2007; Ashwin and Aureliu, 2010), especially when inputs are normalized (Moreno-Bote et al., 2007; Moreno-Bote et al., 2010; Cohen et al., 2019), and when the dynamics emphasize noise (Shpiro et al., 2009; Seely and Chow, 2011; Pastukhov et al., 2013c). However, such models do not preserve distribution shape over the full range of input strengths (Cao et al., 2016; Cohen et al., 2019). On the other hand, ‘sampling’ models based on discrete random processes preserve distribution shape (Taylor and Ladridge, 1974; Murata et al., 2003; Schrater and Sundareswara, 2006; Sundareswara and Schrater, 2008; Cao et al., 2016; Weilnhammer et al., 2017), but fail to reproduce the dependence on input strength. Neither type of model accounts for the sequential dependence of dominance durations (Laing and Chow, 2002).
Here, we reconcile ‘dynamical’ and ‘sampling’ approaches to multistable perception, extending an earlier effort (Gigante et al., 2009). Importantly, every part of the proposed mechanism appears to be justified normatively in that it may serve to optimize perceptual choices in a general behavioral situation, namely, continuous inference in uncertain and volatile environments (Bogacz, 2007; Veliz-Cuba et al., 2016). We propose that sensory inputs are represented by birth-death processes in order to accumulate sensory information over time and in a format suited for Bayesian inference (Ma et al., 2006; Pouget et al., 2013). Further, we suggest that explanatory hypotheses are evaluated competitively, with a hypothesis attaining dominance (over phenomenal experience) when its support exceeds the alternatives by a certain finite amount, consistent with optimal decision-making between multiple alternatives (Bogacz, 2007). Finally, we assume that a dominant hypothesis suppresses its supporting evidence, as required by ‘predictive coding’ implementations of hierarchical Bayesian inference (Pearl, 1988; Rao and Ballard, 1999; Hohwy et al., 2008). In contrast to many previous models, we do not require local mechanisms of fatigue, adaptation, or decay.
Based on these assumptions, the proposed mechanism reproduces dependence on input strength, as well as distribution of dominance durations and positive sequential dependence. Additionally, it predicts novel and unsuspected dynamical features confirmed by experiment.
Results
Below we introduce each component of the mechanism and its possible normative justification, before describing out-of-equilibrium dynamics resulting from the interaction of all components. Subsequently, we compare model predictions with multistable perception of human observers, specifically, the dominance statistics of binocular rivalry (BR) at various combinations of left- and right-eye contrasts (Figure 1a).
Hierarchical dynamics
Bistable assemblies: ‘local attractors’
As operative units of sensory representation, we postulate neuronal assemblies with bistable ‘attractor’ dynamics. Effectively, assembly activity moves in an energy landscape with two distinct quasi-stable states – dubbed ‘on’ and ‘off’ – separated by a ridge (Figure 1b). Driven by noise, assembly activity mostly remains near one quasi-stable state (‘on’ or ‘off’), but occasionally ‘escapes’ to the other state (Kramers, 1940; Hanggi et al., 1990; Deco and Hugues, 2012; Litwin-Kumar and Doiron, 2012; Huang and Doiron, 2017).
An important feature of ‘attractor’ dynamics is that the energy of quasi-stable states depends sensitively on external input. Net positive input destabilizes (i.e., raises the potential of) the ‘off’ state and stabilizes (i.e., lowers the potential of) the ‘on’ state. Transition rates are even more sensitive to external input as they depend approximately exponentially on the height of the energy ridge (‘activation energy’).
Figure 1b illustrates ‘attractor’ dynamics for an assembly of 150 spiking neurons with low and high firing rates per neuron in the ‘off’ and ‘on’ states, respectively. Full details are provided in Appendix 1, section Metastable population dynamics, and Appendix 1—figure 2.
Binary stochastic variables
Our model is independent of neural details and relies exclusively on an idealized description of ‘attractor’ dynamics. Specifically, we reduce bistable assemblies to discretely stochastic, binary activity variables x, which activate and inactivate with Poisson rates ν₊ and ν₋, respectively. These rates vary exponentially and anti-symmetrically with increments or decrements of the activation energy:

ν± = ν₀ exp[±(u₀ + Δu)/2], (Equation 1)

where u₀ and ν₀ are baseline potential and baseline rate, respectively, and where the input-dependent part Δu = w·s varies linearly with input s, with synaptic coupling constant w (see Appendix 1, section Metastable population dynamics and Appendix 1—figure 2e).
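As a concrete illustration, the following sketch (Python; not part of the original analysis code) computes input-dependent dwell times for a single bistable variable under the exponential rate law of Equation 1. All parameter values and names (nu0, u0, w) are illustrative assumptions, not the fitted values reported below.

```python
import numpy as np

# Minimal sketch of a single bistable variable; parameters are illustrative placeholders.
# Transition rates vary exponentially and anti-symmetrically with the activation energy.
def rates(s, nu0=1.0, u0=-2.0, w=4.0):
    """Return (activation rate, inactivation rate) for input s."""
    du = u0 + w * s                      # activation energy (baseline + input-dependent part)
    nu_plus = nu0 * np.exp(+0.5 * du)    # activation ('off' -> 'on')
    nu_minus = nu0 * np.exp(-0.5 * du)   # inactivation ('on' -> 'off')
    return nu_plus, nu_minus

def mean_dwell_times(s, n_trials=2000, rng=np.random.default_rng(0)):
    """Average waiting times before leaving the 'off' and the 'on' state."""
    nu_plus, nu_minus = rates(s)
    t_off = rng.exponential(1.0 / nu_plus, n_trials).mean()   # dwell time in 'off'
    t_on = rng.exponential(1.0 / nu_minus, n_trials).mean()   # dwell time in 'on'
    return t_off, t_on

for s in (0.0, 0.25, 0.5):
    t_off, t_on = mean_dwell_times(s)
    print(f"input {s:4.2f}: mean 'off' dwell {t_off:6.2f}, mean 'on' dwell {t_on:6.2f}")
```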
Pool of binary variables
An extended network, containing N individually bistable assemblies with shared input s, reduces to a ‘pool’ of N binary activity variables with identical rates ν±. Although all variables are independently stochastic, they are coupled through their shared input s. The number n of active variables or, equivalently, the active fraction x = n/N, forms a discretely stochastic process (‘birth-death’ or ‘Ehrenfest’ process; Karlin and McGregor, 1965).
Relaxation dynamics
While activity develops discretely and stochastically according to Equation 5 (Materials and methods), its expectation ⟨x⟩ develops continuously and deterministically,

τ d⟨x⟩/dt = x∞ − ⟨x⟩, (Equation 2)

relaxing with characteristic time τ towards asymptotic value x∞. As rates ν± change with input s (Equation 1), we can define the functions τ(s) = 1/(ν₊ + ν₋) and x∞(s) = ν₊/(ν₊ + ν₋) (see Materials and methods). Characteristic time τ is longest for small input and shortens for larger positive or negative input s. The asymptotic value x∞ ranges over the interval (0, 1) and varies sigmoidally with input s, reaching half-activation when the activation energy u₀ + Δu vanishes.
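This relaxation can be checked numerically, as sketched below (Python; illustrative parameters matching the sketch above): the empirical mean of many stochastically simulated pools is compared with the deterministic relaxation defined by τ(s) and x∞(s).

```python
import numpy as np

def rates(s, nu0=1.0, u0=-2.0, w=4.0):
    du = u0 + w * s
    return nu0 * np.exp(0.5 * du), nu0 * np.exp(-0.5 * du)

def tau_xinf(s):
    """Characteristic time and asymptotic active fraction for input s (Equation 2)."""
    nu_p, nu_m = rates(s)
    return 1.0 / (nu_p + nu_m), nu_p / (nu_p + nu_m)

def simulate_pools(s, N=25, T=10.0, dt=0.01, n_pools=500, seed=1):
    """Mean active fraction of n_pools initially inactive pools of N bistable variables."""
    rng = np.random.default_rng(seed)
    nu_p, nu_m = rates(s)
    n = np.zeros(n_pools, dtype=int)
    means = []
    for _ in range(int(T / dt)):
        births = rng.binomial(N - n, 1.0 - np.exp(-nu_p * dt))
        deaths = rng.binomial(n, 1.0 - np.exp(-nu_m * dt))
        n = n + births - deaths
        means.append(n.mean() / N)
    return np.array(means)

s = 0.5
tau, x_inf = tau_xinf(s)
t = np.arange(1, 1001) * 0.01
x_det = x_inf * (1.0 - np.exp(-t / tau))     # deterministic relaxation from x = 0
x_emp = simulate_pools(s)
print(f"tau = {tau:.2f}, x_inf = {x_inf:.2f}, "
      f"max |deterministic - empirical| = {np.abs(x_det - x_emp).max():.3f}")
```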
Quality of representation
Pools of bistable variables belong to a class of neural representations particularly suited for Bayesian integration of sensory information (Beck et al., 2008; Pouget et al., 2013). In general, summation of activity is equivalent to optimal integration of information, provided that response variability is Poisson-like, and response tuning differs only multiplicatively (Ma et al., 2006; Ma et al., 2008). Pools of bistable variables closely approximate these properties (see Appendix 1, section Quality of representation: Suitability for inference).
The representational accuracy of even a comparatively small number of bistable variables can be surprisingly high. For example, if normally distributed inputs drive the activity of initially inactive pools of bistable variables, pools of the size used in the present model readily capture 90% of the Fisher information (see Appendix 1, section Quality of representation: Integration of noisy samples).
Conflicting evidence
Any model of BR must represent the conflicting evidence from both eyes (e.g., different visual orientations), which supports alternative perceptual hypotheses (e.g., distinct grating patterns). We assume that conflicting evidence accumulates in two separate pools of bistable variables (‘evidence pools,’ Figure 1c). Fractional activations e₁ and e₂ develop stochastically following Equation 5 (Materials and methods). Transition rates ν₊ and ν₋ vary exponentially with activation energy (Equation 1), with baseline potential u₀ and baseline rate ν₀. The variable components of activation energy, Δu_e1 and Δu_e2, are synaptically modulated by image contrasts, c₁ and c₂:

Δu_e1 = w_vis I(c₁), Δu_e2 = w_vis I(c₂), (Equation 3)

where w_vis is a coupling constant and I(c) is a monotonic function of image contrast (see Materials and methods).
Competing hypotheses: ‘non-local attractors’
Once evidence for, and against, alternative perceptual hypotheses (e.g., distinct grating patterns) has been accumulated, reaching a decision requires a sensitive and reliable mechanism for identifying the best supported hypothesis and amplifying the result into a categorical read-out. Such a winner-take-all decision (Koch and Ullman, 1985) is readily accomplished by a dynamical version of biased competition (Deco and Rolls, 2005; Wang, 2002; Deco et al., 2007; Wang, 2008).
We assume that alternative perceptual hypotheses are represented by two further pools of bistable variables, forming two ‘non-local attractors’ (‘decision pools,’ Figure 1c). Similar to previous models of decision-making and attentional selection (Deco and Rolls, 2005; Wang, 2002; Deco et al., 2007; Wang, 2008), we postulate recurrent excitation within pools, but recurrent inhibition between pools, to obtain a ‘winner-take-all’ dynamics. Importantly, we assume that ‘evidence pools’ project to ‘decision pools’ not only in the form of selective excitation (targeted at the corresponding decision pool), but also in the form of indiscriminate inhibition (targeting both decision pools), as suggested previously (Ditterich et al., 2003; Bogacz et al., 2006).
Specifically, fractional activations d₁ and d₂ develop stochastically according to Equation 5 (Materials and methods). Transition rates ν₊ and ν₋ vary exponentially with activation energy (Equation 1), with their own baseline potential difference and baseline rate. The variable components of activation energy, Δu_d1 and Δu_d2, are synaptically modulated by evidence and decision activities:

Δu_d1 = w_exc e₁ − w_inh (e₁ + e₂) + w_coop d₁ − w_comp d₂,
Δu_d2 = w_exc e₂ − w_inh (e₁ + e₂) + w_coop d₂ − w_comp d₁, (Equation 4)

where coupling constants w_exc, w_inh, w_coop, and w_comp reflect feedforward excitation, feedforward inhibition, lateral cooperation within decision pools, and lateral competition between decision pools, respectively.
This biased competition circuit expresses a categorical decision by either raising d₁ towards unity (and lowering d₂ towards zero) or vice versa. The choice is random when visual input is ambiguous (c₁ = c₂), but becomes deterministic with growing input bias. This probabilistic sensitivity to input bias is reliable and robust under arbitrary initial conditions of evidence and decision activities (see Appendix 1, section Categorical choice with Appendix 1—figure 3).
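The sketch below (Python; coupling constants and baselines are invented placeholders, not fitted values) illustrates this categorical read-out: with evidence activities held fixed, the probability that one decision pool prevails grows steeply with the evidence bias.

```python
import numpy as np

rng = np.random.default_rng(2)

def decision_step(nd1, nd2, e1, e2, N=25, dt=0.01, nu0=2.0, u0=-1.0,
                  w_exc=8.0, w_inh=4.25, w_coop=4.0, w_comp=4.0):
    """One binomial update of the two decision pools for fixed evidence activities e1, e2."""
    d1, d2 = nd1 / N, nd2 / N
    out = []
    for n_self, x_self, x_other, e_self in ((nd1, d1, d2, e1), (nd2, d2, d1, e2)):
        du = (u0 + w_exc * e_self - w_inh * (e1 + e2)
              + w_coop * x_self - w_comp * x_other)
        nu_p, nu_m = nu0 * np.exp(0.5 * du), nu0 * np.exp(-0.5 * du)
        out.append(n_self + rng.binomial(N - n_self, 1 - np.exp(-nu_p * dt))
                          - rng.binomial(n_self, 1 - np.exp(-nu_m * dt)))
    return out

def choice_probability(bias, n_runs=100, T=5.0, dt=0.01):
    """Fraction of runs in which pool 1 ends up more active, for evidence bias e1 - e2."""
    e1, e2 = 0.6 + bias / 2, 0.6 - bias / 2
    wins = 0
    for _ in range(n_runs):
        nd1 = nd2 = 0
        for _ in range(int(T / dt)):
            nd1, nd2 = decision_step(nd1, nd2, e1, e2, dt=dt)
        wins += nd1 > nd2
    return wins / n_runs

for b in (0.0, 0.05, 0.1, 0.2):
    print(f"evidence bias {b:4.2f}: P(pool 1 wins) = {choice_probability(b):.2f}")
```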
Feedback suppression
Finally, we assume feedback suppression, with each decision pool selectively targeting the corresponding evidence pool. A functional motivation for this systematic bias against the currently dominant appearance is given momentarily. Its effects include curtailing dominance durations and ensuring that reversals occur from time to time. Specifically, we modify Equation 3 to

Δu_e1 = w_vis I(c₁) − w_fb d₁, Δu_e2 = w_vis I(c₂) − w_fb d₂, (Equation 3′)

where w_fb is a coupling constant.
Previous models of BR (Dayan, 1998; Hohwy et al., 2008) have justified selective feedback suppression of the evidence supporting a winning hypothesis in terms of ‘predictive coding’ and ‘hierarchical Bayesian inference’ (Rao and Ballard, 1999; Lee and Mumford, 2003). An alternative normative justification is that, in volatile environments, where the sensory situation changes frequently (‘volatility prior’), optimal inference requires an exponentially growing bias against evidence for the most likely hypothesis (Veliz-Cuba et al., 2016). Note that feedback suppression applies selectively to evidence for a winning hypothesis and is thus materially different from visual adaptation (Wark et al., 2009), which applies indiscriminately to all evidence present.
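To make the interplay of the different components explicit, the sketch below (Python) combines Equations 1, 3′, 4, and 5, as reconstructed above, into a single update loop for the two evidence pools and the two decision pools. All coupling constants, baselines, and the contrast nonlinearity are invented placeholders rather than the fitted values, so the resulting alternation statistics should not be read as model predictions.

```python
import numpy as np

def simulate_rivalry(c1=0.5, c2=0.5, T=600.0, dt=0.01, N=25, seed=0):
    """Update evidence pools (e1, e2) and decision pools (d1, d2) with binomial
    birth/death counts.  All parameter values below are illustrative placeholders."""
    rng = np.random.default_rng(seed)
    w_vis, w_fb = 2.0, 3.5                  # visual drive, feedback suppression
    w_exc, w_inh = 8.0, 4.25                # feedforward excitation / indiscriminate inhibition
    w_coop, w_comp = 4.0, 4.0               # lateral cooperation / competition (decision level)
    u0_e, nu0_e = -1.4, 0.2                 # baselines, evidence pools (slow)
    u0_d, nu0_d = -1.0, 2.0                 # baselines, decision pools (fast)
    I1, I2 = np.log(1 + c1 / 0.1), np.log(1 + c2 / 0.1)   # assumed contrast nonlinearity

    def pool_step(n, du, nu0):
        """Equation 5: binomial counts of activations and inactivations in one time step."""
        nu_p, nu_m = nu0 * np.exp(0.5 * du), nu0 * np.exp(-0.5 * du)
        return (n + rng.binomial(N - n, 1 - np.exp(-nu_p * dt))
                  - rng.binomial(n, 1 - np.exp(-nu_m * dt)))

    ne1 = ne2 = nd1 = nd2 = 0
    trace = np.empty((int(T / dt), 4))
    for t in range(trace.shape[0]):
        e1, e2, d1, d2 = ne1 / N, ne2 / N, nd1 / N, nd2 / N
        du_e1 = u0_e + w_vis * I1 - w_fb * d1                                       # Equation 3'
        du_e2 = u0_e + w_vis * I2 - w_fb * d2
        du_d1 = u0_d + w_exc * e1 - w_inh * (e1 + e2) + w_coop * d1 - w_comp * d2   # Equation 4
        du_d2 = u0_d + w_exc * e2 - w_inh * (e1 + e2) + w_coop * d2 - w_comp * d1
        ne1, ne2 = pool_step(ne1, du_e1, nu0_e), pool_step(ne2, du_e2, nu0_e)
        nd1, nd2 = pool_step(nd1, du_d1, nu0_d), pool_step(nd2, du_d2, nu0_d)
        trace[t] = e1, e2, d1, d2
    return trace

trace = simulate_rivalry()
crossings = np.sum(np.diff(np.sign(trace[:, 2] - trace[:, 3] + 1e-12)) != 0)
print("mean activities (e1, e2, d1, d2):", trace.mean(axis=0).round(2))
print("zero crossings of d1 - d2:", int(crossings))
```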
Reversal dynamics
A representative example of the joint dynamics of evidence and decision pools is illustrated in Figure 1c,d, both at the level of pool activities e₁, e₂, d₁, and d₂, and at the level of individual bistable variables. The top row shows the two decision pools, with instantaneous active counts and the active/inactive states of individual variables. The bottom row shows the two evidence pools, with instantaneous active counts and the active/inactive states of individual variables. Only a small fraction of evidence variables is active at any one time.
Phenomenal appearance reverses when the differential activity of evidence pools, e₁ − e₂, contradicts sufficiently strongly the differential activity of decision pools, d₁ − d₂, such that the steady state of decision pools is destabilized (see further below and Figure 4). As soon as the reversal has been effected at the decision level, feedback suppression lifts from the newly non-dominant evidence and descends upon the newly dominant evidence. Due to this asymmetric suppression, the newly non-dominant evidence recovers, whereas the newly dominant evidence habituates. This opponent dynamics progresses, past the point of equality e₁ = e₂, until differential evidence activity once again contradicts differential decision activity. Whereas the activity of decision pools varies in phase (or counterphase) with perceptual appearance, the activity of evidence pools changes in quarter-phase (or negative quarter-phase) with perceptual appearance (e.g., Figures 1c,d and 2a), consistent with some previous models (Gigante et al., 2009; Albert et al., 2017; Weilnhammer et al., 2017).
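In practice, dominance periods can be read off the decision-level activity as sketched below (Python). The demonstration uses a synthetic, noisy alternating signal (hypothetical data); the same logic applies to simulated d₁ and d₂ traces or, with key presses instead of pool activities, to the psychophysical reports described next.

```python
import numpy as np

def dominance_durations(d1, d2, dt, margin=0.25):
    """Durations of periods in which one decision pool exceeds the other by `margin`."""
    state = np.zeros(len(d1), dtype=int)          # +1: pool 1 dominant, -1: pool 2, 0: mixed
    state[d1 - d2 > margin] = 1
    state[d2 - d1 > margin] = -1
    durations, labels = [], []
    current, start = 0, 0
    for i, s in enumerate(state):
        if s != 0 and s != current:               # a reversal (or the first clear dominance)
            if current != 0:
                durations.append((i - start) * dt)
                labels.append(current)
            current, start = s, i
    return np.array(durations), np.array(labels)

# synthetic example: dominance alternates roughly every 2 s, sampled at dt = 0.01 s
rng = np.random.default_rng(3)
dt = 0.01
t = np.arange(0, 60, dt)
square = np.sign(np.sin(2 * np.pi * t / 4.0))
d1 = 0.5 + 0.4 * square + 0.05 * rng.standard_normal(t.size)
d2 = 0.5 - 0.4 * square + 0.05 * rng.standard_normal(t.size)
durations, labels = dominance_durations(d1, d2, dt)
print(f"{durations.size} dominance periods, mean duration {durations.mean():.2f} s")
```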
Binocular rivalry
To compare predictions of the model described above to experimental observations, we measured spontaneous reversals of BR for different combinations of image contrast. BR is a particularly well-studied instance of multistable perception (Wheatstone, 1838; Diaz-Caneja, 1928; Levelt, 1965; Leopold and Logothetis, 1999; Brascamp et al., 2015). When conflicting images are presented to each eye (e.g., by means of a mirror stereoscope or of colored glasses, see Materials and methods), the phenomenal appearance reverses from time to time between the two images (Figure 1a). Importantly, the perceptual conflict involves also representations of coherent (binocular) patterns and is not restricted to eye-specific (monocular) representations (Logothetis et al., 1996; Kovács et al., 1996; Bonneh et al., 2001; Blake and Logothetis, 2002).
Specifically, our experimental observations established reversal sequences for all combinations of five contrast levels per eye. During any given dominance period, one contrast value belongs to the phenomenally dominant image and the other to the phenomenally suppressed image (see Materials and methods). We analyzed these observations in terms of mean dominance durations, higher moments (coefficient of variation and skewness) of the distribution of dominance durations, and the sequential correlation of successive dominance durations.
Additional aspects of serial dependence are discussed further below.
As described in Materials and methods, we fitted 11 model parameters to reproduce observations with more than 50 degrees of freedom: mean dominance durations, coefficients of variation, one value of skewness, and one correlation coefficient. The latter two values were obtained by averaging over contrast combinations and rounding. Importantly, minimization of the fit error, by random sampling of parameter space with a stochastic gradient descent, resulted in a three-dimensional manifold of suboptimal solutions. This revealed a high degree of redundancy among the 11 model parameters (see Materials and methods). Accordingly, we estimate that the effective number of degrees of freedom needed to reproduce the desired out-of-equilibrium dynamics was between 3 and 4. Model predictions and experimental observations are juxtaposed in Figures 3 and 4.
The complex and asymmetric dependence of mean dominance durations on image contrast — aptly summarized by Levelt’s ‘propositions’ I to IV (Levelt, 1965; Brascamp et al., 2015) — is fully reproduced by the model (Figure 3). Here, we use the updated definition of Brascamp et al., 2015: increasing the contrast of one image increases the fraction of time during which this image dominates appearance (‘predominance,’ Levelt I). Counterintuitively, this is due more to shortening dominance of the unchanged image than to lengthening dominance of the changed image (Levelt II, Figure 3b, left panel). Mean dominance durations grow (and alternation rates decline) symmetrically around equal predominance as contrast difference increases (Levelt III, Figure 3b, right panel). Mean dominance durations shorten when both image contrasts increase (Levelt IV, Figure 3c).
Successive dominance durations are typically correlated positively (Fox and Herrmann, 1967; Walker, 1975; Pastukhov et al., 2013c). Averaging over all contrast combinations, observed and fitted correlation coefficients were comparable in both mean and standard deviation. Unexpectedly, both observed and fitted correlation coefficients increased systematically with image contrast, growing steadily from the lowest to the highest contrast (Figure 3e, blue symbols). It is important to note that this dependence was not fitted. Rather, this previously unreported dependence constitutes a model prediction that is confirmed by observation.
The distribution of dominance durations typically takes a characteristic shape (Cao et al., 2016; Fox and Herrmann, 1967; Blake et al., 1971; Borsellino et al., 1972; Walker, 1975; De Marco et al., 1977; Murata et al., 2003; Brascamp et al., 2005; Pastukhov and Braun, 2007; Denham et al., 2018), approximating a gamma distribution with an approximately constant shape parameter and, equivalently, coefficient of variation. The fitted model fully reproduces this ‘scaling property’ (Figure 4). The observed coefficient of variation remained within a narrow range for nearly all contrast combinations (Figure 4b). Unexpectedly, both observed and fitted values increased above, or decreased below, this range at extreme contrast combinations (Figure 4b, left panel). Along the main diagonal (equal dominant and suppressed contrasts), where observed values had smaller error bars, both observed and fitted values of skewness approximated those of a gamma distribution (Figure 4d, blue symbols).
Specific contribution of evidence and decision levels
What are the reasons for the surprising success of the model in reproducing universal characteristics of multistable phenomena, including the counterintuitive input dependence (‘Levelt’s propositions’), the stereotypical distribution shape (‘scaling property’), and the positive sequential correlation (as detailed in Figures 3 and 4)? Which level of model dynamics is responsible for reproducing different aspects of BR dynamics?
Below, we describe the specific contributions of different model components. Specifically, we show that the evidence level of the model reproduces ‘Levelt’s propositions I–III’ and the ‘scaling property,’ whereas the decision level reproduces ‘Levelt’s proposition IV.’ A non-trivial interaction between evidence and decision levels reproduces serial dependencies. Additionally, we show that this interaction predicts further aspects of serial dependencies – such as sensitivity to image contrast – that were not reported previously, but are confirmed by our experimental observations.
Levelt’s propositions I, II, and III
The characteristic input dependence of average dominance durations emerges in two steps (as in Gigante et al., 2009). First, inputs and feedback suppression shape the birth-death dynamics of the two evidence pools (by setting disparate transition rates ν±, following Equation 3′ and Equation 1). Second, this sets in motion two opponent developments (habituation of dominant evidence activity and recovery of non-dominant evidence activity, both following Equation 2) that jointly determine dominance duration.
To elucidate this mechanism, it is helpful to consider the limit of large pools (N → ∞) and its deterministic dynamics (Figure 2), which corresponds to the average stochastic dynamics. In this limit, periods of dominant evidence e₁ or e₂ start and end at the same activity levels, because reversal thresholds are the same for positive and negative evidence differences (see section Levelt IV below).
The rates at which evidence habituates or recovers depend, in the first instance, on asymptotic activity levels (Equations 1 and 2, Figure 2b and Appendix 1—figure 4). In general, dominance durations depend on the distance between asymptotic levels: the further apart these are, the faster the development and the shorter the duration. As feedback suppression inverts the sign of the opponent developments, dominant evidence decreases (habituates) while non-dominant evidence increases (recovers). Due to this inversion, the distance between asymptotic levels is smaller, and the reversal dynamics slower, when dominant input is stronger, and vice versa. It further follows that incrementing one input (and raising the corresponding asymptotic level) speeds up recovery or slows down habituation, shortening or lengthening periods of non-dominance and dominance, respectively (Levelt I).
In the second instance, rates of habituation or recovery depend on characteristic times (Equations 1 and 2). When these rates are unequal, dominance durations depend more sensitively on the slower process. This is why dominance durations depend more sensitively on non-dominant input (Levelt II): recovery of non-dominant evidence is generally slower than habituation of dominant evidence, independently of which input is weaker or stronger. The reason is that the respective effects of characteristic times and asymptotic levels are synergistic for weaker-input evidence (in both directions), whereas they are antagonistic for stronger-input evidence (see Appendix 1, section Deterministic dynamics: Evidence pools and Appendix 1—figure 4).
In general, dominance durations depend hyperbolically on the distance between the asymptotic evidence difference and the reversal threshold (Figure 2c and Equation 7 in Appendix 1). Dominance durations become infinite (and reversals cease) when the asymptotic evidence difference falls below the reversal threshold. This hyperbolic dependence is also why alternation rate peaks at equidominance (Levelt III): increasing the difference between inputs always lengthens longer durations more than it shortens shorter durations, thus lowering alternation rate.
Distribution of dominance durations
For all combinations of image contrast, the mechanism accurately predicts the experimentally observed distributions of dominance durations. This is owed to the stochastic activity of pools of bistable variables.
Firstly, dominance distributions retain nearly the same shape, even though average durations vary more than threefold with image contrast (see also Appendix 1—figure 6a,b). This ‘scaling property’ is due to the Poisson-like variability of birth-death processes (see Appendix 1, section Stochastic dynamics). Generally, when a stochastic accumulation approaches threshold, the rates of both accumulation and dispersion of activity affect the distribution of first-passage-times (Cao et al., 2014; Cao et al., 2016). In the special case of Poisson-like variability, the two rates vary proportionally and preserve distribution shape (see also Appendix 1—figure 6c,d).
Secondly, predicted distributions approximate gamma distributions. As shown previously (Cao et al., 2014; Cao et al., 2016), this is due to birth-death processes accumulating activity within a narrow range (i.e., a low threshold for the difference between evidence activities). In this low-threshold regime, the first-passage-times of birth-death processes are both highly variable and gamma distributed, consistent with experimental observations.
Thirdly, the predicted variability (coefficient of variation) of dominance periods varies systematically, being larger for longer than for shorter dominance durations (Figure 4a,b). The reason is that the stochastic development becomes noise-dominated for longer durations: stronger-input evidence habituates rapidly into a regime where random fluctuations gain importance (see also Appendix 1—figure 4a,b).
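The low-threshold first-passage behavior invoked above can be illustrated numerically, as sketched below (Python; rates and threshold are illustrative): scaling both transition rates of a birth-death process together rescales its first-passage times but leaves their relative variability unchanged, which is the essence of the scaling property.

```python
import numpy as np

def first_passage_times(nu_plus, nu_minus, N=25, threshold=5, n_trials=5000, seed=4):
    """Time for the active count (starting at 0) to first reach `threshold` (Gillespie)."""
    rng = np.random.default_rng(seed)
    times = np.empty(n_trials)
    for k in range(n_trials):
        n, t = 0, 0.0
        while n < threshold:
            rate_up, rate_down = nu_plus * (N - n), nu_minus * n
            t += rng.exponential(1.0 / (rate_up + rate_down))
            n += 1 if rng.random() < rate_up / (rate_up + rate_down) else -1
        times[k] = t
    return times

for scale in (1.0, 2.0, 4.0):
    t = first_passage_times(0.05 * scale, 0.15 * scale)
    print(f"rate scale {scale:3.1f}: mean = {t.mean():6.2f}, cV = {t.std() / t.mean():.2f}")
```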
Levelt’s proposition IV
The model accurately predicts how dominance durations shorten with higher image contrast (Levelt IV). Surprisingly, this reflects the dynamics of the two decision pools (Figure 5).
Here again it is helpful to consider the deterministic limit of large pools (N → ∞). In this limit, a dominant decision state is destabilized when a contradictory evidence difference exceeds a certain threshold value (Figure 5b and Appendix 1, section Deterministic dynamics: Decision pools). Due to the combined effect of excitatory and inhibitory feedforward projections, w_exc and w_inh (Equation 4 and Figure 5a), this average reversal threshold decreases with mean evidence activity. Simulations of the fully stochastic model (finite pool size N) confirm this analysis (Figure 5c). As average evidence activity increases with image contrast, the average evidence bias at the time of reversals decreases, resulting in shorter dominance periods (Figure 5d).
Serial dependence
The proposed mechanism predicts positive correlations between successive dominance durations, a well-known characteristic of multistable phenomena (Fox and Herrmann, 1967; Walker, 1975; Van Ee, 2005; Denham et al., 2018). In addition, it predicts further aspects of serial dependence not reported previously.
In both model and experimental observations, a long dominance period tends to be followed by another long period, and a short dominance period by another short period (Figure 6). In the model, this is due to mean evidence activity fluctuating stochastically above and below its long-term average. The autocorrelation time of these fluctuations increases monotonically with image contrast and, for high contrast, spans multiple dominance periods (see Appendix 1, section Characteristic times and Appendix 1—figure 7). Note that fluctuations of mean evidence activity diminish as the number N of bistable variables increases and vanish in the deterministic limit N → ∞.
Crucially, fluctuations of mean evidence activity modulate both the reversal threshold and dominance durations, as illustrated in Figure 6a,b. To obtain Figure 6a, dominance durations were grouped into quantiles and the average duration of each quantile was compared to the conditional expectation of preceding and following durations (upper graph). For the same quantiles (compare color coding), average evidence activity was compared to the conditional expectation at the end of preceding and following periods (lower graph). Both the inverse relation between mean evidence activity and duration and the autocorrelation over multiple dominance periods are evident.
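The quantile analysis just described can be sketched as follows (Python), here applied to synthetic, serially correlated durations (hypothetical data, not model output or observations) and restricted to following periods.

```python
import numpy as np

# synthetic durations whose mean drifts slowly, mimicking slow fluctuations of evidence activity
rng = np.random.default_rng(5)
n = 4000
slow = np.convolve(rng.standard_normal(n), np.ones(20) / 20, mode="same")
durations = rng.gamma(shape=4.0, scale=(1.0 + 0.3 * slow) / 4.0)

def conditional_means(durations, n_quantiles=5, max_lag=3):
    """For each duration quantile, mean duration of the periods at lags 1..max_lag."""
    edges = np.quantile(durations, np.linspace(0, 1, n_quantiles + 1))
    which = np.clip(np.searchsorted(edges, durations, side="right") - 1, 0, n_quantiles - 1)
    result = np.zeros((n_quantiles, max_lag))
    for q in range(n_quantiles):
        idx = np.where(which[:-max_lag] == q)[0]
        for lag in range(1, max_lag + 1):
            result[q, lag - 1] = durations[idx + lag].mean()
    return result

for q, row in enumerate(conditional_means(durations)):
    print(f"quantile {q + 1}: mean following durations " +
          ", ".join(f"{v:.2f}" for v in row))
```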
This source of serial dependency – comparatively slow fluctuations of mean evidence activity and of the reversal threshold – predicts several qualitative characteristics not reported previously and now confirmed by experimental observations. First, sequential correlations are predicted (and observed) to be strictly positive at all lags (next period, one-after-next period, and so on) (Figure 6d). In other words, the model predicts that several successive dominance periods tend to be shorter (or longer) than average.
Second, due to the contrast dependence of autocorrelation time, sequential correlations are predicted (and observed) to increase with image contrast (Figure 6d). The experimentally observed degree of contrast dependence is broadly consistent with pool sizes of a few tens of bistable variables (black and red curves in Figure 3e). Larger pools with hundreds of bistable variables do not express the observed dependence on contrast (not shown).
Third, for high image contrast, reversal sequences are predicted (and observed) to contain extended episodes in which dominance periods are consistently short, and other extended episodes in which they are consistently long (Figure 6c). When quantified in terms of a ‘burstiness index,’ the degree of inhomogeneity in predicted and observed reversal sequences is comparable (see Appendix 1, section Burstiness and Appendix 1—figure 8).
Many previous models of BR (e.g., Laing and Chow, 2002) postulated selective adaptation of competing representations to account for serial dependency. However, selective adaptation is an opponent process that favors positive correlations between dominance periods of different percepts, but negative correlations between dominance periods of the same percept. To demonstrate this point, we fitted such a model to reproduce our experimental observations (mean dominance durations, coefficients of variation, skewness, and sequential correlations) for five image contrasts. As expected, the alternative model predicts negative correlations for dominance periods of the same percept (Figure 6d, right panel), contrary to what is observed.
Discussion
We have shown that many well-known features of BR are reproduced, and indeed guaranteed, by a particular dynamical mechanism. Specifically, this mechanism reproduces the counterintuitive input dependence of dominance durations (‘Levelt’s propositions’), the stereotypical shape of dominance distributions (‘scaling property’), and the positive sequential correlation of dominance periods. The explanatory power of the proposed mechanism is considerably higher than that of previous models. Indeed, the observations explained exhibited more effective degrees of freedom (approximately 14) than the mechanism itself (between 3 and 4).
The proposed mechanism is biophysically plausible in terms of the out-of-equilibrium dynamics of a modular and hierarchical network of spiking neurons (see also further below). Individual modules idealize the input dependence of attractor transitions in assemblies of spiking neurons. All synaptic effects superimpose linearly, consistent with extended mean-field theory for neuronal networks (Amit and Brunel, 1997; van Vreeswijk and Sompolinsky, 1996). The interaction between ‘rivaling’ sets of modules (‘pools’) results in divisive normalization, which is consistent with many cortical models (Carandini and Heeger, 2011; Miller, 2016).
It has long been suspected that multistable phenomena in visual, auditory, and tactile perception may share a similar mechanistic origin. As the features of BR explained here are in fact universal features of multistable phenomena in different modalities, we hypothesize that a similar out-of-equilibrium dynamics of modular networks may underlie multistable phenomena in all sensory modalities. In other words, this may be a general mechanism operating in many perceptual representations.
Dynamical mechanism
Two principal alternatives have been considered for the dynamical mechanism of perceptual decision-making: drift-diffusion models (Luce, 1986; Ratcliff and Smith, 2004) and recurrent network models (Wang, 2008; Wang, 2012). The mechanism proposed here combines both alternatives: at its evidence level, sensory information is integrated, over both space and time, by ‘local attractors’ in a discrete version of a drift-diffusion process. At its decision level, the population dynamics of a recurrent network implements a winner-take-all competition between ‘non-local attractors.’ Together, the two levels form a ‘nested attractor’ system (Braun and Mattia, 2010) operating perpetually out of equilibrium.
A recurrent network with strong competition typically ‘normalizes’ individual responses relative to the total response (Miller, 2016). Divisive normalization is considered a canonical cortical computation (Carandini and Heeger, 2011), for which multiple rationales can be found. Here, divisive normalization is augmented by indiscriminate feedforward inhibition. This combination ensures that decision activity rapidly and reliably categorizes differential input strength, largely independently of total input strength.
Another key feature of the proposed mechanism is that a ‘dominant’ decision pool applies feedback suppression to the associated evidence pool. Selective suppression of evidence for a winning hypothesis features in computational theories of ‘hierarchical inference’ (Rao and Ballard, 1999; Lee and Mumford, 2003; Parr and Friston, 2017b; Pezzulo et al., 2018), as well as in accounts of multistable perception inspired by such theories (Dayan, 1998; Hohwy et al., 2008; Weilnhammer et al., 2017). A normative reason for feedback suppression arises during continuous inference in uncertain and volatile environments, where the accumulation of sensory information is ongoing and cannot be restricted to appropriate intervals (Veliz-Cuba et al., 2016). Here, optimal change detection requires an exponentially rising bias against evidence for the most likely state, ensuring that even weak changes are detected, albeit with some delay.
The pivotal feature of the proposed mechanism is its pools of bistable variables or ‘local attractors.’ Encoding sensory inputs in terms of persistent ‘activations’ of local attractor assemblies (rather than in terms of transient neuronal spikes) creates an intrinsically retentive representation: sites that respond are also sites that retain information (for a limited time). Our results are consistent with a few tens of bistable variables in each pool. In the proposed mechanism, differential activity of two pools accumulates evidence against the dominant appearance until a threshold is reached and a reversal ensues (see also Barniv and Nelken, 2015; Nguyen et al., 2020). Conceivably, this discrete non-equilibrium dynamics might instantiate a variational principle of inference such as ‘maximum caliber’ (Pressé et al., 2013; Dixit et al., 2018).
Emergent features
The components of the proposed mechanism interact to guarantee the statistical features that characterize BR and other multistable phenomena. Discretely stochastic accumulation of differential evidence against the dominant appearance ensures sensitivity of dominance durations to non-dominant input. It also ensures the invariance of relative variability (‘scaling property’) and gamma-like distribution shape of dominance durations. Due to a non-trivial interaction with the competitive decision, discretely stochastic fluctuations of evidence-level activity express themselves in a serial dependency of dominance durations. Several features of this dependency were unexpected and not reported previously, for example, the sensitivity to image contrast and the ‘burstiness’ of dominance reversals (i.e., extended episodes in which dominance periods are consistently longer or shorter than average). The fact that these predictions are confirmed by our experimental observations provides further support for the proposed mechanism.
Relation to previous models
How does the proposed mechanism compare to previous ‘dynamical’ models of multistable phenomena? It is of similar complexity to previous minimal models (Laing and Chow, 2002; Wilson, 2007; Moreno-Bote et al., 2010) in that it assumes four state variables at two dynamical levels, one slow (accumulation) and one fast (winner-take-all competition). It differs in reversing their ordering: visual input impinges first on the slow level, which then drives the fast level. It also differs in that stochasticity dominates the slow dynamics (as suggested by van Ee, 2009), not the fast dynamics. However, the most fundamental difference is discreteness (pools of bistable variables), which shapes all key dynamical properties.
Unlike many previous models (e.g., Laing and Chow, 2002; Wilson, 2007; Moreno-Bote et al., 2007; Moreno-Bote et al., 2010; Cohen et al., 2019), the proposed mechanism does not include adaptation (stimulation-driven weakening of evidence), but a phenomenologically similar feedback suppression (perception-driven weakening of evidence). Evidence from perceptual aftereffects supports the existence of both stimulation- and perception-driven adaptation, albeit at different levels of representation. Aftereffects in the perception of simple visual features – such as orientation, spatial frequency, or direction of motion (Blake and Fox, 1974; Lehmkuhle and Fox, 1975; Wade and Wenderoth, 1978) – are driven by stimulation rather than by perceived dominance, whereas aftereffects in complex features – such as spiral motion, subjective contours, rotation in depth (Wiesenfelder and Blake, 1990; Van der Zwan and Wenderoth, 1994; Pastukhov et al., 2014a) – typically depend on perceived dominance. Several experimental observations related to BR have been attributed to stimulation-driven adaptation (e.g., negative priming, flash suppression, generalized flash suppression; Tsuchiya et al., 2006). The extent to which a perception-driven adaptation could also explain these observations remains an open question for future work.
Multistable perception induces a positive priming or ‘sensory memory’ (Pearson and Clifford, 2005; Pastukhov and Braun, 2008; Pastukhov et al., 2013a), which can stabilize a dominant appearance during intermittent presentation (Leopold et al., 2003; Maier et al., 2003; Sandberg et al., 2014). This positive priming exhibits rather different characteristics (e.g., shape-, size- and motion-specificity, inducement period, persistence period) than the negative priming/adaptation of rivaling representations (de Jong et al., 2012; Pastukhov et al., 2013a; Pastukhov and Braun, 2013b; Pastukhov et al., 2014a; Pastukhov et al., 2014b; Pastukhov, 2016). To our mind, this evidence suggests that sensory memory is mediated by additional levels of representation and not by self-stabilization of rivaling representations, as has been suggested (Noest et al., 2007; Leptourgos, 2020). To incorporate sensory memory, the present model would have to be extended to include three hierarchical levels (evidence, decision, and memory), as previously proposed by Gigante et al., 2009.
BR arises within local regions of limited extent in the visual field (Leopold, 1997; Logothetis, 1998). No rivalry ensues when the stimulated locations in the left and right eyes are too distant from each other. The computational model presented here encompasses only one such local region, and therefore cannot reproduce spatially extended phenomena such as piecemeal rivalry (Blake et al., 1992) or traveling waves (Wilson et al., 2001). To account for these phenomena, the visual field would have to be tiled with copies of the model linked by grouping interactions (Knapen et al., 2007; Bressloff and Webber, 2012).
A particularly intriguing previous model (Wilson, 2003) postulated a hierarchy with competing and adapting representations in eight state variables at two separate levels, one lower (monocular) and another higher (binocular) level. This ‘stacked’ architecture could explain the fascinating experimental observation that one image can continue to dominate appearance over multiple swap cycles even when the images are rapidly swapped between eyes (Kovács et al., 1996; Logothetis et al., 1996). We expect that our hierarchical model could also account for this phenomenon if it were to be replicated at two successive levels. It is tempting to speculate that such ‘stacking’ might have a normative justification in that it might subserve hierarchical inference (Yuille and Kersten, 2006; Hohwy et al., 2008; Friston, 2010).
Another previous model (Li et al., 2017) used a hierarchy with 24 state variables at three separate levels to show that a stabilizing influence of selective visual attention could also explain slow rivalry when images are swapped rapidly. Additionally, this rather complex model reproduced the main features of Levelt’s propositions, but did not consider the scaling property or sequential dependencies. The model shared some of the key features of the present model (divisive inhibition, differential excitation-inhibition), but added a multiplicative attentional modulation. As the present model already incorporates the ‘biased competition’ that is widely thought to underlie selective attention (Sabine and Ungerleider, 2000; Reynolds and Heeger, 2009), we expect that it could reproduce attentional effects by means of additive modulations.
Continuous inference
The notion that multistable phenomena such as BR reflect active exploration of explanatory hypotheses for sensory evidence has a venerable history (von Helmholtz, 1867; Barlow et al., 1972; Gregory, 1980; Leopold and Logothetis, 1999). The mechanism proposed here is in keeping with that notion: higher-level ‘explanations’ compete for control (‘dominance’) of phenomenal appearance in terms of their correspondence to lower-level ‘evidence.’ An ‘explanation’ takes control if its correspondence is sufficiently superior to that of rival ‘explanations.’ The greater the superiority, the longer control is retained. Eventually, alternative ‘explanations’ seize control, if only briefly. This manner of operation is also consistent with computational theories of ‘analysis by synthesis’ or ‘hierarchical inference,’ although there are many differences in detail (Rao and Ballard, 1999; Parr and Friston, 2017b; Pezzulo et al., 2018).
Interacting with an uncertain and volatile world necessitates continuous and concurrent evaluation of sensory evidence and selection of motor action (Cisek and Kalaska, 2010; Gold and Stocker, 2017). Multistable phenomena exemplify continuous decision-making without external prompting (Braun and Mattia, 2010). Sensory decision-making has been studied extensively, mostly in episodic choice tasks, and the neural circuits and activity dynamics underlying episodic decision-making – including representations of potential choices, sensory evidence, and behavioral goals – have been traced in detail (Cisek and Kalaska, 2010; Gold and Shadlen, 2007; Wang, 2012; Krug, 2020). Interestingly, there seems to be substantial overlap between choice representations in decision-making and in multistable situations (Braun and Mattia, 2010).
Continuous inference has been studied extensively in auditory streaming paradigms (Winkler et al., 2012; Denham et al., 2014). The auditory system seems to continually update expectations for sound patterns on the basis of recent experience. Compatible patterns are grouped together in auditory awareness, and incompatible patterns result in spontaneous reversals between alternatives. Many aspects of this rich phenomenology are reproduced by computational models driven by some kind of ‘prediction error’ (Mill et al., 2013). The dynamics of two recent auditory models (Barniv and Nelken, 2015; Nguyen et al., 2020) are rather similar to the model presented here: while one sound pattern dominates awareness, evidence against this pattern is accumulated at a subliminal level.
Relation to neural substrate
What might be the neural basis of the bistable variables/‘local attractors’ proposed here? Ongoing activity in sensory cortex appears to be low-dimensional, in the sense that the activity of neurons with similar response properties varies concomitantly (‘shared variability,’ ‘noise correlations,’ Ponce-Alvarez et al., 2012, Mazzucato et al., 2015, Engel et al., 2016, Rich and Wallis, 2016, Mazzucato et al., 2019). This shared variability reflects the spatial clustering of intracortical connectivity (Muir and Douglas, 2011; Okun et al., 2015; Cossell et al., 2015; Lee et al., 2016; Rosenbaum et al., 2017) and unfolds over moderately slow time scales, both in primates and rodents (Ponce-Alvarez et al., 2012; Mazzucato et al., 2015; Cui et al., 2016; Engel et al., 2016; Rich and Wallis, 2016; Mazzucato et al., 2019).
Possible dynamical origins of shared and moderately slow variability have been studied extensively in theory and simulation (for reviews, see Miller, 2016; Huang and Doiron, 2017; La Camera et al., 2019). Networks with weakly clustered connectivity (e.g., 3% rewiring) can express a metastable attractor dynamics with moderately long time scales (Litwin-Kumar and Doiron, 2012; Doiron and Litwin-Kumar, 2014; Schaub et al., 2015; Rosenbaum et al., 2017). In a metastable dynamics, individual (connectivity-defined) clusters transition spontaneously between distinct and quasi-stationary activity levels (‘attractor states’) (Tsuda, 2001; Stern et al., 2014).
Evidence for metastable attractor dynamics in cortical activity is accumulating steadily (Mattia et al., 2013; Mazzucato et al., 2015; Rich and Wallis, 2016; Engel et al., 2016; Marcos et al., 2019; Mazzucato et al., 2019). Distinct activity states with exponentially distributed durations have been reported in sensory cortex (Mazzucato et al., 2015; Engel et al., 2016), consistent with noise-driven escape transitions (Doiron and Litwin-Kumar, 2014; Huang and Doiron, 2017). And several reports are consistent with external input modulating cortical activity mostly indirectly, via the rate of state transitions (Fiser et al., 2004; Churchland et al., 2010; Mazzucato et al., 2015; Engel et al., 2016; Mazzucato et al., 2019).
The proposed mechanism assumes bistable variables with noise-driven escape transitions, with transition rates modulated exponentially by external synaptic drive. Following previous work (Cao et al., 2016), we show this to be an accurate reduction of the population dynamics of metastable networks of spiking neurons.
Unfortunately, the spatial structure of the ‘shared variability’ or ‘noise correlations’ in cortical activity described above is poorly understood. However, we estimate that the cortical representation of our rivaling display involves a substantial extent of cortical surface in cortical areas V1 and V4 (Winawer and Witthoft, 2015; Winawer and Benson, 2021). Accordingly, in each of these two cortical areas, the neural representation of rivaling stimulation can comfortably accommodate several thousand recurrent local assemblies, each capable of expressing independent collective dynamics (i.e., ‘classic columns’ comprising several ‘minicolumns’ with distinct stimulus selectivity; Nieuwenhuys, 1994; Kaas, 2012). Thus, our model assumes that the representation of two rivaling images engages approximately 1–2% of the available number of recurrent local assemblies.
Neurophysiological correlates of BR
Neurophysiological correlates of BR have been studied extensively, often by comparing reversals of phenomenal appearance during binocular stimulation with physical alternation (PA) of monocular stimulation (e.g., Leopold and Logothetis, 1996; Scheinberg and Logothetis, 1997; Logothetis, 1998; Wilke et al., 2006; Aura et al., 2008; Keliris et al., 2010; Panagiotaropoulos et al., 2012; Bahmani et al., 2014; Xu et al., 2016; Kapoor et al., 2020; Dwarakanath et al., 2020). At higher cortical levels, such as inferior temporal cortex (Scheinberg and Logothetis, 1997) or prefrontal cortex (Panagiotaropoulos et al., 2012; Kapoor et al., 2020; Dwarakanath et al., 2020), BR and PA elicit broadly comparable neurophysiological responses that mirror perceptual appearance. Specifically, activity crosses its average level at the time of each reversal, roughly in phase with perceptual appearance (Scheinberg and Logothetis, 1997; Kapoor et al., 2020). In primary visual cortex (area V1), where many neurons are dominated by input from one eye, neurophysiological correlates of BR and PA diverge in an interesting way: whereas modulation of spiking activity is weaker during BR than PA (Leopold and Logothetis, 1996; Logothetis, 1998; Wilke et al., 2006; Aura et al., 2008; Keliris et al., 2010), measures thought to record dendritic inputs are modulated comparably under both conditions (Aura et al., 2008; Keliris et al., 2010; Bahmani et al., 2014; Yang et al., 2015; Xu et al., 2016). A stronger divergence is observed at an intermediate cortical level (visual area V4), where neurons respond to both eyes. Whereas some units modulate their spiking activity comparably during BR and PA (i.e., increased activity when preferred stimulus becomes dominant), other units exhibit the opposite modulation during BR (i.e., reduced activity when preferred stimulus gains dominance) (Leopold and Logothetis, 1996; Logothetis, 1998; Wilke et al., 2006). Importantly, at this intermediate cortical level, activity crosses its average level well before and after each reversal (Leopold and Logothetis, 1996; Logothetis, 1998), roughly in quarter phase with perceptual appearance.
Some of these neurophysiological observations are directly interpretable in terms of the model proposed here. Specifically, activity modulation at higher cortical levels (inferotemporal cortex, prefrontal cortex) could correspond to ‘decision activity,’ predicted to vary in phase with perceptual appearance. Similarly, activity modulation at intermediate cortical levels (area V4) could correspond to ‘evidence activity,’ which is predicted to vary in quarter phase with perceptual appearance. This identification would also be consistent with the neurophysiological evidence for attractor dynamics in columns of area V4 (Engel et al., 2016). The subpopulation of area V4 with opposite modulation could mediate feedback suppression from decision levels. If so, our model would predict this subpopulation to vary in counterphase with perceptual appearance. Finally, the fascinating interactions observed within primary visual cortex (area V1) are well beyond the scope of our simple model. Presumably, a ‘stacked’ model with two successive levels of competitive interactions at monocular and binocular levels of representation (Wilson, 2003; Li et al., 2017) would be required to account for these phenomena.
Conclusion
As multistable phenomena and their characteristics are ubiquitous in visual, auditory, and tactile perception, the mechanism we propose may form a general part of sensory processing. It bridges neural, perceptual, and normative levels of description and potentially offers a ‘comprehensive task-performing model’ (Kriegeskorte and Douglas, 2018) for sensory decision-making.
Materials and methods
Psychophysics
Six practiced observers participated in the experiment (four males, two females). Informed consent, and consent to publish, was obtained from all observers, and ethical approval Z22/16 was obtained from the Ethics Commission of the Faculty of Medicine of the Otto-von-Guericke University, Magdeburg. Stimuli were displayed on an LCD screen (EIZO ColorEdge CG303W, viewing distance 104 cm, refresh rate 60 Hz) and were viewed through a mirror stereoscope, with viewing position stabilized by chin and head rests. Display luminance was gamma-corrected.
Two grayscale, circular, orthogonally oriented gratings were presented foveally, one to each eye. To avoid a sharp outer edge, grating contrast was modulated with a Gaussian envelope. Tilt and phase of the gratings were randomized for each block. Five contrast levels were used: 6.25, 12.5, 25, 50, and 100%. The contrast of each grating was systematically manipulated, so that each contrast pair was presented in two blocks (50 blocks in total). Blocks were separated by a compulsory 1 min break. Observers reported on the tilt of the visible grating by continuously pressing one of two arrow keys. They were instructed to press only during exclusive visibility of one of the gratings, so that mixed percepts were indicated by neither key being pressed (25% of total presentation time). To facilitate binocular fusion, gratings were surrounded by a dichoptically presented square frame (outer size 9.8°, inner size 2.8°).
Dominance periods of ‘clear visibility’ were extracted in sequence from the final part of each block, and the mean linear trend was subtracted from all values. Values from the initial part of each block were discarded. To make the dominance periods of different observers comparable, values were rescaled by the ratio of the all-condition-all-observer average and the all-condition average of each observer. Finally, dominance periods from symmetric contrast conditions were combined into a single category, defined by the pair of contrasts viewed by the dominant and suppressed eyes, respectively. The number of observed dominance periods ranged from 900 to 1700 per contrast combination.
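A minimal sketch of this preprocessing (Python) is given below; the record fields and the toy data are hypothetical and serve only to make the two steps concrete.

```python
import numpy as np

def normalize_by_observer(records):
    """records: list of dicts with keys 'observer', 'c_dom', 'c_sup', 'duration' (seconds).
    Rescale durations so that every observer has the same (grand) average duration."""
    grand = np.mean([r["duration"] for r in records])
    per_obs = {}
    for r in records:
        per_obs.setdefault(r["observer"], []).append(r["duration"])
    factor = {obs: grand / np.mean(d) for obs, d in per_obs.items()}
    return [{**r, "duration": r["duration"] * factor[r["observer"]]} for r in records]

def group_by_contrast(records):
    """Group rescaled durations into (dominant contrast, suppressed contrast) categories."""
    groups = {}
    for r in records:
        groups.setdefault((r["c_dom"], r["c_sup"]), []).append(r["duration"])
    return {key: np.array(values) for key, values in groups.items()}

# tiny hypothetical example: two observers, one contrast combination
rng = np.random.default_rng(6)
records = [{"observer": obs, "c_dom": 0.5, "c_sup": 0.25,
            "duration": rng.gamma(4.0, (0.5 + 0.2 * obs) / 4.0)}
           for obs in (1, 2) for _ in range(100)]
for key, durations in group_by_contrast(normalize_by_observer(records)).items():
    print(key, f"n = {durations.size}, mean = {durations.mean():.2f} s")
```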
For the dominance periods observed in each condition, the mean and the second and third central moments were computed, as well as the coefficient of variation and the skewness relative to the coefficient of variation:
\[
c_V \;=\; \frac{\sqrt{\mu_2}}{\mu}, \qquad
\frac{\gamma_1}{c_V} \;=\; \frac{\mu_3\,\mu}{\mu_2^{\,2}},
\]
where μ is the mean dominance duration and μ2 and μ3 are the second and third central moments.
The expected standard error of the mean for distribution moments is 2% for the mean, 3% for the coefficient of variation, and 12% for skewness relative to coefficient of variation, assuming 1000 gamma-distributed samples.
Coefficients of sequential correlation were computed from pairs of periods with opposite dominance (first and next: ‘lag’ 1), pairs of periods with the same dominance (first and next but one: ‘lag’ 2), and so on,
\[
c_k \;=\; \frac{\langle t_i\, t_{i+k} \rangle \;-\; \langle t \rangle^{2}}{\langle t^{2} \rangle \;-\; \langle t \rangle^{2}},
\]
where ⟨t⟩ and ⟨t²⟩ are mean duration and mean square duration, respectively. The expected standard deviation of the coefficient of correlation is 0.03, assuming 1000 gamma-distributed samples.
To analyze ‘burstiness,’ we adapted a statistical measure used in neurophysiology (Compte et al., 2003). First, sequences of dominance periods were divided into all possible subsets of n successive periods and mean durations were computed for each subset. Second, heterogeneity was assessed by computing, for each subset size n, the coefficient of variation cV(n) over subset means and comparing it to the mean and variance of the corresponding coefficient of variation for randomly shuffled sequences of dominance periods. Specifically, a ‘burstiness index’ was defined for each subset size n as
\[
\mathrm{BI}(n) \;=\; \frac{c_V(n) \;-\; \langle c_V'(n) \rangle}{\sqrt{\langle c_V'(n)^{2} \rangle \;-\; \langle c_V'(n) \rangle^{2}}},
\]
where cV(n) is the coefficient of variation over subsets of size n and where ⟨cV′(n)⟩ and ⟨cV′(n)²⟩ are, respectively, the mean and mean square of the coefficients of variation from shuffled sequences.
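For illustration, a minimal Python sketch of this computation (the function name, the sliding-window implementation, and the number of shuffles are our own choices, not taken from the study's code):

```python
import numpy as np

def burstiness_index(durations, n, num_shuffles=1000, rng=None):
    """Burstiness index for subsets of n successive dominance periods.

    Compares the coefficient of variation (cV) of subset means in the
    observed sequence with the cV of subset means in shuffled sequences.
    """
    rng = np.random.default_rng(rng)
    durations = np.asarray(durations, dtype=float)

    def cv_of_subset_means(seq):
        # all possible subsets of n successive periods (sliding window of means)
        means = np.convolve(seq, np.ones(n) / n, mode='valid')
        return means.std() / means.mean()

    cv_obs = cv_of_subset_means(durations)
    cv_shuf = np.array([cv_of_subset_means(rng.permutation(durations))
                        for _ in range(num_shuffles)])
    # standardize the observed cV by mean and standard deviation of shuffled cVs
    return (cv_obs - cv_shuf.mean()) / cv_shuf.std()
```

Applied to one block of dominance durations, for example `burstiness_index(durs, n=5)`, values near zero indicate chance-level heterogeneity, whereas large positive values indicate bursty sequences.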
Model
The proposed mechanism for BR dynamics relies on discretely stochastic processes (‘birth-death’ or generalized Ehrenfest processes). Bistable variables transition between active and inactive states with time-varying Poisson rates for activation and inactivation. Two ‘evidence pools’ of such variables represent two kinds of visual evidence (e.g., for two visual orientations), whereas two ‘decision pools’ represent alternative perceptual hypotheses (e.g., two grating patterns) (see also Appendix 1—figure 1). Thus, the instantaneous dynamical state is represented by four active counts or, equivalently, by four active fractions.
The development of pool activity over time is described by a master equation for the probability of the number of active variables.
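For reference, the standard master equation for a pool of N such variables (a minimal sketch; the symbols ν₊ and ν₋ for the activation and inactivation rates are our own labels) reads
\[
\frac{d P(n,t)}{dt} \;=\; \nu_+ (N-n+1)\, P(n-1,t) \;+\; \nu_- (n+1)\, P(n+1,t) \;-\; \bigl[\, \nu_+ (N-n) + \nu_- n \,\bigr]\, P(n,t),
\]
where P(n, t) is the probability of observing n active variables at time t.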
For constant rates, the distribution is binomial at all times (Karlin and McGregor, 1965; van Kampen, 1981). The time development of the number of active units in a pool is an inhomogeneous Ehrenfest process: over a short time step, the count changes by the number of activations, minus the number of deactivations,
where each number is a discrete random variable drawn from a binomial distribution, with trial number given by the number of currently inactive (respectively, active) variables and success probability set by the corresponding transition rate and step duration.
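A minimal simulation sketch of this update in Python (the rate symbols nu_plus and nu_minus, the time step, and all numerical values are our own illustrative choices, valid only for time steps short enough that each variable flips at most once per step):

```python
import numpy as np

def step_pool(n, N, nu_plus, nu_minus, dt, rng):
    """One update of an inhomogeneous Ehrenfest pool.

    n        -- current number of active variables (0..N)
    nu_plus  -- activation rate per inactive variable
    nu_minus -- inactivation rate per active variable
    """
    # probability that a given variable makes a transition within dt
    p_act = 1.0 - np.exp(-nu_plus * dt)
    p_inact = 1.0 - np.exp(-nu_minus * dt)
    activations = rng.binomial(N - n, p_act)       # count of activations
    deactivations = rng.binomial(n, p_inact)       # count of deactivations
    return n + activations - deactivations

rng = np.random.default_rng(0)
n, N = 0, 25
trace = []
for _ in range(1000):
    n = step_pool(n, N, nu_plus=0.5, nu_minus=0.5, dt=0.01, rng=rng)
    trace.append(n)
```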
All variables of a pool have identical transition rates, which depend exponentially on the ‘potential difference’ between states, with an input-dependent component and a baseline component. Baseline rates and baseline potential components are specific to evidence and decision pools. The input-dependent components of the effective potentials are modulated linearly by synaptic couplings.
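A minimal sketch of a rate and coupling scheme consistent with this description (the symbols ν, θ, u, w, and x are our own labels, not the original notation):
\[
\nu_+ \;=\; \nu\, e^{+(u + \theta)}, \qquad
\nu_- \;=\; \nu\, e^{-(u + \theta)}, \qquad
u \;=\; \sum_j w_j\, x_j ,
\]
where ν is the pool's baseline rate, θ its baseline potential component, u the input-dependent potential component, and x_j the active fractions of connected pools (or the visual input), weighted by the synaptic couplings w_j.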
Visual inputs to the two evidence pools are determined by the respective image contrasts, transformed by a monotonically increasing, logarithmic function of image contrast with parameter γ.
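One simple logarithmic form consistent with this description (an illustrative assumption, not necessarily the fitted function) is
\[
f(c) \;=\; \ln\!\left(1 + \frac{c}{\gamma}\right),
\]
which rises steeply at low contrast and progressively more gently at high contrast, with γ setting the transition.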
Degrees of freedom
The proposed mechanism has 11 independent parameters – 6 synaptic couplings, 2 baseline rates, 2 baseline potentials, 1 contrast nonlinearity – which were fitted to experimental observations. A 12th parameter – pool size – remained fixed.
| Symbol | Description | Value |
| --- | --- | --- |
| N | Pool size | 25 |
| 1/ve | Baseline rate, evidence | 1.95 ± 0.10 s |
| 1/vr | Baseline rate, decision | 0.018 ± 0.010 s |
|  | Baseline potential, evidence | −1.65 ± 0.24 |
|  | Baseline potential, decision | −4.94 ± 0.67 |
| wvis | Visual input coupling | 1.780 ± 0.092 |
| wexc | Feedforward excitation | 152.2 ± 3.7 |
| winh | Feedforward inhibition | 32.10 ± 2.3 |
| wcomp | Lateral competition | 33.4 ± 1.2 |
| wcoop | Lateral cooperation | 15.21 ± 0.59 |
| wsupp | Feedback suppression | 2.34 ± 0.14 |
| γ | Contrast nonlinearity | 0.071 ± 0.011 |
Fitting procedure
The experimental dataset consisted of two 5 × 5 arrays for the mean and the coefficient of variation, plus two scalar values for skewness and correlation coefficient. The two scalar values corresponded to the (rounded) average values observed over the 5 × 5 combinations of image contrast. In other words, the fitting procedure prescribed contrast dependencies for the first two distribution moments, but not for correlation coefficients.
The fit error was computed as a weighted sum of relative errors
with weighting emphasizing distribution moments.
Approximately 400 minimization runs were performed, starting from random initial configurations of model parameters. For the optimal parameter set, the resulting fit error for the mean observer dataset was approximately 13%. More specifically, the fit errors for mean dominance, coefficient of variation, relative skewness, and the two correlation coefficients were 9.8, 7.9, 8.7, 70, and 46%, respectively. Here, fit errors for relative skewness and correlation coefficients were computed for the isocontrast conditions, where experimental observations were least noisy.
To confirm that the resulting fit was indeed optimal and could not be further improved, we studied the behavior of the fit error in the vicinity of the optimal parameter set. For each parameter, 30 values were picked in the direct vicinity of its optimal value (Appendix 1—figure 9). The resulting scatter plot of parameter values and fit errors was approximated by a quadratic function, which provided 95% confidence intervals for each parameter. For all but one parameter, the estimated quadratic function was convex and the corresponding coefficient of the Hessian matrix of the fit error was positive. Additionally, the estimated extremum of each parabola was close to the corresponding optimal parameter value, confirming that the parameter set was indeed optimal (Appendix 1—figure 9).
To minimize fit error, we repeated a stochastic gradient descent from randomly chosen initial parameters. Interestingly, the ensemble of suboptimal solutions found by this procedure populated a low-dimensional manifold of the parameter space, in which three principal components accounted for 95% of the positional variance. Thus, models that reproduce the experimental observations effectively vary along only 3–4 degrees of freedom. We surmise that this is due, on the one hand, to the severe constraints imposed by our model architecture (e.g., discrete elements, exponential input dependence of transition rates) and, on the other hand, to the requirement that the dynamical operating regime behave as a relaxation oscillator.
In support of this interpretation, we note that our 5 × 5 experimental measurements of mean and coefficient of variation were accurately described by ‘quadric surfaces’ with six coefficients each. Together with the two further scalar measurements of skewness and correlation coefficient, our experimental observations accordingly exhibited approximately 14 effective degrees of freedom. This number was sufficient to constrain the 3–4 dimensional manifold of parameters where the model operated as a relaxation oscillator with a particular dynamics, specifically, a slow-fast dynamics associated, respectively, with the accumulation and reversal phases of BR.
Alternative model
As an alternative model (Laing and Chow, 2002), a combination of competition, adaptation, and image-contrast-dependent noise was fitted to reproduce four 5 × 5 arrays: mean, coefficient of variation, skewness, and correlation coefficient. Fit error was computed as the average of relative errors over these observables.
For purposes of comparison, a weighted fit error was computed as well.
The model comprised four state variables (two population activities and two adaptation variables) and independent colored noise, with a nonlinear activation function and a white-noise drive of the colored noise. Additionally, both input and noise amplitude were assumed to depend nonlinearly on image contrast. This coupling between input and noise amplitude served to stabilize the shape of dominance distributions over different image contrasts (‘scaling property’).
Parameters for competition (β = 10), activity time constant, noise time constant, and activation function were fixed. Parameters for adaptation strength, adaptation time constant, contrast dependence of input, and contrast dependence of noise amplitude were explored over a range of values.
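For illustration only, a minimal Python sketch of this class of competition–adaptation–noise models: the functional forms, parameter names, and numerical values below are generic assumptions in the spirit of Laing and Chow, 2002, chosen so that the sketch alternates, and are not the fitted parameters reported here.

```python
import numpy as np

def simulate_rivalry(T=300.0, dt=1e-3, I=0.5, beta=1.0, phi=0.7,
                     tau=0.01, tau_a=2.0, tau_n=0.1, sigma=0.05, seed=0):
    """Two competing populations with adaptation and colored (OU) noise.

    r[i]   -- activity of population i (each inhibits the other)
    a[i]   -- adaptation variable of population i
    eta[i] -- Ornstein-Uhlenbeck noise input to population i
    Returns an array indicating which population dominates at each step.
    """
    rng = np.random.default_rng(seed)
    steps = int(T / dt)
    F = lambda x: 1.0 / (1.0 + np.exp(-(x - 0.2) / 0.1))  # sigmoidal activation (assumed)
    r = np.array([0.9, 0.1]); a = np.zeros(2); eta = np.zeros(2)
    dominant = np.empty(steps, dtype=int)
    for t in range(steps):
        drive = I - beta * r[::-1] - phi * a + eta   # input, cross-inhibition, adaptation, noise
        r += dt / tau * (-r + F(drive))
        a += dt / tau_a * (-a + r)
        eta += -dt / tau_n * eta + sigma * np.sqrt(2.0 * dt / tau_n) * rng.standard_normal(2)
        dominant[t] = int(r[1] > r[0])
    return dominant

# dominance durations (in seconds) extracted from the simulated sequence
dom = simulate_rivalry()
switch_steps = np.flatnonzero(np.diff(dom))
durations = np.diff(switch_steps) * 1e-3
```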
The best-fit parameter values were determined with a genetic algorithm. The fit errors for mean dominance, coefficient of variation, skewness, and correlation coefficient were, respectively, 11.3, 8.3, 20, and 55%. The fit error for the other correlation coefficient was 180% (because the model predicted negative values). The combined average for mean dominance, coefficient of variation, and skewness was 13.2%. The fit error obtained with weighting was 16.4%.
For Figure 6d, the alternative model was fitted only to observations at equal image contrast: mean dominance, coefficient of variation, skewness, and correlation coefficient. The combined average fit error for mean dominance, coefficient of variation, and skewness was 11.2%. The combined average for all four observables was 22%.
Spiking network simulation
To illustrate a possible neural realization of ‘local attractors,’ we simulated a competitive network with eight identical assemblies of excitatory and inhibitory neurons, which collectively expresses a spontaneous and metastable dynamics (Mattia et al., 2013). One assembly (denoted ‘foreground’) comprised 150 excitatory leaky-integrate-and-fire neurons, which were weakly coupled to the 1050 excitatory neurons of the other assemblies (denoted ‘background’), as well as to 300 inhibitory neurons. Note that background assemblies are not strictly necessary and are included only for the sake of verisimilitude. The connection probability between any two neurons, the excitatory synaptic efficacies within and between assemblies, the inhibitory synaptic efficacy, and the efficacy of excitatory synapses onto inhibitory neurons were fixed, and ‘foreground,’ ‘background,’ and inhibitory neurons each received independent Poisson spike trains. Other settings were as in Mattia et al., 2013. As a result of these settings, ‘foreground’ activity transitioned spontaneously between a low-rate ‘off’ state and a high-rate ‘on’ state.
Appendix 1
Model schematics
Metastable attractor dynamics
We postulate assemblies or clusters of neurons with recurrent random connectivity as operative units of sensory representations. In our model, such assemblies are reduced to binary variables with Poisson transitions. Our key assumption is that the rates of activation and inactivation events are modulated exponentially by synaptic input (Equation 1).
Here, we show that these assumptions are a plausible reduction of recurrently connected assemblies of spiking neurons.
Following earlier work, we simulated a competitive network with eight identical assemblies of excitatory and inhibitory neurons (Appendix 1—figure 2a), configured to collectively express a metastable activity dynamics (Mattia et al., 2013). Here, we are interested particularly in the activity dynamics of one excitatory assembly (dubbed ‘foreground’), which expresses two quasi-stable ‘attractor’ states: an ‘on’ state with high activity and an ‘off’ state with low activity. In the context of the metastable network, the ‘foreground’ assembly is bistable in that it transitions spontaneously between ‘on’ and ‘off’ states. Such state transitions are noise-driven escape events from an energy well and therefore occur with Poisson-like rates for activation and inactivation. Figure 1b and Appendix 1—figure 2b illustrate this energy landscape for the ‘diffusion limit’ of very large assemblies, in which the quasi-stable activity levels of ‘on’ and ‘off’ states are well separated. For small assemblies with fewer neurons, the difference between ‘on’ and ‘off’ states is less pronounced.
To establish the dependence of transition rates on external input to the ‘foreground’ assembly, we stepped the external input rate between two values selected from a range and monitored the resulting spiking activity of individual neurons, as well as the activity of the entire population (Appendix 1—figure 2c, upper and middle panels). Comparing population activity to a suitable threshold, we identified ‘on’ and ‘off’ states of the ‘foreground’ assembly (Appendix 1—figure 2c, lower panel), as well as the probability of ‘on’ or ‘off’ states at different points in time following a step of the external input (Appendix 1—figure 2d). From the hazard rate (temporal derivative of probability), we then estimated the rates of state transitions shown in Appendix 1—figure 2d. Transition rates vary approximately anti-symmetrically and exponentially with external input (Appendix 1—figure 2e, red and blue lines). This Arrhenius–Van’t-Hoff-like dependence of escape rates is a consequence of the approximately linear dependence of activation energy on external input. Such escape kinetics is typical for attractor systems and motivates Equation 1.
Quality of representation
Accumulation of information
A birth-death process – a pool of bistable variables whose transition rates combine a baseline rate with an input-dependent coupling – accumulates and retains information about its input, performing as a ‘leaky integrator’ with a characteristic time scale (Braun and Mattia, 2010). Specifically, the value of the input may be inferred from the fractional pool activity at a given time, if coupling and baseline rate are known. The inverse variance of the maximum likelihood estimate is given by the Fisher information.
Its value grows with time and approaches an upper bound at long times. For small inputs, the Fisher information increases monotonically with elapsed time. Surprisingly, the upper bound depends linearly on pool size, but quadratically on coupling. Thus, stronger coupling substantially improves the accuracy with which the input is encoded.
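A minimal reconstruction of the asymptotic value, assuming transition rates of the exponential form sketched above, ν± = ν e^{±(ws+θ)} (our notation, with s the input, w the coupling, and N the pool size): the steady-state active fraction is the logistic function x∞ = 1/(1 + e^{−2(ws+θ)}), its count is binomially distributed, and therefore
\[
J_\infty(s) \;=\; \frac{N \left( \dfrac{d x_\infty}{d s} \right)^{2}}{x_\infty \,(1 - x_\infty)} \;=\; 4\, N\, w^{2}\, x_\infty \,(1 - x_\infty),
\]
which is indeed linear in pool size N and quadratic in coupling w, as stated above.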
The rate at which Fisher information is accumulated by a pool is set by the baseline transition rate. An initially inactive pool, with n0 = 0, accumulates Fisher information at an initial rate proportional to the baseline rate. Thus, any desired rate of gaining Fisher information may be obtained by choosing an appropriate baseline rate. However, unavoidably, after an input has ceased (and been replaced by another), information about it is lost at the same rate.
Integration of noisy samples
Birth-death processes can also encode noisy sensory inputs, capturing much of the information provided. When an initially inactive pool receives a noisy input over time, stochastic pool activity gradually accumulates information about the mean value µ of that input. For normally distributed input samples, the Fisher information accumulated by pool activity about µ may be compared to the Fisher information provided by the samples themselves. Comparatively small pools with strong coupling readily capture 90% of the information provided (Appendix 1—figure 5a).
Moreover, pools readily permit information from multiple independent inputs to be combined over space and/or time. For example, the combined activity of four pools, which receive four independent samples concurrently, captures approximately 80% of the information provided, and a single pool receiving four samples in succession still retains approximately 60% of the information provided (Appendix 1—figure 5b,c). In the latter case, retention is compromised by the ‘leaky’ nature of stochastic integration. Whether signals are integrated over space or time, the retained fraction of information is highest for inputs of moderate and larger variance (Appendix 1—figure 5b,c). This is because inputs with smaller variance are degraded more severely by the internal noise of a birth-death process (i.e., stochastic activations and inactivations).
Suitability for inference
Summation of heterogeneous neural responses can be equivalent to Bayesian integration of sensory information (Beck et al., 2008; Pouget et al., 2013). In general, this is the case when response variability is ‘Poisson-like’ and response tuning differs only multiplicatively (Ma et al., 2006; Ma et al., 2008). We now show that bistable stochastic variables with heterogeneous transition rates satisfy these conditions, as long as synaptic coupling is uniform.
Assuming initially inactive variables, incremental responses after a short interval are binomially distributed about a mean which is, approximately, the product of a factor reflecting (possibly heterogeneous) response tuning and a common response function that depends only on synaptic coupling. The Fisher information of individual responses about the stimulus then takes a simple form
as long as expected activation is small. The Fisher information of summed responses is
and equals the combined Fisher information of individual responses. Accordingly, the summation of bistable activities with heterogeneous transition rates optimally integrates information, provided expected activations remain small (≪ 1) and synaptic coupling is uniform.
Categorical choice
The ‘biased competition’ circuit proposed here expresses a categorical decision by raising one decision activity towards unity (and lowering the other towards zero) or vice versa. Here, we describe its stochastic steady-state response to constant visual inputs and arbitrary initial conditions of evidence and decision activities (Appendix 1—figure 3). Note that, for purposes of this analysis, evidence activity was not subject to feedback suppression.
The choice is random when the input is ambiguous, but quickly becomes deterministic with growing input bias. Importantly, the choice is determined consistently by visual input for all initial conditions. The 75% performance level is reached at a characteristic value of the input bias.
Mutual inhibition controls the width of the ambiguous region around equal inputs, and self-excitation ensures a categorical decision even for small inputs. The balance between feedforward excitation and inhibition eliminates decision failures for all but the largest inputs and reduces the degree to which sensitivity to differential input varies with total input.
For particularly high values of input, no categorical decision is reached and both decision activities grow above 0.5. In the full model, such inconclusive outcomes are eliminated by feedback suppression.
Deterministic dynamics
In the deterministic limit of large pools, fractional pool activity equals its expectation, and the relaxation dynamics of Equation 2 reduces to an exponential relaxation with a characteristic time and an asymptotic value that are both set by the potential difference. The input dependencies of characteristic time and asymptotic value follow from Equation 1.
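A minimal reconstruction of these relations, assuming the exponential rate form sketched above with total potential difference Δu (our notation): the active fraction x relaxes as
\[
\frac{dx}{dt} \;=\; \nu_+ (1 - x) \;-\; \nu_- x \;=\; \frac{x_\infty - x}{\tau},
\qquad
\tau \;=\; \frac{1}{\nu_+ + \nu_-} \;=\; \frac{1}{2\nu \cosh \Delta u},
\qquad
x_\infty \;=\; \frac{\nu_+}{\nu_+ + \nu_-} \;=\; \frac{1}{1 + e^{-2\Delta u}} .
\]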
Evidence pools
The relaxation dynamics of evidence pools is given by Equation 2 and Equation 3′. As shown in the next section, reversals occur when the evidence difference reaches a reversal threshold. For example, a dominance period of one kind of evidence begins when its activity exceeds that of the other kind by the reversal threshold and ends when the concurrent habituation of the dominant evidence and recovery of the suppressed evidence have inverted the situation (Appendix 1—figure 4). Once the deterministic limit has settled into a limit cycle, all dominance periods start from, and end at, the same evidence levels.
If one evidence pool has just become dominant over the other, the state-dependent potential differences determine the deterministic development of both evidence activities, together with their asymptotic values and characteristic times.
The starting points of the development (dashed lines in Appendix 1—figure 4a,b) depend mostly on total input and only little on input difference. Accordingly, for a given level of total input, the situation is governed by the distance between asymptotic evidence levels and by the characteristic times of the two relaxations.
The dependence on input bias of effective potential, characteristic time, and asymptotic value is illustrated in Appendix 1—figure 4c–e. The potential range of relaxation is bounded by the reversal levels, which can be obtained numerically.
Dominance durations depend more sensitively on the slower of the two concurrent processes, as it sets the pace of the combined development. The initial rates of the two opponent relaxations immediately after a reversal provide a convenient proxy for their relative pace. As shown in Appendix 1—figure 4f, when stronger-input evidence dominates, recovery of weaker-input evidence (red up arrow on blue background) is slower than habituation of stronger-input evidence (blue down arrow on blue background). Conversely, when weaker-input evidence dominates, recovery of stronger-input evidence (blue up arrow on red background) is slower than habituation of weaker-input evidence (red down arrow on red background). In short, dominance durations always depend more sensitively on the recovery of the currently non-dominant evidence than on the habituation of the currently dominant evidence.
If the two evidence populations have equal and opposite potential differences, then they also have equal and opposite activation and inactivation rates (Equation 1) and identical characteristic times for the recovery of one and the habituation of the other. In this special case, the two processes may be combined, and the evidence difference relaxes exponentially towards its asymptotic value. Starting from the previous reversal level, we consider the first-passage-time of the evidence difference through the opposite reversal threshold. If a crossing is certain (i.e., when the asymptotic difference exceeds the reversal threshold), the first-passage-time is
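A minimal reconstruction under the assumptions above (our notation, with the evidence difference relaxing from the previous reversal level −Δthr toward its asymptote Δ∞ with time constant τ, and crossing the opposite threshold +Δthr):
\[
T \;=\; \tau \,\ln\!\left( \frac{\Delta_\infty + \Delta_{\mathrm{thr}}}{\Delta_\infty - \Delta_{\mathrm{thr}}} \right),
\]
which diverges as Δ∞ approaches Δthr, in line with the statement below that dominance durations become infinite when the asymptotic separation falls below the reversal threshold.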
A similar hyperbolic dependence obtains in all other cases as well. When the distance between asymptotic levels falls below the reversal threshold, dominance durations become infinite and reversals cease.
The hyperbolic dependence of dominance durations, illustrated in Appendix 1—figure 4d, has an interesting implication. Consider the point of equidominance, at which both dominance durations are equal and of moderate duration. Increasing the difference between image contrasts (raising one and lowering the other) lengthens the dominance periods of one percept and shortens those of the other. Due to the hyperbolic dependence, the longer dominance periods lengthen more than the shorter dominance periods shorten, consistent with the contemporary formulation of Levelt's proposition III (Brascamp et al., 2015).
Decision pools
We wish to analyze steady-state conditions for the decision pools, as illustrated in Appendix 1—figure 4a,b, starting from Equation 4.
Under certain conditions – in particular, for sufficient self-coupling – the steady-state equations admit more than one solution: a low-activity fixed point and a high-activity fixed point. Importantly, the low-activity fixed point can be destabilized when evidence activities change.
Consider a non-dominant decision pool and its dominant rival pool, and the corresponding steady-state condition for the non-dominant pool.
For certain values of evidence activity, the low-activity fixed point becomes unstable, causing a sudden upward activation of the non-dominant pool and, eventually, a perceptual reversal. We can thus define a reversal threshold in terms of the value of the evidence bias that ensures the disappearance of the low-activity fixed point. We find that this threshold value decreases linearly with average evidence activity, so that higher evidence activity necessarily entails lower thresholds (dashed red line in Figure 5c).
Potential landscape
In Figure 5b, we illustrate the steady-state condition in terms of an effective potential landscape. The functional form of this landscape was obtained by integrating the ‘restoring force’ over activity.
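As a sketch of this construction (our notation, with y the fractional activity of a decision pool and ν±(y) its state-dependent transition rates), the ‘restoring force’ is the deterministic drift of y and the effective potential is its negative integral,
\[
U(y) \;=\; -\int^{y} \Bigl[\, \nu_+(y')\,(1 - y') \;-\; \nu_-(y')\, y' \,\Bigr]\, dy' ,
\]
so that stable fixed points of the drift appear as minima of U.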
Stochastic dynamics
Poisson-like variability
The discretely stochastic process has a continuously stochastic ‘diffusion limit’ for large pool size, with identical mean and variance. This diffusion limit is a Cox–Ingersoll–Ross-like process, and its dynamical equation, driven by white noise, reveals that its increments (and thus also the increments of the original discrete process) exhibit Poisson-like variability. Specifically, in the low-activity regime, both the mean and the variance of increments approximate the activation rate.
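A minimal sketch of this diffusion limit (our notation, for the count n of active variables in a pool of size N, with dW an increment of white noise):
\[
dn \;=\; \bigl[\, \nu_+ (N - n) \;-\; \nu_- n \,\bigr]\, dt \;+\; \sqrt{\nu_+ (N-n) \;+\; \nu_- n}\;\, dW ,
\]
so that in the low-activity regime (n ≪ N, with the inactivation term negligible) both the mean and the variance of an increment approximate the total activation rate, E[dn] ≈ Var[dn] ≈ ν₊ N dt.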
Gamma-distributed first-passage times
When the input to a pool of bistable variables undergoes a step change, the active fraction transitions stochastically from the old steady state toward the new steady state (set by the old and new input values, respectively). The time that elapses until fractional activity crosses an intermediate ‘threshold’ level is termed a ‘first-passage-time.’ In a low-threshold regime, birth-death processes exhibit a particular and highly unusual distribution of first-passage times.
Specifically, the distribution of first-passage-times assumes a characteristic, gamma-like shape for a wide range of combinations of starting level, threshold level, and asymptotic level (Cao et al., 2014): skewness takes a stereotypical value (approximately twice the coefficient of variation, as for a gamma distribution), the coefficient of variation remains constant (as long as the distance between starting and threshold levels remains the same), whereas the distribution mean may assume widely different values. This gamma-like distribution shape is maintained even when shared input changes during the transition (e.g., when bistable variables are coupled to each other) (Cao et al., 2014).
Importantly, only a birth-death process (e.g., a pool of bistable variables) guarantees a gamma-like distribution of first-passage-times under different input conditions [25]. Many other discretely stochastic processes (e.g., Poisson process) and continuously stochastic processes (e.g., Wiener, Ornstein–Uhlenbeck, Cox–Ingersoll–Ross) produce inverse Gaussian distributions, for which skewness is three times the coefficient of variation. Models combining competition, adaptation, and noise can produce gamma-like distributions, but require different parameter values for every input condition (see Materials and methods: Alternative model).
Scaling property
In the present model, first-passage-times reflect the concurrent development of two opponent birth-death processes (pools of binary variables). Dominance periods begin with the newly non-dominant evidence well below the newly dominant evidence and end with the former well above the latter (Appendix 1—figure 6a). The combination of two small opponent pools approximates a single larger pool. When image contrast changes, distribution shape remains nearly the same, with a nearly constant coefficient of variation and a gamma-like skewness, even though the mean of first-passage-times changes substantially (Appendix 1—figure 6b).
This ‘scaling property’ (preservation of distribution shape) is owed to the Poisson-like variability of birth-death processes (see above, Appendix 1—figure 6c). Poisson-like variability implies that accumulation rate and dispersion rate are proportional. This proportionality ensures that activity at threshold disperses equally widely for different accumulation rates (i.e., for different input strengths), preserving the shape of first-passage-time distributions (Cao et al., 2016).
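A minimal Monte Carlo sketch (in Python, with assumed pool size, rates, and threshold that are illustrative only) of how first-passage-time statistics of such an accumulation can be estimated; the resulting coefficient of variation and skewness can be compared with the gamma-like values described above:

```python
import numpy as np
from scipy import stats

def first_passage_time(N=50, nu_plus=2.0, nu_minus=0.5,
                       threshold=0.3, dt=1e-3, rng=None):
    """Time until the active fraction of a birth-death pool first exceeds threshold."""
    rng = np.random.default_rng(rng)
    n, t = 0, 0.0
    while n < threshold * N:
        n += rng.binomial(N - n, 1.0 - np.exp(-nu_plus * dt))   # activations
        n -= rng.binomial(n, 1.0 - np.exp(-nu_minus * dt))      # deactivations
        t += dt
    return t

rng = np.random.default_rng(1)
fpt = np.array([first_passage_time(rng=rng) for _ in range(2000)])
cv = fpt.std() / fpt.mean()
skew = stats.skew(fpt)
print(f"mean={fpt.mean():.3f} s  cv={cv:.2f}  skewness/cv={skew / cv:.2f}")
```

Repeating this estimate for different activation rates (i.e., different input strengths) illustrates the scaling property: the mean shifts substantially while the distribution shape changes comparatively little.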
Characteristic times
As mentioned previously, the characteristic times of pools of bistable variables are not fixed but vary with input (Equation 2). In our model, the characteristic times of evidence activities lengthen with increasing input contrast and shorten with feedback suppression (Appendix 1—figure 7a). Characteristic times are reflected also in the temporal autocorrelation, which averages over periods of dominance and non-dominance alike. Autocorrelation times lengthen with increasing input contrast, both in absolute terms and relative to the average dominance duration (Appendix 1—figure 7b).
Importantly, the autocorrelation time of the mean evidence activity is even longer, particularly for high input contrast (Appendix 1—figure 7c). The reason is that spontaneous fluctuations of the mean evidence activity are constrained not only by birth-death dynamics, but additionally by the reversal dynamics that keeps the two evidence activities close together (i.e., within the reversal threshold). As a result, the characteristic timescale of spontaneous fluctuations of the mean evidence activity lengthens with input contrast. The amplitude of such fluctuations also grows with contrast (not shown).
The slow fluctuations of the mean evidence activity induce mirror-image fluctuations of the reversal threshold and are thus responsible for the serial dependency of reversal sequences (see Deterministic dynamics: Decision pools).
Burstiness
The proposed mechanism predicts that reversal sequences include episodes with several successive short (or long) dominance periods. It further predicts that this inhomogeneity increases with image contrast. Such an inhomogeneity may be quantified in terms of a ‘burstiness index’ (BI), which compares the variability of the mean for sets of successive periods to the expected variability for randomly shuffled reversal sequences. In both model and experimental observations, this index rises far above chance (over a broad range of subset sizes) for high image contrast (Appendix 1—figure 8). The degree of inhomogeneity expressed by the model at high image contrast is comparable to that observed experimentally, even though the model was neither designed nor fitted to reproduce non-stationary aspects of reversal dynamics. This correspondence between model and experimental observation compellingly corroborates the proposed mechanism.
Robustness of fit
The parameter values associated with the global minimum of the fit error define the model used throughout the article. As described in Materials and methods, we explored the vicinity of this parameter set by individually varying each parameter within a certain neighborhood. This allowed us to estimate 95% confidence intervals for each parameter value. The results are illustrated in Appendix 1—figure 9.
Note that optimal parameter values (red crosses) are consistently near extrema of the parabolic fits (green circles), indicating the robustness of the fit. Note further that, instead of one of the original parameter pairs, we show a related parameter pair defined through simple algebraic relations with the original one.
The code used to analyze optimization statistics is available in the folder ‘analyzeOptimizationStatistics’ of the Github repository provided with this article (https://github.com/mauriziomattia/2021.BistablePerceptionModel; a copy is archived at the Software Heritage address given under Data availability).
Data availability
Source data is provided for Figures 2 and 3. Source code for the binocular rivalry model is provided in a Github repository (https://github.com/mauriziomattia/2021.BistablePerceptionModel; copy archived at https://archive.softwareheritage.org/swh:1:rev:f70e9e45ddb64cef7fc9a3ea57f0b7a04dfc6729).
References
-
A hierarchical stochastic model for bistable perceptionPLOS Computational Biology 13:e1005856.https://doi.org/10.1371/journal.pcbi.1005856
-
A low-dimensional model of binocular rivalry using winnerless competitionPhysica D. Nonlinear Phenomena 239:529–538.https://doi.org/10.1016/j.physd.2009.06.018
-
Multistability in perceptionScientific American 225:63–71.https://doi.org/10.1038/scientificamerican1271-62
-
The proactive brain: memory for predictionsPhil. Trans. R. Soc. Lond. B 364:1235–1243.https://doi.org/10.1098/rstb.2008.0310
-
Stochastic properties of stabilized-image binocular rivalry alternationsJournal of Experimental Psychology 88:327–332.https://doi.org/10.1037/h0030877
-
Spatial zones of binocular rivalry in central and peripheral visionVisual Neuroscience 8:469–478.https://doi.org/10.1017/s0952523800004971
-
The time course of binocular rivalry reveals a fundamental role of noiseJournal of Vision 6:1244–1256.https://doi.org/10.1167/6.11.8
-
Neural field model of binocular rivalry wavesJournal of Computational Neuroscience 32:233–252.https://doi.org/10.1007/s10827-011-0351-y
-
Collective activity of many bistable assemblies reproduces characteristic dynamics of multistable perceptionThe Journal of Neuroscience 36:6957–6972.https://doi.org/10.1523/JNEUROSCI.4626-15.2016
-
Normalization as a canonical neural computationNature Reviews. Neuroscience 13:51–62.https://doi.org/10.1038/nrn3136
-
Stimulus onset quenches neural variability: a widespread cortical phenomenonNature Neuroscience 13:369–378.https://doi.org/10.1038/nn.2501
-
Neural mechanisms for interacting with a world full of action choicesAnnual Review of Neuroscience 33:269–298.https://doi.org/10.1146/annurev.neuro.051508.135409
-
Multi-scale variability in neuronal competitionCommun. Biol 2:319–330.https://doi.org/10.1038/s42003-019-0555-7
-
Temporally irregular mnemonic persistent activity in prefrontal neurons of monkeys during a delayed response taskJournal of Neurophysiology 90:3441–3454.https://doi.org/10.1152/jn.00949.2002
-
Inferring cortical variability from local field potentialsThe Journal of Neuroscience 36:4121–4135.https://doi.org/10.1523/JNEUROSCI.2502-15.2016
-
Perceptual rivalry with vibrotactile stimuliAtten. Percept. & Psychophys 22:2278.https://doi.org/10.3758/s13414-021-02278-1
-
A hierarchical model of binocular rivalryNeural Computation 10:1119–1135.https://doi.org/10.1162/089976698300017377
-
Attention, short-term memory, and action selection: a unifying theoryProgress in Neurobiology 76:236–256.https://doi.org/10.1016/j.pneurobio.2005.08.004
-
Weber’s law in decision making: integrating behavioral data in humans with a neurophysiological modelThe Journal of Neuroscience 27:11192–11200.https://doi.org/10.1523/JNEUROSCI.1072-07.2007
-
Neural network mechanisms underlying stimulus driven variability reductionPLOS Computational Biology 8:e1002395.https://doi.org/10.1371/journal.pcbi.1002395
-
Microstimulation of visual cortex affects the speed of perceptual decisionsNature Neuroscience 6:891–898.https://doi.org/10.1038/nn1094
-
Perspective: Maximum caliber is a general variational principle for dynamical systemsThe Journal of Chemical Physics 148:010901.https://doi.org/10.1063/1.5012990
-
Balanced neural architecture and the idling brainFrontiers in Computational Neuroscience 8:56.https://doi.org/10.3389/fncom.2014.00056
-
Stochastic properties of binocular rivalry alternationsPerception & Psychophysics 2:432–436.https://doi.org/10.3758/BF03208783
-
The free-energy principle: a unified brain theory?Nature Reviews. Neuroscience 11:127–138.https://doi.org/10.1038/nrn2787
-
Neural substrate of dynamic Bayesian inference in the cerebral cortexNature Neuroscience 19:1682–1689.https://doi.org/10.1038/nn.4390
-
Bistable perception modeled as competing stochastic integrations at two levelsPLOS Computational Biology 5:e1000430.https://doi.org/10.1371/journal.pcbi.1000430
-
The neural basis of decision makingAnnual Review of Neuroscience 30:535–574.https://doi.org/10.1146/annurev.neuro.29.051605.113038
-
Visual decision-making in an uncertain and dynamic worldAnnual Review of Vision Science 3:227–250.https://doi.org/10.1146/annurev-vision-111815-114511
-
Perceptions as hypothesesPhilosophical Transactions of the Royal Society of London. Series B, Biological Sciences 290:181–197.https://doi.org/10.1098/rstb.1980.0090
-
Reaction-rate theory: fifty years after kramersReviews of Modern Physics 62:251–341.https://doi.org/10.1103/RevModPhys.62.251
-
Once upon a (slow) time in the land of recurrent neuronal networks…Current Opinion in Neurobiology 46:31–38.https://doi.org/10.1016/j.conb.2017.07.003
-
Size matters: A study of binocular rivalry dynamicsJournal of Vision 9:17.https://doi.org/10.1167/9.1.17
-
Ehrenfest Urn ModelsJournal of Applied Probability 2:352–376.https://doi.org/10.1017/S0021900200108708
-
The role of the primary visual cortex in perceptual suppression of salient visual stimuliThe Journal of Neuroscience 30:12353–12365.https://doi.org/10.1523/JNEUROSCI.0677-10.2010
-
Stochastic resonance in binocular rivalryVision Research 46:392–406.https://doi.org/10.1016/j.visres.2005.08.009
-
Shifts in selective visual attention: Towards the underlying neural circuitryHuman Neurobiology 4:219–227.
-
Cognitive computational neuroscienceNature Neuroscience 21:1148–1160.https://doi.org/10.1038/s41593-018-0210-5
-
Coding perceptual decisions: From single units to emergent signaling properties in cortical circuitsAnnual Review of Vision Science 6:387–409.https://doi.org/10.1146/annurev-vision-030320-041223
-
Cortical computations via metastable activityCurrent Opinion in Neurobiology 58:37–45.https://doi.org/10.1016/j.conb.2019.06.007
-
A spiking neuron model for binocular rivalryJournal of Computational Neuroscience 12:39–53.https://doi.org/10.1023/a:1014942129705
-
Hierarchical Bayesian inference in the visual cortexJournal of the Optical Society of America. A, Optics and Image Science 20:1434–1448.https://doi.org/10.1364/JOSAA.20.001434
-
An astable multivibrator model of binocular rivalryPerception 17:215–228.https://doi.org/10.1068/p170215
-
Ph.D. Thesis: Brain Mechanisms of Visual Awareness: Using Perceptual Ambiguity to Investigate the Neural Basis of Image Segmentation and Grouping. Houston, Texas: Baylor College of Medicine.
-
Multistable phenomena: changing views in perceptionTrends Cogn. Sci 3:254–264.https://doi.org/10.1016/s1364-6613(99)01332-7
-
Stable perception of visually ambiguous patternsNature Neuroscience 5:605–609.https://doi.org/10.1038/nn0602-851
-
A functional theory of bistable perception based on dynamical circular inferencePLOS Computational Biology 16:e1008480.https://doi.org/10.1371/journal.pcbi.1008480
-
Slow dynamics and high variability in balanced cortical networks with clustered connectionsNature Neuroscience 15:1498–1505.https://doi.org/10.1038/nn.3220
-
Single units and conscious visionPhilosophical Transactions of the Royal Society of London. Series B, Biological Sciences 353:1801–1818.https://doi.org/10.1098/rstb.1998.0333
-
Book: Response Times: Their Role in Inferring Elementary Mental Organization. New York: Oxford University Press.
-
Bayesian inference with probabilistic population codesNature Neuroscience 9:1432–1438.https://doi.org/10.1038/nn1790
-
Spiking networks for Bayesian inference and choiceCurrent Opinion in Neurobiology 18:217–222.https://doi.org/10.1016/j.conb.2008.07.004
-
Perception of temporally interleaved ambiguous patternsCurrent Biology 13:1076–1085.https://doi.org/10.1016/s0960-9822(03)00414-7
-
Heterogeneous attractor cell assemblies for motor planning in premotor cortexThe Journal of Neuroscience 33:11155–11168.https://doi.org/10.1523/JNEUROSCI.4664-12.2013
-
Dynamics of multistable states during ongoing and evoked cortical activityThe Journal of Neuroscience 35:8214–8231.https://doi.org/10.1523/JNEUROSCI.4819-14.2015
-
Modelling the emergence and dynamics of perceptual organisation in auditory streamingPLOS Computational Biology 9:e1002925.https://doi.org/10.1371/journal.pcbi.1002925
-
Itinerancy between attractor states in neural systemsCurrent Opinion in Neurobiology 40:14–22.https://doi.org/10.1016/j.conb.2016.05.005
-
Scene construction, visual foraging, and active inferenceFrontiers in Computational Neuroscience 10:1–16.https://doi.org/10.3389/fncom.2016.00056
-
Noise-induced alternations in an attractor network model of perceptual bistabilityJournal of Neurophysiology 98:1125–1139.https://doi.org/10.1152/jn.00116.2007
-
LXI. Observations on some remarkable optical phænomena seen in Switzerland; and on an optical phænomenon which occurs on viewing a figure of a crystal or geometrical solidThe London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 1:329–337.https://doi.org/10.1080/14786443208647909
-
Buildup and bistability in auditory streaming as an evidence accumulation process with saturationPLOS Computational Biology 16:e1008152.https://doi.org/10.1371/journal.pcbi.1008152
-
The neocortex. An overview of its evolutionary development, structural organization and synaptologyAnatomy and Embryology 190:307–337.https://doi.org/10.1007/BF00187291
-
The active construction of the visual worldNeuropsychologia 104:92–101.https://doi.org/10.1016/j.neuropsychologia.2017.08.003
-
Uncertainty, epistemics and active inferenceJournal of the Royal Society, Interface 14:1–10.https://doi.org/10.1098/rsif.2017.0376
-
Dynamic Causal Modelling of Active VisionThe Journal of Neuroscience 39:6265–6275.https://doi.org/10.1523/JNEUROSCI.2459-18.2019
-
Perceptual reversals need no prompting by attentionJournal of Vision 7:5.https://doi.org/10.1167/7.10.5
-
A short-term memory of multi-stable perceptionJournal of Vision 8:7.https://doi.org/10.1167/8.13.7
-
Sensory memory of structure-from-motion is shape-specificAttention, Perception, & Psychophysics 75:1215–1229.https://doi.org/10.3758/s13414-013-0471-8
-
Multi-stable perception balances stability and sensitivityFrontiers in Computational Neuroscience 7:17.https://doi.org/10.3389/fncom.2013.00017
-
Sensory memory of illusory depth in structure-from-motionAtten. Percept. & Psychophys 76:123–132.https://doi.org/10.3758/s13414-013-0557-3
-
Perception and the strongest sensory memory trace of multi-stable displays both form shortly after stimulus onsetAtten. Percept. & Psychophys 78:674–684.https://doi.org/10.3758/s13414-015-1004-4
-
Book: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann.
-
Hierarchical active inference: a theory of motivated controlTrends Cogn. Sci 22:294–306.https://doi.org/10.1016/j.tics.2018.01.009
-
Dynamics of cortical neuronal ensembles transit from decision making to storage for later reportThe Journal of Neuroscience 32:11956–11969.https://doi.org/10.1523/JNEUROSCI.6176-11.2012
-
Probabilistic brains: knowns and unknownsNature Neuroscience 16:1170–1178.https://doi.org/10.1038/nn.3495
-
Principles of maximum entropy and maximum caliber in statistical physicsRev. Modern Phys 85:1115–1141.https://doi.org/10.1103/RevModPhys.85.1115
-
Perceptual organization in multistable apparent motionPerception 14:135–143.https://doi.org/10.1068/p140135
-
A comparison of sequential sampling models for two-choice reaction timePsychological Review 111:333–367.https://doi.org/10.1037/0033-295X.111.2.333
-
The Normalization Model of AttentionJournal of Cognitive Neuroscience 61:168–185.https://doi.org/10.1016/j.neuron.2009.01.002
-
Decoding subjective decisions from orbitofrontal cortexNature Neuroscience 19:973–980.https://doi.org/10.1038/nn.4320
-
The spatial structure of correlated neuronal variabilityNature Neuroscience 20:107–114.https://doi.org/10.1038/nn.4433
-
Book: Figure and ground. In: Beardslee DC, Wertheimer M, editors. Readings in Perception. Van Nostrand. pp. 194–203.
-
Mechanisms of visual attention in the human cortexAnnual Review of Neuroscience 23:315–341.https://doi.org/10.1146/annurev.neuro.23.1.315
-
Emergence of slow-switching assemblies in structured neuronal networksPLOS Computational Biology 11:e1004196.https://doi.org/10.1371/journal.pcbi.1004196
-
Theory and dynamics of perceptual bistabilityAdv. Neural Inf. Proc. Sys 19:1217–1224.https://doi.org/10.7551/mitpress/7503.003.0157
-
Multistability in perception: Binding sensory modalities, an overviewPhilosophical Transactions of the Royal Society of London. Series B, Biological Sciences 367:896–905.https://doi.org/10.1098/rstb.2011.0254
-
Role of mutual inhibition in binocular rivalryJournal of Neurophysiology 106:2136–2150.https://doi.org/10.1152/jn.00228.2011
-
Neural Elements for Predictive CodingFrontiers in Psychology 7:1–21.https://doi.org/10.3389/fpsyg.2016.01792
-
Balance between noise and adaptation in competition models of perceptual bistabilityJournal of Computational Neuroscience 27:37–54.https://doi.org/10.1007/s10827-008-0125-3
-
Dynamics of random neural networks with bistable unitsPhysical Review. E, Statistical, Nonlinear, and Soft Matter Physics 90:062710.https://doi.org/10.1103/PhysRevE.90.062710
-
Stochastic processes in reversing figure perceptionPercept. & Psychophys 16:9–27.https://doi.org/10.3758/BF03203243
-
Untersuchungen zur Lehre von der GestaltPsychologische Forschung 7:81–136.https://doi.org/10.1007/BF00410640
-
Toward an interpretation of dynamic neural activity in terms of chaotic dynamical systemsThe Behavioral and Brain Sciences 24:793–810.https://doi.org/10.1017/s0140525x01000097
-
Stochastic variations in sensory awareness are driven by noisy neuronal adaptation: evidence from serial correlations in perceptual bistabilityJournal of the Optical Society of America. A, Optics and Image Science 26:2612–2622.https://doi.org/10.1364/JOSAA.26.002612
-
Book: Stochastic Processes in Physics and Chemistry. Amsterdam: North-Holland Physics Publishing.
-
Stochastic properties of binocular rivalry alternationsPerception & Psychophysics 18:467–473.https://doi.org/10.3758/BF03204122
-
Neural dynamics and circuit mechanisms of decision-makingCurrent Opinion in Neurobiology 22:1039–1046.https://doi.org/10.1016/j.conb.2012.08.006
-
a predictive coding account of bistable perception - a model-based fMRI studyPLOS Computational Biology 13:e1005536.https://doi.org/10.1371/journal.pcbi.1005536
-
Experimentelle Studien über das Sehen von Bewegung. Zeitschrift für Psychologie mit Zeitschrift für angewandte Psychologie 61:161–165.
-
Contributions to the physiology of vision: On some remarkable, and hitherto unobserved, phenomena of binocular visionPhilosophical Transactions of the Royal Society A 128:371–394.
-
The neural site of binocular rivalry relative to the analysis of motion in the human visual systemThe Journal of Neuroscience 10:3880–3888.
-
Human V4 and ventral occipital retinotopic mapsVisual Neuroscience 32:E020.https://doi.org/10.1017/S0952523815000176
-
Multistability in auditory stream segregation: A predictive coding viewPhilosophical Transactions of the Royal Society B 367:1001–1012.https://doi.org/10.1098/rstb.2011.0359
-
Rivalry-Like Neural Activity in Primary Visual Cortex in Anesthetized MonkeysThe Journal of Neuroscience 36:3231–3242.https://doi.org/10.1523/JNEUROSCI.3660-15.2016
-
Long-range traveling waves of activity triggered by local dichoptic stimulation in V1 of behaving monkeysJournal of Neurophysiology 112:18–22.https://doi.org/10.1152/jn.00610.2013
-
Theoretical perspectives on active sensingCurrent Opinion in Behavioral Sciences 11:100–108.https://doi.org/10.1016/j.cobeha.2016.06.009
-
Vision as Bayesian inference: analysis by synthesis?Trends in Cognitive Sciences 10:301–308.https://doi.org/10.1016/j.tics.2006.05.002
Article and author information
Author details
Funding
European Commission (FP7-269459)
- Jochen Braun
Deutsche Forschungsgemeinschaft (BR 987/3-1)
- Jochen Braun
Deutsche Forschungsgemeinschaft (BR 987/4-1)
- Jochen Braun
H2020 European Research Council (45539)
- Maurizio Mattia
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
Funding from EU FP7-269459 Coronet, DFG BR 987/3-1, DFG 987/4-1, and EU Human Brain Project SGA3-945539.
The authors thank Andrew Parker and Maike S Braun for helpful comments.
Ethics
Human subjects: Six practiced observers participated in the experiment (four male, two female). Informed consent, and consent to publish, was obtained from all observers, and ethical approval Z22/16 was obtained from the Ethics Commission of the Faculty of Medicine of the Otto-von-Guericke University, Magdeburg.
Copyright
© 2021, Cao et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.