Gain neuromodulation mediates perceptual switches: evidence from pupillometry, fMRI, and RNN Modelling

Gabriel Wainstein; Christopher J. Whyte; Kaylena A. Ehgoetz Martens; Eli J. Müller; Brandon R. Munn; Vicente Medel; Britt Anderson; Elisabeth Stöttinger; James Danckert; James M. Shine

doi:10.7554/eLife.93191.1

eLife assessment

This valuable paper explores the idea that transient modulations of neural gain promote switches between distinct perceptual interpretations of ambiguous stimuli. Evidence for this idea is provided by pupillometry (an indirect proxy of neuromodulatory activity), fMRI, neural network modeling, and dynamical systems analyses. While this integrative approach is intriguing, the data analysis, as well as computational modeling, are incomplete, especially for supporting the many causal statements in the paper.

https://doi.org/10.7554/eLife.93191.1.sa2

Significance of findings

valuable: Findings that have theoretical or practical implications for a subfield

landmark
fundamental
important
valuable
useful

Strength of evidence

incomplete: Main claims are only partially supported

exceptional
compelling
convincing
solid
incomplete
inadequate

During the peer-review process the editor and reviewers write an eLife assessment that summarises the significance of the findings reported in the article (on a scale ranging from landmark to useful) and the strength of the evidence (on a scale ranging from exceptional to inadequate). Learn more about eLife assessments

Abstract

Perceptual updating has been proposed to rely upon evolving activity within a recurrent, distributed thalamocortical network whose interconnections are modulated by bursts of ascending neuromodulatory neurotransmitters, such as noradrenaline. To test this hypothesis mechanistically, we leveraged a combination of pupillometry, fMRI and recurrent neural network modelling of an ambiguous figures task. Qualitative shifts in the perceptual interpretation of an ambiguous image were associated with peaks in pupil diameter, an indirect readout of phasic bursts in neuromodulatory tone. We hypothesized that increases in neuromodulatory tone led to neural gain alteration so as to causally mediate perceptual switches. To test this hypothesis, we trained a recurrent neural network to perform an analogous perceptual categorisation task, and then manipulated the gain of the RNN to mimic the effect of neuromodulatory tone. As predicted, we observed an earlier perceptual shift as a function of heightened gain. Leveraging a low-dimensional readout of the RNN dynamics, we developed two novel predictions: perceptual switches should co-occur with peaks in low-dimensional brain state velocity and with flattened energy landscape dynamics. We used dimensionality-reduced summaries of whole-brain fMRI dynamics to independently confirm each of these predictions. These results support the role of the neuromodulatory system in the large-scale network reconfigurations that mediate abrupt changes in perception.

Introduction

The human visual system faces a fundamental challenge: the visual inputs that bombard the retina are often fundamentally ambiguous. The inherent ambiguity of perceptual input is overcome by incorporating prior information about the causal structure of the world into sensory inferences. Consider the problem of watching a blurry item fly across the sky (“Is it a bird? Is it a plane?”) – this sensory information is often occluded and noisy, yet somehow our nervous system can adaptatively and dynamically refresh its activity to mediate the accurate perception (“It’s Superman!”). In cases of bistable perception, such as the Necker cube illusion, the inputs themselves offer two equally valid, yet mutually exclusive interpretations that observers periodically switch between. In both cases, perceptual awareness of the images (outside the moments of switching) is remarkably stable, suggesting that the nervous system can rapidly (and flexibly) identify the best ‘match’ between the visual data and stable (likely pre-learned) stimulus categories. The mapping between transitions in brain-states and transitions in perception remains an open question.

fMRI studies investigating perceptual updating and switches in bistable perception have typically identified a distributed set of regions within the cerebral cortex, including the anterior cingulate, anterior insula, dorsolateral prefrontal and superior parietal cortex^1,2. These cortical regions are presumed to play an attentional role with changes in attention driving switches in perceptual contents by selectively boosting activity within the relevant circuits^3,4. This interpretation is complemented by behavioural evidence showing that attention plays a prominent role in determining the contents of perception in bistable perception tasks where competition is not resolved at low-levels of the visual hierarchy^5,6. Despite these clear links, the ability to flexibly respond to ambiguous visual inputs according to changing task demands is a feature that is present across phylogeny⁷ and hence is present in a wide variety of animals that have poorly developed cerebral cortices⁸. This suggests that more highly conserved features of the nervous system may play a role in driving task-relevant switches in the contents of perception.

The ascending neuromodulatory system, and specifically the noradrenergic locus coeruleus (LC), is well-suited to mediate the large-scale, brain state switches required to flexibly alter perceptual contents^9,10. While the cell body of the LC is located in the brainstem, the nucleus sends projections throughout the central nervous system, wherein its axons release noradrenaline, which in turn modulate the excitability of targeted regions¹¹. In previous work, it has been argued that phasic release of noradrenaline from the LC acts as a “network reset” signal, which effectively disrupts ongoing processing, and hence allows animals to reconfigure their ongoing neural dynamics towards more salient (and hopefully, behaviourally-relevant) processes^12–14. This mechanism is of critical importance in ecological contexts in which an animal needs to be able to both focus on the current task in an exploitative mode (such as foraging), while being able to rapidly modify its internal, attentional and behavioural state when required (e.g., if resources are depleted, or in the presence of a predator). Further, in the context of bistable perception, pupil diameter (a non-specific and indirect readout of LC activity^9,15) is tightly linked to switches in the content of perception^16,17. Based on previous work^{12–14,18,19}, and the fact that the LC is known to be interconnected with many of the regions implicated in whole-brain imaging studies of perceptual uncertainty^20–22, we hypothesized that task-related perceptual switches should be intricately related to phasic bursts of LC activity, which then act as a ‘network reset’¹⁴, flattening the whole-brain energy landscape²³ and thus allowing cortical dynamics to evolve into a new state thereby changing the contents of perception.

To test this hypothesis, we analysed a cognitive task designed to investigate switches in perceptual categorisation²⁴. We first demonstrate that pupil diameter – a non-specific, indirect readout of phasic noradrenergic activity^15,20,25,26 – peaks at the point of the perceptual switch. We then trained a recurrent neural network (RNN) to perform an analogous change detection task, after which we altered the gain of activation function of the model – a demonstrated mechanism for the action of noradrenaline on the cerebral cortex^11,27,28 – and showed that switch times occurred earlier as a function of heightened gain. We then used principal component analysis (PCA) and energy landscape analysis²³ in order to translate fMRI data collected during the task into a low dimensional read-out of the network dynamics. We found that: i) increases in gain are associated with increases in the velocity of the low dimensional trajectory around the switch point, and ii) the energy landscape derived from the dynamics of the RNN flattens as a function of gain. We then re-analysed existing BOLD data from the same perceptual task^1,24 to confirm the prediction of brain state velocity increase and energy landscape flattening²³ at the point of perceptual transitions. Overall, our results confirm our hypothesis that phasic bursts of neuromodulatory activity act as a “network reset”^12,14, which via the downstream impact of neuromodulatory neurotransmitters such as noradrenaline on brain network dynamics, leads to a switch in perceptual categorisation.

Results

Evoked pupil dilations coincide with the resolution of perceptual ambiguity

To assess the role of the ascending arousal activity during task performance, we analysed a dataset of 35 participants who performed the ambiguous figures task while simultaneous pupil diameter was recorded (SR Research, 1000 Hz). Briefly, the task consisted of a set of continuously transforming images that transition from an initial object (e.g., a shark) into a second object (e.g., a plane), while preserving basic psychophysical attributes, such as luminance (Fig. 1A). Crucially, even though the task stimuli change incrementally and linearly, with maximal ambiguity at the mid-point of each trial (the peak of the dotted line curve in Fig. 1A), conscious awareness of a change in stimuli is known to ‘pop out’, often at different times on each trial²⁴. When these perceptual switches occurred, subjects were instructed to change the button they were pressing, thus indicating a change in perceptual interpretation while controlling for motor preparation and execution across stimuli. Participants viewed 20 unique sets of images, each of which morphed from a starting image into a second image through 15 equally spaced intermittent stages (Fig. 1A). For each participant, we identified the first and last time they viewed a sequence of images, as well as the three images leading up to and following an identified perceptual switch, irrespective of the categories associated with each specific object switch. The rest of the analyses in this manuscript are organised around this perceptual transition.

Pupil diameter tracks perceptual change.
A) Upper: Trial example showing the continuous change from a stable image (plane) into a shark; Lower: the probability of detecting a switch (Δ) as a function of Image – most switches occur around the mid-point, but not exclusively so, leading to our prediction of heightened locus coeruleus activity at the switch point; B) Diagram of the locus coeruleus (red), its diffuse projections to the whole brain network and its link to pupil dilation. C) Pupil diameter group average evoked response time locked to the perceptual change (dark line, t = 0), significance is shown in the top grey bar (pFDR < 0.05), showing two images around the perceptual change (Δ) are different from the null model. The average of the first and last two images are shown in the left (right) section of the plot (dotted line). We observed an increase of the pupillary response that peaked after the perceptual change. D) Group average of evoked pupillary responses to image switches – red represents the faster response when the switch occurs at image 6; green indicates a medium response with the switch at image 8; and blue denotes the slowest response with the switch at image 10.

Given the known (yet admittedly non-specific) relationship between LC activity and the first-order dynamics of pupil diameter²⁹, we were able to use the diameter of the pupil as an indirect means for testing the hypothesis that ascending neuromodulatory tone is associated with perceptual switching. The linear nature of the morphing procedure meant that luminance levels (which could otherwise bias pupil diameter^9,15,20) were kept constant across all trials. Additional, behavioural output was controlled as subjects needed to press one button on each image (depending on the perceptual content). Mapping all blink-corrected, filtered and normalized trials over time, we noticed two clear and robust patterns. Firstly, we observed a clear increase in the phasic pupillary response coincident with the frame when participants noticed a change in perceptual content (maximum pupil diameter in Fig. 1C). These results can be seen clearly when the pupil diameter at the perceptual switch point is viewed in comparison to the remaining stimuli. Secondly, when analysing the times before and after the change point, there was a significant increase of the mean pupil response, beginning 3 images before the change point (mean β = -0.22; t₍₃₂₎ = -8.02, p = 2.3 × 10⁻¹⁹), before returning to baseline levels. Together, these results clearly couple pupil dilation to the point of perceptual switching.

In subsequent analyses, we sought to elucidate the relationship between ascending arousal, quantified by pupil diameter, and the temporal dynamics of perceptual shifts on a trial-by-trial basis. Given that the LC plays a pivotal role in modulating both sensory processing and the perceptual switches (Fig. 1B), we hypothesized that the speed of a perceptual switch should be dependent on LC activity. Specifically, we predicted that trials with earlier perceptual changes (faster perceptual switches) would be associated with an increase in pupil diameter (and v.v.). To test this prediction, we built a linear model wherein the peak pupil diameter observed during the perceptual switch was designated as the independent variable, and the trial on which the perceptual shift was reported functioned as the regressor. To control for potential confounds, such as impulsive premature responses and to compensate for diminished statistical power in extreme) response epochs (both early and late) due to reduced sample sizes, our analytical framework was confined to responses within a boundary of one standard deviation from the median (8 ± 2; 87.1% of total trials). As expected, we observed an inverse relationship between the evoked pupil diameter and the trial marking the perceptual switch (β = -0.115, R² = 0.02, p = 1.5 x 10⁻³). Earlier responses showed a positive relationship with higher evoked pupil diameter during the switch epoch (red in Fig. 1D), whereas later responses were associated with a more constricted pupil (blue in Fig. 1D). In summary, these results provide indirect evidence for our hypothesis that ascending neuromodulation – such as LC activity – is associated with the resolution of perceptual uncertainty.

Computational evidence for neuromodulatory-mediated perceptual resolution in a recurrent neural network

Our initial results provided evidence implicating the LC in perceptual switching, however there is also somewhat conflicting evidence showing that simply changing stimulus categories can also induce similar pupillary dilations^16,30,31, suggesting the need for a more mechanistic test of the network reset hypothesis. Along with others^32–35, we have used a combination of computational modelling^27,36, neurobiological theory¹¹ and multi-model neuroimaging^19,27,31,32 to suggest that noradrenaline alters neural gain^28,39, which in turn affects inter-regional communication and flattens the energy landscape traversed by network dynamics allowing the state of the network to jump between perceptual attractors more easily. Whether these signatures of large-scale network reconfiguration are mechanistically related to network reset remains and important open question.

To test whether our hypothesised neuromodulatory mechanism could recapitulate the behaviour we observed in the ambiguous figures task, we trained 50 continuous time recurrent neural networks (RNN) that we constrained to respect Dale’s law (i.e., 80/20 split of purely excitatory/inhibitory units^40,41) to perform a perceptual change detection task analogous to the task performed by our participants (Fig. 2A). We also constrained the input and readout weights to be exclusively positive and only allowed the firing rate of excitatory units to contribute to the readout (see Materials and Methods). Each network was provided with a two-dimensional input , with each row representing the “sensory evidence” for each of the two stimulus categories (Fig. 2A). The task lasted for 50 time-steps: to mimic the linear transition between image categories in our task, each trial began with maximum evidence for one of the two categories and minimum evidence for the other (e.g., u₁ = 1, u₂ = 0), and then changed linearly over the course of each trial such that by the final time-step the evidence for each category had switched (e.g., u₁ = 0, u₂ = 1). At each time point, the network was trained to output a categorical response indicating which input dimension had a higher value.

A recurrent neural network model of perceptual switching.
A) we trained a continuous time E/I recurrent neural network (RNN) to categorise linearly changing inputs representing two discrete categories (e.g., output #1 and output #2) into two categories; B) following training, the firing rate of the excitatory units was clearly separated into two stimulus selective clusters – those that responded maximally to input category #1 (dark red) and those that respond maximally to input category #2 (light red). Here colour represents the (normalised) value of the input weight connecting each excitatory unit to each dimension of the input; C) Firing rate of the inhibitory units showing weakly stimulus selective dynamics (dark blue represents category #1, light blue category # 2). D-F) Upper; Increasing neural gain (i.e., the slope of the sigmoidal activation function) caused an earlier switch (and *v.v*.), when the gain of both excitatory and inhibitory units was manipulated. Manipulating the gain of excitatory units in isolation did not lead to speeded responses, whereas manipulating the gain of inhibitory units in isolation produced a similar effect to manipulating the gain of excitatory and inhibitory units. The insets in D-F denote a cartoon simplification of the trained network, with highlights applied to the connections and the gain manipulations.

Following training, all networks achieved near perfect behavioural accuracy (0.99 ± 0.01 Because of the constraint that the input and output weights are strictly positive, we could use their (normalised) value as a measure of stimulus selectivity. Inspection of the firing rates sorted by input weights revealed that the networks had learned to complete the task by segregating the excitatory units into two stimulus-selective clusters, each of which was excited by a single dimension of the input stimulus, which in turn projected to independent dimensions of the network readout (i.e., for all networks input and readout weights were highly correlated; mean = 0.9849, std = .0076; Fig. 2B). The inhibitory units also showed strong stimulus selectivity in their input weights but because of recurrent projections with both excitatory populations their dynamics were only weakly stimulus-selective (Fig. 2C). Because network output was exclusively a function of the excitatory firing rate after training the switch in output from category 1 to category 2 occurred approximately when rate of one of the competing excitatory clusters exceeded the other at time point 25 (i.e., when u₂ > u₁; and v.v.). It is interesting to note that the computational architecture we arrived at by gradient based learning is strikingly similar to classic attractor network models of perceptual decision making^42,43 that have two stimulus selective excitatory populations with recurrent connections to a shared inhibitory pool (see insets of Fig. 2D-F).

With an operational model of the task in hand, we next sought to test our hypothesis about the role of neural gain in perceptual switches. In previous work, we (and others) have argued that the impact of neuromodulators (such as NA) on population-level activity can be effectively approximated by steepening (or flattening) the sigmoid activation function, thus mimicking the effect NA has on neuronal excitability by liberating intracellular calcium stores and/or opening (or closing) voltage-gated ions channels^11,44. To test this mechanism in our model, we manipulated the gain of the sigmoid activation function for both excitatory and inhibitory units by simulating an additional 500 trials in each network between gain values of 0.5 to 1.5 (in steps of 0.1). As predicted, increased gain (red; corresponding to heightened adrenergic tone) lead to earlier ‘perceptual switches’ in the network output whereas low gain caused later switches (Fig. 2E).

Inspection of the firing rates showed that the speeding effect of gain was mediated by a shift in the timepoint at which each cluster of stimulus-selective excitatory units was out competed by the opposing excitatory cluster (Fig. S2). To interrogate whether the earlier switching was caused by increased excitatory or increased inhibitory gain (or both), we re-ran the simulations with the gain manipulation applied exclusively to either the excitatory or inhibitory units in isolation (Fig. 2E-F). Intriguingly, manipulating excitatory gain in isolation slowed switching under reduced gain, but did not speed up switching under heightened gain (Fig. 2E). In contrast, manipulating inhibitory gain in isolation led to both slowed switches under reduced gain and faster switches under heightened gain (Fig. 2F) demonstrating that, in our model, the speeding effect of gain on perceptual switch time was largely mediated by enhanced inhibitory signalling.

Large-scale neural predictions of recurrent neural network model

Having confirmed the behavioural component of our gain modulation hypothesis in our model, we next sought to identify population-level neural signatures of noradrenergic tone. In particular, we sought to test our hypotheses that the speeding effect of gain on perceptual switches is mediated by a flattening of the energy landscape traversed by the network dynamics. Crucially, this shift in perspective from the firing rates of individual units to the dynamics of the RNN as a whole also allowed us to generate a series of predictions that we could later test in functional neuroimaging data.

In recent work^23,45, we have shown that peaks in blood flow within the LC precede large changes in brain state dynamics. Viewed through the lens of dynamical systems theory⁴⁶ in which the brain is treated as a dynamical system whose state space (i.e., an instantaneous snap-shot of the activity of all regions in the system) evolves over time according to the presence (or absence) of attractors, the effect of the LC can be conceptualised as akin to lowering the energy barrier required to escape an attractor defining the current state so as to reach a novel location in state-space²³ (see Fig. S3 for an intuitive example of the energy landscape analysis). In light of the task used in this experiment, stable perception of the first image in a sequence (e.g., the plane in Fig. 2A) is represented as a stable fixed-point attractor, to which all visual inputs somewhat similar to the plane would evolve, as if brain states were a ball drawn by gravity down into a valley. Note that, in this scenario, there also exists an attractor representing a shark to which the brain state could evolve, but the trajectory is blocked by an energy barrier (i.e., it is easier for a plane-like input to be interpreted as a plane). As the incoming image begins to look less like a plane, we hypothesize that this growing perceptual mismatch recruits increased neuromodulatory tone (likely mediated via the LC^47,48). The effect of this increased tone is to ‘flatten’ the energy barrier between the two attractors, and thus to make it more likely that the brain state could fall into the second attractor – i.e., to ‘switch’ the interpretation from a plane to a shark.

To test whether gain was playing an analogous role in the dynamics of the RNN, we leveraged the relationship between probability and energy in statistical mechanics to construct a measure of the energy landscape traversed by the dynamics of each network. Specifically, we calculated the mean-squared displacement of the activity of each RNN at each time point t₀ relative to a reference point t in the trial. We then used Gaussian kernel estimation to construct a distribution over MSD for each timepoint t, and then used this to estimate the energy^23,49 . Following the physical analogy, we conceptualise gain as akin to temperature – systems with a high temperature, have a higher average energy and a more uniform probability distribution. Correspondingly, low temperature systems, are associated with lower average energy and thus a more peaked probability distribution.

We ran this analysis over all trials at each gain value. To obtain a summary measure of the flatness of each landscape we integrated over time to visualise energy exclusively as a function of MSD. In line with our hypothesis the energy required for large movements in state space (i.e., large MSD values) decreased as a function of gain (Fig. 3B) similar to the manner in which increasing temperature increases average kinetic energy (i.e., particle motion). These results reinforce our previous work and clearly demonstrates that the implementation of neuromodulatory-mediated dynamics in the RNN acted in a similar fashion to previously observed patterns in resting-state fMRI²⁶. In addition, our results confirm that the putative impact of the release of noradrenaline from the locus coeruleus can change the manner in which brain states evolve over time by ‘turning up the temperature’ which ‘flattens’ the brains energy landscape, which in turn helps to facilitate the navigation of otherwise difficult state-to-state transitions²⁶.

Heightened gain mediates earlier perceptual switching through speeded low-dimensional dynamics and flattened attractor dynamics.
A) Area under each the curve for the energy landscapes shown in the three insets at high (red), medium (green), and low (blue) gain. B) Velocity of the first PC (derived from the RNN dynamics (inset) as a function of gain (low, medium and high) – the dotted line designates the switch point.

Having confirmed our hypothesis that the energy landscape flattens under increased gain, we turned from the energy-landscape traversed by the population dynamics to the populations dynamics themselves. Specifically, we performed a Principal Component Analysis (PCA) on the concatenated activity of the network at gain = 1. The set of PCs was low-dimensional, with 75.98 ± 5.79% of the variance explained by the first principal component (PC₁). Based on this information, we projected the trial averaged activity at each gain value and timepoint onto the first PC. The resultant low dimensional trajectories all showed a change in direction around the timepoint of the switch in network output from category 1 to category 2 (and v.v.; Fig. 3B inset), reminiscent of a system jumping between attractors which, occurred earlier (and later) as a function of heightened (and lowered) gain (Fig. 3C). This switch not only occurred sooner as a function of heightened gain (Fig. 3B inset), but it also occurred at a higher neural “speed” with the velocity of the trajectory peaking sharply at the point of the switch under high gain, whereas the transition between states was relatively gradual under low gain (Fig. 3B).

The low-dimensional signature of ambiguity resolution and perceptual change

Having confirmed our hypothesis about the speeding effect of gain in our RNN model we next sought to test the predictions in the human brain – i.e., whether the increase in neural speed and the flattening of the energy landscape observed in the RNN were also present in functional neuroimaging data. To this end, we re-analysed an existing BOLD dataset collected while participants performed a similar version of the ambiguous figures task to identify the low-dimensional patterns that occurs during the perceptual change.

We were, however, left with a dilemma: RNNs provide a proof-in-principle of how computations can be instantiated in networks, however there are key differences between artificial neural networks and the human brain that require careful consideration^50,51. While both RNNs and the brain are thought to compute through dynamics⁵², the human brain is comprised of highly specialised neural circuits that have been shaped over evolutionary time to perform a range of highly idiosyncratic functions that matter for adaptive behaviour⁵², but aren’t necessarily related to task-switching. So where in the brain should we look for the same low-dimensional signatures we observed in the RNN as a function of gain? Rather than select a particular region a priori, we instead opted for a data-driven approach – principal components analysis (PCA) – which summarizes regional timeseries concatenated across all subjects and trials into a set of low-dimensional patterns that can then be interrogated in a similar fashion to the activity of the RNN (see Methods for details). Consistent with previous work⁵³, a small number of principal components (PCs) mapped onto distributed regions across the brain (Fig. 4A) and explained a substantial proportion of the variance observed in the task (PC_1-3 explained 32% of the total variance).

Low-dimensional switch-related dynamics and connectivity.
A) spatial loadings of PC₁ (green), PC₂ (red) and PC₃ (blue); B) Mean absolute β loading (solid lines) and group standard error (shaded) of PC₁ (green), PC₂ (red) and PC₃ (blue), organized around the image switch point (Δ) – the dotted grey lines show the 95^th percentile of the null distribution of a block-resampling permutation; C) radar plot showing the partial correlations of PC₁ (green), PC₂ (red) and PC₃ (blue); D) Evoked Brain activity of PC₂ + PC₃ during the perceptual switch. E) Group averaged functional connectivity and module assignments using a Louvain analysis – three clusters were observed. F) Pearson’s correlation between the sum of PC₂ and PC₃ (per subject) and a joint-histogram comparing Integration (participation coefficient) and Segregation (module-degree Z-score); p < 0.05 following permutation testing.

To isolate the low-dimensional component that best reflected the task (Fig. 1A), we performed a principal component regression⁵⁴ that modelled the switch point of each trial using the loadings of the top 3 PCs calculated from fMRI data. PC₁ was not selectively aligned with switches, both PC₂ and PC₃ showed a pronounced, isolated peak around the switch point across trials (Fig 4B), with PC₂ showing the most robust task-related engagement. To ensure that these results could not be explained by the spatial autocorrelation inherent within the PC maps, we created a null distribution of regression coefficients calculated using the same statistical model but with block-resampling applied to the switch times in the design matrix. The dotted grey line in Fig. 3C denotes the 95^th percentile of the null distribution, and clearly shows that the engagement of both PC₂ and PC₃ during the switch point was greater than to be expected by chance. Furthermore, to validate that the perceptual switch was predominantly represented by PC₂ and PC₃ (Fig. 4D), we conducted a regression with these two PCs as predictors and the evoked activity derived from the original BOLD time-series as the dependent variable. The resulting variance accounted for was 88% (R² = 0.88, β = 0.99, p = 9.2x10^-178).

To determine whether PC₂ or PC₃ was a better index of perceptual switching, we then correlated the spatial loadings of PC₂ and PC₃ with the spatial map associated with the term “switching” from a meta-analysis performed on the neurosynth database⁵⁵. We observed a significant positive correlation between the map for “switching” and both PC₂ (r = 0.453, p = 3.041x10^-18), and PC₃ (r = 0.115, p = 0.037), however the correlation for PC₃ was much lower than PC₃, suggesting that PC₂ was a better match for “switching”. The spatial map of PC₂ was also positively correlated with other terms putatively associated with the ambiguous figures task (notably, “effort”, “load” and “attention”; all r > 0.2; and not with “episodic”, which was included as a negative control), a partial correlation analysis revealed that PC₂ was selectively associated with “switching” and “attention” (Fig. 4C). Given the multifaceted nature of the ambiguous figures task, the convergence between brain maps for “switching”, “attention”, and “effort” was to be expected and we therefore did not try to dissociate them in further analysis.

Before turning to the predictions of the RNN, we first sought to further validate the face validity of focusing on a limited number of principle components. In previous work, we have linked the impacts of NA on systems-level neural dynamics to alterations in network topology^23,56,57, with NA increasing large-scale network integration. Given that PCA naturally captures patterns of covariance between regions, we expected to see that the observed time signatures of PC engagement at the switch-point should coincide with similar measures of network integration. To test this hypothesis, we clustered the time-averaged functional connectivity matrix using a hierarchical modular decomposition approach (see Methods) – doing so revealed three main clusters (Fig. 4E). For each participant, we used this matrix and the three clusters to estimate the amount of integration (using the participation coefficient) and segregation (using the module-degree Z-score, see Methods) of each region. We then correlated a joint histogram of these measures with the sum of subject-specific regression coefficients for PC₂ and PC₃ and observed a robust correlation with integration (Fig. 4F; p < 0.05 following permutation testing). These results clearly demonstrate the highly convergent nature of PCA and our previous network-based approaches.

Confirmation of model predictions in whole-brain BOLD data

Based on the patterns observed in the RNN (i.e., those in Fig. 2-3), we hypothesized that the energy landscape topography would decrease, and the velocity of the low dimensional brain patterns would peak at the switch point. Given the prominent role in switching we focused our analysis on the PC₂ time series. To estimate the energy landscape, we first estimated the mean displacement of PC₂ by averaging the β value around the switch-point and then divided this term by the logarithm of the inverse probability of the loading of PC₂, which was also inferred from the GLM. Using this approach, we observed that PC₂ was maximally displaced at the perceptual change (Fig. S4C), suggesting that the brain state showed a substantial shift from baseline during the perceptual change. Energy (log[1/p_switch]; see Methods) showed a U-shaped pattern around the perceptual change point – i.e., with a minimum value in the perceptual change along with the first and last images (Fig. 5B-D). To relate this measure to the energy landscape framework, and to control by the specific displacement occurring at each image, we then calculated the ratio between energy and the mean displacement (i.e., energy landscape ’depth’; Fig. 5A). As predicted, the brain-state reduced the amount of energy per displacement towards its minimum around the perceptual change (Fig. 5B). We interpret this set of results as the system flattening the energy landscape, reducing the energy (i.e., higher system changes become more common) required for large displacement values, effectively generating a ‘network reset’^12,14,58 of the brain state, which ultimately facilitated an updating of the content of perception.

To analyse speed-evoked changes in brain trajectories, we used a GLM to analyse each PC time series as a function of each perceptual switch. Our design matrix included the first and last images seen in each set, along with the three images leading up to the switch, the switch trial itself and the three images following the switch (see Methods for details). This approach thus allowed us to track the low-dimensional signature of the brain through the processing and resolution of perceptual ambiguity. As predicted (Fig. 5C), we found evidence that PC₂ showed a peak in velocity at the change point (Fig. 5D) providing confirmatory evidence that the low-dimensional brain state dynamics observed in whole-brain fMRI were highly similar to those observed in the trained RNN.

Discussion

Here, we studied the relationships between the ascending arousal system, low-dimensional trajectories and energy landscape dynamics during a perceptual switch task. Our results provide evidence that the ascending arousal system is involved in the modulation of dynamic brain state topography during switches in perceptual interpretation. We found that pupil diameter tracked with the task stimuli and was directly related to the resolution of the ambiguity of the image being presented (Fig. 1). Next, we confirmed that this process could be replicated using a simple RNN Model (Fig. 2), in which the functional behaviour of the network could be directly modulated by neuromodulatory-mediated alteration of activation function (Fig. 3). Finally, we confirmed that low-dimensional dynamics around the perceptual change could be inferred from whole brain neuroimaging data (Fig. 4) and that the low-dimensional signatures of the impacts of NA-mediated gain changes on RNN behaviour could also be inferred from whole-brain functional neuroimaging data (Fig. 5). Together, these results provide evidence of how the ascending arousal system facilitates perceptual dynamics during ambiguity by flattening the energy landscape thereby reducing the energy needed to reset the system topology in an adaptive and task dependent manner.

The relationship between perception and pupil diameter is consistent with the role of the noradrenergic system in cognition and attention^39,44. For instance, the LC dynamically changes its activity according to external and cognitive demands imposed on the system^{29,39,44,59,60}. Our results demonstrate a more precise role for LC-mediated alterations in neural gain in the transition between the two images. Even though the image changes occurred linearly across each trial, such that the middle trial contained a perfect blend of both images, the perceptual ambiguity is hypothesized to place demands on the system that we propose are resolved through the recruitment of the LC and noradrenaline, which then subsequently increase brain-wide communication by increasing the gain and signal transmission in targeted brain regions^11,27,60,61. As the pupil diameter increased as a function of perceptual ambiguity, which rose sharply in the few images prior to a reported perceptual change (Figure 1D), and pupil diameter is an indirect marker of the noradrenergic system^9,15, we thus provide early evidence for the role of the ascending neuromodulatory activity to facilitate perceptual changes and decision making under perceptual ambiguity^17,21,22. Taken together, we have provided evidence that the ascending arousal system, measured indirectly through pupillometry, can modulate the conscious perception of ambiguous images by facilitating the arbitration of uncertain information decision-making processes.

A core neuroanatomical property of the LC noradrenergic system is that a relatively small number of neurons (~fifty thousand in an adult human) projects to almost all of the regions in the brain^20,62. This organisation implies that the LC acts as a low dimensional modulator of the much more high-dimensional cerebral cortex. Subtle changes in the activity of LC can have significant effects on how different brain regions communicate^{27,44,59,63–65}. Based on this, we hypothesized that the ascending neuromodulatory system needs to increase its activity in order to facilitate network communication and brain state transitions when the sensory information is limited or presents ambiguity. In line with this hypothesis, we showed that increasing the gain of the sigmoid transfer function – which at a neuronal level is mediated by NA increasing intracellular calcium and opening (or closing) voltage-gated ions channels^11,44 – in our RNN model of perceptual switching had a speeding effect on the timing of perceptual switches, an effect that we found was largely mediated by enhanced inhibitory signalling. At the large-scale level, increasing gain led to a flattening of the energy landscape traversed by the dynamics of the RNN increasing the likelihood of large shifts in state space analogous to increasing the ‘temperature’ of a physical system^36,66. Indeed, analogous to an increase in kinetic energy we observed an increase in the velocity around the switch point as a function of gain.

In line with the predictions of the RNN in our analysis of the BOLD data, we showed that the velocity of the low dimensional brain state trajectory most associated with perceptual switching increased significantly during the point of reported perceptual change in comparison (Fig. 4B), which we interpret as the brain moving from one attractor to another (Fig. 5A). Importantly, we showed that around the perceptual switch, the energy needed for each unit of change in brain state (i.e., displacement) is smaller than at other points in the task (Fig. 5A-B). Under the energy landscape framework^23,38, this shows that the landscape is flattened, and the energy required to transition between states is reduced. Together with the pupillary findings (Fig. 1), the computation model (Fig. 2-3), and replication of former results^23,38 (Fig. 4E-F), we propose that the ascending neuromodulatory system is responsible for the large-scale flattening of the energy landscape which facilitates the change in reported perceptual changes.

This work is not without limitations. First, the pupil diameter dataset and the fMRI analysis came from different participants, such that the link between the pupil diameter and the fMRI results is inherently indirect. Moreover, the specificity of the pupil diameter as a marker of the LC activity is under active debate¹⁵. For instance, there is evidence supporting a role of the superior colliculus, the dorsal raphe nucleus and central cholinergic system in driving pupil dilations^29,67,68. There is uncertainty regarding whether these other nuclei are directly related to pupil dilation, or only indirectly via their connections with other neural regions and nuclei. Despite this, we believe that our pupillometry dataset captures the ascending neuromodulatory role on perceptual ambiguity as there is strong evidence showing that pupil diameter is a reliable marker of noradrenergic activity during evoked cognitive tasks^{19,63,69–71}. Additionally, the sample size of our fMRI study makes it difficult to generalize our results. In spite of this, we believe that the converging evidence from the pupillometry dataset, the fMRI dataset and the computational model, supports the role of the ascending neuromodulation on the resolution of perceptual ambiguity hypothesized here. Future work is needed both in humans, with higher sample sizes utilizing fMRI and eye-tracking recordings, as well as animal studies, to directly modulate and record the LC activity in a perceptual ambiguity task of the kind used here.

Conclusion

In summary, we provide computational and empirical evidence for the association between neuromodulation, pupil dilation, energy landscape flattening and the resolution of perceptual ambiguity. Our results strengthen our understanding of the complex, distributed neurobiological processes that in combination facilitate our ability to resolve perceptual ambiguity. Specifically, we suggest that the gain-modulated dynamics of the brain system use the widespread projections of the noradrenergic arousal system to mediate the reconfiguration of the systems-level organisation of cortical network architecture^62,63 to complete challenging perceptual tasks. Rather than working alone, we further hypothesize that the noradrenergic system cooperates and competes with other distinct arms of the ascending arousal system to together facilitate complex, adaptive network dynamics in the brain.

Methods

Overview

There were two independent groups analysed in this study: 35 subjects performed a perceptual decision-making perceptual task while pupil diameter was recorded; and a separate group of 17 subjects performed a version of the task adapted for the MRI scanner.

Perceptual Task

Twenty picture sets were used in which line drawings of common objects morphed over 15 iterations into a different object (Fig. 1A). Picture sets were selected from a larger set validated in an earlier study²⁴. Picture sets for the current study were selected with the criterion that all sets were perceived categorically in the normative study (i.e., that the majority of participants in the normative study categorized each picture they saw as either the first object or second object in the set¹). Selecting only the categorically perceived image sets guaranteed that pictures in the middle of the morphing sequence were not simply ‘noisier’ than pictures at the beginning or end. In other words, the ambiguous images were still easily categorised by participants as either object 1 or object 2. All images were a standard size (316x316 pixels) and were displayed on a white background. In addition, participants were presented with two kinds of control picture sets to ensure that they were responding to changes in the pictures in the set rather than simply to the position in the set (e.g., always switching after the 8th picture). In these control picture sets, a salient deviating picture was presented either after three pictures or after thirteen pictures resulting in an early or late abrupt shift. Those sets served as controls and were not analysed further.

The picture morphing task consisted of five experimental runs. We randomized the order in which the picture sets were presented in each run and kept this randomized order consistent across participants. Picture morphing in each picture set occurred over fifteen discrete steps, each corresponding with the acquisition of a whole-brain image. Each picture within a set was presented for two seconds. Pictures were randomly intermixed with eight inter-stimulus-intervals (2, 4, 6 or 8 seconds) during which participants saw a fixation cross. Participants provided their responses in the scanner using two buttons on a four button Cedrus fibre optic system. In a two-alternative forced-choice task, participants were asked to press the first button when they ‘saw the first object' and the second button when they ‘saw the second object' – this ensured that there was not a motor confound present on only the switch trials. All participants were ignorant as to the identity of the second object in each picture set. At the end of each set of 15 images the word END was presented for 2 s to indicate that the next picture set would begin shortly.

Participants

A total of seventeen (6 male) neurologically healthy participants with normal or corrected to normal vision took part in this study (mean age 27.65 ± 8.01). Fifteen were right-hand dominant. A separate cohort of 35 participants performed the task while simultaneous pupil diameter was recorded using an eye tracker device (SR Research, 1000 Hz). None of the participants had a history of brain injury. Participants received $30 for their participation. All participants provided informed consent prior to participation. The research protocol was approved by the Office of Research Ethics at the University of Waterloo and the Tri-Hospital Research Ethics Board of the Region of Waterloo in Ontario, Canada.

Pupillometry

Fluctuations in pupil diameter of the left eye were collected using an Eyelink 1000 (SR Research Ltd., Mississauga, Ontario, Canada), with a 1 kHz sampling frequency. Blinks, artifacts, and outliers were removed and linearly interpolated⁷². High-frequency noise was smoothed using a second-order 2.5-Hz low-pass Butterworth filter. To obtain the pupil diameter average profile, data from each participant were normalized across each trial (corresponding to the 15 consecutive image set). This allowed us to correct for low-frequency baseline changes without eliminating the load effect and baseline differences due to load manipulations^73,74.

Recurrent Neural Network Modelling

We used PyTorch⁷⁵ to implement and train 50 continuous-time recurrent neural networks that we constrained to respect Dale’s law (N_E+I = 40, 80% excitatory N_E = 32, and 20% inhibitory N_I = 8) using the procedure set out in Ref #⁴¹. The dynamics of each network evolved according to the following system of ordinary differential equations:

Where x ∈ ℝ^N×1 represents the sub-threshold activation of each unit, u ∈ ℝ^2×1 the external input into the network, W^rec ∈ ℝ^40×40 the recurrent weights, Wⁱⁿ ∈ ℝ^40×2 the input weights, and τ the time constant which we set to 100ms. In addition to task input each unit in the network received independent Gaussian noise Ν with 0 mean and standard deviation (with σ_rec set to 0.1). The subthreshold activation x was converted into a vector of firing rates by applying a sigmoid function where g ∈ ℝ^N×1 is a vector containing the gain control parameter of each unit’s activation function that was multiplied element wise with x. Network outputs z ∈ ℝ^2×1 were given by a linear readout of the excitatory population’s firing rate z = W^outr_E. Where . The network’s choice at each time point was the maximum of the two-dimensional output z.

We imposed Dale’s law on the recurrent weights of the network by parametrising the weight matrix with a mask W^mas ∈ ℝ^40×40 which contained zeros in the leading diagonal (removing self-connections), +1 in all non-diagonal entries of the first 32 rows/columns and −1 in the remaining 8 rows/columns. We obtained the constrained recurrent weight matrix by multiplying the absolute value of the trained weights element wise with the mask thereby imposing an 80/20 E/I ratio. Similarly, in line with the finding that long range connection in mammalian cortex are purely excitatory, we constrained the projection of the input to the network and the readout projection to be strictly positive by taking the absolute value of the trained input and output weights .

Following standard practice⁴¹, we simulated the network by discretising the system of ODEs using a forward Euler integration scheme with Δt set to 20 (denoting ).

Each network was trained by optimising , to minimise a cross entropy loss function through 2000 iterations of back propagation through time⁷⁶ (BPTT) with ADAM⁷⁷. Batches consisted of single trials which for our simple task (described below) was sufficient for each network to converge on near perfect behavioural accuracy. All training was performed with the gain control parameter set to 1.

The task consisted of a simple change detection paradigm analogous to the task performed by our human participants. Specifically, at each time point the network was fed a two-dimensional input with each row representing the “sensory evidence” for each of the two stimulus categories. The task lasted for 50 time-steps beginning with maximum evidence for one of the two categories and over the course of each trial changed linearly such that at time-step 26 the sensory evidence for each stimulus category changed to slightly favour the second stimulus category and by the final time-step consisted of maximum evidence for the second stimulus category . We trained the network to output a response for stimulus category 1 whenever u₁ > 0.5, and u₂ < 0.5, and category 2 whenever u₁ < 0.5, and u₂ > 0.5.

Following training we simulated a further 500 trials over the gain parameter range [0.5, 1.5] in steps of 0.1. To study how the population dynamics of the trained networks changed as a function of gain we performed a Principal Component Analysis (PCA) on the concatenated activity of the network at gain = 1. The set of principal components was highly low-dimensional, with 75.98 ± 5.79% of the variance explained by the first principal component (PC₁) and 13.75 ± 4.13 % explained by the second. We then projected the trial averaged activity at each gain value at each timepoint onto the top PC. The resultant low dimensional trajectories all showed a change in direction around the timepoint of the switch in network input from category 1 to category 2. We then calculated the gradient of the trajectory around the switch point observing a consistent increase in velocity as a function of gain (Fig. 2D).

Leveraging previous work from our group²³ we constructed a measure of the energy landscape traversed by each network through an analogy to the relationship between probability and energy in statistical mechanics⁴⁹ which is given by:

Where p_i denotes the probability of each state, E_i the energy of each state, β thermodynamic beta, and z the canonical partition function. Solving for E_i we obtain:

We then calculated the mean-squared displacement (MSD) of the activity of the RNN at each time point t₀ relative to a reference point t in the trial.

The MSD is a measure of the average deviation in the activity, x_t = [x_1,t x_2,t, …, x_r,t] of each unit n, with respect to the activity at reference point t0. We then constructed a distribution over MSD values at each timepoint using Gaussian kernel estimation , where . As Σ_i p_i = 1 by construction z = 1, which after setting β = 1 gives:

Following the physical analogy, we think of the MSD of the RNN activity as akin to temperature with large displacements, like systems with a high temperature, having a higher average energy and thus a more uniform probability distribution. Correspondingly, low temperature systems, are associated with lower average energy and thus a more peaked probability distribution. We ran this analysis over all trials at each gain value and then correlated the resulting energy, E, with gain (Fig. 2D).

MRI Data

Functional data were acquired using gradient echo-planar T2*-weighted images collected on a 1.5 T Phillips scanner located at Grand River Hospital in Waterloo, Ontario (TR = 2000 ms; TE = 40 ms; slice thickness = 5 mm with no gap; 26 slices/volume; FOV = 220x220 mm²; voxel size = 2.75x2.75x5mm³; flip angle = 90°). Each experimental run consisted of 26 slices per volume and 285 volumes. At the beginning of each run, a whole brain T1-weighted anatomical image was collected for each participant (TR = 7.4 ms; TE = 3.4 ms; voxel size = 1x1x1mm³; FOV = 240x240 mm²; 150 slices with no gap; flip angle = 8°). The experimental protocol was programmed using E-Prime experimental presentation software (v1.1 SP3; Psychology Software Tools, Pittsburgh, PA). Stimuli were presented on an Avotec Silent Vision™ fibre-optic presentation system using binocular projection glasses (Model SV-7021). The onset of each trial was synchronized with the onset of data collection for the appropriate functional volume using trigger pulses from the scanner.

fMRI Data Preprocessing

After realignment (using FSL’s MCFLIRT), we used FEAT to unwarp the EPI images in the y-direction with a 10% signal loss threshold and an effective echo spacing of 0.333. Following noise-cleaning with FIX (custom training set for scanner, threshold 20, including regression of estimated motion parameters), the unwrapped EPI images were then smoothed at 6-mm FWHM, and nonlinearly co-registered with the anatomical T1 to 2-mm isotropic MNI space. Temporal artifacts were identified in each dataset by calculating framewise displacement (FD) from the derivatives of the six rigid-body realignment parameters estimated during standard volume realignment⁷⁸, as well as the root mean square change in BOLD signal from volume to volume (DVARS). Frames associated with FD > 0.25 mm or DVARS > 2.5% were identified; however, as no participants were identified with greater than 10% of the resting time points exceeding these values, no trials were excluded from further analysis. There were no differences in head motion parameters between the five runs (p > 0.500). Following artifact detection, nuisance covariates associated with the six linear head movement parameters (and their temporal derivatives), DVARS, physiological regressors (created using the RETROICOR method), and anatomical masks from the cerebrospinal fluid and deep cerebral white matter were regressed from the data using the CompCor strategy⁷⁹. Finally, in keeping with previous time-resolved connectivity experiments⁸⁰, a temporal band pass filter (0.0071 < f < 0.125 Hz) was applied to the data.

Brain Parcellation

Following preprocessing, the mean time series was extracted from 375 predefined regions of interest (ROIs). To ensure whole-brain coverage, we extracted the following: (a) 333 cortical parcels (161 and 162 regions from the left and right hemispheres, respectively) using the Gordon atlas⁸¹; (b) 14 subcortical regions from the Harvard-Oxford subcortical atlas (bilateral thalamus, caudate, putamen, ventral striatum, globus pallidus, amygdala, and hippocampus; https://fsl.fmrib.ox.ac.uk/); (c) 28 cerebellar regions from the SUIT atlas⁸² for each participant in the study.

Neuroimaging Analysis

In order to analyse task evoked activity related to stimulus presentations, we first performed a principal component analysis (PCA)⁵³ on the pre-processed BOLD time-series (per subject/session), to extract orthogonal low-dimensional time-series. The top 3 PCs explained ~30.6% of the variance. The time-series of these PCs was entered into a general linear model, in which we modelled the following 9 event-types across an entire session, centred around the perceptual switch point, which changed on a trial-by-trial basis: the first two images (modelled as a single regressor), the seven images surrounding each perceptual change (i.e., the switch trial and the three images surrounding the change point, modelled as seven separate regressors) and the last two images (modelled as a single regressor). Each of the event onset times was also convolved with a canonical hemodynamic response function. This left us with nine unique β values per principal component, which we could use to determine how each PC differentially engaged as a function of the task. To test the hypothesis that the rate of change of PC engagement peaked at the perceptual change point, we calculated the difference between the β value for each of the top 3 PCs for each of the 9 event-types, and then plotted the resultant series in order to identify whether a peak occurred at the perceptual switch point (i.e., the middle β value in the series). A block-resampling null (n = 5,000 permutations) was used as a permutation test (p < 0.05).

Spatial maps associated with the terms: “switching”, “effort”, “attention”, “perception” and “load” were downloaded from the neurosynth repository⁵⁵ and mapped into our parcellation space by calculating the mean value within each independent parcel. These values were then correlated with the spatial loading of each of the top 3 PCs. A separate partial correlation analysis was conducted in which the same correlation was estimated after controlling for each of the other spatial maps.

Topological analyses

A hierarchical modularity approach was used to collapse the mean time-averaged correlation matrix across participants into a set of four spatially non-overlapping modules. Briefly, this involved running the Louvain modularity algorithm, which iteratively maximizes the modularity statistic, Q, for different community assignments until the maximum possible score of Q has been obtained.

The community assignment for each region was then estimated 500 times across a range of γ values (0.5-2.0, in steps of 0.1). In order to identify multi-level structure in our data, we repeated the modularity analysis for each of the modules identified in the first step⁸³. Finally, a consensus partition was identified using a fine-tuning algorithm from the Brain Connectivity Toolbox (http://www.brain-connectivity-toolbox.net/). We subsequently used this final module assignment to estimate the cartographic profile of the each participant’s time-averaged adjacency matrix⁵⁷. Specifically, we estimated Integration using the participation coefficient, which quantifies the extent to which a region connects across all modules (i.e., between-module strength⁸⁴), and Segregation using the module-degree Z-score. These measures were entered into a joint-histogram (101x101 unique bins, equally-spaced between 0-1 [for Integration] and -1 and 1 [for Segregation]). The value within each bin of this joint histogram was then correlated with the combined regression weights of PC₂ and PC₃ for each subject. A permutation test that scrambled the order of participants was used to assess statistical significance (p < 0.05).

Brain-state displacement and the energy landscape

To quantify the change in the evoked BOLD activity following each stimulus we calculated the main BOLD displacement (MBD). The MBD is a measure of the absolute evoked deviation in BOLD activity. The evoked activity is measured through a general linear model using a canonical hemodynamic response function convolved on a design matrix. We are interested in the probability, p(MBD, re), that we will observe a given displacement in BOLD at a given regressor re. The probability is calculated through the null model of the general lineal model (the probability that the observed evoked value of the corresponding region is different from 0). As described above we then calculated the energy for each displacement value as . Finally, to measure the surprise per displacement, we divided the absolute β for PC2 from E_MBD,re for each regressor re (Figure 3).

Supplemental Materials

Overall Analysis Flow.
Top Row (orange) – pupil diameter was collected in a cohort of 35 individuals while they performed the Ambiguous Figures task. We observed a large peak in pupil dilation at the perceptual change point, which led us to make the prediction that there should be an increase in inter-regional gain at the switch point. Middle Row (blue) – we trained a 100-node RNN to perform a similar classification task in the presence of shifting perceptual ambiguity, and then tested the network at different levels of gain (i.e., the slope of the *tanh* activation function). We observed early switches with heightened gain, as well as altered attractor dynamics that caused a flattening of the energy landscape characterising state switches. Bottom Row (green) – we tested the predictions of the RNN using BOLD data from 17 subjects performing the same task. After filtering the BOLD data through a principal component analysis (in which we retained the top 5 principal components; PC_1-5), we observed an increase in the gradient of PC loading around the switch point using an FIR model, as well as a flattening of the energy landscape, thus confirming our original predictions.

Difference in mean firing rates between stimulus selective excitatory clusters.
To examine the effect of manipulating gain on the operation of the network (A) we averaged over the firing rate of the excitatory neurons in each stimulus selective cluster (r_c1, r_c2) and looked for the point at which r_c2 > r_c1 (and v.v.). In line with expectations the speeding and slowing effect of gain on network output time was straightforwardly reflected in the mean firing rates. B) Difference in mean firing rates for high (red), intermediate (green), and low (blue) gain, for gain manipulations targeting both excitatory and inhibitory neurons. Notice that the switch from r_c1 > r_c2 to r_c2 > r_c1 occurs sooner in time for high gain, and slower for low gain. C) Manipulating excitatory gain in isolation led to slower switches in mean firing rates for low gain but high gain did not speed switches. D) Manipulating inhibitory gain in isolation led to slower switches in mean firing rates for low gain and speeded switches under high gain.

Relationship between Attractor Depth and Energy Landscape.
We simulated a simple model (the normal form of a pitchfork bifurcation) (middle row is the corresponding potential ) and set the α term to two different values: on the left (red), α = 0.25, which corresponds to relatively shallow attractors; on the right (blue), α = 0.75, which corresponds to relatively deep attractors. We simulated model in the presence of light noise noisy, and then calculated the energy landscape of the timeseries (see methods in main paper) (bottom row; z-axis) As can be seen by comparing the middle and bottom rows, deeper attractors relate to higher energy barriers.

PC2 evoked displacement and surplice around the perceptual change.
A) Pearson’s correlation between each PCs and the evoked brain activity at the perceptual switch (β values), dashed line at PC2. B) Pearson’s correlation between the inverted brain maps using βPC (PC_(1-i) x βPC_(1-i)). Dashed line shows that the correlation gets to 94% using the first 3 PCs (Pearson’s r =0.94, p < 0.001). C) Mean absolute β loading (red) and group standard error (shaded red). D) Mean surprise calculated as -log(1-pvalues) in each regressors. Dotted black line define the perceptual switch point.

References

1.
1. Stöttinger E.
2. et al.
2015A cortical network that marks the moment when conscious representations are updatedNeuropsychologia 79:113–122
2.
1. Weilnhammer V.
2. Stuke H.
3. Hesselmann G.
4. Sterzer P.
5. Schmack K.
2017A predictive coding account of bistable perception - a model-based fMRI studyPLoS Comput. Biol 13
3.
1. Reynolds J. H.
2. Heeger D. J.
2009The Normalization Model of AttentionNeuron 61:168–185
4.
1. Desimone R.
2. Duncan J.
1995Neural mechanisms of selective visual attentionAnnu. Rev. Neurosci 18:193–222
5.
1. Meng M.
2. Tong F.
2004Can attention selectively bias bistable perception? Differences between binocular rivalry and ambiguous figuresJ. Vis 4:539–551
6.
1. Dieter K. C.
2. Tadin D.
2011Understanding attentional modulation of binocular rivalry: A framework based on biased competitionFront. Hum. Neurosci 5:1–12
7.
1. Cisek P.
2019Resynthesizing behavior through phylogenetic refinementAtten. Percept. Psychophys https://doi.org/10.3758/s13414-019-01760-1
8.
1. Carter O.
2. van Swinderen B.
3. Leopold D. A.
4. Collin S. P.
5. Maier A.
2020Perceptual rivalry across animal speciesJ. Comp. Neurol 528:3123–3133
9.
1. Szabadi E.
2018Functional Organization of the Sympathetic Pathways Controlling the Pupil: Light-Inhibited and Light-Stimulated PathwaysFront. Neurol 9
10.
1. Briand L. A.
2. Gritton H.
3. Howe W. M.
4. Young D. A.
5. Sarter M.
2007Modulators in concert for cognition : Modulator interactions in the prefrontal cortex83:69–91
11.
1. Shine J. M.
2. et al.
2021Computational models link cellular mechanisms of neuromodulation to large-scale neural dynamicsNat. Neurosci 24:765–776
12.
1. Sara S. J.
2009The locus coeruleus and noradrenergic modulation of cognitionNat. Rev. Neurosci 10:211–223
13.
1. Jacob S. N.
2. Nienborg H.
3. Sara S. J.
4. Jacob S. N.
2018Monoaminergic Neuromodulation of Sensory Processing12:1–17
14.
1. Bouret S.
2. Sara S. J.
2005Network reset: A simplified overarching theory of locus coeruleus noradrenaline functionTrends Neurosci 28:574–582
15.
1. Joshi S.
2. Gold J. I.
2020Pupil Size as a Window on Neural Substrates of CognitionTrends Cogn. Sci 24:466–480
16.
1. Einhäuser W.
2. Stout J.
3. Koch C.
4. Carter O.
2008Pupil dilation reflects perceptual selection and predicts subsequent stability in perceptual rivalryProc. Natl. Acad. Sci. U. S. A 105:1704–1709
17.
1. Murphy P. R.
2. Vandekerckhove J.
3. Nieuwenhuis S.
2014Pupil-Linked Arousal Determines Variability in Perceptual Decision MakingPLoS Comput. Biol 10
18.
1. Shine J. M.
2019Neuromodulatory Influences on Integration and Segregation in the BrainTrends Cogn. Sci 23:572–583
19.
1. Wainstein G.
2. et al.
2021The ascending arousal system promotes optimal performance through meso-scale network integration in a visuospatial attentional taskNetw. Neurosci :1–32https://doi.org/10.1162/netn_a_00205
20.
1. Samuels E.
2. Szabadi E.
2008Functional Neuroanatomy of the Noradrenergic Locus Coeruleus: Its Roles in the Regulation of Arousal and Autonomic Function Part II: Physiological and Pharmacological Manipulations and Pathological Alterations of Locus Coeruleus Activity in HumansCurr. Neuropharmacol 6:254–285
21.
1. de Gee J. W.
2. Knapen T.
3. Donner T. H.
2014Decision-related pupil dilation reflects upcoming choice and individual biasProc. Natl. Acad. Sci 111:E618–E625
22.
1. de Gee J. W.
2. et al.
2017Dynamic modulation of decision biases by brainstem arousal systemsElife 6
23.
1. Munn B. R.
2. Müller E. J.
3. Wainstein G.
4. Shine J. M.
2021The ascending arousal system shapes neural dynamics to mediate awareness of cognitive statesNat. Commun 12:1–9
24.
1. Stöttinger E.
2. Sepahvand N. M.
3. Danckert J.
4. Anderson B.
2016Assessing perceptual change with an ambiguous figures task: Normative data for 40 standard picture setsBehav. Res. Methods 48:201–222
25.
1. Pfeffer T.
2. et al.
2022Coupling of pupil- and neuronal population dynamics reveals diverse influences of arousal on cortical processingeLife 11
26.
1. de Gee J. W.
2. et al.
2020Pupil-linked phasic arousal predicts a reduction of choice bias across species and decision domainseLife 9
27.
1. Shine J. M.
2. Aburn M. J.
3. Breakspear M.
4. Poldrack R. A.
2018The modulation of neural gain facilitates a transition between functional segregation and integration in the brainElife 7
28.
1. Servan-schreiber A. D.
2. et al.
2016Reports A Network Model of Catecholamine EffectsGain, Signal-to-Noise Ratio, and Behavior 249:892–895
29.
1. Joshi S.
2. Li Y.
3. Kalwani R. M.
4. Gold J. I.
2016Relationships between Pupil Diameter and Neuronal Activity in the Locus Coeruleus, Colliculi, and Cingulate CortexNeuron 89:221–234
30.
1. Hupe J. M.
2. Lamirel C.
3. Lorenceau J.
2009Pupil dynamics during bistable motion perceptionJ. Vis 9:10–10
31.
1. Kloosterman N. A.
2. et al.
2015Pupil size tracks perceptual content and surpriseEur. J. Neurosci 41:1068–1078
32.
1. Kosciessa J. Q.
2. Lindenberger U.
3. Garrett D. D.
2021Thalamocortical excitability modulation guides human perception under uncertaintyNat. Commun 12
33.
1. Eldar E.
2. Cohen J. D.
3. Niv Y.
2013The effects of neural gain on attention and learningNat. Neurosci 16:1146–1153
34.
1. Urai A. E.
2. Braun A.
3. Donner T. H.
2017Pupil-linked arousal is driven by decision uncertainty and alters serial choice biasNat. Commun 8
35.
1. Kloosterman N. A.
2. et al.
2015Pupil size tracks perceptual content and surpriseEur. J. Neurosci 41:1068–1078
36.
1. Müller E. J.
2. Munn B. R.
3. Shine J. M.
2020Diffuse neural coupling mediates complex network dynamics through the formation of quasi-critical brain statesNat. Commun 11
37.
1. Shine J. M.
2. et al.
2016The Dynamics of Functional Brain Networks: Integrated Network States during Cognitive Task PerformanceNeuron 92:544–554
38.
1. Taylor N. L.
2. et al.
2022Structural connections between the noradrenergic and cholinergic system shape the dynamics of functional brain networksNeuroImage 260
39.
1. Aston-Jones G.
2. Cohen J. D.
2005An integrative theory of locus coeruleus-norepinephrine function: adaptive gain and optimal performanceAnnu. Rev. Neurosci 28:403–450
40.
1. Song H. F.
2. Yang G. R.
3. Wang X.-J.
2016Training Excitatory-Inhibitory Recurrent Neural Networks for Cognitive Tasks: A Simple and Flexible FrameworkPLOS Comput. Biol 12
41.
1. Yang G. R.
2. Wang X. J.
2020Artificial Neural Networks for Neuroscientists: A PrimerNeuron 107:1048–1070
42.
1. Wong K.-F.
2. Wang X.-J.
2006A recurrent network mechanism of time integration in perceptual decisionsJ. Neurosci. Off. J. Soc. Neurosci 26:1314–1328
43.
1. Wang X.-J.
2002Probabilistic decision making by slow reverberation in cortical circuitsNeuron 36:955–968
44.
1. Wainstein G.
2. Müller E. J.
3. Taylor N.
4. Munn B.
5. Shine J. M.
2022The role of the locus coeruleus in shaping adaptive cortical melodiesTrends Cogn. Sci 26:527–538
45.
1. Taylor N. L.
2. et al.
2022NeuroImage Structural connections between the noradrenergic and cholinergic system shape the dynamics of functional brain networks
46.
1. John Y. J.
2. et al.
2022It’s about time: Linking dynamical systems with human neuroimaging to understand the brainNetw. Neurosci 6:960–979
47.
1. Jordan R.
2. Keller G. B.
2023The locus coeruleus broadcasts prediction errors across the cortex to promote sensorimotor plasticityeLife https://doi.org/10.7554/eLife.85111.1
48.
1. Sales A. C.
2. Friston K. J.
3. Jones M. W.
4. Pickering A. E.
5. Moran R. J.
2019Locus Coeruleus tracking of prediction errors optimises cognitive flexibility: An Active Inference modelPLOS Comput. Biol 15
49.
1. Tkačik G.
2. et al.
2015Thermodynamics and signatures of criticality in a network of neuronsProc. Natl. Acad. Sci. U. S. A 112:11508–11513
50.
1. Richards B. A.
2. et al.
2019A deep learning framework for neuroscienceNat. Neurosci 22:1761–1770
51.
1. Doerig A.
2. et al.
2023The neuroconnectionist research programmeNat. Rev. Neurosci https://doi.org/10.1038/s41583-023-00705-w
52.
1. Vyas S.
2. Golub M. D.
3. Sussillo D.
4. Shenoy K. V.
2020Computation Through Neural Population DynamicsAnnu. Rev. Neurosci 43:249–275
53.
1. Shine J. M.
2. et al.
2019The low-dimensional neural architecture of cognitive complexity is related to activity in medial thalamic nucleiNeuron 104:849–855
54.
1. Jolliffe I. T.
1982A Note on the Use of Principal Components in RegressionAppl. Stat 31
55.
1. Yarkoni T.
2. Poldrack R. A.
3. Nichols T. E.
4. Van Essen D. C.
5. Wager T. D.
2011Large-scale automated synthesis of human functional neuroimaging dataNat. Methods 8:665–670
56.
1. Shine J. M.
2. Aburn M. J.
3. Breakspear M.
4. Poldrack R. A.
2018The modulation of neural gain facilitates a transition between functional segregation and integration in the brainElife 7
57.
1. Shine J. M.
2. et al.
2016The Dynamics of Functional Brain Networks: Integrated Network States during Cognitive Task PerformanceNeuron 92:544–554
58.
1. Sara S. J.
2015Locus Coeruleus in time with the making of memoriesCurr. Opin. Neurobiol 35:87–94
59.
1. Liu Y.
2. Rodenkirch C.
3. Moskowitz N.
4. Schriver B.
2017Dynamic Lateralization of Pupil Dilation Evoked by Locus Coeruleus Activation Results from Article Dynamic Lateralization of Pupil Dilation Evoked by Locus Coeruleus Activation Results from Sympathetic, Not Parasympathetic, ContributionsCellReports 20:3099–3112
60.
1. Nieuwenhuis S.
2. Aston-Jones G.
3. Cohen J. D.
2005Decision making, the P3, and the locus coeruleus-norepinephrine systemPsychol. Bull 131:510–532
61.
1. Murphy P. R.
2. Boonstra E.
3. Nieuwenhuis S.
2016Global gain modulation generates time-dependent urgency during perceptual choice in humansNat. Commun 7:1–15
62.
1. Totah N. K. B.
2. Logothetis N. K.
3. Eschenko O.
2018Noradrenergic ensemble-based modulation of cognition over multiple timescalesBrain Res https://doi.org/10.1016/j.brainres.2018.12.031
63.
1. Zerbi V.
2. et al.
2019Rapid Reconfiguration of the Functional Connectome after Chemogenetic Locus Coeruleus ActivationSSRN Electron. J :1–17https://doi.org/10.2139/ssrn.3334983
64.
1. Hansen J. Y.
2. et al.
2022Mapping neurotransmitter systems to the structural and functional organization of the human neocortexNat. Neurosci 25:1569–1581
65.
1. Hansen J. Y.
2. et al.
2021Mapping gene transcription and neurocognition across human neocortexNat. Hum. Behav 5:1240–1250
66.
1. Müller E. J.
2. et al.
2023The non-specific matrix thalamus facilitates the cortical information processing modes relevant for conscious awarenessCell Rep 42
67.
1. Reimer J.
2. et al.
2016Pupil fluctuations track rapid changes in adrenergic and cholinergic activity in cortexNat. Commun 7:1–7
68.
1. Cazettes F.
2. Reato D.
3. Morais J. P.
4. Renart A.
5. Mainen Z. F.
2020Phasic Activation of Dorsal Raphe Serotonergic Neurons Increases Pupil SizeCurr. Biol :192–197https://doi.org/10.1016/j.cub.2020.09.090
69.
1. Alnæs D.
2. et al.
2014Pupil size signals mental effort deployed during multiple object tracking and predicts brain activity in the dorsal attention network and the locus coeruleusJ. Vis 14:1–20
70.
1. Janitzky K.
2. Lippert M. T.
3. Engelhorn A.
4. Tegtmeier J.
5. Cope Z. A.
2015Optogenetic silencing of locus coeruleus activity in mice impairs cognitive flexibility in an attentional set-shifting task9:1–8
71.
1. Reimer J.
2. et al.
2014Pupil Fluctuations Track Fast Switching of Cortical States during Quiet WakefulnessNeuron 84:355–362
72.
1. Wainstein G.
2. et al.
2017Pupil size tracks attentional performance in attention-deficit/hyperactivity disorderSci. Rep 7:1–9
73.
1. Campos-Arteaga G.
2. et al.
2020Differential neurophysiological correlates of retrieval of consolidated and reconsolidated memories in humans: an ERP and pupillometry studyNeurobiol. Learn. Mem https://doi.org/10.1016/j.nlm.2020.107279
74.
1. Rojas-Líbano D.
2. et al.
2019A pupil size, eye-tracking and neuropsychological dataset from ADHD children during a cognitive taskSci. Data 6
75.
1. Paszke A.
2. et al.
2019Pytorch: An imperative style, high-performance deep learning libraryAdv. Neural Inf. Process. Syst 32
76.
1. Werbos P. J.
1990Backpropagation Through Time: What It Does and How to Do ItProc. IEEE 78:1550–1560
77.
1. Kingma D. P.
2. Ba J. L.
2015Adam: A method for stochastic optimization3rd Int. Conf. Learn. Represent. ICLR 2015 - Conf. Track Proc :1–15
78.
1. Power J. D.
2. et al.
2014Methods to detect, characterize, and remove motion artifact in resting state fMRINeuroImage 84:320–41
79.
1. Behzadi Y.
2. Restom K.
3. Liau J.
4. Liu T. T.
2007A component based noise correction method (CompCor) for BOLD and perfusion based fMRINeuroImage 37:90–101
80.
1. Gu S.
2. et al.
2015Controllability of structural brain networksNat. Commun 6:1–10
81.
1. Gordon E. M.
2. et al.
2016Generation and Evaluation of a Cortical Area Parcellation from Resting-State CorrelationsCereb. Cortex 26:288–303
82.
1. Diedrichsen J.
2. Balsters J. H.
3. Flavell J.
4. Cussans E.
5. Ramnani N.
2009A probabilistic MR atlas of the human cerebellumNeuroImage 46:39–46
83.
1. Meunier D.
2. Lambiotte R.
3. Bullmore E. T.
2010Modular and Hierarchically Modular Organization of Brain NetworksFront. Neurosci 4
84.
1. Guimerà R.
2. Nunes Amaral L. A.
2005Functional cartography of complex metabolic networksNature 433:895–900

Article and author information

Author information

Gabriel Wainstein
Brain and Mind Center, The University of Sydney, Sydney, Australia
ORCID iD: 0000-0002-8106-6647
- Denotes co-first author
Christopher J. Whyte
Brain and Mind Center, The University of Sydney, Sydney, Australia
- Denotes co-first author
Kaylena A. Ehgoetz Martens
The University of Waterloo, Waterloo, Ontario, Canada
Eli J. Müller
Brain and Mind Center, The University of Sydney, Sydney, Australia
Brandon R. Munn
Brain and Mind Center, The University of Sydney, Sydney, Australia
ORCID iD: 0000-0002-3638-1605
Vicente Medel
Brain and Mind Center, The University of Sydney, Sydney, Australia, Latin American Brain Health (BrainLat), Universidad Adolfo Ibáñez, Santiago, Chile
Britt Anderson
The University of Waterloo, Waterloo, Ontario, Canada
Elisabeth Stöttinger
Hochschule Fresenius, Köln, Germany
James Danckert
The University of Waterloo, Waterloo, Ontario, Canada
James M. Shine
Brain and Mind Center, The University of Sydney, Sydney, Australia, Center for Complex Systems, The University of Sydney, Sydney, Australia
ORCID iD: 0000-0003-1762-5499
- For correspondence: mac.shine@sydney.edu.au
- Denotes corresponding author – mac.shine@sydney.edu.au

Version history

Preprint posted: September 27, 2023
Sent for peer review: October 19, 2023
Reviewed Preprint version 1: January 25, 2024

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, public reviews, and a provisional response from the authors.

Reviewing Editor
Tobias Donner
University Medical Center Hamburg-Eppendorf, Hamburg, Germany
Senior Editor
Joshua Gold
University of Pennsylvania, Philadelphia, United States of America

Reviewer #1 (Public Review):

Summary:
This paper investigates the neural mechanisms underlying the change in perception when viewing ambiguous figures. Each possible percept is related to an attractor-like brain state and a perceptual switch corresponds to a transition between these states. The hypothesis is that these switches are promoted by bursts of noradrenaline that change the gain of neural circuits. The authors present several lines of evidence consistent with this view: pupil diameter changes during the time point of the perceptual change; a gain change in neural network models promotes a state transition; and large-scale fMRI dynamics in a different experiment suggests a lower barrier between brain states at the change point. However, some assumptions of the computational model seem not well justified and the theoretical analysis is incomplete. The paper would also benefit from a more in-depth analysis of the experimental data.

Strengths:
The main strength of the paper is that it attempts to combine experimental measurements - from psychophysics, pupil measurements, and fMRI dynamics - and computational modeling to provide an emerging picture of how a perceptual switch emerges. This integrative approach is highly useful because the model has the potential to make the underlying mechanisms explicit and to make concrete predictions.

Weaknesses:
A general weakness is that the link between the three parts of the paper is not very strong. Pupil and fMRI measurements come from different experiments and additional analysis showing that the two experiments are comparable should be included. Crucially, the assumptions underlying the RNN modeling are unclear and the conclusions drawn from the simulation may depend on those assumptions.

Main points:
Perceptual tasks in pupil and fMRI experiments: how comparable are these two tasks? It seems that the timing is very different, with long stimulus presentations and breaks in the fMRI task and a rapid sequence in the pupil task. Detailed information about the task timing in the pupil task is missing. What evidence is there that the same mechanisms underlie perceptual switches at these different timescales? Quantification of the distributions of switching times/switching points in both tasks is missing. Do the subjects in the fMRI task show the same overall behavior as in the pupil task? More information is needed to clarify these points.

Computational model:
1. Modeling noradrenalin effects in the RNN: The pupil data suggests phasic bursts of NA would promote perceptual switches. But as I understand, in the RNN neuromodulation is modeled as different levels of gain throughout the trial. Making the neural gain time-dependent would allow investigation of whether a phasic gain change can explain the experimentally observed distribution of switching times.

2. Modeling perceptual switches: in the results, it is described that the networks were trained to output a categorical response, but the firing rates in Fig 2B do not seem categorical but rather seem to follow the input stimulus. The output signals of the network are not shown. If I understand correctly, a trivial network that would just represent the two input signals without any internal computation and relay them to the output would do the task correctly (because "the network's choice at each time point was the maximum of the two-dimensional output", p. 22). This seems like cheating: the very operation that the model should perform is to signal the change, in a categorical manner, not to represent the gradually changing input signals.

3. The mechanism of how increased gain leads to faster switches remains unclear to me. My first intuition was that increasing the gain of excitatory populations (the situation shown in Fig. 2E) in discrete attractor models would lead to deeper attractor wells and this would make it more difficult to switch. That is, a higher gain should lead to slower decisions in this case. However, here the switching time remains constant for a gain between 1 and 1.5. Lowering the gain, on the other hand, leads to slower switching. It is, of course, possible that the RNN behaves differently than classical point attractor models or that my intuition is incorrect (though I believe it is consistent with previous literature, e.g. Niyogi & Wong-Lin 2013 (doi:10.1371/journal.pcbi.1003099) who show higher firing rates - more stable attractors - for increased excitatory gain).

4. From the RNN model it is not clear how changes in excitatory and inhibitory gain lead to slower/faster switching. In order to better understand the role of inhibitory and excitatory gain on switching, I would suggest studying a simple discrete attractor model (a rate model, for example as in Wong and Wang 2006 or Roxin and Ledberg, Plos Comp. Bio 2008) which will allow to study these effects in terms of a very few model parameters. The Roxin paper also shows how to map rate models onto simplified one-dimensional systems such as the one in Fig S3. Setting up the model using this framework would allow for making much stronger, principled statements about how gain changes affect the energy landscape, and under which conditions increased inhibitory gain leads to faster switching.

One possibility is that increasing the excitatory gain in the RNN leads to saturated firing rates. If this is the reason for the different effects of excitatory and inhibitory gain changes, it should be properly explained. Moreover, the biological relevance of this effect should be discussed (assuming that saturation is indeed the explanation).

Alternative mechanisms:
It is mentioned in the introduction that changes in attention could drive perceptual switches. A priori, attention signals originating in the frontal cortex may be plausible mechanisms for perceptual switches, as an alternative to LC-controlled gain modulation. Does the observed fMRI dynamics allow us to distinguish these two hypotheses? In any case, I would suggest including alternative scenarios that may be compatible with the observed findings in the discussion.

https://doi.org/10.7554/eLife.93191.1.sa1

Reviewer #2 (Public Review):

Strengths
- the study combines different methods (pupillometry, RNNs, fMRI).
- the study combines different viewpoints and fields of the scientific literature, including neuroscience, psychology, physics, dynamical systems.
- This combination of methods and viewpoints is rarely done, it is thus very useful.
- Overall well-written.

Weaknesses
- The study relies on a report paradigm: participants report when they identify a switch in the item category. The sequence corresponds to the drawing of an object being gradually morphed into another object. Perceptual switches are therefore behaviorally relevant, and it is not clear whether the effect reported correspond to the perceptual switch per se, or the detection of an event that should change behavior (participant press a button indicating the perceived category, and thus switch buttons when they identify a perceptual change). The text mentions that motor actions are controlled for, but this fact only indicates that a motor action is performed on each trial (not only on the switch trial); there is still a motor change confounded with the switch. As a result, it is not clear whether the effect reported in pupil size, brain dynamics, and brain states is related to a perceptual change, or a decision process (to report this change).

- The study presents events that co-occur (perceptual switch, change in pupil size, energy landscape of brain dynamics) but we cannot identify the causes and consequences. Yet, the paper makes several claims about causality (e.g. in the abstract "neuromodulatory tone ... causally mediates perceptual switches", in the results "the system flattening the energy landscape ... facilitated an updating of the content of perception").

- Some effects may reflect the expectation of a perceptual switch, rather than the perceptual switch per se. Given the structure of the task, participants know that there will be a perceptual switch occurring once during a sequence of morphed drawings. This change is expected to occur roughly in the middle of the sequence, making early switches more surprising, and later switches less surprising. Differences in pupil response to early, medium, and late switches could reflect this expectation. The authors interpret this effect very differently ("the speed of a perceptual switch should be dependent on LC activity").

- The RNN is far more complex than needed for the task. It has two input units that indicate the level of evidence for the two categories being morphed, and it is trained to output the dominant category. A (non-recurrent) network with only these two units and an output unit whose activity is a sigmoid transform of the difference in the inputs can solve the task perfectly. The RNN activity is almost 1-dimensional probably for this reason. In addition, the difficult part of the computation done by the human brain in this task is already solved in the input that is provided to the network (the brain is not provided with the evidence level for each category, and in fact, it does not know in advance what the second category will be).

- Basic fMRI results are missing and would be useful, before using elaborate analyses. For instance, what are the regions that are more active when a switch is detected?

- The use of methods from physics may obscure some simple facts and simpler explanations. For instance, does the flatter energy landscape in the higher gain condition reflect a smaller number of states visited in the state space of the RNN because the activity of each unit gets in the saturation range? If correct, then it may be a more straightforward way of explaining the results.

- Some results are not as expected as the authors claim, at least in the current form of the paper. For instance, they show that, when trained to identify which of two inputs u1 and u2 is the largest (with u2=1-u1, starting with u1=1 and gradually decreasing u1), a higher gain results in the RNN reporting a switch in dominance before the true switch (e.g. when u1=0.6 and u2=0.4), and vice et versa with a lower gain. In other words, it seems to correspond to a change in criterion or bias in the RNN's decision. The authors should discuss more specifically how this result is related to previous studies and models on gain modulation. An alternative finding could have been that the network output is a more (or less) deterministic function of its inputs, but this aspect is not reported.

https://doi.org/10.7554/eLife.93191.1.sa0

Author Response

We appreciate the insightful and constructive feedback from the reviewers regarding our manuscript, "Gain neuromodulation mediates perceptual switches: evidence from pupillometry, fMRI, and RNN Modelling." The comments have provided us with a number of valuable perspectives that will undoubtedly strengthen the impact and clarity of our work.

We recognize the need for a more detailed and comparative analysis of the perceptual tasks used in our pupil and fMRI experiments. To address these points directly: the jittered intertrial intervals (ITIs) in the fMRI work were deemed necessary to effectively deconvolve the BOLD response (see Stottinger et al., 2018). In our fMRI work, each image was randomly preceded and followed by varying ITIs (2, 4, 6, and 8 seconds), ensuring an equitable distribution across sets and subjects. Importantly, our analysis of both fMRI and behavioral studies, including eye tracking data, indicates that perceptual switch behavior – the point at which switches occur – is consistent across modalities. If more predictive or preparatory activity were present in the fMRI version of the task, we would expect earlier switches or choices and altered reaction time distributions – neither of these signatures was observed in the original study (Stottinger et al., 2018). Importantly, this suggests that the additional time available in the fMRI experiments did not significantly alter behavioral outcomes. Thus, our findings suggest that despite the differences in timing and task structure, the behavioural responses remain consistent across both experimental setups. We will clarify this in the revised manuscript.

In response to the reviewer's comments on our computational model, particularly regarding the modelling of noradrenaline (NA) effects in the RNN, we agree that modelling gain as stationary is a substantial approximation. However, given the slow ramping of pupil diameter, which served as our proxy for gain, it is an approximation that we believe is justified: in the revised manuscript, we will run additional simulations to ensure the validity of this approximation. In addition, whilst we agree that the model is more complicated than is needed for the task, we opted for RNN modelling, in lieu of a simpler modelling approach, because we wanted to use RNN modelling as a method for both hypothesis testing and generation. To build the RNN, the only key elements of model structure we had to specify in advance were the inputs and the target outputs of the network. The solution the RNN arrived at, although involving many more parameters than a simpler model, was entirely determined by optimisation (i.e., not our a priori hypotheses). We feel that this strengthens the result considerably. Importantly, this approach also allowed us to be surprised by the results of the model – for instance, we did not anticipate that the effect of gain on the energy landscape to be primarily mediated by inhibitory gain. In the revised manuscript, we will integrate this line of thinking into the paper. We are also sensitive to the fact that this result is both counterintuitive and difficult to study in high-dimensional dynamical systems like RNNs. In revisions, we will provide further analysis of the RNN and build a 2D approximation to the RNN that can be studied on the phase plane to better conceptually illuminate the mechanisms at play.

Furthermore, we agree with the suggestion to consider alternative mechanisms that might contribute to perceptual switches, such as attention and top-down processing. While our study primarily focuses on LC-mediated gain modulation, we acknowledge the complexity of neural processes involved in perception and will expand our discussion to include these potential mechanisms. Furthermore, noting the importance of moderating the causal language used in our manuscript. We will revise our wording to more accurately reflect the correlational nature of our findings and ensure that our conclusions are firmly grounded in the data presented.

In conclusion, we are enthusiastic about the opportunity to refine our manuscript based on these valuable comments. In an updated version, we will address the overall points by providing clearer explanations of our methods, refining our figures for better readability, and ensuring that our conclusions are supported by robust analysis. We believe that these revisions will not only address the concerns raised but also significantly enhance the overall quality of our research. We thank the reviewers for their thorough and thoughtful critiques and look forward to submitting our revised manuscript.

https://doi.org/10.7554/eLife.93191.1.sa3

Significance of findings

Strength of evidence

Abstract

Introduction

Results

Evoked pupil dilations coincide with the resolution of perceptual ambiguity

Pupil diameter tracks perceptual change.

Computational evidence for neuromodulatory-mediated perceptual resolution in a recurrent neural network

A recurrent neural network model of perceptual switching.

Large-scale neural predictions of recurrent neural network model

Heightened gain mediates earlier perceptual switching through speeded low-dimensional dynamics and flattened attractor dynamics.

The low-dimensional signature of ambiguity resolution and perceptual change

Low-dimensional switch-related dynamics and connectivity.

Confirmation of model predictions in whole-brain BOLD data

Confirmation of model predictions in whole-brain BOLD data.

Discussion

Conclusion

Methods

Overview

Perceptual Task

Participants

Pupillometry

Recurrent Neural Network Modelling

MRI Data

fMRI Data Preprocessing

Brain Parcellation

Neuroimaging Analysis

Topological analyses

Brain-state displacement and the energy landscape

Supplemental Materials

Overall Analysis Flow.

Difference in mean firing rates between stimulus selective excitatory clusters.

Relationship between Attractor Depth and Energy Landscape.

PC2 evoked displacement and surplice around the perceptual change.

References

Article and author information

Author information

Gabriel Wainstein†

Christopher J. Whyte†

Kaylena A. Ehgoetz Martens

Eli J. Müller

Brandon R. Munn

Vicente Medel

Britt Anderson

Elisabeth Stöttinger

James Danckert

James M. Shine

Version history

Copyright

Peer review process

Editors

Gabriel Wainstein

Christopher J. Whyte