Gain neuromodulation mediates perceptual switches: evidence from pupillometry, fMRI, and RNN Modelling

  1. Brain and Mind Center, The University of Sydney, Sydney, Australia
  2. The University of Waterloo, Waterloo, Ontario, Canada
  3. Latin American Brain Health (BrainLat), Universidad Adolfo Ibáñez, Santiago, Chile
  4. Hochschule Fresenius, Köln, Germany
  5. Center for Complex Systems, The University of Sydney, Sydney, Australia

Peer review process

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, public reviews, and a provisional response from the authors.


Editors

  • Reviewing Editor
    Tobias Donner
    University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  • Senior Editor
    Floris de Lange
    Donders Institute for Brain, Cognition and Behaviour, Nijmegen, Netherlands

Reviewer #1 (Public Review):

Summary:
This paper investigates the neural mechanisms underlying the change in perception when viewing ambiguous figures. Each possible percept is related to an attractor-like brain state, and a perceptual switch corresponds to a transition between these states. The hypothesis is that these switches are promoted by bursts of noradrenaline that change the gain of neural circuits. The authors present several lines of evidence consistent with this view: pupil diameter changes around the time of the perceptual change; a gain change in neural network models promotes a state transition; and large-scale fMRI dynamics in a different experiment suggest a lower barrier between brain states at the change point. However, some assumptions of the computational model do not seem well justified, and the theoretical analysis is incomplete. The paper would also benefit from a more in-depth analysis of the experimental data.

Strengths:
The main strength of the paper is that it attempts to combine experimental measurements - from psychophysics, pupil measurements, and fMRI dynamics - and computational modeling to provide an emerging picture of how a perceptual switch emerges. This integrative approach is highly useful because the model has the potential to make the underlying mechanisms explicit and to make concrete predictions.

Weaknesses:
A general weakness is that the link between the three parts of the paper is not very strong. Pupil and fMRI measurements come from different experiments and additional analysis showing that the two experiments are comparable should be included. Crucially, the assumptions underlying the RNN modeling are unclear and the conclusions drawn from the simulation may depend on those assumptions.

Main points:
Perceptual tasks in pupil and fMRI experiments: how comparable are these two tasks? It seems that the timing is very different, with long stimulus presentations and breaks in the fMRI task and a rapid sequence in the pupil task. Detailed information about the task timing in the pupil task is missing. What evidence is there that the same mechanisms underlie perceptual switches at these different timescales? Quantification of the distributions of switching times/switching points in both tasks is missing. Do the subjects in the fMRI task show the same overall behavior as in the pupil task? More information is needed to clarify these points.

Computational model:
1. Modeling noradrenaline effects in the RNN: The pupil data suggest that phasic bursts of NA would promote perceptual switches. But as I understand it, neuromodulation in the RNN is modeled as different levels of gain that stay fixed throughout the trial. Making the neural gain time-dependent would allow investigation of whether a phasic gain change can explain the experimentally observed distribution of switching times.

2. Modeling perceptual switches: in the results, it is described that the networks were trained to output a categorical response, but the firing rates in Fig 2B do not seem categorical but rather seem to follow the input stimulus. The output signals of the network are not shown. If I understand correctly, a trivial network that would just represent the two input signals without any internal computation and relay them to the output would do the task correctly (because "the network's choice at each time point was the maximum of the two-dimensional output", p. 22). This seems like cheating: the very operation that the model should perform is to signal the change, in a categorical manner, not to represent the gradually changing input signals.

3. The mechanism of how increased gain leads to faster switches remains unclear to me. My first intuition was that increasing the gain of excitatory populations (the situation shown in Fig. 2E) in discrete attractor models would lead to deeper attractor wells and this would make it more difficult to switch. That is, a higher gain should lead to slower decisions in this case. However, here the switching time remains constant for a gain between 1 and 1.5. Lowering the gain, on the other hand, leads to slower switching. It is, of course, possible that the RNN behaves differently than classical point attractor models or that my intuition is incorrect (though I believe it is consistent with previous literature, e.g. Niyogi & Wong-Lin 2013 (doi:10.1371/journal.pcbi.1003099) who show higher firing rates - more stable attractors - for increased excitatory gain).

4. From the RNN model it is not clear how changes in excitatory and inhibitory gain lead to slower/faster switching. In order to better understand the role of inhibitory and excitatory gain in switching, I would suggest studying a simple discrete attractor model (a rate model, for example as in Wong and Wang 2006 or Roxin and Ledberg, Plos Comp. Bio 2008), which would allow these effects to be studied in terms of very few model parameters. The Roxin paper also shows how to map rate models onto simplified one-dimensional systems such as the one in Fig S3. Setting up the model using this framework would allow for making much stronger, principled statements about how gain changes affect the energy landscape, and under which conditions increased inhibitory gain leads to faster switching.

One possibility is that increasing the excitatory gain in the RNN leads to saturated firing rates. If this is the reason for the different effects of excitatory and inhibitory gain changes, it should be properly explained. Moreover, the biological relevance of this effect should be discussed (assuming that saturation is indeed the explanation).
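As a rough illustration of the kind of reduced model suggested in point 4, the sketch below simulates two rate units with mutual inhibition and separate excitatory and inhibitory gain factors. It is not the Wong and Wang or the Roxin and Ledberg model, and it is not the authors' RNN: the function names, the sigmoid rate function, and all parameter values (`theta`, `slope`, `w_inh`, `tau`) are assumptions chosen only to make the example run.

```python
import math

def rate_fn(x, theta=0.3, slope=8.0):
    """Sigmoid rate function (hypothetical parameters); saturates at 0 and 1."""
    return 1.0 / (1.0 + math.exp(-slope * (x - theta)))

def simulate(g_exc=1.0, g_inh=1.0, w_inh=1.0, tau=20.0, dt=1.0, n_steps=2000):
    """Two mutually inhibiting rate units driven by morphing inputs.

    u1 ramps linearly from 1 to 0 and u2 = 1 - u1. g_exc scales the external
    (excitatory) drive and g_inh scales the cross-inhibition; both are
    illustrative stand-ins for excitatory and inhibitory gain.
    Returns the first step at which unit 2 exceeds unit 1, and the rate trace.
    """
    r1, r2 = 1.0, 0.0            # start in the attractor for category 1
    switch_step = None
    trace = []
    for t in range(n_steps):
        u1 = 1.0 - t / (n_steps - 1)
        u2 = 1.0 - u1
        # Compute both drives from the current rates before updating either unit.
        drive1 = g_exc * u1 - g_inh * w_inh * r2
        drive2 = g_exc * u2 - g_inh * w_inh * r1
        # Forward-Euler step of tau * dr/dt = -r + rate_fn(drive)
        r1 += dt / tau * (-r1 + rate_fn(drive1))
        r2 += dt / tau * (-r2 + rate_fn(drive2))
        trace.append((r1, r2))
        if switch_step is None and r2 > r1:
            switch_step = t
    return switch_step, trace

switch_step, trace = simulate()
```

Calling `simulate` with different values of `g_exc` and `g_inh` and comparing `switch_step` is one way to check, in a system with only a handful of parameters, whether a gain effect on switching time reflects the shape of the landscape or simply saturation of `rate_fn`.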

Alternative mechanisms:
It is mentioned in the introduction that changes in attention could drive perceptual switches. A priori, attention signals originating in the frontal cortex may be plausible mechanisms for perceptual switches, as an alternative to LC-controlled gain modulation. Does the observed fMRI dynamics allow us to distinguish these two hypotheses? In any case, I would suggest including alternative scenarios that may be compatible with the observed findings in the discussion.

Reviewer #2 (Public Review):

Strengths
- the study combines different methods (pupillometry, RNNs, fMRI).
- the study combines different viewpoints and fields of the scientific literature, including neuroscience, psychology, physics, dynamical systems.
- This combination of methods and viewpoints is rare and therefore very useful.
- Overall well-written.

Weaknesses
- The study relies on a report paradigm: participants report when they identify a switch in the item category. The sequence corresponds to the drawing of an object being gradually morphed into another object. Perceptual switches are therefore behaviorally relevant, and it is not clear whether the reported effect corresponds to the perceptual switch per se, or to the detection of an event that should change behavior (participants press a button indicating the perceived category, and thus switch buttons when they identify a perceptual change). The text mentions that motor actions are controlled for, but this fact only indicates that a motor action is performed on each trial (not only on the switch trial); there is still a motor change confounded with the switch. As a result, it is not clear whether the effect reported in pupil size, brain dynamics, and brain states is related to a perceptual change, or to a decision process (to report this change).

- The study presents events that co-occur (perceptual switch, change in pupil size, energy landscape of brain dynamics) but we cannot identify the causes and consequences. Yet, the paper makes several claims about causality (e.g. in the abstract "neuromodulatory tone ... causally mediates perceptual switches", in the results "the system flattening the energy landscape ... facilitated an updating of the content of perception").

- Some effects may reflect the expectation of a perceptual switch, rather than the perceptual switch per se. Given the structure of the task, participants know that there will be a perceptual switch occurring once during a sequence of morphed drawings. This change is expected to occur roughly in the middle of the sequence, making early switches more surprising, and later switches less surprising. Differences in pupil response to early, medium, and late switches could reflect this expectation. The authors interpret this effect very differently ("the speed of a perceptual switch should be dependent on LC activity").

- The RNN is far more complex than needed for the task. It has two input units that indicate the level of evidence for the two categories being morphed, and it is trained to output the dominant category. A (non-recurrent) network with only these two units and an output unit whose activity is a sigmoid transform of the difference in the inputs can solve the task perfectly. The RNN activity is almost 1-dimensional probably for this reason. In addition, the difficult part of the computation done by the human brain in this task is already solved in the input that is provided to the network (the brain is not provided with the evidence level for each category, and in fact, it does not know in advance what the second category will be).

- Basic fMRI results are missing and would be useful, before using elaborate analyses. For instance, what are the regions that are more active when a switch is detected?

- The use of methods from physics may obscure some simple facts and simpler explanations. For instance, does the flatter energy landscape in the higher gain condition reflect a smaller number of states visited in the state space of the RNN because the activity of each unit gets in the saturation range? If correct, then it may be a more straightforward way of explaining the results.

- Some results are not as expected as the authors claim, at least in the current form of the paper. For instance, they show that, when trained to identify which of two inputs u1 and u2 is the largest (with u2=1-u1, starting with u1=1 and gradually decreasing u1), a higher gain results in the RNN reporting a switch in dominance before the true switch (e.g. when u1=0.6 and u2=0.4), and vice versa with a lower gain. In other words, it seems to correspond to a change in criterion or bias in the RNN's decision. The authors should discuss more specifically how this result is related to previous studies and models on gain modulation. An alternative finding could have been that the network output is a more (or less) deterministic function of its inputs, but this aspect is not reported.
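Two of the points above can be made concrete with a minimal sketch: the non-recurrent readout that would already solve the task (a sigmoid of the input difference), and the distinction between a change in criterion and a change in how deterministic the output is. The sketch is an illustration only, not the authors' RNN; `slope` and `bias` are hypothetical parameters, and nothing here implies that gain in the RNN acts through either of them.

```python
import math

def p_category2(u1, slope=10.0, bias=0.0):
    """Feed-forward readout: sigmoid of the input difference, with u2 = 1 - u1.

    slope controls how deterministic the output is; bias shifts the criterion.
    """
    u2 = 1.0 - u1
    return 1.0 / (1.0 + math.exp(-(slope * (u2 - u1) + bias)))

def indifference_u1(slope=10.0, bias=0.0):
    """Value of u1 at which p_category2 equals 0.5 (solves slope*(1-2*u1)+bias=0)."""
    return 0.5 * (1.0 + bias / slope)

def first_switch(n_steps=21, slope=10.0, bias=0.0):
    """Index of the first morph step at which category 2 is reported."""
    for i in range(n_steps):
        u1 = 1.0 - i / (n_steps - 1)   # u1 decreases linearly from 1 to 0
        if p_category2(u1, slope, bias) > 0.5:
            return i
    return None

unbiased = first_switch()                # criterion at u1 = 0.5
early = first_switch(bias=2.5)           # positive bias: switch reported earlier
steeper = first_switch(slope=50.0)       # steeper slope: same criterion
```

In this toy readout, a positive `bias` moves the indifference point to a larger u1 (with `slope=10` and `bias=2` it sits at u1 = 0.6, the value in the example above), whereas changing `slope` alone leaves the indifference point at u1 = 0.5 and only alters how sharply the output changes around it. Reporting which of these two signatures gain produces in the RNN would address the last point.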

Author Response

We appreciate the insightful and constructive feedback from the reviewers regarding our manuscript, "Gain neuromodulation mediates perceptual switches: evidence from pupillometry, fMRI, and RNN Modelling." The comments have provided us with a number of valuable perspectives that will undoubtedly strengthen the impact and clarity of our work.

We recognize the need for a more detailed and comparative analysis of the perceptual tasks used in our pupil and fMRI experiments. To address these points directly: the jittered intertrial intervals (ITIs) in the fMRI work were deemed necessary to effectively deconvolve the BOLD response (see Stottinger et al., 2018). In our fMRI work, each image was randomly preceded and followed by varying ITIs (2, 4, 6, and 8 seconds), ensuring an equitable distribution across sets and subjects. Importantly, our analysis of both fMRI and behavioral studies, including eye tracking data, indicates that perceptual switch behavior – the point at which switches occur – is consistent across modalities. If more predictive or preparatory activity were present in the fMRI version of the task, we would expect earlier switches or choices and altered reaction time distributions – neither of these signatures was observed in the original study (Stottinger et al., 2018). This suggests that the additional time available in the fMRI experiments did not significantly alter behavioral outcomes: despite the differences in timing and task structure, the behavioural responses remain consistent across both experimental setups. We will clarify this in the revised manuscript.

In response to the reviewers' comments on our computational model, particularly regarding the modelling of noradrenaline (NA) effects in the RNN, we agree that modelling gain as stationary is a substantial approximation. However, given the slow ramping of pupil diameter, which served as our proxy for gain, it is an approximation that we believe is justified: in the revised manuscript, we will run additional simulations to ensure the validity of this approximation. In addition, whilst we agree that the model is more complicated than is needed for the task, we opted for RNN modelling, in lieu of a simpler modelling approach, because we wanted to use RNN modelling as a method for both hypothesis testing and generation. To build the RNN, the only key elements of model structure we had to specify in advance were the inputs and the target outputs of the network. The solution the RNN arrived at, although involving many more parameters than a simpler model, was entirely determined by optimisation (i.e., not our a priori hypotheses). We feel that this strengthens the result considerably. Importantly, this approach also allowed us to be surprised by the results of the model – for instance, we did not anticipate that the effect of gain on the energy landscape would be primarily mediated by inhibitory gain. In the revised manuscript, we will integrate this line of thinking into the paper. We are also sensitive to the fact that this result is both counterintuitive and difficult to study in high-dimensional dynamical systems like RNNs. In revisions, we will provide further analysis of the RNN and build a 2D approximation to the RNN that can be studied on the phase plane to better conceptually illuminate the mechanisms at play.

Furthermore, we agree with the suggestion to consider alternative mechanisms that might contribute to perceptual switches, such as attention and top-down processing. While our study primarily focuses on LC-mediated gain modulation, we acknowledge the complexity of neural processes involved in perception and will expand our discussion to include these potential mechanisms. We also note the importance of moderating the causal language used in our manuscript. We will revise our wording to more accurately reflect the correlational nature of our findings and ensure that our conclusions are firmly grounded in the data presented.

In conclusion, we are enthusiastic about the opportunity to refine our manuscript based on these valuable comments. In an updated version, we will address the overall points by providing clearer explanations of our methods, refining our figures for better readability, and ensuring that our conclusions are supported by robust analysis. We believe that these revisions will not only address the concerns raised but also significantly enhance the overall quality of our research. We thank the reviewers for their thorough and thoughtful critiques and look forward to submitting our revised manuscript.
