Inverted encoding of neural responses to audiovisual stimuli reveals super-additive multisensory enhancement

  1. Queensland Brain Institute, The University of Queensland
  2. School of Psychology, The University of Queensland
  3. School of Psychology, University of Sydney

Peer review process

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, public reviews, and a provisional response from the authors.


Editors

  • Reviewing Editor
    Maria Chait
    University College London, London, United Kingdom
  • Senior Editor
    Barbara Shinn-Cunningham
    Carnegie Mellon University, Pittsburgh, United States of America

Reviewer #1 (Public Review):

This study presents a novel application of the inverted encoding (i.e., decoding) approach to detect the correlates of crossmodal integration in the human EEG (electrophysiological) signal. The method is successfully applied to data from a group of 41 participants performing a spatial localization task on auditory, visual, and audio-visual events. The analyses clearly show a behavioural superiority for audio-visual localization. As in previous studies, traditional univariate ERP analyses yielded inconclusive results, showing once more the need for alternative, more sophisticated approaches. By contrast, the principal approach of this study, harnessing the multivariate nature of the signal, captured clear signs of super-additive responses, considered by many as the hallmark of multisensory integration. Unfortunately, the manuscript lacks many important details in its descriptions of the methodology and analytical pipeline. Although some of these details can eventually be retrieved from the scripts that accompany the paper, the main text should be self-contained and sufficient to gain a clear understanding of what was done. (A list of some of these details is included in the comments to the authors.) Nevertheless, I believe the main weakness of this work is that the positive results reported in the results section are contingent on eye movements: when artifacts due to eye movements are removed, the outcomes are no longer significant.

Therefore, whether the authors finally achieved the aims and showed that this method of analysis is truly a reliable way to assess crossmodal integration, does not stand on firm ground. The worst-case scenario is that the results are entirely accounted for by patterns of eye movements in the different conditions. In the best-case scenario, the method might truly work, but further experiments (and/or analyses) would be required to confirm the claims in a conclusive fashion.

If finally successful, this approach could bring important advances in the many fields where multisensory integration has been shown to play a role, by providing a way to bring much-needed coherence across levels of analysis, from behaviour to single-cell electrophysiology. To achieve this, one would have to make sure that the pattern of super-additive effects, the standard self-imposed by the authors as a proxy for multisensory integration, shows up reliably regardless of eye movement or artifact corrections. One first step toward this goal would be, perhaps, to facilitate the understanding of results in context by reporting both the uncorrected and corrected analyses in the main results section. Second, one could try to support the argument given in the discussion, pointing out the origin of the super-additive effects in posterior electrode sites, by also modelling frontal electrode clusters and showing they aren't informative as to the effect of interest.

Reviewer #2 (Public Review):

Summary:

This manuscript seeks to reconcile observations in multisensory perception - from behavior and neural responses. It is intuitively obvious that perceiving a stimulus via two senses results in better performance than one alone. In fact, it is not uncommon to observe that, for a perceptual task, the percentage of correct responses with two senses is higher than the sum of the percentages correct obtained with each modality individually; that is, the gains are "superadditive". The gains of adding a second sense are typically larger when performance with the first sense is relatively poor - an effect often called the principle of inverse effectiveness. More generally, this tells us that performance in a multisensory perceptual task is a non-linear sum of performance for each sensory modality alone.

Despite this abundant evidence of behavioral non-linearity in multisensory integration, evoked responses (EEG) to such sensory stimuli often show little evidence of it - and this is the problem this manuscript tackles. The key assertion made is that univariate analysis of the EEG signal is likely to average out the non-linear effects of integration. This is a reasonable assertion, and their analysis does indeed provide evidence that a multivariate approach can reveal non-linear interactions in the evoked responses.

Strengths:

It is of great value to understand how the process of multisensory integration occurs, and despite a wealth of observations of the benefits of perceiving the world with multiple senses, we still lack a reasonable understanding of how the brain integrates information. For example - what underlies the large individual differences in the benefits of two senses over one? One way to tackle this is via brain imaging, but this is problematic if important features of the processing - such as non-linear interactions are obscured by the lack of specificity of the measurements. The approach they take to the analysis of the EEG data allows the authors to look in more detail at the variation in activity across EEG electrodes, which averaging across electrodes cannot.

This version of the manuscript is well-written and for the most part clear. It shows a good understanding of the non-linear effects described above (where many studies show a poor understanding of "superadditivity" of perceptual performance) and the report of non-linear summation of neural responses is convincing.

A particular strength of the paper is their use of a statistical model of multisensory integration as their "null" model of neural responses, and the "inverted encoder", which infers an internal representation of the stimulus that can explain the EEG responses. This encoder generates a prediction of decoding performance, which can be used to predict multisensory decoding from unisensory decoding, or from a sum of the unisensory internal representations.
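The forward-model logic behind an inverted encoder can be sketched as follows. This is a minimal illustration of the standard approach (a linear mapping from hypothesised tuning channels to electrodes, fit on training trials and then inverted on test trials), not the authors' exact pipeline; all array sizes and variable names are assumptions for the example.

```python
import numpy as np

def inverted_encoding(B_train, C_train, B_test):
    """Minimal inverted encoding model based on a linear forward model
    B = W @ C (electrode responses = weights @ channel responses).
    Fits W on training data by least squares, then inverts it to
    reconstruct channel responses from held-out test data."""
    W = B_train @ np.linalg.pinv(C_train)   # electrodes x channels
    return np.linalg.pinv(W) @ B_test       # channels x test trials

# Synthetic check: recover known channel responses from noiseless data.
rng = np.random.default_rng(0)
W_true = rng.normal(size=(64, 5))    # 64 electrodes, 5 location channels
C_train = rng.normal(size=(5, 200))  # 200 training trials
C_test = rng.normal(size=(5, 50))    # 50 held-out test trials
C_rec = inverted_encoding(W_true @ C_train, C_train, W_true @ C_test)
```

Decoding accuracy (e.g., d') can then be computed from how well the reconstructed channel responses identify the presented stimulus location on each test trial.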

In behavioural performance, it is frequently observed that the performance increase from two senses is close to what is expected from the optimal integration of information across the senses, in a statistical sense. This can be plausibly explained by assuming that people are able to weight sensory inputs according to their reliability, and to do so near-optimally. Critically, the apparent "superadditive" effect on performance described above does not require any non-linearity in the sum of information across the senses but can arise from correctly weighting the information according to reliability.

The authors apply a similar model to predict the neural responses expected to audiovisual stimuli from the neural responses to audio and visual stimuli alone, assuming optimal statistical integration of information. The neural responses to audiovisual stimuli exceed the predictions of this model and this is the main evidence supporting their conclusion, and it is convincing.
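The optimal-integration benchmark being described reduces, under the standard assumption of independent Gaussian noise in each modality, to the familiar MLE prediction d'_AV = sqrt(d'_A² + d'_V²): reliability-weighted combination lowers the combined noise variance, with no non-linear interaction required. A minimal sketch (the specific d' values below are hypothetical):

```python
import numpy as np

def mle_predicted_dprime(d_a, d_v):
    """Predicted audiovisual sensitivity under optimal (MLE) cue
    combination with independent Gaussian noise: reliability-weighted
    averaging gives combined variance (s_a^2 * s_v^2)/(s_a^2 + s_v^2),
    which is equivalent to d'_av = sqrt(d'_a^2 + d'_v^2)."""
    return np.sqrt(d_a**2 + d_v**2)

# Hypothetical unisensory sensitivities
d_a, d_v = 1.0, 1.5
d_av_pred = mle_predicted_dprime(d_a, d_v)
# Observed audiovisual d' reliably above this prediction would indicate
# super-additive (non-linear) integration rather than optimal weighting.
```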

Weaknesses:

The main weakness of the manuscript is that their behavioural data show no evidence of performance that exceeds the predictions of these statistical models. In fact, the models predict multisensory performance from unisensory performance pretty well. So this manuscript presents the opposite problem to that which motivated the study - neural interactions across the senses which appear to be more non-linear than perception. This makes it hard to interpret their results: surely, if these non-linear neural interactions underlie the behaviour, we should be able to see evidence of them in the behaviour? I cannot offer an easy explanation for this.

Overall, therefore, I applaud the motivation and the sophistication of the analysis method and think it shows great promise for tackling these problems, but the manuscript unfortunately brushes over an important problem specific to the results. It appeals to the higher-level reasoning - that non-linearity is a behavioural hallmark of integration and therefore we should see it in neural responses. Yet it ignores the fact that the behaviour observed here does not exceed the predictions of the "null" model applied to the neural response.

Part of the problem, I think, is that the authors never explain the difference between superadditivity of perceptual performance (proportion correct) and superadditivity of the underlying processing, which is implied by the EEG results but not their behavior. This is of course a difficult matter to describe succinctly or clearly (I somehow doubt I have). It is however worth addressing. The literature is full of confusing claims of superadditivity. I believe these authors understand this distinction and have an opportunity to represent it clearly for the benefit of all.

Author response:

Response to Reviewer #1 (Public Review):

We thank the reviewer for their constructive criticism of our study, their proposed solutions, and for highlighting areas of the methodology and analytical pipeline where explanations were unclear or unsatisfactory. We will take the reviewer’s feedback into account to improve the clarity and readability of the revised manuscript. We acknowledge the importance of ruling out eye movements as a potential confound. We address these concerns briefly below, but a more detailed explanation (and a full breakdown of the relevant analyses, including the corrected and uncorrected results) will be provided in the revised manuscript.

First, the source of EEG activity recorded from the frontal electrodes is often unclear. Without an external reference, it is challenging to resolve the degree to which frontal EEG activity represents neural or muscular responses (1). Thus, as a preventative measure against the potential contribution of eye movement activity, for all our EEG analyses we only included activity from occipital, temporal, and parietal electrodes (the selected electrodes can be seen in the final inset of Figure 3).

Second, as suggested by the reviewer, we re-ran our analyses using the activity measured from the frontal electrodes alone. If the source of the nonlinear decoding accuracy in the AV condition was muscular activity produced by eye movements, we would expect to observe better decoding accuracy from sensors closer to the source. Instead, we found that decoding accuracy from the frontal electrodes (peak d' = 0.08) was less than half that from the more posterior electrodes (peak d' = 0.18). These results suggest that the source of neural activity containing information about stimulus position was located over occipito-parietal areas, consistent with our topographical analyses (inset of Figure 4).

Third, we compared the average eye movements between the three main sensory conditions (auditory, visual, and audiovisual). In the visual condition, there was little difference in eye movements corresponding to the five stimulus locations, likely because the visual stimuli were designed to be spatially diffuse. For the auditory and audiovisual conditions, there was more distinction between eye movements corresponding to the stimulus locations; however, these patterns appeared to be the same in the auditory and audiovisual conditions. If consistent saccades to audiovisual stimuli had been responsible for the nonlinear decoding we observed, we would expect a higher positive correlation between horizontal eye position and stimulus location in the audiovisual condition than in the auditory or visual conditions. Instead, we found no difference in correlation between audiovisual and auditory stimuli, indicating that eye movements were equivalent in these conditions and unlikely to explain better decoding accuracy for audiovisual stimuli.
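This control reduces to a simple per-condition correlation between gaze and stimulus position. A minimal sketch, where the variable names, locations, and noise levels are hypothetical illustrations rather than the authors' data:

```python
import numpy as np

def eye_stim_correlation(eye_x, stim_loc):
    """Pearson correlation between trial-wise mean horizontal eye
    position and stimulus location, for one sensory condition."""
    return np.corrcoef(eye_x, stim_loc)[0, 1]

# Hypothetical data: 5 stimulus locations (degrees), 200 trials each
# condition, gaze weakly tracking the stimulus plus fixation noise.
stim_loc = np.tile([-20, -10, 0, 10, 20], 40)
rng = np.random.default_rng(1)
r_av = eye_stim_correlation(stim_loc * 0.1 + rng.normal(0, 2, 200), stim_loc)
r_a = eye_stim_correlation(stim_loc * 0.1 + rng.normal(0, 2, 200), stim_loc)
# Comparable r_av and r_a would argue against an oculomotor account of
# the audiovisual decoding advantage; r_av >> r_a would support it.
```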

Finally, we note that the stricter eye movement criterion acknowledged in the Discussion section of the original manuscript still yielded audiovisual d' significantly better than the MLE prediction, although this difference did not survive cluster correction. This is an important distinction to make: combined with the results described above, it supports our original interpretation that the stricter criterion, together with our conservative (mass-based) cluster correction (2), led to a Type II error.

References

(1) Roy, R. N., Charbonnier, S., & Bonnet, S. (2014). Eye blink characterization from frontal EEG electrodes using source separation and pattern recognition algorithms. Biomedical Signal Processing and Control, 14, 256–264.

(2) Pernet, C. R., Latinus, M., Nichols, T. E., & Rousselet, G. A. (2015). Cluster-based computational methods for mass univariate analyses of event-related brain potentials/fields: A simulation study. Journal of Neuroscience Methods, 250, 85–93.

Response to Reviewer #2 (Public Review):

We thank the reviewer for their insight and constructive feedback. As emphasized in the review, an interesting question arising from our results is why, if the neural data exceed the optimal statistical prediction (MLE d'), the behavioural data do not. We agree with the reviewer's suggestion that more attention should be devoted to this question, and plan to provide a deeper discussion of the relationship between behavioural and neural super-additivity in the revised manuscript. We also note that, while this discrepancy remains unexplained, our results are consistent with the literature: both non-linear neural responses (in single-cell recordings) and behavioural responses that match MLE are reliable phenomena in multisensory integration (1-4).

One possible explanation for this puzzling discrepancy is that behavioural responses occur sometime after the initial neural response to sensory input. There are several subsequent neural processes between perception and a behavioural response (5), all of which introduce additional noise that may obscure super-additive perceptual sensitivity. In particular, the mismatch between neural and behavioural accuracy may be the result of additional neural processes that translate sensory activity into a motor response to perform the behavioural task.

Our measure of neural super-additivity (exceeding optimally weighted linear summation) differs from how it is traditionally assessed (exceeding the summation of single-neuron responses) (2). Neither method has yet fully explained how this neural activity translates into behavioural responses, and we think more work is needed to resolve the abovementioned discrepancy. However, our approach will facilitate this work by providing a reliable, non-invasive means of measuring neural super-additivity in humans.

References

(1) Alais, D., & Burr, D. (2004). The ventriloquist effect results from near-optimal bimodal integration. Current Biology, 14(3), 257–262.

(2) Ernst, M. O., & Banks, M. S., (2002). Humans integrate visual and haptic information in a statistically optimal fashion. Nature, 415(6870), 429–433.

(3) Meredith, M. A., & Stein, B. E. (1983). Interactions among converging sensory inputs in the superior colliculus. Science, 221, 389–391.

(4) Stanford, T. R., & Stein, B. E. (2007). Superadditivity in multisensory integration: putting the computation in context. Neuroreport, 18, 787–792.

(5) Heekeren, H., Marrett, S. & Ungerleider, L. (2008). The neural systems that mediate human perceptual decision making. Nature Reviews Neuroscience, 9, 467–479.
