Peer review process
Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, public reviews, and a provisional response from the authors.
Editors
- Reviewing Editor: Maria Chait, University College London, London, United Kingdom
- Senior Editor: Barbara Shinn-Cunningham, Carnegie Mellon University, Pittsburgh, United States of America
Reviewer #1 (Public Review):
This study presents a novel application of the inverted encoding (i.e., decoding) approach to detect the correlates of crossmodal integration in the human electroencephalographic (EEG) signal. The method is successfully applied to data from a group of 41 participants performing a spatial localization task on auditory, visual, and audio-visual events. The analyses clearly show a behavioural advantage for audio-visual localization. As in previous studies, traditional univariate ERP analyses yielded inconclusive results, showing once more the need for alternative, more sophisticated approaches. In contrast, the principal approach of this study, which harnesses the multivariate nature of the signal, captured clear signs of super-additive responses, considered by many to be the hallmark of multisensory integration. Unfortunately, the manuscript lacks many important details in its description of the methodology and analytical pipeline. Although some of these details can eventually be retrieved from the scripts that accompany this paper, the main text should be self-contained and sufficient to gain a clear understanding of what was done. (A list of some of these is included in the comments to the authors.) More importantly, I believe the main weakness of this work is that the positive results reported in the results section are contingent on eye movements: when artifacts due to eye movements are removed, the outcomes are no longer significant.
Therefore, the conclusion that the authors achieved their aims and showed that this method of analysis is a truly reliable way to assess crossmodal integration does not stand on firm ground. The worst-case scenario is that the results are entirely accounted for by patterns of eye movements in the different conditions. In the best-case scenario, the method might truly work, but further experiments (and/or analyses) would be required to confirm the claims conclusively.
If ultimately successful, this approach could bring important advances to the many fields in which multisensory integration has been shown to play a role, by providing much-needed coherence across levels of analysis, from behaviour to single-cell electrophysiology. To achieve this, one would have to make sure that the pattern of super-additive effects, the standard the authors set for themselves as a proxy for multisensory integration, shows up reliably regardless of eye-movement or artifact corrections. A first step toward this goal might be to make the results easier to understand in context by reporting both the uncorrected and the corrected analyses in the main results section. Second, one could try to support the argument, made in the discussion, that the super-additive effects originate at posterior electrode sites, by also modelling frontal electrode clusters and showing that they are not informative about the effect of interest.
Reviewer #2 (Public Review):
Summary:
This manuscript seeks to reconcile observations of multisensory perception from behavior and from neural responses. It is intuitively obvious that perceiving a stimulus via two senses results in better performance than via one alone. In fact, it is not uncommon to observe that, in a perceptual task, the percentage of correct responses with two senses is higher than the sum of the percentages correct obtained with each modality individually, i.e., the gains are "superadditive". The gains from adding a second sense are typically larger when performance with the first sense is relatively poor; this effect is often called the principle of inverse effectiveness. More generally, this tells us that performance in a multisensory perceptual task is a non-linear sum of performance for each sensory modality alone.
Despite this abundant evidence of behavioral non-linearity in multisensory integration, evoked EEG responses to such stimuli often show little evidence of it, and this is the problem this manuscript tackles. The key assertion is that univariate analysis of the EEG signal is likely to average out the non-linear effects of integration. This is a reasonable assertion, and the authors' analysis does indeed provide evidence that a multivariate approach can reveal non-linear interactions in the evoked responses.
Strengths:
It is of great value to understand how the process of multisensory integration occurs: despite a wealth of observations of the benefits of perceiving the world with multiple senses, we still lack a reasonable understanding of how the brain integrates information. For example, what underlies the large individual differences in the benefits of two senses over one? One way to tackle this is via brain imaging, but this is problematic if important features of the processing, such as non-linear interactions, are obscured by the lack of specificity of the measurements. The authors' approach to analysing the EEG data allows them to look in more detail at the variation in activity across EEG electrodes, which averaging across electrodes cannot do.
This version of the manuscript is well written and, for the most part, clear. It shows a good understanding of the non-linear effects described above (whereas many studies show a poor understanding of the "superadditivity" of perceptual performance), and the report of non-linear summation of neural responses is convincing.
A particular strength of the paper is the authors' use of a statistical model of multisensory integration as their "null" model of neural responses, and of the "inverted encoder", which infers an internal representation of the stimulus that can explain the EEG responses. This encoder generates a prediction of decoding performance, which can be used to generate predictions of multisensory decoding from unisensory decoding, or from a sum of the unisensory internal representations.
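The inverted-encoding idea referred to above can be sketched as follows, purely as an illustration under stated assumptions: stimulus location is represented by a small bank of hypothetical Gaussian-tuned spatial channels, the EEG at each electrode is modelled as a weighted sum of channel responses, the weights are estimated by least squares on training trials, and the model is then inverted on held-out trials to recover channel responses and read out location. The channel centres, tuning width, electrode count, helper names and synthetic data are all invented for this sketch; it is not the authors' pipeline.

```python
import numpy as np

def channel_basis(locations, centers, width):
    """Hypothetical Gaussian spatial tuning: rows are channels, columns are trials."""
    locations = np.asarray(locations, dtype=float)
    return np.exp(-0.5 * ((locations[None, :] - centers[:, None]) / width) ** 2)

def fit_encoder(eeg_train, locations, centers, width):
    """Least-squares weights W (electrodes x channels) such that eeg ~= W @ C."""
    C = channel_basis(locations, centers, width)
    # Solve C.T @ W.T ~= eeg_train.T, equivalent to W = B C^T (C C^T)^-1.
    W_T, *_ = np.linalg.lstsq(C.T, eeg_train.T, rcond=None)
    return W_T.T

def invert_encoder(W, eeg_test):
    """Estimated channel responses for held-out trials, C_hat = (W^T W)^-1 W^T B."""
    C_hat, *_ = np.linalg.lstsq(W, eeg_test, rcond=None)
    return C_hat

def decode_location(C_hat, centers):
    """Read out each trial's location as the centre of its most active channel."""
    return centers[np.argmax(C_hat, axis=0)]

# Synthetic demonstration (all values hypothetical).
rng = np.random.default_rng(0)
centers = np.linspace(-30.0, 30.0, 7)   # channel centres, degrees azimuth
width = 7.5                              # tuning width, degrees
n_electrodes, n_train, n_test = 64, 400, 100
true_W = rng.normal(size=(n_electrodes, centers.size))

def simulate(locs, noise_sd=0.5):
    """Synthetic EEG: the forward model plus independent Gaussian noise."""
    clean = true_W @ channel_basis(locs, centers, width)
    return clean + noise_sd * rng.normal(size=clean.shape)

train_locs = rng.choice(centers, size=n_train)
test_locs = rng.choice(centers, size=n_test)
W = fit_encoder(simulate(train_locs), train_locs, centers, width)
decoded = decode_location(invert_encoder(W, simulate(test_locs)), centers)
accuracy = np.mean(decoded == test_locs)
print(f"decoding accuracy on held-out trials: {accuracy:.2f}")
```

`np.linalg.lstsq` is used instead of explicit matrix inverses because neighbouring Gaussian channels are strongly correlated, which makes the normal-equation matrices poorly conditioned.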
It is frequently observed that the behavioural improvement from two senses is close to what is expected from statistically optimal integration of information across the senses. This can plausibly be explained by assuming that people weight sensory inputs according to their reliability, somewhat optimally. Critically, the apparent "superadditive" effect on performance described above does not require any non-linearity in the summation of information across the senses; it can arise from correctly weighting the information according to reliability.
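One way to make this point concrete is a textbook equal-variance Gaussian signal-detection sketch. This is an assumption chosen for illustration, not the study's task or model, and all numbers are hypothetical. Information is combined purely linearly: squared sensitivities (d') add across independent cues, as statistically optimal weighting predicts. Yet because the proportion of detections is a non-linear function of d' under a strict criterion, the audiovisual hit rate (roughly 0.056) exceeds the sum of the unisensory hit rates (roughly 0.046).

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def combined_dprime(d_a, d_v):
    """Sensitivity after optimal linear (reliability-weighted) combination of two
    independent, equal-variance Gaussian cues: squared sensitivities add."""
    return math.hypot(d_a, d_v)

def hit_rate(dprime, criterion):
    """Proportion of signal trials on which a unit-variance decision variable
    with mean `dprime` exceeds a fixed `criterion`. The false-alarm rate,
    upper_tail(criterion), is the same in every condition."""
    return upper_tail(criterion - dprime)

criterion = 3.0          # hypothetical, deliberately strict criterion
d_a = d_v = 1.0          # hypothetical unisensory sensitivities

p_a = hit_rate(d_a, criterion)
p_v = hit_rate(d_v, criterion)
p_av = hit_rate(combined_dprime(d_a, d_v), criterion)

print(f"A={p_a:.3f}  V={p_v:.3f}  A+V={p_a + p_v:.3f}  AV={p_av:.3f}")
print("superadditive hit rate:", p_av > p_a + p_v)
```

With a lenient criterion (for example 0.0), the same linear combination is sub-additive in hit rate. This is one reason why superadditivity of proportion correct, on its own, says little about non-linearity in the underlying processing.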
The authors apply a similar model to predict the neural responses expected for audiovisual stimuli from the neural responses to auditory and visual stimuli alone, assuming optimal statistical integration of information. The neural responses to audiovisual stimuli exceed the predictions of this model; this is the main evidence supporting their conclusion, and it is convincing.
Weaknesses:
The main weakness of the manuscript is that the behavioural data show no evidence of performance that exceeds the predictions of these statistical models. In fact, the models predict multisensory performance from unisensory performance quite well. The manuscript therefore presents the opposite of the problem that motivated the study: neural interactions across the senses that appear to be more non-linear than perception. This makes the results hard to interpret: surely, if these non-linear neural interactions underlie the behaviour, we should be able to see evidence of them in the behaviour. I cannot offer an easy explanation for this.
Overall, therefore, I applaud the motivation and the sophistication of the analysis method and think it shows great promise for tackling these problems, but the manuscript unfortunately brushes over an important problem specific to the results. It appeals to higher-level reasoning: that non-linearity is a behavioural hallmark of integration and that we should therefore see it in neural responses. Yet it ignores the fact that the behaviour observed here does not exceed the predictions of the "null" model applied to the neural responses.
Part of the problem, I think, is that the authors never explain the difference between superadditivity of perceptual performance (proportion correct) and superadditivity of the underlying processing, which is implied by their EEG results but not by their behavior. This is, of course, a difficult matter to describe succinctly or clearly (I doubt I have managed to). It is, however, worth addressing. The literature is full of confusing claims of superadditivity. I believe these authors understand this distinction and have an opportunity to represent it clearly for the benefit of all.