Peer review process
Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.
Editors
- Reviewing Editor: Nai Ding, Zhejiang University, Hangzhou, China
- Senior Editor: Andrew King, University of Oxford, Oxford, United Kingdom
Reviewer #1 (Public review):
Summary:
Zhang et al. report on an ambitious study that investigates multiple aspects of the neural and behavioral underpinnings of auditory-motor surprisal in the context of an auditory-motor learning paradigm (piano keyboard). Using an intricate design comprising several sub-parts and control procedures, they report that early ERPs (50-100 ms latency) reflect violations of established key-pitch mappings.
Strengths:
This is a carefully devised and executed study. The paradigm is quite intricate and, at the same time, addresses multiple aspects of auditory-motor learning, and does so in a rigorous way.
Weaknesses:
Perhaps because of the exhaustive approach, it is sometimes difficult to follow which parts of the experimental design the results come from. There are also questions regarding the appropriateness of the statistical methods, the inclusion and treatment of participants' musical background, and the nature (latency and extent) of the identified neural components that detect auditory-motor violations.
Reviewer #2 (Public review):
Summary:
Zhang et al. report an EEG study (n=18) of participants playing a keyboard where the correspondence between keys and pitches is varied to introduce sensory-motor mismatches (discrepancies between sensory inputs and expected sensory consequences of motor commands). They find that the auditory N100 amplitude is enhanced for the initial keystroke following a mapping switch but rapidly attenuates for subsequent keystrokes (showing rapid updating of the forward model), whereas the motor-related P50 amplitude only differentiates trained versus untrained mappings after 30 minutes of goal-directed practice (potentially showing timescales of inverse model updating). Using parallel univariate and mTRF decoding analyses, they conclude that forward models (mapping action to predicted sound) update almost instantly to track short-term context, while inverse models (mapping sound to motor commands) update slowly and require extended, targeted practice.
Strengths
(1) Methodological innovation:
The study utilizes an interesting, continuous auditory-motor paradigm that moves beyond standard trial-by-trial oddball designs, offering a more ecologically valid measure of trial-to-trial adaptation.
(2) Analytical elegance and rigor:
The combination of traditional univariate ERP analyses with multivariate temporal response function (mTRF) decoding is elegant, allowing the authors to successfully dissociate overlapping auditory and motor variance streams.
(3) The dissociation between the rapid adaptation of the N100 forward model and the slower adaptation of the P50 inverse model is interesting.
Weaknesses
(1) Confounded passive listening baseline:
The passive listening control condition lacks an orthogonal behavioural task (e.g., an occasional oddball detection task). Active playing inherently necessitates focused attention on auditory feedback to monitor performance, whereas passive playback does not. The globally weaker stimulus-evoked pattern at electrode Fz during passive listening strongly suggests that the absence of an N100 effect in this condition may simply reflect a lower state of attention, rather than isolating the absence of a motor-driven forward prediction. This matters in particular because pure sensory surprisal was also higher for "first" notes, which could itself lead to a stronger N1, but this effect may be masked.
(2) Overclaimed theoretical novelty:
The conceptual framing leans excessively on the authors' specific "MirrorNet" framework, presenting foundational, decades-old tenets of the motor control literature (i.e., unsupervised exploration for forward models vs. supervised skill acquisition for inverse models; Wolpert, Jordan, both in the nineties) as their own novel "conjectures." This theory-heavy introduction obscures the paper's actual empirical contribution: the design, and the interesting question regarding the distinct temporal adaptation scales of forward versus inverse models. I think some rewriting can improve the paper.
(3) Misplaced surprisal terminology:
In a similar vein, I find the use of the term "auditory-motor surprisal" more theoretical grandstanding than actually useful. The significance statement claims to "extend this principle from sensory processing", but in fact the concept of sensorimotor unexpectedness is again a staple of the forward model literature. Moreover, nowhere in the paper do the authors actually estimate sensorimotor surprisal. While they compute surprisal for their auditory baseline using IDyOM, their central sensorimotor analysis relies entirely on a simple categorical mismatch (first vs. subsequent keystrokes). The phenomenon can equally be referred to by its established nomenclature: "sensorimotor mismatch" or "sensorimotor unexpectedness".
(4) Incremental conceptual advance regarding the N100:
The paper frames the N100 finding as a major discovery, but as far as I know, the attenuation of the auditory N1 to self-generated sounds via accurate motor prediction, and its enhancement during sensorimotor mismatch, is one of the most heavily documented phenomena in the auditory-motor literature (e.g., Timm et al., 2013; Bendixen et al., 2012; 2013). In my view, the authors should clarify that the novelty lies in the elegant design, which provides a new way to correct for non-sensory-specific motor-induced attenuation, and in characterizing the distinct adaptation timescales of forward versus inverse models, not in demonstrating N100 modulation by sensorimotor mismatch, which is well documented.