Study design.

A, Schematic of a processing model of speech production control. Feedback and feedforward processes interactively mediate the control of speech production. When auditory feedback is perturbed and becomes inconsistent with the consequence predicted by an internal forward model, a sensory error is generated and triggers the inverse model in the feedback control. The output of the feedback control then updates the programming of the speech-motor representation to compensate for the auditory perturbation. The auditory perturbation and the resultant motor compensation could leave traces in the feedback and/or feedforward control processes that mediate sensorimotor learning, even on a trial-by-trial basis. We use two paradigms: the first tests whether sensorimotor learning arises from changes in the feedback or the feedforward processes, and the second tests whether the learning can be attributed to updates in the inverse model or in the speech-motor representation. B, Real-time pitch perturbation during vowel production. The pitch of the auditory feedback (blue) is artificially shifted relative to the utterance recorded by the microphone (orange). The shifted auditory feedback is delivered through earphones and causes compensation in the direction opposite to the pitch shift (the upward change in the orange line of the utterance after the downward pitch shift in the blue line). C, The trial sequence of the experiment. Participants undergo 320 trials each day for three consecutive days. The perturbation amount and direction are randomized across trials.
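The feedback loop in panel A can be illustrated with a toy calculation. This is only a sketch under stated assumptions, not the model used in the study: the function names, the proportional form of the correction, the gain of 0.2 and the use of cents as the pitch unit are all hypothetical.

```python
def sensory_error(heard_pitch, predicted_pitch):
    """Mismatch between perturbed feedback and the forward-model prediction."""
    return heard_pitch - predicted_pitch


def feedback_correction(error, gain=0.2):
    """Toy inverse model: a proportional correction that opposes the error.

    The gain value is an illustrative assumption, not a fitted parameter.
    """
    return -gain * error


# Hypothetical trial: the speaker produces the intended pitch (0 cents
# deviation), but the feedback is shifted downward by 100 cents.
produced = 0.0
shift = -100.0
heard = produced + shift
predicted = produced  # forward model expects unshifted feedback

error = sensory_error(heard, predicted)
correction = feedback_correction(error)
print(error, correction)
```

With a downward shift the error is negative and the correction is positive, which matches the upward compensation shown in panel B.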

Serial effects of the immediately preceding trial on the compensatory response of the current trial.

A, Pitch traces of motor compensation in response to upward or downward perturbations. The orange lines denote the pitch trace averaged across participants. The red horizontal bars mark the time points at which compensation is significant (p < 0.05). B, Temporally averaged compensation in the period of interest (150-250 ms after perturbation onset). In C and D, we plot the compensation of the current trial, Ct, as a function of the compensation in the immediately preceding trial, Ct-1, and as a function of the perturbation in the immediately preceding trial, Pt-1, respectively. C, In the left plots, each dot represents one trial and each black curve represents one participant. The right plots show the relationship between preceding and current trials after averaging across trials and participants. Colored dots with the same sign on the x and y axes indicate that Ct is in the same direction as Ct-1. D, Ct as a function of Pt-1 shows no significant effect, suggesting that the current compensation is not influenced by the perturbation amount of the immediately preceding trial. E, The effects of current perturbation (Pt), preceding perturbation (Pt-1) and preceding compensation (Ct-1) on current compensation (Ct), as revealed by a generalized linear mixed-effects model. Regression coefficients for each predictor are extracted from the model. Current perturbation exerts an opposing influence and preceding compensation an attractive influence on the current compensation. In E, error bars denote the standard error of the mean (SEM) across trials. In A-D, shaded areas and error bars denote the SEM across participants. *p < 0.05, **p < 0.005.
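The two quantities behind panels B and C can be sketched as follows. This is a minimal illustration rather than the analysis code of the study: the function names, the inclusive window edges, and the sample values are assumptions made for the example.

```python
from statistics import mean


def window_mean(times_ms, pitch, start_ms=150.0, end_ms=250.0):
    """Average a pitch trace over a window after perturbation onset.

    times_ms are sample times relative to perturbation onset; both window
    edges are treated as inclusive (an assumption of this sketch).
    """
    samples = [p for t, p in zip(times_ms, pitch) if start_ms <= t <= end_ms]
    if not samples:
        raise ValueError("no samples fall inside the window")
    return mean(samples)


def lagged_pairs(values):
    """Pair each trial with its immediate predecessor as (x[t-1], x[t])."""
    return list(zip(values[:-1], values[1:]))


# Hypothetical single-trial trace sampled every 50 ms.
times = [0, 50, 100, 150, 200, 250, 300]
trace = [0.0, 0.0, 2.0, 10.0, 20.0, 30.0, 35.0]
c_t = window_mean(times, trace)

# Hypothetical per-trial compensation values for one participant.
compensation = [5.0, 8.0, -3.0, -6.0]
pairs = lagged_pairs(compensation)
print(c_t, pairs)
```

Each pair holds Ct-1 first and Ct second, so plotting the second element against the first gives the kind of relation shown in panel C.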

Serial effects across different vowel categories.

A and B, Ct as a function of Ct-1 when participants produce randomly presented vowels (A) and when participants produce predictably presented vowels (B). In both experiments, significant serial effects on the current motor compensation are observed. C, The amount of compensation, characterized by the amplitude of the DoG curve, differs between experiments 1, 2a, and 2b, in which participants produce the same vowel, different vowels with uncertainty, and different vowels with certainty, respectively. The learning effects are weaker when participants produce different vowels than when they produce the same vowel. Error bars denote one SD of the bootstrap distribution. *p < 0.05, **p < 0.005. See also Figure S1.
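To make the amplitude measure and the bootstrap error bars concrete, here is a hedged sketch. The legend does not give the DoG formula, so the code assumes the derivative-of-Gaussian parameterization that is common in serial-dependence analyses, in which the amplitude parameter equals the peak height of the curve. The function names, the width value and the resampling details are illustrative assumptions.

```python
import math
import random
from statistics import mean, stdev

# Normalizing constant that makes `amplitude` equal the peak height.
DOG_NORM = math.sqrt(2.0) / math.exp(-0.5)


def dog(x, amplitude, width):
    """Derivative-of-Gaussian curve (assumed form, not taken from the source)."""
    return x * amplitude * width * DOG_NORM * math.exp(-((width * x) ** 2))


def dog_peak_location(width):
    """Value of x at which the curve reaches its positive peak."""
    return 1.0 / (math.sqrt(2.0) * width)


def bootstrap_sd(values, statistic, n_boot=1000, seed=0):
    """SD of a statistic over resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(values)
    estimates = []
    for _ in range(n_boot):
        resample = [values[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    return stdev(estimates)


# The curve is odd-symmetric and peaks at the amplitude parameter.
peak_x = dog_peak_location(width=0.05)
print(dog(peak_x, amplitude=3.0, width=0.05))

# Hypothetical data; in practice the statistic would be a fitted amplitude.
print(bootstrap_sd([1.0, 2.0, 3.0, 4.0, 5.0], mean, n_boot=200))
```

In an actual analysis the statistic passed to `bootstrap_sd` would refit the curve to each resample and return its amplitude; the mean is used here only to keep the example self-contained.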

Inter-syllabic learning persists only within the word boundary.

A and C, Experiments 3a and 3b. Participants are asked to produce three 5-syllable sentences that have the same syntactic structure in each experiment. The two-syllable noun is positioned either at the end (Exp. 3a) or in the middle of the sentence (Exp. 3b). Each character is presented on the screen for 500 ms, followed by a blank for 500 ms. B and D, Relations between the compensation amount of the preceding syllable (x-axis) and the serial-learning amount of the current syllable (y-axis) in Exp. 3a and 3b. The mean pitch averaged across 200-300 ms is extracted separately from the perturbed syllable (black dots in the inset plots) and the subsequent syllable (the dot indicated by a black arrow). The grey-shaded rectangle denotes the two-syllable word. Significant inter-syllabic learning is observed only when the two syllables fall within a word boundary. **p < 0.005. See also Figures S2 and S3.