A single exposure to altered auditory feedback causes observable sensorimotor adaptation in speech

  1. Lana Hantzsch
  2. Benjamin Parrell (corresponding author)
  3. Caroline A Niziolek (corresponding author)
  1. Waisman Center, University of Wisconsin–Madison, United States
  2. Department of Communication Sciences and Disorders, University of Wisconsin–Madison, United States

Abstract

Sensory errors induce two types of behavioral changes: rapid compensation within a movement and longer-term adaptation of subsequent movements. Although adaptation is hypothesized to occur whenever a sensory error is perceived (including after a single exposure to altered feedback), adaptation of articulatory movements in speech has only been observed after repeated exposure to auditory perturbations, questioning both current theories of speech sensorimotor adaptation and the universality of more general theories of adaptation. We measured single-exposure or ‘one-shot’ learning in a large dataset in which participants were exposed to intermittent, unpredictable perturbations of their speech acoustics. On unperturbed trials immediately following these perturbed trials, participants adjusted their speech to oppose the preceding shift, demonstrating that learning occurs even after a single exposure to auditory error. These results provide critical support for current theories of sensorimotor adaptation in speech and align speech more closely with learning in other motor domains.

Editor's evaluation

The paper establishes the presence of a single-trial adaptation response to the perturbation of the first formant of a vowel in speech production, an effect that should be of interest to the sensorimotor community in general. The analysis is conducted on existing data from 6 published studies and the effects are shown in a convincing fashion. The paper also explores the relationship between the within-trial compensation and the next-trial adaptation.

https://doi.org/10.7554/eLife.73694.sa0

Introduction

Auditory feedback plays a major role in both online execution and refinement of speech motor plans, as observed when the auditory feedback participants receive about their own speech is perturbed in real time (Houde and Jordan, 1998; Purcell and Munhall, 2006b; Tourville et al., 2008; Villacorta et al., 2007). Two types of behavior have been the primary focus of auditory perturbation studies in speech, which most typically alter a speaker’s vowel formants (the resonant frequencies of the vocal tract that distinguish vowels). First, when unpredictable perturbations are delivered, speakers produce a compensation response—an online, within-trial adjustment to oppose the perturbation (Purcell and Munhall, 2006b; Tourville et al., 2008). Second, consistent perturbations lead to sensorimotor adaptation—a learned change in speech behavior that is observable from the onset of a subsequent speech movement and which persists even after the perturbation is removed (Houde and Jordan, 1998; Purcell and Munhall, 2006a).

These behaviors are widely considered to be driven by sensory prediction errors (differences between expected and perceived sensory feedback), although models differ in the proposed mechanism by which this occurs. In the DIVA (Directions Into Velocities of Articulators) model (Tourville and Guenther, 2011), sensory prediction errors lead to feedback-based corrective motor commands (i.e., the within-trial compensation response) which are subsequently incorporated into the feedforward motor program used for future productions of the same syllables, creating the adaptation response (Kawato et al., 1987). An alternative theoretical account of adaptation (Houde and Nagarajan, 2011) suggests sensory prediction errors instead directly lead to updates of internal models in the sensorimotor control system, either to forward models predicting the sensory outcomes of actions (Bastian, 2006; Haith and Krakauer, 2013; Houde and Nagarajan, 2011; Krakauer and Mazzoni, 2011; Shadmehr et al., 2010), to the control policy guiding action (Hadjiosif et al., 2020), or to both (Wolpert et al., 1998; Wolpert and Kawato, 1998).

Both the compensation-based and internal-model hypotheses of sensorimotor adaptation predict that learning in speech occurs progressively, with sensory feedback from each utterance causing updates to feedforward commands, such that changes in speech production should be evident even after a single trial with altered auditory feedback. Such one-shot adaptation has been observed in limb control, where a visuomotor perturbation on an isolated trial affects reach direction on the following trial (Diedrichsen et al., 2005; Joiner et al., 2017; Ruttle, 2021). However, the occurrence of such one-shot adaptation has not been definitively established in speech. Although Cai et al. (2012) observed that first formant (F1) production in the first 50 ms of perturbed trials that closely followed another perturbed trial tended to oppose the preceding perturbation’s direction, more recent work explicitly testing for such single-trial effects did not find evidence of a measurable change (Daliri et al., 2020). This failure to find one-shot adaptation in speech calls into question both current theories of speech sensorimotor adaptation and the universality of domain-general theories (e.g., Houde and Nagarajan, 2011; Kawato et al., 1987; Hadjiosif et al., 2020).

Here, we aim to further investigate the mechanisms underlying sensorimotor adaptation by measuring one-shot adaptation in speech. To detect this potentially small effect, data from six prior studies (Niziolek et al., 2014; Niziolek and Guenther, 2013; Niziolek and Parrell, 2021; Parrell et al., 2017) were compiled for this analysis (131 total participants, 18–40 participants per study). In all studies, participants read aloud monosyllabic words while receiving real-time auditory playback of their speech. On a given trial, this feedback was either veridical (unperturbed trials) or unpredictably perturbed via an upward or downward shift in F1 (perturbed trials) (Figure 1A). Perturbed trials were used to calculate compensation responses, and unperturbed trials which occurred directly after a perturbed trial (post-perturbation trials) were used to calculate one-shot adaptation responses (see Figure 1B). We hypothesized that F1 values would be higher for trials that occurred directly after a downward perturbation and lower in trials that occurred directly after an upward perturbation, such that they echo the preceding compensation response.

Figure 1. Perturbation methodology.

(A) Spectrogram of the word ‘bed’, demonstrating an applied downward F1 perturbation. The F1 frequency of the audio feedback (red) is lowered from the original utterance (yellow). (B) Sample trial sequence from Study 4. Open circles indicate trials in which a perturbation was applied, used to calculate compensation. Closed circles indicate trials in which no perturbation occurred; ‘post-up’ and ‘post-down’ trials were used to calculate one-shot adaptation.

This approach also allows us to test the feedback-command-based hypothesis of adaptation in speech, which suggests that there should be a correlation between the magnitude of compensation and subsequent one-shot adaptation at the trial level. While this correlation has been observed in reaching (Albert and Shadmehr, 2016), most studies have failed to identify such a clear relationship in speech (Daliri, 2021; Franken et al., 2019; Lester-Smith et al., 2020; Parrell et al., 2017; Raharjo et al., 2021), possibly because they did not use such a direct trial-to-trial measurement method. The presence of such a relationship at the trial level would be compatible with both the feedback-command-based and internal-model hypotheses of adaptation; its absence, by contrast, would support only the internal-model hypothesis.

Results

Compensation

In the 150–250 ms time window after vowel onset, trials in which an upward F1 shift occurred (up-shifted trials) had reliably lower F1 values (–3.99±0.33 mels; mean ± SE) than trials in which a downward F1 shift occurred (down-shifted trials; 2.69±0.33 mels) (β=–6.93, SE=0.66, p<0.001, d=0.21; Figure 2A, left panel). This was also reflected at the individual level; participants’ average F1 in the same time window was substantially lower across up-shifted trials (–2.75±1.56 mels) than down-shifted trials (4.40±1.25 mels) (paired t-test, t(130) = –7.00, p<0.001, d=0.91; Figure 2B, left panel). Normalized F1 was significantly different from 0 in both up-shifted trials (t(130) = –3.63, p<0.001, d=0.49) and down-shifted trials (t(130) = 7.24, p<0.0001, d=0.89). Additionally, a cluster-based permutation test showed significant differences from 0 starting at 100–125 ms after vowel onset for all trial types (Figure 2A, horizontal bars).

Figure 2. Behavioral responses to auditory perturbations.

(A) Average normalized F1 for trials with upward (blue) or downward (red) perturbations. Error bars show standard error across participants. Highlighted regions illustrate the time periods of interest for compensation (left) and one-shot adaptation (right). Horizontal bars denote times with significant effects (p<0.05; n=131) as determined by cluster-based permutation tests (red and blue: difference from 0, gray: difference between conditions). (B) Probability distributions and boxplots of participants’ average compensation and adaptation responses in the time periods of interest (n=131).

One-shot adaptation

Participants produced one-shot adaptation responses which paralleled the directional pattern seen in the compensation response, though at a lower magnitude. In the 0–100 ms time window after vowel onset, trials that occurred immediately after an upward F1 shift (post-up trials) had F1 values (–1.55±0.26 mels) that were reliably lower than trials that occurred immediately after a downward F1 shift (post-down trials, 0.59±0.27 mels) (β=–2.14, SE=0.53, p<0.001, d=0.079; Figure 2A, right panel). Likewise, participants’ average F1 was lower across post-up trials (–2.08±1.33 mels) than across post-down trials (0.82±1.39 mels) (paired t-test, t(130) = –2.98, p=0.0034, d=0.38; Figure 2B, right panel). Normalized F1 in post-up trials was significantly less than 0 in this time window (t(130) = –3.2, p=0.0016, d=0.38). While normalized F1 in post-down trials was numerically larger than 0, this difference was not significant in the 0–100 ms window (t(130) = 1.2, p=0.23, d=0.15); however, a cluster-based permutation test showed significant differences from 0 across the syllable for all trial types (Figure 2A, horizontal bars).

Relationship between behavioral responses

At the participant level, there was a significant positive relationship between compensation and one-shot adaptation (β=0.14, SE=0.058, p=0.015, η2=0.02), such that participants who produced larger compensation responses tended to adapt more (Figure 3A). Conversely, the trial-level model revealed no main effect of compensation response (β=–0.033, SE=0.053, p=0.53) (Figure 3B). However, we did observe a small but significant interaction between shift magnitude and compensation response (β=0.16, SE=0.052, p=0.0023, η2=0.0009), such that higher shift magnitudes elicited a stronger effect of compensation on adaptation. Along with the finding that larger shift magnitudes led to larger one-shot adaptation responses (β=7.46, SE=3.34, p=0.03, η2=0.04), this suggests that trial-wise compensation may be predictive of adaptation only at larger shift magnitudes. Stronger evidence for a trial-level effect could be seen by correlating compensation and adaptation within each participant, which yielded a distribution of coefficients whose mean was significantly larger than 0 (mean r=0.21, 78/92 participants r>0, t=11.07, p<0.0001, d=1.6). There was no relationship between the strength of a participant’s correlation between adaptation and compensation and their overall adaptation magnitude (r=–0.005, t=–0.04, p=0.964); in other words, it was not the case that this correlation was only observed in participants who adapted.

Figure 3. Correlation between compensation and one-shot adaptation.

(A) Participant-level correlation. Each participant contributed two data points: their average response to up-shifted and their average response to down-shifted trials. The average applied F1 shift magnitude is displayed via the color gradient (blue = low shift magnitude, yellow = higher shift magnitude). The trend line (y=0.14x+0.93) represents the main effect of compensation on one-shot adaptation obtained from the linear mixed model. (B) Trial-level correlation. Each pair of perturbation and post-perturbation trials is a data point.

Discussion

At both the trial and participant level, one-shot adaptation was detected in post-perturbation trials, where F1 values reliably opposed the perturbation in the previous trial. This shows that learning occurs continuously when the sensorimotor system detects a discrepancy between expected and perceived auditory feedback, as predicted by current models of sensorimotor adaptation in speech. While the magnitude of this one-shot adaptation may be small (1–2 mels), it is relatively substantial when accounting for the fact that a typical perturbation of ~100–150 mels causes an average F1 change of only 40–50 mels over the course of 100 or more trials (Katseff et al., 2012; MacDonald et al., 2010; Munhall et al., 2009; Purcell and Munhall, 2006a). Moreover, our estimate of one-shot adaptation is likely conservative for at least three reasons. First, all but one of the studies in our dataset presented multiple stimulus words in pseudorandom order; in ~53% of the trial pairs, participants pronounced different words on the perturbed and subsequent unperturbed trial. Although sensorimotor learning can generalize across words with the same vowel (Rochet-Capellan et al., 2012), such generalization is only partial, and a larger adaptation effect likely would have emerged with uniform word pairs. Second, our planned analysis evaluated adaptation during the first 100 ms of each vowel. Adaptation magnitude was greater (1.85±0.50 mels) during the 50–150 ms window, which excludes the consonant transition. Lastly, studies analyzed here used random, inconsistent perturbations across trials. Inconsistent perturbations are commonly used in studies of reaching adaptation in order to study learning that occurs after a single exposure (e.g., Diedrichsen et al., 2005; Joiner et al., 2017; Ruttle, 2021); however, such inconsistencies may decrease the rate of adaptation compared to consistent perturbations (Albert et al., 2021). One-shot adaptation in speech may therefore have an even greater magnitude in the typical case where perturbations are consistent across trials.

Though an individual’s average compensation magnitude was reliably predictive of their average one-shot adaptation, a trial-level relationship was present but less reliable: it was mediated by shift magnitude and, when examined at the individual level, present in some but not all participants. It is unclear whether these two behavioral responses have a direct feedforward relationship (as predicted by the DIVA model), or if the observed correlations could best be explained by compensation and one-shot adaptation occurring via separate mechanisms driven by the same sensory error (as may be predicted by internal-model hypotheses). However, the less reliable within-participant relationship may be more consistent with models of adaptation that rely on updates to internal models compared to models that use feedback corrections to update future feedforward commands. Similarly mixed results on the causal relationship between feedback commands and feedforward learning have been reported in the reaching literature (Albert and Shadmehr, 2016; Kim et al., 2021; Tseng et al., 2007).

Overall, these results provide evidence that a single exposure to altered auditory feedback induces ‘one-shot’ adaptation in the speech sensorimotor system. This is consistent with current models of adaptation in speech specifically and in human movement more broadly; within these frameworks, one-shot adaptation is an effect that may continually build upon itself to create more enduring adaptation responses. Further comparison of single-trial vs. longer-timescale sensorimotor learning in the same individuals is warranted to strengthen this claim. The expected relationship between compensation and adaptation was observed reliably at the participant level and somewhat less reliably at the trial level. Our results provide evidence that adaptation in speech may operate in a manner similar to adaptation in other motor domains. As a well-learned natural behavior that relies primarily on implicit learning, speech offers a unique, ecologically valid paradigm to further our understanding of the underlying mechanisms driving sensorimotor adaptation.

Materials and methods

Participants

We reanalyzed data from six previous studies examining online compensation responses to formant frequency alterations with similar speech stimuli and perturbation schedules. Data were included if participants met inclusion criteria for their respective studies and if the formant shifts they received were opposite or near-opposite each other in 2D formant space (separated by an angle of 180±20° when plotted together in F1/F2 space). Data from 91 participants met these criteria; 40 of these participants contributed to two of the included studies. All participants were native speakers of American English and reported no history of speech, hearing, or neurological disorders. Informed consent and consent to publish were obtained for all participants. The experimental protocols were approved by the Institutional Review Boards of the institutions from which data were collected: the University of Wisconsin–Madison, the Massachusetts Institute of Technology, the University of California, San Francisco, and the University of California, Berkeley. The University of Wisconsin–Madison Minimal Risk Research IRB approved our procedures to analyze the previously collected data (MRR IRB 2017-1509).
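
The opposite-direction criterion can be illustrated by computing the angle between the two shift vectors in F1/F2 space. The following is a minimal sketch, assuming each shift is given as an (F1, F2) pair in mels; the function names and the example values are hypothetical and are not taken from the original analysis code.

```python
import math

def shift_angle_deg(shift_a, shift_b):
    """Angle in degrees between two formant-shift vectors given as (F1, F2) pairs.

    Raises ZeroDivisionError if either shift has zero length.
    """
    dot = shift_a[0] * shift_b[0] + shift_a[1] * shift_b[1]
    norm_a = math.hypot(shift_a[0], shift_a[1])
    norm_b = math.hypot(shift_b[0], shift_b[1])
    # Clamp to [-1, 1] to guard against floating-point overshoot before acos.
    cos_angle = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cos_angle))

def shifts_are_opposed(shift_a, shift_b, tolerance_deg=20.0):
    """True if the two shifts are separated by 180 degrees, within the tolerance."""
    return abs(shift_angle_deg(shift_a, shift_b) - 180.0) <= tolerance_deg

# Hypothetical example: an F1-only pair and a pair with a small F2 component.
print(shifts_are_opposed((125.0, 0.0), (-125.0, 0.0)))
print(shifts_are_opposed((100.0, -50.0), (-100.0, 40.0)))
```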

Auditory perturbation

Details of the six studies are provided in Table 1. In all studies, participants spoke aloud monosyllabic English words containing the vowel /ɛ/ (as in head), which were presented as text on a screen. Simultaneously, participants heard real-time auditory feedback of their speech through headphones. On a pseudorandom subset of trials (25–50%), auditory feedback was altered with one of two real-time feedback perturbation systems, Audapter (Cai et al., 2008; Tourville et al., 2013) or Feedback Utility for Speech Production (FUSP) (Katseff et al., 2012; Parrell et al., 2017; Figure 1). Briefly, linear predictive coding (LPC) was used to model the vowel portion of the signal and apply a formant shift in real time during speech. Unaltered trials (50–75% of trials) underwent the same processing pipeline but with no alteration to the formants, such that auditory feedback in all trials had the same (minimal) delay. The magnitude and direction of the applied formant shift varied slightly across studies. Studies 1, 2, 3, and 4 shifted F1 upward and downward at a consistent magnitude (in mels or Hz) that was applied to all participants. Studies 5 and 6 each calculated participant-specific shift magnitudes for both F1 and F2 (in mels or Hz) along a vector pointing from the target vowel /ɛ/ to adjacent vowels /ɪ/ (as in hid) and /æ/ (as in had). For these studies, only the F1 portion of the vector was considered in the analysis; perturbations that increased F1 (/ɛ/ to /æ/) were considered ‘up’ shifts and perturbations that decreased F1 (/ɛ/ to /ɪ/) were considered ‘down’ shifts. All formant values were converted into mels for purposes of this analysis.
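
The text does not state which Hz-to-mel formula was used for the conversion. As an illustration only, the sketch below uses one widely used definition of the mel scale, 2595·log10(1 + f/700); both the choice of formula and the function names are assumptions.

```python
import math

def hz_to_mel(freq_hz):
    """Convert Hz to mels using one common formula (an assumption, see above)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

# Hypothetical example: an F1 of 600 Hz expressed in mels.
print(round(hz_to_mel(600.0), 1))
```

Because the mel scale is nonlinear, a fixed shift in Hz corresponds to a different number of mels depending on the starting frequency, which may be one reason shift magnitudes vary across participants when expressed in mels.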

Table 1
Summary of the included studies.

| | Study 1 (Parrell et al., 2017) | Study 2 (Parrell et al., 2021) | Study 3 (Niziolek and Parrell, 2021) | Study 4 (Niziolek and Parrell, 2021) | Study 5 (Niziolek and Guenther, 2013) | Study 6 (Niziolek et al., 2014) |
| --- | --- | --- | --- | --- | --- | --- |
| # of participants included in analysis | 14/14 | 13/15 | 40/40* | 40/40* | 11/18 | 15/17 |
| # of outliers | 1 | 1 | 0 | 0 | 0 | 0 |
| Words | beck, bet, deck, debt, pet, tech | dead, fed, said, shed | bed, dead, head | bed, dead, head | bed, bet, dead, deb, debt, ped, tech, ted | head |
| # of trials | 160 | 120 | 240 | 240 | 400 | 800 |
| # of perturbed trials | 80 (50%) | 60 (50%) | 80 (33.33%) | 80 (33.33%) | 100 (25%) | 400 (50%) |
| F1 shift magnitude (mels) | 123.6±10 | 125 | 125 | 125 | 107.9±29.9 | 94.3±6.8 |
| Perturbation method | FUSP | Audapter | Audapter | Audapter | Audapter | FUSP |

* The same group of participants contributed to both studies 3 and 4.

Behavioral measures and statistical analysis

Our primary measure of interest was one-shot adaptation, an adaptive response that persists in the trial following an isolated perturbation. In order to examine whether one-shot adaptation is related to feedback-based corrections on the previous trial, we additionally measured the online compensation response. These behavioral responses were examined at both the trial level and the participant level.

Trials with a vowel duration of less than 100 ms were excluded from analysis (<1%). Two participants were excluded from the analysis as outliers (average compensation or one-shot response >4 SD from mean).

Compensation

At the trial level, compensation response was operationalized as the mean normalized F1 produced during the 150–250 ms time window of trials in which a perturbation occurred (perturbation trials). More specifically, participant- and word-specific baseline F1 trajectories were first calculated from the F1 trajectories of unperturbed trials (baseline trials). The F1 trajectory of each perturbation trial was then normalized by subtracting the word-specific baseline mean F1 trajectory from it. The compensation response for each perturbation trial was then defined as the mean F1 value within 150–250 ms after vowel onset, after the typical onset latency of compensation. A 200–300 ms time window was originally planned for this analysis; however, only 46% of produced vowels had a duration of at least 300 ms, whereas 80% of vowels lasted until the end of the 150–250 ms time window.

Average compensation response was also calculated at the participant level, operationalized as a participant’s mean normalized F1 across the 150–250 ms window of their perturbation trials. Again, the F1 trajectory of each perturbation trial was normalized via a participant- and word-specific baseline. Then for each participant, two average F1 trajectories were calculated: one trajectory that averaged the normalized trajectories across all trials containing an upward perturbation and one trajectory that averaged across all trials containing a downward perturbation. The participant’s average compensation response for each perturbation direction (up and down) was calculated as the mean F1 value in the 150–250 ms time window after vowel onset of these averaged perturbation trajectories.
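
The normalization and windowing steps described above can be sketched as follows. This is a simplified illustration, not the original analysis code: it assumes F1 trajectories are lists of values sampled at a fixed step from vowel onset, truncates to the shortest trial when averaging, and treats the window as start-inclusive and end-exclusive. All names and numbers are hypothetical.

```python
from statistics import mean

def mean_trajectory(trajectories):
    """Point-wise mean across trials, truncated to the shortest trial (a simplification)."""
    n = min(len(t) for t in trajectories)
    return [mean(t[i] for t in trajectories) for i in range(n)]

def normalize(trial, baseline):
    """Subtract the word-specific baseline trajectory from a single trial."""
    n = min(len(trial), len(baseline))
    return [trial[i] - baseline[i] for i in range(n)]

def window_mean(trajectory, start_ms, end_ms, step_ms):
    """Mean of samples whose time from vowel onset t satisfies start_ms <= t < end_ms."""
    samples = [v for i, v in enumerate(trajectory) if start_ms <= i * step_ms < end_ms]
    return mean(samples) if samples else None

# Hypothetical data for one participant and one word, sampled every 50 ms.
baseline_trials = [[600, 600, 600, 600, 600, 600],
                   [610, 610, 610, 610, 610, 610]]
down_shifted_trial = [605, 605, 606, 610, 612, 615]

baseline = mean_trajectory(baseline_trials)
normalized = normalize(down_shifted_trial, baseline)
compensation = window_mean(normalized, 150, 250, 50)
print(compensation)
```

The participant-level measure follows the same pattern, except that the normalized trajectories are first averaged within each perturbation direction (for example with `mean_trajectory`) before the window mean is taken.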

In the trial level analysis, a linear mixed model was employed to investigate the effect of perturbation direction on compensation response: Compensation response ~ perturbation direction + (1 | participant) + (1 | study). Effect size was calculated by dividing β by the residual standard deviation. At the participant level, a paired t-test was used to evaluate the distribution of participants’ mean compensation response to upward perturbations vs. downward perturbations. Additional one-sample t-tests were conducted for each perturbation type against a mean of 0. Cohen’s d was calculated to determine effect size. Examining the entire 0–250 ms window, a cluster-based permutation test was used to find clusters of time points in which the compensation response for each condition differed from 0 and, separately, from each other (Maris and Oostenveld, 2007).
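
Written out, the random-intercept model above takes the following form. The notation is illustrative and does not come from the original analysis code: y is a trial's compensation response, x codes perturbation direction, and u and v are the participant and study intercepts for the participant p(i) and study s(i) that produced trial i. The absolute value in the effect size is an assumption, made because the reported d values are positive while β is negative.

```latex
y_{i} = \beta_0 + \beta\, x_{i} + u_{p(i)} + v_{s(i)} + \varepsilon_{i},
\qquad
u_{p} \sim \mathcal{N}(0, \sigma_u^2),\quad
v_{s} \sim \mathcal{N}(0, \sigma_v^2),\quad
\varepsilon_{i} \sim \mathcal{N}(0, \sigma_\varepsilon^2)

d = \frac{|\beta|}{\sigma_\varepsilon}
```

Under this reading, the reported β=–6.93 and d=0.21 would correspond to a residual standard deviation of roughly 33 mels.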

One-shot adaptation

At the trial level, one-shot adaptation response was calculated as the mean normalized F1 produced in the first 100 ms of unperturbed trials that occurred directly after a perturbed trial (post-perturbation trials). Again, participant- and word-specific baseline trajectories were calculated, though using F1 trajectories from unperturbed trials that directly followed another unperturbed trial (baseline trials). The F1 trajectories of each post-perturbation trial were then normalized by subtracting the word-specific baseline mean F1 trajectory. The one-shot adaptation response for each post-perturbation trial was calculated as the mean F1 value in the first 100 ms of the normalized trajectory. Only F1 values from the initial 100 ms of the vowel were included, limiting the influence of auditory-based feedback control mechanisms, which have a latency of 100–150 ms in speech (Cai et al., 2012; Parrell et al., 2017; Tourville et al., 2008).

At the participant level, the one-shot adaptation response was calculated as a participant’s mean normalized F1 in the first 100 ms of their average post-perturbation trial F1 trajectory. Again, the F1 trajectory of each post-perturbation trial was normalized via a participant- and word-specific baseline. Then, for each participant, two average F1 trajectories were calculated: one trajectory that averaged the normalized trajectories across all trials that occurred after an upward perturbation and one trajectory that averaged across all trials that occurred after a downward perturbation. The participant’s average one-shot adaptation response for each perturbation direction (up and down) was calculated as the mean F1 value in the first 100 ms of these averaged post-perturbation trajectories.
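
The trial categories used here (perturbed, post-perturbation, and baseline) depend only on the perturbation applied on each trial and on the trial before it. The following is a minimal sketch of that bookkeeping, assuming the trial sequence is available as a list of 'up', 'down', or None (unperturbed) in presentation order. The label strings are hypothetical, and leaving the first trial unclassified is an assumption, since it has no preceding trial.

```python
def label_trials(perturbations):
    """Label each trial from its own perturbation and that of the preceding trial."""
    labels = []
    for i, current in enumerate(perturbations):
        previous = perturbations[i - 1] if i > 0 else None
        if current is not None:
            # Perturbed trial: used for the compensation measure.
            labels.append("perturbed-" + current)
        elif i == 0:
            # No preceding trial, so neither baseline nor post-perturbation.
            labels.append("unclassified")
        elif previous is None:
            # Unperturbed trial directly after another unperturbed trial.
            labels.append("baseline")
        else:
            # Unperturbed trial directly after a perturbed trial.
            labels.append("post-" + previous)
    return labels

sequence = [None, None, "up", None, "down", None, None]
print(label_trials(sequence))
```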

At the trial level, a linear mixed model was employed to investigate the effect of perturbation direction on one-shot adaptation: One-shot adaptation response ~ perturbation direction + (1 | participant) + (1 | study). Effect size was calculated by dividing β by the residual standard deviation. At the participant level, a paired t-test was implemented to assess the distribution of participants’ mean one-shot adaptation response to upward perturbations vs. downward perturbations. Additional one-sample t-tests were conducted for each post-perturbation type against a mean of 0. Cohen’s d was calculated to determine effect size and conduct a power estimation. As in the compensation analysis, a cluster-based permutation test identified clusters of time points in which the adaptation response for each condition differed from 0 and, separately, from each other.
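
For the participant-level comparison, the paired t statistic can be computed directly from each participant's two averages. The sketch below uses only the Python standard library and returns the t statistic and degrees of freedom; it is an illustration rather than the original R analysis, and the input values are hypothetical. Obtaining a p-value additionally requires the t distribution (for example via `scipy.stats.ttest_rel`, if SciPy is available).

```python
import math
from statistics import mean, stdev

def paired_t(x, y):
    """Paired t statistic and degrees of freedom for two equal-length samples."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1

# Hypothetical per-participant averages (mels) after up and down shifts.
post_up = [-2.0, -1.0, -3.0, 0.5]
post_down = [1.0, 0.5, 0.0, 1.5]
t_stat, dof = paired_t(post_up, post_down)
print(round(t_stat, 2), dof)
```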

Relationship between behavioral responses

In order to assess the relationship between compensation and the one-shot adaptation that followed it, we fitted a linear mixed effects model to one-shot adaptation with compensation, perturbation magnitude, and perturbation condition as fixed factors and with participant as a random intercept. To maintain a standardized magnitude measure between the two perturbation directions, compensation and one-shot adaptation responses from upward-shifted trials were multiplied by –1, removing the directional difference between up- and down-perturbation conditions. Separate analyses were conducted at the participant level (averaging across all trials) and at the individual trial level. To avoid problems in the linear models caused by predictors of very different scales, each perturbation magnitude was normalized by dividing by the mean of all perturbation magnitudes across participants. Study was not included as a separate random intercept in the model as it introduced singularity to the model due to its collinearity with participant and shift magnitude.
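
The two preprocessing steps described above, sign correction and magnitude normalization, amount to simple arithmetic. The following is a minimal sketch with hypothetical function names and example values.

```python
from statistics import mean

def sign_correct(response, direction):
    """Flip the sign of responses to upward shifts so that opposing responses are positive."""
    return -response if direction == "up" else response

def normalize_magnitudes(magnitudes):
    """Divide each shift magnitude by the mean magnitude across participants."""
    grand_mean = mean(magnitudes)
    return [m / grand_mean for m in magnitudes]

# Hypothetical values: a response opposing an upward shift is negative before correction.
print(sign_correct(-3.99, "up"))
print(sign_correct(2.69, "down"))
print(normalize_magnitudes([100, 125, 75]))
```

After sign correction, a positive value always means a response that opposes the perturbation, so up- and down-shift data can enter the same model.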

At the trial level, compensation response was intended to be included as a random slope by participant; however, this slope was removed because the model failed to converge. As a separate test of the within-subject relationship between compensation and one-shot adaptation, a one-sample t-test was conducted on participant correlations between their compensation and adaptation responses (sign-corrected). Pearson’s r was calculated for each participant, correlating compensation with subsequent adaptation at the trial level. A Fisher transformation was used to convert the correlation coefficients into z-scores prior to running the one-sample t-test; the mean was converted back to an r value for interpretation here.
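
The within-participant correlation test can be sketched as follows: compute Pearson's r per participant, apply the Fisher transformation (the inverse hyperbolic tangent), run a one-sample t-test on the transformed values, and convert the mean back to r. This is a standard-library illustration under those stated steps, not the original R code; function names and inputs are hypothetical.

```python
import math
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fisher_group_test(r_values):
    """One-sample t statistic on Fisher z-scores, plus the back-transformed mean r.

    math.atanh raises ValueError for r of exactly 1 or -1.
    """
    z_scores = [math.atanh(r) for r in r_values]
    n = len(z_scores)
    t_stat = mean(z_scores) / (stdev(z_scores) / math.sqrt(n))
    return t_stat, math.tanh(mean(z_scores))

# Hypothetical per-participant correlation coefficients.
t_value, mean_r = fisher_group_test([0.1, 0.2, 0.3])
print(round(t_value, 2), round(mean_r, 3))
```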

All statistical analysis was conducted in R (R Development Core Team, 2020). Linear mixed effects models and their simplest explanatory models (calculated via stepwise regression) were generated using the lme4 package (Bates et al., 2015). Statistical significance of the final model was assessed with the lmerTest package, which uses the Satterthwaite method to estimate degrees of freedom (Kuznetsova et al., 2017). Power analyses for t-tests were conducted with the pwr package (Champely, 2020). Correlation between compensation and one-shot adaptation was then assessed with a Pearson’s r correlation coefficient using the MuMIn package (Barton, 2020). Effect sizes were calculated using the effectsize package (Ben-Shachar et al., 2020). Data and analysis code are available at https://github.com/blab-lab/postMan (Hantzsch, 2022a; copy archived at swh:1:rev:6cf539d0662552f27d1560a250285e49edde82c4). Some of the functions rely on additional code available at https://github.com/carrien/free-speech (Hantzsch, 2022b; copy archived at swh:1:rev:e065de8fa8c49ac9795f1865df5d171f0869666a).

Data availability

Data and analysis code are available on GitHub at https://github.com/blab-lab/postMan (copy archived at swh:1:rev:6cf539d0662552f27d1560a250285e49edde82c4).

References

  1. Cai S, Boucek M, Ghosh S, Guenther FH, Perkell J (2008). A system for online dynamic perturbation of formant trajectories and results from perturbations of the Mandarin triphthong /iau/. Proceedings of the 8th International Seminar on Speech Production. pp. 65–68.
  2. Niziolek CA, Nagarajan SS, Houde JF (2014). Sensorimotor adaptation in speech and its effects on auditory monitoring. Program No. 631.14. Neuroscience Meeting Planner. Washington, DC: Society for Neuroscience, 2014.

Decision letter

  1. Jörn Diedrichsen
    Reviewing Editor; Western University, Canada
  2. Barbara G Shinn-Cunningham
    Senior Editor; Carnegie Mellon University, United States
  3. Jörn Diedrichsen
    Reviewer; Western University, Canada

Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "A single exposure to altered auditory feedback causes observable sensorimotor adaptation in speech" for consideration by eLife. Your article has been reviewed by 2 peer reviewers, including Jörn Diedrichsen as Reviewing Editor and Reviewer #1, and the evaluation has been overseen by Barbara Shinn-Cunningham as the Senior Editor.

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

1) The relationship between online correction and subsequent adaptation across perturbation sizes needs to be clarified – especially the between and within-person relationships need to be more cleanly separated – see detailed comments by both reviewers.

2) The adaptation response in each direction (up and down perturbation) should be tested against zero.

3) The authors should discuss to what degree adaptation to random perturbations can serve as a good model of adaptation response occurring during systematic perturbations. For reaching there is an extensive literature on potential differences and similarities between random and systematic perturbations.

Reviewer #1 (Recommendations for the authors):

Line 143: Was there a difference between same-word and different-word adaptation in this study?

Line 277-288: It would be good to clarify both in the results and here in the beginning of the paragraph that the compensation, one-shot adaptation response, and perturbation are entered in the model corrected for perturbation direction. This information comes a bit late.

Line 277-288: Was perturbation size included in the model as a continuous variable, or in a one-hot categorical encoding? It seems that the range was 96-125 mels across studies, with one size per study? When you report a significant interaction between perturbation size and compensatory response, what shape did the interaction take? It would be good to show this result broken up by study, so the reader can judge to what degree this may be driven by differences between studies.

Line 289ff: Using a Monte Carlo simulation to test for within-subject effects is somewhat indirect. Why not use the following simpler and (in my opinion) more convincing analysis: estimate the slope of the compensation-adaptation relationship for each subject separately? Then one can use a one-sample t-test of all the slopes against zero, or test for differences across perturbation sizes.

Reviewer #2 (Recommendations for the authors):

L 183 What does perturbations being separated by 180 deg in F1/F2 space mean? Please explain.

L 101 Responses to upward and downward shifts are reported to be statistically different from one another. However, it would be important to show that each shift is reliably different from zero. Is this the case? ± values are given. Are these standard errors, and if not, what are they? Standard errors would be helpful.

Figure 2. What is indicated by the error bars in the time-series plots?

L 123 The interaction effect that is described starting on L 123 needs to be better explained. Ditto for the claim that a Monte Carlo simulation suggests this is not a between subjects effect.

https://doi.org/10.7554/eLife.73694.sa1

Author response

Essential revisions:

1) The relationship between online correction and subsequent adaptation across perturbation sizes needs to be clarified – especially the between- and within-person relationships need to be more cleanly separated – see detailed comments by both reviewers.

We have clarified the results of our trial-level linear model, explaining the effect of perturbation size on the relationship between online compensation and subsequent adaptation (lines 130-131). We have also followed the advice of Reviewer 1, replacing our Monte Carlo simulation analysis with an evaluation of within-person trial-wise correlations between compensation and adaptation. This analysis did yield evidence that these correlation coefficients were reliably larger than zero across participants, indicating a relationship at the trial level. We have amended our discussion to address this finding (lines 166-177).

2) The adaptation response in each direction (up and down perturbation) should be tested against zero.

We have added this analysis: the adaptation response was significantly different from 0 in the pre-defined time window of interest for trials following upward perturbations (post-up). The response was numerically but not significantly larger than 0 in this time window for trials following downward perturbations (post-down); however, a cluster-based permutation analysis that considered all time points across the syllable showed significant differences from 0 in both post-up and post-down conditions (see horizontal bars on Figure 2A). We have added this result to lines 120-122 and describe the method in lines 263-266 and 295-297.
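To illustrate the general idea of a cluster-based permutation test against zero, here is a minimal sketch. It is not the authors' implementation: the sign-flipping scheme, the cluster-mass statistic (summed |t| over adjacent supra-threshold time points), the function names, and the demo data are all assumptions made for illustration, and only the largest cluster is tested.

```python
import math
import random
from statistics import mean, stdev


def t_values(data):
    """One-sample t statistic against 0 at each time point.

    data: one row per participant, one column per time point.
    """
    n = len(data)
    ts = []
    for column in zip(*data):
        sd = stdev(column)
        # Simplification (assumption): a zero-variance time point gets t = 0.
        ts.append(mean(column) / (sd / math.sqrt(n)) if sd > 0 else 0.0)
    return ts


def max_cluster_mass(ts, threshold):
    """Largest summed |t| over runs of adjacent, same-sign supra-threshold points."""
    best = current = 0.0
    prev_sign = 0
    for t in ts:
        sign = (t > threshold) - (t < -threshold)
        if sign != 0 and sign == prev_sign:
            current += abs(t)      # extend the current cluster
        elif sign != 0:
            current = abs(t)       # start a new cluster
        else:
            current = 0.0          # below threshold: no cluster here
        prev_sign = sign
        best = max(best, current)
    return best


def cluster_permutation_p(data, threshold, n_perm=1000, seed=0):
    """Sign-flip permutation p-value for the largest observed cluster."""
    rng = random.Random(seed)
    observed = max_cluster_mass(t_values(data), threshold)
    exceed = 0
    for _ in range(n_perm):
        flipped = []
        for row in data:
            s = rng.choice((-1, 1))  # flip a whole participant's time series
            flipped.append([s * v for v in row])
        if max_cluster_mass(t_values(flipped), threshold) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Hypothetical data: 10 participants x 6 time points, with an effect at points 2-3.
demo = []
for i in range(10):
    noise = 0.3 * (1 + 0.1 * i) * (-1) ** i
    effect = 1.0 + 0.05 * i
    demo.append([noise, 0.5 * noise, effect, effect + 0.1, noise, -noise])

# 2.262 is the two-tailed 5% critical t value for 9 degrees of freedom.
mass, p_value = cluster_permutation_p(demo, threshold=2.262)
```

Flipping the sign of each participant's entire time series preserves the temporal structure within a participant while simulating the null hypothesis of a zero-centered response, which is why the test can consider all time points without a separate correction per point.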

3) The authors should discuss to what degree adaptation to random perturbations can serve as a good model of adaptation response occurring during systematic perturbations. For reaching there is an extensive literature on potential differences and similarities between random and systematic perturbations.

We have added a brief discussion of this point while attempting to remain within word limits. Specifically, the first paragraph of the discussion has been expanded to discuss how random perturbations very similar to those used here have been used extensively in the reaching literature studying one-shot learning (e.g., Diedrichsen et al., 2005; Joiner et al., 2017; Ruttle et al., 2021), and how inconsistent perturbations have recently been shown to decrease adaptation (Albert et al., 2021). If anything, this suggests that we may be underestimating the magnitude of the adaptation response to more consistent perturbations. As the aim of our study was specifically to demonstrate the existence of adaptation following a single exposure to altered auditory feedback, we feel that this is not a major issue.

Reviewer #1 (Recommendations for the authors):

Line 143: Was there a difference between same-word and different-word adaptation in this study?

Although the proportion of perturbed trials followed by the same vs. different words is close to 50% on average, this proportion is very different across studies, ranging from less than 15% to 100%; in other words, this comparison would, in effect, compare trials from studies with high word overlap with trials from studies with low word overlap (rather than comparing the two types of trials within individuals within a given study). Given that studies varied in many parameters including shift magnitude and proportion of shifted trials, unfortunately, we do not think the datasets used here are appropriate for the proposed analysis.

Line 277-288: It would be good to clarify both in the results and here in the beginning of the paragraph that the compensation, one-shot adaptation response, and perturbation are entered in the model corrected for perturbation direction. This information comes a bit late.

We have moved this information earlier in this paragraph and in the results.

Line 277-288: Was perturbation size included in the model as a continuous variable, or in a one-hot categorical encoding? It seems that the range was 96-125 mels across studies, with one size per study? When you report a significant interaction between perturbation size and compensatory response, what shape did the interaction take? It would be good to show this result broken up by study, so the reader can judge to what degree this may be driven by differences between studies.

Perturbation size was included as a continuous variable, as it was not always constant within a study. The interaction between perturbation magnitude and compensation was such that a stronger perturbation magnitude was related to a stronger effect of compensation on one-shot adaptation (now clarified on lines 130-131).

Line 289ff: Using a Monte Carlo simulation to test for within-subject effects is somewhat indirect. Why not use the following simpler and (in my opinion) more convincing analysis: estimate the slope of the compensation-adaptation relationship for each subject separately? Then one can use a one-sample t-test of all the slopes against zero, or test for differences across perturbation sizes.

We thank the reviewer for the suggestion; we have replaced the Monte Carlo simulation with a one-sample t-test of participant compensation-adaptation correlations. The correlation coefficients were converted to z-scores via a Fisher transform prior to running the one-sample t-test. This analysis did yield a distribution of correlation coefficients that was significantly greater than 0, lending more evidence to a trial-level relationship; this is now explained and discussed in lines 133-139 & 166-177.
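The analysis described here can be sketched as follows: compute one correlation per participant, apply the Fisher transform z = atanh(r), and compute a one-sample t statistic on the z-scores. This is a minimal illustration rather than the authors' code; the function names and the per-participant (compensation, adaptation) values are hypothetical, and the sketch stops at the t statistic and degrees of freedom without computing a p-value.

```python
import math
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z(r):
    """Fisher transform of a correlation coefficient: z = atanh(r)."""
    return math.atanh(r)


def one_sample_t(values, mu=0.0):
    """t statistic and degrees of freedom for a one-sample test against mu."""
    n = len(values)
    t = (mean(values) - mu) / (stdev(values) / math.sqrt(n))
    return t, n - 1


# Hypothetical trial-wise (compensation, adaptation) values for three participants.
participants = [
    ([10.0, 4.0, 7.0, 1.0], [3.0, 1.0, 1.5, 0.5]),
    ([5.0, 8.0, 2.0, 6.0], [1.0, 2.5, 1.5, 1.0]),
    ([3.0, 9.0, 6.0, 1.0], [0.5, 2.0, 2.5, 1.0]),
]
z_scores = [fisher_z(pearson_r(c, a)) for c, a in participants]
t_stat, dof = one_sample_t(z_scores)
```

The transform is applied because correlation coefficients are bounded between -1 and 1 and their sampling distribution is skewed, whereas the transformed values are approximately normal and better suited to a t-test.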

Reviewer #2 (Recommendations for the authors):

L 183 What does perturbations being separated by 180 deg in F1/F2 space mean? Please explain.

Some studies included perturbations in both the first and second formant (F1 and F2). These perturbations can be conceptualized as a vector, using the F1 perturbation on the x-axis and the F2 perturbation on the y-axis (i.e., plotting in F1/F2 space). Thus, if two perturbations are exactly opposite of each other, the two vectors should be 180 degrees from each other in this plot, pointing in opposite directions. We have clarified this point on line 198.
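As a small worked illustration of this vector view, the angle between two perturbations in F1/F2 space can be computed from their (F1, F2) shift components. The function name and the example shift values below are hypothetical and are not taken from the studies.

```python
import math


def angle_between_deg(p, q):
    """Angle in degrees between two (F1 shift, F2 shift) perturbation vectors."""
    dot = p[0] * q[0] + p[1] * q[1]
    cross = p[0] * q[1] - p[1] * q[0]
    # atan2 of |cross| and dot gives the unsigned angle in [0, 180] degrees.
    return math.degrees(math.atan2(abs(cross), dot))


# Hypothetical perturbations (in mels): equal size, opposite direction.
up = (100.0, 50.0)
down = (-100.0, -50.0)
opposite_angle = angle_between_deg(up, down)
```

Two perturbations of this kind point in exactly opposite directions, so the computed angle is 180 degrees regardless of how the shift is split between F1 and F2.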

L 101 Responses to upward and downward shifts are reported to be statistically different than one another. However, it would be important to show that each shift is reliably different than zero. Is this the case?

We now report cluster-based permutation tests to show differences from 0 for all conditions (see explanation above).

± values are given. Are these standard errors, and if not, what are they? Standard errors would be helpful.

We have converted these values to standard error (originally standard deviation), and specified this on line 102.
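For reference, the standard error of the mean is the sample standard deviation divided by the square root of the number of observations. The sketch below shows that relationship; the function name and sample values are hypothetical, and it is an assumption that the observations are per-participant values.

```python
import math
from statistics import stdev


def standard_error(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return stdev(values) / math.sqrt(len(values))


# Hypothetical per-participant responses (in mels).
responses = [1.0, 2.0, 3.0, 4.0]
se = standard_error(responses)
```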

Figure 2. What is indicated by the error bars in the time-series plots?

The error bars indicate standard error across participants. This is now specified in the figure caption.

L 123 The interaction effect that is described starting on L 123 needs to be better explained. Ditto for the claim that a Monte Carlo simulation suggests this is not a between subjects effect.

We have clarified the interaction effect, which showed that the relationship between compensation and one-shot adaptation was moderated by shift magnitude: greater shift magnitudes elicited a stronger effect of compensation at the trial level. We have replaced our Monte Carlo simulation with an analysis of within-participant correlation coefficients, as suggested by Reviewer 1. As described above, we now include a discussion of the finding that the distribution of these coefficients is significantly greater than 0, lending additional evidence for a trial-level relationship in the majority of participants.

https://doi.org/10.7554/eLife.73694.sa2

Article and author information

Author details

  1. Lana Hantzsch

    Waisman Center, University of Wisconsin–Madison, Madison, United States
    Contribution
    Data curation, Formal analysis, Methodology, Software, Visualization, Writing – original draft, Writing – review and editing
    Competing interests
    No competing interests declared
  2. Benjamin Parrell

    1. Waisman Center, University of Wisconsin–Madison, Madison, United States
    2. Department of Communication Sciences and Disorders, University of Wisconsin–Madison, Madison, United States
    Contribution
    Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Software, Supervision, Visualization, Writing – review and editing
    For correspondence
    bparrell@wisc.edu
    Competing interests
    No competing interests declared
    ORCID: 0000-0003-2610-2884
  3. Caroline A Niziolek

    1. Waisman Center, University of Wisconsin–Madison, Madison, United States
    2. Department of Communication Sciences and Disorders, University of Wisconsin–Madison, Madison, United States
    Contribution
    Software, Conceptualization, Data curation, Formal analysis, Supervision, Visualization, Writing – review and editing, Funding acquisition, Investigation, Writing – original draft, Project administration
    For correspondence
    cniziolek@wisc.edu
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-6085-1371

Funding

National Institutes of Health (DC014520)

  • Caroline A Niziolek

National Institutes of Health (DC017091)

  • Benjamin Parrell

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

This work was funded by National Institutes of Health grants R01 DC017091 (BP) and R00 DC014520 (CAN).

Ethics

Informed consent and consent to publish were obtained from all participants. The experimental protocols were approved by the Institutional Review Boards of the institutions from which data were collected: the University of Wisconsin-Madison, the Massachusetts Institute of Technology, the University of California, San Francisco, and the University of California, Berkeley. The University of Wisconsin-Madison Minimal Risk Research IRB approved our procedures to analyze the previously collected data (MRR IRB 2017-1509).

Senior Editor

  1. Barbara G Shinn-Cunningham, Carnegie Mellon University, United States

Reviewing Editor

  1. Jörn Diedrichsen, Western University, Canada

Reviewer

  1. Jörn Diedrichsen, Western University, Canada

Publication history

  1. Preprint posted: July 26, 2021
  2. Received: September 9, 2021
  3. Accepted: June 20, 2022
  4. Accepted Manuscript published: July 11, 2022 (version 1)
  5. Version of Record published: July 21, 2022 (version 2)

Copyright

© 2022, Hantzsch et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.



Cite this article

  1. Lana Hantzsch
  2. Benjamin Parrell
  3. Caroline A Niziolek
(2022)
A single exposure to altered auditory feedback causes observable sensorimotor adaptation in speech
eLife 11:e73694.
https://doi.org/10.7554/eLife.73694
