Introduction

Speech planning is an important part of human communication and the inability to plan speech is manifest in disorders such as apraxia. But to what extent is targeted vocal planning an exclusively human ability? Many animals are capable of volitional control of vocalizations (1,2), but are they also capable of planning to selectively adapt their vocalizations towards a target, such as when striving to reduce the pitch mismatch of a note in a song? Target-specific vocal planning is a cognitive ability that requires extracting or recalling a sensory target and forming or selecting the required motor actions to reach the target. Such planning can be covert or overt. Evidence for covert planning is manifest when a targeted motor change is executed without intervening practice, e.g., when we instantly imitate a word upon first hearing it. Overt planning, by contrast, includes practice, but without access to the sensory experience from which target mismatch could be computed, e.g., when we practice a piano piece by tapping on a table.

The vocal planning abilities in animals and their dependence on sensory experience remain poorly explored. Motor learning has been mostly studied in tasks where a skilled behavioral response must be produced on the spot, such as when a visual target must be hit by a saccade or by an arm reaching movement (3–6). In this context, motor planning has been shown to enhance motor flexibility, as it allows separation of motor memories when there are conflicting perturbations (7). However, for developmental behaviors such as speech or birdsong that rely on hearing a target early in life (8,9), the roles of practice and of sensory feedback for flexible vocal control and for target-directed adaptation are unknown.

Recovery of a once-learned vocal skill could also be instantaneous (covert), or it might require practice (overt). In support of the former, many motor memories are long-lasting (10), e.g., we can recall the happy-birthday song for years without practice. Some memories are even hard to get rid of, such as accents in a foreign language. By contrast, practice-dependent but feedback-independent recovery is argued for by arm reaching movements: following adaptation to biasing visual feedback, arm movements recover when the bias is either removed or the visual error is artificially clamped to zero (4,5). One explanation put forward is that motor adaptation is volatile and has forgetting built in (6,11), leading to practice-dependent reappearance of the original motor program even without informative feedback (11). Given these possibilities, we set out to probe songbirds’ skills of recovering their developmental song target when deprived of either practice or sensory feedback.

Adult vocal performances in songbirds can be altered by applying external reinforcers such as white-noise stimuli (12,13). When the reinforcer is withdrawn, birds recover their original song within hundreds of song attempts (12,14–16). We argued that these attempts may be unnecessary and birds could recover their original performance by recalling either 1) the original motor program (17–19), or 2) its sensory representation (20,21) plus the mapping required to hit it (15), or 3) the sensory target (22) and the circuit for translating that into the original program (14) (Fig. 1A). These options might not need sensory feedback. Birds’ large perceptual song memory capacity (23) would argue for such possibilities. That is, birds’ song practice may mainly be an expression of deliberate playfulness (24), conferring the skill of vocal flexibility rather than serving to reach a target, as evidenced by young birds that explore vocal spaces close to orthogonal to the song-learning direction (25) and are already surprisingly capable of adult-like singing when appropriately stimulated (26).

Figure 1. Recovery of pitch target requires practice.

(A) Three hypotheses on birds’ ability to recover a song target away from their current vocal output (green): 1) they could recall its motor program (M), or 2) its sensory representation (S) plus the mapping (left black arrow) required to hit it, or 3) the sensory target (T) and the circuit for translating that into the original motor program (arrows). (B) WNm birds were first pitch-reinforced using white noise (WN), then muted, and subsequently unmuted. Pitch recovery from the reinforced (R) state towards the baseline (B) target is evaluated in early (E, no practice) and late (L, with practice) analysis windows (all windows are time-aligned to the first 2 h of songs after withdrawal of reinforcement, E) and compared to recovery in unmuted control birds (WNC). (C) Syllable pitches (dots, red = reinforced syllables) of an example bird that while muted recovered only about 27% of pitch difference to baseline despite three spontaneous unmuting events (arrows). (D) Same bird, spectrograms of example song motifs from 5 epochs: during baseline (B), reinforcement (R) with WN (green bar), spontaneous unmuting (spont. unmut), and during permanent unmuting (early, E, and late, L). (E) Example syllables from same 5 epochs. (F) Stack plot of pitch traces (pitch indicated by color, see color scale) of the first 40 targeted syllables in each epoch (‘reinforced’: only traces without WN are shown). (G) Average pitch traces from (F), revealing a pitch increase during the pitch-measurement window (dashed black lines) and pitch recovery late after unmuting. (H) WNm birds (blue lines, N = 8) showed a normalized residual pitch (NRP) far from zero several days after reinforcement (circles indicate unmuting events, arrow shows bird from C) unlike WNC birds (gray lines, N = 18). (I) Violin plots of same data restricted to early and late analysis windows (***p < 0.001, *p < 0.05, two-tailed t-test of NRP = 0).

Results

To test whether birds can recover a song syllable without practice, we first reinforced the pitch of a song syllable away from baseline and then suppressed birds’ singing capacity for a few days by muting their vocal output. We then tested whether the subsequently unmuted song had covertly reverted to the original target. We used syllable pitch as the targeted song feature, because we found that birds did not reliably recover syllable duration in experiments in which we induced them to shorten or lengthen syllable duration (Fig. S1).

We first drove pitch away from baseline by at least one standard deviation using a white-noise (WN) stimulus delivered whenever the pitch within a 16-ms time window locked to the targeted syllable was above or below a manually set threshold (Fig. 1B, see Methods). We muted these WNm (white-noise reinforced and muted) birds by implanting a bypass cannula into the abdominal air sac (see Methods). In muted birds, air leaks from the abdominal air sac and, as a result, sub-syringeal air pressure does not build up to exceed the threshold level required for the self-sustained syringeal oscillations (27) that underlie singing. The physical absence of such oscillations essentially strips muted birds of all pitch experience.

We muted two birds right after the WN-driven pitch change. After keeping the birds in the muted state for four days, we permanently unmuted them to record their undisturbed songs. We observed that the two birds had recovered a mere 10% and −6% of their total WN-driven pitch change. We hypothesized that unreinforced singing might be needed to initiate a song recovery process in WNm birds, a process that we assumed birds might then be able to complete while mute. Therefore, we allowed the remaining 6/8 WNm birds to sing a few hundred target syllables without reinforcement before muting them.

In some cases, the bypass cannula got clogged during the muted period and birds were spontaneously unmuted, allowing them to produce a few songs before we reopened the cannula (Fig. 1C-G). These spontaneous unmuting events were not detrimental to our experimental procedures, as they allowed us to inspect birds’ current song motor program (Fig. 1C).

After spending 5.1 ± 1.6 days (range 3–8 d, N = 8) in the muted state and upon unmuting, WNm birds displayed an average normalized residual pitch (NRP) of 89%, which was far from baseline (p = 6.2 · 10^−8, tstat = −23.6, N = 8 birds, two-sided t-test of H0: NRP = 0%, songs analyzed in the 2 h early (E) time window, see Methods, Fig. 1), suggesting that in the muted state, birds are unable to recover their pre-reinforced songs. The average NRP in WNm birds was comparable to that of unmanipulated control (WNC) birds within the first 2 h after withdrawal of the reinforcer (average NRP = 91%, p = 3.7 · 10^−11, tstat = 14.8, two-sided t-test for NRP = 0%, N = 18 WNC birds). Indeed, during 5 days without song practice, birds recovered no more pitch distance than birds normally do within the first 2 h of release from reinforcement (p = 0.82, tstat = −0.23, N = 8 WNm and N = 18 WNC birds, two-sided t-test). These findings did not depend sensitively on the size of the analysis window: we also tested windows of 4 h and of 24 h.
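
The NRP statistic reported here can be illustrated with a short sketch. It assumes, since the formula is not spelled out in this section, that NRP is the pitch deviation from baseline that remains in the test window, expressed as a percentage of the reinforcement-induced shift (100% means no recovery, 0% means full recovery). The function names and pitch values are hypothetical, and only the t statistic is computed, not the p-value.

```python
import math
import statistics


def normalized_residual_pitch(test_pitch, baseline_pitch, reinforced_pitch):
    """Assumed NRP definition, in percent: 100 = no recovery, 0 = full recovery."""
    shift = reinforced_pitch - baseline_pitch
    if shift == 0:
        raise ValueError("reinforced pitch equals baseline; NRP is undefined")
    return 100.0 * (test_pitch - baseline_pitch) / shift


def one_sample_t(values, mu0=0.0):
    """t statistic and degrees of freedom for H0: mean(values) == mu0."""
    n = len(values)
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    return (mean - mu0) / sem, n - 1


# Hypothetical mean pitches in Hz per bird: (baseline, reinforced, test window).
birds = [(600.0, 630.0, 627.0), (580.0, 560.0, 562.0), (610.0, 640.0, 636.0)]
nrps = [normalized_residual_pitch(t, b, r) for b, r, t in birds]
t_stat, df = one_sample_t(nrps, mu0=0.0)
print([round(x, 1) for x in nrps], round(t_stat, 2), df)
```

Dividing by the signed shift makes up-shifted and down-shifted birds comparable, which is why the second hypothetical bird (shifted down) yields the same NRP as the first (shifted up).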

Subsequently, after 4 days of unmuted song experience (roughly 9 days after withdrawal of WN), WNm birds displayed an average NRP of 30%, which was significantly different from their NRP within the first 2 h after unmuting (p = 3 · 10^−4, tstat = 4.83, N = 8 birds, two-tailed t-test early (E) vs. late (L) time window) but still significantly different from zero (p = 0.04, tstat = 2.59, N = 8 birds, two-tailed t-test, late (L) time window). Overall, these findings suggest that either motor practice, sensory feedback, or both, are necessary for recovery of baseline song.

Next, we therefore tested whether motor experience but not sensory experience is necessary for recovery, similar to arm reaching movements that can be restored without guiding feedback (4,28). In a second group of WNd birds, we provided slightly more song experience (Fig. 2). Instead of muting, WNd birds were deafened through bilateral cochlea removal. This latter manipulation does not suppress the act of singing as does muting, but it eliminates auditory feedback from singing. Deaf birds could gain access to some pitch information via somatosensory stretch and vibration receptors and/or air pressure sensing (29). Our aim was to test whether such putative pitch correlates are sufficient for recovery of baseline pitch (Fig. 2A). However, in the deaf state, WNd birds did not recover baseline pitch even after 4 days of song practice: on the 5th day (late, L) after deafening, their average NRP was still 50%, which was different from zero (p = 0.03, tstat = 2.73, two-tailed t-test of H0: NRP = 0%, N = 10, Fig. 2D) and significantly larger than the average NRP of WNC birds on the 5th day since withdrawal of reinforcement (difference in NRP = 49%, p = 0.003, tstat = 3.34, df = 26, N = 10 WNd and N = 18 WNC birds, two-tailed t-test).

Figure 2. Recovery of pitch target is impaired after deafening.

(A) WNd birds were first pitch-reinforced using white noise (WN) and then deafened by bilateral cochlea removal. Analysis windows (letters) as in Fig. 1. (B) Syllable pitches (dots, red=reinforced syllables) of example WNd bird that shifted pitch down by d’ = −2.7 during WN reinforcement and subsequently did not recover baseline pitch during the test period. (C) WNd birds (N = 10) do not recover baseline pitch without auditory feedback (circles=early window after deafening events, cross=late). (D) Violin plots of same data restricted to early and late analysis windows (***p < 0.001, *p < 0.05, two-tailed t-test of NRP = 0).

We speculated that the lack of pitch recovery in WNd birds could be attributable to the sudden deafening experience, which might be too overwhelming for birds to uphold the plan to recover the original pitch target. WNd birds did not sing for an average of 2.3 ± 1.1 days (range 1 to 4 days) after the deafening surgery, which is a strong indication of an acute stressor (30). We thus inspected a third group of birds (dLO, Fig. 3) taken from (31) that learned to shift pitch while deaf and that underwent no invasive treatment between the pitch reinforcing experience and the test period of song recovery.

Figure 3. Deaf birds do not recover pitch target after light-induced mismatch.

(A) dLO birds were first deafened and then pitch-reinforced using a brief light-off (LO) stimulus. Analysis windows (letters) as in Fig. 1. (B) Syllable pitches (dots, blue=LO-reinforced syllables) of example dLO bird that shifted pitch up by d’ = 3.5 within a week, but showed no signs of pitch recovery during the test period. (C) dLO birds (N = 8) do not recover baseline pitch without auditory feedback. (D) Violin plots of same data restricted to the late analysis window (***p < 0.001, two-tailed t-test of NRP = 0).

dLO birds were first deafened, and after they produced stable baseline song for several days, their target syllable pitch was reinforced using pitch-contingent light-off (LO) stimuli, during which the light in the sound recording chamber was briefly turned off upon high- or low-pitch syllable renditions (32). dLO birds displayed an average NRP of 112% on the 5th day since release from LO, which was significantly different from zero (p = 3.7 · 10^−8, tstat = 25.4, N = 8 birds, two-tailed t-test of H0: NRP = 0) and was larger than the NRP in WNC birds on the 5th day since release (p = 1.3 · 10− 1, tstat = 14.9, df = 24, N = 8 dLO and N = 18 WNC birds, two-sided t-test). Thus, dLO birds were unable to recover baseline pitch, suggesting that song recovery requires undiminished sensory experience, which includes auditory feedback.

That song practice and sensory experience are required for full recovery of song does not imply that without experience, birds are incapable of making any targeted changes to their songs at all. We therefore inspected birds’ fine-grained vocal output and whether they changed their song in the direction of baseline when deprived of sensory experience. We hypothesized (Fig. 4A), that when birds experience a target mismatch during reinforcement (when they hear that their song deviates from the target), this mismatch will fuel their plan to recover the pitch target, and a portion of this plan they can execute without feedback. If, by contrast, they have no mismatch experience, they will make no corresponding plan. Hence, we predicted that WNd birds that experienced a pitch mismatch would slightly revert their song towards baseline even in the absence of auditory feedback. By contrast, dLO birds that did not experience a mismatch because they did not hear their song while it was reinforced, would not revert towards the target (Fig. 4A).

Figure 4. Target mismatch experience is necessary for revertive pitch changes.

(A) WNd birds heard a target mismatch during reinforcement whereas dLO birds did not. dC birds were not pitch reinforced; their analysis windows matched those of manipulated birds in terms of time since deafening. (B, C) Pitch change between the last 2 h of reinforcement (R) and 2 h windows during the test period time-aligned to the first 2 h of song after withdrawal of reinforcement (E), in units of standard deviation, for WNd (red, B) and dLO (blue, C) birds. (D) WNd birds (red) perform both early and late pitch changes in the direction of the baseline target (by about one standard deviation, * p < 0.05, one-tailed t-test), similar to WNC birds (gray) and unlike dLO birds (blue) without mismatch experience. (E) Bootstrapped pitch differences between reinforced WNd (red) and dLO (blue) birds and 10,000 times randomly matched dC birds, shown for early (solid line) and late (dashed line) analysis windows. The stars indicate the bootstrapped probability of a zero average pitch difference between reinforced and dC birds (n.s. not significant, ** p < 0.01, *** p < 0.001).

Indeed, WNd birds changed their pitch significantly towards baseline already in the first 2 h of their singing since release from reinforcement (relative to the pitch from the last 2 h during reinforcement, d′ = −0.60, p = 0.03, tstat = −2.19, df = 9, N = 10 birds, one-sided t-test of H0: d′ = 0). A significant approach of pitch baseline was still evident after 4 days of practice (d′ = −1.27, p = 0.02, tstat = −2.35, N = 10 birds, one-sided t-test, Fig. 4B, D), showing that pitch reversion in deaf birds is persistent. Because the average pitch shift in WNd birds was on the order of one standard deviation (d′ ≃ 1), we conclude that without song experience, birds are able to perform target-directed pitch shifts of about the same magnitude as their current exploratory range (i.e., the denominator of the d′ measure).
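
To make the d′ measure concrete, the sketch below computes a sensitivity index between the last reinforced window and a test window. The exact formula is not given in this section, so the sketch assumes a common form: the difference of mean pitches divided by the square root of the average of the two sample variances. It further assumes that the sign is normalized so that negative values mean a change back towards baseline; `shift_sign` and all pitch values are hypothetical.

```python
import math
import statistics


def d_prime(reinforced_pitches, test_pitches, shift_sign=1):
    """Assumed d' between two pitch samples.

    shift_sign is +1 if reinforcement drove pitch up and -1 if it drove
    pitch down, so that a negative d' always means reversion to baseline.
    """
    mean_diff = statistics.fmean(test_pitches) - statistics.fmean(reinforced_pitches)
    pooled_sd = math.sqrt(
        (statistics.variance(reinforced_pitches) + statistics.variance(test_pitches)) / 2.0
    )
    return shift_sign * mean_diff / pooled_sd


# Hypothetical syllable pitches in Hz for a bird that was shifted up.
last_reinforced = [640.0, 650.0, 660.0, 650.0]
early_test = [630.0, 640.0, 650.0, 640.0]
print(round(d_prime(last_reinforced, early_test, shift_sign=1), 3))
```

In this toy example the mean drops by 10 Hz while the within-window standard deviation is about 8.2 Hz, so the reversion is a little more than one standard deviation of the bird's current pitch variability.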

In contrast, dLO birds showed no signs of reverting pitch, neither in the first 2 h since release of reinforcement (d′ = −0.13, p = 0.36, tstat = −0.37, df = 7, N = 8 birds, one-sided t-test), nor after 4 days of practice (d′ = −0.08, p = 0.43, tstat = −0.18, df = 7, N = 8 birds, one-sided t-test, Fig. 4C, D). The pitch change in dLO birds was indistinguishable from that in deaf controls (dC) that were not pitch reinforced and had no plan to shift pitch in either direction (Fig. 4A, E). To account for the effect of time elapsed since deafening, we bootstrapped the difference in d′ between dLO birds and dC birds in matched time windows (see Methods). To account for possible influences of circadian pitch trends, we also assessed early and late pitch changes in reinforced birds and in dC birds in 2 h time windows separated by multiples of 24 h. The result of all these analyses is that significant reversion towards baseline was seen only in WNd birds, and very consistently so (Fig. 4E, Table S1), showing that not current auditory feedback, but prior experience of a target mismatch is necessary for pitch reversion. Our results thus argue for a model of song maintenance in which birds extract from target mismatch experience a plan of reducing the mismatch (Fig. 5), and without sensory feedback, this plan is limited by the extent of current song practice.
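
The bootstrap comparison against deaf controls can be sketched as follows. The details of the procedure are not given in this section, so this is only one plausible reading: on each of 10,000 iterations, every reinforced bird is paired with a randomly drawn dC value from the matched time window, and the average difference in d′ is recorded. The function names, the pairing rule, the one-sided tail estimate, and all d′ values are assumptions made for illustration.

```python
import random
import statistics


def bootstrap_mean_difference(reinforced, controls, n_boot=10_000, seed=0):
    """Mean d' difference between reinforced birds and randomly matched controls."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = []
    for _ in range(n_boot):
        matched = [rng.choice(controls) for _ in reinforced]  # draw with replacement
        diffs.append(statistics.fmean([r - c for r, c in zip(reinforced, matched)]))
    return diffs


def fraction_at_or_above_zero(diffs):
    """Share of bootstrap iterations without a negative (revertive) difference."""
    return sum(d >= 0 for d in diffs) / len(diffs)


# Hypothetical d' values: reinforced birds revert, controls drift around zero.
reinforced_d = [-1.5, -1.0, -2.0, -1.2]
control_d = [0.1, -0.2, 0.3, 0.0, -0.1]
boot = bootstrap_mean_difference(reinforced_d, control_d)
print(len(boot), fraction_at_or_above_zero(boot))
```

Matching against controls drawn from the same time-since-deafening window is what separates a planned reversion from pitch drift that any deaf bird might show.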

Figure 5. Schematic illustrating the goal-directed planning of vocal changes.

The song variability from the last few hours (green arrows, black density) limits (blue dashed lines) the song targets (filled black dot in M) that birds can overtly plan without sensory feedback. Sensory experience (green circles under S) is a prerequisite for consolidating motor plans and for reaching (curved blue arrow) a target (T) beyond the planning range, e.g. a song variant produced several days ago.

Discussion

Our work shows that recent auditory experience can drive motor plasticity even while an individual is deprived of such experience, i.e., zebra finches are capable of overt vocal planning. But reaching a distant vocal target beyond the pitch range experienced in the preceding few hours necessitates auditory feedback, which sets a limit to zebra finches’ covert planning ability.

Birds’ failure to recover baseline pitch without guiding sensory feedback agrees with reports that binary reinforcement (as we used) slows down or prevents forgetting of the adapted behavior (5). However, whereas forgetting is fast when sensory errors affect arm movements (5), the contrary applies to birdsong, where pitch learning from artificial sensory errors is slower and less forgotten (33) than is pitch learning from binary reinforcement (12,14). Hence, the commonality of short-term visuo-motor adaptation and of birdsong maintenance is that slow learning leads to slow forgetting, regardless of whether it is due to sensory errors or reinforcement. This conclusion also agrees with observations that zebra finch song does not recover to pre-manipulated forms, both after restoring auditory feedback after long-term (>5 months) deprivation (34) and after restoring normal syrinx function after long-term (16 weeks) manipulation with beads (35), suggesting that song can spontaneously recover only within some limited time since it was manipulated.

The overt planning ability suggests that recovery of a developmentally learned vocal target is controlled by two hierarchical processes, a highly flexible process with limited scope (d’ ≃ 1, Fig. 4), and a dependent process enabled by experience of the former. Such motor learning based on separate processes for acquisition and retention is usually referred to as motor consolidation (3,36,37). Accordingly, the hierarchically lower process is independent of immediate sensory experience, but its consolidation requires experience. Perhaps then, it is the sensory experience itself that is consolidated, and therefore, consolidation of sensory experience may be a prerequisite for extensive planning.

Consolidation in motor learning generally emerges from anatomically separated substrates for learning and retention (4). Such separation also applies to songbirds. Both reinforcement learning of pitch and recovery of the original pitch baseline depend on the anterior forebrain pathway and its output, the lateral magnocellular nucleus of the anterior nidopallium (LMAN) (15). LMAN generates a pitch bias that lets birds escape negative pitch reinforcers and recover baseline pitch when reinforcement is withdrawn (13), and is thus likely involved in planning. This pitch bias is consolidated outside of LMAN (15,38) in a nonlinear process that is triggered when the bias exceeds a certain magnitude (39). This threshold magnitude is roughly identical to the planning limit we find (d′ ≃ 1), suggesting that the consolidation of LMAN-mediated motor plasticity corresponds to birds’ planning limit. Because LMAN seems capable of executing a motor plan without sensory feedback, our work provides a new perspective on the neural basis of birdsong learning and consolidation in and around LMAN.

The formation of a planned motor change may not require LMAN itself, because pharmacological suppression of LMAN sets the bias to zero, but upon removal of output suppression, the pitch of the song syllable that was targeted by reinforcement jumps by about 1% away from the reinforced pitch zone (40), which corresponds to about d′ = 1, close to the planning limit we find. Originally, this jump was interpreted as evidence of functional connectivity or an efference copy between the anterior forebrain pathway, of which LMAN is part, and some other unspecified variability-generating motor area. However, in our view, a simpler explanation requiring neither functional connectivity nor efference copy is that LMAN is involved in putting a plan into action, which in that case is to produce syllable variants that are unaffected by WN.

Zebra finches’ ability to plan directed song changes could hinge on song memories that feed into LMAN and that could drive neurons there to produce diverse perceptual song variants. LMAN neurons are selective for the bird’s own song but not the target song (20,21), which makes them well suited for planning song in a manner congruent with experience. Furthermore, LMAN neurons show mirrored activity, i.e., similar activity when a zebra finch produces a vocal gesture and when it hears the same gesture played through a loudspeaker (41,42). This mirrored activity has been argued to be involved in translating an auditory target into the corresponding motor command, also known as an inverse model (43). Mirroring in LMAN was observed across the song variability generated over a period of several hours, which is about the same as the experience-dependent pitch planning limit we find. Zebra finches could thus transform a desired pitch change into the corresponding motor plan via LMAN’s aligned sensory and motor representations of recent vocal output.

Our observations in zebra finches could be relevant to other species including humans. The planning abilities we find bear resemblance to human motor imagery for movement learning, which is most effective when subjects already show some competence for the movements to be learned (44), suggesting a recall-dependent process. Naively, human vocal flexibility seems superior to that of zebra finches, since we can flexibly change sound features such as loudness, pitch, and duration to convey emotional state or to comply with the tonal and rhythmical requirements of a musical piece (45,46), whereas zebra finches produce more subtle modulations of their songs, e.g., when directing them to a female (47). Nevertheless, a limit of human vocal flexibility is revealed by non-native accents in foreign languages, which are nearly impossible to get rid of in adulthood. Thus, a seemingly analogous task in humans to re-pitching of zebra finch song is to modify developmentally learned speech patterns.

Our findings help elucidate the meaning of song signals in songbirds and the evolutionary pressures of singing. Because zebra finches seem incapable of large jumps in performance without practice, their current song variants are indicative of the recent song history, implying that song is an honest signal that zebra finches cannot adapt at will to deceive a receiver of this signal. Hence, if high pitch has either an attractive or repelling effect on another bird, a singer must commit to being attractive or repulsive for some time. In extension, we speculate that limited vocal flexibility increases the level of commitment to a group and thereby strengthens social cohesion.

Materials and Methods

All experimental procedures were approved by the Veterinary Office of the Canton of Zurich (licenses 123/2010 and 207/2013) or by the French Ministry of Research and the ethical committee Paris-Sud and Centre (CEEA N°59, project 2017-12).

Subjects

We used in total 76 birds. All birds were 100-300 days old (except one 853-day old control bird) and were raised in the animal facility of the University of Zurich or in Saclay. During recording, birds were housed in single cages in custom-made sound-proof recording chambers equipped with a wall microphone (Audio-Technica Pro42), a loudspeaker, and a camera. The day/night cycle was set to 14/10 h except for one muted bird that was in constant light due to a technical problem.

Song Recordings

Vocalizations were saved using custom song-recording software (Labview, National Instruments Inc.). Sounds were recorded with a wall microphone and digitized at 32 kHz. In all birds, we recorded baseline vocal activity for at least 3 days before doing any manipulation (deafening or pitch reinforcement).

Pitch Reinforcement

We calculated pitch (fundamental frequency) as described in (14). To provide pitch reinforcement in real time, we used a two-layer neural network trained to detect a manually clustered syllable containing a harmonic stack (48). We evaluated the fundamental frequency of that syllable in a 16-24-ms time window following detection. For pitch reinforcement, we either broadcast a 50-60-ms long white noise (WN) stimulus through a loudspeaker or briefly switched off the light in the isolation chamber for 100-500 ms (LO) when pitch was below or above a manually set threshold. The WN/LO stimulus onset occurred 7 ms after the pitch calculation offset. We performed cumulative pitch shifts across several days by adjusting the pitch threshold for WN/LO delivery each day, usually setting it close to the median value of the previous day. Sometimes the threshold was set more than once during a day; in this case, we set it close to the median of the pitch values measured so far during that day. All birds were shifted by at least 1 standard deviation (d′ > 1, see Section Pitch Analysis).
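
The contingency and the daily threshold update described in this section can be summarized in a minimal sketch. It is not the real-time software used for the experiments: the function names are hypothetical, the threshold is set exactly at the median although the text says only "close to" it, and the pitch values are invented.

```python
import statistics


def should_reinforce(pitch_hz, threshold_hz, target_low_pitch):
    """Decide whether a rendition triggers the WN or LO stimulus.

    target_low_pitch=True delivers the stimulus for renditions below the
    threshold; False delivers it for renditions above the threshold.
    """
    if target_low_pitch:
        return pitch_hz < threshold_hz
    return pitch_hz > threshold_hz


def next_threshold(previous_pitches_hz):
    """New threshold set at the median of previously measured pitches (assumption)."""
    return statistics.median(previous_pitches_hz)


# Hypothetical pitches (Hz) measured on the previous day.
previous_day = [600.0, 605.0, 610.0, 615.0, 620.0]
threshold = next_threshold(previous_day)
print(threshold, should_reinforce(604.0, threshold, target_low_pitch=True))
```

Placing the threshold near the median means that roughly half of the renditions trigger the stimulus, and repeating the update each day moves the threshold along with the bird's pitch, which is how the shift accumulates across days.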

Reported pitch values were collected as above, except in muted birds that directly after unmuting produced syllables of lower amplitude and with distorted spectral features (e.g. Fig. 1C, E, F), which resulted in frequent mis-detections by the neural network. In muted birds, we therefore performed semi-automatic (manually corrected) syllable detection and we computed pitch at a fixed time lag after syllable onset.

Duration Reinforcement

Duration reinforcement was performed similarly to pitch reinforcement, but instead of measuring the pitch of a targeted syllable, we measured the duration of a targeted song element (either a syllable, a syllable plus the subsequent gap, or just a gap). Onsets and offsets of the targeted element were determined by thresholding the root-mean-square (RMS) sound amplitude.
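
Onset and offset detection by RMS thresholding can be sketched as below. The frame length, the threshold, and the use of non-overlapping frames are assumptions chosen for illustration, and the toy signal is invented; the source does not specify these parameters.

```python
import math


def rms_envelope(samples, frame_len):
    """RMS amplitude of consecutive non-overlapping frames."""
    n_frames = len(samples) // frame_len
    envelope = []
    for k in range(n_frames):
        frame = samples[k * frame_len:(k + 1) * frame_len]
        envelope.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return envelope


def onsets_offsets(envelope, threshold):
    """Return (onset_frame, offset_frame) pairs where the envelope exceeds threshold."""
    segments = []
    start = None
    for i, value in enumerate(envelope):
        if value > threshold and start is None:
            start = i  # sound begins
        elif value <= threshold and start is not None:
            segments.append((start, i))  # sound ends
            start = None
    if start is not None:
        segments.append((start, len(envelope)))  # sound lasts until the end
    return segments


# Toy signal: silence, a burst of sound, silence.
signal = [0.0] * 4 + [1.0, -1.0] * 4 + [0.0] * 4
env = rms_envelope(signal, frame_len=4)
segments = onsets_offsets(env, threshold=0.5)
print(env, segments)
```

The duration of a targeted element is then the offset minus the onset, multiplied by the frame length and divided by the sampling rate; a gap duration would be measured from one segment's offset to the next segment's onset.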

Bird groups

WN Control (WNC)

18 birds in the control group underwent WN pitch reinforcement (10/18 up-shifted, 8/18 down-shifted). Thereafter, the WN stimulus was withdrawn, and no further experimental manipulation took place.

WN muted (WNm)

In 8 birds, we first reinforced pitch using white noise (WN) auditory stimuli and then we reversibly muted the birds by performing an air sac cannulation.

Normally, when WN stimuli are contingent on low-pitch renditions, birds tend to shift the pitch up, and in 5/6 birds this was indeed the case. However, one bird shifted the pitch down, in an apparent appetitive response to WN. This bird also responded appetitively when the WN contingency was changed, resulting in a net upward shift at the end of the WN period, see also (48). In 2 birds, we targeted high-pitch variants and these birds shifted the pitch down, as expected. Thus, in total, in 6/8 birds (including the bird with the apparent appetitive response), we drove the pitch up and in 2/8 birds, we drove the pitch down.

Two birds underwent the muting surgery directly after withdrawal of WN stimuli. We gave the remaining 6/8 birds (4 up-shifted and 2 down-shifted) the opportunity to sing without WN before the muting surgery. During on average 4 h 51 min (range 10 min to 14 h), these latter birds produced on average 649 song motifs (56, 100, 400, 458, 480, and 2400 motifs) without WN; the example bird shown in Fig. 1C produced 56 song motifs within 11 minutes during the 30 minutes it was allowed to sing without aversive reinforcement.

WN deaf (WNd)

10 birds were first pitch reinforced (5/10 were up-shifted and 5/10 down-shifted) with WN and then they were deafened by bilateral cochlea removal. WNd birds started to sing on average 3±1 days after deafening (range 2 to 5 days) and were recorded for at least 15 days after the deafening surgery.

Deaf LO (dLO)

8/10 birds from (31) were recorded after the reinforcement period and we analyzed the associated data. These birds were first deafened by bilateral cochlea removal, then they underwent pitch reinforcement with light-off (LO) stimuli, which act as an appetitive stimulus in deaf birds. The lamp in the recording chamber was switched off for 100-500 ms when the pitch was either above or below a manually set threshold (daily threshold adjustment followed the same procedure as for WNm birds). 3/8 birds received LO for low-pitched syllables and 5/8 birds for high-pitched syllables. One of the birds that received LO for high-pitched syllables changed its pitch away from LO instead of towards it; thus, we ended up with a balanced data set with 4/8 birds shifting pitch up and 4/8 birds shifting down. dLO birds were recorded for at least 5 days after the deafening surgery. Details of light-induced pitch shifting are described in (31).

Deaf control (dC)

We analyzed 26 syllables from 20 birds (12 taken from (31) and 8 additional ones) that were deafened and then recorded without any further manipulation. We used these birds to account for pitch changes in WNd and dLO birds that are due to the absence of auditory feedback, see bootstrapping.

WN duration (WNdur)

12 birds underwent duration reinforcement using WN. In 9 birds the targeted sound feature was syllable duration, in 2 birds it was syllable-plus-gap duration, and in one bird it was gap duration. In 4 birds, the duration was squeezed and in 8 birds the duration was stretched. As in WNC birds, we did no further experimental manipulation after withdrawal of the WN stimulus. One bird changed its duration towards WN, showing an apparent appetitive response to WN similar to that of the one muted bird.

Muting

We muted birds by inserting a by-pass cannula into the abdominal air sac (49) as follows. Preparation of by-pass cannula: After incubation in 70% ethanol, we clogged a 7 mm long polyimide tube (diameter 1.2 mm) with sterile paper tissue. We created a suture loop around the cannula and fixed the thread to the cannula with a knot and a drop of tissue glue.

Cannula implantation: We anaesthetized the birds with Isoflurane (1.5-2%) and gave a single injection of Carprofen (4 mg/kg). Subsequently, we applied local analgesic to the skin (2% lidocaine) and removed the feathers covering the right abdomen. We applied Betadine solution on the exposed skin and made a small incision using sterilized scissors. We exposed the right abdominal air sac by shifting aside the fat tissue and punctured it to create an opening. Immediately, we closed the opening by inserting the cannula and by sealing the contact region with tissue glue. With the free end of the glued thread, we made one suture to the lowest rib. We closed the wound in the skin around the cannula with tissue glue and sutures using a new thread. Finally, we applied betadine solution on the wound and lidocaine gel around the injured site. Before releasing the bird to its cage, we removed the clog of the cannula with forceps and verified the air flow through the cannula.

We returned the birds to their home cage and monitored them for signs of suffering. We administered painkillers (Meloxicam 2 mg/kg or Carprofen 2-4 mg/kg) for 2 days after the surgery.

On the following days, we monitored the birds continuously for singing activity. If song was detected, the cannula was inspected for clogging and cleaned. 5 birds unmuted spontaneously; they produced at most 300 songs before the bypass cannula was inspected and the clog was removed to re-mute the bird. To unclog the bypass cannulas, we used sharp forceps and sterile tissue dipped in saline. 6 of 8 birds produced quiet call-like vocalizations even on muted days on which no singing was detected.

Deafening

We bilaterally removed cochleas as described in (31).

Pitch Analysis

In individual birds, we studied the dynamics of pitch recovery during the test period. In WNm birds, the test period started with unmuting, and in all other reinforced birds it started with the end of reinforcement. We analyzed songs in early (E) time windows defined as the first 2 h window during the test period in which the bird produced at least 20 song motifs. We also assessed pitch recovery in late (L) windows defined exactly 4 days after the E window. To make the measurements robust to circadian fluctuations of pitch, we compared the pitch values in early and late windows to pitch values produced in time-aligned windows during the last day of reinforcement (R) and during the last day of baseline (B).

We used this time-of-day matched analysis to produce Fig. 1H, I, Fig. 2C, D, and Fig. 3C, D. Exceptions where time alignment was not possible are listed in the following:

  • One WNm bird started singing late on the last day of reinforcement (preventing us from time-aligning the R window with the E window), and therefore in this bird we defined R after the end of WN but before muting (in this bird there is more than one day of song after WN and before muting).

  • In two birds (1 WNC and 1 dLO bird), we defined the L window one day earlier (on the 4th day, after 3 days of practice), because there was no data for these birds on the 5th day after reinforcement (our findings did not qualitatively change when we defined the L window on the 6th day instead of the 4th).

  • One WNm bird was housed together with a female during WN reinforcement; this bird did not sing during the time-matched 2-h period on the 2nd, 3rd, and 4th day after reinforcement; therefore, on those days we computed the mean pitch from all values produced on that day in Fig. 1H.
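
The definition of the early (E) window, the first 2 h window of the test period containing at least 20 song motifs, can be illustrated with a minimal Python sketch. The function name `find_early_window` and its inputs are hypothetical, and the sketch assumes that candidate windows start at motif onsets; the text does not specify how candidate windows were stepped through the test period.

```python
from bisect import bisect_left


def find_early_window(motif_times_h, window_h=2.0, min_motifs=20):
    """Return (start, end) in hours of the first window with enough motifs.

    motif_times_h: motif onset times in hours since the start of the
    test period (hypothetical input format).
    Assumption: candidate windows begin at motif onsets.
    Returns None if no window contains at least min_motifs motifs.
    """
    times = sorted(motif_times_h)
    for i, start in enumerate(times):
        end = start + window_h
        # Count motifs with start <= t < end.
        count = bisect_left(times, end) - i
        if count >= min_motifs:
            return start, end
    return None
```

Under this assumption, a late (L) window would simply be the E window shifted by exactly 4 days (96 h).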

In early (E) and late (L) analysis windows, we computed the normalized residual pitch (NRP), which is the remaining fraction of pitch shift since release from WN, defined as $\mathrm{NRP}(X) = (P_X - P_B)/(P_R - P_B)$, where $P_X$ is either the mean pitch in the early ($X = E$) or late ($X = L$) window (Fig. 1H, I, 2C, D, 3C, D). $P_R$ and $P_B$ are the mean pitches in the R and B windows, respectively. An NRP of 33% indicates that two-thirds of the reinforced pitch shift have been recovered and an NRP of 0% indicates full recovery of baseline pitch. Note that the NRP measure discounts for differences in the amount of initial pitch shift the birds displayed at the beginning of the test period.
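
As an illustration of the NRP definition, the following Python sketch computes the NRP from lists of pitch values. The function name and the input format (one list of pitch values per window) are hypothetical and are not taken from the authors' MATLAB scripts.

```python
from statistics import mean


def normalized_residual_pitch(pitch_x, pitch_r, pitch_b):
    """NRP(X) = (P_X - P_B) / (P_R - P_B).

    pitch_x: pitch values in the early or late window (X = E or L)
    pitch_r: pitch values in the time-aligned R window
    pitch_b: pitch values in the time-aligned B window
    """
    p_x, p_r, p_b = mean(pitch_x), mean(pitch_r), mean(pitch_b)
    shift = p_r - p_b  # reinforced pitch shift relative to baseline
    if shift == 0:
        raise ValueError("NRP is undefined when P_R equals P_B")
    return (p_x - p_b) / shift
```

For example, with a baseline of 600 Hz, a reinforced pitch of 630 Hz, and a test-window pitch of 610 Hz, the NRP is 10/30, i.e. about 33%.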

We performed statistical testing of NRP to discount for this diversity in initial pitch. To test the hypothesis that WNm birds recovered their baseline pitch without practice or that WNd or dLO birds recovered baseline pitch without auditory feedback, we performed a two-tailed t-test for NRP = 0.

Our results were qualitatively unchanged when we changed the timing of the L window, as long as there were at least 3 days between E and L windows (because WNC birds need at least 3 days to recover their baseline pitch in the L window, p < 0.05). Thus, giving deaf birds more time did not allow them to recover their baseline pitch. Furthermore, we also tested larger windows of 4 and 24 h duration instead of 2 h and found qualitatively similar results. We further verified that our results did not critically depend on the time-alignment by repeating the NRP tests using the last 2 h of reinforcement as the R windows. Indeed, we found that all results in Fig. 1-3 were unchanged.

We computed the pitch change after reinforcement (Fig. 4) as the difference in mean pitches between window X (first 2 h after withdrawal of the reinforcer, X = E, or exactly 4 days later, X = L) and the last 2 h of WN/LO reinforcement R in units of sensitivity $d' = (P_X - P_R)/S_R$, where $S_R$ is the standard deviation of pitch values in the R window. To test the hypothesis that WNd and dLO birds are able to make targeted pitch changes towards baseline, we performed a one-tailed t-test of the hypothesis $H_0$: $d' < 0$. Changes in the way in which we normalized d′ values (dividing by , or ) did not qualitatively change the results shown in Fig. 4.
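
The d′ measure can be sketched in Python as follows. The function name and the `shifted_down` flag are hypothetical; the sign flip for down-shifted birds follows the convention stated in the Bootstrapping section, and the use of the sample standard deviation (n − 1 denominator) is an assumption.

```python
from statistics import mean, stdev


def pitch_change_dprime(pitch_x, pitch_r, shifted_down=False):
    """d' = (P_X - P_R) / S_R, sign-flipped for down-shifted birds.

    pitch_x: pitch values in the E or L window
    pitch_r: pitch values in the last 2 h of reinforcement (R window)
    shifted_down: True if reinforcement shifted the bird's pitch down
    """
    s_r = stdev(pitch_r)  # assumption: sample standard deviation
    d_prime = (mean(pitch_x) - mean(pitch_r)) / s_r
    # With the sign flip, negative values mean reversion towards
    # baseline for both up-shifted and down-shifted birds.
    return -d_prime if shifted_down else d_prime
```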

Bootstrapping

To test whether deaf birds indeed make small pitch changes towards a target if and only if they experienced target-mismatch during reinforcement, we bootstrapped the difference in pitch changes between reinforced (WNd and dLO) and deaf control birds (dC). All dC birds were recorded for at least 5 days after they started singing while deaf.

In dC birds, we defined the R, E, and L windows such that they matched those of WNd and dLO birds in terms of days since deafening. Additionally, in dLO birds we chose the windows such that they matched in terms of time-of-day (because LO always ended overnight). Thus, the R windows in dC birds either corresponded to the last 2 h before deafening (as control for WNd birds) or to the last 2 h of the day before E (as control for dLO birds).

For WNd birds, we obtained in total 26 control syllables from 20 dC birds. For dLO birds, we obtained 17 control syllables from 13 dC birds (some dC birds did not provide any usable data because they stopped singing or were not recorded for long enough).

For the bootstrapping procedure, we randomly paired control syllables (N=26 for WNd and N=17 for dLO) one-by-one with matchable syllables from reinforced birds (with replacement), computed the mean pitches $P_R$, $P_E$, $P_L$ in corresponding windows, calculated the standard deviation $S_R$, calculated the average pitch changes $d'_E = (P_E - P_R)/S_R$ and $d'_L = (P_L - P_R)/S_R$ for both manipulated and control birds, and multiplied these by -1 if the reinforced bird was down-shifted (as we did for d′ above). We then took the differences in average pitch changes between manipulated (WNd and dLO) and dC birds, e.g. $d'_{E,\mathrm{WNd}} - d'_{E,\mathrm{dC}}$. We repeated this procedure 10,000 times, plotted the distribution of average pitch change differences between WNd and dC (red) and between dLO and dC (blue) in Fig. 4E, and performed bootstrap statistics.
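
The resampling loop described above can be sketched in Python under simplifying assumptions. The sketch assumes that the signed d′ values have already been computed, that `control_d[i][j]` holds the d′ of control syllable i with its windows matched to reinforced syllable j, and that every control syllable can be matched to every reinforced syllable. All names are hypothetical, and the summary statistic in `fraction_at_or_above_zero` is only one possible choice, since the exact bootstrap statistic is not specified in the text.

```python
import random
from statistics import mean


def bootstrap_dprime_difference(reinforced_d, control_d, n_boot=10_000, seed=0):
    """Bootstrap the difference in mean d' (reinforced minus control).

    reinforced_d[j]: signed d' of reinforced syllable j
    control_d[i][j]: signed d' of control syllable i, computed in
        windows matched to reinforced syllable j (assumed precomputed)
    """
    rng = random.Random(seed)
    n_reinforced = len(reinforced_d)
    diffs = []
    for _ in range(n_boot):
        # Pair each control syllable with a reinforced syllable drawn
        # at random with replacement.
        partners = [rng.randrange(n_reinforced) for _ in control_d]
        d_reinforced = mean(reinforced_d[j] for j in partners)
        d_control = mean(row[j] for row, j in zip(control_d, partners))
        diffs.append(d_reinforced - d_control)
    return diffs


def fraction_at_or_above_zero(diffs):
    """Share of bootstrap differences that are not negative (assumed statistic)."""
    return sum(d >= 0 for d in diffs) / len(diffs)
```

With the sign convention used for d′, a distribution of differences concentrated below zero would indicate that reinforced birds reverted towards baseline more than deaf controls did.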

Our results were qualitatively unchanged (only WNd significantly reverted pitch towards baseline) when we aligned the R windows by the time-of-day of the corresponding E windows (two dC birds started singing later on the day of the E window than they stopped singing on the days before; in these two birds we used the R windows instead), see Table S1. Although the d′ values in both groups increased (and in dLO birds, the average d′ in the L windows was positive, p < 0.05, two-tailed t-test), we found a significant pitch difference between WNd and dLO birds in L windows, which upholds our finding that mismatch experience is necessary for pitch reversion. The likely reason for the increases in d′ is that birds further shifted their pitch away from baseline on the last day of reinforcement (after the time-aligned R window). Also, results were robust when we analyzed pitch changes after release from reinforcement in units of NRP: without practice, WNd birds made small and significant pitch changes towards baseline, and dLO birds stayed at NRP ≥ 1.

Acknowledgements

We thank Manon Rolland and Sophie Cavé-Lopez for performing some of the deafening surgeries and their support with the experiments, and Heiko Hörster for providing excellent animal breeding and animal care services.

Funding

Swiss National Science Foundation (Projects 31003A_182638 and 31003A_156976/1); European Research Council (ERC) Advanced Grant (268911, VOTECOM)

Author contributions

Conceptualization: ATZ, AES, RHRH

Investigation: ATZ, AES, NG

Data Curation: ATZ, AES

Formal analysis: ATZ, RHRH

Visualization: ATZ, AES, RHRH

Funding acquisition: NG, RHRH

Supervision: NG, RHRH

Writing – original draft: ATZ, AES, RHRH

Writing – review & editing: ATZ, AES, NG, RHRH

Competing interests

Authors declare that they have no competing interests.

Data and materials availability

Pitch and duration data that support the findings of this study, together with the MATLAB scripts to reproduce the analysis and figures, will be made available at the ETH Research Collection upon publication of the article. The raw data underlying the pitch measurements are not deposited due to their size but are available from the authors upon reasonable request.