Bilateral lesions in the latDCN reduce syllable and motif duration and syllable amplitude, while inconsistently affecting gap duration.

A: spectrogram of a song motif before (top) and after (bottom) bilateral lesion of the latDCN. The motif is made of three syllables (a, b and c). White dashed line: onset of syllable b, aligned for the two renditions. Blue and red dashed lines: offset of syllable b before and after lesion. B: Mean amplitude of syllable b across 3 consecutive days before (red) and after (blue) lesion (shaded region: standard deviation around the mean). In green, threshold used to measure syllable duration. C: Distribution of syllable b duration for three consecutive days before (blue) and after (red) the lesion. Dark blue and red: average distributions before and after lesion. D: Mean duration of all syllables before and after lesion in all birds (n=33 syllables for n=9 birds, vertical lines: +/- 1 SD, black arrow: example syllable from A-C). E: Mean duration of gaps before and after lesion (n=23 gaps for n=9 birds). F: Mean motif duration (n=9 birds) before and after lesion. G: Mean syllable amplitude before and after lesion (n=33 syllables for n=9 birds).

Auditory responses in the cerebellum of anaesthetized finches.

(A) Neuropixels recording probe locations. (B) Top (i): example song spectrogram. Bottom: Example LF (ii) and HF (iii) units from lobule VI: raster plot (bottom) and PSTH of BOS playback (top, solid line: mean firing rate; solid horizontal line: baseline; vertical bar: 95% confidence interval; vertical dashed lines: motif onsets/offsets). (C) Distribution of z-scored firing rates across lobules during BOS playback (n=3 birds). Black dots: units with z>2.5. Asterisks: lobules with above-chance proportions (bootstrap test, see Methods and Suppl. Fig. 3). Note that the y scale is piecewise linear. (D) BOS vs reverse BOS selectivity (d’ index per unit, see Methods). Dashed lines: significance thresholds (d’>1, d’<-1). Units grouped by lobule (III–X, separated by vertical dashed lines). (E) Cell classification by baseline firing rate and spike width. Inset: cross-correlogram between complex spikes (CS, magenta) and simple spikes (SS, green). Blue: low frequency units, red: high frequency units. (F–K) Neuronal responses to BOS playback across lobules IV–IX. Each line: z-scored PSTH (20 ms bins) averaged across trials. Left: baseline. Right: BOS playback (duration = motif length, bird-specific). BOS motifs were presented at least 5 times and each recording contained at least 20 spikes to compute this average response. Only sites with |z|>2.5 are shown. Lobule IV: 221 recording sites shown. Lobule V: 118 recording sites shown. Lobule VI: 230 recording sites shown. Lobule VII: 179 recording sites shown. Lobule VIII: 202 recording sites shown. Lobule IX: 321 recording sites shown.
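As a rough illustration of how a trial-averaged, z-scored PSTH with 20 ms bins (as in panels F–K) could be computed, the Python sketch below bins spike times, averages across trials, and expresses the response in units of the baseline standard deviation. The function names, the choice of normalising by the mean and SD of the baseline bins, and the toy spike times are assumptions made for illustration only; the procedure actually used is the one described in Methods.

```python
from math import floor
from statistics import mean, pstdev

def psth(trials, t_start, t_stop, bin_s=0.020):
    """Trial-averaged firing rate (Hz) per bin; `trials` is a list of spike-time lists (s)."""
    n_bins = round((t_stop - t_start) / bin_s)
    counts = [0] * n_bins
    for spikes in trials:
        for t in spikes:
            k = floor((t - t_start) / bin_s)  # index of the bin containing this spike
            if 0 <= k < n_bins:
                counts[k] += 1
    return [c / (len(trials) * bin_s) for c in counts]

def zscore_against_baseline(response, baseline):
    """Express each response bin in units of baseline SD (assumed normalisation)."""
    mu, sd = mean(baseline), pstdev(baseline)
    if sd == 0:
        raise ValueError("baseline has zero variance; z-score undefined")
    return [(r - mu) / sd for r in response]

# Hypothetical spike times (s) for two trials, for illustration only.
example_trials = [[0.005, 0.025], [0.031]]
example_rates = psth(example_trials, t_start=0.0, t_stop=0.06)
example_z = zscore_against_baseline([20.0, 30.0], baseline=[10.0, 20.0, 30.0])
```

Under this convention, a site would pass the |z|>2.5 criterion if any response bin deviates from the baseline mean by more than 2.5 baseline SDs.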

Neuronal activity in the cerebellum of awake, singing zebra finches.

(A) Schematic of recordings during singing with or without probabilistic Distorted Auditory Feedback (DAF). A real-time system triggered white noise upon detection of a target syllable. (B) Example spectrograms of songs with and without DAF. (C) Schematic of passive playback recordings. Stimuli included BOS, reversed BOS (rBOS), conspecific song, and white noise. (D) Histological verification of electrode placement. Electrolytic lesions were made at two depths along the electrode track. (E) Recording site locations across lateral cerebellar lobules (n=15 birds; medio-lateral positions 1–1.4 mm). (F) Single-unit classification by baseline firing rate and spike width. Blue: LF units; red: HF units; green: putative simple spikes (SS); magenta: putative complex spikes (CS). Inset: cross-correlogram of SS and CS. (G) Firing rates were significantly higher during singing compared to quiet baseline for both HF (red) and LF (blue) units. (H) Example recording in lobule IV of an awake bird. (i) Raw trace showing putative CS (asterisks) and SS. (ii) Zoomed trace segment. (iii) Average SS waveform. (iv) Average CS waveform.

Singing-related and playback-evoked responses in the cerebellum, exemplified in lobule V.

(A) Example LF neuron: spectrogram of a song bout (top) with raw recording trace (bottom). (B) Spectrogram of the song motif. (C) Raster plot (bottom) and PSTH (top) of the same LF neuron during singing (green) and BOS playback (magenta). Solid lines: baseline firing rate around singing (green) and playback (magenta). Vertical green and magenta bars: 95% confidence intervals for singing-related and playback-related PSTHs respectively. Vertical dashed lines: syllable onsets/offsets. This unit also responded to reversed BOS, conspecific song, and white noise (Suppl. Fig. 7). (D–E) Same as B–C for an example HF neuron in lobule V. (F) Recording sites (i) and spike shapes of the LF (ii) and HF (iii) neurons shown in C–E. (G–L) Comparison of peak z-scored activity during singing and BOS playback across lobules. Each point: one unit during one syllable. Red = HF cells; blue = LF cells; black = multi-units. Dashed lines: z = 2.5 and unity line.

Proportions of significant singing-related and BOS playback responses in the lobules IV, V, VI, VII, VIII and IX of the cerebellum.

Alignment of singing- and playback-related activity in lobule V.

A: Song spectrogram (top) and example raster and PSTH during singing (green) and BOS playback (magenta). Vertical green and magenta bars on the side indicate 95% confidence limits for singing-related and playback-related PSTHs respectively. Both conditions show partially aligned modulation, with increased firing after syllable a onset and before syllable b offset. B: Cross-correlation between singing- and playback-related PSTHs shown in A (red). Shuffled-data mean (solid black) and ±2 SD interval (dotted black) are shown for comparison. Right: autocorrelation of the singing PSTH. C: Mean normalized cross-covariance between singing- and playback-related PSTHs across all units (red) with jackknife confidence interval (dotted black, see Methods). Right: corresponding normalized autocovariance of singing PSTHs. D: Distribution of peak latencies in cross-covariance for all recorded neurons showing a significant peak in the covariance between singing and playback PSTHs. Negative values indicate leading activity during singing compared to playback.

Effect of DAF on neuronal activity in lobule V.

A: Spectrogram of a motif without (top) and with (bottom) DAF. B: PSTH and raster of an example unit during singing; blue dots and line indicate trials with DAF. Gray shade indicates the duration of the DAF. C: PSTH and raster of the same unit during white noise playback outside singing episodes (top: onset, bottom: offset). Gray shade indicates noise playback. D: Z-scored difference in firing rate between song renditions with vs without DAF across 164 sites in lobules IV–VI and others (20 ms bins). Green dashed line: DAF onset. Black asterisks: significant differences (t-test, p<0.05). Black arrow: example in B. E: Same as D for white noise playback onset outside singing episodes. Color code: z-scored firing relative to baseline. Black asterisks: significant changes (t-test, p<0.05, difference in firing rate between the bin of interest and a reference bin situated 100 ms before white noise playback onset). Black arrow: example in C. F–I: Z-scored average firing rates before and after noise onset during singing (DAF) vs playback (WN). Singing values represent the difference between renditions with vs without DAF. White noise significantly modulated firing during playback but not during singing (related t-tests: F–IV, -1.88 ns vs -6.12, p>0.05 vs p<0.001; G–V, -0.27 ns vs -9.88, p>0.05 vs p<0.001; H–VI, -1.76 ns vs -6.7, p>0.05 vs p<0.001; I–VII–X, -0.87 ns vs -4.17, p>0.05 vs p<0.001).

Singing-related activity is strongly modulated around syllable boundaries in lobule IV.

A: Spectrogram of a song bout (top) and filtered LFP trace (bottom). B: Recording location (i) and spike shape of the recorded neuron (ii). C: Spectrogram of a song motif. D: Raster plot (bottom) and PSTH (top) of the same HF neuron during singing (green) and BOS playback (magenta). Solid lines: baseline firing rate around singing (green) and playback (magenta). Vertical green and magenta bars: 95% confidence intervals for singing-related and playback-related PSTHs respectively. Vertical dashed lines: syllable onsets/offsets. E: z-scored activity during singing vs BOS playback for all lobule IV neurons (n=3 birds). Each point: one unit during one syllable. Red = HF cells, blue = LF cells. Star-shaped points = example neuron from panel D during syllables a–c. F: Mean firing rate of HF units (i&ii) and all units (iii&iv) in lobule IV aligned to syllable onsets (i&iii) and offsets (ii&iv). Red: activity during spontaneous singing; blue: activity aligned to randomly sampled instants within motifs. The motif-averaged firing rate was subtracted; shading indicates ± SEM. Numbers in insets denote (syllable, unit) case number. Asterisks mark significant firing rate modulation at boundaries. G: Same analysis as F for lobule V.

Coordinates for stereotaxic surgery

A: Nissl-stained coronal slices of an example control bird (i) and a bird with lesioned lateral DCN (ii). B: Effect of sham lesions on control birds. The sham lesion did not significantly reduce the duration of syllables (t(19) = 0.79, p>0.05, related t-test on log-transformed durations, see Methods). C: Mean syllable duration as a function of the lesion efficacy. Each blue dot represents a bird. D: Same as Figure 1D. E: Same as Figure 1D but with compensation for the reduction in syllable amplitude (t(33) = -5, p<0.001, related t-test on log-transformed durations). F: Numbering of the cerebellar lobules at the level of the sagittal symmetry plane (i) and at the level of the lateral cerebellum (ii). The laterality corresponds to the laterality of the lateral deep cerebellar nuclei.

A: Classification of single units (SU) in awake birds according to the log of the ISI and firing rate during baseline as in (Van Dijck et al., 2013). B: Same as in A but in anesthetised recordings. The colour codes are the same as in figure 3. C,i: Superimposed LFP traces of putative Simple Spikes from the recording shown in figure 3,I. C,ii: Superimposed LFP traces of putative Complex Spikes from the same recording. D: Cross-correlogram between putative Complex Spikes and Simple Spikes showing a pause in Simple Spike firing immediately after the occurrence of a Complex Spike.

Significance of the responses to BOS in anesthetised birds.

A: Histogram of bootstrap distributions of the proportion of cases (recording site x syllable) with z-score above 2.5 recorded in lobule III. The bootstrap was performed by measuring the proportion of z-scored firing rates above 3 during randomly distributed intervals outside the BOS stimuli. The procedure was repeated 100 times. The vertical red line represents the proportion shown in figure 2. The significance of the proportions from figure 2 is indicated by an asterisk. B,C,D,E,F,G,H: same as A but for the lobules IV, V, VI, VII, VIII, IX, X.
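The resampling logic of this legend can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the analysis code: `draw_z` is a hypothetical callable returning the z-scored firing rate of one randomly placed interval outside the BOS stimuli (replaced here by a standard normal draw), the threshold is left as a parameter, and the observed proportion of 0.10 and the 500 cases are made-up numbers.

```python
import random

def null_proportions(draw_z, n_cases, threshold=2.5, n_rep=100):
    """Proportion of cases above `threshold` in each of `n_rep` surrogate data sets."""
    props = []
    for _ in range(n_rep):
        hits = sum(1 for _ in range(n_cases) if draw_z() > threshold)
        props.append(hits / n_cases)
    return props

def exceeds_chance(observed, props, alpha=0.05):
    """Compare the observed proportion with the surrogate distribution."""
    # Fraction of surrogate proportions at least as large as the observed one.
    p_value = sum(1 for p in props if p >= observed) / len(props)
    return p_value < alpha, p_value

# Stand-in for z-scores measured in random intervals outside the stimulus
# (assumption: standard normal; real data would come from the recordings).
rng = random.Random(0)
props = null_proportions(lambda: rng.gauss(0.0, 1.0), n_cases=500)

# Hypothetical observed proportion of significant (site x syllable) cases.
significant, p_value = exceeds_chance(0.10, props)
```

With 100 repetitions the smallest non-zero p-value that can be resolved is 0.01, which is one reason the number of repetitions matters when reading the asterisks.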

Response of the neuronal activity in cerebellar lobules to the onset of white noise.

A: Each line represents the colour-coded z-scored PSTH of a recording site in lobule III. Only recording sites for which z>2.5 or z<-2.5 after noise onset are shown. The green vertical dashed line represents the white noise onset. The bin width for the PSTH is 20 ms. The colour code bar is on the right. B,C,D,E,F,G,H,I,J: same as A but for the lobules IV, V, VIa, VIb, VII, VIII, IXa, IXc and X.

Selectivity of cerebellar neurons in anesthetised recordings.

Each dot represents the selectivity of the neuronal activity of the recording sites to BOS vs conspecific song stimuli measured by d’. The horizontal dashed lines at d’=1/-1 represent the level of significance. d’>1 means the neuron is more selective to BOS than to reverse BOS or conspecific song. d’<-1 means the neuron is more selective to conspecific song or reverse BOS than to BOS. The recording sites are sorted by lobule: lobule III, IV, V, VIa, VIb, VII, VIII, IXa, IXb, IXc, X, outside cb. The selectivity d’ was measured as in (Solis & Doupe, 1997) and is a normalised difference of the mean firing rate during the two types of audio stimuli.
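For readers unfamiliar with the index, the sketch below computes d’ in the form commonly used in the songbird literature, d’ = 2(mean_A - mean_B) / sqrt(var_A + var_B), where A and B are the per-trial firing rates for the two stimuli. That this is exactly the variant applied here, the use of the sample variance, and the toy firing rates are assumptions; the legend only states that d’ is a normalised difference of mean firing rates.

```python
from math import sqrt
from statistics import mean, variance

def d_prime(rates_a, rates_b):
    """Selectivity of stimulus A over B: 2 * (mean_a - mean_b) / sqrt(var_a + var_b)."""
    spread = sqrt(variance(rates_a) + variance(rates_b))  # sample variances (assumption)
    if spread == 0:
        raise ValueError("d' is undefined when both responses have zero variance")
    return 2 * (mean(rates_a) - mean(rates_b)) / spread

# Hypothetical per-trial firing rates (Hz), for illustration only.
bos_rates = [12.0, 15.0, 14.0, 13.0, 16.0]
rbos_rates = [9.0, 11.0, 10.0, 12.0, 8.0]

selectivity = d_prime(bos_rates, rbos_rates)
is_bos_selective = selectivity > 1    # threshold used in the legend
is_rbos_selective = selectivity < -1
```

Swapping the two arguments flips the sign of d’, which is why the thresholds at +1 and -1 are symmetric.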

Response of the neuronal activity in cerebellar lobules to the onset of BOS.

A: Each line represents the colour-coded z-scored PSTH of a recording site in lobule III. Only recording sites for which z>2.5 or z<-2.5 after BOS onset are shown. The green vertical dashed line represents the BOS onset. The bin width for the PSTH is 20 ms. The colour code bar is on the right. Left: Baseline activity averaged over the same number of trials as BOS playbacks. Right: Average activity during BOS playback. B: same as A but for lobule X.

A: Same neuron and legend as figure 4,C but for reversed BOS: ra stands for reversed a, rb for reversed b. B: Same neuron and legend as figure 4,C but for another conspecific song. The conspecific song motif consists of the syllables s, t, u, v, w. C: Same neuron and legend as figure 4,C but for white noise.

Selectivity of cerebellar neurons in awake recordings.

A: Each dot represents the selectivity of the neuronal activity of the recording sites to BOS vs reverse BOS stimuli measured by d’. The horizontal dashed lines at d’=1/-1 represent the level of significance. d’>1 means the neuron is more selective to BOS than reverse BOS or conspecific song. d’<-1 means the neuron is more selective to conspecific song or reverse BOS than to BOS. The recording sites are sorted by lobule: outside cb, below the lobules, lobule IV, V, VIb, VII, VIII, IX, X. B: Each dot represents the selectivity of the neuronal activity to BOS vs reverse BOS stimuli measured by d’. The measure of selectivity d’ was proposed in (Solis & Doupe, 1997) and is a normalized difference of the mean firing rate during the two types of audio stimuli.

Correlations between the firing rate and syllable duration.

A: Principle of measuring the correlation between the duration of a syllable and the firing rate during the syllable production. Example spectrogram (top) and neuronal firing (bottom) for 3 renditions of the motif. B: Example of a significant correlation between a multiunit neuronal activity and the syllable duration. C: Proportion of cases (syllable x recording site) featuring a significant correlation between the firing rate and the syllable duration in different lobules. D: Bootstrap analysis showing the significance of the proportion of significant correlations for the lobule V.

Bootstrap analysis for correlation between duration and detrended firing rate per lobule, pitch and firing rate in premotor window, amplitude and firing rate in premotor window and spectral entropy and firing rate in premotor window.

The bootstrapped distribution of correlation coefficients is shown in blue. The number of significant correlations found in the real, non-shuffled data set is shown as a red vertical line. A, B, C, D: Bootstrap analysis for the correlation between the duration of a syllable and the activity during syllables for the lobules IV, V, VI and other lobules. E, F, G, H: Bootstrap analysis for the correlation between the amplitude of a syllable and the activity during a premotor window of the syllable for the lobules IV, V, VI and other lobules. I, J, K, L: Bootstrap analysis for the correlation between the pitch of a syllable and the activity during a premotor window of the syllable for the lobules IV, V, VI and other lobules. M, N, O, P: Bootstrap analysis for the correlation between the spectral entropy of a syllable and the activity during a premotor window of the syllable for the lobules IV, V, VI and other lobules. The title of each subplot specifies the lobule and feature for which the bootstrap analysis was conducted as well as the number of (syllable x recording site) cases with significant correlations among the total number of cases. An asterisk indicates that the proportion of (syllable x recording site) cases with significant correlations is significant according to the bootstrap analysis. Black asterisk: significance without Bonferroni correction. Blue asterisk: significance with Bonferroni correction.

Same as supplementary figure 10 but without detrending the firing rate

Same as supplementary figure 10 but for HF units.

Same as supplementary figure 10 but for LF units.

Correlations between neuronal activity and song features: We measured the correlation between the firing rate and the acoustic features (pitch, amplitude and spectral entropy; Sober et al., 2008) as well as the duration of the syllables (Suppl Fig 9-13). When measuring the correlation between a feature of a syllable and the firing rate, we compensated for slow fluctuations or drifts of the firing rate by detrending it (Suppl Fig 10, see Methods). We also measured the correlation between the duration of the syllables and the firing rate without detrending (Suppl Fig 11), since detrending may itself induce spurious correlations (Harris, 2021). To avoid artefactual detections, we only report correlations that were present whether or not detrending was applied. In lobule V, 15% of all cases (syllable x recording site, see example in Suppl Fig 9) displayed a significant correlation between firing rate and syllable duration, and this proportion is unlikely to be found by chance (bootstrap analysis in Suppl Fig 9, 10 and 11). The significant correlations between syllable duration and firing rate in lobule V are present in the population of HF as well as LF cells (Suppl Fig 11B and Suppl Fig 12B). In the other lobules of the cerebellum, the proportion of significant correlations with syllable duration is around 10% and is not significant when detrending the firing rate (Suppl Fig 9 and Suppl Fig 10). Among all other acoustic parameters, the pitch and spectral entropy of syllables were found to correlate significantly with neuronal activity in lobule IV only (Suppl Fig 10 and 11), and when separating cell types we found significant correlations of these acoustic parameters with the firing rate of HF units only (Suppl Fig 12 and 13).
Altogether, this correlation analysis should be interpreted carefully (Harris, 2021) but we believe it may point to a role of lobule V in tracking or modulating syllable duration fluctuations.
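The comparison between detrended and non-detrended correlations can be illustrated with the short Python sketch below. It assumes a simple linear detrend of the per-rendition firing rate against rendition index and a Pearson correlation; both are illustrative choices (the detrending actually used is described in Methods), the significance test is omitted, and the durations and firing rates are made-up values containing a slow upward drift.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def detrend_linear(values):
    """Subtract the least-squares line fitted against rendition index (assumed method)."""
    n = len(values)
    mi = (n - 1) / 2                      # mean of indices 0..n-1
    mv = sum(values) / n
    slope = (sum((i - mi) * (v - mv) for i, v in enumerate(values))
             / sum((i - mi) ** 2 for i in range(n)))
    return [v - (mv + slope * (i - mi)) for i, v in enumerate(values)]

# Hypothetical data: syllable durations (ms) and firing rates (Hz) per rendition.
durations = [100.0, 102.0, 98.0, 101.0, 99.0, 103.0, 97.0, 100.0]
rates = [20.0 + 0.5 * i + 0.8 * (d - 100.0) for i, d in enumerate(durations)]

r_raw = pearson_r(durations, rates)
r_detrended = pearson_r(durations, detrend_linear(rates))
# Conservative criterion mirrored from the text: keep a correlation only if it
# holds both with and without detrending (here checked on the sign only).
consistent = (r_raw > 0) == (r_detrended > 0)
```

Requiring agreement between the two estimates guards against both failure modes: drift-induced correlations in the raw data and correlations introduced by the detrending step itself.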

A: Activity of the neuron from figure 4 aligned on movement onset of the bird. Raster plot (bottom) and average firing rate PSTH (top, black). Each line of the raster plot corresponds to a movement of the body or head of the bird outside singing. Time 0 corresponds to the onset of the movement. B: Activity of neuron from figure 7 aligned on movement onset of the bird. Horizontal black lines represent the average firing rate and 95% confidence interval (dashed) during the same baseline period as in figures 4 and 7 respectively.

Same as figure 7,F,G but for lobule VI.

Total number of units recorded in anesthetized birds.

Significant units are units with a z-scored firing rate above 2.5 during BOS playback. HF are high-firing single units. LF are low-firing single units.

Alignment of motor and audio related activity in all recorded lobules of the cerebellum.

Nb_recs is the number of recorded units during singing and BOS playback. Nb_sig is the number of recorded units with a significant alignment between singing and BOS evoked activity. HF and LF indicate the number of HF and LF units with significant alignments of singing and BOS evoked activity.