(A) Pitch training paradigm. White noise (WN) feedback is delivered during renditions of a specific target syllable when its pitch is either below or above a threshold (pink fill in ‘pitch’ histogram), depending on whether the objective is to train pitch shifts up or down, respectively. The schematic shows training for upward pitch shifts. Over the course of training (3–10 hr, mean ~6 hr), birds progressively modify pitch in the direction that escapes WN, so that in this example the ‘Trained’ distribution is shifted upwards relative to ‘Baseline’. (B) Summary of the magnitude of pitch change across experiments (mean ± standard error of the mean [SEM], N = 11 experimental trajectories over 3 birds); individual points represent the mean for individual birds. *p < 0.05, Wilcoxon signed-rank test. (C) Spectrogram of a song bout in an example experiment, with the syllable ‘F’ targeted with pitch-contingent WN. Scale bar, 500 ms; y-axis, 0.5–7.0 kHz. (D) Change in cross-covariance during learning for target syllables. Left: mean ± SEM cross-covariance for ‘Baseline’ and ‘Trained’ periods (last quarter of renditions during the training session) across experiments (N = 38 LMAN–RA site pairs, 11 experiments, 3 birds). Right: mean ± SEM change in cross-covariance over the course of training. Black bar indicates time bins with values significantly different from zero (thin, p < 0.05; thick, p < 0.005, Wilcoxon signed-rank test). (E) Same as (D), but for non-target syllables. For a given experiment, there was only one target syllable but multiple non-target syllables (mean, 4.2 syllables). Thus, for each LMAN–RA site pair, data were first averaged across all non-target syllables before plotting (N = 38 pairs, 11 experiments, 3 birds). (F) Summary of change in LMAN–RA cross-covariance during training for target and non-target syllables.
For each combination of paired LMAN–RA sites and syllables, we computed the average change in cross-covariance (Trained – Baseline) in a 15-ms window centered at the peak of the average end-of-training cross-covariance (−3 ms) [N = 38 (Target) and 158 (Non-target) combinations of paired sites and syllables, across 11 experiments in 3 birds]. *p < 0.05, mixed effects model (fixed intercept and effect of syllable type; random effect of intercept and syllable type grouped by experiment ID). #p < 0.05, mixed effects model (fixed intercept and random effect of intercept grouped by experiment ID). (G) Time course of pitch change. Each training trajectory was analyzed by binning renditions into four training stages with equal numbers of renditions (i.e., quartiles). The average pitch change across experiments is plotted for each training stage (N = 10 experiments in 3 birds, excluding one experiment for which neural data were recorded only during baseline and at the end of training). Spacing of stages along the x-axis preserves the relative timing of stages (time of median rendition for each stage relative to median baseline rendition: stages 1, 2, 3, and 4: 1.02, 2.35, 3.73, and 5.46 hr). *p < 0.05 vs. 0, Wilcoxon signed-rank test. (H) Time course of change in LMAN–RA cross-covariance for the target syllable for the same experiments illustrated in (G) (N = 37 LMAN–RA site pairs, 10 experiments, 3 birds). *p < 0.05 vs. 0, Wilcoxon signed-rank test; ##p < 0.005, last two vs. first two training quartiles.
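The window-averaged cross-covariance change summarized in (F) and the quartile staging used in (G) and (H) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code. It assumes that the Baseline and Trained cross-covariances for each site pair have already been computed on a common grid of lags in 1-ms bins, and that rendition times are given in hours. The function names (`window_mean_change`, `split_into_stages`, `stage_median_times`) are hypothetical.

```python
import numpy as np


def window_mean_change(lags_ms, xcov_baseline, xcov_trained,
                       center_ms=-3.0, width_ms=15.0):
    """Average (Trained - Baseline) cross-covariance over lag bins inside a
    window of width_ms centered at center_ms (defaults follow panel F)."""
    lags_ms = np.asarray(lags_ms, dtype=float)
    delta = (np.asarray(xcov_trained, dtype=float)
             - np.asarray(xcov_baseline, dtype=float))
    half = width_ms / 2.0
    # Boolean mask of lag bins within [center - half, center + half].
    in_window = (lags_ms >= center_ms - half) & (lags_ms <= center_ms + half)
    if not in_window.any():
        raise ValueError("no lag bins fall inside the window")
    return float(delta[in_window].mean())


def split_into_stages(n_renditions, n_stages=4):
    """Split chronologically ordered rendition indices into n_stages
    consecutive groups whose sizes differ by at most one."""
    return np.array_split(np.arange(n_renditions), n_stages)


def stage_median_times(training_times_hr, baseline_times_hr, n_stages=4):
    """Median rendition time of each stage, relative to the median
    baseline rendition time (hours)."""
    times = np.sort(np.asarray(training_times_hr, dtype=float))
    stages = split_into_stages(len(times), n_stages)
    baseline_median = np.median(np.asarray(baseline_times_hr, dtype=float))
    return np.array([np.median(times[idx]) for idx in stages]) - baseline_median


if __name__ == "__main__":
    lags = np.arange(-50, 51)                      # 1-ms lag bins, -50 to +50 ms
    baseline = np.zeros(lags.size)                 # toy baseline cross-covariance
    trained = np.exp(-((lags + 3) / 5.0) ** 2)     # toy peak centered at -3 ms
    print(window_mean_change(lags, baseline, trained))
```

With 1-ms lag bins, a 15-ms window centered at −3 ms covers lags −10 to +4 ms inclusive (15 bins). `np.array_split` yields exactly equal stages when the number of renditions is divisible by four; otherwise, stage sizes differ by at most one rendition.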