Introduction

Motivational drive affects attention and associated sensory responses to external events (Allen et al., 2019; Aton, 2013; Hindmarsh Sten et al., 2021). Hungry or thirsty animals are more alert to food or water cues, which evoke larger brainwide neural responses (Allen et al., 2019; Burgess et al., 2018). Courtship also influences behavioral and neural responses to sensory events (Sakata and Brainard, 2009, 2008; Skals et al., 2005; Zhang et al., 2018), but it remains unclear how representations of self-produced behaviors change with the presence of an audience during courtship. Here we recorded auditory cortical neurons in birds singing alone and to females to test how auditory feedback signals depend on courtship-associated changes in motivational state.

Adult male zebra finches practice undirected song alone and also direct courtship song to females (Kao et al., 2008; Zann, 1996). Both undirected and directed songs combine introductory notes, calls, and a stereotyped sequence of phonologically distinct syllables called motifs (Hyland Bruno and Tchernichovski, 2019; Rajan and Doupe, 2013). The acoustic structure of undirected and directed motifs is highly similar in adult finches, enabling singing-related neural activity to be precisely aligned and compared across contexts (Kao et al., 2008; Woolley et al., 2014; Zann, 1996). A first goal of this study was to examine motif-aligned discharge in auditory cortical neurons to test if neural representations depend on courtship state.

A second goal was to test how perceived song errors are represented across performance modes. In past work, perceived errors during undirected singing were experimentally controlled with syllable-targeted distorted auditory feedback (DAF) (Andalman and Fee, 2009; Tumer and Brainard, 2007). Though DAF does not capture all aspects of natural song learning, it brings song evaluation under experimental control (Ali et al., 2013; Canopoli et al., 2014; Hoffmann et al., 2016). Birds learn to change the way they sing a syllable to avoid DAF, and several studies support a reinforcement learning framework. First, a dopaminergic (DA) projection from the ventral tegmental area (VTA) to Area X, the striatal nucleus of the song system, is necessary for both natural and DAF-based learning (Duffy et al., 2022; Gadagkar et al., 2016; Hisey et al., 2018; Hoffmann et al., 2016; Xiao et al., 2018a; Ali et al., 2013; Harding, 2004; Scharff and Nottebohm, 1991). Second, Area X-projecting DA neurons exhibit reward-prediction error (RPE)-like signals during singing: phasic activations following undistorted renditions and suppressions following distorted ones (Gadagkar et al., 2016). Third, photoactivation or suppression of DA release in Area X can reinforce or suppress syllable variations (Hisey et al., 2018; Xiao et al., 2018a). The role of RPE-like DA signals in evaluating both reward and song outcomes suggests common principles for birdsong learning and the more classic reinforcement learning commonly studied in hungry or thirsty mammals (Chen and Goldberg, 2020).

Recently we discovered that DA signals retune during courtship. Specifically, DAF-associated DA signals known to evaluate undirected song are uniformly reduced during female-directed singing - as if a bird does not attend to his own mistakes (Roeser et al., 2023). This discovery raises two possibilities. First, DA error signals might be retuned at the level of VTA, for example by behavioral state-dependent modulation of intrinsic or synaptic excitability of VTA DA neurons (Xiao et al., 2018b). If this is the case, then error responses in upstream inputs to VTA could exhibit similar tuning to auditory feedback during directed and undirected song, but their influence on DA spiking is altered. Another possibility is that brainwide responses to auditory feedback differ when alone versus when courting a female, analogous to how motivational drives such as thirst or hunger retune widespread neural responses to water or food cues (Allen et al., 2019; Burgess et al., 2018), or how auditory cortical signaling is affected by attention (Ahissar et al., 1992; Hubel et al., 1959). This idea predicts that auditory representations even in a primary cortical area would change with possible attentional and motivational changes associated with a transition from singing alone to singing to a potential mate.

To test these possibilities we recorded single-neuron activity in Field L in birds singing alone and to females. Field L is a primary auditory cortical area situated at the bottom of a cortical hierarchy that projects into multiple higher auditory areas that, in turn, project to VTA (Bottjer et al., 2000; Foster and Bottjer, 1998; Mandelblat-Cerf et al., 2014; Moore and Woolley, 2019). Here we report that auditory responses to both the bird’s own song and DAF-induced errors changed in heterogeneous ways at the transition from undirected to courtship-directed singing. Thus a bird’s auditory representations of its own song vary across practice and courtship performance modes. Together with past work showing a re-tuning of premotor signals during courtship (Kao et al., 2008; Singh Alvarado et al., 2021; Woolley et al., 2014), our discovery that a sensory area retunes is consistent with brainwide changes in neural responsiveness during courtship.

Results

Singing-related discharge in Field L is altered during courtship

Because we were interested in how Field L activity changed during female-directed song, we recorded at least 20 song motifs during undirected singing and then presented females to elicit courtship song (n=161 neurons, n=11 birds, Methods). Consistent with past work (Keller and Hahnloser, 2009), we observed heterogeneous singing-related firing in Field L neurons, ranging from highly temporally precise to variable discharge across motif renditions (Figure 1A-C). To compare neural responses between undirected and directed singing, we focused on motif-aligned activity, which allowed acoustically similar vocalizations to be compared across the undirected and directed states (Figure 1A-F). For each neuron, we tested whether mean firing rate, burst fraction, and the temporal precision of motif-locked discharge depended on courtship state, using analyses previously described (Methods) (Kao et al., 2008; Sakata and Brainard, 2009; Woolley et al., 2014). Burst fraction was computed as the fraction of spikes occurring in burst events, defined as three (or more) spikes with two (or more) consecutive interspike intervals less than the 25th percentile of the ISI distribution (Methods). The temporal precision of neuronal song-locked firing was computed as the intermotif correlation coefficient (IMCC, Methods) (Goldberg and Fee, 2010; Olveczky et al., 2005; Kao et al., 2008; Sakata and Brainard, 2009; Woolley et al., 2014).

Singing related discharge can retune during courtship.

(A-C) Three example neurons with stable firing properties across alone and female-directed song conditions. Top to bottom: single trial example spectrograms, spike discharge, corresponding spike raster plots, and rate histograms (aligned to motif onset) from motif renditions singing alone (black) and to the female (green). Black vertical scale bar for spiking activity is 0.2 mV, y-axis limits of spectrograms are 0 to 8 kHz. (D-F) Data plotted as in A-C for three example neurons with significant context-dependent changes in firing. (H-J) Scatter plots of mean firing rates (H), IMCC values (I) and neuronal burst fraction (J) for 161 neurons when singing alone and to the female. Red: neurons with significant change across conditions (p<0.01, Methods).

At the population level, neither the temporal precision nor mean firing rate of neurons changed with courtship (Figure 1H-I). A small but significant increase in burst fraction was observed (paired t-test, p=1.16×10⁻⁴, n=161 neurons, Figure 1J). Thus, Field L neurons did not uniformly change their singing-related firing patterns at the transition to courtship singing. But individual neurons could exhibit significant changes in discharge with changes in courtship state, including increases or decreases in their average motif-locked firing rate (n=24 increase; n=22 decrease, p<0.01), burst fraction (n=21 increase; n=3 decrease, p<0.01), or precision of timing within the motif (n=11 increase; n=6 decrease, p<0.01) (Figure 1D-F, p values derived from Monte Carlo shuffles by condition, Methods).

Singing-related performance error signals can retune during courtship

Past work showed that Field L neurons can exhibit responses to distorted auditory feedback (DAF) during undirected singing (Keller and Hahnloser, 2009). To test if error responses exist and/or retune during courtship, we recorded Field L neurons during lone and directed singing as we controlled perceived song quality with syllable-targeted DAF (Andalman and Fee, 2009; Tumer and Brainard, 2007). For each bird, a specific syllable was probabilistically targeted with a 50 ms song-like sound played through speakers surrounding the bird (Methods) (Chen et al., 2019; Gadagkar et al., 2016; Hamaguchi et al., 2014). To quantify error responses across the population of neurons and across conditions, we compared the activity between randomly interleaved renditions of distorted and undistorted songs. For each neuron and for each behavioral condition (alone or directed), we first performed the same analysis as in a previous Field L recording study during undirected singing (Keller and Hahnloser, 2009) (Methods). This analysis identified 58/161 neurons as error responsive during undirected singing and 55/161 during directed singing, with 33/161 exhibiting error responses in both conditions. Visual inspection of error responses in motif-aligned rasters revealed that while some of these neurons unambiguously exhibited robust DAF responses (Figure 2D), many exhibited extremely subtle ones (Figure 2–figure supplement 2A-B).

Error responses in Field L neurons can retune during courtship.

(A-C) Three example neurons with different error responses across courtship conditions. Top to bottom: single trial example spectrograms and spiking activity for undistorted and distorted trials, corresponding raster plots (blue vertical bar denotes feedback target time in undistorted renditions; red shading denotes actual feedback time on distorted renditions; pink vertical dotted line denotes onset and offset of song motif). Corresponding rate histograms for undistorted renditions (blue trace) and distorted renditions (red trace). Below are the same plots, but for songs directed to the female. Bottom: z-scored difference between undistorted and distorted rate histograms for singing alone (black trace) and singing to female (green trace). All data are time-aligned to the onset of the motif. (D-F) Data plotted as in A-C for three neurons with similar error responses across courtship conditions. Error scores for each neuron and condition are enumerated as insets in the histogram. Scale bar for spiking activity is 0.2 mV. Vertical axis limits for spectrograms are 0 to 8 kHz.

Statistical tests defining Field L neurons as error-responsive or not in a binary fashion may not be suitable if the underlying population of error responses exists on a continuum from error non-responsive to responsive. To score each neuron’s error response and identify the shape of the error response distributions across the population of neurons, we computed the z-scored difference between target onset–aligned distorted and undistorted rate histograms (target onset defined as the median DAF onset time relative to distorted syllable onset, n = 161 neurons in 12 birds), as previously described (Gadagkar et al., 2016). We defined the error response as the average z-scored difference in firing between undistorted and distorted renditions during the 100 ms interval following target onset (Methods). In past recordings from VTA, this analysis yielded a bimodal distribution of z-scored error responses, which made classification of error-responsive neurons straightforward. Yet when we plotted the distribution of error responses across the 161 Field L neurons recorded during both undirected and directed singing, we observed unimodal distributions consistent with a continuum of Field L responses (Figure 2–figure supplement 1). This analysis suggests that no true threshold exists to unambiguously define a neuron as an error neuron. To visualize a distribution of error scores for parts of the song that were never targeted with DAF, we repeated the same analysis for a time window of the motif preceding the target time (i.e. when the auditory feedback was undistorted across all trials). As expected, we observed significantly lower error scores (Figure 2–figure supplement 1, p<10⁻¹⁰, Wilcoxon signed-rank test, Methods).
For the purposes of examining courtship-state dependent re-tuning of error signals we conservatively defined ‘error neurons’ as those with an absolute z-scored difference in firing rate between distorted and undistorted renditions greater than 2.5 in either the directed or undirected conditions (Methods).

To test if courtship state affected these most robust error responses in our dataset, for each error neuron we compared the z-scored error response across undirected and directed singing. Interestingly, though VTAx DA neurons uniformly decreased their error response during courtship singing (Roeser et al., 2023), Field L neurons exhibited heterogeneous re-tuning: some neurons exhibited error responses only when singing to the female (n=22), others exhibited error signal attenuation during female-directed song (n=11), while others were not significantly affected by courtship state (n=10, Figure 2). Together, these data show that Field L responses to distorted auditory feedback during singing can retune during courtship.

Discussion

By recording from the primary auditory cortex in zebra finches singing alone and to females, we discovered, for the first time to our knowledge, that auditory representations of an animal’s own vocalizations change with an audience. These findings extend past work in finches showing that social context affects auditory responses to the calls of conspecifics (Angeloni and Geffen, 2018; Menardy et al., 2014, 2012; Remage-Healey et al., 2010). More broadly, these findings support the idea that auditory cortical activity is modulated not just by acoustic features (Gervain and Geffen, 2019; King et al., 2018) but also by diverse phenomena such as attention (Hubel et al., 1959), primary rewards (David et al., 2012), task parameters (Downer et al., 2015; Fritz et al., 2003), and even hormone levels (Angeloni and Geffen, 2018; Menardy et al., 2014, 2012; Remage-Healey et al., 2010).

Detecting differences between predicted and actual sensory feedback is a crucial aspect of motor learning (Keller and Mrsic-Flogel, 2018), and auditory cortical neurons in diverse species can signal mismatch errors (Eliades and Wang, 2008; Keller and Hahnloser, 2009; Parras et al., 2021; Ulanovsky et al., 2003). In past recordings from VTA we observed a bimodal distribution of error responses, and the error-responding neurons were the ones that projected to Area X (Gadagkar et al., 2016). Yet in cortex there is uncertainty about the suitability of classifying single neurons by response profile, as task-relevant parameters that can be decoded from neuronal populations may not be apparent when examining single neuronal representations (Kaufman et al., 2014; Mante et al., 2013; Rigotti et al., 2013; Williams et al., 2018). We observed a continuum of error responses during both directed and undirected singing which made it difficult to unambiguously define a neuron as being error responsive or not.

Because the main goal of this study was to test if courtship-associated reduction in error signaling, recently observed in VTA DA neurons (Roeser et al., 2023), resulted from a local process in VTA or reflected a brainwide re-tuning of auditory responsiveness, we imposed an arbitrary threshold of 2.5 in the z-scored error response to test if the most robust error signals in Field L depended on courtship state. Surprisingly, we discovered that Field L neurons could retune at the transition from lone to courtship singing in diverse ways, consistent with a brainwide process that does not fully explain the uniform error signal attenuation observed in VTA. Interestingly, Area X and LMAN neurons uniformly increase their temporal precision and reduce their burstiness during courtship singing (Kao et al., 2008; Roeser et al., 2023; Sakata and Brainard, 2008; Singh Alvarado et al., 2021; Woolley et al., 2014). In contrast, Field L neurons could exhibit increases or decreases in burstiness, temporal precision, or mean rate with courtship.

An open question is how Field L receives information about whether or not a female is present, and how this information influences neural activity. Several neuromodulatory systems with possible information about courtship state project to songbird auditory forebrain, including acetylcholine (Shea and Margoliash, 2010), serotonin (Yip et al., 2020), and norepinephrine (Cardin and Schmidt, 2004). Estrogens can also rapidly modulate auditory firing (Remage-Healey et al., 2010; Scarpa et al., 2022). Yet how the courtship state may be orchestrated across multiple brain regions remains unclear. One possibility is that hypothalamic nuclei such as the medial preoptic nucleus (MPOA) initiate the courtship state. The MPOA projects to multiple brainstem neuromodulatory areas which in turn project broadly throughout the forebrain, including the song system and auditory system (Ben-Tov et al., 2023; Castelino and Ball, 2005; Riters and Alger, 2004; Singh Alvarado et al., 2021). It will be interesting in future studies to examine the neural signals propagating through these pathways at transitions into and out of the courtship state.

Materials and Methods

Subjects

Sixteen adult male zebra finches (>90 days post hatch) were the subjects of this study. Animal care and experiments were carried out in accordance with NIH guidelines and were approved by the Cornell Institutional Animal Care and Use Committee.

Surgery and awake-behaving electrophysiology

For chronic neural recordings, subjects were anesthetized with isoflurane inhalation and mounted on a stereotaxic instrument for probe implantation. All probes were 16-channel moveable electrode bundles (Innovative Neurophysiology). Probes were implanted 1.5-2.0 mm anterior and 1.5-2.0 mm lateral of the bifurcation of the mid-sagittal sinus to target Field L (Keller and Hahnloser, 2009) at a head angle of 80 degrees (measured as the angle from the tip of the beak to the center of the ear bars relative to the horizontal plane). The end of the cannula was implanted 1.5 mm ventral to the surface of the brain. Birds were then placed alone in a sound isolation chamber (12 hour light/dark cycle) with ad libitum food and water and allowed 1 day to recover postoperatively before being subjected to the distorted auditory feedback (DAF) protocol. Two to three days were allotted for habituation to DAF and to ensure that the bird spontaneously sang a sufficient number of motifs in social isolation before neural recordings. DAF was implemented with a custom LabView acquisition program that analyzed song syllables in real time and delivered syllable-targeted feedback. DAF (50 ms broadband noise bandpass filtered at 1.5-8 kHz to match the frequency range of zebra finch song) was played over speakers in the recording chamber on top of a specific target syllable randomly on 50% of motif renditions. Experiments were carried out in the male’s home cage, which was inside a sound isolation chamber. When the home cage lights came on each day, recording began and the male was left alone to sing at least 40 undirected song motifs. Female-directed motifs were then recorded by presenting a female in the chamber in ∼10 minute intervals throughout the day until at least 40 directed song motifs were elicited (ref andreas paper). Electrode placement was verified at the end of the experiment with small electrolytic lesions, histology, and dark field imaging.
Eleven of the 16 implanted birds yielded single unit recordings and sang sufficient motifs for the experiment. Many channels on the probes recorded multi-unit activity, which was noted but not analyzed in this study.

Neural Recording and Analysis

Neural signals were acquired with the Intan RHD recording controller and 16-channel Intan headstages that directly interfaced with the moveable bundles. The sampling rate was set to 20 kHz and recording was manually controlled with the Intan recording software, where a 60 Hz notch filter was applied and spiking activity of single units could be visualized in real time. Audio data and a real-time digital copy of the DAF signal were recorded simultaneously with neural data in the recording controller such that all data could be easily time aligned. A custom MATLAB GUI was used for visualizing song and neural data, and for spike sorting. Neural recordings were bandpass filtered between 0.4 kHz and 6-8 kHz and single unit spiking activity was manually sorted as previously described (Goldberg and Fee, 2010). Motif-aligned spiking activity was time-warped to the median duration of undirected or directed motifs. Firing rate (FR) histograms were computed by binning spiking events in 10 ms windows and smoothing with a 3-bin moving average. To calculate the significance of the FR changes across behavioral contexts, we randomly assigned spike trains from each song motif trial to undirected or directed groups while conserving trial numbers, then calculated FR values for the randomized dataset, as previously described (Goldberg and Fee, 2010). This shuffling without replacement was repeated 10,000 times for each neuron, yielding 10,000 shuffled changes in FR. Measured changes in firing rate that were greater than the 99th percentile of the shuffled distribution were considered significant, as in previous studies (Sakata and Brainard, 2008). This procedure was repeated for IMCC and burst fraction. A p value less than 0.01 was considered significant to account for multiple comparisons.
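
The label-shuffling procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used in the study: the function name `shuffle_test`, the use of one firing-rate value per motif rendition, and the two-sided comparison on absolute changes are all assumptions made here for the example.

```python
import random


def shuffle_test(undirected, directed, n_shuffles=10_000, alpha=0.01, seed=0):
    """Monte Carlo label shuffle for a change in mean firing rate.

    undirected, directed: per-motif firing rates (Hz), one value per rendition.
    Returns (observed change, cutoff from the shuffled distribution, significant?).
    Assumption: significance is judged on the absolute change (two-sided).
    """
    rng = random.Random(seed)
    n_undir = len(undirected)
    n_dir = len(directed)
    observed = sum(directed) / n_dir - sum(undirected) / n_undir

    pooled = list(undirected) + list(directed)
    null = []
    for _ in range(n_shuffles):
        # Reassign trials to conditions without replacement,
        # conserving the number of trials in each condition.
        rng.shuffle(pooled)
        fake_undir = pooled[:n_undir]
        fake_dir = pooled[n_undir:]
        null.append(abs(sum(fake_dir) / n_dir - sum(fake_undir) / n_undir))

    null.sort()
    # 99th percentile of the shuffled distribution when alpha = 0.01.
    cutoff = null[min(n_shuffles - 1, int((1 - alpha) * n_shuffles))]
    return observed, cutoff, abs(observed) > cutoff
```

The same function could be applied to IMCC or burst fraction by recomputing that statistic on each shuffled split in place of the mean rate.
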
To quantify the degree to which neuronal firing was time-locked to song, the intermotif correlation coefficient (IMCC) was calculated as described previously (Chen et al., 2019; Kao et al., 2008; Olveczky et al., 2005). To compute IMCC, motif-aligned FR was mean-subtracted and smoothed with a Gaussian kernel of 20 ms SD, resulting in a rate vector, r_i, for each motif. IMCC was defined as the mean of all pairwise correlation coefficients between r_i as follows:

$$\mathrm{IMCC} = \frac{2}{N(N-1)} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} CC_{ij}$$

where N is the number of motif renditions and CC_{ij} is the correlation coefficient between r_i and r_j.
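
As an illustration of the IMCC computation, the following sketch operates on per-motif rate histograms. It assumes 10 ms bins (so that a 20 ms SD corresponds to 2 bins), zero-padding at the histogram edges, and a normalized dot product of the mean-subtracted vectors as the correlation coefficient; the function names are hypothetical.

```python
import math
from itertools import combinations


def gaussian_smooth(values, sd_bins):
    """Smooth a sequence with a normalized Gaussian kernel (zero-padded edges)."""
    half = int(math.ceil(3 * sd_bins))
    kernel = [math.exp(-0.5 * (k / sd_bins) ** 2) for k in range(-half, half + 1)]
    total = sum(kernel)
    kernel = [w / total for w in kernel]
    n = len(values)
    smoothed = []
    for t in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t + k - half
            if 0 <= idx < n:
                acc += w * values[idx]
        smoothed.append(acc)
    return smoothed


def imcc(trials, sd_bins=2.0):
    """Mean pairwise correlation between motif-aligned rate vectors.

    trials: list of per-motif firing-rate histograms of equal length.
    sd_bins: Gaussian SD in bins (2 bins = 20 ms if bins are 10 ms; assumed).
    """
    vectors = []
    for trial in trials:
        m = sum(trial) / len(trial)
        centered = [x - m for x in trial]  # mean-subtract
        vectors.append(gaussian_smooth(centered, sd_bins))  # then smooth

    ccs = []
    for a, b in combinations(vectors, 2):
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        if den > 0:
            ccs.append(num / den)
    return sum(ccs) / len(ccs) if ccs else float("nan")
```

Identical rate vectors across renditions give an IMCC of 1, and a neuron whose firing pattern is unrelated from one rendition to the next gives a value near 0.
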

We defined bursts as events containing three or more spikes with consecutive interspike intervals less than the 25th percentile of the interspike interval distribution during singing. Burst fraction was calculated as the fraction of spikes during song motifs that occurred during bursts.
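
A minimal sketch of the burst-fraction calculation follows. It assumes spike times are given per motif rendition, that the ISI distribution is pooled across renditions, and that the percentile is computed by linear interpolation; the helper names are introduced here for illustration only.

```python
def percentile(values, q):
    """Percentile by linear interpolation between order statistics (q in 0-100)."""
    s = sorted(values)
    if len(s) == 1:
        return s[0]
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def burst_fraction(spike_trains, q=25.0, min_spikes=3):
    """Fraction of spikes that fall inside bursts.

    spike_trains: list of sorted spike-time lists, one per motif rendition.
    A burst is a run of at least `min_spikes` spikes whose consecutive ISIs
    are all below the q-th percentile of the pooled ISI distribution.
    """
    isis_per_train = [[b - a for a, b in zip(t, t[1:])] for t in spike_trains]
    all_isis = [isi for isis in isis_per_train for isi in isis]
    n_spikes = sum(len(t) for t in spike_trains)
    if not all_isis:
        return 0.0
    threshold = percentile(all_isis, q)

    spikes_in_bursts = 0
    for isis in isis_per_train:
        run = 0  # number of consecutive short ISIs in the current run
        for isi in isis + [float("inf")]:  # sentinel closes the final run
            if isi < threshold:
                run += 1
            else:
                if run + 1 >= min_spikes:  # a run of k ISIs spans k + 1 spikes
                    spikes_in_bursts += run + 1
                run = 0
    return spikes_in_bursts / n_spikes
```

For example, a train with three spikes 1 ms apart followed by six spikes 100 ms apart has one three-spike burst out of nine spikes, giving a burst fraction of 1/3.
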

We quantified the error responsiveness of each neuron in two ways, following approaches previously described (Chen et al., 2019; Gadagkar et al., 2016; Keller and Hahnloser, 2009). First, spike counts in 30 ms time windows shifted in 5 ms steps up to 50 ms after DAF offset were generated for undistorted and distorted trials. A Wilcoxon rank-sum (WRS) test assessed the significance of each 30 ms window, and only neurons with two or more consecutive windows with P<0.05 were considered to have a significant error response (Keller and Hahnloser, 2009). Second, we calculated the z-scored difference between smoothed firing rates in undistorted and distorted trials for the 100 ms time window after feedback onset, and defined the error score as the average z-scored difference across the three bins centered on the absolute maximum difference during the response window. To test if error responses were attributable to DAF, we compared absolute error score distributions derived from the same analysis in the 100 ms window preceding target onset in the motif, which was therefore not associated with DAF. Pre-DAF error scores were significantly lower in both conditions: a paired Wilcoxon signed-rank test indicated significantly lower error scores pre-DAF than post-DAF during undirected song (p<0.001, median pre-DAF: 0.82, median post-DAF: 1.39) and directed song (p<0.001, median pre-DAF: 0.81, median post-DAF: 1.41). 3.4% of pre-DAF and 16.5% of post-DAF error scores were greater than 2.5. Given the continuum of post-DAF error scores in both directed and undirected conditions (Figure 2–figure supplement 1), we conservatively chose a threshold of 2.5 to define an error response. Neurons with error responses greater than 2.5 in only one condition (undirected versus directed) were considered to have retuned; neurons with error scores greater than 2.5 in both conditions were considered not to have retuned.
Our results did not fundamentally change with an even stricter definition of an error response. With a more stringent threshold of 3, 12 neurons exhibited an error response only when singing to the female, 8 neurons exhibited error signal attenuation during female-directed song, and 6 neurons exhibited a stable error response across conditions.
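
The error score and the threshold-based retuning classification can be sketched as below. This is an illustrative reading of the description above, not the authors' implementation. In particular, z-scoring the difference trace by its own mean and SD across the whole aligned histogram is an assumption, as are the 10 ms bin width (10 bins per 100 ms window) and the function names.

```python
from statistics import mean, pstdev


def error_score(undistorted, distorted, onset_bin, window_bins=10):
    """Average z-scored rate difference around the peak of the response window.

    undistorted, distorted: smoothed rate histograms (same length, same bins).
    onset_bin: index of the bin at target (feedback) onset.
    window_bins: response window length in bins (10 bins = 100 ms at 10 ms bins).
    Assumption: the difference trace is z-scored by its own mean and SD.
    """
    diff = [u - d for u, d in zip(undistorted, distorted)]
    mu, sd = mean(diff), pstdev(diff)
    if sd == 0:
        return 0.0
    z = [(x - mu) / sd for x in diff]

    window = range(onset_bin, min(onset_bin + window_bins, len(z)))
    peak = max(window, key=lambda i: abs(z[i]))  # absolute maximum difference
    lo, hi = max(peak - 1, 0), min(peak + 1, len(z) - 1)
    return mean(z[lo:hi + 1])  # three bins centered on the peak


def classify_retuning(score_undirected, score_directed, threshold=2.5):
    """Label a neuron by whether its error score crosses threshold per condition."""
    undir = abs(score_undirected) > threshold
    direc = abs(score_directed) > threshold
    if undir and direc:
        return "not retuned"
    if undir or direc:
        return "retuned"
    return "below threshold"
```

Raising `threshold` to 3 reproduces the stricter definition described above without changing the rest of the procedure.
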

Figure Legends

Distributions of absolute error responses in Field L neurons

Scatter plot where each dot represents the absolute z-scored error response of a single neuron during the 100 ms interval after the onset of DAF (blue) and 100 ms before DAF onset (orange) (Methods). Corresponding histograms for each condition are projected along the x and y axes. Gray dotted lines indicate the 2.5 cutoff threshold used to define error neurons.

Neurons exemplifying a broad range of error responses in Field L.

(A-C) Top to bottom: example spectrograms, spike discharge, corresponding spike raster plots, and rate histograms for undistorted (blue) and distorted (red) trials aligned to motif onset (blue vertical bar denotes feedback target time in undistorted renditions; red shading denotes DAF; pink vertical dotted line denotes onset and offset of song motif). Error scores for each neuron and condition are enumerated as insets in the histogram. Scale bar for spiking activity is 0.2 mV. Vertical axis limits for spectrograms are 0 to 8 kHz.