In vivo assessment of the neural substrate linked with vocal imitation accuracy

  1. Julie Hamaide
  2. Kristina Lukacova
  3. Jasmien Orije
  4. Georgios A Keliris
  5. Marleen Verhoye
  6. Annemie Van der Linden (corresponding author)
  1. Bio-Imaging Lab, Department of Biomedical Sciences, University of Antwerp, Belgium
  2. Centre of Biosciences, Institute of Animal Biochemistry and Genetics, Slovak Academy of Sciences, Slovakia

Abstract

Human speech and bird song are acoustically complex communication signals that are learned by imitation during a sensitive period early in life. Although the brain areas indispensable for speech and song learning are known, the neural circuits important for enhanced or reduced vocal performance remain unclear. By combining in vivo structural Magnetic Resonance Imaging with song analyses in juvenile male zebra finches during song learning and beyond, we reveal that song imitation accuracy correlates with the structural architecture of four distinct brain areas, none of which belong to the song control system. Furthermore, the structural properties of a secondary auditory area in the left hemisphere can predict future song copying accuracy already at the earliest stages of learning, before vocal practice begins. These findings identify novel brain regions important for song learning outcome and indicate that ultimate performance depends in part on factors experienced before vocal practice.

Introduction

Human speech and bird song are highly complex and rapid motor behaviours that are learned by imitation from adults and serve to produce complex communication signals vital for social interactions (Brainard and Doupe, 2013). Both are acquired during a temporally well-defined sensitive period early in life. This period consists of a sensory learning phase, in which a perceptual target of speech sounds or a song model is memorised, followed by a sensorimotor learning phase, in which human infants or juvenile birds engage in intense vocal practice and try to match the spectro-temporal characteristics of their own vocalisations to the previously established perceptual targets based on multi-sensory feedback (Doupe and Kuhl, 1999; Tchernichovski et al., 2001). Zebra finches, an established model to study certain aspects of human speech acquisition (Brainard and Doupe, 2013), learn to sing a single song that crystallises (i.e. remains unchanged) for life after the sensorimotor learning phase. The success of the song learning process can be accurately quantified by computing the acoustic similarity between the song learned and sung by the pupil and the original tutor song, which serves as a reference (Tchernichovski et al., 2000). Intriguingly, only zebra finch males learn to sing. As a result, juvenile female zebra finches do not go through the sensorimotor and crystallisation phases of song learning that male zebra finches experience. However, female zebra finches do rely on early exposure to song to form an auditory memory of the tutor song, which they use to develop song preference (Hauber et al., 2013; Bolhuis and Moorman, 2015; Chen et al., 2016).

In songbirds, sensory learning is thought to depend on a distributed set of areas including several auditory regions (Bolhuis and Gahr, 2006; Hahnloser and Kotowicz, 2010). The motor aspect of song is encoded by cortical areas (Wild, 1997), which receive input from the vocal basal ganglia pathway that introduces variability in vocal motor output during sensorimotor practice (Bottjer et al., 1984). Auditory feedback signals that encode errors in the bird's own performance were detected in the secondary auditory areas (Keller and Hahnloser, 2009) and are likely transmitted to the dopaminergic midbrain nuclei (Mandelblat-Cerf et al., 2014) via the ventral pallidum (Chen et al., 2019). The dopaminergic midbrain nuclei and ventral pallidum were recently found to steer vocalisations towards the desired vocal target during sensorimotor song learning (Hisey et al., 2018; Chen et al., 2019). Each of these pathways is indispensable for sensorimotor learning to succeed, and during the sensorimotor learning phase their interplay is tightly regulated. Furthermore, the behavioural difference between male and female zebra finches (only males sing) is reflected in the structural organisation of the zebra finch brain. That is, the song control system, a circuit of well-delineated brain areas interconnected by fibre pathways that is responsible for singing behaviour, is more developed in males than in females (MacDougall-Shackleton and Ball, 1999). These structural differences include disparities in local volume, interconnectivity and intrinsic microstructural tissue properties (e.g. soma size in LMAN) (Nixdorf-Bergweiler, 1998). The sex differences in the volume of the song control nuclei arise during the song learning period, when juvenile males actively engage in vocal motor practice (Nixdorf-Bergweiler, 1996).

Importantly, however, much like human speech, bird song is a complex, culturally transmitted and socially learned communication skill. Indeed, several studies suggest that social factors beyond mere auditory access to a singing tutor (e.g. via auditory playback) are important for vocal learning. For example, live tutoring results in higher song similarity scores than tape tutoring (Derégnaucourt et al., 2013). Social feedback that juveniles receive from the adult male and female zebra finches (their caretakers) during the sensory and sensorimotor periods of vocal learning can influence song learning accuracy (Chen et al., 2016; Carouso-Peck and Goldstein, 2019). Therefore, additional brain areas and neural circuits outside the traditionally studied auditory and song control systems may be involved and may influence ultimate song performance (i.e. learning accuracy).

The foregoing studies, which aimed to unravel which neural circuits are indispensable for song learning to succeed, used highly targeted methods such as permanent lesions, temporary inactivation or optogenetic neuromodulation of specific brain regions, for example during tutoring sessions. Despite their exquisite spatial and/or temporal resolution, these tools require strong a priori hypotheses about which regions are involved, and the findings they yield inform only on the involvement of these specific brain regions (or neural populations) in the process under investigation. In contrast, other research tools can capture the structural and/or functional properties of the entire brain repeatedly and over extended time frames as subjects progress from basic to advanced performance levels. Combining this brain-wide information with data-driven processing methods enables an unbiased identification of neural correlates that would otherwise be missed by a hypothesis-driven region-of-interest (ROI) strategy. Over the last decades, Magnetic Resonance Imaging (MRI) in combination with voxel-based analyses has emerged as a prominent tool to uncover, follow and quantify plastic changes in brain function and structure arising during training and learning paradigms in humans and small animal species (Dayan and Cohen, 2011; Sagi et al., 2012; Zatorre et al., 2012). Although such findings are correlational, they offer a significant advantage: they allow spatio-temporal maps to be established that indicate when, along a training or learning paradigm, and where in the brain specific neuroplastic events occur. As such, they provide a strong basis for further in-depth testing with highly sensitive and specific research methods capable of revealing the precise functional implications of the identified targets.

Inspired by such imaging studies, we set up a longitudinal study in male and female zebra finches in which we repeatedly collected structural MRI data of the entire zebra finch brain at six time points during and one time point after the critical period for song learning (Figure 1A). Using voxel-based statistical testing, we established spatio-temporal maps revealing that (1) sex differences in local tissue microstructure exclusively co-localise with the song control nuclei and arise during the song learning process, and (2) most myelin-containing brain structures exhibit structural maturational changes between 20 and 200 dph. Further, from the advanced sensorimotor learning phase, when the male pupils have already mastered most of the tutor song, up to full song crystallisation, we recorded the songs of the male pupils during the first days after each MRI session and computed their song learning accuracy by comparing the spectro-temporal properties of the pupil song to those of the tutor song. Exploiting the advantages of in vivo MRI, we performed brain-wide voxel-based correlational analyses to explore relationships between song learning accuracy and local tissue volume (3D structural imaging) or intrinsic tissue properties (Diffusion Tensor Imaging, DTI) across the different stages of the song learning process. Although DTI is mainly sensitive to changes in white matter, many studies have demonstrated its value in identifying microstructural learning-related changes in grey matter areas such as the hippocampus in humans and rats (Sagi et al., 2012; for review, see Hamaide et al., 2016).
Using this approach, we discovered that the structural properties of specific parts of the secondary auditory areas, that is the left caudomedial nidopallium (NCM) and the caudal mesopallium (CM), and, unexpectedly, the ventral pallidum (VP) and the fronto-arcopallial tract (tFA), correlate with song imitation accuracy, whereas the nuclei of the song control system did not seem to be involved. In-depth analyses revealed that the structural properties of left NCM and the tFA mainly reflect relationships between performance and structure at the population level, whereas the structural architecture of the CM and VP appears to change along the sensorimotor learning process in individual birds. Notably, we also discovered that the structural properties of left NCM predict future learning accuracy already during the early phases of sensory learning, before pupils actively engage in vocal practice.

Figure 1
Monitoring the maturing zebra finch brain in male and female finches.

(A) Study design. Structural MRI data were acquired in male zebra finches during the sensory phase (20, 30 dph), sensorimotor phase (40, 65 dph), crystallisation phase (90, 120 dph) and well after song is mastered (200 dph). Structural MRI data were obtained at the same time points in female zebra finches. During each imaging session, we collected a DTI and a 3D anatomical scan. During the first days after each imaging session, starting from 65 dph, we recorded the songs of the juvenile male birds (only male zebra finches sing). (B–D) Statistical parametric maps highlighting voxels that display an age × sex interaction for Fractional Anisotropy (FA), one of the DTI parameters, during ontogeny. The crosshairs converge at the arcopallium (B), the rostro-lateral Area X surroundings (C), and the caudal Area X surroundings (D). The maps are thresholded at puncorrected <0.001; kE ≥30 voxels, and overlaid on the population-based template. The statistical maps are colour-coded according to the scales on the right. (E) Statistical brain maps illustrating the main effect of age on FA for male and female zebra finches. The results are displayed at pFWE <0.001 kE ≥5 voxels, and overlaid on the population-based template. The statistical maps are colour-coded according to the scales on the right. FA values range from 0 (black) to 1 (white). The areas demarcated by white dotted lines refer to the clusters identified by the age × sex interaction (B–D): 1 (RA) and 2 (Area X). The third and sixth rows present an average FA map calculated from FA maps of male birds obtained at 200 dph; these serve to identify the anatomical (mostly white matter) structures covered by a significant cluster. See Supplementary file 1 for statistics.
Abbreviations: LaM: mesopallial lamina; LFM: supreme frontal lamina; LFS: superior frontal lamina; tOM: occipitomesencephalic tract; CP: posterior commissure; FPL: lateral prosencephalic fascicle; tFA: fronto-arcopallial tract; Do: dorsal; Ve: ventral; Ro: rostral; Ca: caudal.

Results

Longitudinal structural MRI changes in the brains of maturing male and female zebra finches

We set up a longitudinal study in which we repeatedly collected structural MRI data of the entire zebra finch brain (Figure 1A). These data consist of 3D anatomical scans and DTI scans. The 3D anatomical dataset enabled us to assess regional changes in brain volume that arise over time (brain development) or between male and female zebra finch brains (sex differences; these data have been published in Hamaide et al., 2017). The DTI datasets allow us to establish spatio-temporal maps that indicate when and where in the brain neuroplastic changes in tissue microstructure occur (Hamaide et al., 2016). In the current study, we focus on Fractional Anisotropy (FA), a metric derived from DTI data. FA quantifies the directional dependence of water diffusion and hence indirectly reflects specific microstructural tissue characteristics (Beaulieu, 2002). Note that alterations in FA can be caused by a wide variety of microstructural tissue reorganisations, including altered axonal integrity, myelination, axon diameter and density, and changes in cellular morphology (Beaulieu, 2002; Zatorre et al., 2012; Dyrby et al., 2018). Using non-invasive in vivo structural MRI, we previously detected sex differences in local volume in the maturing zebra finch brain (Hamaide et al., 2018a) and in intrinsic microstructural tissue properties in adult birds (Hamaide et al., 2017). The present data allow us to extend the latter findings, as the present study includes DTI data obtained in juvenile zebra finches.
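For readers unfamiliar with the metric, FA is usually computed voxel by voxel from the three eigenvalues λ1, λ2, λ3 of the fitted diffusion tensor as FA = √(3/2) · √(Σ(λi − λ̄)²) / √(Σλi²), where λ̄ is the mean eigenvalue. The Python sketch below implements this standard formula for a single voxel. It is illustrative only: it does not reproduce the tensor fitting or preprocessing used in this study, and the function name and example eigenvalues are hypothetical.

```python
import math


def fractional_anisotropy(l1: float, l2: float, l3: float) -> float:
    """Fractional anisotropy (FA) from the three eigenvalues of a diffusion tensor."""
    mean = (l1 + l2 + l3) / 3.0
    # Spread of the eigenvalues around their mean (numerator of the FA formula)
    spread = (l1 - mean) ** 2 + (l2 - mean) ** 2 + (l3 - mean) ** 2
    # Overall magnitude of diffusion (denominator of the FA formula)
    magnitude = l1 ** 2 + l2 ** 2 + l3 ** 2
    if magnitude == 0.0:
        return 0.0  # no diffusion signal: FA is undefined, report 0 by convention
    return math.sqrt(1.5) * math.sqrt(spread / magnitude)


# Hypothetical eigenvalues (mm^2/s) for an elongated, fibre-like voxel
print(fractional_anisotropy(1.7e-3, 0.3e-3, 0.3e-3))
```

An isotropic tensor (three equal eigenvalues) gives FA = 0, whereas diffusion confined to a single axis gives FA = 1, which is why FA is displayed on a 0 to 1 scale in Figure 1E.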

To unveil brain structures that display a sexually dimorphic developmental trajectory of DTI properties matching the sexually dimorphic song production behaviour, we tested for an interaction between age and sex using a voxel-based repeated-measures ANOVA on the smoothed DTI parameter maps. Only clusters that survived pFWE <0.05 and kE ≥ 5 voxels were considered significant. Such an interaction was detected in clusters located bilaterally in the Area X surroundings and the arcopallium (Figure 1B–D).
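The two-part significance criterion above (a voxel-level p threshold followed by a minimum cluster extent kE) can be sketched as follows. This is a simplified illustration on a toy dictionary of voxel-wise p-values, assumed to be already corrected; it assumes face-adjacent (6-connected) neighbours, whereas SPM's own cluster definition may use a different neighbourhood, and it does not compute the FWE correction itself. The function name and data layout are hypothetical.

```python
from collections import deque

Voxel = tuple[int, int, int]

# Face-adjacent (6-connected) neighbour offsets; an assumption of this sketch
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def cluster_extent_threshold(
    pmap: dict[Voxel, float], p_thresh: float, k_min: int
) -> list[set[Voxel]]:
    """Keep connected clusters of voxels with p < p_thresh that contain >= k_min voxels."""
    # Step 1: height threshold on the voxel-wise p-values
    supra = {v for v, p in pmap.items() if p < p_thresh}
    seen: set[Voxel] = set()
    clusters: list[set[Voxel]] = []
    for start in supra:
        if start in seen:
            continue
        # Step 2: breadth-first search collects one connected cluster
        cluster = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in NEIGHBOURS:
                nb = (x + dx, y + dy, z + dz)
                if nb in supra and nb not in seen:
                    seen.add(nb)
                    cluster.add(nb)
                    queue.append(nb)
        # Step 3: extent threshold (minimum cluster size kE)
        if len(cluster) >= k_min:
            clusters.append(cluster)
    return clusters


# Toy map: a 6-voxel line, a separate 2-voxel pair, and one non-significant voxel
toy = {(i, 0, 0): 0.01 for i in range(6)}
toy.update({(10, 10, 10): 0.01, (10, 10, 11): 0.01, (3, 3, 3): 0.5})
print([len(c) for c in cluster_extent_threshold(toy, 0.05, 5)])  # prints [6]
```

With kE = 5 only the six-voxel line survives; the isolated two-voxel pair passes the height threshold but is discarded by the extent threshold.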

Since several brain areas displayed an interaction between age and sex, the main effect of age was explored in male and female birds separately. The data show that most myelin-containing structures and laminae of the juvenile zebra finch brain mature between 20 and 200 dph (Figure 1E). These laminae carry diverse collections of axonal fibres connecting distinct brain areas. In both male and female birds, the clusters displaying a significant age effect on FA cover almost the entire path of the occipitomesencephalic tract (tOM), from the arcopallium past the rostral border of the thalamus before travelling ventrally towards the diencephalon. Furthermore, parts of the superior frontal lamina (LFS), mesopallial lamina (LaM), the caudo-dorsal extension of the Area X surroundings, anterior commissure, lateral prosencephalic fascicle (FPL) and fronto-arcopallial tract (tFA) were also affected. Interestingly, a small portion of Field L also showed a change in females, and to a lesser extent in males.

In addition, several clusters identified by the voxel-based interaction between age and sex (Figure 1B) changed significantly during ontogeny only in males (white dotted boxes in Figure 1E).

Song performance improves even after crystallisation

We extracted the acoustic properties of individual song syllables at each age in the male zebra finches to evaluate how the spectral and temporal structure of the syllables evolves from (advanced) plastic to fully mature, stereotyped and crystallised song. The duration of the inter-syllable intervals gradually shortens from the sensorimotor phase to song crystallisation and continues to shorten after crystallisation, towards 200 dph (p<0.0001 F(3, 38.2)=13.8789; Supplementary file 2). This indicates that birds gradually sing faster. Syllable Wiener entropy scores decrease during the sensorimotor phase (p=0.0032 F(3, 37.8)=5.48; Figure 2—figure supplement 1), meaning that the syllables become more tonal. None of the pitch-related measures, nor frequency and amplitude modulation, changed over time (Supplementary file 2). Further, by quantifying the standard deviation of the spectro-temporal features, we observed that syllables are sung with lower acoustic variability when song crystallises between 90 and 120 dph (Figure 2—figure supplement 1). Together, these analyses indicate that along sensorimotor song maturation the syllables become more tonal (less noisy), more structured and less acoustically variable. Furthermore, while the spectral content of syllables mainly forms during the sensorimotor phase, the temporal properties of the songs continue to change beyond the crystallisation phase. This corroborates previous studies in zebra finches (Glaze and Troyer, 2013).
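Wiener entropy is commonly defined as the logarithm of the ratio between the geometric and the arithmetic mean of the power spectrum: it is 0 for a perfectly flat, noise-like spectrum and becomes increasingly negative as power concentrates in a few frequencies, that is, as a sound becomes more tonal. The sketch below applies this definition to a single spectral frame. The base-10 logarithm and the function name are assumptions, and the song analysis software used in this study may scale or average the measure over frames differently.

```python
import math


def wiener_entropy(power: list[float]) -> float:
    """log10(geometric mean / arithmetic mean) of one frame of a power spectrum."""
    if not power or any(p <= 0.0 for p in power):
        raise ValueError("power spectrum must be non-empty and strictly positive")
    n = len(power)
    # The geometric mean is computed in the log domain to avoid overflow/underflow
    log_geometric_mean = sum(math.log(p) for p in power) / n
    log_arithmetic_mean = math.log(sum(power) / n)
    return (log_geometric_mean - log_arithmetic_mean) / math.log(10.0)


noisy = [3.0, 2.0, 2.0, 3.0]     # hypothetical, nearly flat spectrum
tonal = [100.0, 1.0, 1.0, 1.0]   # hypothetical, power concentrated in one bin
print(wiener_entropy(noisy), wiener_entropy(tonal))
```

The tonal frame yields the more negative value, matching the interpretation above that decreasing Wiener entropy scores correspond to syllables gaining tonality.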

To quantitatively assess fine-scale changes in song performance and to estimate song learning accuracy, we evaluated the progression of song learning from the advanced sensorimotor phase (65 days post-hatching, dph), through the crystallisation phase (90–120 dph), to fully crystallised song (200 dph; Figure 1A). We quantified how successfully the juvenile birds (pupils) learned, that is, copied, the tutor song by computing the spectral similarity between the pupil and the tutor song (Tchernichovski et al., 2000). Song similarity to the tutor song increased gradually from 65 to 200 dph (p=0.0251 F(3, 37.0)=3.4890; Figure 2A), reaching levels similar to those described by others (Tchernichovski et al., 2000; Derégnaucourt and Gahr, 2013). Moreover, as song crystallisation results in a highly stereotyped, consistent order of syllables within a motif, sequence stereotypy also increased gradually from 65 to 200 dph (p=0.0052 F(3, 38.4)=4.7904; Figure 2B).

Figure 2
Song similarity improves beyond crystallisation.

Graphs A and B show song similarity to the tutor song and sequence stereotypy, respectively, as a function of age. Both increase from 65 to 200 dph (mixed-effect model, main effect of age: song similarity: p=0.0251 F(3, 37.0)=3.4890; sequence stereotypy: p=0.0052 F(3, 38.4)=4.7904). Each thin coloured or grey line represents the average performance of an individual bird over the different ages. The bold black line presents the average group performance (mean ± SEM; n = 14; 20 data points per time point per bird). The colour code of the lines in A encodes tutor identity, that is, birds raised by the same tutor share the same colour (see Supplementary file 12). The colour code illustrates that song similarity depends on tutor identity (mixed-effect model, main effect of tutor: p=0.0159 F(7, 6.1)=6.7597). Asterisks indicate significant differences over time identified by a mixed model analysis with post hoc Tukey’s HSD test. *: p<0.05; **: p<0.01. Abbreviations: dph: days post-hatching.

Notably, the overall variability in song similarity between birds was quite large, indicating that not all birds copy the tutor song equally well. Prior studies have shown that successful song learning does not depend only on the ability of the juveniles to hear the song model: social interactions between the juvenile and adult birds are crucial in determining the overall quality of the song copy (Chen et al., 2016; Carouso-Peck and Goldstein, 2019). Consistent with this, we observed that song similarity depended on tutor identity; that is, juveniles raised by the same tutor consistently produced similarly good or poor copies of the tutor song (p=0.0159 F(7,6.1)=6.7597; Figure 2A, where colours reflect tutor identity). This effect of the tutor on song similarity did not seem to be explained by song length: based on similarity scores obtained at 200 dph, birds with a ‘high’ (>68%) or ‘low’ (<68%) similarity score did not differ in the number of syllables (Supplementary file 3).

Song learning accuracy traces back to the CM, VP, tFA and NCM in the sensorimotor learning phase

Even though song performance improves from the sensorimotor to the crystallisation phase (and even beyond), not all birds learn to reproduce the tutor song equally well. Therefore, we set out to explore whether better song performance (i.e. more accurate tutor song copying) correlates with a specific structural signature in the brain. Inspired by numerous in vivo imaging studies describing training- or learning-induced brain-behaviour relationships (Zatorre et al., 2012), we performed brain-wide voxel-based statistical analyses to highlight brain sites that present a correlation between song learning accuracy (% similarity between pupil and tutor song) and local volume (log-transformed modulated Jacobian determinant, log mwj) or intrinsic tissue properties derived from DTI, that is Fractional Anisotropy (FA) (Beaulieu, 2002; Zatorre et al., 2012). These brain-wide voxel-based analyses uncovered four clusters (Figures 3 and 4; Supplementary file 4). Using various atlases of the zebra finch brain (http://www.zebrafinchatlas.org/; Nixdorf and Bischof, 2007; Poirier et al., 2008; Karten et al., 2013) and high-resolution tract tracings within the zebra finch brain (Hamaide et al., 2017), we found that these clusters co-localise with two secondary auditory areas, the caudomedial nidopallium (NCM) and the caudal mesopallium (CM); with a white matter tract that connects the basorostral nucleus to the arcopallium (fronto-arcopallial tract, tFA; Wild and Farabaugh, 1996); and with an area at the base of the telencephalon termed the ventral pallidum (VP). The VP has a direct role in sensorimotor song learning (Chen et al., 2019) and contains many fibres of passage, namely projections from the midbrain dopaminergic nuclei to Area X of the anterior forebrain pathway and Area X-DLM projections that belong to the song control system (Gale et al., 2008).
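At each voxel, the voxel-based regression asks whether the MRI parameter (FA or log mwj) varies linearly with % similarity, and summarises the evidence as a T value. The strongly simplified sketch below assumes a single regressor, ordinary least squares and independent observations; the actual SPM model may include additional terms, and all names and the toy data layout are hypothetical. It computes the slope t-statistic through the equivalent form t = r·√((n − 2)/(1 − r²)).

```python
import math

Voxel = tuple[int, int, int]


def slope_t(x: list[float], y: list[float]) -> float:
    """t statistic of the OLS slope in y = a + b*x (n - 2 degrees of freedom)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("x and y must both vary")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation
    if r * r >= 1.0:
        return math.copysign(math.inf, r)  # perfect fit: t is unbounded
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def t_map(similarity: list[float], mri_by_voxel: dict[Voxel, list[float]]) -> dict[Voxel, float]:
    """Hypothetical helper: one slope t value per voxel, all scans treated as independent."""
    return {voxel: slope_t(similarity, values) for voxel, values in mri_by_voxel.items()}
```

Treating every scan as an independent observation ignores that each bird contributes several time points, which is the limitation the cluster-based repeated-measures analyses are designed to address.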

Figure 3
Song imitation accuracy correlates positively with Fractional Anisotropy (FA) in the tFA (A–B), VP (C–D) and NCM (E–H).

The statistical maps (A, C, E, G) present the outcome of the voxel-based multiple regression testing for a correlation between song similarity and FA (n = 14). The crosshairs point to the tFA in the left hemisphere (A), the VP (C), and NCM in the left (E) and right (G) hemisphere. Results are overlaid on the population-based MRI template and scaled according to the colour code (T values) on the left of each statistical map. Only voxels that reached puncorrected <0.001 and are part of a cluster of at least 40 contiguous voxels are displayed. Graphs B, D, F and H visualise the nature of the correlation between song similarity and FA, with individual data points colour-coded according to bird identity (i.e. one colour = one bird). The coloured lines represent the average within-bird correlation, while the black dashed line indicates the overall association between song similarity and FA, disregarding bird identity or age. ‘r’ is the repeated-measures correlation (rmcorr) coefficient. The * indicates a significant rmcorr correlation between FA and % similarity in the VP (p=0.001) and right NCM (p=0.0121). Abbreviations: CA: anterior commissure; CP: posterior commissure; En: entopallium; MLd: dorsal part of the lateral mesencephalic nucleus; rot: nucleus rotundus; tFA: fronto-arcopallial tract; VP: ventral pallidum; Le: left; Ca: caudal; Ro: rostral; Ve: ventral. See Supplementary file 4 for statistics.

Figure 4
Song imitation accuracy correlates negatively with the local volume of the VP (A–B) and the CM (C–D).

The statistical parametric maps present the outcome of the voxel-based multiple regression testing for a correlation between song similarity and local tissue volume (n = 14); they are displayed at pFWE <0.05 and kE ≥ 80 voxels and overlaid on the population-based template. The crosshairs point to the VP (A) or the CM in the left hemisphere (C). T values are colour-coded according to the scale immediately to the left of the SPMs. Graphs B and D show the nature of the association between song similarity (%) and the log-transformed modulated Jacobian determinant (log mwj; a metric reflecting local tissue volume). Individual data points are colour-coded according to bird identity (i.e. one colour = one bird). The coloured lines represent the average within-bird correlation, while the dashed black line indicates the overall association between song similarity and log mwj, disregarding bird identity or age. ‘r’ is the repeated-measures correlation (rmcorr) coefficient. The * indicates a significant rmcorr correlation between log mwj and % similarity in the VP (p=0.0057) and left CM (p=0.0126). Abbreviations: CA: anterior commissure; CM: caudal mesopallium; CN: caudal nidopallium; CP: posterior commissure; DLM: medial part of the dorsolateral nucleus of the anterior thalamus; En: entopallium; MLd: dorsal part of the lateral mesencephalic nucleus; tOM: occipitomesencephalic tract; TSM: septo-mesencephalic tract; VP: ventral pallidum; Le: left; Ca: caudal; Ve: ventral; Ro: rostral. See Supplementary file 5 for statistics.

Correlations between song similarity and FA were observed in the left tFA (peak: pFWE <0.001 T = 6.81; Figure 3A) and in the left NCM (rostral NCM; peak: pFWE = 0.019 T = 5.69; Figure 3E and Figure 3—figure supplement 1). Furthermore, we found an additional cluster at the midline near the striatum and mesopallium, extending laterally and caudo-ventrally adjacent to the septomesencephalic tract (TSM; sub-peak next to the TSM in the left hemisphere: pFWE = 0.002 T = 6.38; Figure 3C). Based on this spatial pattern and in accordance with the Karten-Mitra zebra finch brain atlas (http://www.zebrafinchatlas.org/; Karten et al., 2013), we identified this area as the VP. Interestingly, when inspecting the statistical maps at a less conservative threshold (puncorrected <0.001 kE ≥40 voxels), clusters could also be observed in the right tFA (peak: pFWE = 0.001 T = 6.42) and the right NCM (peak: pFWE = 0.032 T = 5.55; Figure 3G). At this less conservative threshold, the cluster covering the left NCM also extends rostro-laterally towards the CM (sub-peak of NCM cluster: pFWE = 0.194 T = 5.01; Figure 3G and Figure 3—figure supplement 1).

To test whether learning accuracy correlates with the local volume of specific brain areas (log mwj), we performed a voxel-based analysis and found a significant anticorrelation in two brain areas: the VP (peak: pFWE <0.001 T = 8.06; Figure 4A) and the medial and lateral portions of the CM rostral to Field L (CMM and CLM, respectively), potentially including nucleus avalanche (Av; left: peak: pFWE = 0.001 T = 7.10; right: peak: pFWE <0.001 T = 7.42; Figure 4C).
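The sign of log mwj follows from the Jacobian determinant of the spatial transformation at each voxel: a determinant above 1 (positive log) indicates local expansion and one below 1 (negative log) local contraction, so an anticorrelation means that higher % similarity goes together with relatively smaller local volume. The minimal sketch below computes the natural log of the determinant of a single 3 × 3 Jacobian matrix. It assumes the matrix maps template to subject space (conventions differ between software packages), uses a natural logarithm as an assumption, and does not implement the registration, modulation or smoothing steps used in this study; the function names are hypothetical.

```python
import math

Matrix3 = list[list[float]]


def det3(m: Matrix3) -> float:
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def log_jacobian(jacobian: Matrix3) -> float:
    """Natural log of the local volume change encoded by one voxel's Jacobian matrix."""
    det = det3(jacobian)
    if det <= 0.0:
        raise ValueError("non-positive determinant: folding or degenerate deformation")
    return math.log(det)


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
shrunk = [[0.9, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.95]]  # hypothetical local contraction
print(log_jacobian(identity), log_jacobian(shrunk))
```

The identity matrix (no local volume change) gives 0, and the contracted voxel gives a negative value.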

Learning-related relationships versus between-bird variance

The voxel-based correlation analysis detects an overall association between song performance (% similarity) and the structural properties of the brain without taking the repeated measures into account. As a result, it cannot infer whether the brain-behaviour associations are mainly driven by between-subject variation in performance and structure, or whether individual improvements in song imitation relate to specific structural properties of the clusters at specific learning periods. To make this distinction, we first extracted, for each bird and each time point separately, the mean log mwj or mean FA from the voxel-based clusters. Next, we performed (i) Spearman’s correlation analysis (ρ) to characterise potential correlations between the structural properties of the cluster-based ROIs and song similarity at 65 or 200 dph, and (ii) a repeated-measures correlation analysis (rmcorr; Bakdash and Marusich, 2017; Figure 3B,D,F,H; Figure 4B,D), which accounts for the repeated measures within each bird and provides inference on the common within-bird association between brain structure and song similarity across the group. The rmcorr analysis can therefore reveal potential learning-related changes in local brain structure. The outcomes of the Spearman’s correlation and rmcorr analyses, including Benjamini-Hochberg correction for multiple comparisons, are summarised in Table 1 and Supplementary files 6 and 7.
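The repeated-measures correlation coefficient can be obtained by removing each bird's own mean from its x and y values and correlating the pooled deviations; this yields the same coefficient as the ANCOVA formulation of Bakdash and Marusich (2017), with N − S − 1 degrees of freedom for N observations from S birds. The stdlib-only sketch below illustrates that equivalence; it is not the reference rmcorr R package, the function name and data are hypothetical, and it omits the p-value, which would follow from t = r·√(df/(1 − r²)) on df degrees of freedom.

```python
import math
from collections import defaultdict


def rmcorr(subjects: list[str], x: list[float], y: list[float]) -> tuple[float, int]:
    """Repeated-measures correlation coefficient and its degrees of freedom."""
    by_subject: dict[str, list[tuple[float, float]]] = defaultdict(list)
    for s, xi, yi in zip(subjects, x, y):
        by_subject[s].append((xi, yi))
    sxx = syy = sxy = 0.0
    for pairs in by_subject.values():
        # Remove each subject's own mean: only within-subject variation remains
        mx = sum(p[0] for p in pairs) / len(pairs)
        my = sum(p[1] for p in pairs) / len(pairs)
        for xi, yi in pairs:
            sxx += (xi - mx) ** 2
            syy += (yi - my) ** 2
            sxy += (xi - mx) * (yi - my)
    r = sxy / math.sqrt(sxx * syy)
    df = len(x) - len(by_subject) - 1  # N observations - S subjects - 1
    return r, df


# Hypothetical data: two birds improve along the same slope but differ in baseline
birds = ["A", "A", "A", "B", "B", "B"]
mri = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
similarity = [10.0, 11.0, 12.0, 0.0, 1.0, 2.0]
print(rmcorr(birds, mri, similarity))  # prints (1.0, 3)
```

In this toy example the pooled correlation across all six points is negative, because bird B has higher MRI values but lower similarity, while rmcorr returns r = 1. This is the between-bird versus within-bird distinction drawn above.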

Table 1
Summary of within- and between-subject correlations of the cluster-based ROIs.
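Correlation between | Cluster-based ROI | Hemisphere | rmcorr r | rmcorr P | Spearman’s ρ (65 dph) | P (65 dph) | Spearman’s ρ (200 dph) | P (200 dph)
% similarity and FA | tFA | Left | 0.1290 | 0.4200 | 0.7846 | 0.0009 | 0.7714 | 0.0012
% similarity and FA | tFA | Right | 0.0215 | 0.8940 | 0.7099 | 0.0045 | 0.7143 | 0.0041
% similarity and FA | NCM | Left | 0.2560 | 0.1060 | 0.6747 | 0.0081 | 0.6835 | 0.0070
% similarity and FA | NCM | Right | 0.3880 | 0.0121 | 0.6396 | 0.0138 | 0.5692 | 0.0336
% similarity and FA | VP | – | 0.4960 | 0.0010 | 0.8154 | 0.0004 | 0.7890 | 0.0008
% similarity and log mwj | VP | – | −0.4290 | 0.0057 | −0.7978 | 0.0006 | −0.4418 | 0.1138
% similarity and log mwj | CM | Left | −0.3910 | 0.0126 | −0.5297 | 0.0514 | −0.3055 | 0.2882
% similarity and log mwj | CM | Right | −0.4160 | 0.0075 | −0.4769 | 0.0846 | −0.1912 | 0.5126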
Note: ‘log mwj’ refers to the log-transformed, modulated and warped Jacobian determinants; FA stands for Fractional Anisotropy, one of the DTI metrics. ‘r’ is the repeated-measures correlation coefficient of the within-subject correlation analyses. Spearman’s ρ informs on potential correlations between the MRI parameters and song similarity between birds at a specific time point. Tests that survive Benjamini-Hochberg FDR correction for multiple comparisons are highlighted in bold (Supplementary files 6 and 7). Abbreviations: dph: days post-hatching.
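The Benjamini-Hochberg procedure mentioned in the table note controls the false discovery rate with a step-up rule: sort the m p-values, find the largest rank k for which p(k) ≤ (k/m)·q, and reject the k smallest. The minimal sketch below assumes q = 0.05; how the tests were grouped into families for correction is not stated in this section and is not modelled here, and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues: list[float], q: float = 0.05) -> list[bool]:
    """Step-up Benjamini-Hochberg procedure; True marks a rejected null hypothesis."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * q:
            k_max = rank  # largest rank satisfying p(k) <= (k/m) q
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected


print(benjamini_hochberg([0.04, 0.05]))  # prints [True, True]
```

The usage line illustrates the step-up property: 0.04 fails its own threshold (0.025) but is still rejected because the larger p-value of 0.05 meets its threshold of 0.05.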

In two of the clusters identified above, namely the NCM and the tFA, the association between song similarity and FA appeared to be driven by between-subject variance (Spearman’s correlations at 65 dph: NCM: left: p=0.0081; right: p=0.0138; tFA: left: p=0.0009; right: p=0.0045; at 200 dph: NCM: left: p=0.0070; right: p=0.0336; tFA: left: p=0.0012; right: p=0.0041). Surprisingly, the small cluster in the right NCM also displayed a significant repeated-measures correlation (p=0.0121), indicating that as individual birds improved their performance, FA in the right NCM increased accordingly.
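The between-subject tests above use Spearman’s ρ across birds at a single age, which is the Pearson correlation computed on ranks, with tied values assigned their average rank. A stdlib-only sketch follows; the function names are hypothetical and no p-value is computed.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)
```

Because only ranks enter the computation, any monotonic relationship between a cluster's mean FA and % similarity gives ρ = ±1, which makes the measure robust to non-linear but consistent trends across the 14 birds.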

In contrast, the CM showed no between-subject correlations at any age. This suggests that birds that sing a better copy of the tutor song do not typically exhibit a smaller or larger volume of this specific part of the CM. However, individual improvements in song learning were associated with a lower local volume of the CM (left: p=0.0126 rmcorr = −0.391; right: p=0.0075 rmcorr = −0.416; Figure 4D). The VP, on the other hand, showed significant repeated-measures correlations between song similarity and both local volume and FA (log mwj: p=0.0057 rmcorr = −0.429; FA: p=0.0010 rmcorr = 0.496; Figure 3D; Figure 4B). This suggests that when individual birds learn to produce a more accurate copy of the tutor song, the VP shrinks (lower log mwj) and acquires a more ordered microstructure (higher FA). Furthermore, local volume and FA of the VP correlate significantly with song similarity during the sensorimotor phase at 65 dph (log mwj: p=0.0006 Spearman ρ = −0.7978; FA: p=0.0004 Spearman ρ = 0.8154); however, only the correlation between song similarity and FA persists after song crystallisation (p=0.0008 Spearman ρ = 0.7890). The results of the correlation analyses are summarised in Table 1 and Supplementary files 6 and 7.

Together, these findings suggest that birds that sing a better copy of the tutor song have higher FA values in the NCM, the VP and the tFA, both during the sensorimotor phase and after song crystallisation. Furthermore, learning-related individual advances in producing a more accurate acoustic copy of the tutor song correlate with local tissue structure in the caudal mesopallium and VP. Overall, higher song similarity is related to a smaller volume of the VP and CM and to a higher FA (more ordered structure) in the VP, tFA and NCM. Higher FA might reflect more coherent fibre alignment or increased myelination in the VP and tFA (Beaulieu, 2002), while in grey matter-like structures such as NCM, higher FA values might point to changes in cell morphology (spines and dendritic branching), alignment or density (Zatorre et al., 2012).

Song learning accuracy does not trace back to the song control system

The song control system is known to be crucial for song learning, and its constituents change in volume and in microstructural tissue properties during the first four months of post-hatch life (Figure 1E; Nixdorf-Bergweiler, 1996). Numerous studies have shown that lesioning components of this circuitry during the song learning phase prevents pupils from copying the tutor song (Scharff and Nottebohm, 1991; Brainard and Doupe, 2001). Intriguingly, none of the clusters in which song performance correlated with brain structure overlapped with any component of the song control system. Previous studies by the authors, however, have shown that the MRI methods used in this study are sensitive enough to detect correlations between song performance and the structure of song control nuclei, for example HVC (Hamaide et al., 2018b; Orije et al., 2020). To better understand the absence of clusters co-localised with the song control nuclei in this study, we delineated the song control nuclei HVC, Area X, LMAN and RA on the DTI maps, extracted the structural properties (MRI parameters) of these regions-of-interest, and tested for correlations between these structural readouts and song similarity. No correlations were observed (Figure 5). Even when the regions surrounding Area X and RA, which changed significantly during ontogeny in male zebra finches (Figure 1B,D), were included in the correlation analysis, no association between brain structure and song similarity could be identified (Figure 5C,F). This suggests that, although the song control system is necessary for song learning, the ultimate level of song performance (% similarity with the tutor song) is shaped by a more complex set of circuits that synapse onto the song control system.

Repeated measures correlation showing no correlation between % song similarity and Fractional Anisotropy (FA) of song control system regions.

HVC (A), Area X (B), LMAN (D) and RA (E), and the surroundings of the song control nuclei Area X (C) and RA (F) as defined by the clusters derived from the age × sex interaction shown in Figure 1B,D. The individual data points are colour-coded by bird identity (i.e. one colour = one bird). The average within-bird correlation is represented by the coloured lines, while the dashed black line indicates the overall association between song similarity and FA, disregarding bird identity or age. ‘r’ is the repeated-measures correlation (rmcorr) coefficient.

Microstructural characteristics of NCM as defined by FA can predict future good or bad learning outcome even before pupils engage in vocal practicing in the sensory learning phase

The Spearman correlation analyses revealed that FA values in the VP, NCM and tFA show a clear between-subject correlation with song learning accuracy. This suggests that, in the sensorimotor phase, good and bad learners are characterised by distinct structural MRI readouts in these regions. Next, using the scans of the same birds acquired during the sensory (20, 30 dph) and early sensorimotor (40 dph) phases, we evaluated whether similar signs of a future good or bad song copying outcome would already be visible in the structural properties of these regions in the early sensorimotor phase, or even earlier, during the sensory learning phase, when birds memorise the tutor song but are not yet fully engaged in trial-and-error vocal practicing (Brainard and Doupe, 2013).

To this end, we divided the group of male birds into ‘good’ and ‘bad’ learners based on the overall song performance obtained at 65–200 dph. More specifically, good learners (n = 7) always sang acoustically accurate copies (>65–68% song similarity to the tutor song), while bad learners (n = 5) never produced a copy better than 65–68% similarity to the tutor song. Birds whose scores traversed the 65–68% interval during the study (n = 2) were assigned to the ‘bad learners’ group because at the end of the song learning phase they reached similarity scores < 68% (Figure 6A). Next, we tested for an interaction between age (20, 30, 40 dph) and future learning accuracy (good, bad) in the cluster-based ROIs (Figure 6—figure supplement 1). None of the cluster-based ROIs survived FDR correction for multiple comparisons when testing for this interaction. Interestingly, FA in the left NCM displayed a significant main effect of learning accuracy (good versus bad) already at 20–40 dph (p<0.0001 F(1, 12.3)=39.2690; Figure 6B). Furthermore, FA in the left NCM at 20 dph was positively correlated (p=0.01, ρ = 0.662) with % song similarity at 200 dph (Figure 6—figure supplement 2). To our surprise, future good learners consistently showed higher FA values in the left NCM than bad learners, already at the earliest stages of the sensory learning phase. Thus, this structural signature is present well before the juvenile finches engage in vocal practicing, at a time when the pupils have been exposed to their tutor for several days (Immelmann, 1969). This result was specific to the left NCM, as none of the other regions identified in the voxel-based correlation analysis showed a similar predictive relationship (Figure 6C, Figure 6—figure supplements 1 and 2).
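The good/bad split described above can be written as an explicit rule. The sketch below is a hypothetical encoding of those criteria: it collapses the 65–68% band to its upper edge (68%, an assumption), classifies each bird from its scores ordered by age, and returns "unassigned" for a bird that crosses the band and ends above it, a case the text does not describe. The function name and threshold are illustrative only.

```python
def classify_learner(similarity, threshold=68.0):
    """Label a bird 'good', 'bad' or 'unassigned' from age-ordered % similarity scores.

    Hypothetical encoding of the grouping rule; collapsing the 65-68% band
    to its upper edge (threshold) is an assumption of this sketch.
    """
    if min(similarity) > threshold:
        return "good"        # always above the band
    if max(similarity) <= threshold:
        return "bad"         # never better than the band
    # The scores crossed the band: decide on the last (oldest-age) recording.
    return "bad" if similarity[-1] < threshold else "unassigned"


scores = {"bird1": [70, 72, 75, 78], "bird2": [50, 55, 60, 62], "bird3": [60, 70, 66]}
for bird, s in scores.items():
    print(bird, classify_learner(s))
```

Deciding on the final recording mirrors the text's rationale that the two birds crossing the band were grouped by the similarity they reached at the end of the song learning phase.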

Figure 6
Fractional anisotropy in left NCM predicts future song learning accuracy.

Graph A presents the learning curve of the good (green; n = 7) and bad (red; n = 7) learning birds from 65 to 200 dph. Details on the distinction between good and bad learners can be found in the Results section and in Supplementary file 2. Graphs B and C present the difference of FA in NCM between good (green) and bad (red) vocal learners during the sensory (20–30 dph) and early sensorimotor (40 dph) phase in, respectively, the left (B) and right (C) hemisphere. Each line represents repeated measures obtained from one bird. The * indicates a significant main effect of future learning accuracy (good versus bad learners) for FA in left NCM (mixed model: p<0.0001 F(1,12.3)=39.2690).

Discussion

We employed hypothesis-free, data-driven, brain-wide in vivo structural MRI and used song similarity to the tutor song as a proxy for vocal learning accuracy, from the advanced sensorimotor phase up to post-crystallisation song refinement. This unbiased approach revealed that the structural properties of the secondary auditory regions left NCM and CM, of the VP and, unexpectedly, of the tFA correlate with imitation accuracy. Furthermore, between- and within-subject correlation analyses revealed that the correlations in left NCM and the tFA are mainly driven by between-subject variation in performance and structure, while the structural architecture of the CM and VP appears to change along the sensorimotor learning process in individual birds. Importantly, we also demonstrated that the structural properties of left NCM during the sensory phase (i.e. when birds establish a memory of the tutor song but have not yet initiated sensorimotor vocal practicing) differed between birds that would later become good or bad song learners, thereby predicting future learning outcome. Overall, the present findings (i) add a new dimension to previously published data by providing clear evidence of relationships between performance levels and the structural properties of four specific areas, (ii) identify a not-yet-explored brain area (tFA) in the context of song learning which deserves in-depth investigation in future studies, and (iii) show that future performance levels can be predicted from the structural properties of a secondary auditory region at the earliest stages of song learning.

The observation that the structural properties of NCM are predictive of future song copying accuracy already at 20 dph is consistent with a previously established functional role of NCM in establishing a memory of the tutor song during the sensory phase (Bolhuis and Gahr, 2006; London and Clayton, 2008; Hahnloser and Kotowicz, 2010; Ahmadiantehrani and London, 2017). Furthermore, tutor-song-evoked immediate early gene (IEG) expression levels appear stronger in the left than in the right NCM, and song similarity correlates with the extent of lateralised IEG expression (Moorman et al., 2012). Intriguingly, our observation that mainly the left rather than the right NCM shows a correlation between tissue microstructure and tutor song imitation accuracy (Figure 3E–H and Figure 6B–C) is reminiscent of the leftward lateralisation of speech and language processing in humans (Bishop, 2013), and is consistent with a recently discovered asymmetry in tissue microstructure in the planum temporale (related to auditory speech processing) in human subjects (Ocklenburg et al., 2018), likewise measured by diffusion MRI.

It is noteworthy that the quality of the song learning process (similarity between pupil and tutor song) can only be assessed once juvenile birds engage in vocal practicing, and thus always reflects both sensory and sensorimotor learning. Benefiting from the non-invasive nature of MRI, however, we could adopt a longitudinal design in which we followed birds for a total period of 6 months, from very early on, that is before birds engage in vocal practicing (20 dph), up to 200 dph, when they sing a fully crystallised song. As such, we were able ‘to go back in time’ and relate song learning accuracy levels obtained at 65 dph (and older) to the structural properties of specific brain regions of the same birds at 20 dph. Importantly, however, the design employed in this study does not allow us to distinguish between the contributions of an innate learning bias (innate properties of the pupil) and of social enhancement by the tutor (tutor behaviours that promote learning in pupils). Such tests require a carefully balanced and controlled design in which genetic brothers are raised in different conditions: (1) by their biological father (tutor = biological father), or (2) by foster fathers (one foster father (tutor) per genetic brother). Given that the behaviour of the tutor and social interactions between the tutor and the juvenile males have been shown to be important influences on the juveniles’ song learning performance (Chen et al., 2016), the rearing conditions and tutor exposure should be carefully controlled. A yoked experimental design similar to that used by Chen et al. (2016), which provides auditory experience with limited visual and physical interaction, could help disentangle the effect of social interaction from that of an innate learning bias.
Furthermore, a recent study found that interactions between juvenile males and their (foster) mother also have important effects on the juvenile males’ song maturation and performance (Carouso-Peck and Goldstein, 2019). Therefore, additional conditions that control for social enhancement of learning by adult females should be incorporated as well. Lastly, delaying tutor exposure until after the first measurement (e.g. first MRI measurement at 30 dph and introduction to the tutor at 31 dph) could help differentiate between an innate learning bias and social enhancement of vocal learning.

It is generally known that under normal rearing conditions song learning accuracy improves with age. We also clearly observed this effect of learning over time in the current study (Figure 2). The current study design does not allow us to quantify precisely to what extent the correlations observed in the voxel-based clusters are driven by age or by general brain developmental processes. However, based on several observations, we speculate that the relationships observed in this study (Figures 3 and 4) mainly reflect brain-behaviour relationships related to learning accuracy (which itself also improves with age) rather than age-related brain changes that coincide with improvements in learning. Firstly, a previous study by our lab investigated brain development in juvenile zebra finches (Hamaide et al., 2018a). This study clearly shows that most changes in brain volume occur relatively early (before 65 dph) and affect large portions of the brain. Furthermore, the same study shows that relatively large, widespread brain areas decrease in volume from 65 dph to 200 dph (the same time frame as this study). The clusters detected in the current study are much smaller and may overlap with only a small fraction of these large clusters. If the correlations were mainly driven by ‘ageing’ or brain development, we would expect a profile of clusters similar to those found for the overall effect of time (Figure 1B) and in the brain development study (Hamaide et al., 2018a). Secondly, Figure 2B clearly indicates that not all birds follow a similar learning curve: for some birds, song similarity increases more between 65 and 90 dph, while other pupils show the steepest increase in performance between 90 and 200 dph. Moreover, not all birds reach similar levels of song learning accuracy.
This observation indicates that there is an important source of between-subject variance and that age or time alone cannot sufficiently explain performance levels. Accordingly, we performed Spearman correlation analyses on the cluster-based ROI data obtained at a single age, that is 65 dph or 200 dph, to ensure that no time-related effects (brain development) could confound the outcome. These analyses indicated that between-bird variance in song learning accuracy drives the correlations detected by the voxel-based analyses. Thirdly, it is known from the literature that the song control nuclei change in volume with age (Nixdorf-Bergweiler, 1996) and are involved in the song learning process. As described above, we did not find any relationship between song learning accuracy and the structural properties of the song control system or its immediate surroundings. Based on these observations, we conclude that, even though age effects cannot be fully removed, the current findings are most likely driven mainly by correlations between performance levels and the structural characteristics of the brain rather than by brain development alone.
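Restricting the analysis to a single age means that each bird contributes one ROI value and one similarity score, so an ordinary rank correlation applies. A minimal sketch of Spearman's ρ (the Pearson correlation of tie-averaged ranks) is shown below; in practice `scipy.stats.spearmanr` computes the same coefficient together with a p-value. The helper names and the toy values are hypothetical.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


fa_single_age = [0.21, 0.25, 0.19, 0.30, 0.27]   # toy FA values, one per bird
sim_single_age = [62, 70, 60, 80, 75]            # toy % similarity, same bird order
print(spearman_rho(fa_single_age, sim_single_age))
```

Because only ranks enter the computation, a single extreme bird cannot dominate the coefficient, which is a reasonable choice for small groups of birds.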

While the structural properties of left NCM already appear to differentiate between future good and bad performers in the sensory phase, the local volume and intrinsic tissue properties of the VP and tFA do not differ between birds with different future learning outcomes. Furthermore, the local volume of the CM and VP decreases as birds progressively become better at singing the tutor song. These findings suggest that the structural properties of the CM, VP and tFA change during the sensorimotor phase. Sensorimotor song refinement requires mechanisms that link motor commands with the associated sensory feedback, enabling the detection and correction of errors in the bird's own performance (Murphy et al., 2017). Interestingly, recent evidence assigns important roles in sensorimotor learning to the secondary auditory area CM, the VP and the dopaminergic midbrain nuclei that synapse onto the vocal basal ganglia (Keller and Hahnloser, 2009; Hisey et al., 2018; Chen et al., 2019). More specifically, the CM is reciprocally connected with NCM (Vates et al., 1996) and shows clear song-selective responses (Bauer et al., 2008). A specific sub-field within the CM, the nucleus Avalanche (Av), exhibits singing-driven IEG expression that correlates with the amount of singing in both normally hearing and deafened birds (Jarvis and Nottebohm, 1997). Av-projecting HVC neurons convey premotor signals to Av, and genetic ablation of HVCAv neurons after sensory learning significantly impairs sensorimotor learning in juvenile male zebra finches (Roberts et al., 2017).
In sum, its song-selective neural responses (Bauer et al., 2008), its reciprocal connectivity with NCM (Vates et al., 1996), and premotor input from HVC (Roberts et al., 2017) that might contribute to the generation of error-detection signals (Keller and Hahnloser, 2009) make the CM a prime candidate for comparing the bird's own song performance with pre-set performance goals such as the memorised tutor song. Our findings complement these previous reports by showing that, besides its functional properties, the structural features of the CM also change in step with performance throughout the sensorimotor learning phase.

Gale et al. (2008) identified that the VP carries (i) neuronal projections from the arcopallium to dopaminergic midbrain nuclei, a pathway important for the error-detection and correction mechanisms necessary for sensorimotor learning (Mandelblat-Cerf et al., 2014), and (ii) axon collaterals of the DLM-projecting Area X cells, which form the basal ganglia-thalamic component of the song control system; these collaterals synapse onto VTA/SNc-projecting neurons in the VP (Gale and Perkel, 2010). Using optogenetic neuromodulation, a recent study provided mechanistic evidence that these dopaminergic Area X-projecting VTA neurons are indispensable for sensorimotor song learning during ontogeny (Hisey et al., 2018). Furthermore, VP neurons signal performance errors during singing, and lesioning the VP in juvenile male zebra finches significantly impairs song learning (Chen et al., 2019). Besides carrying dopaminergic projections, the VP also contains cholinergic neurons. These neurons send cholinergic projections to two cortical regions responsible for the motor aspects of singing, HVC and RA (Li and Sakaguchi, 1997), and manipulating these VP-derived cholinergic projection neurons can suppress HVC's neural responses to the bird's own song (Shea and Margoliash, 2003). In sum, the VP appears to be an important integration centre where several pathways converge and perhaps form a closed-loop system, in which dopaminergic midbrain nuclei can act on the basal ganglia to affect song output and vice versa, or cholinergic projections can modulate the premotor cortical nuclei HVC and RA based on error signals originating from upstream auditory cortices (Mandelblat-Cerf et al., 2014). Our findings complement these functional studies by showing that the volume and, most probably, the organisation of fibres of passage are rearranged when birds achieve higher song copying accuracy.

MRI studies assessing brain-behaviour relationships during training programs to master complex motor skills have observed bi-directional neuroplastic changes (Gryga et al., 2012); that is, certain brain regions expand while others contract in response to training. Furthermore, depending on the training intensity and the timing of investigation, distinct parameter readouts can be obtained, as different neuroplastic mechanisms might be at play (Sampaio-Baptista et al., 2014). We speculate that continued vocal motor practice, reflected in improving song similarity, might evolve towards an optimised and ‘automatic’ performance in which redundant circuitry is pruned. In mammals, extended training leads to a decreased number of task-activated neurons in the sensorimotor striatum when proceeding from the initial phases of novel skill learning to habitual performance of the task (Ashby et al., 2010). In birds, too, the vocal basal ganglia circuitry appears more important for initial song learning but functionally disengages from the production of well-learned songs after song crystallisation (Doupe et al., 2005). The DTI metrics, on the other hand, provide an indirect estimate of the underlying tissue microstructure by quantifying the average diffusion properties of water protons in a voxel. More specifically, FA quantifies the directional dependence of water diffusion (Beaulieu, 2002). As a result, alterations to FA are notoriously biologically unspecific, as they can be caused by a wide variety of microstructural tissue re-organisations, including altered axonal integrity, myelination, axon diameter and density, changes in cellular morphology, etc. (Beaulieu, 2002; Zatorre et al., 2012; Dyrby et al., 2018). Moreover, the MRI readout most probably always reflects different processes happening in concert, in a coordinated way, involving various cell types.
To unambiguously pinpoint the biological mechanisms responsible for the observed structural difference between good and bad learners, additional studies at the cellular and molecular level are required.

Our data-driven brain-wide analyses also pinpointed a fourth structure that has not yet been investigated thoroughly in the context of zebra finch song learning: a cluster co-localised with the tFA. The tFA carries projections from the basorostral nucleus in the rostral forebrain to the lateral arcopallium and caudolateral nidopallium. More specifically, the basorostral nucleus sends somatosensory and auditory information to the lateral arcopallium (Wild and Farabaugh, 1996), which in turn projects to jaw premotor neurons and vocal and respiratory effectors (Wild and Krützfeldt, 2012). In humans, too, somatosensory information from the facial skin and the muscles of the vocal tract is vital for proper perception and production of speech (Tremblay et al., 2003; Ito et al., 2009). Alternatively, these somatosensory and auditory descending projections may serve to control beak movements that modulate gape size during singing, as gape size can affect the acoustic properties of individual syllables (Hoese et al., 2000). Taken together, even though this tract is not considered part of the traditional song control system, it might carry neuronal projections that are necessary for the proper adjustment of vocalisations and that are strengthened by vocal practicing. The precise role of the basorostral nucleus and the tFA in sensorimotor song learning needs to be investigated in future studies that employ tools capable of dissecting the observed relationship between brain structure and song copying accuracy down to the mechanistic, causal level.

In conclusion, the present findings illustrate that, as pupils produce more accurate copies of the tutor song, the structural properties of the CM, VP and tFA change. Most intriguingly, however, the final song performance relates to the microstructural tissue properties of the secondary auditory area NCM already at the very start of the sensory learning phase. Overall, our findings, together with several parallel findings in humans, open new avenues for understanding brain-behaviour relationships related to speech acquisition and performance, both of which are imperative for successful communication, and again underscore the importance of early-life rearing conditions and role models in defining future proficiency in complex multi-sensory learning processes.

Materials and methods

Key resources table
Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information
Strain, strain background (Taeniopygia guttata, Male) | | Other | | Local breeding program
Chemical compound, drug | isoflurane | Abbott, Illinois, USA | |
Software, algorithm | Statistical Parametric Mapping | SPM | RRID:SCR_007037 |
Software, algorithm | AMIRA | Amira | RRID:SCR_007353 |
Software, algorithm | Matlab | Matlab | RRID:SCR_001622 |
Software, algorithm | Sound Analysis Pro | SAP | RRID:SCR_016003 |

Animals and ethics statement

Male (n = 16) and female (n = 19) zebra finches (Taeniopygia guttata; only males sing) were bred in the local animal facility and housed in individual cages together with an adult male (tutor), an adult female and one or two other juvenile zebra finches. At around 10 days post hatching (dph), the juvenile birds were randomly assigned to an adult couple, so that some birds were co-housed with their biological parents while others were raised by foster parents (see Supplementary file 12). Each cage was shielded from its neighbouring cages so that the juvenile birds could hear all other birds in the room (6–12 other tutors and many other juveniles) but could interact (visually and acoustically) with only one adult male. Research has shown that juvenile birds prefer to copy the song of the adult male with whom they can interact (Eales, 1989). Ambient room temperature and humidity were controlled, the light-dark cycle was kept constant at 12 h:12 h, and food and water were available ad libitum at all times. In addition, egg food was provided from the initiation of the breeding program until the juvenile birds reached the age of 30 dph. The Committee on Animal Care and Use at the University of Antwerp (Belgium) approved all experimental procedures (permit numbers 2012–43 and 2016–05), and all efforts were made to minimise animal suffering.

Data statement

All data acquired and processed in this study are available online (DOI https://doi.org/10.5061/dryad.mkkwh70vj).

Study design

We obtained MRI data of each bird during the sensory phase (20 and 30 dph), the sensorimotor phase (40 and 65 dph), the crystallisation phase (90 and 120 dph) and at one last time point well beyond the critical period for song learning (200 dph; Figure 1A). At each imaging session, we collected a 3D anatomical scan and Diffusion Tensor Imaging (DTI) data to evaluate gross neuro-anatomy (volumetric analyses) and alterations to white matter tracts or intrinsic tissue properties, respectively. Starting from the advanced sensorimotor phase (i.e. 65 dph), we recorded the songs of the juvenile males on the first day after each imaging session.

Song recordings and analyses

To quantitatively evaluate the progression of sensorimotor learning and song refinement in male birds, we analysed the first 20 (undirected) songs sung in the morning after ‘lights on’ in Sound Analysis Pro (SAP; Tchernichovski et al., 2000; http://soundanalysispro.com/). The undirected songs of the juvenile and adult male zebra finches and of the tutors were recorded in custom-built sound attenuation chambers equipped with the automated song detection setup implemented in SAP. All song analyses were performed off-line, and calls and introductory notes were omitted from all analyses. First, the motif length (ms) of the first 20 songs sung in the morning (starting from the onset of the photophase) was measured, after which each individual motif was manually segmented into its syllables based on sharp changes in amplitude and frequency. The latter criterion was chosen to avoid inconsistent determination of the syllable ending caused by quieter singing towards the end of the syllable. Second, several acoustic features reflecting the spectro-temporal structure of individual syllables were quantified: (1) pitch-related measures that inform on the perceived tone of sounds (including pitch, mean frequency, peak frequency and goodness of pitch), (2) Wiener entropy, which quantifies the tonality of sounds and is expressed on a logarithmic scale where white noise approaches ‘0’ and pure tones are characterised by large negative values, and (3) syllable and inter-syllable interval duration. Furthermore, to evaluate syllable feature variability across ages, the standard deviation of each acoustic property was computed as an estimate of vocal variability (Scharff and Nottebohm, 1991).
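Wiener entropy, as described above, can be sketched as the logarithm of the ratio between the geometric and arithmetic means of a power spectrum: 0 for a perfectly flat spectrum and increasingly negative as power concentrates in a few frequency bins. This pure-Python sketch assumes a precomputed power spectrum for a single time frame and uses the natural logarithm; SAP's own implementation (spectral estimation, log base, averaging across frames) is not reproduced here, and the function name is hypothetical.

```python
import math


def wiener_entropy(power_spectrum, eps=1e-12):
    """log(geometric mean / arithmetic mean) of one frame's power spectrum.

    Natural log assumed. 0 for a perfectly flat (white-noise-like) spectrum,
    increasingly negative as power concentrates in few bins (tonal sounds).
    """
    p = [max(v, eps) for v in power_spectrum]       # avoid log(0) for empty bins
    log_geometric_mean = sum(math.log(v) for v in p) / len(p)
    arithmetic_mean = sum(p) / len(p)
    return log_geometric_mean - math.log(arithmetic_mean)


flat = [1.0] * 8                  # noise-like spectrum
tonal = [100.0] + [0.01] * 7      # power concentrated in one bin
print(wiener_entropy(flat), wiener_entropy(tonal))
```

By the inequality of arithmetic and geometric means, the value can never be positive, which is why white noise sits at the top of the scale.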
Next, similarity to the tutor song was measured between song motifs using an automated procedure in SAP that quantifies the acoustic similarity between two songs based on pitch, FM, AM, goodness of pitch and Wiener entropy (Tchernichovski et al., 2000). Song similarity was calculated using the default settings of SAP (asymmetric comparisons of mean values, minimum duration 10 ms, 10 × 10 comparisons), and % similarity was used for statistical testing. Further, motif sequence stereotypy was computed according to the method conceptualised by Scharff and Nottebohm, based on visual assessment of sequence consistency and linearity (Scharff and Nottebohm, 1991). Sequence linearity reflects how consistently notes are ordered within the song motif, by counting the different transition types of each syllable of the motif. Sequence consistency quantifies how often a particular syllable sequence occurs across different renditions of a specific motif. Song sequence stereotypy is defined as the average of sequence linearity and sequence consistency.
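The sequence measures can also be computed from labelled syllable sequences. The sketch below assumes the formulas commonly used with this method: linearity = number of different syllables / number of distinct transition types, and consistency = number of typical transitions (the most frequent transition from each syllable) / total number of transitions. The function name, the string input format and the appended end marker are assumptions of this sketch; the original procedure relies on visual assessment rather than code.

```python
from collections import Counter, defaultdict


def sequence_stereotypy(renditions, end="$"):
    """Return (linearity, consistency, stereotypy) for labelled song renditions.

    renditions: strings of syllable labels, one per rendition. An end marker is
    appended so that the last syllable also has an outgoing transition (an
    assumption of this sketch).
    """
    transitions = defaultdict(Counter)   # syllable -> counts of the element that follows it
    for rendition in renditions:
        seq = list(rendition) + [end]
        for current, nxt in zip(seq, seq[1:]):
            transitions[current][nxt] += 1

    n_syllables = len(transitions)                                  # distinct syllables
    n_types = sum(len(c) for c in transitions.values())            # distinct transition types
    total = sum(sum(c.values()) for c in transitions.values())     # all transitions
    typical = sum(max(c.values()) for c in transitions.values())   # most frequent per syllable

    linearity = n_syllables / n_types
    consistency = typical / total
    return linearity, consistency, (linearity + consistency) / 2.0


print(sequence_stereotypy(["abcd", "abcd"]))   # perfectly stereotyped motif
print(sequence_stereotypy(["abcd", "abd"]))    # one rendition skips syllable "c"
```

A fully stereotyped motif scores 1 on both measures; every skipped, repeated or reordered syllable adds transition types (lowering linearity) and atypical transitions (lowering consistency).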

MRI data acquisition

All MRI data were acquired on a 7 T horizontal MR system (PharmaScan, 70/16 US, Bruker BioSpin GmbH, Germany) with a gradient insert (maximal strength: 400 mT/m; Bruker BioSpin, Germany), combined with a quadrature transmit volume coil and a linear array receive coil designed for mice, following a previously described protocol (Hamaide et al., 2017). First, the zebra finches were anaesthetised with isoflurane (IsoFlo, Abbott, IL; induction: 2.0–2.5%; maintenance: 1.4–1.6%). While the birds were anaesthetised, their physiological condition was monitored closely by means of a pressure-sensitive pad placed under the chest to detect the breathing rate, and a cloacal thermistor probe connected to a warm-air feedback system that kept body temperature within a narrow physiological range (40.0 ± 0.2 °C; MR-compatible Small Animal Monitoring and Gating system, SA Instruments, Inc). Next, we collected DTI data using a four-shot spin echo (SE) echo planar imaging pulse sequence with the following parameters: TE 22 ms, TR 7000 ms, FOV (20 × 15) mm², acquisition matrix (105 × 79), in-plane resolution (0.19 × 0.19) mm², slice thickness 0.24 mm, b-value 670 s/mm², diffusion gradient duration (δ) 4 ms, diffusion gradient separation (Δ) 12 ms, 60 unique non-collinear diffusion gradient directions and 21 non-diffusion-weighted (b0) volumes. The entire DTI protocol was repeated twice to increase the signal-to-noise ratio (SNR) (total DTI scanning duration: 72 min). The field-of-view included the telencephalon and diencephalon, which contain the auditory system and brain areas implicated in song control, the cerebellum and parts of the mesencephalon. Last, we collected a T2-weighted 3D Rapid Acquisition with Relaxation Enhancement (RARE) dataset with these settings: TE 55 ms, TR 2500 ms, RARE factor 8, FOV (18 × 16 × 10) mm³, matrix (256 × 92 × 64) zero-filled to (256 × 228 × 142), spatial resolution (0.07 × 0.17 × 0.16) mm³ zero-filled to (0.07 × 0.07 × 0.07) mm³, scan duration 29 min.
The FOV of the 3D scan encapsulated the entire zebra finch brain. The entire scanning protocol took no longer than 2.5 hr. All animals recovered uneventfully after discontinuation of the anaesthesia.
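As a consistency check on the DTI parameters, the Stejskal–Tanner relation for ideal rectangular gradient pulses, b = γ²G²δ²(Δ − δ/3), can be solved for the gradient amplitude G. The sketch below plugs in δ = 4 ms, Δ = 12 ms and b = 670 s/mm² from the protocol; it ignores gradient ramps and imaging-gradient cross-terms, so it is only an idealised estimate, which should fall below the 400 mT/m maximum of the gradient insert. The function name is hypothetical.

```python
import math

GAMMA_H = 2.675e8  # proton gyromagnetic ratio (rad s^-1 T^-1)


def gradient_for_b(b_s_per_mm2, delta_s, big_delta_s):
    """Gradient amplitude G (T/m) giving the requested b-value.

    Stejskal-Tanner relation for ideal rectangular pulses:
    b = gamma^2 * G^2 * delta^2 * (Delta - delta / 3).
    Ramps and imaging-gradient contributions are ignored.
    """
    b_si = b_s_per_mm2 * 1e6  # s/mm^2 -> s/m^2
    return math.sqrt(b_si / (GAMMA_H ** 2 * delta_s ** 2 * (big_delta_s - delta_s / 3.0)))


g = gradient_for_b(670, 4e-3, 12e-3)
print(f"required G = {g * 1e3:.0f} mT/m (insert maximum: 400 mT/m)")
```

Solving for G rather than b makes the comparison with the hardware limit direct; the short δ and Δ used here are only feasible because of the strong gradient insert.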

MRI data processing

We processed both DTI and 3D RARE scans for voxel-based analyses following in-house established protocols (Hamaide et al., 2017; Hamaide et al., 2018a; Hamaide et al., 2018b), using the following software packages: Amira (v5.4.0, FEI; https://www.fei.com/software/amira-3d-for-life-sciences/), ANTs (Advanced Normalization Tools; Avants et al., 2011; http://stnava.github.io/ANTs/), FSL (FMRIB Software Library; Jenkinson et al., 2012; https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FSL), and SPM12 (Statistical Parametric Mapping, version r6225, Wellcome Trust Centre for Neuroimaging, London, UK; http://www.fil.ion.ucl.ac.uk/spm/) equipped with the Diffusion II toolbox (https://sourceforge.net/projects/diffusion.spmtools.p/) and the DARTEL tools (Ashburner, 2007).

Deformation-based morphometry

First, we masked the individual 3D RARE scans (Amira 5.4.0) so that the datasets only included brain tissue. Second, we used the serial longitudinal registration (SLR) toolbox embedded in SPM12 to create one average ‘within-subject’ 3D dataset for each animal based on the individual masked 3D RARE scans acquired at the different ages (Ashburner and Ridgway, 2012). The SLR generated an average ‘within-subject’ 3D image (‘midpoint average’) and Jacobian determinant (j) maps. The latter maps are derived from the deformation field that contains the spatial transformation characterising the warp between the midpoint average and each individual time-point image, and encode for each voxel the relative volume at a specific age with respect to the midpoint average. Next, the midpoint averages of all animals were input to the ‘buildtemplateparallel’ function of the Advanced Normalization Tools (ANTs; Avants et al., 2011) to create a between-subject population-based template. We used this initial ‘between-subject’ template to extract tissue probability maps reflecting mainly grey matter, white matter or cerebrospinal fluid using the FMRIB Automated Segmentation Tool (FAST; Zhang et al., 2001) embedded in FSL. The three probability maps created in this step were used as tissue class priors for segmenting all individual midpoint averages using the default settings of the (old)segment batch in SPM12. The resulting tissue segments of the midpoint averages were used to create a segment-based template in Diffeomorphic Anatomical Registration through Exponentiated Lie Algebra (DARTEL; Ashburner, 2007). Next, the ANTs-based T2-weighted between-subject template was warped via the ‘DARTEL: existing template’ batch to spatially match the segment-based template, and this template was used as the anatomical reference space for all voxel-based analyses (referred to as the ‘population-based template’).
The Jacobian determinant maps were warped to the reference space using the flow fields produced by DARTEL, with modulation to preserve the relative volume differences between subjects. Lastly, the warped, modulated Jacobian determinant maps were log-transformed to convert exponential growth patterns into linear ones (Ashburner and Ridgway, 2015) and smoothed using a Gaussian kernel with an FWHM of 0.14 × 0.14 × 0.14 mm.
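
A minimal pure-Python sketch of why the log-transform is applied. It works on a plain list of hypothetical Jacobian values rather than on images, and the function name `log_jacobian` is illustrative, not part of SPM12. A Jacobian j > 1 marks local expansion and j < 1 marks contraction relative to the midpoint average. Taking log(j) makes a doubling and a halving symmetric around 0 and turns growth by a constant factor per time point into equal additive steps.

```python
import math


def log_jacobian(j_values):
    """Log-transform relative local volumes (Jacobian determinants).

    j > 1: local expansion; j < 1: local contraction (relative to the
    midpoint average). Jacobians of a valid deformation are positive.
    """
    if any(j <= 0 for j in j_values):
        raise ValueError("Jacobian determinants must be positive")
    return [math.log(j) for j in j_values]


# Doubling and halving of a voxel's volume map to values of equal
# magnitude and opposite sign.
expansion, contraction = log_jacobian([2.0, 0.5])

# Hypothetical exponential growth by 10% per time point becomes a linear
# trend after the log-transform: successive differences are constant.
growth = [1.1 ** t for t in range(4)]
logs = log_jacobian(growth)
steps = [b - a for a, b in zip(logs, logs[1:])]
print(expansion, contraction, steps)
```

A linear trend on the log scale can then be modelled with the linear statistics used in SPM.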

Diffusion tensor imaging

The DTI data were pre-processed in the Diffusion II toolbox embedded in SPM12. First, we realigned the DTI volumes to correct for subject motion using a two-step procedure: an initial estimation based exclusively on the b0 images was followed by a linear registration including all (b0 and diffusion-weighted) volumes. Second, we co-registered the realigned DTI volumes to the individual 3D dataset acquired at the same age, using normalised mutual information as the objective function for inter-modal within-subject registration. In parallel, each individual masked 3D RARE dataset was bias-corrected and spatially normalised to the population-based template using a 12-parameter global affine transformation followed by nonlinear deformations. These estimated normalisation parameters were then applied to the co-registered DTI volumes so that the DTI data spatially matched the population-based template. During this writing step, the diffusion data were upsampled to an isotropic resolution of 0.19 mm. In addition, the diffusion vectors were adapted to account for (linear) rotations incurred by the realignment, co-registration and normalisation procedures, using the 'copy and reorient diffusion information' tool of the Diffusion II toolbox. Then, the diffusion tensor model was fitted to the diffusion-weighted and b0 data to estimate the diffusion tensor and its eigenvalues (λ1, λ2, λ3). The eigenvalues describe the diffusivities along the three principal axes of the 3D diffusion ellipsoid. Fractional Anisotropy (FA) maps were computed from these eigenvalues. FA ranges from 0 to 1, where 0 indicates isotropic diffusion and 1 maximally anisotropic diffusion. High FA values are typically expected in white matter regions that contain many coherently organised myelinated fibre tracts. The FA maps of male zebra finches at 200 dph were averaged using the SPM image calculator to create an average FA map (e.g. third and sixth panels in Figure 1E).
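
The text does not spell out how FA is obtained from the eigenvalues. The sketch below uses the commonly used definition of fractional anisotropy to make the 0-to-1 scale concrete. It is a per-voxel illustration with a hypothetical function name, not the Diffusion II toolbox code, and the example eigenvalues are made up.

```python
import math


def fractional_anisotropy(l1, l2, l3):
    """Fractional anisotropy from the three diffusion tensor eigenvalues.

    FA = sqrt(1/2) * sqrt((l1-l2)^2 + (l2-l3)^2 + (l3-l1)^2)
                   / sqrt(l1^2 + l2^2 + l3^2)
    """
    numerator = (l1 - l2) ** 2 + (l2 - l3) ** 2 + (l3 - l1) ** 2
    denominator = l1 ** 2 + l2 ** 2 + l3 ** 2
    if denominator == 0:
        raise ValueError("at least one eigenvalue must be non-zero")
    return math.sqrt(0.5 * numerator / denominator)


# Equal diffusivity in all directions (isotropic): FA = 0.
print(fractional_anisotropy(1.0, 1.0, 1.0))
# Diffusion along a single axis only: FA = 1.
print(fractional_anisotropy(1.0, 0.0, 0.0))
# Hypothetical elongated tensor, as in a coherent fibre tract (mm^2/s).
print(fractional_anisotropy(1.7e-3, 0.3e-3, 0.3e-3))
```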

Several quality controls were performed throughout data acquisition and processing. These included visual inspection for ghosting, excessive movement, and spatial correspondence of registered images to the reference space (after co-registration and spatial normalisation). Finally, the DTI parameter maps were smoothed using a Gaussian kernel with an FWHM of 0.38 × 0.38 × 0.38 mm.
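
Smoothing kernels in SPM are specified by their full width at half maximum (FWHM). For a Gaussian, FWHM = 2√(2 ln 2)·σ ≈ 2.355σ. The short sketch below converts the kernel widths used in this study (0.14 mm for the log-Jacobian maps, 0.38 mm for the DTI maps) into standard deviations. The helper name is illustrative only.

```python
import math

# For a Gaussian: FWHM = 2 * sqrt(2 * ln 2) * sigma (about 2.3548 * sigma).
FWHM_PER_SIGMA = 2.0 * math.sqrt(2.0 * math.log(2.0))


def fwhm_to_sigma(fwhm_mm):
    """Convert a Gaussian kernel FWHM (mm) into its standard deviation (mm)."""
    if fwhm_mm <= 0:
        raise ValueError("FWHM must be positive")
    return fwhm_mm / FWHM_PER_SIGMA


for label, fwhm in [("log Jacobian maps", 0.14), ("DTI parameter maps", 0.38)]:
    print(f"{label}: FWHM {fwhm} mm -> sigma {fwhm_to_sigma(fwhm):.4f} mm")
```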

Statistical analyses: song

To analyse whether the song parameters changed from 65 to 200 dph, we set up mixed-effect models including age as a fixed effect, subject as a random effect, a random slope for age within subject (subject*age) and, for syllable-level parameters only, syllable identity as a random effect nested within subject. Furthermore, to determine whether song % similarity depended on the tutor by which the birds were raised, we ran a mixed-effect model with 'tutor identity' as a fixed effect and 'bird identity' as a random effect. We used the restricted maximum likelihood method to fit the data and assessed significance using F-tests with the Kenward-Roger approximation. When a significant main effect was observed for a song feature, Tukey's HSD (Honest Significant Difference) post hoc tests were performed to determine between which ages the changes occurred.

Statistical analyses: MRI

Even though the MRI data were thoroughly checked at several stages of pre-processing, we performed an additional quality control based on outlier detection. This analysis identified four outliers. Upon visual inspection, two of these were driven by excessively large ventricles at the 20 dph time point, which appeared normal thereafter. The other two animals showed abnormal cerebellar folding patterns (leading to suboptimal image registration) and were therefore excluded from all analyses. In addition, one animal had a missing DTI scan, leaving 30 zebra finches (14 males and 16 females) for voxel-based statistical testing of interactions and main effects on the DTI parameter maps. Furthermore, technical issues with the recordings of one tutor prevented us from quantifying the similarity to tutor song of two juvenile birds (one being the MRI-based outlier). Other technical issues led to the loss of the 90 dph song recordings of two juvenile birds. This left 54 data points for the voxel-based statistical correlation analyses (12 juveniles with 4 time points and 2 juveniles with 3 time points).

We used a two-step approach to analyse the MRI datasets. First, instead of deciding a priori where to look by manually drawing ROIs (for example, of the song control and auditory nuclei), we used data-driven image analysis techniques capable of localising the specific brain sites where a brain-behaviour relationship exists. To this end, we performed brain-wide voxel-based statistical analyses to identify (1) brain regions where sex differences in local tissue properties emerge during song learning, (2) brain sites that mature between 20 and 200 dph and (3) brain sites that show a significant relationship between song performance (similarity) and the structural properties of the brain (DTI or local volume). Second, after establishing where in the brain these relationships exist, we aimed to better understand the nature of the voxel-based correlations. We therefore extracted the average DTI or DBM parameter value of each cluster ('cluster-based ROIs') and used these values to perform repeated-measures correlation analyses (rmcorr) and to visualise the nature of the correlations. The outcome of the DBM analysis was published in Hamaide et al. (2018a). In this study, the male DBM data were correlated with the song outcome of the same birds.

Voxel-based repeated-measures ANOVA: analysis of interactions (age*sex) and main effects

All voxel-wise statistical tests were executed in SPM12. A repeated-measures ANOVA was performed on the smoothed DTI parameter maps, including 'subject' as a random factor and 'sex' and 'age' as fixed factors. This design allowed testing of within-subject effects, including interactions (age*sex) and main effects (age). Unless stated otherwise, only clusters that survived a family-wise error (FWE) correction thresholded at pFWE <0.05, combined with a minimal cluster size (kE) of 5 voxels for DTI analyses, were considered significant. All statistical maps are displayed overlaid onto the population-based template.

A similar analysis, performed on the smoothed modulated Jacobian determinant maps, was published previously (Hamaide et al., 2018a). We refer the reader to that paper for a comprehensive overview of the results.

Voxel-based statistical correlation analyses between structural MRI and song parameters

To explore potential relationships between robust measures of song performance and the structural properties of the songbird brain, we set up voxel-based multiple regressions between the smoothed MRI parameter maps (log mwj and FA maps) and % similarity or sequence stereotypy. This protocol is based on Hamaide et al. (2018b) and Orije et al. (2020). When testing for correlations between local tissue volume and the song features, total brain volume was added to the statistical design as an additional covariate. All statistical analyses were performed on the entire brain (brain tissue within the FOV), and no thresholds were applied to the smoothed MRI parameter maps. Unless stated otherwise, we used two criteria to assess the significance of a cluster: (1) the cluster should contain at least 5 (DTI) or 20 (3D RARE) contiguous voxels (the number of contiguous voxels is denoted kE), and (2) the 'peak voxel' (based on T values) of the cluster should survive a family-wise error (FWE) correction for multiple comparisons thresholded at pFWE <0.05 (Roiser et al., 2016). Only clusters satisfying both criteria were considered significant. These minimum cluster sizes correspond to volumes of 0.04332 mm³ for DTI (5 voxels) and 0.03808 mm³ for 3D RARE (20 voxels). Based on Nixdorf-Bergweiler (1996, Journal of Comparative Neurology), these cluster-size thresholds are small enough to detect differences even in the smallest song control nuclei. Furthermore, using the same acquisition protocols (identical voxel sizes), we have previously detected structural differences between groups or over time in these small areas (DTI: Hamaide et al., 2017; DTI and 3D RARE: Hamaide et al., 2018b; 3D RARE: Hamaide et al., 2018a).
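
The arithmetic below relates the reported cluster sizes to physical size, a question also raised during review. It is a sketch only. It derives the per-voxel volume from the numbers stated in the text rather than from acquisition parameters, and it expresses each cluster as the diameter of a sphere of equal volume. That is an idealisation, because real clusters are not spherical. The same per-voxel volumes reproduce the 40-voxel volumes reported for the cluster-based ROIs (0.34656 mm³ for DTI and 0.07616 mm³ for 3D RARE).

```python
import math

# Minimum cluster sizes and volumes as reported in the text: (voxels, mm^3).
REPORTED = {
    "DTI": (5, 0.04332),
    "3D RARE": (20, 0.03808),
}


def voxel_volume(n_voxels, cluster_volume_mm3):
    """Per-voxel volume implied by a cluster of n_voxels."""
    return cluster_volume_mm3 / n_voxels


def equivalent_sphere_diameter(volume_mm3):
    """Diameter d of a sphere with volume V, from V = (pi / 6) * d^3."""
    return (6.0 * volume_mm3 / math.pi) ** (1.0 / 3.0)


for modality, (n, vol) in REPORTED.items():
    v = voxel_volume(n, vol)
    print(f"{modality}: {v:.6f} mm^3 per voxel; "
          f"40 voxels = {40 * v:.5f} mm^3; "
          f"equivalent sphere diameter of {n} voxels = "
          f"{equivalent_sphere_diameter(vol):.3f} mm")
```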
To address concerns that the voxel-based correlation analyses included repeated measures from the same individuals, we reanalysed the data with the more stringent Sandwich Estimator (SwE) toolbox (http://www.nisox.org/Software/SwE/) and with a classic ROI-based analysis, as outlined in the 'response to reviewers' (pp. 3–10). The same regions were detected, but they would not be considered significant under the cluster-significance criteria outlined above (Roiser et al., 2016). The voxel-based analysis was exploratory, and its results were validated by the subsequent cluster-based ROI correlation analyses. We therefore retained the initial outcome, in line with the analyses of our earlier studies (Hamaide et al., 2018b; Orije et al., 2020).

In contrast with song similarity, no supra-threshold voxels were observed for any combination of smoothed MRI parameter maps and song sequence stereotypy. Clusters detected by the voxel-based multiple regression analysis and the voxel-based repeated-measures ANOVA were converted to ROIs (termed 'cluster-based ROIs'; conversion at puncorrected <0.001, kE ≥40 voxels). From these ROIs, the mean DTI metrics or modulated log-transformed Jacobian determinants were extracted for post hoc statistical testing. Extracting the previously identified clusters at this more liberal threshold makes them slightly larger: a cluster of 40 voxels corresponds to 0.34656 mm³ for DTI and 0.07616 mm³ for 3D RARE. This cluster-based ROI approach is identical to that used in our other studies (e.g. Hamaide et al., 2018a; Hamaide et al., 2018b; Anckaerts et al., 2019).

ROI-based analyses

The voxel-wise multiple regression in SPM does not allow a random effect for bird identity to be included. Consequently, by entering repeated-measures data we violate the assumption of independent observations. To account for this potential confound, we performed two additional tests on the cluster-based ROI data derived from the voxel-based multiple regression analysis and the voxel-based analysis of the interaction (FA: age*sex). Additionally, we delineated the song control nuclei HVC, LMAN, RA and Area X. From these nuclei we extracted the mean DTI metrics to examine whether song similarity correlates with tissue properties within the song control system. First, we performed a repeated-measures correlation analysis (Bakdash and Marusich, 2017) in RStudio (version 1.1.383, RStudio, Boston, MA; http://www.rstudio.com/). This test assesses whether the two variables are consistently correlated within subjects, and it therefore informs on potential song learning-related structural changes in the brain. Second, Spearman's ρ was calculated on data obtained at a single age (65 dph or 200 dph) to test whether between-bird variance drives the correlations observed in the voxel-based analyses. An additional correlation analysis was run between the FA values at 20 dph and song similarity at 200 dph, to determine whether microstructural tissue properties early in life already relate to song learning proficiency later in life. Spearman's ρ was computed in JMP (Version 13, SAS Institute Inc, Cary, NC, 1989–2007).
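
The repeated-measures correlation fits one common within-subject slope while allowing each bird its own intercept. The pure-Python sketch below shows how this removes between-bird differences. It is not the rmcorr R package used here: it computes only the coefficient and its degrees of freedom (no p-value), and the function name and data are hypothetical. Centring x and y on each subject's own means and then taking the Pearson correlation of the centred values gives the same r as the common-slope model.

```python
import math
from collections import defaultdict


def repeated_measures_corr(subjects, x, y):
    """Repeated-measures correlation (common within-subject slope).

    Returns (r_rm, df) with df = N - k - 1 (N observations, k subjects).
    """
    if not (len(subjects) == len(x) == len(y)):
        raise ValueError("inputs must have equal length")
    groups = defaultdict(list)
    for s, xi, yi in zip(subjects, x, y):
        groups[s].append((xi, yi))
    sxy = sxx = syy = 0.0
    for pairs in groups.values():
        # Centre on this subject's own means (removes between-subject offsets).
        mx = sum(p[0] for p in pairs) / len(pairs)
        my = sum(p[1] for p in pairs) / len(pairs)
        for xi, yi in pairs:
            dx, dy = xi - mx, yi - my
            sxy += dx * dy
            sxx += dx * dx
            syy += dy * dy
    if sxx == 0 or syy == 0:
        raise ValueError("no within-subject variation in x or y")
    r_rm = sxy / math.sqrt(sxx * syy)
    df = len(x) - len(groups) - 1
    return r_rm, df


# Two hypothetical birds whose values rise together within each bird, while
# bird B sits lower on y overall (a pooled correlation would be negative).
r, df = repeated_measures_corr(
    ["A"] * 3 + ["B"] * 3, [1, 2, 3, 4, 5, 6], [10, 11, 12, 0, 1, 2]
)
print(r, df)
```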

To explore whether future song learning outcome (good or bad) can be predicted, we ran mixed-effect model analyses on the cluster-based ROI parameters extracted from MRI data obtained at 20, 30 and 40 dph. Bird identity was included as a random factor, and age (20, 30, 40 dph) and learning outcome (good, bad) as fixed effects. The restricted maximum likelihood method was used to fit the data, and significance was assessed using F-tests with the Kenward-Roger approximation.
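
One way to write the model described above is shown below. It assumes that age is treated as a categorical factor, that bird identity enters as a Gaussian random intercept, and that there is no age × outcome interaction term, because the text lists only the two main effects. Here y_ij is the mean ROI parameter of bird i at scan j, a(j) ∈ {20, 30, 40 dph} and g(i) ∈ {good, bad}. On this reading, the predictive question is whether the outcome effect γ_good − γ_bad differs from zero.

```latex
y_{ij} = \mu + \alpha_{a(j)} + \gamma_{g(i)} + b_i + \varepsilon_{ij},
\qquad b_i \sim \mathcal{N}(0, \sigma_b^2),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```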

All additional tests on the cluster-based ROIs were corrected for multiple comparisons by controlling the false discovery rate (FDR) with the Benjamini-Hochberg procedure (Benjamini and Hochberg, 1995). In brief, this method ranks all p-values from smallest (rank i = 1) to largest (rank i = m) and computes, for each rank, the Benjamini-Hochberg critical value (i/m)Q. Here i is the rank, m is the total number of tests and Q is the false discovery rate, set at 0.05. The largest p-value for which p < (i/m)Q, and all smaller p-values, were considered significant. Benjamini-Hochberg is often preferred over family-wise corrections such as Bonferroni because it limits the expected proportion of false positives among significant results while producing fewer false negatives (Jafari and Ansari-Pour, 2019).
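
A minimal sketch of the Benjamini-Hochberg step-up rule as described above. It uses the strict inequality p < (i/m)Q given in the text; textbook statements often use ≤ instead. It is an illustration with a hypothetical function name, not the code used for this study.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return booleans marking tests significant under BH FDR control at q.

    Results are returned in the original order of p_values.
    """
    m = len(p_values)
    # Indices sorted by p-value; position in this list gives rank i = 1..m.
    order = sorted(range(m), key=lambda k: p_values[k])
    largest_rank = 0
    for rank, k in enumerate(order, start=1):
        if p_values[k] < rank / m * q:  # strict inequality, as in the text
            largest_rank = rank
    # Step-up: every p-value ranked at or below the largest passing rank is
    # significant, even if it failed its own critical value.
    significant = [False] * m
    for rank, k in enumerate(order, start=1):
        if rank <= largest_rank:
            significant[k] = True
    return significant


# 0.04 exceeds its own critical value (2/3 * 0.05), but 0.045 passes at
# rank 3, so all three tests are declared significant.
print(benjamini_hochberg([0.001, 0.04, 0.045]))
```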

The outcomes of all statistical tests performed in this study are summarised in the Supplementary file 1, 411.

Data availability

All figures are provided with the relevant source data. All data acquired and processed in this study are available online (DOI https://doi.org/10.5061/dryad.mkkwh70vj).

The following data sets were generated
    1. Hamaide J
    2. Orije J
    3. Lukacova K
    4. Keliris GA
    5. Verhoye M
    6. Van der Linden A
    (2020) Dryad Digital Repository
    Data from: In vivo assessment of the neural substrate linked with vocal imitation accuracy.
    https://doi.org/10.5061/dryad.mkkwh70vj

References

    1. Ashburner J, Ridgway GR (2015) Tensor-Based Morphometry. In: Toga AW, editor. Brain Mapping. Waltham: Academic Press. pp. 383–394.
    2. Hoese WJ, Podos J, Boetticher NC, Nowicki S (2000) Vocal tract function in birdsong production: experimental manipulation of beak movements. The Journal of Experimental Biology 203:1845–1855.
    3. Immelmann K (1969) Song development in the zebra finch and other estrildid finches. In: Hinde RA, editor. Bird Vocalizations. Cambridge: Cambridge University Press. pp. 61–74.
    4. Nixdorf-Bergweiler BE, Bischof H-J (2007) A stereotaxic atlas of the brain of the zebra finch, Taeniopygia guttata, with special emphasis on telencephalic visual and song system nuclei in transverse and sagittal sections. Bethesda (MD): National Library of Medicine (US), National Center for Biotechnology Information.

Decision letter

  1. Tom Smulders
    Reviewing Editor; Newcastle University, United Kingdom
  2. Barbara G Shinn-Cunningham
    Senior Editor; Carnegie Mellon University, United States

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

This is an intriguing paper that identifies novel brain areas involved in influencing the copying accuracy of birdsong. The paper identifies areas not previously linked to song learning, hence opening up new future investigations. Together with the extensive response to reviewers, this paper is an extensive resource for the birdsong community.

Decision letter after peer review:

Thank you for submitting your article "In vivo assessment of the neural substrate linked with vocal imitation accuracy" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by a guest Reviewing Editor and Barbara Shinn-Cunningham as the Senior Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

This study represents a very interesting new approach in identifying brain areas involved in accurate copying of tutor songs by male zebra finches. Using a whole-brain data-driven structural MRI approach, the authors identify a number of brain areas that either change alongside the improvement in copying as the birds mature, or structurally predict how well a bird will copy the tutor's song on an inter-individual level. Interestingly, none of these structures are in the traditionally-identified song control circuitry (although many are in areas related to auditory processing).

All three reviewers thought this was an interesting paper, but all three would like to see some clarifications of particular points and/or some re-analyses to drive home the message even more strongly. Below, I list the revisions that are required to improve the paper for publication in eLife.

Essential revisions:

Most of the essential revisions relate to the way the data analysis was performed and/or how the data were presented and discussed. No additional experiments are required.

1) A bit more detail on the birds and their experiences would be welcome. Clearly, some of the birds were exposed to the same tutor. But were some of the juveniles the offspring of the same parents? It would be helpful to know to what extent this information could be used to control for innate learning biases. Also, we are not requesting additional experiments, but it would have helped interpret the early left NCM FA values and learning outcomes if the juveniles had not been exposed to a tutor until after 30 days, to separate effects of the tutor song from innate properties of the birds.

2) MRI analysis 1 – Repeated measures: The authors used a two-step approach: first they ignore the fact that some of their measures were coming from the same individuals to perform their statistical analyses at the voxel level, and then, based on these results, used a ROI-based approach to take into account the repeated measures. We think that this approach is flawed because the selection of the voxels to determine the ROI is inaccurate (since each data point was considered to correspond to one subject). The result section should not present results where the repeated measure aspect of the dataset is not taken into account (first section of Results). We are aware that SPM does not currently allow analysing longitudinal datasets where the number of measures is not the same for all the subjects (as this is the case here). It seems that the authors have two options: (1) either they discard the 2 subjects for which they only have 3 data points and use a within-subject design for balanced designs in SPM; (2) or, even better, keeping their 14 subjects, they use the SwE toolbox (http://www.nisox.org/Software/SwE/) that seems to be able to handle unbalanced longitudinal datasets. Mean centering should allow the distinguishing between within- and between-subject effects (cf Guillaume, Hua et al., 2014, NeuroImage, 94).

3) MRI analysis 2 – explaining for non-experts: The manuscript falls short for a general audience in detailing how various structures were identified and assigned significance. One issue relates to the requirement for clusters > 40 contiguous voxels. What is the diameter of a sphere containing that many voxels? And does this volume threshold exclude smaller song nuclei, such as HVC, LMAN, DLM, or Avalanche? Finally, it would help a general reader to report the scale of FA, rather than just reporting absolute values.

4) MRI analysis 3 – correcting for multiple comparisons: The authors need to choose how they want to correct for multiple comparisons in their voxel-based approach (voxel-wise or cluster-wise). If the authors choose a cluster-based approach, they should justify the first p value threshold used to obtain the clusters (recent published recommendations about how to choose these thresholds should be followed and mentioned). If they choose a voxel-wise approach, they should justify their minimum cluster size.

5) MRI analysis 4 – Positive controls: A potentially noteworthy feature of the current study is that the only significant anatomical changes were detected in regions outside of the classical song system. But numerous studies have shown that the structure of various song control nuclei (HVC, RA, Area X) changes markedly over the period in which these measurements were made (increasing in volume between 20 and 60 days, and increasing in myelination between 20 and 100 days, eg). Further, some early structural changes in the song system (spine density and dynamics in HVC) are correlated with copying outcome. I would be more confident in the current results if the authors could show that their method is sensitive to structural changes within the song system that are known to occur during development, even (or perhaps especially) if these changes are not correlated with song learning outcomes.

6) Relating MRI to song learning – age: Please clarify how age is controlled for or used in the analysis. Do the authors just go from the assumption (based on the data, maybe) that copying accuracy increases with age, and that the two variables are therefore inextricably confounded? Or is there a way to separate maturation (age) effects from changes related to copying accuracy? It will also make it easier for the reader to understand statements like "However, individual improvements in song learning resulted in a lower local volume of the CM (left: p=0.0126; right: p=0.0075; Figure 3D)", which now may be difficult for some readers to assess, because age is not visible in the figure.

7) Relating MRI to song learning – representing changes in copying accuracy; This study includes multiple measures, using Fractional Anisotropy in Figure 2, and local volume in Figure 3. In both figures we see correlations between song similarity and MRI measure, but we do not see the time course of song learning. Therefore, statistical claims such as correlation between improvements in song learning and a lower local volume of the CM cannot be judged visually from the data as currently presented. To address this, authors should present figures, similar to Figures 2 and 3 but instead of showing the similarity vs MRI, present the similarity gains vs. MRI. For example: for each bird, you present similarity (day 90) – similarity (day 65) vs. MRI on day 90. This will allow the reader to judge visually (and not only statistically) if any of the MRI measure correlates with learning. In addition, it would be nice to have a figure illustrating this statement (e.g., showing similarity gains vs. right NCM activation): "Surprisingly, a small cluster in the right NCM displayed, in addition, a significant repeated-measures correlation."

8) Relating MRI to song learning – dichotomizing copying accuracy: Regarding the correlation between MRI properties at days 20/40 dph and learning accuracy, the authors should justify why they dichotomised song accuracy (good vs. bad learners) rather than simply taking the % of song similarity. Why don't the authors test whether MRI properties at day 20 (or 30) allow predicting vocal learning accuracy at day 200 (expressed as % of song similarity)?

9) Discussion – mechanisms: The authors are careful to note the lack of explanatory power in these correlative measurements, which is good. But they need to say more about how they think these developmental changes might relate to learning. What are we supposed to make of the finding that FA changes over development when there is no link to what that value represents in the songbird's brain? Going back to an earlier point, it would help to see that this method can detect FA changes related to increased myelination of the song system, which is dramatic and presumably should generate a large signal. That said, the authors need to discuss in depth how such changes (decreased volume, increased FA) could be related to better learning.

10) Discussion – Novelty: Prior studies have shown correlations between NCM functional properties and song learning outcome, between CM functional properties and vocal error detection, and between VP and song copying. A strong feature of the current study is that it provides independent validation that these regions correlate with song copying, but given the earlier work, the current findings are not wholly novel even if they are useful contributions. On the other hand, the tFA result is entirely novel, but what this fiber tract is needs to be more fully described. The authors are quick to link it to the projections from the basorostral nucleus, but we are uncertain whether such a precise assignment can be made with these methods. Is this distinct from other fiber tracts in this general region, including parts of the occipitomesencephalic tract? Showing some conventional histology of the tFA in relation to the MRI data would be helpful here. And the VP finding is quite timely, given the recently emerging evidence of the role of this structure in song learning. Further, VP is the only region that showed significant correlations in both FA and volume with learning. Perhaps the manuscript should highlight the tFA and VP findings more strongly, while casting the NCM and CM data as more confirmatory in nature, to emphasize novelty.

[Editors' note: the decision after resubmission follows.]

Thank you for resubmitting your article "In vivo assessment of the neural substrate linked with vocal imitation accuracy" for consideration by eLife. Your revised article has been evaluated by a guest Reviewing Editor and Barbara Shinn-Cunningham as the Senior Editor.

We really appreciate the time you have put into writing long, thought-out responses to each of the reviewers' comments.

However, looking at the revised manuscript, it looks like very little has changed. The reason the reviewers make these constructive comments is not so that you can explain things to them, but so that you can change the manuscript in such a way that readers with similar questions to the reviewers would find their questions already answered in the manuscript. Thank you for the changes you have already made.

I am therefore requesting that you please incorporate the responses you have made to the reviewers into the revised manuscript. Once I receive this revised manuscript, I will send it out for a second review with regards to the technical side of the MRI protocols and analysis methods.

I will here summarize which changes still need to be made to the manuscript itself:

1) Thank you for adding the table to the Supplementary materials. Could you please also add a few sentences to the Discussion laying out what your data can and cannot distinguish between, and what the obvious next studies would be to work out those distinctions? You have done this in the rebuttal, so it should not be difficult to add a bit to the Discussion.

2) This is probably the most important change. Since you have run the analysis now in a more appropriate manner, we feel that you should replace the original analysis with the new analysis, not just add the new analysis as an addendum to the paper. If the new analysis changes the outcomes of the study, then the Results and Discussion should be changed accordingly.

3) Thanks for what you've already added. However, I think you misunderstood the main question asked by the (non-MRI specialist) reviewers here. They just wanted to know how big those clusters were in real life, and how this compares to known song structures. You have clearly done all the calculations for the rebuttal. Now please incorporate that information also into the manuscript.

4) Does adding "peak voxel" to that sentence clarify the reviewers' question? I am not expert in this area, so cannot judge this. I will assume for now that it does.

5) Please do add the additional song control structure data to the manuscript. Other readers, thinking the same as the reviewers, will appreciate that the method can detect changes over time, as should be the case in the song system, but these do not correlate with copying accuracy. You may also want to refer to your 2018 paper when discussing these extra data.

6) Every reader is going to wonder whether the correlation between changes in MRI signal (FA, etc) and song copying accuracy is just a side-effect of both changing with time. So you have to address this in the analysis. If it is, as you say, purely a question of brain behaviour correlations, and age does not mediate this relationship, then show that. If it turns out that age is a major mediator, and removes the correlation between brain and behaviour, then please discuss why the correlation does exist in some brain areas and not in others, which also change with age.

7) This point is related to point 6: by losing time, we don't know whether the main reason for the correlations is that both change over time in a similar way, or that the copying accuracy actually explains the "noise" in the trend over time. So it would be good if the authors could think of some way that allows readers to understand the distinction between parallel trends over time and (not-age-related) correlations between brain and behaviour.

8) Please do add Figure REB8 to the manuscript, wherever you see fit, and add reference to it in the Results.

9) Thank you for a good explanation. Please add some of it to the Discussion, so all readers can benefit from this insight.

10) Can you add something about the Hamaide et al., 2017, paper and how you have done your best to identify the tract (and VP) as best as possible in the Discussion? I am happy for you to keep the emphasis as is on the three main points (if consistent with the new analyses, see (2) above).

a) Please add this justification to the manuscript (in a much shorter form, of course)

b) If this is really relatively uncommon, maybe add another half sentence about why Ashburner and Ridgway recommend this.

Thank you for the rest of the changes you have already made.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Firstly, I would like to apologize for how long this has taken. The holidays got in the way of finding and selecting an extra MRI-expert to give an opinion on the dispute between yourself and the initial MRI expert among the reviewers.

We have now received this second opinion, and it can be found below. The evaluation has been overseen by a guest Reviewing Editor and Barbara Shinn-Cunningham as the Senior Editor. The reviewers have opted to remain anonymous. The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

This extra review refers specifically to point number 2 in the previous decision letter. I would like to emphasize that all the other points still stand and would need to be acted upon in order for the paper to be accepted. However, I believe that for most of them, this is not difficult. In addition, because the reviews and the response to reviewers will be published alongside the manuscript, I don't mind if you refer in the main text to the response to reviewers to save space. However, it is crucial that such references are in the main text, because many readers will not scroll down to the reviewers' comments and the responses to such.

As for point number 2, where we asked you to replace the original analysis with the new SwE-based analysis, we have now asked an independent expert for their opinion. Their response is as follows:

Reviewer #4:

The authors rely on correlation analysis to identify specific brain regions (voxels) in the songbird brain whose FA, an integrated/vectorized readout of the diffusion MRI signal, varies with specific song learning behavior during development. Previous reviewers raised concerns about the statistical validity of specifying these regions, in particular the "circular issue" of defining ROIs based on the selected voxels (beyond a significance threshold).

From the revised manuscript, similar brain regions were highlighted using the new analysis method, which is encouraging. However, the authors have had to assign smaller voxel sizes to preserve individual voxels above the statistical threshold, which may break the compensatory/correction rules for multiple comparisons. This issue is well reported in the literature and is faced routinely by neuroimaging researchers. In most cases, it is due to a sample size that is too small to represent the population given its variability. In the animal MRI field, it is a known problem.

The authors do observe some reliably detected spatial patterns in the songbird brains (n=14). One of the challenges for voxel-wise analysis is to precisely register the brains of individual subjects to the same template. A mismatch of voxels across subjects leads to pseudo-negative statistical estimates, but if a smoothing step (averaging neighboring voxels) is applied, it may reduce potential FA differences across conditions. It is a dilemma.

Here, I suggest two tentative ways to deal with the problem:

An intriguing observation is the symmetric distribution of the identified brain regions (left and right nuclei are identified, and voxel counts are provided in the tables). One possible way to deal with the statistical issue is to create a mirror image of each subject's brain. The authors could then focus on one hemisphere to redo their analysis (hopefully with sufficient power).

The second way is to define the ROIs based on songbird anatomy rather than on the voxel-wise analysis results. If the atlas-based ROIs show specific correlation features, this can serve as an alternative way to support the voxel-wise results.

Overall, I find the main results convincing and novel. The authors should apply an appropriate statistical analysis strategy to support their major findings more convincingly.

https://doi.org/10.7554/eLife.49941.sa1

Author response

Essential revisions:

Most of the essential revisions relate to the way the data analysis was performed and/or how the data were presented and discussed. No additional experiments are required.

1) A bit more detail on the birds and their experiences would be welcome. Clearly, some of the birds were exposed to the same tutor. But were some of the juveniles the offspring of the same parents?

Yes, some of the juveniles were the offspring of the same parents. We added an overview that indicates for each juvenile who were the biological and foster parents (if applicable), in the Materials and methods section as Supplementary file 12.

It would be helpful to know to what extent this information could be used to control for innate learning biases.

Based on the current dataset, we cannot draw conclusions about the implications of innate learning bias. Such tests require a carefully balanced/controlled study design in which genetic brothers are raised in different conditions: (1) by their biological father (tutor = biological father) or (2) by foster fathers (one foster father (tutor) per genetic brother).

Given that the behaviour of the tutor and social interactions between the tutor and the juvenile males have been shown to be important influencers of the juveniles’ song learning performance (Chen et al., 2016, PNAS), the rearing conditions and tutor exposure should be carefully controlled for. A yoked experimental design similar to that used by Chen et al. (2016), i.e. auditory experience with limited visual and physical interactions, could help disentangle the effect of social interaction from that of innate learning bias.

Furthermore, a recent study has found that the interactions between juvenile males and their (foster) mother also have important effects on the juvenile males’ song maturation/performance (Carouso-Peck and Goldstein, 2019). Therefore, after the introduction of a tutor, the juvenile male birds should no longer be housed together with an adult female zebra finch (to avoid any potential social influences on vocal learning/performance due to social interactions with adult female zebra finches).

We added the following to the Discussion section:

“Importantly, however, we should note that the study design employed in this study does not allow distinguishing between the implications of innate learning bias (innate properties of the pupil) and social enhancers of the tutor (social enhancers that promote learning in pupils). […] Lastly, delaying tutor exposure until after the first measurement (e.g. first (MRI) measure at 30 dph and introduction to tutor at 31 dph) can help differentiate between innate learning bias and social enhancement of vocal learning.”

Also, we are not requesting additional experiments, but it would have helped interpret the early left NCM FA values and learning outcomes if the juveniles had not been exposed to a tutor until after 30 days, to separate effects of the tutor song from innate properties of the birds.

Yes, we definitely agree! The goal of this study was to explore brain-behaviour relationships that arise along vocal learning. Therefore, we designed a longitudinal study where we raised 14 juvenile male zebra finches in close-to-normal rearing conditions (in small groups: adult male couple + 1 to 3 juveniles) and collected MRI data along with song recordings at distinct phases in the song learning process. When designing the study, we were not aware of social enhancers of vocal learning and therefore did not take this effect into account. Of course, we will take this finding and the reviewers’ suggestion into account when designing future studies!

More specifically, if we were to perform a more in-depth study relating to innate learning bias and social enhancers of vocal learning, we would definitely delay tutor exposure so as to investigate potential effects of innate learning biases (collect data before 30 dph) and potential changes in MRI readout elicited by exposure to the tutor at 30 dph (collect MRI data after 30 dph). Furthermore, we would include a higher number of birds divided over different social rearing conditions as briefly outlined above.

We added the following to the Discussion section:

“Lastly, delaying tutor exposure until after the first measurement (e.g. first (MRI) measure at 30 dph and introduction to tutor at 31 dph) can help differentiate between innate learning bias and social enhancement of vocal learning.”

2) MRI analysis 1 – Repeated measures: The authors used a two-step approach: first they ignored the fact that some of their measures came from the same individuals when performing their statistical analyses at the voxel level, and then, based on these results, used an ROI-based approach to take the repeated measures into account. We think that this approach is flawed because the selection of the voxels to determine the ROI is inaccurate (since each data point was considered to correspond to one subject). The Results section should not present results where the repeated-measures aspect of the dataset is not taken into account (first section of Results). We are aware that SPM does not currently allow analysing longitudinal datasets where the number of measures is not the same for all subjects (as is the case here). It seems that the authors have two options: (1) either they discard the 2 subjects for which they only have 3 data points and use a within-subject design for balanced designs in SPM; (2) or, even better, keeping their 14 subjects, they use the SwE toolbox (http://www.nisox.org/Software/SwE/), which seems to be able to handle unbalanced longitudinal datasets. Mean centering should allow distinguishing between within- and between-subject effects (cf. Guillaume, Hua et al., 2014, NeuroImage, 94).

We have re-analysed the data for both FA and log mwj using the SwE toolbox suggested by the reviewers. The outcome is shown in Author response images 1 and 2.
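The mean-centring idea the reviewers mention (Guillaume et al., 2014) splits each per-scan covariate into a between-subject part (the bird's own mean) and a within-subject part (the deviation from that mean), so the two effects can enter the model as separate regressors even when birds contribute different numbers of scans. The NumPy sketch below illustrates only that split; it is not the SwE toolbox's code, and the function name and example values are hypothetical.

```python
import numpy as np


def split_within_between(subject_ids, values):
    """Split a per-scan covariate into between- and within-subject parts.

    between: the subject's own mean, repeated for each of its scans
    within:  each scan's deviation from that subject mean
    Subjects may have different numbers of scans (unbalanced designs).
    """
    subject_ids = np.asarray(subject_ids)
    values = np.asarray(values, dtype=float)
    between = np.empty_like(values)
    for subject in np.unique(subject_ids):
        in_subject = subject_ids == subject
        between[in_subject] = values[in_subject].mean()
    within = values - between
    return between, within


# Hypothetical example: bird "a" scanned three times, bird "b" twice.
between, within = split_within_between(["a", "a", "a", "b", "b"], [60, 70, 80, 65, 95])
print(between)  # subject means, one per scan
print(within)   # deviations from each subject's mean
```

Because the within-subject column sums to zero for every bird, it carries no information about differences between birds, which is what allows the two effects to be estimated separately.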

Author response image 1
Voxel-based multiple regressions using SwE showing the positive correlation between % similarity and FA.

The statistical parametric maps present the outcome of the voxel-based multiple regression testing for a correlation between song similarity and FA; they are visualised at puncorr<0.001 and kE≥4 voxels, overlaid on the population-based template and scaled according to the colour-code (T values) on the left of each statistical map. The crosshairs point to the tFA (A), the VP (C) and NCM (E), all in the left hemisphere, and CMM (G) in the right hemisphere. The extent and significance of these clusters are summarized in Author response table 1. Graphs B, D, F and H visualise the nature of the correlation between song similarity and FA, where individual data points are colour-coded according to bird identity (i.e. one colour = one bird). The average within-bird correlation is presented by the coloured lines, while the black dashed line indicates the overall association between song similarity and FA, disregarding bird identity or age. ‘r’ is the repeated-measures correlation (rmcorr) coefficient. The * indicates a significant rmcorr correlation between FA and % similarity in the CMM (p=0.00287). A full summary of the overall and rmcorr correlations for all clusters is given in Author response table 2.

Author response image 2
Voxel-based multiple regressions using SwE showing the negative correlation between % similarity and log mwj.

The statistical parametric maps present the outcome of the voxel-based multiple regression testing for a correlation between song similarity and local tissue volume; they are visualised at puncorr<0.001 and kE≥20 voxels, overlaid on the population-based template and scaled according to the colour-code (T values) on the left of each statistical map. The crosshairs point to the VP (A) or the CM in the left hemisphere (C). The extent and significance of these clusters are summarized in Author response table 1. Graphs B and D inform on the nature of the association between song similarity (%) and the log-transformed modulated jacobian determinant (log mwj; a metric reflecting local tissue volume). More specifically, the individual data points of the graphs are colour-coded according to bird identity (i.e. one colour = one bird). The average within-bird correlation is presented by the coloured lines, while the dashed black line indicates the overall association between song similarity and log mwj, disregarding bird identity or age. ‘r’ is the repeated-measures correlation (rmcorr) coefficient. The * indicates a significant rmcorr correlation between log mwj and % similarity in the VP (p=0.0245) and in CM (left: p=0.0175). A full summary of the overall and rmcorr correlations for all clusters is given in Author response table 2.

Author response table 1
Summary of the voxel-based multiple regressions using SwE (% similarity and FA).
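Correlation between | Cluster | Hemisphere | kE | pFWE | T (peak) | puncorr (peak)
% similarity and FA | tFA | Left | 22 | 0.675 | 3.63 | <0.001
% similarity and FA | tFA | Right | 7 | 0.559 | 3.79 | <0.001
% similarity and FA | NCM | Left | 11 | 0.635 | 3.74 | <0.001
% similarity and FA | CMM | Right | 8 | 0.587 | 4.10 | <0.001
% similarity and FA | VP | – | 4 | 0.933 | 3.16 | 0.001
% similarity and log mwj | VP | – | 852 | 0.201 | 4.09 | 0.000
% similarity and log mwj | CM | Left | 6265 | 0.925 | 2.49 | 0.000
% similarity and log mwj | CM | Right | 9372 | 0.024* | 5.17 | 0.000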

FA stands for Fractional Anisotropy, one of the DTI metrics. ‘log mwj’ refers to the log-transformed, modulated and warped jacobian determinants. This table summarises the outcome of the voxel-based multiple regression based on 54 data points (12 birds with 4 time points and 2 birds with 3 time points). The ‘Cluster’ and ‘Peak’ columns refer to two different levels of assessing significance, respectively cluster-based inference and peak- or single voxel-based inference where the T- and p-value of the voxel with highest significance of the cluster is reported. When applying the selection criteria of pFWE<0.05 and kE>5 voxels, only the correlation between% similarity and log mwj of the right CM can be considered significant.

Author response table 2
Summary of the voxel-based multiple regressions using SwE (% similarity and FA or log mwj).
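Correlation between | Cluster | Hemisphere | Between-subject: Spearman’s ρ | Between-subject: p value | Within-subject: rmcorr r | Within-subject: rmcorr p
% similarity and FA | tFA | Left | 0.728 | <0.0001 | 0.175 | 0.274
% similarity and FA | tFA | Right | 0.656 | <0.0001 | 0.326 | 0.0378
% similarity and FA | NCM | Left | 0.635 | <0.0001 | 0.184 | 0.248
% similarity and FA | CMM | Right | 0.618 | <0.0001 | 0.454 | 0.0029*
% similarity and FA | VP | – | 0.592 | <0.0001 | -0.041 | 0.799
% similarity and log mwj | VP | – | -0.583 | <0.0001 | -0.399 | 0.0094*
% similarity and log mwj | CM | Left | -0.346 | 0.0103 | -0.393 | 0.0111*
% similarity and log mwj | CM | Right | -0.327 | 0.0158 | -0.397 | 0.0101*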

‘log mwj’ refers to the log-transformed, modulated and warped jacobian determinants; FA stands for Fractional Anisotropy, one of the DTI metrics. ‘r’ is the repeated-measures correlation coefficient of the within-subject correlation analyses. Spearman’s ρ informs on potential correlations between the MRI parameters and song similarity at a specific time point between birds. Tests that survive Benjamini-Hochberg FDR correction for multiple comparisons (Author response table 3) are marked with an asterisk (*).

Author response table 3
Benjamini-Hochberg FDR correction for multiple comparisons of rmcorr analyses.
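MRI parameter | Cluster-based ROI | Hemisphere | p value | Rank (i) | (i/m)Q
FA | CMM | R | 0.0029* | 1 | 0.0063
Log mwj | CM | R | 0.0094* | 2 | 0.0125
Log mwj | CM | L | 0.0101* | 3 | 0.0188
Log mwj | VP | – | 0.0111* | 4 | 0.0250
FA | tFA | R | 0.0378 | 5 | 0.0313
FA | NCM | L | 0.248 | 6 | 0.0375
FA | tFA | L | 0.274 | 7 | 0.0438
FA | VP | – | 0.799 | 8 | 0.0500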

‘log mwj’ refers to the log-transformed, modulated and warped jacobian determinants; FA stands for Fractional Anisotropy, one of the DTI metrics. ‘rmcorr’ is the repeated-measures correlation analysis. Number of tests m = 8; i is the rank of each p value and Q is the false discovery rate, set at 0.05. Only tests that survive FDR correction for multiple comparisons are marked with an asterisk (*).
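The Benjamini-Hochberg step in Author response table 3 can be reproduced with a few lines of plain Python. The sketch below is a generic implementation of the step-up procedure, not the authors' own script; the p values are copied from the table and the labels are only illustrative.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns (rejected, thresholds): rejected[j] tells whether test j survives,
    thresholds[i - 1] is the critical value (i / m) * q for rank i.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda j: p_values[j])  # indices by ascending p
    thresholds = [(i / m) * q for i in range(1, m + 1)]
    # Largest rank k with p_(k) <= (k / m) * q; every test ranked <= k survives.
    k = 0
    for rank, j in enumerate(order, start=1):
        if p_values[j] <= thresholds[rank - 1]:
            k = rank
    rejected = [False] * m
    for rank, j in enumerate(order, start=1):
        rejected[j] = rank <= k
    return rejected, thresholds


# p values from Author response table 3, listed in ascending order so that
# the i-th critical value lines up with the i-th row when printing.
tests = [
    ("FA CMM R", 0.0029),
    ("log mwj CM R", 0.0094),
    ("log mwj CM L", 0.0101),
    ("log mwj VP", 0.0111),
    ("FA tFA R", 0.0378),
    ("FA NCM L", 0.248),
    ("FA tFA L", 0.274),
    ("FA VP", 0.799),
]
rejected, thresholds = benjamini_hochberg([p for _, p in tests], q=0.05)
for (label, p), survives, crit in zip(tests, rejected, thresholds):
    print(f"{label:13s} p = {p:<7} (i/m)Q = {crit:.4f} survives: {survives}")
```

The table's (i/m)Q column shows these critical values rounded to four decimals (e.g. 0.00625 is listed as 0.0063). The first four tests pass; the fifth (p = 0.0378) exceeds its critical value of 0.03125, so the procedure stops there.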

We understand the concern that some of the measures used in the statistical correlation analyses come from the same individuals. Reanalysing the data with SwE, as requested, picked up the same regions. However, if we applied the same selection criteria (Roiser et al., 2016), using only clusters that survived a family-wise error (FWE) correction thresholded at pFWE<0.05 combined with a minimal cluster size (kE) of at least 5 or 20 contiguous voxels for DTI and 3D RARE analyses, respectively, only the correlation between song similarity and volume changes in the right CM would be considered significant.

The more stringent SwE analysis also influences the rmcorr correlations in some regions. Since we extracted the average FA value or log mwj of a cluster to calculate rmcorr correlations, these average values change as the extent of the clusters changes. In the original analysis, for example, VP was a large cluster (479 voxels) and showed a significant within-subject correlation of r=0.496*. In the SwE analysis, the VP cluster was much smaller (4 voxels), causing the within-subject correlation to decrease to r = -0.041. However, the outcome of the within-subject correlation analyses between song similarity and local volume (log mwj) remains similar.
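The within-subject values in Author response table 2 come from repeated-measures correlation (rmcorr) on cluster-averaged FA or log mwj. The sketch below shows the core computation, assuming rmcorr as defined by Bakdash and Marusich (2017), for which an R package of the same name exists; it is not the authors' code and the function name is hypothetical. It also makes the point above concrete: r is computed from per-cluster averages, so when the cluster extent changes, the averaged inputs and hence r change too.

```python
import math
from collections import defaultdict


def rmcorr(subject_ids, x, y):
    """Repeated-measures correlation, sketched.

    Each subject's x and y are centred on that subject's own means, which
    removes between-subject differences; r is the correlation of the centred
    values (one common within-subject slope). With N observations from k
    subjects, the test has N - k - 1 degrees of freedom.
    """
    groups = defaultdict(list)
    for s, xi, yi in zip(subject_ids, x, y):
        groups[s].append((xi, yi))
    sxy = sxx = syy = 0.0
    n = 0
    for pairs in groups.values():
        mx = sum(p[0] for p in pairs) / len(pairs)
        my = sum(p[1] for p in pairs) / len(pairs)
        for xi, yi in pairs:
            dx, dy = xi - mx, yi - my
            sxy += dx * dy
            sxx += dx * dx
            syy += dy * dy
            n += 1
    r = sxy / math.sqrt(sxx * syy)
    df = n - len(groups) - 1
    # t statistic for H0: r = 0; the p value follows from a t distribution with df degrees of freedom.
    t = math.copysign(math.inf, r) if abs(r) >= 1 else r * math.sqrt(df / (1 - r * r))
    return r, df, t


# Hypothetical toy data: two birds, three time points each.
r, df, t = rmcorr(["A"] * 3 + ["B"] * 3, [1, 2, 3, 1, 2, 3], [1, 3, 2, 2, 2, 5])
print(r, df, t)
```

Unlike the overall correlation (the black dashed lines in the figures), this r ignores how birds differ from each other on average and only asks whether, within each bird, higher x goes with higher y.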

The voxel-based analysis (1) ONLY served exploratory purposes, namely to uncover very specific brain sites where this brain-behaviour relationship exists, (2) has been used successfully in two other papers (Hamaide et al., 2018; Orije et al., 2020) and (3), most importantly, led to subsequent rmcorr analyses confirming that the identified regions show the investigated correlation. Using SwE instead, we had to bend the criteria for significance and minimal cluster size defined for MRI (Roiser et al., 2016) in order to proceed to the rmcorr analyses. We also feel that adding the SwE analysis to the supplementary data would only complicate matters for the reader and divert attention from the main message. We do not want to turn this into a methodological paper comparing different statistical analyses. We hope we have convinced you to keep the original analysis and its outcome in the current paper as well.

We added a reference to direct the readers to the ‘response to the reviewers’ for more information about the supplementary analyses we performed. These clarifications were added to the Materials and methods: Voxel-based statistical correlation analyses between structural MRI and song parameters section.

“To address concerns that the statistical correlation analyses were performed on measures partly coming from the same individuals, we reanalysed the data with the more stringent Sandwich Estimator (SwE) toolbox (http://www.nisox.org/Software/SwE/) and with a classic ROI-based analysis, as outlined in the ‘response to reviewers’ (p3-10). We could detect the same regions, but they would not be considered significant if we applied the same criteria for assessing the significance of a cluster as outlined above (Roiser et al., 2016). As the purpose was exploratory and validated with subsequent cluster-based ROI correlation analyses, we preserved the initial outcome, in line with the analyses of our earlier studies (Hamaide et al., 2018; Orije et al., 2020).”

3) MRI analysis 2 – explaining for non-experts: The manuscript falls short for a general audience in detailing how various structures were identified and assigned significance. One issue relates to the requirement for clusters > 40 contiguous voxels. What is the diameter of a sphere containing that many voxels? And does this volume threshold exclude smaller song nuclei, such as HVC, LMAN, DLM, or Avalanche? Finally, it would help a general reader to report the scale of FA, rather than just reporting absolute values.

We have used a two-step approach in this study:

First, we performed brain-wide voxel-based statistical analyses to identify which brain sites exhibit a significant relationship between performance (similarity) and structural architecture (DTI or local volume). Instead of deciding where to look by manually drawing ROIs of e.g. the song control nuclei and auditory areas, we use voxel-based statistical methods that are capable of uncovering very specific brain sites where this brain-behaviour relationship exists.

For all voxel-based analyses, we used very strict criteria to identify significant clusters (excerpt from the Materials and methods: Statistical analyses: Voxel-based statistical correlation analyses between structural MRI and song parameters section):

“Unless explicitly stated, we used the following two criteria to assess the significance of a cluster: (1) clusters should contain at least 5 or 20 contiguous voxels for respectively DTI and 3D RARE analyses (number of contiguous voxels is represented by kE) and (2) the ‘peak voxel’ (based on T values) of the cluster should survive a family-wise error (FWE) correction for multiple comparisons thresholded at pFWE<0.05 (Roiser et al., 2016). Only clusters where both criteria were satisfied were considered significant.”
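The two criteria quoted above can be expressed as a small filter on a voxel-wise T map. In the sketch below, clusters are formed from voxels above a cluster-forming T threshold (an assumption: the quoted text does not state that threshold), voxels are grouped using 26-connectivity (also an assumption; SPM's own connectivity convention may differ), and a cluster is kept only if it has at least `min_k` voxels and its peak T reaches the FWE-corrected threshold `t_fwe`. Deriving `t_fwe` itself (in SPM, via random field theory) is out of scope, so it is passed in as a number; all names and the toy map are hypothetical.

```python
import itertools
from collections import deque

import numpy as np

# All 26 neighbour offsets of a voxel in 3D.
OFFSETS = [d for d in itertools.product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def label_clusters(mask):
    """Label 26-connected clusters of True voxels; returns (labels, n_clusters)."""
    labels = np.zeros(mask.shape, dtype=int)
    n_clusters = 0
    for seed in zip(*np.nonzero(mask)):
        if labels[seed]:
            continue  # already assigned to an earlier cluster
        n_clusters += 1
        labels[seed] = n_clusters
        queue = deque([seed])
        while queue:  # breadth-first flood fill from the seed voxel
            z, y, x = queue.popleft()
            for dz, dy, dx in OFFSETS:
                nb = (z + dz, y + dy, x + dx)
                inside = all(0 <= c < s for c, s in zip(nb, mask.shape))
                if inside and mask[nb] and not labels[nb]:
                    labels[nb] = n_clusters
                    queue.append(nb)
    return labels, n_clusters


def significant_clusters(t_map, t_cluster_forming, t_fwe, min_k):
    """Keep clusters with >= min_k voxels whose peak T reaches t_fwe."""
    labels, n_clusters = label_clusters(t_map > t_cluster_forming)
    kept = []
    for lab in range(1, n_clusters + 1):
        in_cluster = labels == lab
        k_e = int(in_cluster.sum())
        peak_t = float(t_map[in_cluster].max())
        if k_e >= min_k and peak_t >= t_fwe:
            kept.append((lab, k_e, peak_t))
    return kept


# Toy T map with three well-separated clusters.
t = np.zeros((7, 7, 7))
t[0, 0, 0:5] = 4.0  # 5 voxels ...
t[0, 0, 2] = 6.0    # ... with a peak above t_fwe: kept
t[3, 0, 0:3] = 8.0  # strong peak but only 3 voxels: rejected
t[6, 0:6, 6] = 4.0  # 6 voxels but peak below t_fwe: rejected
print(significant_clusters(t, t_cluster_forming=3.2, t_fwe=5.0, min_k=5))  # -> [(1, 5, 6.0)]
```

Both conditions must hold at once, which is why the strong but tiny cluster and the large but weak cluster are both discarded.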

For all voxel-based statistical analyses, we used a cut-off of 5 contiguous voxels for DTI and 20 contiguous voxels for volume analyses. These cluster sizes correspond to the following volumes (see calculations below):

– DTI: volume of cluster of 5 voxels: 0.04332 mm³

– Volume analysis from 3D RARE images: volume of cluster of 20 voxels: 0.03808 mm³

– We refer the reviewers to Figure 2 of Nixdorf-Bergweiler (1996). These graphs include the volumes of four song control nuclei in juvenile male (and female) zebra finches. Based on these graphs and on the voxel and cluster volumes calculated above, the voxel and cluster volumes are small enough to detect differences in the song control nuclei and auditory areas.

In our previously published studies, we were able to identify structural changes in the brains of adult zebra finches when testing for structural differences between males and females (e.g. in HVC and LMAN: Hamaide et al., 2017), upon targeted brain lesioning (e.g. in HVC and DLM: Hamaide, Lukacova et al., 2018 NeuroImage) and when assessing volumetric changes during ontogeny (e.g. in LMAN: Hamaide et al., 2018). We have now added longitudinal DTI data from male and female zebra finches to the current paper, confirming that the method can detect sex-specific changes in the song control system (SCS) during ontogeny (Figure 1B-E).

The graphs by Nixdorf-Bergweiler show that most song control nuclei reach adult (>100 dph) volumes at around 60 dph, which is the youngest age at which we obtained song production data in this study. Therefore, this volume threshold does not preclude detecting structural differences in smaller song nuclei such as LMAN and DLM, as well as in larger song control nuclei such as Area X.

Second, after establishing where in the brain these relationships exist, we aimed to better understand the nature of the relationships revealed by the voxel-based statistical analyses, i.e. to overcome limitations of voxel-based statistical testing in SPM by using rmcorr, and to create graphs visualising the ‘nature’ of each correlation. Therefore, we extracted the average DTI or DBM parameter value for each cluster. To this end, we created ‘cluster-based ROIs’, which are defined based on the statistical parametric maps. Instead of thresholding the maps at pFWE<0.05 (†) (the strict threshold used to decide whether a cluster is significant), we extracted the clusters at puncorr<0.001 (‡) and kE≥40 voxels, which makes the previously identified clusters slightly larger. This approach is identical to that of studies previously published by our lab, e.g. Hamaide et al., 2018 (brain development study); Hamaide et al., 2018; Anckaerts et al., 2019.
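Once a cluster-based ROI has been defined (a binary mask at puncorr<0.001 and kE≥40 voxels), extracting the value fed into rmcorr amounts to averaging each scan's parameter map inside that mask. A minimal NumPy sketch, assuming all scans are already registered to the population-based template so that one mask selects the same voxels in every scan; the function name and toy arrays are hypothetical.

```python
import numpy as np


def cluster_roi_means(scans, roi_mask):
    """Average a parameter map (e.g. FA or log mwj) inside one ROI mask, per scan.

    Assumes every scan is in the space of the population-based template,
    so the same boolean mask selects the same voxels in every scan.
    """
    roi_mask = np.asarray(roi_mask, dtype=bool)
    return np.array([np.asarray(scan, dtype=float)[roi_mask].mean() for scan in scans])


# Hypothetical 2x2x1 "FA maps" for two scans and a two-voxel ROI.
scans = [
    np.array([[[0.2], [0.4]], [[0.9], [0.9]]]),
    np.array([[[0.3], [0.5]], [[0.1], [0.1]]]),
]
roi = np.array([[[True], [True]], [[False], [False]]])
print(cluster_roi_means(scans, roi))
```

Because each value is a mean over the mask, enlarging or shrinking the cluster changes it, which is why the rmcorr outcome depends on the cluster extent.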

The volumes of these clusters correspond to:

– 3D RARE: volume of cluster of 40 voxels: 0.07616 mm³

– DTI: volume of cluster of 40 voxels: 0.34656 mm³

Calculation:

– 3D RARE: voxel size at acquisition: (0.07x0.17x0.16) mm³

volume of one voxel: 0.001904 mm³

volume of 20 voxels: 0.03808 mm³ †

volume of 40 voxels: 0.07616 mm³ ‡

– DTI: voxel size at acquisition: (0.19x0.19x0.24) mm³

volume of one voxel: 0.008664 mm³

volume of 5 voxels: 0.04332 mm³ †

volume of 40 voxels: 0.34656 mm³ ‡
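The reviewers also asked for the diameter of a sphere containing a 40-voxel cluster, which the calculation above does not state explicitly. Treating a cluster as a sphere of equal volume (an idealisation: the voxels are anisotropic and real clusters are irregular), d = (6V/π)^(1/3). The sketch below reproduces the voxel and cluster volumes listed above and adds that diameter. By this estimate, a 40-voxel cluster spans roughly 0.53 mm (3D RARE) or 0.87 mm (DTI), and the 20-voxel (3D RARE) and 5-voxel (DTI) significance thresholds roughly 0.42 and 0.44 mm.

```python
import math


def voxel_volume(size_mm):
    """Volume (mm^3) of one voxel with edge lengths size_mm = (x, y, z)."""
    return math.prod(size_mm)


def equivalent_sphere_diameter(volume_mm3):
    """Diameter (mm) of a sphere with this volume, from V = (pi / 6) * d**3."""
    return (6.0 * volume_mm3 / math.pi) ** (1.0 / 3.0)


# Acquisition voxel sizes (mm) and cluster sizes (voxels) quoted in this response.
acquisition_voxel_mm = {"3D RARE": (0.07, 0.17, 0.16), "DTI": (0.19, 0.19, 0.24)}
cluster_voxels = {"3D RARE": (20, 40), "DTI": (5, 40)}

for modality, size in acquisition_voxel_mm.items():
    v = voxel_volume(size)
    for k in cluster_voxels[modality]:
        print(f"{modality}: {k} voxels = {k * v:.5f} mm^3, "
              f"equivalent sphere diameter ~ {equivalent_sphere_diameter(k * v):.2f} mm")
```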

The present MRI correlation study considers only male zebra finches of >64 dph. Average volumes deduced from the data of Nixdorf-Bergweiler:

· LMAN: 0.2 mm³

· HVC: 0.5 mm³

· RA: 0.27 mm³

· Area X: 1.5 mm³
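To relate the listed nucleus volumes to the minimal cluster sizes, one can divide each volume by the acquisition voxel volume. The sketch below does only that arithmetic with the values given in this response; it ignores partial-volume effects and nucleus shape, so each count is an upper bound on the number of voxels that fit entirely inside a nucleus. Variable and function names are hypothetical.

```python
# Average nucleus volumes (mm^3) in males >64 dph, as listed above
# (deduced from Nixdorf-Bergweiler, 1996).
nucleus_volume_mm3 = {"LMAN": 0.2, "HVC": 0.5, "RA": 0.27, "Area X": 1.5}

# Acquisition voxel volumes (mm^3) and minimal cluster sizes (kE) used for significance.
voxel_volume_mm3 = {"3D RARE": 0.07 * 0.17 * 0.16, "DTI": 0.19 * 0.19 * 0.24}
min_cluster_voxels = {"3D RARE": 20, "DTI": 5}


def voxels_spanned(nucleus, modality):
    """Nucleus volume divided by voxel volume: at most this many voxels fit entirely inside."""
    return nucleus_volume_mm3[nucleus] / voxel_volume_mm3[modality]


for nucleus in nucleus_volume_mm3:
    for modality in voxel_volume_mm3:
        n = voxels_spanned(nucleus, modality)
        print(f"{nucleus:7s} {modality:8s} ~{n:6.1f} voxels "
              f"(minimal cluster size: {min_cluster_voxels[modality]})")
```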

We have added some clarifications to the Materials and methods: Statistical analysis MRI section.

“We have used a two-step approach to analyse the MRI datasets. First, instead of deciding where to look by manually drawing ROIs (of for example the song control and auditory nuclei), we used data-driven image analysis techniques that are capable of localising the specific brain sites where a brain-behaviour relationship exists. […] The outcome of the DBM analysis was published in Hamaide, De Groof et al. (2018), while the male data were used in this study to correlate with the song outcome of the same birds.”

We have added some clarifications to the Materials and methods: Voxel-based statistical correlation analyses section.

“These cluster sizes correspond to the following volumes: DTI: volume of cluster of 5 voxels is 0.04332 mm³ and 3D RARE: volume of cluster of 20 voxels is 0.03808 mm³. […] This cluster-based ROI approach is identical to the methods used in our other studies e.g. (Hamaide et al., 2018, Hamaide et al., 2018, Anckaerts et al., 2019).”

“Finally, it would help a general reader to report the scale of FA, rather than just reporting absolute values.”

We have added some clarifying sentences to the Materials and methods: MRI data processing: Diffusion Tensor Imaging section:

“FA is scaled between 0 and 1, where 0 refers to isotropic and 1 to anisotropic diffusion. Typically, one expects high FA values in white matter regions that contain many coherently organised myelinated fibre tracts.”
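For readers unfamiliar with FA, the 0-to-1 scale follows from its standard definition in terms of the three eigenvalues λ1, λ2, λ3 of the diffusion tensor: FA = sqrt(1/2) · sqrt((λ1 − λ2)² + (λ2 − λ3)² + (λ3 − λ1)²) / sqrt(λ1² + λ2² + λ3²). A minimal Python version follows; the function name is hypothetical, tensor fitting itself is not shown, and the example eigenvalues are illustrative, not data from this study.

```python
import math


def fractional_anisotropy(l1, l2, l3):
    """FA from the three diffusion-tensor eigenvalues; 0 = isotropic, 1 = diffusion along one axis only."""
    numerator = (l1 - l2) ** 2 + (l2 - l3) ** 2 + (l3 - l1) ** 2
    denominator = l1 ** 2 + l2 ** 2 + l3 ** 2
    if denominator == 0:
        return 0.0  # no diffusion signal at all; define FA as 0
    return math.sqrt(0.5 * numerator / denominator)


# Illustrative eigenvalues (mm^2/s), not data from this study.
print(fractional_anisotropy(1.0e-3, 1.0e-3, 1.0e-3))  # isotropic tensor -> 0.0
print(fractional_anisotropy(1.7e-3, 0.3e-3, 0.3e-3))  # elongated tensor, as in coherent fibre bundles
```

The elongated example gives a value close to 0.8, whereas equal eigenvalues give exactly 0, matching the interpretation in the quoted text.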

4) MRI analysis 3 – correcting for multiple comparisons: The authors need to choose how they want to correct for multiple comparisons in their voxel-based approach (voxel-wise or cluster-wise). If the authors choose a cluster-based approach, they should justify the first p value threshold used to obtain the clusters (recently published recommendations about how to choose these thresholds should be followed and mentioned). If they choose a voxel-wise approach, they should justify their minimum cluster size.

For all voxel-based statistical analyses, we have based our threshold for assessing significance on an Editorial in NeuroImage Clinical (Roiser et al., 2016). More specifically, only clusters that survived a family-wise error correction for multiple comparisons of pFWE<0.05 (peak voxel value), and consisted of at least 5 (DTI) or 20 (volume) contiguous voxels, were considered significant.

These thresholds are similar to the thresholds we used in our previously published papers (of which we added the references to the manuscript text).

We have added the following clarification to the text in the Materials and methods: Statistical analyses: MRI: Voxel-based statistical correlation analyses between structural MRI and song parameters section:

“Unless explicitly stated, we used the following two criteria to assess the significance of a cluster: (1) clusters should contain at least 5 or 20 contiguous voxels for respectively DTI and 3D RARE analyses (number of contiguous voxels is represented by kE) and (2) the ‘peak voxel’ (based on T values) of the cluster should survive a family-wise error (FWE) correction for multiple comparisons thresholded at pFWE<0.05 (Roiser et al., 2016).”

Furthermore, each figure showing the result of voxel-based analyses has a supplementary file reporting the exact pFWE values at cluster level and peak level, as encouraged by Roiser et al. (2016). The figure legend states the statistical threshold used.

5) MRI analysis 4 – Positive controls: A potentially noteworthy feature of the current study is that the only significant anatomical changes were detected in regions outside of the classical song system. But numerous studies have shown that the structure of various song control nuclei (HVC, RA, Area X) changes markedly over the period in which these measurements were made (increasing in volume between 20 and 60 days, and increasing in myelination between 20 and 100 days, eg). Further, some early structural changes in the song system (spine density and dynamics in HVC) are correlated with copying outcome.

The reviewers refer to a study by Roberts et al. (Nature 2010) who found that: “Spine dynamics were measured in the forebrain nucleus HVC, the proximal site where auditory information merges with an explicit song motor representation, immediately before and after juvenile finches first experienced tutor song. Higher levels of spine turnover prior to tutoring correlated with a greater capacity for subsequent song imitation. In juveniles with high levels of spine turnover, hearing a tutor song led to the rapid (~24h) stabilization, accumulation and enlargement of dendritic spines in HVC. Moreover, in vivo intracellular recordings made immediately before and after the first day of tutoring revealed robust enhancement of synaptic activity in HVC. These findings suggest behavioural learning results when instructive experience is able to rapidly stabilize and strengthen synapses on sensorimotor neurons important to the control of the learned behaviour.”

There is an important difference in study design and hypothesis between the study by Roberts et al. and our study. For these experiments, Roberts et al. temporarily deprived birds of a tutor and investigated before versus after tutoring exposure. In our experiments, the juvenile birds already sing advanced copies of the tutor song. We are investigating relationships between how well birds are copying the tutor song and the structural properties of the brain, while Roberts et al. are addressing the first stages of song learning.

Furthermore, we would like to point out that, even though our in vivo DTI protocol is very sensitive to changes in microstructural tissue properties, changes in spine turnover would have to affect the entire nucleus, and to a large extent, for us to pick them up with our brain-wide in vivo imaging tools.

I would be more confident in the current results if the authors could show that their method is sensitive to structural changes within the song system that are known to occur during development, even (or perhaps especially) if these changes are not correlated with song learning outcomes.

1) Regarding 3D RARE data for detecting volumetric anatomical changes during development:

We have recently published a longitudinal study on brain development in zebra finches based on volumetric data (obtained at 20, 30, 40, 65, 90, 120 and 200 dph) similar to those presented in this manuscript. In that paper (Hamaide et al., 2018), we were able to detect sex differences in local volume that arise between 20 and 200 dph in e.g. HVC, RA and NIf. Furthermore, regarding brain development, we found clear patterns of volume increase and decrease over e.g. the different phases of song learning.

Please see Author response image 3 and its legend from the paper as well as the figure on anatomy from the supplementary figures of this paper (Author response image 4):

Author response image 3
Relative volume differences between consecutive sub-phases of vocal learning in male and female zebra finches.

The statistical maps highlight voxels where the modulated jacobian determinants are larger (blue: volume decrease from first to later phase) or smaller (red; expansion from first to later phase) at the sensory phase compared to the sensorimotor phase, sensorimotor compared to the crystallization phase, or around crystallization compared to 200 dph. The sensory phase includes data obtained at 20–30 dph, the sensorimotor phase 40–65 dph, and the crystallization phase 90–120 dph. The statistical maps are color-coded according to the scale on the right (T-values; T = 5.09 corresponds to pFWE<0.05, no cluster extent threshold). Author response image 4 presents anatomical labels on the slices underlying the statistical maps. The white arrow in the horizontal slices points to LMAN. Abbreviations: dph: days post hatching; Do: dorsal; Ve: ventral; Ro: rostral; Ca: caudal. (Hamaide et al., 2018).

Author response image 4
Bird brain anatomy.

A informs on the different subdivisions of the zebra finch brain projected on the population-based template. The colors refer to pallium, subpallium, thalamus and hypothalamus, midbrain, pons and medulla. The drawing on the right subdivides the telencephalon into its different sub-regions delineated by laminae, and cerebellum. B illustrates sagittal slices of the bird brain, including schematic atlas drawings obtained from the zebra finch histological atlas browser (Oregon Health and Science University, Portland, OR 97239; http://www.zebrafinchatlas.org) (Karten et al., 2013), and MR images extracted from the population-based template. The numbers below the sagittal slices indicate the approximate (~) distance (mm) from the midline. Anatomical areas visible on the T2-weighted MRI slices are indicated by numbers, while the letters indicate regions that are only visible on the schematic atlas drawings. C provides an overview of (the approximate position of) anatomical regions defined in (A) on horizontal slices derived from the population-based template. The numbers below the sagittal slices correspond to the approximate position (in ‘mm’ from the midline). Legend: a: MMAN; b: Field L2b; c: Area X; d: HVC; e: RA; f: basorostral nucleus; g: lateral arcopallium; h: basal nucleus of Meynert and ventral pallidum; 1: posterior commissure; 2: Field L; 3: NCM; 4: thalamic zone; 5: TSM; 6: anterior commissure; 7: LMAN; 8: striatum including Area X; 9: FPL (lateral prosencephalic fascicle); 10: MLd; 11: TeO or ventral part of the optical lobe; 12: entopallium; 13: medial and lateral portion of the caudal mesopallium (respectively CMM and CML). Abbreviations: Do: dorsal; Ve: ventral; Ro: rostral; Ca: caudal; L: left; R: right.

We added citations of these prior studies showing sexual dimorphism and volume changes during ontogeny.

We have added some clarifications to the Results section:

“The 3D anatomical dataset enabled us to assess regional changes in brain volume that arise over time (brain development) or between male and female zebra finch brains (sex differences; these data have been published in Hamaide et al., 2017). […] The present data enable us to extend the latter, as the present study includes DTI data obtained in juvenile zebra finches.”

We have added some clarifications to the Discussion section:

“Firstly, a previous study by our lab investigated brain development in (juvenile) zebra finches (Hamaide et al., 2018). This study clearly shows that most changes in brain volume occur relatively early (before 65 dph) and that these changes affect large portions of the brain. Furthermore, the same study shows that relatively large, widespread brain areas decrease in volume from 65 dph to 200 dph (the same time frame as this study). The clusters detected in the current study are much smaller and may overlap with only a small fraction of these large clusters.”

We have added some clarifications to the Materials and methods section.

“The outcome of the DBM analysis was published in Hamaide et al., 2018, while the male data were used in this study to correlate with the song outcome of the same birds.”
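Deformation-based morphometry (DBM) derives local volume change from the Jacobian determinant of the transformation that maps each brain onto the population template: values above 1 indicate local expansion and values below 1 local contraction. The sketch below is a minimal NumPy illustration, assuming a dense displacement field on a regular grid with unit voxel spacing; the function name `jacobian_determinant` is hypothetical, and this is not the registration or modulation pipeline used in these studies.

```python
import numpy as np

def jacobian_determinant(displacement):
    """Voxel-wise det(I + grad u) for a displacement field u.

    displacement: array of shape (3, X, Y, Z); displacement[i] is the
    i-th component of u in voxel units (unit spacing assumed).
    """
    ndim = displacement.shape[0]
    # grads[i][j] = d u_i / d x_j, each of shape (X, Y, Z)
    grads = [np.gradient(displacement[i]) for i in range(ndim)]
    jac = np.empty(displacement.shape[1:] + (ndim, ndim))
    for i in range(ndim):
        for j in range(ndim):
            jac[..., i, j] = grads[i][j] + (1.0 if i == j else 0.0)
    # np.linalg.det works on the last two axes, giving one value per voxel
    return np.linalg.det(jac)

# Hypothetical example: a uniform 10% stretch along every axis
coords = np.stack(
    np.meshgrid(np.arange(8), np.arange(8), np.arange(8), indexing="ij")
).astype(float)
det = jacobian_determinant(0.1 * coords)
print(round(float(det.mean()), 3))  # 1.331 (= 1.1 ** 3)
```

A uniform 10% stretch along each axis yields a determinant of 1.1³ in every voxel; an identity mapping (zero displacement) yields 1.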

2) Regarding DTI data for detecting ultrastructural changes during ontogeny:

We have been able to detect sex differences in the brains of adult zebra finches (Hamaide et al., 2017). Furthermore, the volumes of the song control nuclei reach adult size at around 60 dph in male zebra finches. In this study, the data used for voxel-based correlation analyses are obtained at 65 dph and older. Therefore, the size of the nuclei should not interfere with our ability to detect differences.

Furthermore, we have now added a section to the Results: “Longitudinal structural MRI changes in the brains of maturing male and female zebra finches.” This section gives a comprehensive overview of (1) which brain sites develop differences in local tissue microstructure between male and female zebra finch brains during the song learning process, and (2) the microstructural brain changes that characterise the first 200 days of post-hatch life in both male and female zebra finch brains. The results are summarized in Figure 1B–E. These results show that our method is able to detect well-known biological differences between male and female zebra finches (Nixdorf-Bergweiler, 1996) and their changes during ontogeny. They are further discussed in the Discussion.

We added the following to the Results section:

Longitudinal structural MRI changes in the brains of maturing male and female zebra finches:

“We set up a longitudinal study where we repeatedly collected structural MRI data of the entire zebra finch brain (Figure 1A). […] In addition, several clusters identified by the (voxel-based) interaction between age and sex over time (Figure 1B) were only found to be significantly changing during ontogeny in males (indicated by the white dotted boxes in Figure 1E).”

Concerning ‘even (or perhaps especially) if these changes in the SCS are not correlated with song learning outcomes’: this was a very helpful suggestion and provides an extra control for our outcome. We investigated the correlation between % similarity and FA in the main song control system regions: Area X, HVC, LMAN and RA. The outcome of this analysis has been added as an extra section in the Results: “Song learning accuracy does not trace back to the song control system”.

6) Relating MRI to song learning – age: Please clarify how age is controlled for or used in the analysis. Do the authors just go from the assumption (based on the data, maybe) that copying accuracy increases with age, and that the two variables are therefore inextricably confounded? Or is there a way to separate maturation (age) effects from changes related to copying accuracy? It will also make it easier for the reader to understand statements like "However, individual improvements in song learning resulted in a lower local volume of the CM (left: p=0.0126; right: p=0.0075; Figure 3D)", which now may be difficult for some readers to assess, because age is not visible in the figure.

We do find a main effect of age: on average, the pupils’ song similarity scores are significantly higher at 200 dph than at 65 dph. For some birds, however, song similarity to the tutor song does not increase over the different ages (visible in Figure 2A of the paper).

The goal of the current study was to identify brain-behaviour relationships. More specifically, we set out to investigate whether skilled performance (song learning accuracy, deduced from song similarity) could be traced back to the structural properties of the brain. The graphs show that not all birds follow the same trajectory: some birds learn faster or better than others. Therefore, we do not consider ‘age’ a suitable reference or correcting factor, and we chose not to take age into account in the statistical analyses of the MRI data. We consider the repeated-measures correlation (rmcorr) more powerful for this purpose.

“However, individual improvements in song learning resulted in a lower local volume of the CM (left: p=0.0126 rmcorr=-0.391; right: p=0.0075 rmcorr=-0.416; Figure 4D).”

This statement is based on the outcome of the repeated-measures correlation (rmcorr) test. Age is not controlled for in the rmcorr test, but bird identity is taken into account (repeated measures). The outcome of the test indicates that, on average, when birds sing more accurate copies of the tutor song (within-bird comparisons!), they appear to have a smaller CM. In the Discussion, we speculate that continued improvements in song similarity, as a form of vocal motor practice, might evolve towards an optimized, ‘automatic’ performance in which redundant circuitry is pruned to facilitate optimal performance, explaining the volume decrease in some of the involved structures. A similar explanation is used by Ocklenburg et al. (2018), who found asymmetry in tissue microstructure of the planum temporale (related to auditory speech processing) in human subjects, measured by diffusion MRI, and interpreted their readouts as reflecting more efficient information processing.
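The core of a repeated-measures correlation can be sketched compactly: each bird's own mean is removed from both variables, the remaining within-bird deviations are correlated, and significance is tested with N − k − 1 degrees of freedom for N observations from k birds. The Python sketch below assumes complete numeric data in long format; the function name `rm_corr` and the toy values are hypothetical, and for real analyses a maintained implementation (for example the R package rmcorr) is preferable.

```python
import numpy as np
from scipy import stats

def rm_corr(subject, x, y):
    """Repeated-measures correlation: common within-subject association."""
    subject = np.asarray(subject)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ids = np.unique(subject)
    xc, yc = x.copy(), y.copy()
    for s in ids:
        m = subject == s
        xc[m] -= x[m].mean()  # remove each bird's own mean
        yc[m] -= y[m].mean()
    r = np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2))
    dof = len(x) - len(ids) - 1  # N observations, k subjects
    t = r * np.sqrt(dof / (1.0 - r ** 2))
    p = 2.0 * stats.t.sf(abs(t), dof)
    return r, dof, p

# Hypothetical toy data: three birds, three sessions each
bird = [0, 0, 0, 1, 1, 1, 2, 2, 2]
similarity = [1, 2, 3, 4, 5, 6, 7, 8, 9]
volume = [10, 11, 13, 5, 6, 7, 0, 2, 3]
r, dof, p = rm_corr(bird, similarity, volume)
print(round(r, 3), dof)  # 0.97 5
```

In this toy example the pooled correlation across all nine points is negative, whereas the within-bird association is strongly positive, which is the distinction rmcorr is designed to capture.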

A detailed explanation is added to the Discussion section.

“It is generally known that in normal rearing conditions song learning accuracy improves with age. […] Based on these observations, we conclude that even though we cannot fully remove age-effects, we strongly believe that the current findings are mainly driven by correlations between performance levels and the structural characteristics of the brain rather than purely brain development effects.”

7) Relating MRI to song learning – representing changes in copying accuracy: This study includes multiple measures, using Fractional Anisotropy in Figure 2 and local volume in Figure 3. In both figures we see correlations between song similarity and the MRI measure, but we do not see the time course of song learning. Therefore, statistical claims such as a correlation between improvements in song learning and a lower local volume of the CM cannot be judged visually from the data as currently presented. To address this, the authors should present figures similar to Figures 2 and 3 but, instead of showing similarity vs. MRI, present similarity gains vs. MRI. For example, for each bird, present similarity (day 90) – similarity (day 65) vs. MRI on day 90. This will allow the reader to judge visually (and not only statistically) whether any of the MRI measures correlates with learning. In addition, it would be nice to have a figure illustrating this statement (e.g., showing similarity gains vs. right NCM activation): "Surprisingly, a small cluster in the right NCM displayed, in addition, a significant repeated-measures correlation."

The time course of song learning is presented in Figure 2B (song similarity scores relative to age).

The hypothesis of this study focused on finding brain-behaviour relationships. Therefore, we opted for correlation analyses in which we were interested in between-bird variance (e.g., do birds that sing a better copy of the tutor song have a bigger HVC?) and within-bird changes (e.g., if pupils progressively (between 65 and 200 dph) sing a better copy of the tutor song, does this improvement correlate, on average across the birds, with a change in microstructural tissue properties in a specific brain site?). These questions motivate our choice of how to visualise the data. Indeed, the current figures do not provide precise information about the age of the bird (Figure 2B demonstrates that % similarity increases on average with age), but the graphs of Figure 6 do provide valuable information about the overall song copying accuracy of each bird and consequently about the between-bird variation in learning accuracy (some birds produce, on average, a better copy of the tutor song than other birds, and this between-bird variance partly drives the correlation). These graphs show that birds with high % similarity demonstrate high % similarity at all ages and present FA data shifted to higher values, making FA a surrogate measure for their ability to copy the song. This important piece of information would be lost in figures based on similarity gains. Furthermore, figures based on similarity gains would not present our main message and could therefore mislead the reader. Nevertheless, we made such figures (Author response image 5) to illustrate this.

Author response image 5
Absence of significant Spearman’s ρ correlation between similarity gain and FA of NCM, tFA and VP.

Similarity gain was calculated as the difference in % song similarity between the 65 dph baseline and subsequent time points (90, 120 and 200 dph). The gain in song similarity does not correlate with FA values in any of the regions that correlate with % similarity. This is mostly because the gain in song similarity does not take the differences between subjects into account. At the baseline of our song analysis (65 dph), good singers already show a high % similarity to the tutor song, so their further gain in % similarity is limited. The gain in song similarity is an intuitive measure of song learning, but it needs to be measured early enough to capture real progression. However, the subsong that juvenile zebra finches produce is highly variable and difficult to analyse, which is why we started measuring song output at 65 dph.
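The ceiling effect described above can be made concrete with a few lines of arithmetic. The Python sketch below computes the gain relative to the 65 dph baseline for two hypothetical birds; the function name and all values are invented for illustration and are not data from this study.

```python
def similarity_gain(similarity_by_age, baseline_dph=65):
    """Gain in % similarity relative to the baseline recording (dph -> %)."""
    base = similarity_by_age[baseline_dph]
    return {
        age: sim - base
        for age, sim in sorted(similarity_by_age.items())
        if age != baseline_dph
    }

# Hypothetical birds: a 'good' singer starts high, a 'poor' singer starts low
good = {65: 85.0, 90: 87.0, 120: 88.0, 200: 88.0}
poor = {65: 40.0, 90: 48.0, 120: 52.0, 200: 55.0}
print(similarity_gain(good))  # {90: 2.0, 120: 3.0, 200: 3.0}
print(similarity_gain(poor))  # {90: 8.0, 120: 12.0, 200: 15.0}
```

The good singer shows the smaller gain despite the higher similarity at every age, so a gain-based plot would rank the two birds in the opposite order from their actual copying accuracy.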

8) Relating MRI to song learning – dichotomizing copying accuracy: Regarding the correlation between MRI properties at days 20/40dph and learning accuracy, the authors should justify why they dichotomised song accuracy (good vs. bad learners) rather than simply taking the% of song similarity. Why don't the authors test whether MRI properties at day 20 (or 30) allow predicting vocal learning accuracy at day 200 (expressed as% of song similarity)?

We display the correlation between FA at 20 dph and % song similarity at 200 dph in Figure 6—figure supplement 2, and this confirms what we postulated: a positive correlation between FA at 20 dph and % song similarity at 200 dph was found only in the left NCM.

We added this figure as Figure 6—figure supplement 2 and refer to it in the Results section:

“Furthermore, FA in left NCM at 20 dph was positively correlated (p=0.01, ρ=0.662) with the % song similarity at 200 dph (Figure 6—figure supplement 2).”

This result was specific to the left NCM as none of the other regions identified in the voxel-based correlation analysis showed a similar predictive relationship (Figure 6C, Figure 6—figure supplement 1 and 2).

We have added some clarifications to the Materials and methods section.

“An additional correlation analysis was run between the FA values at 20 dph and the song similarity at 200 dph, to determine whether the microstructural tissue properties early in life already relate to song learning proficiency later in life.”

9) Discussion – mechanisms: The authors are careful to note the lack of explanatory power in these correlative measurements, which is good. But they need to say more about how they think these developmental changes might relate to learning. What are we supposed to make of the finding that FA changes over development when there is no link to what that value represents in the songbird's brain? Going back to an earlier point, it would help to see that this method can detect FA changes related to increased myelination of the song system, which is dramatic and presumably should generate a large signal. That said, the authors need to discuss in depth how such changes (decreased volume, increased FA) could be related to better learning.

In reply to this, I would like to quote from ‘Zatorre RJ, Fields RD, Johansen‐Berg H. Plasticity in gray and white: Neuroimaging changes in brain structure during learning. Nat Neurosci. 2012; 15(4): 528–536’, which is in the reference list of our paper: “current neuroimaging techniques cannot directly inform us about the underlying cellular events mediating the observed effects. Moreover, phenomena visible via MRI are likely never the result of a single process happening independently, but probably involve multiple coordinated structural changes involving various cell types. Conversely, neuroimaging techniques offer certain advantages as they can be repeatedly performed in the same individual and provide whole-brain measures of brain structure and function”

The following Figure from this paper (Author response image 6) illustrates this.

Author response image 6
Candidate cellular and molecular mechanisms of FA changes.

(a) Cellular events underlying changes detected by MRI during learning include axon sprouting, dendritic branching and synaptogenesis, neurogenesis, changes in glial number and morphology, and angiogenesis in gray matter regions. (b) Changes in white matter include axon branching, packing density, axon diameter, fiber crossing, and the number of axons, myelination of unmyelinated axons, myelin thickness and morphology, changes in astrocyte morphology or number, and angiogenesis (from Zatorre RJ et al., Nat Neurosci. 2012).

In the current paper, the outcome (4 regions) can be divided into 1) fiber structures (tFA) and 2) gray matter regions, which may still contain a mixture of gray matter and some fiber structures (VP).

I can only repeat what Zatorre et al. wisely wrote in their review on this topic, “Neuroimaging changes in brain structure during learning”: multiple factors can be involved. This is what the readers of our paper should grasp as a message, but also that this unbiased, data-driven imaging approach uncovered 4 regions involved in song copying accuracy and paves the way for further investigation of these regions, for example by testing the impact of specific neuromodulations on relevant brain networks and on song learning. We do hope the reviewers appreciate this new, different, complementary and highly valuable study approach that directs songbird neuroscientists to new and different study targets.

In relation to the comment ‘Going back to an earlier point, it would help to see that this method can detect FA changes related to increased myelination of the song system, which is dramatic and presumably should generate a large signal’: Figure 1, which shows statistical FA maps during development and upon ageing, clearly illustrates changes in white matter structures (definitely involving myelination, but also other features) throughout the brain, including those surrounding the SCN, their mutual connections (contributing to changes in various laminae) and further down (tOM). These maps also help to show the bigger picture of (all) the changes.

We have added some clarifications to the Results section.

“The DTI datasets allow us to establish spatiotemporal maps that indicate when and where in the brain neuroplastic changes in tissue microstructure occur (Hamaide et al., 2016). In the current study, we focus on Fractional Anisotropy (FA), a metric derived from DTI data. FA quantifies the directional dependence of water diffusion and hence indirectly reflects specific microstructural tissue characteristics (Beaulieu, 2002). Note that alterations in FA can be caused by a wide variety of microstructural tissue re-organisations, including altered axonal integrity, myelination, axon diameter and density, changes in cellular morphology, etc. (Beaulieu, 2002; Zatorre et al., 2012; Dyrby et al., 2018).”
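For reference, FA is computed from the three eigenvalues λ1, λ2, λ3 of the diffusion tensor as FA = √(3/2) · ‖λ − λ̄‖ / ‖λ‖, ranging from 0 (isotropic diffusion) to 1 (diffusion along a single axis). A minimal NumPy sketch of this standard formula, assuming the tensor eigenvalues have already been estimated; the function name is illustrative and not taken from any particular DTI toolbox.

```python
import numpy as np

def fractional_anisotropy(eigenvalues):
    """FA from diffusion-tensor eigenvalues; the last axis holds (l1, l2, l3)."""
    lam = np.asarray(eigenvalues, dtype=float)
    mean = lam.mean(axis=-1, keepdims=True)
    num = np.sqrt(np.sum((lam - mean) ** 2, axis=-1))
    den = np.sqrt(np.sum(lam ** 2, axis=-1))
    with np.errstate(divide="ignore", invalid="ignore"):
        fa = np.sqrt(1.5) * num / den
    return np.where(den > 0, fa, 0.0)  # define FA = 0 where all eigenvalues are 0

print(round(float(fractional_anisotropy([1.0, 1.0, 1.0])), 3))  # 0.0 (isotropic)
print(round(float(fractional_anisotropy([1.0, 0.0, 0.0])), 3))  # 1.0 (single axis)
```

Because FA depends only on the shape of the eigenvalue spectrum, very different tissue changes (e.g. more myelin, fewer crossing fibers, altered cell morphology) can produce the same FA shift, which is why the text stresses its lack of biological specificity.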

We have added some clarifications to the Discussion section.

“As a result, alterations in FA are notoriously biologically unspecific, as they can be caused by a wide variety of microstructural tissue re-organisations, including altered axonal integrity, myelination, axon diameter and density, changes in cellular morphology, etc. (Beaulieu, 2002; Zatorre et al., 2012; Dyrby et al., 2018). Moreover, the MRI readout most probably always reflects different biological processes happening in concert, in a coordinated way involving various cell types. To unambiguously pinpoint the biological mechanisms responsible for the observed structural differences between good and bad learners, additional studies at the cellular and molecular level are required.”

10) Discussion – Novelty: Prior studies have shown correlations between NCM functional properties and song learning outcome, between CM functional properties and vocal error detection, and between VP and song copying. A strong feature of the current study is that it provides independent validation that these regions correlate with song copying, but given the earlier work, the current findings are not wholly novel, even if they are useful contributions. On the other hand, the tFA result is entirely novel, but what this fiber tract is needs to be more fully described. The authors are quick to link it to the projections from the basorostral nucleus, but we are uncertain whether such a precise assignment can be made with these methods. Is this distinct from other fiber tracts in this general region, including parts of the occipitomesencephalic tract? Showing some conventional histology of the tFA in relation to the MRI data would be helpful here. And the VP finding is quite timely, given the recently emerging evidence of the role of this structure in song learning. Further, VP is the only region that showed significant correlations in both FA and volume with learning. Perhaps the manuscript should highlight the tFA and VP findings more strongly, while casting the NCM and CM data as more confirmatory in nature, to emphasize novelty.

The reviewers clearly mention “Prior studies have shown correlations between NCM functional properties and song learning outcome, between CM functional properties and vocal error detection, and between VP and song copying”, while we clearly demonstrate in the current paper the link with the structural properties of these regions! And also that this structural property has a predictive value for copying accuracy! Both things ARE NEW! Follow-up studies can now be planned in a very targeted manner to unravel what this structural property is, but also which connections ‘starting from’ or ‘projecting to’ these regions, ‘under which early life (tutoring) circumstances’, contribute to the observed changes in this region. The current study includes a massive amount of work, providing a solid foundation for many follow-up studies with different methodological approaches.

The location of tFA in our MRI outcome matches the tFA assignment in the drawing atlas of Nixdorf-Bergweiler and Bischof, 2007. As an illustration (not a validation), we made an extra effort to calculate DTI fiber tracking (tractography) by copying the location of the crosshair (i.e., the voxel of highest significance) from the statistical maps as a ‘seed’ into the super-resolution track density images of the adult zebra finch used in Hamaide et al., 2017. The tracts shown in Author response image 7 shed more light on the extent of the fibers passing through the crosshair.

Author response image 7
Result of voxel based multiple regression in tFA (A) used as a seed for exploratory tractography (B).

The statistical map (A) presents the outcome of the voxel-based multiple regression testing for a correlation between song similarity and FA (n=14). The crosshairs point to the tFA in the left hemisphere (A). Tractography reveals the tracts running through the tFA cluster found with the voxel-based multiple regression (B). Seed-based fiber tractography itself was performed on an ex vivo super-resolution DTI reconstruction of a male zebra finch brain created for Hamaide et al., 2017. The seeds were positioned at the level of tFA to filter out the relevant tracts from the whole-brain probabilistic track density imaging using MRtrix3. More details on the methods used to acquire and process the tractography data can be found in Hamaide et al., 2017.

We have added some clarifications to the Results section.

“Using various atlases of the zebra finch brain (http://www.zebrafinchatlas.org/; Nixdorf-Bergweiler and Bischof, 2007; Poirier et al., 2008; Karten et al., 2013) and high-resolution tract tracings within the zebra finch brain (Hamaide et al., 2017), we identified that these clusters co-localize with two secondary auditory areas, i.e., the caudomedial nidopallium (NCM) and caudal mesopallium (CM), with a white matter tract that connects the basorostral nucleus to the arcopallium (the fronto-arcopallial tract, tFA; Wild and Farabaugh, 1996), and with an area at the base of the telencephalon termed the ventral pallidum (VP).”

Furthermore, we found an additional cluster near the midsagittal plane, close to the striatum and mesopallium, extending laterally and caudo-ventrally adjacent to the septomesencephalic tract (TSM; sub-peak next to the TSM in the left hemisphere: pFWE=0.002, T=6.38; Figure 3C). Based on this spatial pattern and in accordance with the Karten-Mitra zebra finch brain atlas (http://www.zebrafinchatlas.org/; Karten et al., 2013), we identified this area as the VP.

In the Discussion we have already included, to our knowledge, all the literature there is on tFA and related tracts in songbirds. From this it appears that the discovery of its involvement in song copying accuracy was as new to us as it is to the songbird community. Besides Martin Wild, whose contact details I can no longer retrieve as he is retired, there is no one we could ask for help. For now, I can only conclude that our findings clearly illustrate that, as pupils produce more accurate copies of the tutor song, training and controlling the upper vocal organs (i.e., above the syrinx, including tongue, beak, facial muscles…) is also essential for song copying accuracy. Bringing this to the forefront, as the reviewer suggests, makes us very uncomfortable, as the paper aims at pointing out that this tract is important and therefore interesting to investigate in future studies, possibly in comparison with homologous or similar tracts and their function in humans. Overall, the present findings (i) add a new dimension to previously published data, as we provide clear evidence of relationships between performance levels and the structural (rather than, as shown before, the functional) properties of four specific areas, (ii) identify a novel, not-yet-explored brain area (tFA) in the context of song learning that deserves in-depth investigation in future studies, and (iii) uncover that future performance levels can be predicted from the structural properties of a secondary auditory region at the earliest stages of song learning. We explicitly emphasized that we were able ‘to go back in time’ and relate song learning accuracy levels obtained at 65 dph (and older) to the structural properties of specific brain regions of the same birds at 20 dph, before actual vocal practice starts.

[Editors' note: the decision after resubmission follows.]

As for point number 2, where we asked you to replace the original analysis with the new SWe based analysis, we have now asked an independent expert for their opinion. Their response is as follows:

Reviewer #4:

The authors rely on correlation analysis to identify unique brain regions (voxels) in the songbird brains whose FA, an integrated/vectorized readout of the diffusion-based MRI signal, varies with specific song learning behavior during development. Previous reviewers raised concerns about the statistical validity of specifying these unique brain regions, in particular the "circular issue" of defining the ROIs based on the selected voxels (beyond a significance threshold).

In the revised manuscript, similar brain regions were highlighted using the new analysis method, which is encouraging. However, the authors have to assign smaller voxel sizes to preserve individual voxels above the statistical threshold, which may break the compensatory/correction rules for multiple comparison problems. This issue has been well reported in the literature and is faced by neuroimaging researchers routinely. In most cases, it is due to a sample size that is rather small for representing the population, given a certain degree of variability. In the animal MRI field, it is a known problem.

The authors do observe some reliably detected spatial patterns in the songbird brains (n=14). One of the challenges for voxel-wise analysis is to precisely register the brains of individual subjects to the same template. The mismatch of voxels across subjects leads to pseudo-negative statistical estimates, but if a smoothing step (averaging voxels) is applied, it may reduce the potential FA value differences across different conditions. It is a dilemma.

We smoothed our data in-plane with a kernel of twice the voxel size, as we also did in our other papers. Smoothing is often recommended to compensate for imperfect registration, but also to improve statistics: it renders the data more Gaussian distributed, improving the validity of the commonly used Gaussian random field theory thresholding approach used in SPM (Smith and Kindlmann, 2013).
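In SPM-style pipelines the kernel width is usually specified as a full width at half maximum (FWHM), related to the Gaussian standard deviation by σ = FWHM / (2√(2 ln 2)) ≈ FWHM / 2.355. The sketch below assumes that "twice the voxel size" means an FWHM of 2 voxels applied only within the acquisition plane, with the slice axis assumed to be the last array axis. It uses SciPy's `gaussian_filter` rather than SPM itself, so boundary handling and kernel truncation differ from the actual pipeline.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

FWHM_TO_SIGMA = 1.0 / (2.0 * np.sqrt(2.0 * np.log(2.0)))  # about 0.4247

def smooth_in_plane(volume, fwhm_voxels=2.0, slice_axis=-1):
    """Gaussian smoothing within each slice only (sigma = 0 along slice_axis)."""
    sigma = [fwhm_voxels * FWHM_TO_SIGMA] * volume.ndim
    sigma[slice_axis] = 0.0  # assumption: no smoothing across slices
    return gaussian_filter(volume.astype(float), sigma=sigma)

# A single bright voxel spreads within its slice but not into neighbouring slices
vol = np.zeros((15, 15, 5))
vol[7, 7, 2] = 1.0
out = smooth_in_plane(vol)
print(out[:, :, 2].sum().round(6), out[:, :, 1].max())  # 1.0 0.0
```

With an FWHM of 2 voxels, σ is roughly 0.85 voxels, so most of the signal stays within one or two voxels of its origin. This illustrates the trade-off raised by the reviewer: enough blurring to absorb small registration mismatches, but not so much that small clusters are washed out.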

Here, I suggest two tentative ways to deal with the problem:

An intriguing observation is the symmetric observation of the brain regions (left and right brain nuclei are identified and voxel counts are provided in tables). One possible way to deal with the statistical issue is to create a mirror image for each subject. Then, the authors can just focus on the one-side hemisphere to redo their analysis (hopefully with sufficient power).

We don’t think this is a good solution, as the regions that were found clearly show asymmetry (e.g., left NCM allows predictions from FA, while right NCM does not; see Figure 6). Overall, hemispheric specialization has been demonstrated in zebra finches in our previous fMRI studies (Poirier et al., 2009) and by others (Phan and Vicario, 2010).

The second way is to define the ROI based on the songbird anatomy, but not by the voxel-wise analysis results. If the atlas-ROI can show specific correlation features, it can serve as an alternative way to support the voxel-wise results.

We agree that a good ROI-based analysis is necessary to support the findings of a voxel-based analysis, since the voxel-wise multiple regression in SPM does not allow including a random effect for bird identity. Consequently, by entering repeated-measures data we violate the assumption of independence of measures. To correct for this potential confound, we performed additional tests on ROI-based data. One can debate which ROIs should be used for this analysis: atlas-based ROIs or cluster-based ROIs. In our prior comments to the reviewers, we added a section to the paper in which we performed an ROI-based analysis on several song control nuclei to confirm that song learning accuracy does not trace back to the song control system. We will expand on this section by adding a supplementary table with the full results of an ROI-based analysis. In relation to this ROI-based analysis, it is important to recognize the following facts: (1) There is no ‘brain atlas’ of the zebra finch in which the entire brain is covered with detailed ROIs, in analogy with rodent brain atlases (e.g., different cortices, different subregions of the striatum, of the hippocampus, etc.). This is probably because the songbird brain has mainly been studied as the neural substrate for song learning and production. There are atlases which provide a list of annotations (including our MRI-based atlas; Poirier et al., 2008), but, as mentioned, they do not cover the entire brain and do not provide details or subdivisions. (2) It is exactly the exploratory, unbiased voxel-based analysis that allowed us to ‘discover’ additional regions beyond the well-known song-related ROIs, and this is its main purpose and benefit. One of these discovered regions, the ventral pallidum (VP), was brought up in very recent literature (Hisey et al., 2018). The other one, the fronto-arcopallial tract (tFA), is hardly described in the literature, and not in relation to copying capacity, and thus would never have been selected as an ROI for further investigation had we not obtained the outcome of the exploratory voxel-based approach!

Taking this into account, we performed an ROI-based analysis to confirm the results of the voxel-based analysis. For this, we transferred relevant ROIs from the zebra finch atlas (Poirier et al., 2008), including regions from the song control system (Area X, HVC, LMAN, RA) and regions from the auditory system (Field L, NCM, CMM). In addition, we delineated tFA and VP, which came up significantly in the voxel-based multiple regression analysis. The surroundings of Area X and RA were also delineated, as they showed a significant change during ontogeny in male zebra finches; this was to check whether an ROI-based analysis would also pick up these regions as significant. These delineations were made based on the contrast of the average FA map created for Figure 1E of the main article.
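Obtaining ROI-based FA values comes down to averaging each subject's FA map over every delineated region. A minimal NumPy sketch, assuming the FA map and an integer label image are already in the same template space; the label values and ROI names below are hypothetical and do not correspond to the actual atlas files.

```python
import numpy as np

def roi_mean_fa(fa_map, label_map, rois):
    """Mean FA inside each labelled ROI; rois maps ROI name -> integer label."""
    if fa_map.shape != label_map.shape:
        raise ValueError("FA map and label image must share the same grid")
    means = {}
    for name, label in rois.items():
        mask = label_map == label
        if not mask.any():
            raise ValueError(f"ROI {name!r} (label {label}) is empty")
        means[name] = float(fa_map[mask].mean())
    return means

# Hypothetical 2x2x2 example with two labelled regions and background (0)
fa = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]).reshape(2, 2, 2)
labels = np.array([1, 1, 2, 2, 0, 0, 2, 2]).reshape(2, 2, 2)
print(roi_mean_fa(fa, labels, {"NCM_left": 1, "tFA_left": 2}))
```

Averaging over a whole atlas ROI weights every voxel equally, so a small responsive subregion can be diluted by the rest of the structure. This is the effect the authors invoke below for NCM and VP.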

Since we are dealing with repeated measures, which violate the assumption of independence, there are two different ways to approach the correlation analysis. Firstly, Spearman’s ρ was calculated on song similarity and FA averaged per subject. This is a common solution to the issue of non-independence and yields the overall association, or between-subject correlation. Secondly, we analysed the repeated-measures correlation, which takes the repeated measures of each bird into account and provides inference on the common within-subject association between brain structure and song similarity across the group of birds. Both types of correlation analysis were performed on the ROI-based FA data. The results are summarized in Author response table 4.
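The between-subject approach reduces each bird to a single data point before correlating, so repeated sessions do not inflate the sample size. A Python sketch of that step, assuming long-format arrays with one entry per bird and session; the function name and values are hypothetical.

```python
import numpy as np
from scipy.stats import spearmanr

def between_subject_spearman(subject, similarity, fa):
    """Spearman's rho on per-subject means (one data point per bird)."""
    subject = np.asarray(subject)
    similarity = np.asarray(similarity, dtype=float)
    fa = np.asarray(fa, dtype=float)
    ids = np.unique(subject)
    mean_sim = np.array([similarity[subject == s].mean() for s in ids])
    mean_fa = np.array([fa[subject == s].mean() for s in ids])
    rho, p = spearmanr(mean_sim, mean_fa)
    return float(rho), float(p), len(ids)

# Hypothetical data: five birds, two sessions each
bird = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
similarity = [48, 52, 58, 62, 68, 72, 78, 82, 88, 92]
fa = [0.29, 0.31, 0.31, 0.33, 0.30, 0.32, 0.34, 0.36, 0.35, 0.37]
rho, p, n_birds = between_subject_spearman(bird, similarity, fa)
print(round(rho, 2), n_birds)  # 0.9 5
```

Here n equals the number of birds rather than the number of sessions. The repeated-measures correlation, by contrast, uses every session but removes each bird's own mean, so the two analyses answer complementary questions.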

Author response table 4
Summary of the ROI-based between- and within-subject correlation analysis (% similarity and FA).
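System | ROI | Hemisphere | Between-subject Spearman’s ρ | Between-subject p value | Within-subject rmcorr r | Within-subject rmcorr p
Song control system | Area X | Left | 0.0286 | 0.9228 | 0.267 | 0.0911
Song control system | Area X | Right | -0.1824 | 0.5325 | 0.162 | 0.313
Song control system | HVC | Left | -0.4989 | 0.0694 | -0.271 | 0.0871
Song control system | HVC | Right | -0.0725 | 0.8054 | 0.0806 | 0.616
Song control system | LMAN | Left | 0.0989 | 0.7366 | -0.135 | 0.4
Song control system | LMAN | Right | -0.0242 | 0.9346 | -0.0114 | 0.944
Song control system | RA | Left | -0.3231 | 0.2599 | 0.029 | 0.857
Song control system | RA | Right | 0.0462 | 0.8755 | 0.138 | 0.391
Auditory system | Field L | Left | -0.0593 | 0.8403 | 0.108 | 0.501
Auditory system | Field L | Right | -0.2923 | 0.3105 | 0.0996 | 0.536
Auditory system | NCM | Left | 0.6967 | 0.0056 | 0.109 | 0.499
Auditory system | NCM | Right | 0.5473 | 0.0428 | 0.185 | 0.247
Auditory system | CM | Left | 0.4813 | 0.0814 | -0.123 | 0.443
Auditory system | CM | Right | 0.622 | 0.0176 | 0.283 | 0.0732
Other | VP | Left | 0.5956 | 0.0246 | 0.136 | 0.397
Other | VP | Right | 0.6088 | 0.0209 | 0.236 | 0.138
Other | tFA | Left | 0.6396 | 0.0138 | 0.259 | 0.102
Other | tFA | Right | 0.3451 | 0.2269 | 0.343 | 0.0283
Other | Area X surr. | Left | -0.1341 | 0.6477 | 0.193 | 0.225
Other | Area X surr. | Right | -0.0637 | 0.8286 | 0.293 | 0.0633
Other | RA surr. | Left | -0.156 | 0.5942 | -0.191 | 0.232
Other | RA surr. | Right | 0.0198 | 0.9465 | 0.242 | 0.128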

The ROI-based correlation analysis of averaged FA identifies the same significant regions as the voxel-based analysis, with significant between-subject correlations in bilateral NCM, right CM, bilateral VP and left tFA. Right tFA does not show a significant between-subject correlation, but it is significantly correlated within subjects. Calculating repeated measures correlations over the complete regions, however, does not give the same results: there is no longer a significant rmcorr in right NCM or in VP. This means that our voxel-based analysis could detect a small subpart of both NCM and VP that shows a within-subject correlation with song similarity, whereas averaging over the entire region no longer picks up a repeated measures correlation. Immediate early gene (IEG) expression patterns likewise do not 'fill' the entire NCM (Terpstra et al., 2004), suggesting that NCM is indeed a large region comprising different subregions whose particular functions have not yet been uncovered. Our exploratory voxel-based analysis at least revealed which part of NCM is associated with song similarity. This is precisely the point: our exploratory approach allowed us to go beyond the shortcomings of the atlas and discover which subregions are 'recruited' for producing a perfect copy of the tutor song.
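
The dilution argument can be illustrated with a toy simulation in Python. All numbers are synthetic and hypothetical (14 birds, an ROI of 1,000 voxels in which only 10% track song similarity, values in arbitrary standardised units); they are not the study's data. For simplicity the sketch uses a single measurement per bird rather than repeated measures, and only shows why a correlation confined to a subregion can fade once FA is averaged over a whole atlas-defined region.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n_birds = 14              # hypothetical sample size
n_sub, n_rest = 100, 900  # hypothetical ROI: 10% of voxels carry the effect

# Synthetic, standardised song similarity scores (one value per bird)
similarity = rng.normal(0.0, 1.0, n_birds)
# Bird-specific FA differences in the rest of the ROI, unrelated to song
unrelated = rng.normal(0.0, 1.0, n_birds)

# Subregion voxels follow song similarity; the other voxels follow the unrelated term
sub_fa = similarity[:, None] + rng.normal(0.0, 0.2, (n_birds, n_sub))
rest_fa = unrelated[:, None] + rng.normal(0.0, 0.2, (n_birds, n_rest))

subregion_mean = sub_fa.mean(axis=1)
whole_roi_mean = np.hstack([sub_fa, rest_fa]).mean(axis=1)

r_subregion = np.corrcoef(similarity, subregion_mean)[0, 1]
r_whole_roi = np.corrcoef(similarity, whole_roi_mean)[0, 1]
print(f"subregion r = {r_subregion:.2f}, whole-ROI r = {r_whole_roi:.2f}")
```

Because the whole-ROI mean in this toy setup is 10% signal and 90% unrelated variation, its expected correlation with similarity is only about 0.1 / sqrt(0.1² + 0.9²) ≈ 0.11, whereas the subregion mean tracks similarity almost perfectly.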

FA stands for fractional anisotropy, one of the diffusion tensor imaging (DTI) metrics. This table summarizes the ROI-based between-subject correlation, calculated using Spearman's ρ on song similarity and FA data averaged per subject (14 data points, 1 for each of the 14 birds), and the within-subject correlation, calculated using repeated measures correlations based on 54 data points (12 birds with 4 time points and 2 birds with 3 time points). Significant correlations are highlighted in bold. Abbreviations: LMAN: lateral magnocellular nucleus of the anterior nidopallium; RA: robust nucleus of the arcopallium; NCM: caudomedial nidopallium; CM: caudomedial mesopallium; tFA: fronto-arcopallial tract; VP: ventral pallidum; Area X surr.: Area X surroundings; RA surr.: RA surroundings.
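
As a rough consistency check on the sample sizes, the p values in the table can be recomputed from the coefficients with the usual t transformation of a correlation coefficient. This sketch assumes that the between-subject p values use the t approximation with n - 2 degrees of freedom (as `scipy.stats.spearmanr` does) and that the rmcorr p values use N - k - 1 degrees of freedom. The counts given in the caption (12 birds with 4 time points and 2 birds with 3 time points) amount to 14 birds and 54 observations; the helper `p_from_r` is hypothetical.

```python
from scipy import stats

def p_from_r(r, df):
    """Two-sided p value for a correlation via t = r * sqrt(df / (1 - r^2))."""
    t = r * (df / (1.0 - r ** 2)) ** 0.5
    return 2.0 * stats.t.sf(abs(t), df)

n_birds = 12 + 2                 # 12 birds with 4 scans + 2 birds with 3 scans
n_obs = 12 * 4 + 2 * 3           # 54 observations in total
df_between = n_birds - 2         # Spearman on per-bird means
df_within = n_obs - n_birds - 1  # rmcorr: N - k - 1

p_ncm_left = p_from_r(0.6967, df_between)  # left NCM, between-subject
p_tfa_right = p_from_r(0.343, df_within)   # right tFA, within-subject
```

Under these assumptions the recomputed values agree with the tabulated p values for left NCM (0.0056) and right tFA (0.0283) up to rounding of the coefficients; with only 12 birds, the left NCM p value would come out noticeably larger.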

https://doi.org/10.7554/eLife.49941.sa2

Article and author information

Author details

  1. Julie Hamaide

    Bio-Imaging Lab, Department of Biomedical Sciences, University of Antwerp, Wilrijk, Belgium
    Contribution
    Conceptualization, Formal analysis, Visualization, Methodology, Project administration
    Competing interests
    No competing interests declared
  2. Kristina Lukacova

    Centre of Biosciences, Institute of Animal Biochemistry and Genetics, Slovak Academy of Sciences, Bratislava, Slovakia
    Contribution
    Formal analysis
    Competing interests
    No competing interests declared
  3. Jasmien Orije

    Bio-Imaging Lab, Department of Biomedical Sciences, University of Antwerp, Wilrijk, Belgium
    Contribution
    Formal analysis
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-6699-6221
  4. Georgios A Keliris

    Bio-Imaging Lab, Department of Biomedical Sciences, University of Antwerp, Wilrijk, Belgium
    Contribution
    Supervision
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-6732-1261
  5. Marleen Verhoye

    Bio-Imaging Lab, Department of Biomedical Sciences, University of Antwerp, Wilrijk, Belgium
    Contribution
    Formal analysis, Supervision
    Competing interests
    No competing interests declared
  6. Annemie Van der Linden

    Bio-Imaging Lab, Department of Biomedical Sciences, University of Antwerp, Wilrijk, Belgium
    Contribution
    Conceptualization, Supervision, Funding acquisition
    For correspondence
    annemie.vanderlinden@uantwerpen.be
    Competing interests
    No competing interests declared
    ORCID: 0000-0003-2941-6520

Funding

Fonds Wetenschappelijk Onderzoek (G030213N)

  • Annemie Van der Linden

Fonds Wetenschappelijk Onderzoek (G044311N)

  • Annemie Van der Linden

Fonds Wetenschappelijk Onderzoek (G037813N)

  • Annemie Van der Linden

Fonds Wetenschappelijk Onderzoek (Hercules Foundation)

  • Annemie Van der Linden

Fonds Wetenschappelijk Onderzoek (AUHA0012)

  • Annemie Van der Linden

Belgian Federal Science Policy Office (Interuniversity Attraction Poles P7/17)

  • Annemie Van der Linden

Fonds Wetenschappelijk Onderzoek (1115215N)

  • Jasmien Orije

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Dr S C Woolley and Dr D Vallentin for valuable discussions of the data and reading of earlier versions of the manuscript. The computational resources and services used in this work to build the population-based template were provided by the HPC core facility CalcUA of the Universiteit Antwerpen, the VSC (Flemish Supercomputer Center), funded by the Hercules Foundation and the Flemish Government – department EWI.

Ethics

Animal experimentation: The Committee on Animal Care and Use at the University of Antwerp (Belgium) approved all experimental procedures (permit number 2012-43 and 2016-05) and all efforts were made to minimize animal suffering.

Senior Editor

  1. Barbara G Shinn-Cunningham, Carnegie Mellon University, United States

Reviewing Editor

  1. Tom Smulders, Newcastle University, United Kingdom

Publication history

  1. Received: July 4, 2019
  2. Accepted: February 27, 2020
  3. Version of Record published: March 20, 2020 (version 1)

Copyright

© 2020, Hamaide et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Cite this article

Hamaide J, Lukacova K, Orije J, Keliris GA, Verhoye M, Van der Linden A (2020) In vivo assessment of the neural substrate linked with vocal imitation accuracy. eLife 9:e49941. https://doi.org/10.7554/eLife.49941
