Zebra finches are sexually dimorphic vocal learners. Males learn to sing by imitating mature conspecifics, but females do not. Absence of song in females is associated with atrophy and apparent repression of several vocal learning brain regions during development. However, atrophy can be prevented and vocal learning retained in females when given early pharmacological estrogen treatment. To screen for drivers, we performed an unbiased transcriptomic analysis of song learning nuclei and surrounding regions from either sex, treated with vehicle or estrogen until 30 days old when divergence between the sexes becomes anatomically apparent. Analyses of transcriptomes by RNA sequencing identified song nuclei expressed gene modules associated with sex and estrogen manipulation. Female HVC and Area X gene modules were specialized by supplemental estrogen, exhibiting a subset of the transcriptomic specializations observed in males. Female RA and LMAN specialized modules were less dependent on estrogen. The estrogen-induced gene modules in females were enriched for anatomical development functions and strongly correlated to the expression of several Z sex chromosome genes. We present a hypothesis linking loss and de-repression of vocal learning in females, estrogen, and sex chromosomes gene expression.
This study is useful as it provides further analysis of previously published data to address which specific genes are part of the masculinizing actions of E2 on female zebra finches, and where these key genes are expressed in the brain. However the data supporting the conclusion of masculinizing the song system are incomplete as the current manuscript is a re-analysis of differential gene expression modulated by E2 treatment between male/female zebra finches without manipulation of gene expression. The conclusions (and title) regarding song learning are also incompletely supported with no gene manipulation or song analysis. Importantly, the use of WGCNA for a question of sex-chromosome expression in species without dosage compensation is considered inadequate. As the experimental design did not include groups to directly test for song learning, and there was also no analysis of song performance, these data were also considered inadequate in that regard.
Vocal learning is the ability to imitate sounds, and is a necessary and specialized component for spoken language and song. Vocal learning is found in seven non-human clades, four mammalian and three avian, each having independently evolved the trait1. Oscine songbirds have proved to be the most tractable for studying vocal learning in the lab, with much of the field focusing on the Australian zebra finch (Taeniopygia castanotis). Despite ∼300 million years of separation from their common ancestor2,3, there is remarkable evolutionary convergence between songbird and human vocal learning in terms of behavioral progression, developmental effects of deafening, anatomical connectivity of vocal-motor learning circuits, sites of accelerated evolution within the genome, and specific genes with specialized up- or down-regulated expression in song and speech circuits relative to the surrounding motor control circuits1,4–10. Unlike in humans, however, vocal learning is strongly sexually dimorphic in zebra finches and many other vocal learning species11. Male zebra finches learn to produce a species appropriate song by imitating mature male conspecifics during juvenile development, while females are limited to producing innate calls12.
The songbird vocal motor learning circuit contains four major interconnected telencephalic song control nuclei; HVC (proper name) in the dorsal nidopallium (DN); the lateral magnocellular nucleus of the anterior nidopallium (LMAN) in the anterior nidopallium (AN); the robust nucleus of the arcopallium (RA) in the lateral intermediate arcopallium (LAI); and Area X in the striatum (Str; Fig. 1a)6. During juvenile development in zebra finch females, HVC and RA atrophy, HVC fails to form synapses in RA, and Area X never appears11,13–20. Amazingly, female zebra finches treated with estrogen or a synthetic analog at an early age do not exhibit song system atrophy and instead form a functional neural circuit with all the anatomical components and connections seen in males21–24. This “masculinized” song system allows estrogen supplemented females to imitate vocalizations, though not with the same accuracy as males21,23,25–27. Interestingly, lesioning female HVC prevents estrogen dependent anatomical “masculinization” of its postsynaptic targets of RA and Area X28.
The genetic basis of this estrogen-sensitive and sexually dimorphic zebra finch vocal learning remains largely unknown beyond the downstream recruitment of the androgen receptor (AR)29. However, the examination of a rare gynandromorphic zebra finch with lateralized sex chromosome composition indicates that genetically male HVC and Area X are larger than their female analogs independent of gonadal hormone production, implicating sex chromosome gene expression within the song system30. Unlike the mammalian X and Y, birds have a Z and W sex determination system where females are hemizygous (ZW) and males homozygous (ZZ)31. The relevant transcriptional machinery appears to be set up by post-hatch day 30 (PHD30), after which estrogen fails to masculinize female song nuclei or behavior and the male song system enlarges while the female song system atrophies13,15,21,32. Taken together with the hypothesis that vocal learning in females was lost multiple independent times amongst songbirds33, the extant findings suggest genetic drivers of vocal learning loss associated with estrogen, sex chromosomes, and song nuclei gene expression specializations. To screen for potential genetic drivers and locate their action within the song system, we performed an unbiased analysis of transcriptomes from song nuclei and surrounding motor control regions in zebra finches of either sex chronically treated with 17-β-estradiol (E2) or vehicle from hatch until sacrifice at PHD30 (Fig. 1b). We used a new zebra finch genome assembly and annotation produced by the Vertebrate Genomes Project containing both the Z and W chromosomes34.
Identification of gene expression modules in song nuclei
We first sought to characterize differences in song nuclei gene expression specializations relative to surrounds in juvenile males and females with and without E2 treatment. We used RNA-Seq data from a previous study by our lab, on the effects of E2 manipulation on the song system in both males and females25. This E2-treatment program masculinized females sufficiently for them to produce song as adults in another cohort of birds25. Birds were sacrificed as juveniles at post-hatch day 30 (PHD30), after a one hour period of silence to limit activity-dependent gene expression in song nuclei and surrounding motor regions; this developmental age was chosen because it is around the time when both males and E2-treated females start to sing (e.g. subsong) and all four song nuclei are sufficiently developed to be visually apparent in histological sections25. The four major song nuclei (HVC, LMAN, RA, and Area X) and their adjacent surrounding motor regions (DN, AN, LAI, and Str respectively; Fig. 1a) were dissected using laser capture. The E2-treated females in this study had similarly sized song system nuclei as males, indicating that E2 treatment prevented atrophy. This past study mapped the RNA-Seq reads to an older genome assembly lacking the W sex chromosome25.
With this data, we first remapped RNA-Seq reads (from n = 3 birds per group) to a new zebra finch assembly (GCF_008822105.2) with both the Z and W chromosomes. In our quality control analyses, hierarchical clustering of sample expression vectors revealed two technical outliers, one HVC from a vehicle-treated male HVC and one RA from a E2-treated female, which we excluded from further analysis (Fig. S1). Similar to previous PCA and hierarchical clustering25, the remapped RNA-Seq data expression levels with outliers removed still resulted in separation of song nucleus and surround and some E2-treated female samples, without sufficient resolution at finer group levels (Fig. S1). To address our questions at finer resolution and in an unbiased way, we performed weighted gene correlation network analysis (WGCNA), where we decomposed the transcriptome into actively expressed gene modules using hierarchical clustering of the gene adjacency matrix. This matrix describes the inferred structure of gene networks within our data and was calculated by soft thresholding the matrix of gene-to-gene expression correlations across samples (Fig. S2)35. After initial hierarchical cluster-based module assignment, genes were iteratively reassigned to the module whose aggregate expression they best correlated with until no additional genes met the WGCNA reassignment threshold (Fig. S3 and Methods for parameterization). This optional reassignment was performed to ensure that genes are matched to the aggregate measure that best represents their expression.
Of the ∼21,000 annotated zebra finch genes, 13,220 were well expressed in the finch telencephalic brain regions sampled, a comparable number of genes to what we have seen expressed in adult zebra finch telencephalon9,36. These 13,220 were assigned to 14 co-expression modules (Fig. S4a). The median observation of unassigned transcripts was 22-fold lower than the median observation of module assigned transcripts (4.86 vs. 0.22 FPKM). Two modules clearly marked single samples from two different birds (Fig. S4a,b - arrows), indicative of technical overfitting; these two modules (not birds) were excluded from further analysis, resulting in a more visibly diverse pattern within and across modules (Fig. S4c,d). The remaining 12 modules contained 12,444 genes in total were lettered in descending order of size A through L, containing from 4,890 to 127 constituent genes each, and were dynamically expressed across brain regions and treatment groups (Fig. 1c)
Lack of male HVC and Area X specializations in juvenile females
To understand song system transcriptional specialization in the context of the gene modules, we calculated the module eigengene (MEG) expression values for each of the 12 modules (Fig. 1d). MEGs are the first principle component of variance of all genes in a module and are the aggregate measure for each module’s expression across samples. We then tested for statistically significant correlations between MEG expression for each module and the song nuclei specializations relative to their respective surrounds. Each song nucleus had unique and overlapping specialization of genes in 2 to 4 modules of the 12 (Fig. 2a). The LMAN specialization relative to the surrounding AN correlated with the expression of gene modules B, F and I; HVC specialization relative to DN correlated to the expression of gene modules B, F, and G; RA specialization relative to LAI consisted of gene modules C and L; and Area X specialization relative to the surrounding Str consisted of gene modules C, F, G, and I. Area X was the only song nucleus that did not have a specialized module unique to it relative to other song nuclei (Fig. 2a).
In non-vocal learning juvenile females, interestingly LMAN was specialized relative to the AN by the same gene modules as in males (B, F, and I) as well as an additional module G (Fig. 2b); RA was specialized by module A as in males, but not module L and by additional modules A and G. In contrast, neither juvenile female HVC nor Area X exhibited significant gene module expression specializations relative to their surrounds. We interpret these findings to indicate that: LMAN and RA atrophy later in juvenile female development; that module A and G genes could be associated with the start of this atrophy; HVC and Area X are likely the first to atrophy or not develop; and lack of any gene module specialization in them at this age could mean that they would be more sensitive to estrogen prevention of vocal learning loss.
E2-responsive gene modules in song nuclei
To assess the effects of chronic exogenous estrogen on the developing song system, we first performed a control analysis of modules in the E2-treated juvenile males. In the E2-treated juvenile males, the song nuclei specialized modules overlap to those seen in vehicle-treated males (Fig. 2a vs 2c); differences were that module I in LMAN and Area X and module L in RA were no longer present; and module G newly appeared in HVC, module A appeared in RA as seen in females, and module E also appeared in RA. These differences in gene module specializations in the male song system are consistent with the E2-treatment regimen used causing a slight decrease in vocal learning accuracy in males25.
In contrast, E2-treated females more closely mirrored the gene module specializations seen in the vehicle treated male song system (Fig. 2a vs 2d). Specifically, module G was not associated to RA while module C became more strongly correlated; module G expression was strongly specialized in HVC; and module C appeared specialized in Area X. Despite these changes to be more like males, in E2-treated females, module A in RA maintained its specialization as seen in vehicle-treated females (Fig. 2b vs 2d), and module E appeared in RA as in E2-treated males (Fig. 2c vs 2d); a weak correlation of a module I in HVC and module K in RA appeared (Fig. 2d). That is, the transcriptional response of female song nuclei to E2-treatment appeared to be far more dramatic than in males, consistent with estrogen inhibition in males having only a limited impact on song nuclei and vocal learning25.
We performed an additional test for E2-induced changes to gene module expression in females by comparing E2-treated song nuclei in females to the combination of E2-treated surrounds from the same animals, vehicle-treated female surrounds, and vehicle-treated female song nuclei (Fig 2e). This analysis represents a comparison between samples from vocal learning capable female samples and non-vocal learning female samples at each node. It revealed 1 to 3 modules in each song nucleus of E2-treated females that were a subset of some of the strongest correlated modules found in control males (Fig. 2a vs 2e). These were: module F in LMAN; module G in HVC; module C in RA; and modules C, F and G in Area X. That is, these are four core modules for song nuclei that are the most sensitive to E2-induction or enhancement in females and most associated with the presence of vocal learning in females.
Gene modules for telencephalic sex differences
We next tested whether some modules could be explained by sex differences in the brain. We compared vehicle treated male and female song nuclei and surrounds separately, and found that module E strongly correlated with higher expression in all male brain regions relative to females (Fig. 2f,g). Among the song nuclei, module G eigengene expression was significantly higher in male HVC and Area X; conversely, module C expression was higher in female LMAN and RA (Fig. 2f). Among the surrounds, module H eigengene expression (not significant in any other comparison) was significantly higher in female DN and module C was significantly higher in female AN and Str (Fig. 2g). These findings indicate the existence of a sex specific, telencephalon-wide shift in the transcriptome represented in these data by module E. This broad sexual dimorphism is further refined by regionally patterned sex differences like those observed for module G in HVC.
Gene modules with region specific expression
We next quantified the magnitude of module expression differences and observed several gene modules with clear region-specific expression patterns (Fig. 2h-k). Module A, which appeared specialized in RA relative to LAI in several comparisons, was highly expressed in all Area X and Str samples, regardless of treatment or sex (Fig. 2h). Interestingly, FOXP2, a gene critical for spoken-language and vocal learning in humans37 and songbirds38, was a potent member of module A, correlating with the eigengene at r2 = 0.92 across all samples. Module C, which was specialized to several song nuclei including RA, HVC, and Area X relative to their surrounds and sexually dimorphic in LMAN and Str, was most highly expressed in arcopallial samples, especially RA, with some increase in HVC (Fig. 2i). Module F, which was specialized to LMAN relative to its surround in all such comparisons, was highly expressed exclusively in LMAN regardless of sex and E2-treatment (Fig. 2j). Module G, which was specialized to HVC in a sex and E2 dependent manner, was only highly expressed in HVC samples, with males having higher expression than females, and females further split by treatment (Fig. 2k).
Functional enrichment of specialized modules
We sought to understand the cumulative biological function of genes among the modules. To do this, we mapped the zebra finch genes to their 1:1 human orthologs where possible and then used human gene annotation to examine the Gene Ontology (GO) functions enriched within each module’s constituent genes. Of the 12,444 module assigned genes, 7,909 (63%) had 1:1 human orthologs annotated in Ensembl. We found GO terms significantly enriched in module G (Table 1), which included “DNA binding transcription factor activity”, “cell differentiation”, “anatomical morphogenesis”, “cell-to-cell signaling”, and “positive regulation of multicellular organism growth”, indicating that module G genes specialized in male and E2-treated female HVC and Area X potentially enact a proliferative developmental program. Other significantly enriched terms were “extracellular matrix structural component”, “external side of the plasma membrane”, and “extracellular space”, indicating that this module may also act to restructure the extracellular matrix, perhaps to accommodate new cells. Module E, which was differential between the sexes for both song nuclei and surrounds, had 6 significantly enriched terms, of which 3 pertained to DNA damage repair: “nucleolus”, “transcription, DNA templated”, and “U2-type precatalytic spliceosome” (Table 2). These results in module E indicate that there is a telencephalon-wide sexually dimorphic gene expression program in finches that likely acts within the nuclear environment. A full table of GO enrichments by module can be found in supplemental data (supplemental_sig_go_enrich.csv).
Gene modules enriched for human speech associated genes
We next asked if any of the modules were enriched for genes previously determined to be convergently specialized in songbird song nuclei and human speech brain regions7,9. We performed this analysis by treating the different sets of convergently specialized genes as gene ontology terms. These gene lists included the transcriptional convergence between RA and human dorsal laryngeal motor cortex (dLMC), HVC and dLMC, LMAN and dLMC, Area X and the anterior caudate, and Area X and the anterior putamen (Gedman et al. Fig3B,D9). We found that each of these convergent gene lists was enriched for the modules specialized to or highly expressed in the finch brain region used to generate the list (Fig3a). We found that module B, specialized to both LMAN and HVC (Fig 2a), was enriched for the convergently specialized genes in human dorsal laryngeal motor cortex (dLMC; Fig 3a)9. Module C, specialized to RA (Fig 2a,i), was enriched for the genes convergently specialized in dLMC and RA (Fig 3a) known to match the lower layers of the cortex9. Module A, highly expressed in Area X and Str (Fig 2h), was enriched for genes convergently specialized in Area X and human anterior striatum (both caudate and putamen; Fig. 3a).; Module I, specialized to Area X (Fig 2a), was also even more strongly enriched for the same convergences as module A (Fig 3a). These findings indicate that the genes previously identified as convergently regulated between songbird song brain regions and human speech brain regions are components in the larger specialized gene networks identified here using WGCNA.
Gene modules enriched for specific chromosomes
We next determined whether any modules were enriched in genes from specific chromosomes. We performed a bootstrapped analysis, randomizing the mapping between genes and modules 50,000 times to approximate null distributions. P values for each chromosome-module pairing were then FDR corrected. To do this as accurately as possible, we performed this analysis using revised chromosomal assignments and structure from the most recent zebra finch genome assembly, bTaeGut_1.4.pri, which better resolves the microchromosomes39 (see Fig. S6 for the less resolved bTaeGut_W.pat.v2 assembly). Surprisingly, we found that each of the 12 modules were enriched for genes from at least one chromosome (Fig. 3c). The most striking enrichment was of module E genes on the Z and W sex chromosomes, with nearly all expressed W genes and ∼⅔ of expressed Z genes being members of module E. This is consistent with module E exhibiting higher expression in the male telencephalon regardless of treatment or region (Figs. 2d-e). For the autosomes, we observed significant enrichments of two HVC specialized gene modules on chromosome 1A; module B (specialized in male HVC and human dLMC); and module G (specialized in male HVC and E2-treated female HVC and sexually dimorphic in vehicle-treated HVC). These results are particularly interesting given that zebra finch chromosomes 1 and 1A are believed to be the result of a songbird-specific fragmentation of the ancestral chromosome 1 found in chickens40,41.
Module A, the largest module which was highly expressed in Area X and adjacent striatum, was enriched across 5 macro-chromosomes (chr1, 2, 3, 4, and 10). Module F, which was strongly expressed in LMAN, was enriched on chromosomes 2 and 7. Modules C (most strongly expressed in the arcopallium and part of RA and Area X specializations) and I (a component of LMAN specialization) was enriched on chromosome 6 on microchromosome 35. Module D, a component of the LMAN specialization in E2-treated females, was enriched across 8 small microchromosomes (chr22, 23, 25, 27, 28, 29, 30, 31). 4 of the 5 smallest modules in gene counts were enriched in the microchromosomes: Module H, which was sexually dimorphic only in DN, was enriched on microchromosome 28; module K was significantly enriched on microchromosome 37 and in the “other” category, which includes all remaining unnamed DNA scaffolds in the assembly such as further microchromosomes; module J was also enriched in “other”.
Sex- and micro-chromosome gene expression across the telencephalon
The telencephalon-wide, sexually dimorphic expression of the sex chromosome dominated module E is one of the strongest features in the present dataset; Module E eigengene is highly expressed in all male samples and depleted in all female samples regardless of brain region or treatment (Fig. 4a vs 4c-d). To better understand the relationship between the sex chromosomes and module E, we examined the distribution of membership in module E with the sex chromosomes separated. WGCNA allows us to consider gene membership in a module as a continuous variable, rather than a binary variable, by correlating each gene’s expression profile to the module eigengene. Doing this for module E, we found that Z transcripts were positively correlated to the module eigengene while W transcripts were anticorrelated (Fig. 4e). This is consistent with Z chromosome transcripts being generally depleted in females relative to males, while W chromosome transcripts were only expressed in female brains. This reduction in Z chromosome transcript abundance is consistent with the finding that diploid Z chromosomes in male birds do not have one copy inactivated to compensate for gene dosage, unlike the X inactivation in female mammals30,42.
Given the sexually dimorphic expression of module E across brain regions and the enrichment of the sex chromosomes within that module, we separately compared the expression of sex chromosome genes in the non-vocal motor surrounds of vehicle-treated control males and females regardless of WGCNA assignment to better understand sex chromosome expression without the influence of vocal learning specialization or E2 response. To do this, we correlated expression of each sex chromosome gene to the sexual dimorphism, such that female-enriched transcripts were positively correlated and male-enriched transcripts were anticorrelated (Fig. 4f,g). In these gene sets, we identified between 73 and 82 significantly expressed W chromosome genes and between 433 and 527 significantly depleted Z chromosome genes for each of the surround regions in males relative females. Examining the union of these gene sets revealed that a total of 95 W and 694 Z chromosome genes were differentially expressed between sexes in at least one brain region, 62% and 65% of annotated sex chromosome genes respectively. Conversely, the intersection of these regional gene sets contained 58 significantly expressed W chromosome genes and 306 significantly depleted Z chromosome genes in all non-vocal regions in females, representing 38% and 29% of annotated sex chromosome genes, respectively (Fig. 4h-i, Table S2). This indicates that there is a telencephalon wide set of sex enriched and depleted sex chromosome genes in addition to regionally patterned sex chromosome gene expression.
We also observed strong enrichments of several of the smaller gene modules, J and K, on microchromosomes. Examining the MEG expression for these modules, we found that both were expressed brainwide in non-overlapping sets of animals. Module J was highly expressed in all samples taken from animal-b, an E2-treated female (Fig. 4b); similarly, module K was most highly expressed across all samples from animal-g, an E2-treated male, and exhibited intermediate expression in all samples from animal-f, a vehicle-treated female (Fig. 4c). Though not enriched on microchromosomes, Module L exhibited similar animal-specific high expression in animal-a, an E2 treated female (Fig. 4d). These animal and chromosome specific shifts in the transcriptomes could represent the systemic effects of allelic chromosomal structural variation which WGCNA was able to capture due to our broad sampling of regions across the telencephalon.
Modeling vocal learning and sex chromosome module interactions
As it is unlikely that the gene modules act independently of each other, we sought putative interacting genes between two modules of interest, module G specialized in HVC and Area X and module E dominated by sex chromosome genes. To do this, we again replaced binary in-or-out module membership with continuous module membership by correlating each genes’ expression to the relevant module eigengene35. This allowed us to quantify the extent to which any gene was associated with any module, regardless of initial assignment33. Looking across all assigned genes for our modules of interest, we identified two outlier genes, PDE8B and HABP4, which were the most E associated gene assigned to module G and the most G associated gene assigned to module E, respectively (Fig. 4j). Both genes were significantly upregulated in male HVC relative to its surround, but not in female HVC of either treatment. PDE8B catalyzes the hydrolysis of the second messenger cAMP and mutations to the gene cause an autosomal dominant form of striatal degeneration in humans43. HABP4 is an RNA binding protein, known to repress the expression and subsequent DNA binding of MEF2C44, a Z chromosome transcription factor which has undergone accelerated evolution in songbirds10 and whose repression by FOXP2 is critical for cortico-striatal circuit formation in mice related to vocal behaviors45. Both PDE8B and HABP4 are found on the Z chromosome and HABP4 was one of the 694 Z transcripts significantly depleted across the telencephalon in vehicle-treated females relative to males (Table S2). These findings indicate that Z chromosome genes may be subject to multiple gene regulatory programs; both the telencephalon wide depletion driven by reduced copy number and specialized upregulation in HVC for vocal learning behavior.
Candidate gene drivers of HVC specialization in E2-treated females
To reduce the 344 genes in module G to the putative drivers of masculine HVC development, we next examined the relationship of continuous membership in module G (correlation to MEG-G) to gene expression specialization in HVC at the level of single genes. We did this by correlating gene expression to the specialization of HVC in males (vehicle- and E2-treated) or E2-treated females for each of the following four vocal learning comparisons: 1) male HVC specialization relative to the surround; 2) E2-treated female HVC specialization relative to the surround; 3) HVC sexual dimorphism in vehicle-treated controls; and 4) E2-treated female HVC specialization relative to vehicle-treated female HVC. For all four comparisons, we found the higher the correlation of module G genes to the MEG, the higher the correlation with the vocal learning specialization (Fig. 5a-d). This means that their expression was higher in male or E2-treated female HVC relative to the appropriate non-vocal learning controls across all comparisons. We identified genes of interest as being strongly correlated (r2 ≥ 0.5) to both the module G eigengene and vocal learning specialization (Fig. 5e-h, higher magnification view of colored boxes in Fig. 5a-d). All genes of interest exhibited a positive correlation to vocal learning specialization across all four comparisons (Fig. 5a-d). We next generated a core gene list for module G involvement in E2- and sex-dependent HVC development by intersecting these four vocal learning specialized gene sets (Fig. 5i). We found a core set of 14 genes that strongly marked vocal learning capable HVC in all comparisons and strongly correlated with the aggregate of module G expression (Table 3, Fig. S6).
We examined the chromosomes of origin for these 14 core genes and found that three (GHR, RGS7BP, and THBS4) were on the Z sex chromosome (Fig. 5j). This was a >3-fold significant enrichment over chance of Z chromosome transcripts (p = 0.009, upper-tailed hypergeometric test) against a background of module assigned genes. This result was statistically significant regardless of the background gene set, when using all genes (∼21,000) or only module G members (344). These three Z chromosome module G genes in HVC not only exhibited >50% reduced expression in control female HVC relative to males, but also exhibited upregulated expression in male HVC relative to the surrounding DN regardless of treatment and upregulation in E2-treated female HVC relative to either the surround or vehicle-treated female HVC (Fig. S7). In the context of our cross-region sex chromosome analysis (Fig. 3c-f), only RGS7BP was sexually dimorphic outside of the song system, being depleted in all female surround regions. Neither GHR (growth hormone receptor) nor THBS4 were significantly depleted in any surround region comparison. Taken together, these results indicate that the 3 core genes in vocal learning capable HVC on the Z chromosome are subject to additional E2 sensitive transcriptional regulation in HVC, separate from the Z chromosome transcript depletion seen throughout the female brain.
Of the 14 core genes total, several have been previously studied in the brains of other species which may inform their role in vocal learning systems. THBS4 encodes a secreted extracellular-matrix glycoprotein necessary for appropriate neuronal migration in the mouse46 and is elevated 6-fold in the human cortex compared with non-vocal learning primates47. Human EDA2R was recently identified as a top correlate of cognitive performance and brain size48; it was also found in a human GWAS study to correlate with circulating estrogen and testosterone levels49. Rare mutations in human PHETA1 lead to Lowe oculocerebrorenal syndrome, that includes pathophysiology in seizures, mental retardation, and structural brain abnormalities50,51. SIX2 is a homeobox domain containing transcription factor that governs early brain and craniofacial development and provides neuroprotection from dopamine injury52,53. GHR encodes a transmembrane receptor whose activation controls cell division54. The gene which encodes GHR’s ligand, growth hormone, is interestingly duplicated and undergoing accelerated evolution in the genomes of songbirds, including the zebra finch55. Based on these findings, we consider GHR as the most likely candidate gene related to the E2 sensitive atrophy of HVC in females. Based on the findings in these gene sets, we hypothesize that without excess estrogen in females, HVC expansion is prevented by not specializing the growth and neuronal migration promoting genes in module G to the HVC lineage by late development. This is potentially enacted by depleting necessary gene products from the Z sex chromosome, such as GHR, which are already present in only one copy.
The present study seeks to further our understanding of sexually dimorphic vocal learning in zebra finches by comparing the gene expression specializations in the song system between male and female birds at PHD30, during the critical period of atrophy for the female vocal circuit. The birds were given either a vehicle or E2 from hatching, which induces rudimentary vocal learning behavior in females where it would otherwise be absent. HVC appeared unspecialized at the level of gene module expression in control females, while in E2-treated females HVC exhibited a subset of the observed male HVC gene expression specialization. Similarly, Area X appeared unspecialized in vehicle-treated females, but exhibited a subset of the male Area X specialization in E2-treated females, consistent with the known absence of Area X in vehicle-treated females and presence in E2-treated females. This contrasts with RA and LMAN which were similarly specialized in males and females in the absence of E2-treatment. Given that lesions of HVC prevent the emergence of Area X in E2-treated females28, these results support a model of zebra finch development where transcriptomic masculinization of female HVC by E2 is a critical event which facilitates the emergence of female Area X and ultimately endows these females their rudimentary song. When examining the module G HVC specialization induced by E2-treatment in female HVC, we surprisingly found that the most specialized genes were disproportionately from the Z chromosome.
How did E2 treatment produce the transcriptomic effects we observed with module G in female HVC and what are the implications regarding sexually dimorphic vocal learning in the zebra finch? One possibility is that this process begins with increased estrogen receptor (ER) activity within developing HVC cells after being provided surplus activating ligand56, followed by altered transcription of ER targets in the genome. Module G contained the androgen receptor (Table S1), which is believed to be a major downstream effector of E2 response in female zebra finches29. This initial transcriptional loading of the system would then have been processed by gene regulatory networks within each cell, spreading in effect through the transcriptome. It is possible that differential module G expression arose purely from traditional gene regulatory networks, where transcription factors form complex, elaborate feedback networks with themselves and the genes they regulate. However, this framework fails to explain why the core 14 vocal learning correlated genes from module G in HVC were enriched for transcripts from a single chromosome: the Z sex chromosome with halved copy number in females.
We hypothesize that the Z chromosome genes identified here are co-regulated, necessary components of a proliferative transcriptional program downstream of ER, represented by module G. The module G proliferative program could be specialized to developing male HVC by the expression of patterning genes, such as SIX2, early in development and maintained through persistent GHR signaling. We propose that these Z chromosome transcripts in module G are depleted in females by lower haplotype dosage during development and thus fail to specialize female HVC. Due to lower abundance of gene products from module G, the female HVC could be less proliferative during juvenile development and fail to facilitate emergence of its downstream target Area X. Similarly, without a sufficiently developed HVC, RA lacks one of its major inputs and may subsequently atrophy. We further propose that E2 masculinizes female song behavior by increasing the abundance of these module G transcripts in HVC, increasing specialized HVC proliferation, and facilitating emergence of HVC’s other major target, Area X (Fig 6). This model of sex chromosome influenced song system development is consistent with recent work comparing male and female zebra finch transcriptomes from RA at young juvenile (PHD20) and young adult (PHD50) ages in un-manipulated birds, where they found that sexually dimorphic Z chromosome expression at juvenile ages preceded the dimorphic expression in the autosomes seen in adults57.
Our results help refine the traditional notion that hormonal signaling organizes the brain to produce sexually dimorphic behaviors independent of neuronal sex chromosome content58,59. Instead, our data indicate that sexual dimorphism in zebra finch song learning ability was likely established by the interaction of sex hormone signaling and sex chromosome gene expression within HVC during development. These findings are thematically similar to work in the Four Core Genotypes mouse model60, where chromosomal and gonadal sex are separable by translocating the sex determining Sry gene. In these mice, sex chromosome composition regulates sexually dimorphic brain gene expression, circuit anatomy, and behaviors60–63, though the sex chromosome genes responsible remain unknown. Additional experiments manipulating the candidate genes implicated here in developing HVC to both mimic and prevent the action of E2 in female zebra finches are needed to test these hypotheses.
In addition to these vocal learning focused results, performing unbiased gene network analysis with samples taken throughout the zebra finch telencephalon in both sexes revealed a surprisingly strong relationship between the modular organization of telencephalic gene expression and the chromosomal structure of the genome. Each of the 12 modules identified by WGCNA were enriched for genes from at least one chromosome. Of these significant enrichments, those from on the micro- and sex-chromosomes stood out as particularly strong. Module E encompassed the majority of W and Z chromosome genes and was sexually dimorphic in its expression throughout the telencephalon. Indeed, we found that roughly ⅓ of sex chromosome transcripts are significantly sexually dimorphic in all non-vocal brain regions, and roughly ⅔ are sexually dimorphic in at least one region. This half-brain-wide and half-regionally-patterned sexual dimorphism observed from the sex chromosomes may act as a substrate for the evolution of sexually dimorphic behaviors generally, with brain-wide shifts in expression providing a transcriptional background for evolution of sex-specific patterning in additional sex chromosome genes. These regionally patterned sex chromosome genes could then recruit gene expression networks from somatic chromosomes, resulting in the sex specific anatomical patterning of gene expression from throughout the genome. This general model is consistent with local depletion of GHR and other Z chromosome transcripts in the developing HVC lineage, leading to a loss of specialized module G expression from somatic chromosomes. This hypothesis of an interaction between sex chromosomes and autosomes linked to the gain and loss of vocal learning in one sex, can be tested in future studies.
Funding for this work was provided by the Howard Hughes Medical Institute (EDJ), NIH-NIDCD R01-DC016224 (HM), and the NSF-GRFP (MHD). We thank Gregory Gedman, Giulio Formenti, Caitlin Gilbert, Lindsey Cantin, César Vargas, Chul Lee, and Jason Manley for conversations during visualization and analysis. We also thank Alipasha Vaziri and Tobias Nöbauer for providing the computing infrastructure used throughout. This analysis is a memorial to Mark Konishi (1933-2020) whose work on this topic influenced us greatly.
Materials & Correspondence
Correspondence to Matthew H. Davenport and Erich D. Jarvis
HM has received royalties from Chemcom. HM has received research grants from Givaudan. HM has received consultant fees from Kao.
Materials and Methods
Animal Handling and Sample Preparation
The 96 samples used in the present analysis are the E2 or vehicle treated subset of a previously published RNAseq dataset25. We briefly re-describe our methodology here. All animal procedures were approved by the IACUC of Duke University.
E2 (Sigma E1024-1G) was dissolved in DMSO (100mg/mL) and then diluted in olive oil (1mg/mL). 30-50uL of E2 sample or DMSO only vehicle was applied to the flank of male and female zebra finches daily from PHD0-PHD14 and on alternating days from PHD15-30 (n=3 per sex-treatment combination). We have previously shown that this treatment program is sufficient to induce song system masculinization in E2 treated female zebra finches25.
On PHD30, animals were sacrificed following one hour of dark isolation. Animals were anesthetized by isoflurane inhalation and rapidly decapitated. Brain hemispheres were dissected, embedded in OCT, and flash frozen in an ethanol and dry ice slurry. Sections were taken from the right hemisphere coronally at 14um onto polyethylene naphthalate (PEN) membrane slides for RNA isolation and adjacent sections taken on charged glass slides for histology or in-situ hybridizations. From the PEN membrane slides, song nuclei and surrounding control regions were laser capture microdissected (LCM) using an ArcturusXT LCM system (Nikon) guided by a Nissl stained tissue series for each animal. No sample pooling was performed, each sample originated from a single bird. This is 8 samples worth of RNAseq data per bird for 12 birds, providing 3 samples per sex-treatment-region combination, roughly a terabyte of read data.
RNA was extracted from the LCM isolated tissue samples using the Arcturus Picopure kit (Applied Biosystems KIT0204) following manufacturer’s instructions. RNA quality was assessed using an Agilent 2100 bio-analyzer and the RNA 6000 pico kit (Agilent 5067-1513). Next, cDNA was synthesized using the SMART-Seq v4 Ultra Low input RNA Kit (Takara 634892). Sequencing libraries were made with the NEBNext Ultra II DNA Library Prep kit (New England Biolabs E7645L) and cleaned-up using SPRIselect beads (Beck-man Coulter B23317). Libraries were sequenced by Novogene Co., Ltd. on the Novaseq 6000 platform (Illumina) and S4 flow cells resulting in 150bp paired-end reads.
RNAseq read mapping and quality control
RNAseq reads were first trimmed to remove adapters and low quality base calls using Trimmomatic64 and then mapped to a high-quality Vertebrate Genomes Project (VGP) female zebra finch nuclear genome (bTaeGut2.pat.W.v2, GCF_008822105.2)34 using STAR (v2.7.1)65. Uniquely mapped reads were then tallied at the level of genes using Rsubread::featureCounts (R-3.6.3)and then counts normalized to fragments per kilobase of transcript per million mapped reads66. Multi-mapped reads were rejected as they have a higher probability of being associated with technical artifacts of sequencing or genome assembly. Read based quality control was performed with FastQC (Babraham Bioinformatics) with reports prepared by MultiQC (Python-3.5.5)67. This workflow was automated by the CountMatrix pipeline (https://github.com/mattisabrat/CountMatrix/). We next removed two outlier samples (one male vehicle HVC and one female vehicle RA) based upon hierarchical clustering of the sample space before computing gene-to-gene correlations (Fig. S1).
Gene Module Identification
All remaining analyses were completed in R-4.2 unless otherwise specified. Data was wrangled in the tidyverse, and custom visualizations produced with ggplot, ggdendro, VennDiagram, RColorBrewer, and ggpubr68,69. Unsigned topological overlaps between genes (gene-to-gene correlations) were calculated in a single block with WGNCA::blockwiseModules with a soft thresholding power (β) of 6 based on scale free topology fit and model connectedness as described in the WGNCA vignette (Fig. S2)35. Next we determined an appropriate WGCNA parameterization quantitatively and qualitatively by sweeping the module size and tree cut height parameters of WGNCA::recutBlockwiseTrees. We selected our model for analysis by examining the resulting gene assignments and sample-sample distance matrices. We were able to increase or decrease the proportion of genes assigned to WGCNA modules by lowering or raising the minimum module size parameter, respectively. We found that setting the minimum size parameter below 100 genes included more genes, but with models that increasingly over-fit single samples, producing obvious outliers in the distance matrix (Fig. S3). Correspondingly, raising the minimum size parameter beyond 100 included fewer genes, but did not greatly reduce the number of outlier samples. Based on this we selected 100 as the minimum module size, parameterizing to explain as much transcriptomic variance as possible while minimizing the number of technically overfit samples (Fig. S2-4).
Module association to vocal learning
Module eigengenes (MEGs) from each module were correlated against binarized song system membership, vocal learning capability, or sex and the statistical significance of each correlation assessed using WGCNA::corPvalueStudent. This was done in the following sample subsets by node: male samples broken out by treatment; female samples broken out by treatment; all female samples; song system components from either sex treated with vehicle; and surrounding control regions from either sex treated with vehicle. Within each of the 4 sex-treatment combinations we compared the song system components to surrounds at each node. Within all female samples we compared the vocal learning capable E2-treated song system elements to all other female samples from each node. Within the vehicle song systems and vehicle surrounds we compared between male and female finches for each region.
Module gene ontology and convergent vocal learning gene expression signature enrichment
Module assigned zebra finch genes were mapped to their 1:1 human orthologs where possible, dropping unmapped or multi-mapped genes, using orthofindR::getOrthos (https://github.com/ggedman/orthofindR) which wraps Ensembl’s biomaRt. Uncorrected p values for the enrichment of human gene ontology terms within the human orthologs of module G were calculated using generally applicable gene set enrichment (GAGE) implemented in gage::gage70. To determine if the genes previously shown as convergently differentially expressed in the zebra finch song system and human speech brain regions mapped to specific modules, we treated these five gene lists identically to GO terms and tested for their significant enrichment across the human orthologs of each module also using GAGE.
Analysis of sex chromosome gene expression independent of vocal learning or E2 treatment
Sex chromosome transcripts, regardless of WGCNA module assignment, were examined in the vehicle-treated non-vocal learning producing surround samples for each node. To find the most consistently expressed and depleted W and Z chromosome genes respectively, we correlated expression of each sex chromosome transcript with sexual dimorphism within each region, such that expressed W genes would be positively correlated and depleted Z chromosome genes would be anticorrelated. We computed correlations and p values using the WGCNA corAndP function; upper-tailed for the W chromosome and lower-tailed for the Z chromosome. Genes significantly expressed or depleted across regions were then intersected to identify consistently regulated transcripts across the telencephalon.
Module enrichment on chromosomes
To associate modules to chromosomes, we bootstrapped FDR corrected p values for the enrichment of each chromosome-module pairing by randomizing the mapping of genes to modules 50k times and calculating the enrichments observed on each chromosome from each module in each randomization to empirically determine the null distributions. The calculation of bootstraps and p values was performed in Python-3.5.5 and parallelized using joblib’s Parallel.
Identification of core module G genes in HVC and Z chromosome enrichment
We defined genes of interest as the subset of significantly (T test for correlation: p≤0.05) vocal learning capability correlated genes in HVC whose expression correlated to module eigengene G (MEG-G) across the dataset at r2 ≥ 0.5 and to vocal learning at r2 ≥ 0.5 in at least one of the four vocal learning comparisons in HVC, calculated using WGCNA::corAndP. These comparisons were: all male HVC samples against all male DN samples; female E2 treated HVC against all other female samples at the node, including vehicle treated HVC; Vehicle treated male HVC against vehicle treated female HVC; and E2 treated female HVC against vehicle treated female HVC. We defined core genes as those meeting this criteria for all four HVC vocal learning comparisons. We tested the statistical significance of Z chromosome enrichment in this core gene list with an upper-tailed hypergeometric test, implemented in phyper, where each core gene is a sampling event without replacement from module assigned genes.
Data and Code Availability
All raw data for this experiment is available on the NCBI Sequence Read Archive (accession: PRJNA698257). The count matrix, quality control results, and analysis code is available online (https://github.com/mattisabrat/sex-and-song).
- 1.Evolution of vocal learning and spoken languageScience 366:50–54
- 2.A molecular timescale for vertebrate evolutionNature 392:917–920
- 3.Error in estimation of rate and time inferred from the early amniote fossil record and avian molecular clocksJ. Mol. Evol 59:267–276
- 4.Twitter evolution: converging mechanisms in birdsong and human speechNat. Rev. Neurosci 11:747–759
- 5.Birdsong and human speech: common themes and mechanismsAnnu. Rev. Neurosci 22:567–631
- 6.Neural mechanisms for learned birdsongLearn. Mem 16:655–669
- 7.Convergent transcriptional specializations in the brains of humans and song-learning birdsScience 346
- 8.Molecular mapping of movement-associated areas in the avian brain: a motor theory for vocal learning originPLoS One 3
- 9.Songbird brain organization and its molecular convergence with humans for vocal imitation learning
- 10.Positive selection in non-coding genomic regions of vocal learning birds is associated with genes implicated in vocal learning and speech functions in humansGenome Res
- 11.Sexual dimorphism in vocal control areas of the songbird brainScience 194:211–213
- 12.The zebra finch: a synthesis of field and laboratory studiesChoice 34:34–34
- 13.Ontogeny of brain nuclei controlling song learning and behavior in zebra finchesJ. Neurosci 5:1556–1562
- 14.Calbindin expression in developing striatum of zebra finches and its relation to the formation of area X:326–341https://doi.org/10.1002/cne.23174
- 15.Neuronal growth, atrophy and death in a sexually dimorphic song nucleus in the zebra finch brainNature 315:145–147
- 16.Divergent and parallel development in volume sizes of telencephalic song nuclei in male and female zebra finchesJ. Comp. Neurol 375:445–456
- 17.Sex and regional differences in the incorporation of neurons born during song learning in zebra finchesJ. Neurosci 8:2869–2874
- 18.Female zebra finches do not sing yet share neural pathways necessary for singing in malesJ. Comp. Neurol 527:843–855
- 19.Estrogen synthesis in the male brain triggers development of the avian song control pathway in vitroNat. Neurosci 4:170–175
- 20.Waiting periods versus early innervation: the development of axonal connections in the zebra finch song systemJ. Neurosci 14:6532–6543
- 21.Hormone-induced sexual differentiation of brain and behavior in zebra finchesScience 208:1380–1383
- 22.Behavioral correlates of sexual differentiation in the zebra finch song systemBrain Research 231:153–172https://doi.org/10.1016/0006-8993(82)90015-4
- 23.Early estrogen treatment alone causes female zebra finches to produce learned, male-like vocalizationsJ. Neurobiol 22:755–776
- 24.Early estrogen treatment of female zebra finches masculinizes the brain pathway for learned vocalizationsJ. Neurobiol 22:777–793
- 25.Estrogen and sex-dependent loss of the vocal learning system in female zebra finchesHorm. Behav 129
- 26.The correlation between the degree of brain masculinization and song quality in estradiol treated female zebra finchesBrain Res 336:381–383
- 27.Hormonal determination of song capacity in females of the zebra finch: Critical phase of TreatmentZ. Tierpsychol 64:330–336
- 28.Lesions of HVc block the developmental masculinizing effects of estradiol in the female zebra finch song systemJ. Neurobiol 22:29–39
- 29.Estrogen establishes sex differences in androgen accumulation in zebra finch brainJ. Neurosci 6:734–738
- 30.Neural, not gonadal, origin of brain sex differences in a gynandromorphic finchProc. Natl. Acad. Sci. U. S. A 100:4873–4878
- 31.Brenner’s Encyclopedia of Genetics (Second Edition):397–400
- 32.A critical period for estrogen action on neurons of the song control system in the zebra finchProc. Natl. Acad. Sci. U. S. A 85:7006–7007
- 33.Female song is widespread and ancestral in songbirdsNat. Commun 5
- 34.Towards complete and error-free genome assemblies of all vertebrate speciesNature 592:737–746
- 35.WGCNA: an R package for weighted correlation network analysisBMC Bioinformatics 9
- 36.As above, so below: Whole transcriptome profiling demonstrates strong molecular similarities between avian dorsal and ventral pallial subdivisionsJ. Comp. Neurol https://doi.org/10.1002/cne.25159
- 37.A forkhead-domain gene is mutated in a severe speech and language disorderNature 413:519–523
- 38.FoxP2 regulation during undirected singing in adult songbirdsJ. Neurosci 26:7390–7394
- 39.False gene and chromosome losses affected by assembly and sequence errorsbioRxiv 2021.04.09.438906 https://doi.org/10.1101/2021.04.09.438906
- 40.Chromosomal polymorphism and comparative painting analysis in the zebra finchChromosome Res 13:47–56
- 41.The genome of a songbirdNature 464:757–762
- 42.Dosage compensation is less effective in birds than in mammalsJ. Biol 6
- 43.Autosomal-Dominant Striatal Degeneration Is Caused by a Mutation in the Phosphodiesterase 8B Gene:83–87https://doi.org/10.1016/j.ajhg.2009.12.003
- 44.MEF2C DNA-binding activity is inhibited through its interaction with the regulatory protein Ki-1/57FEBS Lett 579:2615–2622
- 45.Foxp2 controls synaptic wiring of corticostriatal circuits and vocal communication by opposing Mef2cNat. Neurosci 19:1513–1522
- 46.Thrombospondin 4 deficiency in mouse impairs neuronal migration in the early postnatal and adult brainMol. Cell. Neurosci 61:176–186
- 47.Increased cortical expression of two synaptogenic thrombospondins in human brain evolutionCereb. Cortex 17:2312–2321
- 48.Neurology-related protein biomarkers are associated with cognitive ability and brain volume in older ageNat. Commun 11
- 49.Using human genetics to understand the disease impacts of testosterone in men and womenNat. Med 26:252–258
- 50.The oculo-cerebral-renal syndrome of LoweArch. Neurol 32:103–107
- 51.Deficiency in the endocytic adaptor proteins PHETA1/2 impairs renal and craniofacial developmentDis. Model. Mech 13
- 52.Combinatorial activity of Six1-2-4 genes in cephalic neural crest cells controls craniofacial and brain developmentCell. Mol. Life Sci 71:2149–2164
- 53.Transcription factor Six2 mediates the protection of GDNF on 6-OHDA lesioned dopaminergic neurons by regulating Smurf1 expressionCell Death Dis 7
- 54.The Growth Hormone Receptor: Mechanism of Receptor ActivationCell Signaling, and Physiological Aspects. Front. Endocrinol 9
- 55.Duplication of accelerated evolution and growth hormone gene in passerine birdsMol. Biol. Evol 25:352–361
- 56.Androgen and estrogen sensitivity of bird song: a comparative view on gene regulatory levelsJ. Comp. Physiol. A Neuroethol. Sens. Neural Behav. Physiol 204:113–126
- 57.Emergence of sex-specific transcriptomes in a sexually dimorphic brain nucleusCell Rep 40
- 58.Organizing action of prenatally administered testosterone propionate on the tissues mediating mating behavior in the female guinea pigEndocrinology 65:369–382
- 59.The organizational-activational hypothesis as the foundation for a unified theory of sexual differentiation of all mammalian tissuesHorm. Behav 55:570–578
- 60.A model system for study of sex chromosome effects on sexually dimorphic neural and behavioral traitsJ. Neurosci 22:9005–9014
- 61.X chromosome number causes sex differences in gene expression in adult mouse striatumEur. J. Neurosci 29:768–776
- 62.Sex chromosome complement affects social interactions in miceHorm. Behav 54:565–570
- 63.Sex differences in juvenile mouse social behavior are influenced by sex chromosomes and social contextGenes Brain Behav 10:465–472
- 64.Trimmomatic: a flexible trimmer for Illumina sequence dataBioinformatics 30:2114–2120
- 65.STAR: ultrafast universal RNA-seq alignerBioinformatics 29:15–21
- 66.featureCounts: an efficient general purpose program for assigning sequence reads to genomic featuresBioinformatics 30:923–930https://doi.org/10.1093/bioinformatics/btt656
- 67.MultiQC: summarize analysis results for multiple tools and samples in a single reportBioinformatics 32:3047–3048
- 68.Welcome to the tidyverseJ. Open Source Softw 4
- 69.ggplot2: Elegant Graphics for Data Analysis
- 70.GAGE: generally applicable gene set enrichment for pathway analysisBMC Bioinformatics 10