Automated annotation of birdsong with a neural network that segments spectrograms
Abstract
Songbirds provide a powerful model system for studying sensory-motor learning. However, many analyses of birdsong require time-consuming, manual annotation of its elements, called syllables. Automated methods for annotation have been proposed, but these methods assume that audio can be cleanly segmented into syllables, or they require carefully tuning multiple statistical models. Here we present TweetyNet: a single neural network model that learns how to segment spectrograms of birdsong into annotated syllables. We show that TweetyNet mitigates limitations of methods that rely on segmented audio. We also show that TweetyNet performs well across multiple individuals from two species of songbirds, Bengalese finches and canaries. Lastly, we demonstrate that using TweetyNet we can accurately annotate very large datasets containing multiple days of song, and that these predicted annotations replicate key findings from behavioral studies. In addition, we provide open-source software to assist other researchers, and a large dataset of annotated canary song that can serve as a benchmark. We conclude that TweetyNet makes it possible to address a wide range of new questions about birdsong.
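The abstract's central claim is that a single network maps each time bin of a spectrogram to a syllable label, so segmentation and annotation happen in one step. Below is a minimal sketch of that general approach in PyTorch. The layer sizes, class names, and the frames_to_segments helper are illustrative assumptions, not the authors' published architecture; see https://github.com/yardencsGitHub/tweetynet for the actual implementation.

```python
# Illustrative sketch of a frame-classification network for spectrograms.
# Hypothetical layer sizes and names; not the published TweetyNet code.
import torch
from torch import nn


class FrameClassifier(nn.Module):
    """Maps a spectrogram (freq bins x time bins) to one label per time bin."""

    def __init__(self, n_freq_bins: int, n_classes: int, hidden_size: int = 64):
        super().__init__()
        # Convolutional front end; pooling only along the frequency axis
        # preserves the time resolution needed for per-frame labels.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 1)),
            nn.Conv2d(32, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 1)),
        )
        conv_freq = n_freq_bins // 4  # two (2, 1) poolings halve freq twice
        # A bidirectional LSTM integrates acoustic context across time bins.
        self.rnn = nn.LSTM(
            input_size=64 * conv_freq,
            hidden_size=hidden_size,
            bidirectional=True,
            batch_first=True,
        )
        self.classifier = nn.Linear(2 * hidden_size, n_classes)

    def forward(self, spect: torch.Tensor) -> torch.Tensor:
        # spect: (batch, freq, time) -> logits: (batch, time, n_classes)
        x = self.conv(spect.unsqueeze(1))  # (batch, 64, freq', time)
        batch, channels, freq, time = x.shape
        x = x.permute(0, 3, 1, 2).reshape(batch, time, channels * freq)
        x, _ = self.rnn(x)
        return self.classifier(x)


def frames_to_segments(labels, frame_dur_s, silence_label=0):
    """Collapse a per-frame label sequence into (onset_s, offset_s, label)
    segments, so no separate audio segmentation step is needed."""
    segments, start, current = [], 0, silence_label
    for i, lab in enumerate(list(labels) + [silence_label]):  # sentinel flush
        if lab != current:
            if current != silence_label:
                segments.append((start * frame_dur_s, i * frame_dur_s, current))
            start, current = i, lab
    return segments
```

Training such a network with per-frame cross-entropy on annotated spectrograms, then running frames_to_segments(logits.argmax(-1)[0].tolist(), frame_dur_s) on its predictions (where frame_dur_s is the spectrogram's time-bin duration in seconds), illustrates how labeling every time bin sidesteps the assumption that audio can first be cleanly segmented into syllables.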
Data availability
Datasets of annotated Bengalese finch song are available at:
- https://figshare.com/articles/Bengalese_Finch_song_repository/4805749
- https://figshare.com/articles/BirdsongRecognition/3470165

Datasets of annotated canary song are available at:
- https://doi.org/10.5061/dryad.xgxd254f4

Model checkpoints, logs, and source data files are available at:
- http://dx.doi.org/10.5061/dryad.gtht76hk4

Source data files for figures are in the repository associated with the paper:
- https://github.com/yardencsGitHub/tweetynet (version 0.4.3, doi:10.5281/zenodo.3978389)

The figshare records can also be retrieved programmatically; a sketch follows the reference list below.
- Song recordings and annotation files of 3 canaries used to evaluate training of TweetyNet models for birdsong segmentation and annotation. Dryad Digital Repository, doi:10.5061/dryad.xgxd254f4.
- Model checkpoints, logs, and source data files. Dryad Digital Repository, doi:10.5061/dryad.gtht76hk4.
- Bengalese Finch song repository. Figshare, https://doi.org/10.6084/m9.figshare.4805749.v6.
- BirdsongRecognition. Figshare, https://doi.org/10.6084/m9.figshare.3470165.v1.
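As noted above, the figshare records can be fetched programmatically. Below is a minimal sketch that lists the downloadable files for one record via figshare's public v2 API, using only the Python standard library; the article ID is taken from the Bengalese finch repository URL, the JSON field names follow figshare's documented API, and error handling is omitted for brevity.

```python
# Minimal sketch: list downloadable files for a figshare record via the
# public v2 API. Uses only the Python standard library.
import json
import urllib.request

ARTICLE_ID = 4805749  # Bengalese Finch song repository (URL above)

url = f"https://api.figshare.com/v2/articles/{ARTICLE_ID}"
with urllib.request.urlopen(url) as response:
    article = json.load(response)

# Each entry in "files" includes a name and a direct download URL.
for f in article["files"]:
    print(f["name"], f["download_url"])
```

The Dryad records can be downloaded directly from the DOIs listed above.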
Article and author information
Author details
Funding
National Institute of Neurological Disorders and Stroke (R01NS104925)
- Alexa Sanchioni
- Emily K Mallaber
- Viktoriya Skidanova
- Timothy J Gardner
National Institute of Neurological Disorders and Stroke (R24NS098536)
- Alexa Sanchioni
- Emily K Mallaber
- Viktoriya Skidanova
- Timothy J Gardner
National Institute of Neurological Disorders and Stroke (R01NS118424)
- Timothy J Gardner
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Ethics
Animal experimentation: All procedures were approved by the Institutional Animal Care and Use Committees of Boston University (protocol numbers 14-028 and 14-029). Song data were collected from adult male canaries (n = 5). Canaries were individually housed for the entire duration of the experiment and kept on a light-dark cycle matching the daylight cycle in Boston (42.3601° N). The birds were not used in any other experiments.
Copyright
© 2022, Cohen et al.
This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.
Metrics
- 4,754 views
- 519 downloads
- 50 citations
Views, downloads and citations are aggregated across all versions of this paper published by eLife.