Automated annotation of birdsong with a neural network that segments spectrograms
Abstract
Songbirds provide a powerful model system for studying sensory-motor learning. However, many analyses of birdsong require time-consuming, manual annotation of its elements, called syllables. Automated methods for annotation have been proposed, but these methods assume that audio can be cleanly segmented into syllables, or they require carefully tuning multiple statistical models. Here we present TweetyNet: a single neural network model that learns how to segment spectrograms of birdsong into annotated syllables. We show that TweetyNet mitigates limitations of methods that rely on segmented audio. We also show that TweetyNet performs well across multiple individuals from two species of songbirds, Bengalese finches and canaries. Lastly, we demonstrate that using TweetyNet we can accurately annotate very large datasets containing multiple days of song, and that these predicted annotations replicate key findings from behavioral studies. In addition, we provide open-source software to assist other researchers, and a large dataset of annotated canary song that can serve as a benchmark. We conclude that TweetyNet makes it possible to address a wide range of new questions about birdsong.
Data availability
Datasets of annotated Bengalese finch song are available at:https://figshare.com/articles/Bengalese_Finch_song_repository/4805749https://figshare.com/articles/BirdsongRecognition/3470165Datasets of annotated canary song are available at:https://doi.org/10.5061/dryad.xgxd254f4Model checkpoints, logs, and source data files are available at:http://dx.doi.org/10.5061/dryad.gtht76hk4Source data files for figure are in the repository associated with the paper:https://github.com/yardencsGitHub/tweetynet(version 0.4.3, 10.5281/zenodo.3978389).
-
Song recordings and annotation files of 3 canaries used to evaluate training of TweetyNet models for birdsong segmentation and annotationDryad Digital Repository, doi:10.5061/dryad.xgxd254f4.
-
Model checkpoints, logs, and source data filesDryad Digital Repository, doi:10.5061/dryad.gtht76hk4.
-
Bengalese Finch song repository.Figshare, https://doi.org/10.6084/m9.figshare.4805749.v6.
-
BirdsongRecognition.Figshare, https://doi.org/10.6084/m9.figshare.3470165.v1.
Article and author information
Author details
Funding
National Institute of Neurological Disorders and Stroke (R01NS104925)
- Alexa Sanchioni
- Emily K Mallaber
- Viktoriya Skidanova
- Timothy J Gardner
National Institute of Neurological Disorders and Stroke (R24NS098536)
- Alexa Sanchioni
- Emily K Mallaber
- Viktoriya Skidanova
- Timothy J Gardner
National Institute of Neurological Disorders and Stroke (R01NS118424)
- Timothy J Gardner
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Ethics
Animal experimentation: All procedures were approved by the Institutional Animal Care and Use Committees of Boston University (protocol numbers 14-028 and 14-029). Song data were collected from adult male canaries (n = 5). Canaries were individually housed for the entire duration of the experiment and kept on a light-dark cycle matching the daylight cycle in Boston (42.3601 N). The birds were not used in any other experiments.
Copyright
© 2022, Cohen et al.
This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.
Metrics
-
- 3,945
- views
-
- 451
- downloads
-
- 42
- citations
Views, downloads and citations are aggregated across all versions of this paper published by eLife.
Download links
Downloads (link to download the article as PDF)
Open citations (links to open the citations from this article in various online reference manager services)
Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)
Further reading
-
- Neuroscience
Sensory signals from the body’s visceral organs (e.g. the heart) can robustly influence the perception of exteroceptive sensations. This interoceptive–exteroceptive interaction has been argued to underlie self-awareness by situating one’s perceptual awareness of exteroceptive stimuli in the context of one’s internal state, but studies probing cardiac influences on visual awareness have yielded conflicting findings. In this study, we presented separate grating stimuli to each of subjects’ eyes as in a classic binocular rivalry paradigm – measuring the duration for which each stimulus dominates in perception. However, we caused the gratings to ‘pulse’ at specific times relative to subjects’ real-time electrocardiogram, manipulating whether pulses occurred during cardiac systole, when baroreceptors signal to the brain that the heart has contracted, or in diastole when baroreceptors are silent. The influential ‘Baroreceptor Hypothesis’ predicts the effect of baroreceptive input on visual perception should be uniformly suppressive. In contrast, we observed that dominance durations increased for systole-entrained stimuli, inconsistent with the Baroreceptor Hypothesis. Furthermore, we show that this cardiac-dependent rivalry effect is preserved in subjects who are at-chance discriminating between systole-entrained and diastole-presented stimuli in a separate interoceptive awareness task, suggesting that our results are not dependent on conscious access to heartbeat sensations.
-
- Neuroscience
Decoding the activity of individual neural cells during natural behaviours allows neuroscientists to study how the nervous system generates and controls movements. Contrary to other neural cells, the activity of spinal motor neurons can be determined non-invasively (or minimally invasively) from the decomposition of electromyographic (EMG) signals into motor unit firing activities. For some interfacing and neuro-feedback investigations, EMG decomposition needs to be performed in real time. Here, we introduce an open-source software that performs real-time decoding of motor neurons using a blind-source separation approach for multichannel EMG signal processing. Separation vectors (motor unit filters) are optimised for each motor unit from baseline contractions and then re-applied in real time during test contractions. In this way, the firing activity of multiple motor neurons can be provided through different forms of visual feedback. We provide a complete framework with guidelines and examples of recordings to guide researchers who aim to study movement control at the motor neuron level. We first validated the software with synthetic EMG signals generated during a range of isometric contraction patterns. We then tested the software on data collected using either surface or intramuscular electrode arrays from five lower limb muscles (gastrocnemius lateralis and medialis, vastus lateralis and medialis, and tibialis anterior). We assessed how the muscle or variation of contraction intensity between the baseline contraction and the test contraction impacted the accuracy of the real-time decomposition. This open-source software provides a set of tools for neuroscientists to design experimental paradigms where participants can receive real-time feedback on the output of the spinal cord circuits.