1. Evolutionary Biology
  2. Genetics and Genomics
Download icon

The evolutionary history and genomics of European blackcap migration

  1. Kira Delmore  Is a corresponding author
  2. Juan Carlos Illera
  3. Javier Pérez-Tris
  4. Gernot Segelbacher
  5. Juan S Lugo Ramos
  6. Gillian Durieux
  7. Jun Ishigohoka
  8. Miriam Liedvogel  Is a corresponding author
  1. Behavioural Genomics, Max Planck Institute for Evolutionary Biology, Germany
  2. Research Unit of Biodiversity (UO-CSIC-PA), Oviedo University, Spain
  3. Department of Biodiversity, Ecology and Evolution, Complutense University of Madrid, Spain
  4. Wildlife Ecology and Management, University Freiburg, Germany
Research Article
  • Cited 3
  • Views 1,956
  • Annotations
Cite this article as: eLife 2020;9:e54462 doi: 10.7554/eLife.54462

Abstract

Seasonal migration is a taxonomically widespread behaviour that integrates across many traits. The European blackcap exhibits enormous variation in migration and is renowned for research on its evolution and genetic basis. We assembled a reference genome for blackcaps and obtained whole genome resequencing data from individuals across its breeding range. Analyses of population structure and demography suggested divergence began ~30,000 ya, with evidence for one admixture event between migrant and resident continent birds ~5000 ya. The propensity to migrate, orientation and distance of migration all map to a small number of genomic regions that do not overlap with results from other species, suggesting that there are multiple ways to generate variation in migration. Strongly associated single nucleotide polymorphisms (SNPs) were located in regulatory regions of candidate genes that may serve as major regulators of the migratory syndrome. Evidence for selection on shared variation was documented, providing a mechanism by which rapid changes may evolve.

eLife digest

Every year as the seasons change, thousands of animals migrate huge distances in search of food or better climates. As far as migrations go, there might be none so impressive as the trans-oceanic flights made by small migrating songbirds. These birds can weigh as little as three grams and travel up to 15,000 kilometres. Most migrate alone and at night and yet still manage to return to the same location each year. Several strands of research suggest there could be a genetic basis to their migratory behaviour, but exactly which genes control this phenomenon remains poorly understood.

One small songbird that has been studied for decades is the European blackcap. This species exhibits a real variety of migration patterns. Some blackcaps travel rather short distances, others much further, and some populations do not migrate at all. Populations that share the same breeding grounds in the summer may migrate in different directions in the autumn. These features make it a good species to study the genetic variation between populations that migrate in different directions and over different distances. However, only in recent years has advancing technology made it possible to comprehensively study an animal’s entire genome, leaving no gene unturned.

Now, Delmore et al. have used high-throughput sequencing technologies to trace the evolutionary history of migration in European blackcap and started by assembling a reference genome for the species. Then, the genomes of 110 blackcaps from several populations that take different annual migrations were compared to the reference. This revealed that the populations began to diverge some 30,000 years ago and that there was some apparent gene mixing between groups of migrating and resident blackcaps around 5,000 years ago. The analysis showed only a small set of genes code for their differences in migration. Additionally, while the candidate genes were shown to be common among blackcaps, the genes identified did not match those reported from studies of other migrating songbirds. Finally, Delmore et al. also noted that the differences between the populations tend to be in the parts of the genome that control whether a given gene is switched on or off, which could explain how new migratory behaviours can rapidly evolve.

This study is one of the most comprehensive genomic analysis of migration to date. It is important work as songbirds, like other animals, are responding to increasing pressures of environmental and climate change. In time, the findings could be used to support conservation efforts whereby genetic analyses could determine if certain populations possess enough variation to respond to coming changes in their habitats.

Introduction

Bird migration is a fascinating and highly variable behaviour that integrates many traits – morphological, physiological and behavioural. Research on a wide range of species has provided important insight into this behaviour, from the incredible distances that birds cover during their journeys (Alerstam et al., 2003; Egevang et al., 2010), to the fine-tuned and precisely controlled timing of migration (Gwinner and Helm, 2003) and the fascinating sensory modalities that allow birds to navigate with amazing precision (Mouritsen, 2018; Wiltschko and Wiltschko, 1972). The European blackcap Sylvia atricapilla is an iconic migratory species that is well suited to work on the genetics of migration. Blackcaps exhibit dramatic differences in migratory behaviour (Figure 1a), spanning the entire spectrum from exclusively migratory populations in the northern portion of their range to short distance and partially migratory populations in the Mediterranean; non-migratory, or resident, populations occur on both the European continent (Iberian Peninsula) and the Atlantic islands. In addition to variation in the propensity to migrate and the distance covered, blackcaps vary in migratory orientation, with a migratory divide (contact zone between populations that breed adjacent to one another but take different migratory routes) occurring between populations that migrate southwest (SW) and southeast (SE) from their breeding grounds in Central Europe in the autumn. A novel migratory route also evolved very recently in this species, with an increasing number of birds migrating northwest (NW) from the Central European breeding grounds in the autumn (Cramp, 1992).

Figure 1 with 2 supplements see all
Sampling design and population structure.

(a) Sampling sites and migratory phenotypes. Samples were collected from the breeding grounds, except for a subset of NW migrants that were sampled during winter in the UK (open blue circle) (details in Supplementary file 4). (b–d) Population structure represented by a Principle Component Analysis (PCA) (b), NGSadmix (K = 2 and 3 shown) (c) and pairwise estimates of FST (d), showing differentiation between migrants and residents (as well as among residents themselves). Long dist SE = long distance migrants that orient SE in autumn (purple), med dist = medium distance migrants that orient in the corresponding heading during autumn migration (SE = green, SW = orange and NW = blue), res continent = residents found on the continent (yellow), short dist SW = short distance migrants that orient SW (black), res isl = resident birds on islands (cape = Cape Verde, canary = Canary Islands). Among continental residents, open circles indicate Cazalla de la Sierra, open circles with dash Asni, and filled circles Gibraltar. A PCA excluding islands can be found in Figure 1—figure supplement 1; results from NGSadmix at larger values of K can be found in Figure 1—figure supplement 2.

Variation in the migratory behaviour of European blackcaps was harnessed in a series of influential papers published in the 1980s and 1990s that detailed the genetic basis of migration. Common garden experiments showed that selectively bred individuals that were reared in isolation from their parents maintain population-specific behaviour, suggesting that there is a genetic component to migration (Helbig, 1994; Helbig et al., 1989; Helbig, 1991a; Berthold and Querner, 1981; Pulido and Berthold, 2010). F1 hybrids crossbred between populations that differ in migratory traits exhibited intermediate phenotypes (orientation, distance and propensity to migrate), suggesting that these traits are additively inherited (Berthold and Querner, 1981; Pulido and Berthold, 2010; Helbig, 1991b). Further work with F2 hybrids showed a wider distribution of phenotypes and the recovery of parental phenotypes, indicative of traits that are controlled by only a few major genes (Helbig, 1996), and selection experiments mating birds according to migratory status showed that the transition between resident and migratory behaviour can occur in just a few generations (Pulido et al., 1996; Berthold et al., 1990). The rapidity with which migratory behaviour can evolve has been supported in natural populations; the NW route taken by some birds was only established in the past 70 years and probably in response to increased food availability during the winter in the United Kingdom (Berthold et al., 1992).

The blackcap has also been the subject of extensive phylogeographic study. Pérez-Tris et al. (2004) used mitochondrial (control) data from 241 birds and 12 populations across the entire breeding range to show that migratory variation in this species arose recently (4,000–13,000 years ago [ya]) and has not yet resulted in significant population differentiation. These results could suggest that the genes that control migratory variation have small effect sizes or are restricted to a small portion of the genome. The only populations showing substantial genetic differentiation occurred in the Central European migratory divide (i.e., between SW and SE migrants), indicating that differences in orientation may help to maintain population differentiation. Resident populations showed evidence of historical bottlenecks followed by sudden expansions, suggesting that blackcaps lost their ability to migrate after secondary colonization of mild areas in southern Europe and on the Atlantic islands. This finding was supported by Voelker and Light (2011) who used mitochondrial (ND2 and cytb) data to reconstruct ancestral states within the genus Sylvia. Limited genetic differentiation between blackcaps was also documented by Dietzen et al. (2008) using mitochondrial (cytb) data; these authors also estimated dates for the colonization of Atlantic islands and for an earlier colonisation of the Canary Islands (the latter occurring 300,000–3,000,000 ya vs. 4,000–40,000 ya for colonisation of other islands including the Azores and Cape Verde).

The phylogeographic studies described above provided important insight into the evolution of migration in blackcaps and other temperate species more generally. When these studies and experimental work are considered together, the thorough set of studies conducted on blackcap migration is arguably unequalled in other species. Surprisingly, these classic experiments and molecular marker-based approaches have been followed by a dearth of genetic work on migration in blackcaps. Here, we leverage our knowledge of this excellent study system by using high-throughput sequencing techniques to provide the first genome-wide characterization of the blackcap. The major objectives of this study were to assemble a high-quality reference genome de novo, and to use whole genome resequencing data from 110 blackcaps (including birds from each migratory phenotype and encompassing the entire breeding range, Figure 1a) to (1) examine population structure and demography in this system, and (2) study the genetic basis of three migratory traits in unison: the propensity to migrate, migratory distance and orientation.

Analyses of population structure and demography revealed novel insights that are important for understanding both the evolutionary history of migration in blackcaps and the underlying genetics of this behaviour. A small number of studies on the genomics of migration have been conducted in songbirds (Delmore et al., 2016; Lundberg et al., 2017). We compare our results to theirs, evaluating a long held hypothesis of a common genetic basis to migratory behaviour (Liedvogel et al., 2011). Our results are not only relevant to understanding the genetics of migration in the blackcap. Data on the genetics of complex behaviours is at a premium in the evolutionary literature, which has focused primarily on morphological traits, and migration probably plays an important role in the early stages of speciation in many systems. Our results will speak to the genetic basis of this process.

Results and discussion

Assembly of a high-quality draft reference genome

We used whole genome sequencing (WGS) data and an optical map (Illumina and Bionano Irys technology, Supplementary files 13) to de novo assemble a hybrid reference genome for the blackcap (BioProject number PRJNA545868; Guojie Zhang, personal communication). The final genome is 1.02 Gb in length, comprised of only 96 scaffolds and has a large N50 scaffold length of 22 Mb. Ninety scaffolds mapped to the collared flycatcher Ficedula albicollis genome (average three scaffolds/chromosome; Supplementary file 3) and our annotation strategy, which used both in silico and evidence-based approaches, identified 17,982 protein-coding genes. Results from BUSCO and an analysis of UCEs (ultra-conserved elements) suggest that our reference is nearly complete, with 92% of single-copy orthologues unique to birds and 97% of UCEs identified in amniotes (Faircloth et al., 2012; Supplementary file 2).

Population structure and demography

We aligned WGS data from 110 blackcaps (including the two birds used in our assembly) to this reference (average coverage 17.5x, Figure 1a, Supplementary file 4) and estimated genotype likelihoods at genome-wide single nucleotide polymorphisms (SNPs). Genomic differentiation was low between migratory populations of different distances and orientations, but unlike earlier work using mitochondrial data (Pérez-Tris et al., 2004; Dietzen et al., 2008), we documented considerable differentiation between migrant and resident populations on both the continent and islands. A PCA separated resident island birds from continental populations on PC1, and resident continental birds from migrants on PC2 (Figure 1b). Migrants were not clearly distinguished on either PC (we obtained the same result when we re-ran the PCA excluding islands; Figure 1—figure supplement 1). Results from an ADMIXTURE analysis and estimates of FST confirm this pattern. At a cluster value of two, ADMIXTURE distinguished between island and continental birds (similar to PC1). At a cluster value of three, populations on the continent were further divided into resident and migratory groups (similar to PC2), and resident island and continent birds showed some admixture with the migratory group (Figure 1c). No further structure was observed beyond these three clusters (Figure 1—figure supplement 2). Estimates of FST ranged from 0.018 to 0.11, with the highest estimates occurring between migrants and both resident groups (0.06–0.11 for islands, 0.042–0.05 for continent residents; Figure 1d). Evidence for limited population differentiation combined with dramatic differences in the migratory behaviour of blackcaps is ideal for identifying genomic regions that are associated with this focal trait. Specifically, genomic regions associated with migration should standout against this backdrop of limited differentiation, although analyses involving residents will need to account for elevated values of differentiation.

A phylogeographic analysis using mitochondrial data suggested that variation in migratory behaviour evolved recently, 4000–13,000 ya (Pérez-Tris et al., 2004). Our results move this date further back in time, to 30,000 ya and the start of the last glacial maximum (Clark et al., 2009). Specifically, we used multiple sequentially Markovian coalescent (MSMC, implemented in MSMC2) (Schiffels and Durbin, 2014; Malaspinas et al., 2016) to characterize the demographic history of blackcaps. The demographic trajectories of migratory, resident continent and resident island birds began to diverge ~30,000 ya. The effective population size of migrant and resident island populations expanded and contracted, respectively, while continental residents exhibited a relatively constant effective population size (Figure 2a; Figure 2—figure supplement 1). Relative cross-coalescence rates (CCR) between all three groups exhibited a concomitant drop ~30,000 ya (Figure 2b; Figure 2—figure supplement 2). The drop of relative CCR between migratory and resident island populations was steeper than that between migratory and resident continent populations (Figure 2b), suggesting that genetic separation following the colonization of islands resulted in greater separation than that between continental populations of migrants and residents. Increased differentiation between migrants and resident island birds (vs. resident continent birds) was also documented in our PCA (Figure 1). Results for medium-distance migrants (NW, SW and SE) and long-distance migrants are indistinguishable (Figure 2—figure supplement 4; Figure 2—figure supplement 5; Figure 2—figure supplement 6).

Figure 2 with 6 supplements see all
Demographic history.

(a) Effective population size by time estimated by MSMC2 using five individuals per blackcap phenotype. Note that the most recent time segment is regarded as being unreliable in MSMC2 results. (b) Relative cross-coalescence rate estimated by MSMC2. 15 lines with three colours indicate relative cross-coalescence rate for all pairwise combinations of the six populations (three for comparisons between populations on the continent [continent vs. continent], three for comparisons between populations on the islands [island vs. island], and nine for comparisons between continent and island populations [continent vs. island]). The dotted vertical line indicates the inferred time of population separation. Results from down-sampling can be found in Figure 2—figure supplements 1, 2 and 3; results for medium- and long-distance migrants run separately can be found in Figure 2—figure supplements 45 and 6.

One interesting finding from our demographic analyses is that of apparent gene flow between migrant and resident continent birds ~5000 years ago. Specifically, the relative CCR between migrant and resident continent populations started to increase at ~5000 ya (~25,000 years after initial divergence; Figure 2—figure supplement 3). This admixture event may reflect secondary contact between migrant and resident continent populations and is line with our results from ADMIXUTRE, with admixture documented between these groups at a cluster value of three. The last glacial maximum ended ~19,000–11,500 ya (Clark et al., 2009). After this time, populations would have expanded out of their glacial refugia, and perhaps migrant and resident continent populations came into secondary contact ~5000 years after these expansions began. Similar to our results on population differentiation, island populations exhibit their own evolutionary trajectories following divergence. This result is in line with results from Dietzen et al. (2008), who suggested that at least one separate colonization to the Atlantic islands occurred (earlier than that to the Canaries).

Genetic basis of migratory traits

Here, we transition to study local patterns of genomic differentiation, identifying specific genomic regions that have signatures of selection related to three phenotypes: the propensity to migrate, orientation of migration and distance of migration (resident continent, short distance SW, medium distance NW, SW, SE and long distance SE populations). We excluded resident island birds from these analyses (because of limited sample size [n = 5] and potential effects from founder events) and focused on a single resident continent population (Gibraltar, we obtained similar results using Cazalla de la Sierra, total number of birds included in these analyses = 82, Supplementary file 4).

Positive selection was more common in residents and limited to a few, small genomic regions (Table 1a). For example, hapFLK is a tree-based method that controls for hierarchical population structure. Global and local NJ trees are constructed using haplotype frequencies and regions under selection show longer branch lengths. Only nine regions were found to be under selection (permutation test, see 'Materials and methods') according to this method, and six of these appeared in residents. Figure 3a shows estimates of hapFLK for the entire genome, and Figure 4a exemplifies one region on Super-Scaffold 99 (syntenic with flycatcher chromosome 3). The average size of these regions was 16.7 kb and only six genes occurred within them. We used CAVIAR (Rochus et al., 2018) to identify variants showing strong associations with selection in these regions. Each region included one to four variants, all of which occurred in non-coding regions (Supplementary file 5). Previous phylogeographic work suggested that migration is the ancestral state in blackcaps (Pérez-Tris et al., 2004; Voelker and Light, 2011). Accordingly, the selection in genomic regions that we identified here is probably involved in the transition from migrant to resident phenotypes.

Figure 3 with 1 supplement see all
Genome-wide local estimates of population differentiation.

Results from hapFLK using haplotype frequencies (a,c) and ΔPBS using SNP frequencies (b,d; 2,500 bp windows). Estimates of ΔPBS for resident continent (b) and medium-distance NW migrants (d) are shown; results for the remaining populations can be found in Figure 3—figure supplement 1. Genetic elements, scaffolds and genes discussed in the text are highlighted.

Figure 4 with 1 supplement see all
Exemplifying genomic regions under positive selection.

Local neighbour joining trees for regions under selection in (a) the resident continent population on Super-Scaffold 99, and (b) medium-distance NW population on Super-Scaffold 73. Selection is indicated by longer branch lengths in each population than is the case in global trees built using data from all genomic regions (Figure 4—figure supplement 1). Panels to the right of the trees show the corresponding frequency of haplotypes in each population of the tree. Haplotype clusters are colour coded (colours of haplotype clusters do not correspond to the population colour coding used in other figures), and frequencies are plotted along the Y axis. Haplotype frequency plots show the near fixation of a single dominating haplotype in (a) resident continent (yellow) and (b) medium-distance NW populations (blue). The location (in bp) of these regions on each Super-Scaffold is shown below these panels and the resident continent group is only included to root the tree in panel (b), and thus has no haplotype frequencies.

Table 1
Genetic variants underlying variation in migration.

(a) Results from analyses including all continental birds and (b) results from analyses limited to medium-distance migrants. Results from hapFLK include the size, the population where the signal was found and genes within the region. Estimates of ΔPBS and (PBS) in the same regions are shown; they are bolded if in the top 1% of the focal population’s distribution and new sizes are estimated using neighbouring windows above this threshold (if larger than the limits from hapFLK, additional genes are specified). Estimates of PBS were re-estimated using island populations (vs.continent resident populations). Regions in the top 1% of an island population’s distribution are indicated in section (a) (recorded as 'NA' if the initial population under selection was not resident). 'Scaf' refers to the scaffold within the blackcap genome where the region is found and 'chr' refers to the flycatcher chromosome that these scaffolds map to. For the number of strongly associated SNPs identified by CAVIAR and estimates of nSL, see Supplementary file 5.

(a)
hapFLKΔpbs
ScafChrSize
(bp)
Log p-valuePopulationGenesSize
(Mb)
ΔPBS
(PBS)
Island
replacement
Genes
124A14,0599.4ResidentLOC1008591735218.7 (0.40)AzoresEDA2R
131129,1958.3ResidentCHST4, TERF2IP, KARS30341.0 (0.87)Cape VerdeDHX38, DHODH, IST1, C2H2, ATXN1, AP1G1, PHLPP2, TAT, GABARAPL2, TMEM231, CHST6
17376109.5Short SW316.50 (0.02)NANKAIN1
22953,8908.8Med SECLSTN21,005.521.9 (0.19)NADUF4637, PIK3CB, FOXL2, MRPS22, COPB2, RBP2, NMNAT3
30213,75611.5Resident42.58.14 (0.19)Cape Verde
30279028.8Resident1,029.519.1 (0.42)Canaries, Cape Verde
41810,3418.3Resident11.515.0 (0.33)
461A4127.9Med SE9.59.0 (0.03)NA
99313,1407.8ResidentTTBK119228.6 (0.61)Azores, Canaries, Cape VerdeLOC101820716, ACSS1, NEIL1, SLC22A7, TTL
(b)
hapFLKΔpbs
ScafChrSize (bp)Log p-valuePopulationGenesSize (Mb)ΔPBS (PBS)
17332589.04Med NWSDC1514.49 (0.20)
3023118.85Med NW711.31 (0.16)
461A4618.71Med NW38.14 (0.15)
631A10889.55Med SE1.14 (0.05)
6769959.46Med SW59.03 (0.18)
735361111.81Med NWATG2B, BDKRB2330.41 (0.35)

We complemented results from hapFLK with a modified version of the Population Branch Statistics (PBS) (Yi et al., 2010) and the number of segregating sites by length (nSL) (Ferrer-Admetlla et al., 2014). PBS is an FST-based statistic that estimates allele frequency differences between three or more populations. This parameter can be elevated by linked purifying selection (or background selection) within populations that is unrelated to positive selection (in our case selection related to migration). We removed these confounding effects by scaling PBS and subtracting the maximum value of PBS in orthologous windows from that in the non-focal population (hereafter ‘ΔPBS’, following Vijay et al., 2017). nSL is a haplotype-based statistic that focuses on patterns within populations, using segregating sites to measure the length of haplotypes. Linked selection should increase haplotype lengths at genomic regions that are under positive selection. Eight of the nine regions identified by hapFLK also exhibited extreme values of ΔPBS and nSL (in the top 1% of the distribution) in the same population as that identified by hapFLK (ΔPBS Table 1a, Figure 3b for resident birds [estimates for short distance SW, medium distance SE, SW, NW, and long distance SE, Figure 2—figure supplement 4; Figure 2—figure supplement 5; Figure 2—figure supplement 6]; nSL Supplementary file 5).

As noted already, population structure and linked selection can elevate differentiation between populations. We controlled for these effects using hapFLK and ΔPBS, respectively, and emphasise that genomic differentiation between populations of blackcaps is low to begin with (Figure 1d). In addition, linked purifying selection would be expected to increase PBS in all populations (i.e., not just the focal population), but this is not the case. This is exemplified in Figure 5a where estimates of ΔPBS for all populations are shown but these are only elevated in the resident continent population. As a final test of population structure, we re-estimated ΔPBS using resident island birds (instead of resident continent birds). We conservatively excluded these populations from our initial analyses because their sample sizes are small and because genetic drift can affect estimates of differentiation in island populations. Nevertheless, the island populations are also resident and thus these estimates could help to validate the genomic regions that were identified as being under selection in resident populations on the continent. Table 1a summarizes these results, noting which genomic regions exhibited elevated values of PBS on islands. Of particular interest, PBS was elevated in all three island populations at the genomic region on Super-Scaffold 99 (Figure 5b). Combined with findings from hapFLK (controlling for population structure and relying on haplotypes), ΔPBS (controlling for linked selection and relying on SNP data) and nLS (estimated within populations and relying on haplotypes), these results provide strong evidence that this specific region contains important variation for the transition to residency, not only on the continent but also on the islands.

Estimates of ΔPBS on Super-Scaffold 99 corresponding with the region shown in Figure 4a (smoothed using the geom_smooth function in ggplot to summarize data in 2500-bp windows).

(a) Estimates for resident continent, medium-distance NW, SW and SE migrants, and short- and long-distance birds. These estimates are only elevated in the resident continent phenotype, ruling out a role for linked selection in generating this signature in residents. (b) Estimates for the resident continent and island birds (Azores, Canaries and Cape Verde), which are all elevated, implying that parallel selection is probably involved in the transition from migration to residency in this region. Colours correspond to Figure 1a with yellow showing data for resident continent birds.

Note that it is possible that the signatures of positive selection that we document here reflect selection based on different ecological variables involved with the colonization of areas further south on the continent, but at least in the case of Super-Scaffold 99, we believe that this is rather unlikely as most ecological variables (biotic and abiotic) are quite distinct between islands and the continent (and between the islands themselves) (Cropper, 2013Valente et al., 2017). The transition to residency is shared, probably representing one of the only shared selection pressures experienced by all of these populations. Note that the lack of consistent results for other regions under selection in the resident continent population does not preclude the potential importance of these regions as, for example, genetic drift on islands would affect which genetic variants were present on islands for selection to act on.

Our finding that only a few genomic regions under selection contain genes and that the strongly associated SNPs identified by CAVIAR are in non-coding regions could suggest that cis-regulatory changes are important for the transition from migration to residency. In support of this suggestion, an alignment of predicted mRNAs from several bird species and transcripts from a testis transcriptome of the blackcap placed two of the SNPs from CAVIAR in the 3′ untranslated region (3′ UTRs) of two genes (GPR83-L on Super-Scaffold 12 and CHST4 on Super-Scaffold 13, syntenic with flycatcher chromosomes 11 and 4a, respectively). Three prime3′ UTRs can act as posttranscriptional regulators; they contain binding sides for microRNAs, which can inhibit translation or target mRNA for degradation (Mayr, 2017; Barrett et al., 2012). In fact, previous work with monarch butterflies identified 55 conserved microRNAs that are differentially expressed between summer and migratory butterflies (Zhan et al., 2011). Future analyses to validate this suggestion could include the use of qPCR to determine whether GPR83-L and/or CHST4 are in fact differentially regulated between the migratory phenotypes.

Future work using techniques aimed at identifying binding sites for transcription factors (e.g., ChIP-seq) could also be useful. We conducted a preliminary analysis here, using HOMER (Heinz et al., 2010) to detect known transcription factor motifs in the genomic regions that are under selection in residents. Specifically, Ruegg et al. (2014) used a literature search to identify 25 candidate genes for migration. Four of these genes are transcription factors whose motifs are in the libraries searched by HOMER: three basic helix-loop-helix transcription factors (bHLH) (Clock, Npas2, and Bmal1) and one basic leucine zipper domain (Nfil3). We found a bHLH motif (GHCACGTG) on Super-Scaffolds 12 and 99 (Figures 3a,b, 4a and 5). The motif on Super-Scaffold 99 is particularly interesting as there is a SNP (G/T) at the beginning of the motif that is nearly fixed in continental residents (the allele frequency for G in Asni, Gibraltar and Cazalla de la Sierra is 1, 0.85 and 0.9, respectively; FST between Gibraltar and medium-distance NW, SW and SE migratory populations is 0.15, 0.25 and 0.44, respectively). This motif could disrupt or weaken transcription factor binding (Kasowski et al., 2010). This is also the genomic region that showed elevated PBS in both resident continent and island populations (Figure 4a, Figure 5). Clock, Npas2 and Bmal1 are involved in maintaining circadian rhythms. Circadian rhythms synchronize circannual clocks, which are important cues controlling seasonal migratory behaviour (Gwinner, 1996; Visser et al., 2010).

Concerning the actual identity of genes within regions that are under selection, several have functions that could be related to the transition from migration to residency. For example, LOC100859173 (located on in the genomic region under selection on Super-Scaffold 12, the region with a bHLH motif mentioned above; Table 1a) has been annotated as a probable G-protein coupled receptor that mediates the function of neuropeptide Y (NPY). NPY is localized in the brain of birds and works with Agouti-related peptide (AGRP) and proopiomelanocortin (POMC) to control energy balance. Specifically, NPY/AGRP neurons stimulate appetite, food intake and fat deposition, while POMC inhibits these processes (Boswell and Dunn, 2017). It has been hypothesized that the effects of NPY may extend to seasonal changes in energy balance that are important for migration, including hyperphagia and fat deposition (Boswell and Dunn, 2017). Beyond its role in energy balance, NPY also facilitates learning and memory via the modulation of hippocampal activity and has an effect on circadian rhythms, reproduction, and the contraction of vascular smooth muscles. It has been suggested that a common genetic mechanism or major regulator may control migratory traits (Liedvogel et al., 2011; Liedvogel and Lundberg, 2014). A protein such as NPY, or the transcription factors that bind the bHLH motif identified in the prior analysis, could fill this role.

Analysis focused on migratory orientation and distance

So far, we have considered all three migratory traits exhibited by blackcaps together (propensity, orientation and distance) and our results relate mostly to residents. The elevated population differentiation that we noted between resident and migratory birds could reduce our power to identify selection that is specific to migrants (Fariello et al., 2013). Accordingly, we ran a second set of analyses excluding resident birds and examining migratory orientation and distance independently. Starting with orientation and limiting our analysis to medium-distance migrants with varying orientations (medium-distance NW, SW and SE migrants, total number of birds included in these analyses = 54, Supplementary file 4), hapFLK identified only six regions that are under positive selection (Table 1b). Most of these regions showed selection in the NW phenotype and exhibited extreme values of ΔPBS limited to the population identified by hapFLK (Figure 3c,d). Figure 4b exemplifies results for hapFLK at one region under selection in the NW migrants (~4 kb on Super-Scaffold 73, syntenic with flycatcher chromosome 5). Results for nSL can be found in Supplementary file 5.

The list of genes in genomic regions that are under selection in this analysis focusing on orientation is small, but it also includes genes with functions that are strongly related to the phenotype they are associated with. For example, SDC1 is a region on Super-Scaffold_17 that is under selection in NW migrants. This gene codes for a transmembrane protein that helps to regulate the Wnt signalling pathway. This pathway plays a role in embryonic development and has been shown to influence feather and beak morphogenesis, along with feather molt (Yu et al., 2004; Mallarino et al., 2011; Bhullar et al., 2015; Widelitz, 2008). NW migrants have rounder wings and more narrow beaks than southern migrants (Rolshausen et al., 2009). Differences in the timing of migration probably mean that birds also molt at different times. This has not been evaluated directly in comparisons between migrants, but variation in molt patterns have been documented between NW migrants and birds that are resident on the continent (de la Hera et al., 2009).

Two previous studies attempted to identify de novo genomic regions under selection related to differences in orientation: Delmore and Liedvogel (2016) with Swainson’s thrushes (Catharus ustulatus) and Lundberg et al. (2017) with willow warblers (Phylloscopus trochilus). Delmore and Liedvogel (2016) identified a region on chromosome 4 and Lundberg et al. (2017) regions on chromosome 1 and 5 that are associated with orientation. None of these regions overlap with those under selection in our study on blackcaps. It is tempting to suggest that migration may be controlled by similar genes across broad taxonomic scales, with early results from candidate genes (e.g., the poly-glutamine repeat in Clock) showing consistent results across groups as divergent as insects, fishes and birds (Delmore and Liedvogel, 2016). Nevertheless, several studies have failed to document an association with Clock, and a comparison of our results with those of Delmore and Liedvogel (2016) and Lundberg et al. (2017) adds further caution to this idea of a common basis (at least at the sequence level). This is an important finding as it has long been hypothesized that there may be a shared genetic mechanism for migration, not only in birds but also in other taxonomic groups (Liedvogel et al., 2011; Liedvogel and Lundberg, 2014; Liedvogel and Delmore, 2018).

None of the regions identified by hapFLK and PBS were fixed for alternate haplotypes or alleles. This fact is evident in Figure 4, in which the regions under selection still include haplotypes from a different cluster, and it could suggest that selection is acting on shared genetic variation (i.e., variation that is already present in the population rather than newly derived mutations). The idea that transitions between migratory phenotypes have been facilitated by shared genetic variation has been around for quite some time in the blackcap literature, particularly as rapid transitions have been observed and include the evolution of a new NW migratory route in the past 70 years. Shared variation can facilitate these rapid changes as these variants are already present in the population and have been tested by selection (Barrett and Schluter, 2008). The fact that regions under selection are quite narrow (Table 1) also supports a role for shared genetic variation (Barrett and Schluter, 2008) and we provide further evidence below.

First, we estimated the genetic distance between one haplotype in each cluster and an ancestral sequence that we derived using WGS from the two most closely related sister taxa, hill babbler (Pseudoalcippe abyssinica, an African resident) and garden warbler (Sylvia borin, a long distance migrant) (Voelker and Light, 2011). Using the region on Super-Scaffold 73 that shows selection in NW migrants (Figure 4b), we predicted that if haplotypes in the light blue cluster were present in the population already, they should exhibit similar levels of divergence from the ancestral sequence as haplotypes from all other clusters. This is precisely what we found; genetic distance from the ancestral sequence was similar for haplotypes from all clusters (181 differences for the NW haplotype vs. 178, 179 and 181 [x3] and 182 differences in the rest). We reran this analysis limiting our data to synonymous substitutions in predicted coding regions (i.e., those that are likely to be evolving neutrally and located in ATG2B and BDKDB) and obtained similar results. Specifically, we identified six synonymous substitutions between all three medium-distance migrant populations and both garden warblers and hill babblers, suggesting that there is no difference in the age of these haplotypes.

To follow up on the former analysis, we constructed a maximum likelihood (ML) tree using sequence data from the region under selection on Super-Scaffold 73. We built this tree using data from all continental blackcaps, garden warblers, and hill babblers, using the willow warbler as an outgroup, and compared this tree to a consensus tree summarizing ML trees constructed for each scaffold in the blackcap reference genome (i.e., a tree built using genome-wide data; Figure 6a). Supporting previous phylogenetic work in the system, garden warblers and hill babblers formed a sister clade to blackcaps in the consensus tree, and relationships among blackcaps were largely unresolved. By contrast, garden warblers were more closely related to blackcaps than were hill babblers in the tree built using data from the region on Super-Scaffold 73 (Figure 6b). In addition, the medium-distance NW population (in which positive selection is acting in this particular region) occurs at the base of the blackcap clade. Recall that garden warblers are obligate migrants whereas hill babblers are residents, supporting the suggestion that haplotypes favoured in the NW phenotype were already present in the population before divergence, perhaps even in ancestral populations. Unfortunately we do not have data from any closely related species to determine how old this haplotype is (i.e., if it is older than the split between garden warblers, hill babblers and blackcaps sensu Colosimo et al., 2005).

Evidence for the use of shared variation on Super-Scaffold 73.

(a) A rooted extended majority rule consensus tree summarizing maximum likelihood (ML) trees constructed for all scaffolds in the blackcap reference genome (96 scaffolds). Node numbers indicate the number of scaffolds in which populations were partitioned into two sets. (b) A ML tree constructed for the region on Super-Scaffold 73 with migratory garden warbler more closely related to blackcaps and medium-distance NW birds occurring at the base of this clade in Figure 4b. Nodes with bootstrap values <80 are collapsed; nodes without numbers have support values of 100. (c) The site frequency spectrum (SFS) for the region on Super-Scaffold 73 (red) compared to SFSs for 1000 random sequences from the genome (varying shades of gray).

In a final analysis, we compared the site frequency spectrum (SFS) for the region on Super-Scaffold 73 to SFSs estimated for 1000 random sequences of the same length from throughout the genome. SFSs for the random sequences are similar to expectations under neutrality, with a preponderance of alleles at low frequencies. By contrast, the SFS of Super-Scaffold 73 shows an excess of mid-frequency alleles (Figure 6c). Greater variance in SFSs are expected when selection makes use of standing variation because alleles have been recombining onto different backgrounds in ancestral populations (Przeworski et al., 2005; Pennings and Hermisson, 2006).

We conclude our study by examining the genetic architecture of migratory distance. We included all migrants in this analysis, quantified migratory distance as an ordinal variable from short- (1), to medium- (2), to long-distance (3) migrants, and used a Bayesian Sparse Linear Mixed Model (BSLMM, 87) to identify SNPs that are associated with migratory distance (total number of birds included in these analyses = 72, Supplementary file 4). BSLMMs are a form of genome-wide association analysis that includes a term for other factors that influence the phenotype and are correlated with genotype (e.g., population structure and ancestry; a kinship matrix based on genome-wide SNP data) and can be used to estimate both the combined effects of multiple SNPs and the effects of SNPs on their own.

Our results suggest that a large percentage of variance in migratory distance can be explained by our SNP set (PVE = 0.90 ± 0.28), but only one SNP showed a strong association with this focal trait (posterior inclusion probability >0.01). This SNP is located on Super-Scaffold 79, occurs in an area of elevated FST between long- and short-distance migrants (FST = 0.31, in 0.018 percentile FST values) and is 627-bp downstream from the gene KCNIP1, which encodes a potassium channel interacting protein (major determinants of neuronal cell excitability). Combined with the haplotype identified in the hapFLK analysis, which provides a signature of positive selection in short-distance migrants on Super-Scaffold 17 (Table 1a), these loci represent good candidates for controlling migratory distance, but future analyses with a larger sample size are needed to confirm the robustness of this finding. Direct information on migratory distance could also inform this analysis by allowing us to code the phenotype as continuous.

Conclusions

Early research on blackcaps was pivotal for demonstrating the existence of a genetic basis of migration and studying its evolution. This is due in large part to the tractability of this species and its variability in migratory behaviour. Here, we have expanded this study system beyond phenotypic and marker-based approaches, launching it into the genomic era and conducting one of the most comprehensive genome-wide analyses of migration to date. Populations of blackcaps began to diverge ~30,000 years ago, but differentiation remains low between migratory populations. There is evidence for past gene flow between migratory and resident populations on the European continent but comparison of the contemporary structure of these populations suggests that gene flow may be limited. This is certainly the case for resident island birds. It has been suggested that one single genetic mechanism controls migratory traits and may be shared across broad taxonomic groups. We do not find evidence for one common genetic mechanism across species here, and no protein-coding change is shared across the three focal traits (propensity, distance and orientation) that we examined in unison. Future work on gene expression may identify major regulators that control multiple migratory traits, and both NPY and bHLH transcription factors are good candidates. Combined with the additional results that we presented here (such as the importance of standing genetic variation), this information is vital for understanding how predictable the evolution of migration and other complex behavioural traits may be.

Blackcaps have not only been relevant to work on the evolution and genetics of migration. Early work in this system suggested that differences in migration might serve as reproductive isolating barriers early in speciation. For example, hybrids were shown to exhibit intermediate orientation behaviour that was predicted to be inferior because it would bring hybrids over large ecological barriers that pure forms avoid (Helbig, 1991b). More recently, it was shown that NW migrants arrive on the breeding grounds earlier than SW migrants, and that these birds mate assortatively on the basis of arrival time, helping to reduce gene flow between phenotypically distinct groups (Bearhop et al., 2005). The role of migration in speciation has gained considerable traction in recent years (Rolshausen et al., 2009; Bearhop et al., 2005; Irwin and Irwin, 2005; Rohwer and Irwin, 2011; Turbek et al., 2018; Delmore and Irwin, 2014; Bensch et al., 2009) and results from our study suggest that selection at a very small number of loci may be sufficient to initiate reductions in gene flow very early in the process of population differentiation and speciation.

Materials and methods

Genome assembly

Request a detailed protocol

Blood samples from two male blackcaps from the Mooswald breeding population at Freiburg im Breisgau, Germany, classified as medium-distance SW migrants (on the basis of morphometrics and isotope signatures) were used to assemble the reference genome. Full details on all steps in our genome assembly can be found in Supplementary file 6 (BioProject number PRJNA545868; Guojie Zhang, personal communication). Briefly, genomic DNA from one individual was used to sequence Illumina sequencing libraries (fragment and mate pair libraries with insert sizes of 2, 5 and 10 kb). 275.9 Gb of raw high throughput sequence (HTS) data were generated and assembled using ALLPATHS-LG. This assembly was improved several ways (e.g., by removing duplicates and closing gaps). DNA from the second individual was used to generate two BioNano optical maps (one using BspQI and the other BssSI). These maps were used to super-scaffold HTS scaffolds. Statistics for the final assembly and each stage can be found in Supplementary file 2 .

We used SatsumaSynteny (Grabherr et al., 2010) to determine which avian chromosome each scaffold was found on (aligning scaffolds to the flycatcher genome, Supplementary file 3). We validated our initial ALLPATHS assembly, the improved ALLPATHS assembly and our final assembly (including BioNano optical maps) using BUSCO (version 3.0.2, AUGUSTUS species chicken and aves_odb9 dataset) and by blasting ultra-conserved elements (UCEs) identified by Faircloth (2016) using whole-genome alignments for the chicken and zebra finch (Supplementary file 2).

Genome annotation

Request a detailed protocol

We annotated genes with putative functions and protein domains using MAKER. Gene prediction was performed using a de novo testis transcriptome of blackcaps and cDNAs from three avian species (zebra finch, chicken and flycatchers) from the ensembl database. Following MAKER, we obtained the predicted protein sequences to annotate genes functionally using blastp and Interproscan. For the final annotation, we only included gene predictions that either had an Annotation edit Distance (AED) <0.5 and/or a blastp hit (with the thresholds described above) and/or a predicted protein domain.

Resequencing analysis

We obtained whole genome resequencing (WGS) data from 110 male blackcaps (including WGS data from the two individuals used to generate the reference genomes). High molecular weight DNA was extracted from blood withdrawn from the brachial vein, following a standard salt extraction protocol. Individual samples were collected across the European breeding range including three island populations (Canary Islands, Cape Verde, and Azores) and covering the entire range of migratory phenotypes. Population phenotype was scored on the basis of morphometry, stable isotope signature and/or ringing-recovery data from selected individuals (see Supplementary file 4 for a description of how each population was phenotyped). Birds were sampled during the breeding season unless indicated otherwise. Specifically, exceptions are a subset of UK overwintering birds (n = 6) sampled during the winter in the British Isles, and a subset of long-distance SE migrants (n = 5) caught during autumn migration and selected on the basis of wing length (see Supplementary file 4 for details). We also obtained WGS data for five garden warblers and three hill babblers, the closest sister taxa to blackcaps, sampled during breeding (Voelker and Light, 2011). We prepared small insert libraries using DNA from each individual and sequenced five samples per lane on NextSeq 500 with paired-end 150 bp reads. We trimmed reads with trimmomatic (TRAILING:3 SLIDINGWINDOW:4:10 MINLEN:30) (Bolger et al., 2014).

All analyses made use of data from resequencing reads that were aligned to the reference genome using bwa mem (Li and Durbin, 2009) or stampy in the case of the garden warblers (divergence time of 0.026 based on alignments of UCEs (Faircloth, 2016https://github.com/faircloth-lab/phyluce/). GATK (McKenna et al., 2010) and picardtools (http://broadinstitute.github.io/picard) were used to identify and realign reads around indels (RealignerTargetCreator, IndelRealigner) as well as remove duplicates (MarkDuplicates, all default settings).

We recalibrated the resulting bam files using GATK’s base quality score recalibration (BQSR). Specifically, we called SNPs for each population separately using three different programs and default settings: UnifiedGenotyper from GATK, samtools (Li et al., 2009) and FreeBayes (Garrison and Marth, 2012). BQSR requires a set of known variants. We used SNPs identified in all three programs and populations as the set of known variants for the first round of BQSR. We conducted a second round using common SNPs from the three programs that were also of high quality (QUAL >995,~10% of the common SNPs).

Most of our analyses made use of the BQSR recalibrated bams, calling genotype likelihoods (GL) with ANGSD (version 0.910–24-gf84f594, Korneliussen et al., 2014) and filtering reads that did not map to a unique location, did not have a mapping pair, or had mapping qualities below 20 and flags ≥256. When it was not possible to use GL as input, we used a vcf that had been run through GATK’s variant quality score recalibration (VQSR). VQSR also requires a set of known SNPs. We used the second set of known SNPs (common and high-quality) from BQSR for this analysis and combined variants from all populations into a single vcf file for subsequent analyses. All repetitive regions were excluded from our analyses and those focused on demography did not include the Z chromosome.

Principal components analysis (PCA) and ADMIXTURE analyses

Request a detailed protocol

We conducted a PCA using smartpca (EIGENSOFT version 5.0) and the vcf produced from VQSR. Default parameters were used in smartpca except for the addition of a correction for LD across SNPs (nsnpldregress = 2). We conducted an admixture analysis using GLs from ANGSD and running them through ngsADMIX (Skotte et al., 2013) with 8 values of K (1–9).

FST

Request a detailed protocol

We estimated FST between all populations using GLs from ANGSD, starting by estimating unfolded site frequency spectrums (SFS) for each population (doSaf 1, gl 1) and using them to obtain joint frequency spectrums (2DSFS, realSFS) for each pair of populations. These 2DSFSs were used as priors for allele frequencies at each site to estimate FST (realSFS fst index). In order to estimate unfolded SFS, we needed an ancestral sequence, or the ancestral state of variants segregating in blackcaps. This sequence was generated using WGS from garden warblers and hill babblers. Specifically, we used samtools to generate fasta files for each garden warbler and hill babbler (n=5 and n=3, respectively) and used rules outlined in Poelstra et al. (2014) to call ancestral states, with alleles that were homozygous in both outgroup species being considered ancestral and excluding remaining sites (those that were triallelic or heterozygous in the outgroup species).

Consensus tree

Request a detailed protocol

We obtained consensus fasta sequences for each population using ANGSD (-doFasta 2 –doCounts 1 –minQ 20 –setMinDepth 5) and used IQTREE (Nguyen et al., 2015) to construct maximum likelihood trees for each scaffold in the blackcap genome (there was no difference in the topology obtained for scaffolds mapping to the Z chromosome so they were included in the consensus, data not shown). We summarized the resulting trees using phylip ‘consense’ and constructing an extended majority-rule consensus tree (in which nodes that were supported by fewer than 50% of the input trees are collapsed).

MSMC2

Request a detailed protocol

We used MSMC2 to infer the demographic history of blackcaps in our dataset. MSMC2 implements the multiple sequentially Markovian coalescent (MSMC) model, estimating effective population size by time and relative cross-coalescence rates between any two populations. It allows inference of the expansions and contractions of a population and of the extent and timing of population divergence (Malaspinas et al., 2016). Specifically, by running a hidden Markov model (HMM) along all possible pairs of haplotypes, MSMC2 estimates the free parameters for a demography model (a series of effective population sizes as a function of segmented time) and relative cross-coalescence rates between sequences using a maximum-likelihood approach.

After phasing our data using fastphase (Scheet and Stephens, 2006), we combined individuals into six groups (medium and long migrants [‘med + long’], short-distance SW migrants (‘short’), resident continent birds, and resident island birds from the Azores, Cape Verde, and Canary Islands). We grouped medium (NW, SW and SE) and long-distance migrants because they exhibited very little population structure (Figure 1) and indistinguishable demographic histories (Figure 2—figure supplement 4; Figure 2—figure supplement 5; Figure 2—figure supplement 6). We excluded any birds with less than 15x coverage. This filter left us with all island individuals (five individuals for each island), five short migrants, 19 continental residents, and 44 med + long migrants. To avoid bias associated with the use of unequal numbers of individuals from each group, we randomly down-sampled five individuals from med + long migrants and continental residents to create 10 sample groups. We used the bamCaller.py script provided in the msmc-tools package (https://github.com/stschiff/msmc-toolsKhvorykh, 2018) to create sample-specific callability mask files. We generated a global mappability mask file for the reference genome using GEM (Derrien et al., 2012). We inferred effective population size by running MSMC2 separately for each group (Schiffels and Wang, 2020). We determined the number of clusters for fastPHASE using a cross-validation procedure (https://github.com/inzilico/kselection/ Khvorykh, 2018). Statistical phasing (i.e., phasing without a reference panel) can be error prone, but fastPHASE is commonly employed in non-model organisms and is well-suited to datasets like ours that include high density SNPs on a physical map (Scheet and Stephens, 2006; Burri et al., 2015; Kawakami et al., 2017).

The analysis of cross-coalescence rates requires comparisons between groups and we considered all possible combinations of groups for our analysis (Schiffels and Wang, 2020). In other words, we ran analyses for all 15 possible combinations (three between groups on the continent, three between populations on the islands, and nine for comparisons between the three continent groups and three island populations). For each pairwise combination, we ran the combineCrossCoal.py script from msmc-tools (https://github.com/stschiff/msmc-tools) and computed the relative cross-coalescence rate by dividing the between-populations coalescence rate by the average within-population coalescence rate. We scaled results using a mutation rate of 3 × 10−9/gen/site and a generation time of 2 years (Nadachowska-Brzyska et al., 2016; Nadachowska-Brzyska et al., 2015).

hapFLK

Request a detailed protocol

hapFLK is a tree-based method that is used to identify genomic regions that are under selection. This program permits the inclusion of two or more populations and accounts for both drift within populations (different Ne) and covariance across them (hierarchical structuring). We used the vcf from VQSR as input for this analysis, applying two additional filters for the inclusion of variants: minimum number of individuals/phenotype = 5 and minor allele frequency of 0.05. hapFLK also requires an estimate of the number of clusters into which haplotypes can be grouped. We ran this analysis for the complete dataset including all populations, and for a restricted dataset including only medium-distance migrants. We determined the number of clusters for each dataset separately using fastPHASE (Scheet and Stephens, 2006) and the cross-validation procedure mentioned earlier.

Once hapFLK is estimated, it is normalized using rlm in R, and p-values are computed from the chi-squared distribution. We used a permutation analysis to establish a threshold, beyond which genomic regions would be considered to be experiencing positive selection. Specifically, we randomly shuffled population labels 100 times, re-estimated hapFLK and p-values, recorded the lowest p-value for each randomization and set the threshold to the fifth percentile across randomizations. Once these regions were identified, we determined which population was experiencing selection by comparing branch lengths for a tree built using data from the entire genome and one built using data from the region under selection. Note that results from analyses using medium-distance migrants are plotted using the resident phenotype for illustrative purposes, but the analysis was not run using these birds.

We include birds from three resident continent populations – Cazalla de la Sierra and Gibraltar in the Iberian Peninsula along with Asni in Morocco (only three birds were sampled from this African population, precluding its use in the present analysis; Supplementary file 4). The Iberian Peninsula where the other two populations are found is highly heterogeneous as a result of the effects of mountains and plateaus that create variation in seasonality and, consequently, in the intensity of blackcap migratory behaviour (Pérez-Tris and Tellería, 2002; Tellería et al., 2001). There is also some evidence in our PCA to show that this heterogeneity has led to some differentiation between populations, as birds from Cazalla de la Sierra exhibit values more similar to migrants on PC2 (Figure 1c). Accordingly, to avoid any confounding effects from population structure, we limited our analysis to birds from Gibraltar. Results using Cazalla de la Sierra instead were very similar. For example, all of the genomic regions identified in Table 1b were also in the top 1% of the ΔPBS distribution when Cazalla de la Sierra was used as the continental reference population instead of Gibraltar.

CAVIAR

Request a detailed protocol

The principle described above for hapFLK focusing on haplotype clusters can also be applied to SNPs (FLK). We used results from an analysis with FLK and limited to genomic regions, which showed evidence of positive selection from hapFLK, to identify independent strongly associated SNPs with CAVIAR (CAusal Variants Identification in Associated Regions [Hormozdiari et al., 2014]). CAVIAR was originally designed to identify independent causal SNPs in GWAS studies. We followed methods described in Rochus et al. (2018) to modify this method for FLK, identifying SNPs with p-values <0.0001 in hapFLK outlier regions and using a correlation matrix generated by FLK by decomposing signals into loading on orthogonal components (vs. p-values from a GWAS and LD as is traditionally done with CAVIAR).

Δ PBS

Request a detailed protocol

We used a modified version of PBS (Population Branch Statistic) to complement results from hapFLK. PBS is similar to FST, but can include more than two populations and identifies regions within each population that exhibit differences in allele frequencies. This statistic was originally designed for three populations, but can be expanded to include more populations (Zhan et al., 2014). We used GL from ANGSD to obtain estimates of FST following the procedure described above (summarized into windows of 2500 kb) and used the equation below to estimate PBS from these values. This equation is an example that was applied to resident populations (R), where T is log transformed FST between the populations indicated in exponents:

TR-NW+ TR-SW+ TR-SE- TNW-SW- TSW-SE3

Recent papers have noted that FST can be elevated by reductions in within-population variation alone and that there are many factors that can reduce variation within populations, including linked selection in areas of reduced recombination that may result from purifying selection (background selection, [Cruickshank and Hahn, 2014; Noor and Bennett, 2009]). It is unlikely that this process affects our results because recombination rate should elevate estimates of PBS in all populations, but this is not the case (Figure 5a). Regardless, we followed methods from Vijay et al. (2017) to reduce any effects that linked selection may have on our results. Vijay et al. (2017) used estimates of FST between allopatric populations of crows that did not differ in their trait of interest to control for the effects of linked selection, estimating the difference in estimates of FST in focal populations and maximum FST in non-focal allopatric populations (ΔFST). FST in focal populations would have to extend beyond that in non-focal populations to be considered important in generating the trait of interest. We used the same approach for PBS. For example, ΔPBS for resident continent populations was estimated by finding the difference between PBS in residents and maximum PBS in medium-, short- and long-distance migrants.

nSL

Request a detailed protocol

The former analyses (hapFLK and PBS) rely on comparisons between phenotypes. In this last analysis, we focus on the affects that selection can have within a population instead. Specifically, selective sweeps can reduce variation at both the locus under selection and its neighbours (Smith and Haigh, 1974). Local reductions in variation result in the presence of extended regions of haplotype homozygosity within phenotypes (long haplotypes at high frequency). nSL (number of segregating sites by length) (Ferrer-Admetlla et al., 2014) is similar to the more common iHS, but instead of measuring the decay of haplotype identity as a function of recombination distance, it quantifies this decay of how many mutations remain in other haplotypes present in the dataset. In this way, nSL does not require a genetic map and is more robust to variation in not only recombination rate but also mutation rate.

For this analysis, we used selscan (v.1.20a https://github.com/szpiech/selscan) and the same vcf used in hapFLK, but split by phenotype (and scaffold). We ran the data through fastPHASE first to phase haplotypes (using 50 iterations of the EM algorithm, sampling 100 haplotypes from the posterior distribution and using same number of clusters identified for hapFLK). We normalized estimates of nSL into the same 2500-kb windows used for PBS.

Regulatory variants

Request a detailed protocol

Two sets of preliminary analyses were used to identify regulatory SNPs in the regions identified by hapFLK and PBS as being under selection. First, we focused on 3′ UTRs, downloading predicted mRNAs from Ensembl and NCBI for several bird species, including the Atlantic canary, White-throated sparrow, American crow, Great tit, Collared flycatcher, Zebra finch, Wild turkey, White-rumped munia, Hooded crow, Blue tit and Ground tit. We aligned these sequences with our annotation for the blackcap, and with transcripts assembled from RNAseq data obtained from the testes of a single male blackcap, to determine whether any of the strongly associated SNPs identified by CAVIAR were within 3' UTRs. Alignment files are available upon request.

In a second set of analyses, we used HOMER (Heinz et al., 2010) to identify known transcription factor binding sites (TFBS) in genomic regions under selection. Specifically, we used findMotifsGenome.pl with default settings to identify known motifs in each region and scanMotifGenomeWide.pl to identify the specific location in each region where the motif could be found (permitting no mismatches). HOMER includes known motifs for thousands of transcription factors (mostly for model organisms); we chose to focus on candidate transcription factors identified by previous studies as having an association with migration (Ruegg et al., 2014).

GWAS

Request a detailed protocol

In our final analysis on migratory distance, we limited our dataset to short-, medium- and long-distance migrants. We coded distance phenotype as an ordinal variable from 1 to 3 and conducted a GWAS analysis using a Bayesian sparse linear mixed model (BSLMM) (Zhou et al., 2013). We chose BSLMM models here (instead of hapFLK) because they allow the inclusion of ordinal variable (vs. categorical with hapFLK). BSLMM models include the phenotype as the response variable and allele frequencies at a set of SNPs as the predictor variable. They also include a term for factors that influence the phenotype and are correlated with genotype (e.g., population structure). BSLMMs are adaptive models that include linear mixed models (LMM) and Bayesian variable selection regression (BVSR) as special cases and that learn the genetic architecture from the data. We ran four independent chains for each BSLMM, with a burnin of 5 million steps and a subsequent 20 million MCMC steps (sampling every 1000 steps). We report one hyperparameter from this model (PVE: the proportion of variance in phenotypes explained by all SNPs, also called chip heritability) and consider SNPs with inclusion probabilities >0.01 following Gompert et al. (2013). Note, we chose to run this analysis with GEMMA instead of hapFLK as we did with our other migratory traits (orientation and propensity). This is because our focal variable here (distance) is ordinal in nature and this fact would have been lost in hapFLK. We could not code this variable as continuous because the average distance individuals in each population travel on migration is not exactly known.

References

  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
  9. 9
  10. 10
  11. 11
  12. 12
  13. 13
  14. 14
  15. 15
    The Birds of the Western Palearctic Vol VI
    1. S Cramp
    (1992)
    Oxford, UK: Oxford University Press.
  16. 16
  17. 17
  18. 18
  19. 19
  20. 20
  21. 21
  22. 22
  23. 23
  24. 24
  25. 25
  26. 26
  27. 27
  28. 28
  29. 29
  30. 30
  31. 31
  32. 32
    Circadian and circannual programmes in avian migration
    1. E Gwinner
    (1996)
    The Journal of Experimental Biology 199:39–48.
  33. 33
    Avian Migration
    1. E Gwinner
    2. B Helm
    (2003)
    81–95, Circannual and circadian contributions to the timing of avian migration, Avian Migration, Springer, 10.1007/978-3-662-05957-9.
  34. 34
  35. 35
  36. 36
    Experimental and analytical techniques used in bird orientation research
    1. AJ Helbig
    (1991a)
    In: P Berthold, editors. Orientation in Birds Experientia Supplementum. Basel: Birkhäuser Basel. pp. 270–306.
    https://doi.org/10.1007/978-3-0348-7208-9
  37. 37
  38. 38
  39. 39
    Genetic basis, mode of inheritance and evolutionary changes of migratory directions in palaearctic warblers (Aves: sylviidae)
    1. A Helbig
    (1996)
    The Journal of Experimental Biology 199:49–55.
  40. 40
  41. 41
    Siberian migratory divides: the role of seasonal migration in speciation
    1. DE Irwin
    2. JH Irwin
    (2005)
    In: R Greenberg, PP Marra, editors. Birds of Two Worlds: The Ecology and Evolution of Migration. Baltimore: Johns Hopkins University Press. pp. 27–40.
  42. 42
  43. 43
  44. 44
  45. 45
  46. 46
  47. 47
  48. 48
  49. 49
    Bird Species
    1. M Liedvogel
    2. K Delmore
    (2018)
    109–127, (Micro) Evolutionary Changes and the Evolutionary Potential of Bird Migration, Bird Species, Springer, 10.1007/978-3-319-91689-7.
  50. 50
    The genetics of animal movement and migration syndromes
    1. M Liedvogel
    2. M Lundberg
    (2014)
    In: L-A Hansson, S Åkesson, editors. Animal Movement Across Scales. New York: Oxford University Press. pp. 219–231.
    https://doi.org/10.1111/aec.12369
  51. 51
  52. 52
  53. 53
  54. 54
  55. 55
  56. 56
  57. 57
  58. 58
  59. 59
  60. 60
  61. 61
  62. 62
  63. 63
  64. 64
  65. 65
  66. 66
  67. 67
  68. 68
  69. 69
  70. 70
  71. 71
  72. 72
  73. 73
  74. 74
    Statistical Population Genomics. Methods in Molecular Biology
    1. S Schiffels
    2. K Wang
    (2020)
    MSMC and MSMC2: the multiple sequentially markovian coalescent, Statistical Population Genomics. Methods in Molecular Biology, New York, NY, Humana, 10.1007/978-1-0716-0199-0.
  75. 75
  76. 76
  77. 77
    Seasonal changes in abundance and flight-related morphology reveal different migration patterns in iberian forest passerines
    1. JL Tellería
    2. J Pérez-Tris
    3. R Carbonell
    (2001)
    Ardeola 48:27–46.
  78. 78
  79. 79
  80. 80
  81. 81
  82. 82
  83. 83
  84. 84
  85. 85
  86. 86
  87. 87
  88. 88
  89. 89

Decision letter

  1. Elizabeth Scordato
    Reviewing Editor; California State Polytechnic University, United States
  2. Patricia J Wittkopp
    Senior Editor; University of Michigan, United States

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

[Editors’ note: the authors submitted for reconsideration following the decision after peer review. What follows is the decision letter after the first round of review.]

Thank you for submitting your work entitled "The evolutionary history and genomics of European blackcap migration" for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by a Senior Editor. The reviewers have opted to remain anonymous.

This manuscript compiles an enormous dataset to address questions about the genetic basis of migratory behavior in a classic model system in the field. The reviewers and reviewing editor were enthusiastic about the topic and united in finding the dataset extremely impressive. They also thought several key results of the paper were important. In particular, the data add to growing evidence that there is not a common genetic mechanism underlying migratory behavior. Furthermore, the attempt to decompose a complex behavior into component parts (propensity, distance, and orientation) was novel and interesting.

Despite this enthusiasm for the work, a number of significant issues were raised about the current presentation of the data and some of the analyses. These major concerns are synthesized below. After extensive discussion, the revisions required for the paper to be acceptable were deemed to be too substantial to be completed in the two month period allowed for revisions at eLife. We have therefore recommended rejection of this manuscript in its current state. Given the great promise of the dataset, however, the reviewers encourage submission of a significantly revised manuscript as a new submission in the future.

1) A primary issue identified by all reviewers is that the paper was generally hard to follow and contained many unrelated analyses. The Introduction does not set up clear questions and gives little background information on why the blackcap system is uniquely well-suited for addressing questions about the genetic basis of migratory behavior. The Introduction also does not set up the key findings of the paper. It is recommended that the authors re-focus the overall framing of the paper to highlight their key results, remove tangential analyses (see below for specifics), and generally streamline and contextualize results within the existing literature throughout the entire paper.

2) There was concern from all reviewers about the phylogeography section. These analyses were not clearly integrated with the rest the paper. Furthermore, the way the PCA and ADMIXTURE plots were used to interpret ancestral state seemed very ad-hoc, and the maximum likelihood tree was not well supported. The inference that migration is the ancestral state and was lost on islands is generally reasonable, as that is a pattern found in other taxa. However, if the authors choose to include the phylogeography section moving forward, removal of the PCA and ADMIXTURE interpretations is recommended. The authors should also acknowledge the limitations of their inference more openly, given the weak support in their data. Further, the overall contribution of the phylogeographic analysis to the main conclusions of the paper should be clarified throughout the text.

3) The use of SMC++ to infer divergence times was considered problematic, as this method assumes populations do not exchange genes after divergence. To more accurately date divergence times, the authors should use a demographic modeling approach based on the joint site frequency spectrum (e.g. dadi or moments) or evaluate alternative models with and without migration in an ABC framework. Alternatively, the authors could refocus that section on variable demography rather than the timing of splits. Demographic trajectories diverging earlier than previously estimated split times could then be a "suggestive" point, but shouldn't be the primary finding of that analysis. It might be useful to refer to Mazet et al., 2016 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4806692/) for guidance with these interpretations.

4) There were concerns about the authors overinterpreting causal associations between migratory behavior and loci putatively under selection in both the orientation and residency sections. Particularly in the transition to residency section, outliers cannot be separated from population structure, and could therefore be due to a variety of factors unrelated to migration. The authors suggest that finding an elevated region on super scaffold 99 in both island and resident populations indicates this region may be associated with the transition to residency. This analysis is rather unconvincing, as these shared elevated regions could be due to shared ancestry. It is also not clear if the extensive subsequent discussion of candidate loci focuses only on this shared elevated region on scaffold 99, or looks at all outliers. The language in this section should tempered, alternative explanations discussed, and it should be made clear which populations are included when identifying candidate loci.

5) Analyses of selection on standing genetic variation: as with the inferences about phylogeography, these sections read as somewhat ad-hoc and did not use previously validated methods. The first section about standing genetic variation (dealing with short distance migrants) is weakly supported- removal this section is recommended. The second section, if retained, should use methods that have been previously shown through theory or simulation to distinguish selection on standing genetic variation. It is also recommended that the authors articulate clear predictions about expected results if migratory behavior arises from selection on novel vs. standing variation.

6) More information needs to be given about sampling and phenotyping. This is particularly important for explaining how phenotypes were assigned to "short-distance SW migrants," which fall into two different clusters on PC2. This is a major gap in the paper.

7) There were concerns about the analysis of migratory distance, both in classifying distance as a categorical trait and in why a different method for identifying outliers was used in this section. The analysis of distance, if retained, should either apply the same approaches for outlier detections as were used for the analyses of propensity and orientation, or clearly explain use of a different approach. There should also be further justification for classifying distance as a categorical variable.

Reviewer #1:

This paper uses the European blackcap, a classic model system in migration research, to reconstruct the evolution of migratory behavior and identify genomic regions associated with different components of migratory phenotype. To me, the most impactful results are that the authors do not find candidate loci associated with migratory behavior in other systems to be under selection in the blackcap, adding to growing evidence against a common underlying genetic architecture of migratory behavior. Also important is that the authors looked at multiple components of migratory behavior (propensity, orientation, and distance). However, while the dataset and analyses are very impressive, I think that the authors try to do too much at the expense of clarity and a focused message. Several sections of results are unclear, key information about sampling is missing, there are many tangential and speculative sections, and the paper as a whole does not have a strong unifying framework or question.

1) While the dataset here is remarkable, the structure of the paper is quite challenging. The Introduction, in my opinion, does not articulate a clear overarching question, and the questions that are laid out in the Introduction are not linked well to the analyses presented in the paper. For example, the Introduction seems to set up a study focused on the genetic architecture of migration. However, the first part of the paper is a lengthy analysis of population structure and phylogeography that seems unrelated to any topics introduced in the Introduction and is not connected to the background provided on the system. When we get to the genetic architecture component of the paper, it is not contextualized well with previous knowledge of the system, and it is therefore hard to evaluate the significance of the results. It is also not clear how the earlier pop structure and phylogeography component is related to analyses of selection and genetic architecture.

2) Very little information is given about the sampling or how phenotypes were identified. The only information provided is in the legend of Figure 1, which states that samples were collected on the breeding grounds with the exception of a few collected during the winter or during migration. This is problematic, given that the paper relies on individual-level phenotype assignments for these samples in all analyses. Explanations of how migratory distance, propensity, and orientation are determined for individual samples are needed.

3) The section on inferring ancestral state reads as very ad-hoc and confusing. The way the PCA is interpreted is not convincing and is not supported by any citations. The results from TREEMIX and ADMIXTURE are conflicting, but no explanation for how to interpret this discordance is offered. The purpose of this section overall is also not entirely clear- how do these analyses link back to overall questions about genetic architecture of migration?

4) The treatment of short-distance migrants is confusing. Based on the PCA, ADMIXTURE plot, and the map, it looks to me like these populations are admixed between residents and migrants (possibly due to post-Pleistocene secondary contact, based on the authors' explanation of biogeography). It is unclear to me how any of these data support variation in migratory behavior arising from standing genetic variation. Are the authors suggesting that individuals in the short-distance migrant population have different migratory strategies (e.g. some are residents and some are migrants) due to variation within the population? Later evidence for the role of standing genetic variation is more convincing. The purpose of this section needs to be made clearer.

5) I was surprised, given low support for the maximum likelihood tree and the history of gene flow indicated by TREEMIX, that the authors didn't use a method of demographic reconstruction that allows migration and determination of the timing of admixture events.

6) The way results on the propensity to migrate are presented strongly suggests that regions under selection in non-migratory populations are directly associated with loss of migratory behavior. However, these regions could just as reasonably be under other sources of divergent selection between these populations (e.g. ecological differences, differences in diet, etc). Couldn't shared regions under selection in island and resident populations also be due to gene flow and shared ancestry in these populations, as indicated by TREEMIX? Overall, I don't find it very convincing that regions under selection in non-migratory populations are directly associated with migratory propensity. The authors should either temper this section and discuss alternative explanations, or make the support for their assertion clearer.

7) In the first analyses of genomic regions under selection, the authors "consider all three traits together." What does this mean? What groups of samples were actually compared?

8) I wondered why a different method was used for the analysis of migratory distance than was used for orientation and propensity. Why not make these analyses comparable? This last section in general felt somewhat tacked-on and not well integrated into the rest of the paper.

9) Overall, I think the results could be more clearly contextualized within the broader literature throughout the paper. The authors have a very cool dataset, but the parts that are novel and exciting are not highlighted very well throughout. The amount of background information given seems to assume familiarity with the blackcap system, which limits the accessibility of the paper to a broad audience. I think more effort needs to be put towards explaining why this system is so well-suited for asking these questions, and what new things we learn about migratory behavior from these analyses.

Reviewer #2:

In this manuscript, the authors use whole-genome data to study the genetic basis of three migratory traits in European blackcaps: the propensity to migrate, distance, and orientation. Leveraging the diversity of migratory strategies in blackcaps, the authors document associations between migratory behavior and SNPs in regulatory regions – providing the first genome-wide characterization of migratory behavior in this species. Not only is this an interesting study system, but the sampling and experimental design also provide a robust foundation for investigating the genomic basis of migratory behavior. In addition, I found the methodological approach to be sound and thorough and was particularly impressed with the sample sizes/coverage for re-sequencing data as well as the quality of the reference genome. Overall, I thought it was an interesting data set and enjoyed reading the manuscript.

I have a few comments/questions outlined below. It is my opinion that, if these concerns are addressed, this work would make a nice contribution to the existing literature.

Subsection “Resident populations evolved from a migratory ancestor”: While I like this section, I think you can combine the TREEMIX and demographic analyses to cover most of this section in a more compelling way. While I don't entirely disagree, I find the use of the PCA and ADMIXTURE results to infer ancestral state a bit weak compared to the demographic data. I think you have enough analyses outside of the PCA and ADMIXTURE results to draw conclusions regarding ancestral state that are more appropriate for this type of question. I would recommend that you narrow this section down to focus on fewer, but more definitive, analyses, as it would strengthen the overall argument.

Subsection “Evidence for standing genetic variation in short distance migrants”: Similar to my comment above, I found parts of this section distracting and less compelling compared to the rest of the manuscript. I think you could remove this section as I am not really sure it adds much to the paper. I feel similarly about subsection “Selection on shared genetic variation could facilitate rapid changes in migration”.

Subsection “Resequencing analysis” paragraph three: This is more just curiosity: did you try using HaplotypeCaller from GATK? I think the fact that variant calling in GATK is combined with other programs makes this a robust approach but was wondering why you chose UnifiedGenotyper.

Subsection “Linkage disequilibrium”: I don't believe that there is any mention of linkage disequilibrium in the main text of the manuscript. May be worth mentioning somewhere in the results

Figure 4: Add a Y-axis label to panel c

Figure 5: I think the figure is very aesthetically pleasing, but I am really struggling to understand what is going on in this figure. It may just be a matter of clarifying the legend, or perhaps it is just me. But I am not convinced this figure adds much. Also, I think you mean to say "panel f shows haplotype.…."

Reviewer #3:

In this study the authors analyze genetic variation in what has historically been the model species for songbird migration and present a set of results covering both phylogeographic history and selection on genes putatively associated with migratory behaviors. Helbig and Berthold's captive breeding studies in black-caps showing that migratory orientation is heritable are classics in ornithology, and an obvious followup question is "what genes are responsible for this behavior?", so I was excited to see a rigorous genome-wide analysis of the species. From that perspective I think one of the most interesting thing here is that this is now the third study (fourth if you count the brand new Toews et al. PNAS study) looking for associations of specific genomic regions with migratory behavior, and as far as I can tell there are zero overlapping outlier regions across these studies. The authors make this point but I think it could be emphasized, particularly given that much of the interest in migration as a tractable trait for genetic mapping is driven by the captive breeding studies conducted in this species.

In general I thought the new data in this paper was very good, but the results presented cover so much territory that individual analyses sometimes feel rushed and the whole picture is hard to follow. I have a few methodological issues with the phylogeography, and for the selection/association analyses I think the more speculative post-hoc analyses of specific genes should be limited. One general issue that should be addressed more directly when describing results for regions putatively associated with migratory behaviors is that migration itself may be tangential to the actual selective force – climatic variation on either the breeding or wintering range, dietary differences across geographic regions, or any other environmental factor varying among populations could create the basis for selection that could be detected by approaches like PBS. Because migratory phenotypes covary with many of these other environmental factors, it is inevitably going to be difficult to identify genes that drive (rather than being driven by) aspects of migration like orientation, distance, or phenology.

That being said I think this is an important paper in the area of migration genomics because it is in the only system with truly compelling captive-breeding results from crosses and because the dataset is excellent.

Subsection “Differentiation between populations is low”: Are migrant populations differentiable on other PC's? What about if PCA is run on just continental, or just migrant populations alone? Because PCA will represent variance in the full dataset, running it with divergent island populations may mask differentiation among migrant populations. Probably same issues in the admixture analysis. I'd like to see a supplemental figure showing at least PC1-2 when the analysis is run on just continental birds.

Subsection “Resident populations evolved from a migratory ancestor”: The methods for ancestral state reconstruction here were unfamiliar to me and seemed ad-hoc. Are there citations available for studies outlining the logic of using PCA and/or admixture results for this task? As it is the clearest evidence seemed to be from the phylogenetic analysis, which relies on poorly supported nodes in a topology constructed using a method that effectively assumes no gene flow.

Subsection “Divergence began ~250,000 year ago”: SMC++ assumes populations are isolated, so it isn't an appropriate method for dating population splits in groups that continue to exchange genes after divergence. In addition, the mutation rate and generation times (though they look about right relative to other migratory birds) don't appear to be pulled from the citation listed (Noor and Bennett, 2009 – apologies if I missed it buried in there somewhere), and these will directly scale the inferred timing of population size changes. The message that divergence is old is likely right – if gene flow occurs but isn't being captured by the analysis then divergence times should be *even older*, but if divergence time estimation is the goal then a demographic inference method explicitly incorporating gene flow and returning estimates of split times should be employed, and the empirical rates should be better justified. I'd suggest demographic modeling of the joint site frequency spectrum in dadi or moments. In addition, the end of this section regarding the possible speed of evolution of variation in migratory traits should acknowledge that much of that literature is based on the documented contemporary evolution of the NW migration and not on biogeographic reconstruction of island colonizations.

Subsection “The transition to residency may be controlled by regulatory elements” paragraph two: It was hard to tell what in this paragraph referred to results from this study, vs results from previous studies. Are Clock/Npas2/bmal1 in an outlier region here? Or is it just that there is one bHLH motif found in an outlier region? If the latter, I'd suggest cutting the second half of this section as the evidence of any specific regulatory element being involved seems quite weak.

Subsection “Examining genes associated with the transition to residency” paragraph one: Needs citations.

Subsection “Limited overlap with previous genomic studies examining this trait” paragraph two: Worth making the point here that though this gene is *associated* with migratory orientation, the proposed mechanism is not at all *causal* for orientation. This point could be more generally made in the Introduction, as well.

Subsection “Considerable variance in migratory distance is controlled by genetic variation”: This analysis did not seem particularly compelling, and (given that migratory distance is actually a continuous trait) the binning of distance as an ordinal trait seems likely to offer little power to identify causal alleles in a small cohort like this. It also isn't clear to me why a new method is used here that wasn't used in the previous association/selection analyses. I suggest cutting this section.

Subsection “Selection on shared genetic variation could facilitate rapid changes in migration”: I like this question quite a bit and think there are some cool results here, but I found this section hard to follow. I think it needs a rewrite, and the methods should be better justified. I suggest starting with a paragraph laying out expectations for what you expect given selection from novel vs standing variation (see Barret and Schluter, 2008), and using only analyses that have been previously proposed and shown through either theory or simulation to distinguish these processes.

Subsection “GWAS.”: What specifically was the factor used to represent population structure here? A matrix of pairwise genetic distance over the whole genome? Please provide a little more detail on your implementation of this method.

Figure 1C: These colors seem to partially but not entirely match the map, which is confusing (especially when referring back to this figure for color references). I'd use different colors than in the map. Greyscale would work ok with K=3.

Figure 5 legend: the letters seem to be off here – please check that e and f are correctly referenced, and discuss panels in order (a-e).

[Editors’ note: further revisions were suggested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "The evolutionary history and genomics of European blackcap migration" for further consideration by eLife. Your revised article has been evaluated by Patricia Wittkopp (Senior Editor) and a Reviewing Editor.

The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:

All three reviewers found the manuscript much improved. Only reviewer 1 had a substantial outstanding comment that requires clarification. After consultation, the other two reviewers and the Reviewing Editor concur with the points raised by reviewer 1. Please address the below questions about how individuals were assigned to different phenotype groups in the different analyses. The manuscript is provisionally recommended for acceptance pending satisfactory revision on these points. The authors are also encouraged to consider the minor points raised by all three reviewers.

Reviewer #1:

I find the resubmitted manuscript to be a substantial improvement over the initial submission. The authors are to be commended for their thorough revision and for incorporating reviewer comments.

The only major comment from my previous review that I feel is not clarified in the revision is the way the three different migratory phenotypes are combined in the demographic analyses and the initial hapFLK/PBS/nSL analyses that include all populations (point #7 from my review). Let me better explain my confusion: in the analyses of population structure (Figure 1 Fst plot), there are nine populations with distinct phenotypes: continental residents, short distance sw migrants, long distance SE migrants, medium distance SW, SE, and NE migrants, and the three island populations. However, in subsequent analyses, these populations are grouped together in different ways that are not clearly explained. For the demographic analysis, the medium- and long- distance migrants are grouped together, despite variation in orientation among these populations. Why is this? Likewise, in the analyses of the genetic basis of traits, the authors describe analyzing all three phenotypes (orientation, distance, and propensity); however, each of these of course has multiple categories (e.g. SW, SE, and NE orientation). In the legend for Figure 3, they note that they are grouping birds into 5 phenotypes for panels A and B and 3 phenotypes for panels B and C. Figure 3—figure supplement 1 shows six different phenotypes (7 including those shown in Figure 3) for the ΔPBS analysis. The authors note that islands were excluded and residents limited to a single population, but it is otherwise unclear to me what these different phenotype groups are. Adding to the confusion is that 1) the analysis of differentiation and selection comes after the demographic modeling, which did not consider orientation in the grouping of phenotypes; 2) Figure 5 also combines the medium and long- distance migrants and removes orientation; and 3) sample sizes are not given for each population used in the comparisons. Some clarification of exactly which populations and phenotypes are being compared in the analyses, throughout this this section is needed, as well as justification for why medium- and long- distance migrants are combined for some analyses but not others, and clear reporting of sample sizes for each comparison in the main text (please remove from legend in Figure 3).

Reviewer #2:

The authors have done a commendable job revising this manuscript. The Introduction is much easier to follow and sets up the study nicely. The revised Materials and methods and increased focus on linking methods/results directly to specific study questions and objectives greatly improve the overall flow and readability of the paper. I appreciate the effort that has gone into addressing reviewer comments.

Reviewer #3:

The authors have done a really excellent job with this revision and I'm pleased to recommend acceptance. The revised demographic analysis and Introduction are significant improvements and all of my major questions from the first submission have been addressed well in the response.

https://doi.org/10.7554/eLife.54462.sa1

Author response

[Editors’ note: the authors resubmitted a revised version of the paper for consideration. What follows is the authors’ response to the first round of review.]

1) A primary issue identified by all reviewers is that the paper was generally hard to follow and contained many unrelated analyses. The Introduction does not set up clear questions and gives little background information on why the blackcap system is uniquely well-suited for addressing questions about the genetic basis of migratory behavior. The Introduction also does not set up the key findings of the paper. It is recommended that the authors re-focus the overall framing of the paper to highlight their key results, remove tangential analyses (see below for specifics), and generally streamline and contextualize results within the existing literature throughout the entire paper.

Thank you for summarizing the reviewers concerns here. We have taken them on board, re-writing the Introduction entirely and reducing the phylogeographic portion of our study substantially, including all discussion of ancestral state derived from our results (see our response to your second comment).

Our new Introduction highlights the importance of phylogeographic studies with blackcaps. For example, these studies provided important evolutionary context for work on the genetics of migration. Limited genetic differentiation was documented between blackcap populations suggesting transitions between migratory states must be very rapid and may involve on a small number of genetic changes. These studies also speak to the timing and number of transitions that have occurred within blackcaps between migratory and resident behaviour. This information is important for interpreting local signatures of differentiation.

All of the phylogeographic work conducted prior to our study relied on limited mitochondrial sequence data making our work – updating results with genome-wide data – essential. Our results support previous findings and uncover novel patterns that are important for interpreting our findings on the genetics of migration. For example, we document genomic differentiation between migrants and both resident continent and island populations. This had not been observed before and guided our analysis on the genetics of migratory orientation specifically as we knew the inclusion of genomically differentiated resident birds was limiting our power to detect signatures related to this trait. The exclusion of residents allowed us to identify regions under positive selection in the NW phenotype, one of the most exciting aspects of the blackcap system that we have been able to underline with our results.

2) There was concern from all reviewers about the phylogeography section. These analyses were not clearly integrated with the rest the paper. Furthermore, the way the PCA and ADMIXTURE plots were used to interpret ancestral state seemed very ad-hoc, and the maximum likelihood tree was not well supported. The inference that migration is the ancestral state and was lost on islands is generally reasonable, as that is a pattern found in other taxa. However, if the authors choose to include the phylogeography section moving forward, removal of the PCA and ADMIXTURE interpretations is recommended. The authors should also acknowledge the limitations of their inference more openly, given the weak support in their data. Further, the overall contribution of the phylogeographic analysis to the main conclusions of the paper should be clarified throughout the text.

We are confident that our new Introduction integrates our phylogeographic work much better than before. We discuss findings in the blackcap on the genetics of migration first and devote a second paragraph to phylogeographic studies, highlighting their importance for interpreting data on the former topic.

We have kept the PCA and ADMIXTURE analyses in our study but removed TREEMIX and ML trees. We do not use any of these analyses to infer ancestral states; we only use them to discuss patterns of genomic differentiation between populations and study demographic history. As we mentioned above, these analyses highlighted new results for the blackcap system that were important for setting up subsequent analyses (e.g. removing resident continent birds) and interpreting findings (e.g. given the low levels of population differentiation we documented in the phylogeographic portion of our work we can be confident population differentiation between migratory phenotypes (i.e. NW, SW and SE migrants) is not affecting our local genomic results). Combined with other observations from the system, genomic differentiation between resident populations also suggests that parallel signatures of selection are independent but make use of the same variation.

3) The use of SMC++ to infer divergence times was considered problematic, as this method assumes populations do not exchange genes after divergence. To more accurately date divergence times, the authors should use a demographic modeling approach based on the joint site frequency spectrum (e.g. dadi or moments) or evaluate alternative models with and without migration in an ABC framework. Alternatively, the authors could refocus that section on variable demography rather than the timing of splits. Demographic trajectories diverging earlier than previously estimated split times could then be a "suggestive" point, but shouldn't be the primary finding of that analysis. It might be useful to refer to Mazet et al., 2016 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4806692/) for guidance with these interpretations.

We agree that it may have been problematic to use of SMC++ to date population divergence in our initial submission. Accordingly, in the revised version of our manuscript we used a different program for demographic inference and complimented this analysis with cross-coalescent rate analyses. The approach that we used was introduced by the MSMC model (Schiffels and Durbin, 2011) and extended in MSMC2 (Malaspinas et al., 2012). Comparing the rate of coalescence between pairs of individuals taken in two populations to the rate of coalescence within each population allows dating the differentiation of two populations. This approach is particularly well suited to our dataset as it does not require extensive sample sizes (as for instance needed by SFS estimation) and makes use of complete genome data as it intrinsically accounts for linkage disequilibrium. In addition to the dating of population differentiation, cross-coalescence rates can be used to infer the temporal pattern of migration rate between populations (https://www.biorxiv.org/content/10.1101/585265v1.full). As our aim is to date population divergence, we decided to restrict our analyses to the differentiation timing and did not compute estimates of migration rates.

4) There were concerns about the authors overinterpreting causal associations between migratory behavior and loci putatively under selection in both the orientation and residency sections. Particularly in the transition to residency section, outliers cannot be separated from population structure, and could therefore be due to a variety of factors unrelated to migration. The authors suggest that finding an elevated region on super scaffold 99 in both island and resident populations indicates this region may be associated with the transition to residency. This analysis is rather unconvincing, as these shared elevated regions could be due to shared ancestry. It is also not clear if the extensive subsequent discussion of candidate loci focuses only on this shared elevated region on scaffold 99, or looks at all outliers. The language in this section should tempered, alternative explanations discussed, and it should be made clear which populations are included when identifying candidate loci.

We have noted the potential for other selection pressures to lead to the patterns of positive selection.

We believe we have presented strong evidence for parallel positive selection on Super-Scaffold_99 related to the transition from migrants to residents. We outline our reasons below and include them in our manuscript but are of course open to further discussion on the topic.

– Three separate population genetic measures suggest this region is under positive selection: hapFLK (which relies on haplotypes and controls for hierarchical population structure), ΔPBS (which uses SNPs and controls for linked selection) and nLS (which is a measure of within population variation that relies on haplotypes). The effect linked selection can have on signatures of selection is a very prominent problem discussed in the literature and we have used ΔPBS to address this problem (proposed in Vijay et al., 2017). Population structure between populations is low to begin with and while there was some evidence for differentiation between migrants and residents, the actual level of differentiation between migrant and resident populations is low (~0.03), should have more of a genome-wide effect and is controlled for in the analysis using hapFLK.

– It is very unlikely that selection related to other ecological features is responsible for repeated patterns of positive selection on Super-Scaffold_99 in resident continent and island birds as ecological features (both biotic and abiotic) are very different between these regions. For example, considering climate, the Iberian Peninsula is quite variable but in general it would be considered mediterranean in nature (hot, dry summers and cool, wet winters). By contrast, many of the Atlantic islands have limited seasonality and are dominated by hot desert like conditions (the Canary Islands and Cape Verde). The Azores are more seasonal (temperate with no dry season and a mild/hot summer) and these islands occur across a broad latitudinal range with increasing temperatures (and decreasing precipitation) as you move towards the equator (Cropper, 2013). Community assemblages experienced by resident populations are also different, with islands having lower diversity in general but exhibiting their own variation as well. For example, the Canary Islands host ~3x the number of bird species and endemics owing to their proximity to the mainland and greater surface area (Valente et al., 2017).

– Finally, very similar findings and interpretations have been made by other authors, including Zhan et al., 2014, who documented parallel signatures of positive selection in transitions from migratory monarchs in North America to islands. We are lucky we have a system where this kind of comparison (multiple transitions) can be made, providing more support than a single comparison.

We do not see why the source of variation upon which selection acts precludes the region on Super-Scaffold_99 from being involved in the transition to residency. Whether the variation comes from new mutations, gene flow or standing genetic variation should not matter. It is still variation in this region (with its specific genes and regulatory regions) that experiences positive selection and is likely involved in the transition. The Atlantic islands were colonized by blackcaps in different waves and before areas in southern Europe (Perez-Tris et al., 2004, Dietzen et al., 2008). It seems the use of shared variation (gene flow or standing genetic variation) could have facilitated these transitions and is actually quite interesting re. how rapid phenotypic transitions can occur.

5) Analyses of selection on standing genetic variation: as with the inferences about phylogeography, these sections read as somewhat ad-hoc and did not use previously validated methods. The first section about standing genetic variation (dealing with short distance migrants) is weakly supported- removal this section is recommended. The second section, if retained, should use methods that have been previously shown through theory or simulation to distinguish selection on standing genetic variation. It is also recommended that the authors articulate clear predictions about expected results if migratory behavior arises from selection on novel vs. standing variation.

We have removed the first section on standing genetic variation and short distance migrants. Please see our response to reviewer 3 concerning the use of previous methods for inferring standing genetic variation. Briefly, we used methods outlined in the reference they suggested (Barrett and Schluter 2008) to the best of our ability; most of the methods described rely on the availability of an ancestral population (e.g. marine sticklebacks for freshwater adaptation in the three-spinned stickleback) and the authors themselves (i.e. Barrett and Schluter 2008) note there are limitations to the methods they suggest (e.g. looking for a “soft” sweep requires knowledge of what a “hard” sweep would look like in our system).

6) More information needs to be given about sampling and phenotyping. This is particularly important for explaining how phenotypes were assigned to "short-distance SW migrants," which fall into two different clusters on PC2. This is a major gap in the paper.

We had summarised all phenotyping and sampling details in Supplementary file 4, but now more prominently further added descriptions in the respective Materials and method part. Here we clarify that in most cases we scored populations, not individuals (i.e. all individuals in a population had equal phenotype even if we assume that there may be some within-population variation). This being said, our scoring methods in these locations (specifically those sampled on the Iberian Peninsula) were based on morphometry of individuals, and there is a strong correlation between wing length and migration distance in blackcaps. Specifically, the two north-Iberian populations are short winged compared to central and northern European blackcaps, indicating short-distance movements. With respect to the short-distance SW migrants we know they are migratory because they leave in winter (winter blackcap censuses in these localities resulted in zero counts in all cases). Southern residents are known to be resident because of the combination of extremely short and rounded wings and individual ringing recoveries of individuals that spend the whole year in the same territory.

7) There were concerns about the analysis of migratory distance, both in classifying distance as a categorical trait and in why a different method for identifying outliers was used in this section. The analysis of distance, if retained, should either apply the same approaches for outlier detections as were used for the analyses of propensity and orientation, or clearly explain use of a different approach. There should also be further justification for classifying distance as a categorical variable.

We have provided more context for this analysis in our response to reviewers 1 and 2. We cannot quantify migratory distance as continuous as we do not know how far an average individual from each population travels and we chose to retain some of this variation by quantifying distance as an ordinal variable. If we had used the same method as our work on migratory orientation (hapFLK) we would have lost the continuous nature of this variable all together. We have included a note in the manuscript to this effect.

Reviewer #1:

1) While the dataset here is remarkable, the structure of the paper is quite challenging. The Introduction, in my opinion, does not articulate a clear overarching question, and the questions that are laid out in the intro are not linked well to the analyses presented in the paper. For example, the Introduction seems to set up a study focused on the genetic architecture of migration. However, the first part of the paper is a lengthy analysis of population structure and phylogeography that seems unrelated to any topics introduced in the Introduction and is not connected to the background provided on the system. When we get to the genetic architecture component of the paper, it is not contextualized well with previous knowledge of the system, and it is therefore hard to evaluate the significance of the results. It is also not clear how the earlier pop structure and phylogeography component is related to analyses of selection and genetic architecture.

Thank you for these suggestions. We have re-written the Introduction, providing a more equitable consideration of both migration genetics and phylogeography and highlighting how results from phlogeographic studies are relevant to our understanding of migration genetics. For example, classic studies on both migration genetics and phylogeography highlight how rapidly migratory behaviour can evolve and limited genetic differentiation between blackcap phenotypes could suggest the effect sizes of loci underlying migration are small (or that a small number of genes control these traits). Concerning the inclusion of phylogeographic analyses as a whole, we feel they provide an important setting for subsequent analyses of migration genetics. For example, in the first analyses on this topic using hapFLK and all phenotypes we find most signatures of selection are limited to residents, likely reflecting the fact these populations evolved resident phenotypes as they colonized areas further south in Europe and on the Atlantic islands. Note we have reduced the section on phylogeography substantially including the exclusion of several analyses including TREEMIX and the ML tree.

2) Very little information is given about the sampling or how phenotypes were identified. The only information provided is in the legend of Figure 1, which states that samples were collected on the breeding grounds with the exception of a few collected during the winter or during migration. This is problematic, given that the paper relies on individual-level phenotype assignments for these samples in all analyses. Explanations of how migratory distance, propensity, and orientation are determined for individual samples are needed.

We have summarised all phenotyping and sampling details in Supplementary file 4, but now more prominently further added descriptions in the respective method part, we have also fully replied to a similar comment raised by the editor, see above. There we clarify that in most cases we scored populations, not individuals (i.e. all individuals in a population had equal phenotype even if we assume that there may be some within-population variation). Scoring methods in these locations (specifically those sampled on the Iberian Peninsula) were based on morphometry of individuals, and there is a strong correlation between wing length and migration distance in blackcaps. Other measures of phenotype are based on ringing-recovery data, stable isotope assessments as detailed in Supplementary file 4.

3) The section on inferring ancestral state reads as very ad-hoc and confusing. The way the PCA is interpreted is not convincing and is not supported by any citations. The results from TREEMIX and ADMIXTURE are conflicting, but no explanation for how to interpret this discordance is offered. The purpose of this section overall is also not entirely clear- how do these analyses link back to overall questions about genetic architecture of migration?

Thank you for this comment. We have removed all discussion of ancestral state from our Results. For example, we have excluded the ML tree and discuss results from the PCA, ADMIXTIRE and FST in terms of population differentiation only. All speculation has been removed. This section has been reduced substantially in response to comments from all reviewers.

4) The treatment of short-distance migrants is confusing. Based on the PCA, ADMIXTURE plot, and the map, it looks to me like these populations are admixed between residents and migrants (possibly due to post-Pleistocene secondary contact, based on the authors' explanation of biogeography). It is unclear to me how any of these data support variation in migratory behavior arising from standing genetic variation. Are the authors suggesting that individuals in the short-distance migrant population have different migratory strategies (e.g. some are residents and some are migrants) due to variation within the population? Later evidence for the role of standing genetic variation is more convincing. The purpose of this section needs to be made clearer.

Thank you for noting this. We have removed this section to focus on our main questions.

5) I was surprised, given low support for the maximum likelihood tree and the history of gene flow indicated by TREEMIX, that the authors didn't use a method of demographic reconstruction that allows migration and determination of the timing of admixture events.

Thank you for noting this. In the revised version of our manuscript, we complemented our demographic inference with cross-coalescent rate analyses. The approach that we used was introduced by the MSMC model (Schiffels and Durbin, 2011) and extended in MSMC2 (Malaspinas et al., 2012). Comparing the rate of coalescence between pairs of individuals taken in two populations to the rate of coalescence within each population allows dating the differentiation of two populations. This approach is particularly well-suited to our dataset as it does not require extensive sample sizes (as for instance needed by SFS estimation) and make use of complete genome data as it intrinsically accounts for linkage disequilibrium. In addition to the dating of population differentiation, cross-coalescence rates can be used to infer the temporal pattern of migration rate between populations (https://www.biorxiv.org/content/10.1101/585265v1.full). As our aim is to date population divergence, we decided to restrict our analyses to the differentiation timing and did not compute estimates of migration rates.

6) The way results on the propensity to migrate are presented strongly suggests that regions under selection in non-migratory populations are directly associated with loss of migratory behavior. However, these regions could just as reasonably be under other sources of divergent selection between these populations (e.g. ecological differences, differences in diet, etc). Couldn't shared regions under selection in island and resident populations also be due to gene flow and shared ancestry in these populations, as indicated by TREEMIX? Overall, I don't find it very convincing that regions under selection in non-migratory populations are directly associated with migratory propensity. The authors should either temper this section and discuss alternative explanations, or make the support for their assertion clearer.

We have added a note in the manuscript that parallel signatures of divergent selection in resident populations could derive from a different source of selection than migration but do not believe this is the case as ecological features (biotic and abiotic) experienced by birds on the continent and islands are very different. For example, climate on the Iberian Peninsula is variable but in general it would be considered mediterranean in nature (hot, dry summers and cool, wet winters). By contrast, many of the Atlantic islands have limited seasonality and are dominated by hot desert like conditions (the Canary Islands and Cape Verde). The Azores are more seasonal (temperate with no dry season and a mild/hot summer) and more generally, these islands occur across a broad latitudinal range with increasing temperatures (and decreasing precipitation) as you move towards the equator, Cropper, 2013). Community assemblages experienced by resident populations are also different, with islands having lower diversity in general but exhibiting their own variation as well. For example, the Canary islands host ~3x the number of birds species and endemics owing to their proximity to the mainland and greater surface area (Valente et al., 2017). A very similar argument was made by Zhan et al., 2014, in their observation that resident monarch butterflies on islands derive from continent migrant monarchs.

Concerning the source of these parallel signatures of selection, we agree these signatures could be due to gene flow or shared ancestry. We are not sure how this affects our results or interpretations. Whether the source of variation is from new mutations or shared variation does not affect our conclusion that the same regions are under selection in resident continent and island populations. Wherever the variation comes from it’s under positive selection that we believe is related to the transition from migratory to resident in these locations following their colonization (although as we discuss above we have included a note about the role other selection pressures could play). There is strong evidence from previous phylogeographic studies that colonization of the islands and southern regions of the continent occurred at different times (and at least two colonizations of the islands have occurred [0.3-3 Myr to Canaries and 4,000-40,000 ya to the other islands], Perez-Tris et al., 2004, Dietzen et al., 2008).

7) In the first analyses of genomic regions under selection, the authors "consider all three traits together." What does this mean? What groups of samples were actually compared?

The three phenotypes are orientation, distance and propensity. This means that all birds were included except for resident island populations (n=15), and resident continent birds from Asni and Cazalla (see text, n=13). We have made this clearer.

8) I wondered why a different method was used for the analysis of migratory distance than was used for orientation and propensity. Why not make these analyses comparable? This last section in general felt somewhat tacked-on and not well integrated into the rest of the paper.

Migration distance is an ordinal variable: short (1), medium (2), and long (3) distance migrants. We did not want to lose this information by running this analysis through hapFLK where it would be considered a categorical variable. GEMMA allows you to code your phenotype as ordinal. We have made this clear in the manuscript and think this analysis is integrated well enough as it is clear in our statement of objectives we will be studying the genetic basis of all three traits and this is restated at the start of the localized analyses as well.

9) Overall, I think the results could be more clearly contextualized within the broader literature throughout the paper. The authors have a very cool dataset, but the parts that are novel and exciting are not highlighted very well throughout. The amount of background information given seems to assume familiarity with the blackcap system, which limits the accessibility of the paper to a broad audience. I think more effort needs to be put towards explaining why this system is so well-suited for asking these questions, and what new things we learn about migratory behavior from these analyses.

Thank you for this suggestion. We have re-written the Introduction to provide much more background information on the blackcap.

Reviewer #2:

[…] I have a few comments/questions outlined below. It is my opinion that, if these concerns are addressed, this work would make a nice contribution to the existing literature.

Subsection “Resident populations evolved from a migratory ancestor”: While I like this section, I think you can combine the TREEMIX and demographic analyses to cover most of this section in a more compelling way. While I don't entirely disagree, I find the use of the PCA and ADMIXTURE results to infer ancestral state a bit weak compared to the demographic data. I think you have enough analyses outside of the PCA and ADMIXTURE results to draw conclusions regarding ancestral state that are more appropriate for this type of question. I would recommend that you narrow this section down to focus on fewer, but more definitive, analyses, as it would strengthen the overall argument.

Thank you for this suggestion. We have eliminated all discussion of ancestral states in our analysis. We have removed the ML tree as population splits were not well supported along with TREEMIX. We have not removed our PCA or ADMIXTURE analyses. Instead we use them to describe population differentiation alone. Their use in this capacity is well supported in the literature. We have described this decision in full in our response to the editor above.

Subsection “Evidence for standing genetic variation in short distance migrants”: Similar to my comment above, I found parts of this section distracting and less compelling compared to the rest of the manuscript. I think you could remove this section as I am not really sure it adds much to the paper. I feel similarly about subsection “Selection on shared genetic variation could facilitate rapid changes in migration”.

Thank you for this suggestion. We have removed both of these sections.

Subsection “Resequencing analysis” paragraph three: This is more just curiosity: did you try using HaplotypeCaller from GATK? I think the fact that variant calling in GATK is combined with other programs makes this a robust approach but was wondering why you chose UnifiedGenotyper.

We relied mostly on genotype probabilities from ANGSD and so did not think the additional computational time required for HaplotypeCaller is necessary.

Subsection “Linkage disequilibrium”: I don't believe that there is any mention of linkage disequilibrium in the main text of the manuscript. May be worth mentioning somewhere in the results

We have removed this section, thank you for highlighting this.

Figure 4: Add a Y-axis label to panel c

Thank you for noting this. We have done so.

Figure 5: I think the figure is very aesthetically pleasing, but I am really struggling to understand what is going on in this figure. It may just be a matter of clarifying the legend, or perhaps it is just me. But I am not convinced this figure adds much. Also, I think you mean to say "panel f shows haplotype.…."

We have reduced the number of panels, split this into three figures (Figure 4, 5 and Figure 4—figure supplement 1) and improved the figure legends.

Reviewer #3:

[…] In general I thought the new data in this paper was very good, but the results presented cover so much territory that individual analyses sometimes feel rushed and the whole picture is hard to follow. I have a few methodological issues with the phylogeography, and for the selection/association analyses I think the more speculative post-hoc analyses of specific genes should be limited. One general issue that should be addressed more directly when describing results for regions putatively associated with migratory behaviors is that migration itself may be tangential to the actual selective force – climatic variation on either the breeding or wintering range, dietary differences across geographic regions, or any other environmental factor varying among populations could create the basis for selection that could be detected by approaches like PBS. Because migratory phenotypes covary with many of these other environmental factors, it is inevitably going to be difficult to identify genes that drive (rather than being driven by) aspects of migration like orientation, distance, or phenology.

We have included this caveat, that other selection pressures may be responsible for signatures of positive selection in our study but note, in the case of consistent patterns exhibited by resident island and continent populations this is unlikely (see our response to editor and the revised text for a full description).

That being said I think this is an important paper in the area of migration genomics because it is in the only system with truly compelling captive-breeding results from crosses and because the dataset is excellent.

Subsection “Differentiation between populations is low”: Are migrant populations differentiable on other PC's? What about if PCA is run on just continental, or just migrant populations alone? Because PCA will represent variance in the full dataset, running it with divergent island populations may mask differentiation among migrant populations. Probably same issues in the admixture analysis. I'd like to see a supplemental figure showing at least PC1-2 when the analysis is run on just continental birds.

We have added the requested figure to the supplementary material (Figure 1—figure supplement 1). There is no additional structure between migratory populations when the island populations are removed.

Subsection “Resident populations evolved from a migratory ancestor”: The methods for ancestral state reconstruction here were unfamiliar to me and seemed ad-hoc. Are there citations available for studies outlining the logic of using PCA and/or admixture results for this task? As it is the clearest evidence seemed to be from the phylogenetic analysis, which relies on poorly supported nodes in a topology constructed using a method that effectively assumes no gene flow.

We have removed all discussion on ancestral states.

Subsection “Divergence began ~250,000 year ago”: SMC++ assumes populations are isolated, so it isn't an appropriate method for dating population splits in groups that continue to exchange genes after divergence. In addition, the mutation rate and generation times (though they look about right relative to other migratory birds) don't appear to be pulled from the citation listed (Noor and Bennett, 2009 – apologies if I missed it buried in there somewhere), and these will directly scale the inferred timing of population size changes. The message that divergence is old is likely right – if gene flow occurs but isn't being captured by the analysis then divergence times should be *even older*, but if divergence time estimation is the goal then a demographic inference method explicitly incorporating gene flow and returning estimates of split times should be employed, and the empirical rates should be better justified. I'd suggest demographic modeling of the joint site frequency spectrum in dadi or moments. In addition, the end of this section regarding the possible speed of evolution of variation in migratory traits should acknowledge that much of that literature is based on the documented contemporary evolution of the NW migration and not on biogeographic reconstruction of island colonizations.

Thank you for this suggestion. In the revised version of our manuscript, we complemented our demographic inference with cross-coalescent rate analyses. The approach that we used was introduced by the MSMC model (Schiffels and Durbin, 2011) and extended in MSMC2 (Malaspinas et al., 2012). Comparing the rate of coalescence between pairs of individuals taken in two populations to the rate of coalescence within each population allows dating the differentiation of two populations. [Note: I would not copy the figure, just refer to the original paper.] [Eventually write:] This approach is particularly well-suited to our dataset as it does not require extensive sample sizes (as for instance needed by SFS estimation) and make use of complete genome data as it intrinsically accounts for linkage disequilibrium. In addition to the dating of population differentiation, cross-coalescence rates can be used to infer the temporal pattern of migration rate between populations (https://www.biorxiv.org/content/10.1101/585265v1.full). As our aim is to date population divergence, we decided to restrict our analyses to the differentiation timing and did not compute estimates of migration rates.

Subsection “The transition to residency may be controlled by regulatory elements” paragraph two: It was hard to tell what in this paragraph referred to results from this study, vs results from previous studies. Are Clock/Npas2/bmal1 in an outlier region here? Or is it just that there is one bHLH motif found in an outlier region? If the latter, I'd suggest cutting the second half of this section as the evidence of any specific regulatory element being involved seems quite weak.

We have made this clearer and while speculative we think it is important to include as the motif that corresponds with the literature search conducted by Ruegg et al., 2014, is located on Super-Scaffold_99 which shows parallel signatures of selection related to the transition from migration to residency in both continent and island birds.

Subsection “Examining genes associated with the transition to residency” paragraph one: Needs citations.

We have added references.

Subsection “Considerable variance in migratory distance is controlled by genetic variation”: This analysis did not seem particularly compelling, and (given that migratory distance is actually a continuous trait) the binning of distance as an ordinal trait seems likely to offer little power to identify causal alleles in a small cohort like this. It also isn't clear to me why a new method is used here that wasn't used in the previous association/selection analyses. I suggest cutting this section.

Migration distance is a continuous variable in theory but we do not have this kind of detail for our populations. We don’t know the exact distances an average bird from each population migrates. We did not want to lose the continuous nature of this variable entirely though so we chose to include it as an ordinal variable: short (1), medium (2), and long (3) distance migrants. We did not want to lose this information by running this analysis through hapFLK where it would be considered a categorical variable. GEMMA allows you to code your phenotype as ordinal. We have made this clear in our manuscript and think this analysis is integrated well enough as it is clear in our statement of objectives we will be studying the genetic basis of all three traits and this is restated at the start of the localized analyses as well.

Subsection “Selection on shared genetic variation could facilitate rapid changes in migration”: I like this question quite a bit and think there are some cool results here, but I found this section hard to follow. I think it needs a rewrite, and the methods should be better justified. I suggest starting with a paragraph laying out expectations for what you expect given selection from novel vs standing variation (see Barret and Schluter, 2008, citation 67), and using only analyses that have been previously proposed and shown through either theory or simulation to distinguish these processes.

We appreciate the reviewers comment. Barrett and Schluter, 2008, make three suggestions for showing selection acts on standing genetic variation. We go through each of these below and note that the authors themselves stated that “none of [their] approaches [are] infallible” and they “identify possible difficulties with each”.

1) Barrett and Schluter, 2008, suggest looking for evidence of soft sweeps around regions under selection using different population parameters. This analysis relies on the assumption that nearby neutral alleles will sweep with beneficial allele and is largely descriptive. For example, recombination around an allele that has been segregating in an ancestral population for some time will break up associations with nearby neutral sites. This should result in a narrow, shallow reduction in polymorphism compared to hard sweeps. Barrett and Schluter, 2008, do not provide any guidance on how to quantify how “narrow” and “shallow” regions have to be under this scenario. In addition, by their nature these signatures are more difficult to detect (narrow and shallow signatures will stand out less than broad, deep patterns). Finally, depending on how long the allele has been in the population any signature of a sweep could be eliminated completely by recombination. The latter point was noted by Barrett and Schluter, 2008. Given the difficulties associated with this method we have chosen not to employ it but we do note in the text that the peaks we identified are quite narrow.

2) Barrett and Schluter, 2008, also suggest finding evidence the allele under selection still occurs as standing variation in the ancestral population. This suggestion obviously requires us to know and still have an ancestral population to sample. This is easier to do in some systems (e.g. Colosimo et al., 2005 focus on an allele associated with armour in freshwater sticklebacks and are able to sample that marine population for this allele); we have done our best, noting that the migratory phenotype is likely ancestral in blackcaps (based on previous work to ours) and most of the haplotypes under selection are present in migratory populations (e.g. Figure 4). If these are the ancestral populations (suggested by previous authors including Perez-Tris et al., 2004 and Voelker and Light, 2011) our data on this topic support the use of standing genetic variation. A similar argument was made by Zhan et al. (2014) focusing on monarch butterflies where multiple losses of migratory behaviour involved selection at the same haplotype that is still present in the migratory population.

3) The last suggestion by Barrett and Schluter, 2008, is to use a phylogenetic study to determine the age of genomic regions under selection. Again their example comes from the Colosimo et al., 2005 paper on sticklebacks. These authors compared estimates of neutral sequence divergence between populations that differ in their armour phenotype to splits of known age within the same system to show the allele under selection is quite old. Unfortunately we do not have any closely related population pairs of known age to calibrate estimates of divergence in blackcaps but we have estimated neutral sequence divergence (the number of synonymous segregating sites) in all migratory populations and do not find a difference. Please note, this suggestion by Barrett and Schluter, 2008, may not be very reliable as the regions we’re looking at have undergone selective sweeps and patterns of neutral divergence may be obscured by the effects of selective sweeps on polymorphisms. This would make it difficult to use diversity to age divergence between haplotypes and was not noted by Barrett and Schluter, 2008.

We have also compared sequence divergence between all haplotypes and our ancestral sequence (derived using a comparison with two closely related species, the garden warbler (migratory) and hill babbler (resident)]).

We have also added two new analyses. First, we compare a phylogeny built using all data (ML tree) to one built using data from the genomic region under selection on Super-Scaffold 73 only (Figure 6). The genome-wide phylogeny is largely unresolved and has the garden warbler and hill babbler as a sister clade to blackcaps (as noted in many other phylogenetic studies on the Sylvia clade). By contrast, the phylogeny built using data from Super-Scaffold 73 suggests garden warblers are more closely related to blackcaps and medium distance NW birds diverged first, with the remaining blackcaps forming a polytomy as in the genome-wide tree. Garden warblers are migratory and the placement of NW birds at the base of the phylogeny suggests the haplotype in this population old and had been in the population before it underwent selection in the NW phenotype.

Our second new analysis is related to a pattern documented by Przeworski and Coop, 2011. These authors showed that SFS for regions experiencing selection from standing genetic variation are often more variable then those from de novo mutations and have more alleles at intermediate frequencies (see text for additional details). We note exactly this pattern in our study (Figure 6). Note this analysis involves resampling and a comparison of SFS from our data and the resamples. It would be ideal to compare the distribution of our observed data to resamples but at this time we are not aware of any way to compare distributions with these patterns (e.g. high frequencies at one end.).

Subsection “GWAS.”: What specifically was the factor used to represent population structure here? A matrix of pairwise genetic distance over the whole genome? Please provide a little more detail on your implementation of this method.

It is a kinship matrix generated using genome-wide SNP data.

Figure 1C: These colors seem to partially but not entirely match the map, which is confusing (especially when referring back to this figure for color references). I'd use different colors than in the map. Greyscale would work ok with K=3.

They are the exact same colours and we feel are instructive, helping the reader quickly see that there is admixture in both resident groups from the migrants. Accordingly, which we appreciate this comment we have chosen to stick with our original colour scheme.

Figure 5 legend: the letters seem to be off here – please check that e and f are correctly referenced, and discuss panels in order (a-e).

Thank you for noticing this. We have made the change.

[Editors’ note: what follows is the authors’ response to the second round of review.]

Reviewer #1:

I find the resubmitted manuscript to be a substantial improvement over the initial submission. The authors are to be commended for their thorough revision and for incorporating reviewer comments.

The only major comment from my previous review that I feel is not clarified in the revision is the way the three different migratory phenotypes are combined in the demographic analyses and the initial hapFLK/PBS/nSL analyses that include all populations (point #7 from my review). […] Some clarification of exactly which populations and phenotypes are being compared in the analyses, throughout this this section is needed, as well as justification for why medium- and long- distance migrants are combined for some analyses but not others, and clear reporting of sample sizes for each comparison in the main text (please remove from legend in Figure 3).

Thank you for this comment. There appear to be two problems here: (1) why were medium and long distance migrants combined for the demographic analysis (and in Figure 5A) and (2) what samples were used for analyses on the genetic basis of different migratory traits. We will address these concerns in turn below.

1) Why were medium and long distance migrants combined for demographic analyses (and in Figure 5A)?

Each one of the orientations (NW, SW and SE) were plotted individually in Figure 5A. They show similar trends making this difficult to see and our figure legend lacked clarity. We have improved the legend for Figure 5A stating that medium distance migrants were not combined for this plot.

We combined medium and long distance migrants for the demographic analyses because they exhibited very little population structure or differentiation (Figure 1) and thus we assumed they would exhibit similar demographic trajectories. We have confirmed this assumption now, running demographic analyses separately for each phenotype. The trajectories are indistinguishable and can now be found in Figure 2—figure supplement 4. We also reran cross-coalescent analyses and no population splits were observed (relative CCR stay close to 1 all the time; Figure 3—figure supplement 1). This information is in the main text as well.

2) What samples were used for analyses on the genetic basis of different migratory traits?

Analyses of population structure and demography included the full dataset. Subsets of the dataset were used in analyses focused on the genetics of migratory traits. We have included three new columns in Supplementary file 4 showing the exact birds used in each subset and – as requested by the reviewer – we have moved information on sample sizes from figure captions to the main text.

We have also removed the panel in Figure 3—figure supplement 1 showing estimates of PBS for residents as this is already shown in the main text (Figure 3A). Six groups (residents [shown in Figure 3A], short distance SW, medium distance NW, SW, SE and long distance SE birds were used in this analysis.

https://doi.org/10.7554/eLife.54462.sa2

Article and author information

Author details

  1. Kira Delmore

    Behavioural Genomics, Max Planck Institute for Evolutionary Biology, Plön, Germany
    Present address
    Department of Biology, Texas A&M University, College Station, United States
    Contribution
    Conceptualization, Resources, Data curation, Formal analysis, Supervision, Funding acquisition, Validation, Investigation, Visualization, Methodology, Writing - original draft, Project administration, Writing - review and editing
    For correspondence
    delmore@evolbio.mpg.de
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-4108-9729
  2. Juan Carlos Illera

    Research Unit of Biodiversity (UO-CSIC-PA), Oviedo University, Mieres, Spain
    Contribution
    Resources, Data curation, Funding acquisition, Methodology, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-4389-0264
  3. Javier Pérez-Tris

    Department of Biodiversity, Ecology and Evolution, Complutense University of Madrid, Madrid, Spain
    Contribution
    Resources, Data curation, Funding acquisition, Methodology, Writing - review and editing
    Competing interests
    No competing interests declared
  4. Gernot Segelbacher

    Wildlife Ecology and Management, University Freiburg, Freiburg, Germany
    Contribution
    Resources, Data curation, Funding acquisition, Writing - review and editing
    Competing interests
    No competing interests declared
  5. Juan S Lugo Ramos

    Behavioural Genomics, Max Planck Institute for Evolutionary Biology, Plön, Germany
    Contribution
    Formal analysis, Writing - review and editing
    Competing interests
    No competing interests declared
  6. Gillian Durieux

    Behavioural Genomics, Max Planck Institute for Evolutionary Biology, Plön, Germany
    Contribution
    Formal analysis, Writing - review and editing
    Competing interests
    No competing interests declared
  7. Jun Ishigohoka

    Behavioural Genomics, Max Planck Institute for Evolutionary Biology, Plön, Germany
    Contribution
    Formal analysis, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-5713-9391
  8. Miriam Liedvogel

    Behavioural Genomics, Max Planck Institute for Evolutionary Biology, Plön, Germany
    Contribution
    Conceptualization, Resources, Data curation, Supervision, Funding acquisition, Methodology, Project administration, Writing - review and editing
    For correspondence
    liedvogel@evolbio.mpg.de
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-8372-8560

Funding

Natural Sciences and Engineering Research Council of Canada (PDF)

  • Kira Delmore

Max-Planck-Gesellschaft (MPRG)

  • Miriam Liedvogel

Regional Government of Asturias (GRUPIN: IDI/2018/000151)

  • Juan Carlos Illera

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We acknowledge funding from the Max Planck Society (MPRG grant to ML), NSERC (PDF to KD) and Regional Government of Asturias (GRUPIN to JCI, Ref.: IDI/2018/000151). We thank: Staffan Bensch, Andreas Helbig, Stuart Bearhop, and Thord Fransson for providing samples; Conny Burghardt, Heinke Buhtz and Sven Künzel for lab work; Diethard Tautz, Tobias Kaiser and Sandra Bouwhuis for comments on early drafts; Julien Dutheil, Reto Burri, Kristian Ullrich, Christine Merlin and Aldrin Lugena for discussions of analyses; and Saki Chan at BioNano for the hybrid assembly of our genome. We would also like to thank Elizabeth Scordato, Patricia Wittcopp and three anonymous reviewers for constructive feedback on an earlier version of this manuscript. Permits were provided to JCI for samples collected in Morocco (Haut Commissariat aux Eaux et Forets et a la Lutte Contre la Desertification, 206/2011, 13 Jan 2011), Cape Verde (Ministerio do Ambiente - Habitacao e Ordenamento do Territorio, 18/CITES/DNA, 17 Dec 2015) and the Azores (Instituto da Conservacao da Natureza e da Biodiversidade, 171/2008, 31 Mar 2009). JP-T received permits for samples collected in Gibraltar and Cazalla de la Sierra (Consejeria de Medio Ambiente, 50.725.548-Z, 12 May 2011), Alava (Arabako Foru Aldundia, 50.725.548-Z, 12 Apr 2011) and Guadaramma (Consejeria de Medio Ambiente - Vivenda y Ordenacion del Territorio, 10/160876.9/10, 12 Apr 2010). Thord Fransson received permits for samples collected in Stockholm (Stockholms djurförsöksetiska nämnd Dnr N 16/16 2016-02-25). Permits were provided to GS for samples collected in the remaining locations (Regierungspräsidium Freiburg, 55–8853.17/0).

Ethics

Animal experimentation: Permits to JCI for samples collected in Morocco (Haut Commissariat aux Eaux et Forets et a la Lutte Contre la Desertification, 206/2011, 13 Jan 2011), Cape Verde (Ministerio do Ambiente - Habitacao e Ordenamento do Territorio, 18/CITES/DNA, 17 Dec 2015) and the Azores (Instituto da Conservacao da Natureza e da Biodiversidade, 171/2008, 31 Mar 2009); Permits to JP-T for samples collected in Gibraltar and Cazalla de la Sierra (Consejeria de Medio Ambiente, 50.725.548-Z, 12 May 2011), Alava (Arabako Foru Aldundia, 50.725.548-Z, 12 Apr 2011) and Guadaramma (Consejeria de Medio Ambiente - Vivenda y Ordenacion del Territorio, 10/160876.9/10, 12 Apr 2010); Permit to Thord Fransson for samples collected in Stockholm (Stockholms djurförsöksetiska nämnd Dnr N 16/16 2016-02-25); Permits to GS for samples collected in the remaining locations (Regierungspräsidium Freiburg, 55-8853.17/0).

Senior Editor

  1. Patricia J Wittkopp, University of Michigan, United States

Reviewing Editor

  1. Elizabeth Scordato, California State Polytechnic University, United States

Publication history

  1. Received: December 16, 2019
  2. Accepted: March 13, 2020
  3. Version of Record published: April 21, 2020 (version 1)

Copyright

© 2020, Delmore et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 1,956
    Page views
  • 261
    Downloads
  • 3
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Download citations (links to download the citations from this article in formats compatible with various reference manager tools)

Open citations (links to open the citations from this article in various online reference manager services)

Further reading

    1. Epidemiology and Global Health
    2. Evolutionary Biology
    Jonathan Stieglitz et al.
    Research Article

    In comparative cross-species perspective, humans experience unique physical impairments with potentially large consequences. Quantifying the burden of impairment in subsistence populations is critical for understanding selection pressures underlying strategies that minimize risk of production deficits. We examine among forager-horticulturalists whether compromised bone strength (indicated by fracture and lower bone mineral density, BMD) is associated with subsistence task cessation; we estimate the magnitude of productivity losses associated with compromised bone strength. Fracture is associated with cessation of hunting, tree chopping and walking long distances, but not tool manufacture. Age-specific productivity losses from hunting cessation associated with fracture and lower BMD are substantial: ~397 lost kcals/day, with expected future losses of up to 1.9 million kcals (22% of expected production). Productivity loss is thus substantial for high strength and endurance tasks. Determining the extent to which impairment obstructs productivity in contemporary subsistence populations improves our ability to infer past consequences of impairment.

    1. Ecology
    2. Evolutionary Biology
    Susanne RK Zajitschek et al.
    Research Article Updated

    Biomedical and clinical sciences are experiencing a renewed interest in the fact that males and females differ in many anatomic, physiological, and behavioural traits. Sex differences in trait variability, however, are yet to receive similar recognition. In medical science, mammalian females are assumed to have higher trait variability due to estrous cycles (the ‘estrus-mediated variability hypothesis’); historically in biomedical research, females have been excluded for this reason. Contrastingly, evolutionary theory and associated data support the ‘greater male variability hypothesis’. Here, we test these competing hypotheses in 218 traits measured in >26,900 mice, using meta-analysis methods. Neither hypothesis could universally explain patterns in trait variability. Sex bias in variability was trait-dependent. While greater male variability was found in morphological traits, females were much more variable in immunological traits. Sex-specific variability has eco-evolutionary ramifications, including sex-dependent responses to climate change, as well as statistical implications including power analysis considering sex difference in variance.