Research Article

Genetics and Genomics

The genetic basis for ecological adaptation of the Atlantic herring revealed by genome sequencing

Uppsala University, Sweden
University of Macau, China
BGI-Shenzhen, China
Qingdao University, China
Stockholm University, Sweden
Southeast University, China
Swedish University of Agricultural Sciences, Sweden
University of Bergen, Norway
Hjort Center of Marine Ecosystem Dynamics, Norway
Institute of Marine Research, Norway
Texas A&M University, United States

May 3, 2016

https://doi.org/10.7554/eLife.12081

Open access

Abstract

Ecological adaptation is of major relevance to speciation and sustainable population management, but the underlying genetic factors are typically hard to study in natural populations due to genetic differentiation caused by natural selection being confounded with genetic drift in subdivided populations. Here, we use whole genome population sequencing of Atlantic and Baltic herring to reveal the underlying genetic architecture at unprecedented resolution for both adaptation to a new niche environment and timing of reproduction. We identify almost 500 independent loci associated with a recent niche expansion from marine (Atlantic Ocean) to brackish waters (Baltic Sea), and more than 100 independent loci showing genetic differentiation between spring- and autumn-spawning populations irrespective of geographic origin. Our results show that both coding and non-coding changes contribute to adaptation. Haplotype blocks, often spanning multiple genes and maintained by selection, are associated with genetic differentiation.

https://doi.org/10.7554/eLife.12081.001

eLife digest

The Atlantic herring is one of the most common fish in the world and has been a crucial food resource in northern Europe. One school of herring may comprise billions of fish, but previous studies had only revealed very few genetic differences in herring from different geographic regions. This was unexpected since Atlantic herring is one of the few marine species that can reproduce throughout the brackish Baltic Sea, which can be about a tenth as salty as the Atlantic Ocean.

This unexpected finding could be explained in at least two different ways. Firstly, perhaps Atlantic herring are flexible enough to adapt to very different environments (i.e. high or low salinity) without much genetic change. Secondly, the previous studies only looked at a handful of sites in the Atlantic herring’s genome and so it is possible that genetic differences at other genes control this fish’s adaptation instead.

Now, Martinez Barrio, Lamichhaney, Fan, Rafati et al. have sequenced entire genomes from groups of Atlantic herring and revealed hundreds of sites that are associated with adaptation to the Baltic Sea. The analysis also identified a number of genes that control when these fish reproduce by comparing herring that spawn in the autumn with those that spawn in spring. This is important because natural populations must carefully time when they reproduce to maximize the survival of their young.

These new findings provide compelling evidence that changes in protein-coding genes and stretches of DNA that regulate the expression of other genes both contribute to adaptation in herrings. The analysis also clearly shows that variants of genes that contribute to adaptation were likely to evolve over time by accumulating multiple sequence changes affecting the same gene. Furthermore, these gene variants essentially form a rich “tool-box” that underlies the Atlantic herring’s adaptation to its environment, and different subpopulations of herring were found to have their own optimal sets of gene variants. For instance, autumn-spawning herring and spring-spawning herring from the Baltic Sea both have gene variants that favor adaptation to low salinity. However, autumn-spawning Baltic herring also share gene variants that favor spawning in the autumn with autumn-spawning herring from the North Sea, but not with spring-spawning Baltic herring.

The next step will be to study how the 500 or so genes identified affect adaptation at the molecular level. This will likely involve experiments with other model fish such as zebrafish and sticklebacks. Finally, these new findings can be directly applied to monitor stocks of herring to make herring fisheries more sustainable.

https://doi.org/10.7554/eLife.12081.002

Main text

The Atlantic herring (Clupea harengus) is a pelagic fish that occurs in huge schools of up to billions of individuals. The herring fishery has been crucial for food security and economic development in Northern Europe and currently ranks among the five largest fisheries in the world, with nearly 2 million tons of fish landed annually (FAO, 2014). The herring is one of the few marine fishes that reproduce throughout the Baltic Sea, where the salinity drops to 2–3‰ in the Bothnian Bay, compared with 35‰ in the Atlantic Ocean (Figure 1A). This ecological adaptation must be recent because the brackish Baltic Sea has only existed for 10,000 years, following the last glaciation (Andrén et al., 2011). Fishery biologists have for more than a century recognized stocks of herring defined by spawning location, spawning time, morphological characters and life history parameters (Iles and Sinclair, 1982; McQuinn, 1997). Several decades of genetic studies based on limited numbers of genetic markers (allozymes, microsatellites or SNPs) have not been able to verify this divergence; extremely low levels of differentiation have been observed even between geographically distant populations as well as between spring- and autumn-spawning herring (Andersson et al., 1981; Ryman et al., 1984; Larsson et al., 2007; 2010; Limborg et al., 2012). It has been proposed that lack of precision in the homing behaviour of herring causes sufficient gene flow between stocks to counteract genetic differentiation (McQuinn, 1997). However, in a recent study we constructed an exome assembly and used this in combination with whole genome sequencing of eight population samples and found more than 400,000 SNPs (Lamichhaney et al., 2012). We confirmed lack of differentiation at most loci, whereas a small percentage showed highly significant differentiation. Simulations demonstrated that the distribution of fixation index (F_ST) values among herring populations deviated significantly from expectation for selectively neutral loci.
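As a minimal illustration of what a per-SNP fixation index measures, the sketch below computes Nei's F_ST for one biallelic SNP from the allele frequencies in a set of equally weighted populations. This is an assumption made for illustration only; the estimator actually used for the herring data is not specified in this section, and the function name is hypothetical.

```python
def nei_fst(allele_freqs):
    """Nei's F_ST for one biallelic SNP.

    allele_freqs: frequencies of the same allele in each population,
    with all populations weighted equally (an illustrative assumption).
    """
    if not allele_freqs:
        raise ValueError("need at least one population")
    k = len(allele_freqs)
    # Mean expected heterozygosity within populations (H_S).
    h_s = sum(2 * p * (1 - p) for p in allele_freqs) / k
    # Expected heterozygosity of the pooled population (H_T).
    p_bar = sum(allele_freqs) / k
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # monomorphic site: no differentiation to measure
    return (h_t - h_s) / h_t


# Identical frequencies give no differentiation; fixed differences give 1.
print(nei_fst([0.5, 0.5]))
print(nei_fst([0.0, 1.0]))
```

Values near zero at most SNPs, with a small tail of high values, is the pattern described above for herring.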

Figure 1

Demographic history and phylogeny.

(A) Geographic location of samples. The salinity of the surface water in different areas is indicated schematically. Autumn spawners are marked with an asterisk. (B) Demographic history. Black circles indicate effective population size over time estimated by diCal (Sheehan et al., 2013); estimates are averages from four arbitrarily chosen genomic regions. The grey field is the confidence interval (±2 SD), while light grey lines show the underlying estimates from each genomic region. (C) Neighbor-joining phylogenetic tree. The evolutionary distance between Atlantic and Pacific herring was calculated using mtDNA cytochrome B sequences; right panel, zoom-in on the cluster of Atlantic and Baltic herring populations. Colour codes for sampling locations are the same as in Figure 1A. (D) Global distribution of *F_ST* values based on 19 populations of Atlantic and Baltic herring. The inset illustrates the tail of the distribution. The mean and median of this distribution are indicated. To reduce the *F_ST* sampling variance, we only used SNPs with ≥30x coverage in each population.

https://doi.org/10.7554/eLife.12081.003

Genetic studies of ecological adaptation in natural populations are challenging because genetic differentiation caused by natural selection is often confounded with genetic differences due to genetic drift caused by restricted effective population sizes. An ideal species for studying the genetic basis of ecological adaptation should comprise subpopulations of infinite size that are exposed to different ecological conditions. In such a species there is minute genetic drift, and genetic differentiation is caused by selection resulting in local adaptation. The herring is close to being such an ideal subject for studies of ecological adaptation due to the extremely low levels of genetic differentiation at most loci, as documented in previous studies (Andersson et al., 1981; Ryman et al., 1984; Larsson et al., 2007; 2010; Limborg et al., 2012; Lamichhaney et al., 2012). This unique opportunity, together with herring being such a valuable natural resource, prompted us to generate a genome assembly and perform genome sequencing of populations adapted to different ecological conditions.

Here we present a high-quality genome assembly for the Atlantic herring, and results of whole genome sequencing of 20 population samples using pooled DNA. The results were verified by individual genotyping using a custom-made 70k SNP array. Our study addresses two fundamentally different types of adaptation: one example of niche expansion (adaptation to low salinity), and one example of sympatric balancing selection (variation in the timing of reproduction). The results provide a comprehensive list of hundreds of independent loci underlying ecological adaptation and shed light on the relative importance of coding and non-coding variation. The results have important implications for sustainable fishery management, and provide a road map for cost-effective, high-resolution characterization of genetic diversity in natural populations.

Results

Genome assembly and annotation

Clupeiformes represents an early diverging clade of the Otomorpha (Near et al., 2012) (Figure 2A). The genome size for herring has been estimated at ~850 Mb (Hinegardner and Rosen, 1972; Ida et al., 1991; Ohno et al., 1969) with no recent whole genome duplications reported. We performed whole genome assembly based on short read sequencing of libraries ranging from 170 bp to 20 kb insert sizes (Supplementary file 1A). The 808 Mb assembly had a scaffold N50 of 1.84 Mb with 23,336 predicted coding gene models. It showed a high degree of completeness based on RNAseq alignments, core gene analyses and comparisons to other fish gene sets (Table 1, Supplementary files 2, 3A–D, Figure 2B, Figure 2—figure supplements 1–2). The GC content was 44%, and repetitive elements made up 31% of the assembly (Table 1). Alignments of synthetic long reads (SLRs; Illumina) failed to significantly improve the assembly due to coincidental gaps between the assembly and the SLRs, but proved useful in phasing parental alleles (Materials and methods; Figure 2—figure supplements 3–4) and dramatically improved the discovery of indels larger than 30 bp compared to short Illumina reads (Supplementary file 1F). We identified 150 endogenous retroviruses (ERVs) constituting ~0.14% of the genomic sequence, but none included open reading frames in all of the gag, pol and env genes (Supplementary file 1, Figure 2—figure supplement 5).
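For readers unfamiliar with the N50 statistic quoted above, the following sketch shows how it is conventionally computed from a list of scaffold (or contig) lengths: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The toy lengths are hypothetical and unrelated to the herring assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that pushes the running total to half of
        # the assembly size defines the N50.
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]  # hypothetical lengths
print(n50(toy_scaffolds))  # 70: 80 + 70 reaches half of the total (300)
```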

Figure 2

Genome assembly and annotation.

(A) Phylogeny of ray-finned fishes (*Actinopterygii*) from the Devonian to the present, time-calibrated to the geological time scale based on Near et al. (2012). Geological abbreviations: C (Carboniferous), CZ (Cenozoic), D (Devonian), J (Jurassic), K (Cretaceous), Ng (Neogene), P (Permian), Pg (Paleogene) and Tr (Triassic). Dating of the specific rounds of whole genome duplication is based on Glasauer and Neuhauss (2014). Abbreviations: Ts3R (teleost-specific third round) and Ss4R (salmonid-specific fourth round) of duplication. The number of species with a genome assembly available is given in parentheses after each group’s name. Atlantic herring belongs to Clupeiformes, the order indicated in red letters. (B) Orthologous gene families across four fish genomes (*C. harengus, D. rerio, L. chalumnae* and *G. morhua*).

https://doi.org/10.7554/eLife.12081.005

Table 1

Summary of the herring assembly compared to other sequenced fish genomes.

https://doi.org/10.7554/eLife.12081.011

Species	Herring (Clupea harengus)	Zebrafish (Danio rerio)	Cod (Gadus morhua)	Coelacanth (Latimeria chalumnae)	Stickleback (Gasterosteus aculeatus)
Estimated genome size (Mb)	850	1,454^a	830^b	3,530^c	530^d
Assembly size (Mb)	808	1,412	753^b	2,861^e	463^f
Contig N50 (kb)	21.3	25.0	2.8	12.7	83.2
Scaffold N50 (Mb)	1.84	1.55	0.69	0.92	10.8
Sequencing technology^g	I	S+I	R+I	I	S
Repeat content (%)	30.9	52.2	25.4	27.7	25.2
GC content (%)	44.1	36.7	45.4	43.0	44.6
Heterozygosity	1/309	n.a.	1/500	1/435	1/700
Protein-coding gene count	23,336	26,459	22,154	19,033	20,787

^a(Freeman et al., 2007; Vinogradov, 1998; Howe et al., 2013)
^b(Star et al., 2011)
^cGenome size calculated as pg × 0.978 × 10⁹ bp/pg; picogram values taken from Cimino and Bahr (1974)
^d(Vinogradov, 1998; Jones et al., 2012)
^e(Amemiya et al., 2013)
^f(Jones et al., 2012)
^gI=Illumina sequencing; S=Sanger sequencing; R=Roche 454
n.a.=not available

Population genetics and demographic history

Whole genome pooled sequencing was done using 20 population samples of herring from the Baltic Sea, Skagerrak, Kattegat, North Sea, Atlantic Ocean and Pacific Ocean (Figure 1A; Table 2); the latter sample represents the closely related Pacific herring (Clupea pallasii). Each pool comprised 47–100 fish and was sequenced to ~30x coverage. Furthermore, 16 fish, eight Baltic and eight Atlantic herring (Table 2), were sequenced individually to ~10x coverage. All data were aligned to the reference assembly and SNPs were called after rigorous quality filtering. We found 8.83 million SNPs when Pacific herring was included and 6.04 million among Atlantic and Baltic herring.

Table 2

Samples of herring used for whole genome resequencing.

https://doi.org/10.7554/eLife.12081.012

Locality^a	Sample	n	Latitude	Longitude	Salinity (‰)	Date (yymmdd)	Spawning season
Baltic Sea
Gulf of Bothnia (Kalix)^b	BK	47	N 65°52’	E 22°43’	3	800629	spring
Bothnian Sea (Hudiksvall)	BU	100	N 61°45’	E 17°30’	6	120419	spring
Bothnian Sea (Gävle)	BÄV	100	N 60°43’	E 17°18’	6	120507	spring
Bothnian Sea (Gävle)	BÄS	100	N 60°43’	E 17°18’	6	120718	summer
Bothnian Sea (Gävle)	BÄH	100	N 60°44’	E 17°35’	6	120904	autumn
Bothnian Sea (Hästskär)^c	BH	50	N 60°35’	E 17°48’	6	130522	spring
Central Baltic Sea (Vaxholm)^b	BV	50	N 59°26’	E 18°18’	6	790827	spring
Central Baltic Sea (Gamleby)^b	BG	49	N 57°50’	E 16°27’	7	790820	spring
Central Baltic Sea (Kalmar)	BR	100	N 57°39’	E 17°07’	7	120509	spring
Central Baltic Sea (Karlskrona)	BA	100	N 56°10’	E 15°33’	7	120530	spring
Central Baltic Sea	BC	100	N 55°24’	E 15°51’	8	111018	unknown
Southern Baltic Sea (Fehmarn)^b	BF	50	N 54°50’	E 11°30’	12	790923	autumn
Kattegat, Skagerrak, North Sea, Atlantic Ocean
Kattegat (Träslövsläge)^b	KT	50	N 57°03’	E 12°11’	20	781023	unknown
Kattegat (Björköfjorden)	KB	100	N 57°43’	E 11°42’	23	120312	spring
Skagerrak (Brofjorden)	SB	100	N 58°19’	E 11°21’	25	120320	spring
Skagerrak (Hamburgsund)^b	SH	49	N 58°30’	E 11°13’	25	790319	spring
North Sea^b	NS	49	N 58°06’	E 06°10’	35	790805	autumn
Atlantic Ocean (Bergen)^b	AB1	49	N 64°52’	E 10°15’	35	800207	spring
Atlantic Ocean (Bergen)^c	AB2	8	N 60°35’	E 05°00’	33	130522	spring
Atlantic Ocean (Höfn)	AI	100	N 65°49’	W 12°58’	35	110915	spring
Pacific Ocean
Strait of Georgia (Vancouver)	PH	50	-	-	35	121124	-

^aPlaces where the sample was landed (if known) are given in parentheses
^bSamples from previous study (Lamichhaney et al., 2012)
^cEight Baltic herring from the BH sample and eight Atlantic herring from the AB2 sample were used for individual sequencing
n=number of fish

Average nucleotide diversity was estimated by counting the frequency of heterozygous sites in the reference individual after stringent filtering for sequence quality and coverage (within one standard deviation of mean coverage). The estimate was one heterozygous site per 309 bp, giving a nucleotide diversity of 0.32%; no estimate based on the 16 herring sequenced individually deviated significantly from this value and there was no significant difference between Atlantic and Baltic herring. The average decay of linkage disequilibrium between loci was very steep, with average r² falling to 0.1 at a distance of 100 base pairs (Figure 1—figure supplement 1A).
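The r² statistic used above to describe the decay of linkage disequilibrium can be illustrated for a single pair of biallelic SNPs. The sketch below assumes phased haplotypes coded as 0/1 pairs, which is a simplification chosen for clarity; how r² was estimated from the herring data is described in the Materials and methods, not here, and the function name is hypothetical.

```python
def ld_r2(haplotypes):
    """Squared correlation (r^2) between two biallelic SNPs.

    haplotypes: list of (allele_at_snp1, allele_at_snp2) tuples coded
    0/1, one tuple per phased chromosome (an illustrative assumption).
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    p_a = sum(h[0] for h in haplotypes) / n   # frequency of allele 1 at SNP 1
    p_b = sum(h[1] for h in haplotypes) / n   # frequency of allele 1 at SNP 2
    p_ab = sum(1 for h in haplotypes if h == (1, 1)) / n
    d = p_ab - p_a * p_b                      # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom


# Perfectly associated alleles versus independent alleles.
print(ld_r2([(1, 1), (1, 1), (0, 0), (0, 0)]))
print(ld_r2([(1, 1), (1, 0), (0, 1), (0, 0)]))
```

An average r² of about 0.1 at 100 bp therefore means that alleles at SNPs only 100 bp apart are, on average, already close to independent.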

The allele frequency distribution deviated significantly from the one expected for selectively neutral alleles at genetic equilibrium (p<2x10^-16, Kolmogorov-Smirnov test), due to an excess of rare alleles (Figure 1—figure supplement 1B) consistent with population expansion. The result is supported by the genome-wide distribution of Tajima’s D, which shows a global shift towards negative values (mean=−0.57 ± 0.01; Figure 1—figure supplement 1C). A demographic analysis using the diCal software (Sheehan et al., 2013) confirmed that herring have experienced an expansion in effective population size, roughly five- to ten-fold, and that the current N_e is on the order of 10⁶ individuals (Figure 1B); the results for Baltic and Atlantic herring were essentially identical. The result indicates that the effective population size minimum occurred at around one to two MYA, after the onset of the Quaternary ice age.
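To make the interpretation of negative Tajima’s D concrete, the sketch below implements the standard definition of the statistic from the number of sampled sequences, the number of segregating sites and the mean number of pairwise differences. It is a generic textbook calculation, not the pipeline used for the pooled herring data; the input values in the example are hypothetical.

```python
import math


def tajimas_d(n, seg_sites, pi):
    """Tajima's D from n sequences, S segregating sites and mean
    pairwise differences pi (standard textbook definition)."""
    if n < 2 or seg_sites < 1:
        raise ValueError("need n >= 2 sequences and at least one segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    # Difference between the two estimators of theta, scaled by its
    # expected standard deviation under neutrality.
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)


# Hypothetical sample: many segregating sites but few pairwise
# differences, i.e. an excess of rare alleles, gives a negative D.
print(round(tajimas_d(10, 16, 3.888889), 3))
```

An excess of rare alleles inflates the number of segregating sites relative to pairwise diversity, which is why a population expansion shifts D towards negative values.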

Phylogeny

The neighbor-joining phylogenetic tree including Atlantic, Baltic and Pacific herring shows a large phylogenetic distance between Pacific and Atlantic herring, as compared with the tiny genetic divergence among samples of Atlantic and Baltic herring (Figure 1C). We estimated the split between Atlantic and Pacific herring at ~2.2 million years ago based on mtDNA cytochrome B sequence divergence. The phylogenetic tree is consistent with minute differentiation at selectively neutral loci in Atlantic herring (Ryman et al., 1984; Lamichhaney et al., 2012); all subpopulations in the Eastern North Atlantic may have expanded from a common ancestral population after the last glaciation, as indicated by demographic analysis (Figure 1B).

A closer examination of the tight cluster of Atlantic and Baltic herring populations reveals some structure consistent with geographic origin (Figure 1C). Samples from the Baltic Sea cluster on one half while samples from marine waters cluster on the other half of the tree. Only three populations are located at intermediate positions. Two of these are autumn-spawners from the Baltic Sea (BÄH and BF), indicating that autumn-spawning herring are genetically distinct from spring- and summer-spawning herring. The third sample (KT) at an intermediate position was sampled outside the spawning season and at the border between Kattegat and Baltic Sea and may represent a mixed sample of local Kattegat population and fish that spawn in the Baltic Sea but migrate into Kattegat for feeding.

Genetic adaptation to a new niche environment

The Atlantic (Clupea harengus harengus) and Baltic herring (Clupea harengus membras) were classified as subspecies by Linnaeus (1761) in the 18th century. They are adapted to strikingly different environments, in particular regarding salinity, which ranges from 2–3‰ in the Gulf of Bothnia to 12‰ in the Southern Baltic Sea, whereas salinity in Kattegat, Skagerrak, North Sea and Atlantic Ocean is in the range 20‰–35‰ (Figure 1A; Table 2). To reveal loci underlying genetic adaptation associated with the recent niche expansion into brackish waters after the last glaciation, we compared allele frequencies, SNP by SNP, in two superpools: an Atlantic pool including all populations from the Atlantic Ocean, Skagerrak and Kattegat, and a Baltic pool comprising all samples collected in the Baltic Sea. This pooling is justified by the low differentiation at neutral loci, as documented by the low F_ST-values when comparing all samples of Atlantic and Baltic herring (Figure 1D). Samples of autumn-spawning herring, a possible confounding factor, were excluded from the analysis. We used a stringent significance threshold of p<1x10^-10, which is stricter than the Bonferroni-corrected threshold (p=8.2x10^-9).
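The SNP-by-SNP comparison of two superpools can be sketched as a χ² test on a 2×2 table of allele counts, followed by a multiple-testing threshold. The code below is a minimal illustration under stated assumptions: the read counts are hypothetical, the family-wise error rate of 0.05 is assumed rather than taken from the text, and the exact read-count handling used in the study is described in the Materials and methods. The p-value uses the closed form for a χ² distribution with one degree of freedom.

```python
import math


def chi2_2x2(a, b, c, d):
    """Chi-square test (1 df, no continuity correction) on the table
    [[a, b], [c, d]], e.g. reference/alternate allele counts in two pools.
    Returns (chi2, p_value)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("each row and column must have a non-zero total")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


# Hypothetical read counts: (ref, alt) in the Atlantic and Baltic superpools.
chi2, p = chi2_2x2(900, 100, 200, 800)
threshold = bonferroni_threshold(0.05, 6_040_000)  # alpha = 0.05 is an assumption
print(chi2, p, threshold, p < 1e-10)
```

With 6.04 million SNPs and an assumed error rate of 0.05, the Bonferroni threshold is on the order of 8×10^-9, close to the value quoted above, so the p<1x10^-10 cut-off is the more conservative of the two.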

We identified 46,045 SNPs that showed an allele frequency difference with p<1x10^-10 in the χ² test (Figure 3A; Supplementary file 3A). An important question is how many independent loci these represent. A conservative estimate of 472 independent loci was obtained (i) by only using SNPs with p<1x10^-20, (ii) by taking into account gaps in the assembly and (iii) by using the Comb-P software (Pedersen et al., 2012) to combine strongly correlated SNPs from the same genomic region (see Materials and methods). Figure 3A (lower panel) illustrates one of the most striking associations. For a large part of scaffold 218 there are no significant differences between Atlantic and Baltic samples, whereas there are striking allele frequency differences over a 119.4 kb region; this is a characteristic pattern for differentiated regions, indicating that genetic adaptation typically occurs as large haplotype blocks, often including multiple genes. A phylogenetic tree based on SNPs showing genetic differentiation between Atlantic and Baltic herring (Figure 3B) differs profoundly from the tree based on all SNPs (Figure 1C). With the exception of the two autumn-spawning populations BF and BÄH from the Baltic Sea, the positions of all other populations match the variation in salinity perfectly, with the population samples from the North Sea and Atlantic Ocean (35‰) at one end of the tree, samples from the brackish Baltic Sea (3‰–12‰) at the other end, and samples from Skagerrak (25‰) and Kattegat (20‰) at intermediate positions. The low genetic differentiation among Baltic samples, excluding the two autumn-spawning populations BF and BÄH, suggests that adaptation to brackish waters is a derived state.
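The idea of collapsing many significant SNPs into a smaller number of independent loci can be sketched with a much cruder rule than the one used in the study: merge significant SNPs on the same scaffold whenever neighbouring SNPs lie within a fixed distance of each other. This is a distance-based stand-in for illustration only. It is not Comb-P, which combines correlated p-values, and the `max_gap` value, scaffold names and positions below are hypothetical.

```python
def merge_into_loci(snps, max_gap):
    """Group significant SNPs into loci by physical proximity.

    snps: iterable of (scaffold, position) for SNPs passing the threshold.
    max_gap: largest distance (bp) between neighbouring SNPs in one locus.
    Returns a list of (scaffold, start, end, n_snps) tuples.
    """
    loci = []
    for scaffold, pos in sorted(snps):
        if loci and loci[-1][0] == scaffold and pos - loci[-1][2] <= max_gap:
            # Extend the current locus to include this SNP.
            name, start, _, count = loci[-1]
            loci[-1] = (name, start, pos, count + 1)
        else:
            # Start a new locus on a new scaffold or after a large gap.
            loci.append((scaffold, pos, pos, 1))
    return loci


significant = [("s218", 100), ("s218", 5000), ("s218", 200000), ("s346", 10)]
for locus in merge_into_loci(significant, max_gap=10000):
    print(locus)
```

A rule like this overcounts loci when a single haplotype block is split across scaffolds, which is one reason the estimate above also accounts for assembly gaps.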

Figure 3

Genetic differentiation between Atlantic and Baltic herring.

(A) Manhattan plot of significance values testing for allele frequency differences between pools of herring from marine waters (Kattegat, Skagerrak, Atlantic Ocean) versus the brackish Baltic Sea. Lower panel, corresponding plot for scaffold 218 only; both P- and *F_ST*-values are shown. (B) Neighbor-joining phylogenetic tree based on all SNPs showing genetic differentiation in this comparison (p<10^-10). (C) Comparison of allele frequencies in five strongly differentiated regions. The major allele in the AB1 sample (Atlantic Ocean) was used as reference at each SNP. Lower panel, neighbor-joining tree based on haplotypes formed by 128 differentiated SNPs from scaffold 218. (D) Heat map showing copy number variation partially overlapping the *HCE* gene. Orientation of transcription is marked with an arrow; the position of SNPs significant in the χ² test is indicated by stars. Population samples and salinity at sampling locations are indicated to the right; abbreviations are explained in Table 2. (E) Strong genetic differentiation between Atlantic and Baltic herring in a region downstream of *SLC12A3*; statistical significance based on the χ² test is indicated.

https://doi.org/10.7554/eLife.12081.013

Figure 3C (upper panel) shows estimated allele frequencies for highly differentiated SNPs from five genomic regions in six population samples, each region showing an underlying genetic architecture with large and distinctly defined haplotype blocks. The Atlantic Ocean and North Sea samples are both nearly fixed for the reference allele at these SNPs. In contrast, the samples of Baltic herring were close to fixation for the alternate alleles. Interestingly, the sample (SB) collected in Skagerrak (salinity ~25‰) is most similar to the Atlantic Ocean and North Sea samples, but consistently shows a trend towards more intermediate allele frequencies at these loci.

We developed a 70k custom SNP chip to study differentiated regions in more detail and to use data from individual fish to confirm associations detected by pooled sequencing. The chip included 13,355 neutral SNPs evenly distributed across the genome and 59,205 SNPs showing genetic differentiation between subpopulations. Thirty fish each from 12 populations were used in the SNP screen. There was an excellent correlation between allele frequencies estimated with pooled sequencing and with the SNP chip (Figure 3—figure supplement 1). We constructed a phylogenetic tree (Figure 3C, lower panel) for haplotypes of highly differentiated SNPs from scaffold 218 present among individual fish from six representative populations, after phasing haplotypes using BEAGLE (Browning and Browning, 2007). As expected all fish from Atlantic Ocean and North Sea carried closely related “Atlantic” haplotypes. Two major haplotype groups were present among Baltic herring and with few exceptions Baltic herring carried only “Baltic” haplotypes. Fish from Skagerrak predominantly carried Atlantic haplotypes, but with a considerable proportion of Baltic haplotypes. Phylogenetic trees for other top scaffolds are presented in Figure 3—figure supplement 2.

There are many environmental and ecological differences between the Atlantic Ocean and the Baltic Sea (e.g. temperature variability, eutrophication of the Baltic Sea, zooplankton and predator populations), but the most obvious difference concerns salinity. We used the Bayenv 2.0 software (Günther and Coop, 2013) to reveal which of the 472 independent loci detected with the χ² test showed the most consistent correlation with salinity. This analysis identified 3,335 SNPs from 122 independent regions with highly significant association to salinity (Supplementary file 3A). Twenty-one of the genes in these regions have previously been associated with hypertension in humans, and 36 of these genes showed differential expression in sticklebacks kept in freshwater or sea water (Supplementary file 3A).

Here we present three loci with striking association to salinity. Firstly, the 11 kb region in scaffold 899 (Figure 3C) contains a single gene, prolactin receptor (PRLR), which is essential for mammalian reproduction but has a central role in osmoregulation in fish (Manzon, 2002), and possibly in mammals (Schennink et al., 2015). Secondly, strong genetic differentiation was also observed at scaffold 346 (Figure 3A; p<1x10^-39). This signal overlaps HCE, encoding high choriolytic enzyme. This locus was also identified as one of the most differentiated regions in our screen for structural changes (Supplementary file 3B). A 4 kb region including part of the coding sequence showed a massive copy number amplification that had a strong negative correlation with salinity (Figure 3D). The outgroup, Pacific herring, showed an intermediate copy number. Interestingly, the Pacific herring spawns exclusively in shallow nearshore waters (Hay et al., 2009), often in estuaries and tidal zones where salinity varies, in contrast to the deeper-spawning Atlantic herring. HCE is a protease, also denoted hatching enzyme, that solubilizes the inner layer of the egg envelope during hatching, and adaptive evolution of this protein in relation to salinity has been reported (Kawaguchi et al., 2013). In herring, we found no coding changes, implying altered transcriptional regulation. In fact, massive amplification of the promoter region is expected to alter gene expression. Hatching of the egg is probably a particularly challenging stage of development for a marine fish adapting to brackish conditions. Thirdly, a ~65 kb region downstream of solute carrier family 12 (sodium/chloride transporter) member 3 (SLC12A3) shows strong correlation with salinity (Figure 3E, Supplementary file 3A). SLC12A3, which has an established role in regulating osmotic balance, is associated with hypertension in humans and shows differential expression in kidney tissue between sticklebacks kept in freshwater or sea water (Wang et al., 2014).

Genetic basis underlying timing of reproduction

Herring spawn from early spring to late fall. Prior to this study it was unknown if spawning time is entirely due to phenotypic plasticity, set by nutritional status and environmental conditions, or if genetic factors contribute (McQuinn, 1997). For example, it has been hypothesized that spawning time in the Baltic Sea is regulated by productivity of the system affecting maturation of fish prior to spawning (Aneer, 1985). To study this important question we collected spawning herring from the same geographic area, close to Gävle (Sweden), in May, July and September (Table 2). Our sampling included two other autumn-spawning populations collected in 1979, one from North Sea and the other from Southern Baltic Sea. We formed two superpools including three autumn-spawning and 10 spring-spawning population samples, respectively; the summer-spawners and one population of non-spawning herring (KT in Table 2) were excluded from the initial analysis. We identified 10,195 SNPs with significant allele frequency differences between pools (p<1x10^-10) and 69 regions with copy number variation (p<0.001) (Figure 4A); the highly differentiated SNPs represented at least 125 independent loci based on our strict criteria (see Materials and methods). The result demonstrates for the first time that autumn- and spring-spawning herring are genetically distinct and indicates that genetic factors affect spawning time. In a phylogenetic tree based on these 10,195 SNPs the autumn-spawning populations from the Baltic Sea and North Sea tended to cluster with spring-spawning herring from the Atlantic Ocean (Figure 4B).

Figure 4

Genetic differentiation between spring- and autumn-spawning herring.

(A) Manhattan plot of significance values testing for allele frequency differences. (B) Neighbor-joining phylogenetic tree based on all SNPs showing genetic differentiation in this comparison (p<10^-10). (C) Comparisons of allele frequencies in four strongly differentiated regions. The major allele in the AB1 sample (Atlantic Ocean) was set as reference at each SNP. Scaffolds 190 and 1420 have been merged in this plot since it was obvious that they were overlapping. *The signal in scaffold s1440 is present ~27 kb upstream of *SOX11* and ~46 kb downstream of *DCDC2/ALLC*. (D) Neighbor-joining tree based on haplotypes formed by 70 differentiated SNPs from scaffold 190/1420; same populations as in Figure 4C. (E) Plot of average heterozygosity, per SNP in 5 kb windows, across scaffold 1420 indicating a selective sweep among spring-spawners in the region marked with vertical hatched lines. Autumn-spawning populations are marked by an asterisk.

https://doi.org/10.7554/eLife.12081.016

A general linear mixed model was used to identify which of the 125 independent loci showed the most consistent allele frequency differences between spring and autumn spawners. This analysis revealed 17 independent genomic regions that passed the stringent significance threshold of p<10^-10 (Bonferroni correction, p=4.9x10^-6) (Supplementary file 3C). We then illustrated the striking allele frequency differences at the four most significant regions using data from six different populations. As observed for the genetic adaptation to low salinity (above), the most significant regions underlying seasonal reproductive timing typically consist of large haplotype blocks, often containing multiple genes. Spring-spawning Atlantic and Baltic herring showed nearly identical allele frequencies at these loci, while autumn-spawning herring from the Baltic Sea and North Sea showed high frequencies of the alternate alleles (Figure 4C). Remarkably, summer-spawning herring showed a clear trend towards intermediate allele frequencies at all loci, most pronounced for scaffold 481 (Figure 4C). This may reflect either that this sample is an admixture of spring- and autumn-spawning herring or that it represents a distinct population. To explore this, we investigated deviations from Hardy-Weinberg equilibrium using the F_IT statistic, because we expect a heterozygote deficiency if this is a mix of two populations. The results, based on 1,500 SNPs all showing strong genetic differentiation between spring- and autumn-spawners and genotyped individually using the SNP chip, showed that the summer spawners (BÄS) did not deviate markedly from F_IT=0, and in fact deviated to a lesser extent than the spring-spawning population (BÄV) sampled at the same locality (Figure 4—figure supplement 1). For instance, individual genotyping of the highly differentiated SNPs from scaffold 481 (Figure 4C) resulted in mean F_IT=−0.10 (excess of heterozygotes) for the summer spawners (BÄS), whereas if the sample had constituted an equal mix of spring- and autumn-spawners from the same locality (BÄV and BÄH) the expected F_IT value would have been 0.46 (strong heterozygote deficiency). Thus, the data strongly suggest that these summer spawners represent a distinct population rather than an admixture. Spawning time may thus be fine-tuned by the dosage of alleles affecting it. The three populations from Gävle showed nearly identical allele frequencies at loci with strong genetic differentiation between Atlantic Ocean and Baltic Sea (Figure 3C), whereas they showed dramatic allele frequency differences at loci associated with spawning time (Figure 4C).
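The admixture test above rests on the Wahlund effect: pooling two populations with different allele frequencies yields fewer heterozygotes than Hardy-Weinberg proportions predict for the pooled frequency. The following Python is a minimal sketch of that reasoning, not the authors' pipeline. The function names and the simple per-SNP estimator F_IT = 1 - H_obs/H_exp are assumptions made for illustration.

```python
def f_it(n_AA, n_Aa, n_aa):
    """Per-SNP F_IT = 1 - H_obs / H_exp from genotype counts (illustrative estimator)."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)      # frequency of allele A in the sample
    h_exp = 2 * p * (1 - p)              # heterozygosity expected under Hardy-Weinberg
    if h_exp == 0:
        return float("nan")              # monomorphic site: F_IT is undefined
    h_obs = n_Aa / n                     # observed fraction of heterozygotes
    return 1 - h_obs / h_exp


def expected_f_it_equal_mix(p1, p2):
    """Expected F_IT when two populations, each in Hardy-Weinberg
    equilibrium with allele frequencies p1 and p2, are pooled 50:50."""
    h_obs = 0.5 * (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2))
    p_bar = 0.5 * (p1 + p2)
    h_exp = 2 * p_bar * (1 - p_bar)
    if h_exp == 0:
        return float("nan")
    return 1 - h_obs / h_exp
```

Under these assumptions, a sample with more heterozygotes than expected gives a negative value, while a mix of two strongly differentiated populations gives a clearly positive one. This is the contrast the text draws between the observed and the expected F_IT for the summer spawners.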

We used SNP-chip data to construct a haplotype tree based on highly differentiated SNPs in scaffold 190/1420. Two haplotype groups were strongly associated with spring- and autumn-spawning (Figure 4D); haplotype trees for other top scaffolds are in Figure 4—figure supplement 2. The estimated average heterozygosity per polymorphic site across scaffold 1420 indicated a selective sweep among spring-spawning herring but not in autumn-spawning populations (Figure 4E). However, the nucleotide diversity did not show a significant difference between groups (spring: 0.24% ± 0.004%; autumn: 0.27% ± 0.003%). Thus, the number of variable sites is higher among spring-spawning herring, but the average heterozygosity per site is lower. One possible explanation for this observation is that a selective sweep happened at this locus in the past in spring-spawning herring, which was then followed by a population expansion allowing the accumulation of new mutations. This interpretation is supported by strongly negative Tajima’s D values in this region among spring-spawning Atlantic and Baltic herring (Figure 1—figure supplement 1E).
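The link between many variable sites, low per-site heterozygosity and negative Tajima's D can be made concrete with the standard formula from Tajima (1989). The sketch below is illustrative only. The text does not describe how D was computed for this region, and the function name and arguments are assumptions.

```python
import math


def tajimas_d(n_chrom, seg_sites, mean_pairwise_diff):
    """Tajima's D from the number of sampled chromosomes, the number of
    segregating sites S, and the mean number of pairwise differences."""
    if n_chrom < 4 or seg_sites == 0:
        return float("nan")  # the variance term vanishes for n < 4; no signal if S = 0
    a1 = sum(1.0 / i for i in range(1, n_chrom))
    a2 = sum(1.0 / i ** 2 for i in range(1, n_chrom))
    b1 = (n_chrom + 1) / (3.0 * (n_chrom - 1))
    b2 = 2.0 * (n_chrom ** 2 + n_chrom + 3) / (9.0 * n_chrom * (n_chrom - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n_chrom + 2) / (a1 * n_chrom) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    # Difference between the two estimators of theta, scaled by its standard deviation
    return (mean_pairwise_diff - seg_sites / a1) / math.sqrt(variance)
```

When pairwise diversity is low relative to the number of segregating sites, as expected when new, rare mutations accumulate after a sweep and expansion, the numerator and therefore D become negative.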

Genetic differences in spawning time are expected to involve photoperiodic regulation of reproduction. Interestingly, our strongest signals (p<1x10^-120) in this contrast are located within and up to 25 kb upstream of TSHR, encoding thyroid-stimulating hormone receptor, which has a central role in this pathway in birds and mammals (Nakao et al., 2008; Ono et al., 2008; Hanon et al., 2008). Further, a second gene in the same scaffold (190/1420), calmodulin, has a role in initiating reproduction following secretion of gonadotropin-releasing hormone (GnRH) (Melamed et al., 2012) downstream of TSHR signalling in photoperiodic regulation of reproduction. SOX11, one of the genes in the associated region in scaffold 1440 (Figure 4C), encodes a transcription factor that controls GnRH expression in GnRH-secreting neurons (Kim et al., 2011). Finally, ESR2a, in scaffold 312, encodes estrogen receptor beta, which has a well-established function in reproductive biology (Bondesson et al., 2015). Interestingly, a previous experimental study in sticklebacks also indicates that estrogen receptor signaling is involved in photoperiodic regulation of reproduction, since treatment with aromatase inhibitors, which inhibit the conversion of androgens to estrogens, altered photoperiodic regulation of male sexual maturation (Bornestaf et al., 1997). Also, the expression of ESR2 but not ESR1 is regulated by circadian factors in mice (Cai et al., 2008), consistent with our data suggesting that estrogen receptor beta (encoded by ESR2) is more important than estrogen receptor alpha (encoded by ESR1) for photoperiodic regulation of reproduction.

Adaptive haplotype blocks are maintained by selection

A common feature for the signatures of selection for adaptation to low salinity and for seasonal reproduction in herring is the presence of haplotype blocks (10–200 kb in size) showing strong differentiation (Figures 3C, 4C), despite the rapid decay of linkage disequilibrium at selectively neutral sites (Figure 1—figure supplement 1A). A possible explanation for the pattern is the presence of inversions suppressing recombination as previously shown in three-spined stickleback (Jones et al., 2012). We constructed 3.3 kb Nextera mate pair libraries for two Atlantic and two Baltic herring individuals to scan for inversions with a particular focus on regions under selection. However, few convincing inversion candidates were detected and none coincided with the regions highlighted in Figures 3C, 4C. Thus, inversions do not appear to be an important explanation for the presence of haplotype blocks.

Having excluded inversions as a major explanation for the long haplotype blocks, two other possible explanations were considered. Haplotype blocks may occur as a consequence of recent fast selective sweeps that lead to hitchhiking of neutral polymorphisms in close genetic linkage with causal variants (Maynard-Smith and Haigh, 1974; Charlesworth et al., 1997). Alternatively, haplotype blocks involving multiple causal mutations may be maintained by natural selection. These two models give entirely different predictions as regards nucleotide diversity in the differentiated regions of the genome. The hitchhiking model predicts reduced levels of genetic diversity in the differentiated region, whereas the haplotype evolution model implies that nucleotide diversity in the differentiated regions, even within populations, may be as high as or even higher than in neutral regions because the haplotypes are expected to have been maintained during an evolutionary process. We decided to test this by comparing nucleotide diversity for the 30 most differentiated regions in the contrast Atlantic vs. Baltic within and between one population of Atlantic herring (Bergen) and one population of Baltic herring (Kalix). The nucleotide diversity turned out to be significantly higher in the differentiated regions than in random regions of the genome both within and between populations (Figure 5A). The same conclusion emerged from the analysis of the 30 most differentiated regions between autumn- and spring-spawning herring using the samples collected at the same locality (Gävle) in May and September (Figure 5B). Thus, we conclude that our data on genetic differentiation in herring are consistent with the evolution of haplotype blocks harbouring multiple causal variants. The model also implies that the presence of multiple alleles containing different combinations of causal variants is expected.
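The comparison above depends on two quantities per region: nucleotide diversity within a population and nucleotide diversity between two populations. As a rough sketch, assuming biallelic SNPs with known allele frequencies and a region of known length (the function names and the choice of estimators are illustrative and may differ from those used in the study), they can be computed as follows.

```python
def pi_within(freqs, n_chrom, region_length):
    """Within-population nucleotide diversity per base pair.

    freqs: allele frequency of one allele at each SNP in the region.
    n_chrom: number of sampled chromosomes (for the small-sample correction).
    """
    correction = n_chrom / (n_chrom - 1)
    return sum(2 * p * (1 - p) * correction for p in freqs) / region_length


def pi_between(freqs_a, freqs_b, region_length):
    """Between-population diversity per base pair: the probability that two
    chromosomes, one drawn from each population, differ at a site."""
    total = sum(pa * (1 - pb) + pb * (1 - pa) for pa, pb in zip(freqs_a, freqs_b))
    return total / region_length
```

Under hitchhiking, the within-population value in a differentiated region should fall below the genome-wide level. Under the haplotype evolution model it can match or exceed that level, which is the pattern reported in Figure 5.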

Figure 5

Nucleotide diversity within and between samples with different ecological adaptations as regards (A) salinity and (B) spawning time.

For each contrast, 30 strongly differentiated regions of the genome and 30 control regions showing no significant differentiation were used. The nucleotide diversity within and between populations for the control regions was estimated at around 0.3%, consistent with the genome average, whereas diversity in differentiated regions was significantly higher. BK=Baltic herring, Kalix; AB=Atlantic herring, Bergen; BÄH=autumn-spawning Baltic herring from Gävle; BÄV=spring-spawning Baltic herring from Gävle; see Table 1. The data are presented as box plots; the central rectangle spans the first to third quartiles of the distribution, and the ‘whiskers’ above and below the box show the maximum and minimum estimates. The line inside the rectangle shows the median.

https://doi.org/10.7554/eLife.12081.019

Genomic distribution of causal variants

Genome-wide analysis combined with strong signatures of selection enabled us to explore the genomic distribution of sequence polymorphisms underlying ecological adaptation. We carried out an enrichment analysis as previously used to identify categories of SNPs showing differentiation between domestic and wild rabbits (Carneiro et al., 2014). We calculated the absolute allele frequency difference (dAF) for different categories of SNPs in the two contrasts, Atlantic vs. Baltic and spring- vs. autumn-spawning herring, and sorted these into bins (dAF 0–0.05, etc.). In both contrasts the great majority of SNPs (>90%) showed a dAF lower than 0.10 (Figure 6, Supplementary file 3E).
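The binning step can be sketched as follows. This is an illustration under simple assumptions, not the authors' code: each SNP has one allele frequency per group, the bin width is 0.05 as in the text, and the function name is hypothetical.

```python
from collections import Counter


def daf_histogram(freqs_a, freqs_b, width=0.05):
    """Count SNPs per dAF bin, where dAF = |freq_a - freq_b|.

    Bin k covers [k * width, (k + 1) * width); dAF = 1.0 is placed in the last bin.
    """
    n_bins = int(round(1.0 / width))
    counts = Counter()
    for fa, fb in zip(freqs_a, freqs_b):
        daf = abs(fa - fb)
        # Rounding before truncation guards against floating-point error at bin edges
        index = min(int(round(daf / width, 9)), n_bins - 1)
        counts[index] += 1
    return counts
```

For example, a SNP with frequencies 0.90 and 0.20 in the two groups has dAF = 0.70 and falls in bin 14 (0.70 to 0.75).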

Figure 6

Analysis of delta allele frequency (dAF) for different categories of SNPs.

(A) dAF calculated for the contrast marine vs. brackish water. (B) dAF calculated for the contrast spring- vs. autumn-spawning. The black line represents the total number of SNPs in each dAF bin and coloured lines represent M values of different SNP types. M values were calculated by comparing the frequency of SNPs in a given annotation category in a specific bin with the corresponding frequency across all bins.

https://doi.org/10.7554/eLife.12081.020
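The caption describes M values only as a comparison of a category's frequency in one bin with its frequency across all bins. A minimal sketch, assuming M is the log2 ratio of those two frequencies (an assumption that is consistent with the negative M values reported for underrepresented categories), is given below. The function and argument names are hypothetical.

```python
import math


def m_value(cat_in_bin, total_in_bin, cat_all, total_all):
    """Assumed definition: log2 of (share of the category within a dAF bin)
    divided by (share of the category across all bins)."""
    observed = cat_in_bin / total_in_bin
    expected = cat_all / total_all
    if observed == 0 or expected == 0:
        return float("nan")  # log ratio undefined when a share is zero
    return math.log2(observed / expected)
```

Under this definition, M = 1 corresponds to a two-fold enrichment and M = -1 to a two-fold depletion of a SNP category in a given bin.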

Non-synonymous substitutions showed the most striking enrichment in both contrasts and showed a steady increase above dAF=0.15, reaching a two-fold enrichment at dAF>0.50 (Figure 6, Supplementary file 3E). This enrichment must reflect natural selection acting on the protein sequence because synonymous substitutions did not show a similarly strong enrichment at high dAF. All non-synonymous substitutions showing dAF>0.50 in either of the two contrasts are compiled in Supplementary file 3F. A striking feature of this list is the common occurrence of multiple high dAF SNPs in the same gene. The 74 non-synonymous changes with dAF>0.50 in the contrast Atlantic vs. Baltic occur in only 29 different genes, and the corresponding figure for the contrast spring- vs. autumn-spawning is 21 non-synonymous changes in 9 genes. We excluded the possibility that the presence of multiple non-synonymous changes in many of the genes was explained by errors in gene models (non-coding sequences annotated as exons) by a comparative analysis with other teleosts. We identified the orthologous position for about two thirds of the positions listed in Supplementary file 3F; the great majority of these (58/62) were annotated as coding sequence also in other species (Supplementary file 3F).

SNPs located in the 5’ untranslated and 3’ untranslated regions (UTRs) showed a more consistent enrichment than synonymous changes, implying that this enrichment is unlikely to be caused entirely by close linkage to coding sequences under selection. Thus, changes in UTRs have contributed to ecological adaptation in the herring, most likely due to their role in regulating mRNA stability and translation efficiency. In this analysis we combined 5’UTR and 3’UTR SNPs to avoid classes that are too small at extremely high dAF. However, an analysis based on all SNPs showing a dAF > 0.1 in the Atlantic vs. Baltic contrast and all SNPs showing a dAF > 0.2 for the spring- vs. autumn-spawning contrast demonstrated that both 5’UTR and 3’UTR SNPs are overrepresented at high dAF and the trend is particularly strong for 5’UTR SNPs (Supplementary file 3G).

The importance of regulatory changes underlying ecological adaptation is evident from the highly significant enrichment of SNPs within 5 kb upstream and downstream of coding sequences (Figure 6, Supplementary file 3E). Further, the excess is particularly pronounced within 1 kb upstream of the coding sequence where the promoter is expected to be located (Supplementary file 3H). The enrichment is not as high as for non-synonymous changes but this does not mean that regulatory changes are less important than coding changes because a much higher proportion of SNPs within the 5 kb region flanking coding sequences are expected to be selectively neutral compared with those causing non-synonymous changes. Thus, it is possible that the enrichment of non-coding SNPs would be much higher if there was a better annotation of the functional significance of non-coding sequences in Atlantic herring.

Intergenic and intronic SNPs were in general underrepresented among SNPs showing high dAF (Figure 6). For the most differentiated SNPs (dAF > 0.50) the intergenic SNPs showed a marked underrepresentation in the Atlantic – Baltic contrast (M=-0.64; p=5.1 x 10^-25; Supplementary file 3E) while intronic SNPs were most underrepresented in the spring- vs. autumn-spawning contrast (M=-0.55; p=6.7 x 10^-7; Supplementary file 3E).

We also explored the possibility that loss-of-function mutations have contributed to ecological adaptation. We identified a total of 469 nonsense mutations but expect that many of these are false predictions due to errors in the gene models. Eight predicted nonsense mutations had a dAF higher than 0.20 in one of the contrasts and were further examined. Seven of these were unlikely to be correct annotations since the positions were not annotated as coding in zebrafish, and the remaining one had a dAF of 0.21 but was far from statistical significance. Thus, we conclude that gene inactivation is not a common mechanism for ecological adaptation.

Discussion

We have generated an Atlantic herring genome assembly and used this for a comprehensive analysis of the genetic basis for ecological adaptation. Hundreds of independent loci underlying ecological adaptation were revealed by comparing spring- and autumn-spawners as well as populations adapted to marine and brackish waters. The data show that both coding and non-coding changes contribute to ecological adaptation and we find that haplotype blocks spanning up to hundreds of kb show strong genetic differentiation.

The genetic architecture of multifactorial traits and disorders is an important topic in current biology. Genome-wide association studies (GWAS) in humans as well as in livestock have indicated that most multifactorial traits and disorders are controlled by a large number of loci, each explaining a tiny fraction of trait variation (Wood et al., 2014; Meuwissen et al., 2013). Thus, if ecological adaptation has a similarly complex genetic background, in particular in a species with a large population size where each base in the genome is expected to mutate many times each generation, it may be difficult to reveal individual loci underlying adaptation. In contrast, this and our previous study (Lamichhaney et al., 2012) have revealed that genomic regions harbouring a small portion of all SNPs show strong genetic differentiation in the herring whereas the rest of the genome shows very low levels of genetic differentiation. However, there are some important differences between the herring and human data. Firstly, human GWAS reveal loci that contribute to standing genetic variation and therefore include deleterious alleles that have not yet been eliminated by purifying selection. Secondly, the phenotypic effects of the loci reported here in the herring may be small and the strong genetic differentiation may have accumulated over many generations. There is also plenty of room for natural selection to operate in a species with a large reproductive output like the herring. Thirdly, our study gives no insight into how much of the genetic variation in ecological adaptation these loci control since we do not have information on genotype-phenotype relationships for individual fish. We cannot exclude the possibility that there are additional loci with tiny differences in allele frequency between populations or loci with an extensive allelic heterogeneity that are not detected using our approach. The question of how much of the genetic variation the loci reported in this study explain needs to be addressed in future experimental studies.

An important finding was the presence of large haplotype blocks (10–200 kb in size) showing strong genetic differentiation, standing in sharp contrast to the rapid decay of linkage disequilibrium at selectively neutral sites (Figure 1—figure supplement 1A). Although it is expected that the majority of sequence polymorphisms associated with these haplotype blocks are selectively neutral, the data presented here are consistent with a scenario where haplotype blocks evolve over time by the accumulation of multiple, consecutive mutations affecting one or more genes, similar to the evolution of haplotypes carrying multiple causal mutations as has been documented in domestic animals (Andersson, 2013) and suggested for the evolution of the blunt beak ALX1 haplotype in Darwin’s finches (Lamichhaney et al., 2015). Under this scenario, the shift from one allelic state to another rarely happens through a single mutational event since the fitness of a haplotype depends on the combined effect of multiple sequence polymorphisms affecting function. Furthermore, it is expected that there will be selection for suppressed recombination within these regions to prevent favoured haplotype blocks from breaking up. Our analysis showing that nucleotide diversity is higher within the differentiated regions than in the rest of the genome (Figure 5) strongly supports our hypothesis that the large haplotype blocks are maintained by selection rather than being the consequence of genetic hitchhiking (Maynard-Smith and Haigh, 1974; Charlesworth et al., 1997). The common occurrence of multiple non-synonymous changes in genes showing strong genetic differentiation provides further support for the haplotype evolution model (Supplementary file 3F). The model proposed here is in line with the evolution of complex adaptive alleles in species with large current effective population sizes like modern Drosophila melanogaster populations (Karasov et al., 2010).

A long-standing question in evolutionary biology is the relative importance of genetic variation in regulatory and coding sequences. King and Wilson (1975) argued already 40 years ago that regulatory changes are more important than protein changes for phenotypic differences among primates. The large number of loci associated with ecological adaptation detected in the present study allowed us to explore their genomic distribution. There was a highly significant excess of non-synonymous changes as well as SNPs in UTRs and within 5 kb upstream and downstream of coding sequences among the loci showing strong genetic differentiation (Figure 6). Thus, both coding and non-coding changes contribute to ecological adaptation in the herring. The enrichment was clearly most pronounced for non-synonymous SNPs, but it is likely that regulatory changes are in the majority among the causal variants because there are more than 10 times as many non-coding as coding changes among the SNPs showing the strongest genetic differentiation (Supplementary file 3F). However, at present we cannot judge the relative importance of coding and non-coding changes, partly due to the strong linkage disequilibrium between coding and non-coding changes and partly because we have no data on the effect size of individual loci. We observed a highly significant excess of several categories of SNPs even for loci with only a 10–15% allele frequency difference between populations (Supplementary file 3E), suggesting that SNPs with such minor changes in allele frequencies contribute to ecological adaptation in the herring. Consistent with previous studies in domestic animals (Carneiro et al., 2014; Rubin et al., 2010), we did not find any indication that gene inactivation has contributed to adaptive evolution.

Timing of reproduction is of utmost importance for fitness in plants and animals and it is well documented that climate change affects reproductive success in both terrestrial (Visser et al., 2015) and aquatic organisms (Edwards and Richardson, 2004). We identified more than 100 independent loci showing strong genetic differentiation between spring- and autumn-spawners. Not all of these are expected to control reproduction since other life history parameters differ between populations. However, several of the most strongly associated regions overlapped genes with a role in photoperiodic regulation of reproduction in birds and mammals, such as thyroid-stimulating hormone receptor (TSHR), calmodulin and SOX11 (Ono et al., 2008; Hanon et al., 2008; Nakao et al., 2008; Melamed et al., 2012; Kim et al., 2011). Photoperiodic regulation in fish is poorly studied, but a recent study showed that the saccus vasculosus brain region is a sensor of changes in day length and suggested that changes in day length affect TSHR expression in this region in Masu salmon (Nakane et al., 2013). Interestingly, strong signatures of selection at TSHR in chicken (Rubin et al., 2010) and sheep (Kijas et al., 2012) may reflect selection against seasonal reproduction in domestic animals.

The population structure of Atlantic herring has been under debate for more than a century (McQuinn, 1997; Iles and Sinclair, 1982). The discussion has concerned the taxonomic status of stocks associated with different spawning and feeding locations, and whether populations are reproductively isolated. Our data are consistent with a metapopulation structure (McQuinn, 1997) in which subpopulations (stocks) are not reproductively isolated. Gene flow combined with large effective population sizes explains low genetic differentiation at selectively neutral loci. Despite this, natural selection is sufficiently strong to cause genetic differentiation at many loci underlying adaptation.

Many populations of marine fish, including the herring, have been severely affected by overfishing (Worm et al., 2006; Dickey-Collas et al., 2010). Our study shows how genomic technologies can be used in a cost-effective manner to make major leaps in the characterization of population structure and genetic diversity. The study has important implications for sustainable fishery management of herring by providing a comprehensive list of genetic markers that can be used for stock assessments, including the first molecular tools to distinguish autumn- and spring-spawning herring. These can be used to complement the current use of otolith (ear bone) microstructure. Moreover, the finding that spring- and autumn-spawners constitute distinct populations implies that fisheries management should aim to protect both populations separately, which is currently not the case in the Baltic Sea (ICES, 2014). Finally, the study also has implications for fish aquaculture due to the interest in altering seasonal reproduction and adaptation to different salinities.

Author details

Alvaro Martinez Barrio

Sangeet Lamichhaney

Guangyi Fan

Nima Rafati

Mats Pettersson

He Zhang

Jacques Dainat

Diana Ekman

Marc Höppner

Patric Jern

Marcel Martin

Björn Nystedt

Xin Liu

Wenbin Chen

Xinming Liang

Chengcheng Shi

Yuanyuan Fu

Kailong Ma

Xiao Zhan

Chungang Feng

Ulla Gustafson

Carl-Johan Rubin

Markus Sällman Almén