Research Article

The Aquilegia genome provides insight into adaptive radiation and reveals an extraordinarily polymorphic chromosome with a unique history

Vienna BioCenter, Austria
University of California, United States
Masaryk University, Czech Republic
Vienna Graduate School of Population Genetics, Austria
Joint Genome Institute, United States
HudsonAlpha Institute of Biotechnology, United States
Centre of the Region Haná for Biotechnological and Agricultural Research, Czech Republic
Harvard University, United States

Oct 16, 2018

Open access

Abstract

The columbine genus Aquilegia is a classic example of an adaptive radiation, involving a wide variety of pollinators and habitats. Here we present the genome assembly of A. coerulea ‘Goldsmith’, complemented by high-coverage sequencing data from 10 wild species covering the world-wide distribution. Our analyses reveal extensive allele sharing among species and demonstrate that introgression and selection played a role in the Aquilegia radiation. We also present the remarkable discovery that the evolutionary history of an entire chromosome differs from that of the rest of the genome – a phenomenon that we do not fully understand, but which highlights the need to consider chromosomes in an evolutionary context.

https://doi.org/10.7554/eLife.36426.001

Introduction

Understanding adaptive radiation is a longstanding goal of evolutionary biology (Schluter, 2000). As a classic example of adaptive radiation, the Aquilegia genus has outstanding potential as a subject of such evolutionary studies (Hodges et al., 2004; Hodges and Derieg, 2009; Kramer, 2009). The genus is made up of about 70 species distributed across Asia, North America, and Europe (Munz, 1946) (Figure 1). Distributions of many Aquilegia species overlap or adjoin one another, sometimes forming notable hybrid zones (Grant, 1952; Hodges and Arnold, 1994b; Li et al., 2014). Additionally, species tend to be widely interfertile, especially within geographic regions (Taylor, 1967).

Figure 1 (with 1 supplement)

Distribution of *Aquilegia* species.

There are ~70 species in the genus *Aquilegia*, broadly distributed across temperate regions of the Northern Hemisphere (grey). The 10 *Aquilegia* species sequenced here were chosen as representatives spanning this geographic distribution as well as the diversity in ecological habitat and pollinator-influenced floral morphology of the genus. *Semiaquilegia adoxoides*, generally thought to be the sister taxon to *Aquilegia* (Fior et al., 2013), was also sequenced. A representative photo of each species is shown and is linked to its approximate distribution.

https://doi.org/10.7554/eLife.36426.002

Phylogenetic studies have defined two concurrent, yet contrasting, adaptive radiations in Aquilegia (Bastida et al., 2010; Fior et al., 2013). From a common ancestor in Asia, one radiation occurred in North America via Northeastern Asian precursors, while a separate Eurasian radiation took place in central and western Asia and Europe. While adaptation to different habitats is thought to be a common force driving both radiations, shifts in primary pollinators also play a substantial role in North America (Whittall and Hodges, 2007; Bastida et al., 2010). Previous phylogenetic studies have frequently revealed polytomies (Hodges and Arnold, 1994b; Ro et al., 1997; Whittall and Hodges, 2007; Bastida et al., 2010; Fior et al., 2013), suggesting that many Aquilegia species are very closely related.

Genomic data are beginning to uncover the extent to which interspecific variant sharing reflects a lack of strictly bifurcating species relationships, particularly in the case of adaptive radiation. Discordance between gene and species trees has been widely observed (Novikova et al., 2016 and references 15, 34–44 therein; Svardal et al., 2017; Malinsky et al., 2017), and while disagreement at the level of individual genes is expected under standard population genetics coalescent models (Takahata, 1989) (also known as ‘incomplete lineage sorting’ [Avise and Robinson, 2008]), there is increased evidence for systematic discrepancies that can only be explained by some form of gene flow (Green et al., 2010; Novikova et al., 2016; Svardal et al., 2017; Malinsky et al., 2017). The importance of admixture as a source of adaptive genetic variation has also become more evident (Lamichhaney et al., 2015; Mallet et al., 2016; Pease et al., 2016). Hence, rather than being a problem to overcome in phylogenetic analysis, non-bifurcating species relationships could actually describe evolutionary processes that are fundamental to understanding speciation itself. Here we generate an Aquilegia reference genome based on the horticultural cultivar Aquilegia coerulea ‘Goldsmith’ and perform resequencing and population genetic analysis of 10 additional individuals representing North American, Asian, and European species, focusing in particular on the relationship between species.

Results

Genome assembly and annotation

We sequenced an inbred horticultural cultivar (A. coerulea ‘Goldsmith’) using a whole genome shotgun sequencing strategy. A total of 4,773,210 Sanger sequencing reads from seven genomic libraries (Supplementary file 1) were assembled to generate 2529 scaffolds with an N50 of 3.1 Mbp (Supplementary file 2). With the aid of two genetic maps, we assembled these initial scaffolds into a 291.7 Mbp reference genome consisting of 7 chromosomes (282.6 Mbp) and an additional 1027 unplaced scaffolds (9.13 Mbp) (Supplementary file 3). The completeness of the assembly was validated using 81,617 full length cDNAs from a variety of tissues and developmental stages (Kramer and Hodges, 2010), of which 98.69% mapped to the assembly. We also assessed assembly accuracy using Sanger sequencing of 23 full-length BAC clones. Of more than 3 million base pairs sequenced, only 1831 were found to be discrepant between BAC clones and the assembled reference (Supplementary file 4). To annotate genes in the assembly, we used RNAseq data generated from a variety of tissues and Aquilegia species (Supplementary file 5), EST data sets (Kramer and Hodges, 2010), and protein homology support, yielding 30,023 loci and 13,527 alternate transcripts. The A. coerulea 'Goldsmith' v3.1 genome release is available on Phytozome (https://phytozome.jgi.doe.gov/). For a detailed description of assembly and annotation, see Materials and methods.
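The scaffold N50 quoted above is a standard contiguity summary. As a minimal sketch of how such a value is computed (the function name `n50` and the toy lengths are illustrative assumptions, not the assembly pipeline used for this genome):

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    N50 is the length L such that scaffolds of length >= L together
    cover at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one scaffold length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy scaffold lengths (illustrative only, not the real assembly).
toy_lengths = [6, 5, 4, 3, 2]
print(n50(toy_lengths))
```

With the toy input the two longest scaffolds (6 + 5 = 11) already cover half of the 20 total bases, so the function returns 5.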

Polymorphism and divergence

We deeply resequenced one individual from each of ten Aquilegia species (Figure 1 and Figure 1—figure supplement 1). Sequences were aligned to the A. coerulea 'Goldsmith' v3.1 reference using bwa-mem (Li and Durbin, 2009; Li, 2013) and variants were called using GATK HaplotypeCaller (McKenna et al., 2010). Genomic positions were conservatively filtered to identify the portion of the genome in which variants could be reliably called across all ten species (see Materials and methods for alignment, SNP calling, and genome filtration details). The resulting callable portion of the genome was heavily biased towards genes and included 57% of annotated coding regions (48% of gene models), but only 21% of the reference genome as a whole.

Using these callable sites, we calculated nucleotide diversity as the percentage of pairwise sequence differences in each individual. Assuming random mating, this metric reflects both individual sample heterozygosity and nucleotide diversity in the species as a whole. Of the ten individuals, most had a nucleotide diversity of 0.2–0.35% (Figure 2a), similar to previous estimates of nucleotide diversity in Aquilegia (Cooper et al., 2010), yet lower than that of a typical outcrossing species (Leffler et al., 2012). While likely partially attributable to enrichment for highly conserved genomic regions with our stringent filtration, this atypically low nucleotide diversity could also reflect inbreeding. Additionally, four individuals in our panel had extended stretches of near-homozygosity (defined as nucleotide diversity <0.1%) consistent with recent inbreeding (Figure 2—figure supplement 1). Aquilegia has no known self-incompatibility mechanism, and selfing does appear to be common. However, inbreeding in adult plants is generally low, suggesting substantial inbreeding depression (Montalvo, 1994; Herlihy and Eckert, 2002; Yang and Hodges, 2010).
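The per-individual diversity measure and the near-homozygosity threshold described above can be sketched as follows. This is an illustration under stated assumptions: the function names, the windowing scheme, and the toy counts are hypothetical, and only the 0.1% threshold comes from the text.

```python
def percent_pairwise_differences(n_het_sites, n_callable_sites):
    """Percentage of callable sites at which the two haplotypes of one
    diploid individual differ (individual heterozygosity)."""
    if n_callable_sites <= 0:
        raise ValueError("need at least one callable site")
    return 100.0 * n_het_sites / n_callable_sites


def near_homozygous_windows(window_counts, threshold_percent=0.1):
    """Flag windows whose diversity falls below the threshold.

    window_counts is a list of (n_het_sites, n_callable_sites) tuples,
    one per genomic window (the windowing itself is assumed here).
    """
    return [
        percent_pairwise_differences(n_het, n_callable) < threshold_percent
        for n_het, n_callable in window_counts
    ]


# Toy windows: 5 and 30 heterozygous sites out of 10,000 callable sites.
print(near_homozygous_windows([(5, 10000), (30, 10000)]))
```

In the toy input the first window (0.05%) falls below the threshold and the second (0.3%) does not.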

Figure 2 (with 2 supplements)

Polymorphism and divergence in *Aquilegia*.

(a) The percentage of pairwise differences within each species (estimated from individual heterozygosity) and between species (divergence). $F_{ST}$ values between geographic regions are given on the lower half of the pairwise differences heatmap. Both heatmap axes are ordered according to the neighbor joining tree to the left. This tree was constructed from a concatenated data set of reliably-called genomic positions. (b) Polymorphism within each sample by chromosome. Per-chromosome values are indicated by the chromosome number.

https://doi.org/10.7554/eLife.36426.004

We next considered nucleotide diversity between individuals as a measure of species divergence. Species divergence within a geographic region (0.38–0.86%) was often only slightly higher than within-species diversity, implying extensive variant sharing, while divergence between species from different geographic regions was markedly higher (0.81–0.97%; Figure 2a). $F_{ST}$ between geographic regions (0.245–0.271) was similar to that between outcrossing species of the Arabidopsis genus (Novikova et al., 2016), yet lower than between most vervet species pairs (Svardal et al., 2017), and higher than between cichlid groups in Malawi (Loh et al., 2013) or human ethnic groups (McVean et al., 2012). The topologies of trees constructed with concatenated genome data (neighbor joining [Figure 2a] and RAxML [Figure 2—figure supplement 2a]) were in broad agreement with previous Aquilegia phylogenies (Hodges and Arnold, 1994a; Ro and McPheron, 1997; Whittall and Hodges, 2007; Bastida et al., 2010; Fior et al., 2013), with one exception: while A. oxysepala is sister to all other Aquilegia species in our analysis, it had been placed within the large Eurasian clade with moderate to strong support in previous studies (Bastida et al., 2010; Fior et al., 2013).

Surprisingly, levels of polymorphism were generally strikingly higher on chromosome four (Figure 2b). Exceptions were apparently due to inbreeding, especially in the case of the A. aurea individual, which appears to be almost completely homozygous (Figure 2a and Figure 2—figure supplement 1). The increased polymorphism on chromosome four is only partly reflected in increased divergence to an outgroup species (Semiaquilegia adoxoides), suggesting that it represents deeper coalescence times rather than simply a higher mutation rate (mean ratio chromosome four/genome at fourfold degenerate sites: polymorphism = 2.258, divergence = 1.201, Supplementary file 6).

Discordance between gene and species trees

To assess discordance between gene and species (genome) trees, we constructed a cloudogram of trees drawn from 100 kb windows across the genome (Figure 3a). Fewer than 1% of these window-based trees were topologically identical to the species tree. North American species were consistently separated from all others (96% of window trees) and European species were also clearly delineated (67% of window trees). However, three bifurcations delineating Asian species were much less common: the A. japonica and A. sibirica sister relationship (45% of window trees), A. oxysepala as sister to all other species (30% of window trees), and the split demarcating the Eurasian radiation (31% of window trees). These results demonstrate a marked discordance of gene and species trees throughout both Aquilegia radiations.
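The percentages above are frequencies of a given subtree among window trees. A minimal sketch of that bookkeeping follows, assuming each window tree has already been rooted and is represented as nested tuples of taxon names; the representation, the function names, and the toy taxa are illustrative assumptions rather than the method used in the paper.

```python
def clades(tree):
    """Return the set of clades (frozensets of leaf names) found in a
    rooted tree written as nested tuples, e.g. (("A", "B"), "C")."""
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    leaves(tree)
    return found


def clade_prevalence(window_trees, clade):
    """Fraction of window trees that contain the given clade."""
    clade = frozenset(clade)
    hits = sum(clade in clades(tree) for tree in window_trees)
    return hits / len(window_trees)


# Four toy window trees on hypothetical taxa A, B and C.
toy_trees = [
    (("A", "B"), "C"),
    (("A", "C"), "B"),
    (("A", "B"), "C"),
    ("A", ("B", "C")),
]
print(clade_prevalence(toy_trees, {"A", "B"}))
```

Two of the four toy trees group A with B, so the prevalence is 0.5.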

Figure 3 (with 3 supplements)

Discordance between gene and species trees.

(a) Cloudogram of neighbor joining (NJ) trees constructed in 100 kb windows across the genome. The topology of each window-based tree is co-plotted in grey and the whole genome NJ tree shown in Figure 2a is superimposed in black. Blue numbers indicate the percentage of window trees that contain each of the subtrees observed in the whole genome tree. (b) Genome NJ tree topology. Blue letters a-c on the tree denote subtrees a-c in panel (d). (c) Chromosome four NJ tree topology. Blue letters d and e on the tree denote subtrees d and e in panel (d). (d) Prevalence of each subtree that varied significantly by chromosome. Genomic (black bar) and per chromosome (chromosome number) values are given.

https://doi.org/10.7554/eLife.36426.007

The gene tree analysis also highlighted the unique evolutionary history of chromosome four. Of 217 unique subtrees observed in gene trees, nine varied significantly in frequency between chromosomes (chi-square test p-value < 0.05 after Bonferroni correction; Figure 3b–d and Figure 3—figure supplements 1 and 2). Trees describing a sister species relationship between A. pubescens and A. barnebyi were more common on chromosome one, but chromosome four stood out with respect to eight other relationships, most of them related to A. oxysepala (Figure 3d). Although A. oxysepala was sister to all other species in our genome tree, the topology of the chromosome four tree was consistent with previously-published phylogenies in that it placed A. oxysepala within the Eurasian clade (Bastida et al., 2010; Fior et al., 2013) (Figure 2—figure supplement 2b,c). Subtree prevalences were in accordance with this topological variation (Figure 3b–d). The subtree delineating all North American species was also less frequent on chromosome four, indicating that the history of the chromosome is discordant in both radiations. We detected no patterns in the prevalence of any chromosome-discordant subtree that would suggest structural variation or a large introgression (Figure 3—figure supplement 3).
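A chi-square test of subtree frequency across chromosomes, followed by a Bonferroni threshold, can be sketched as below. The exact contingency layout used in the paper is not given in this section, so the 2 x k table (subtree present or absent per chromosome), the function names, and the toy counts are assumptions. The p-value uses the closed form of the chi-square upper tail that holds for even degrees of freedom, which covers seven chromosomes (df = 6).

```python
import math


def chi2_survival_even_df(stat, df):
    """Upper-tail probability of a chi-square variable.

    Closed form exp(-x/2) * sum_{i<df/2} (x/2)^i / i!, valid for even df only.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive, even df")
    half = stat / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def subtree_frequency_test(present, totals):
    """Chi-square test of homogeneity for one subtree.

    present[i] = window trees on chromosome i containing the subtree,
    totals[i]  = all window trees on chromosome i.
    """
    overall = sum(present) / sum(totals)
    if not 0.0 < overall < 1.0:
        raise ValueError("subtree must be present in some but not all windows")
    stat = 0.0
    for p, n in zip(present, totals):
        for observed, expected in ((p, n * overall), (n - p, n * (1 - overall))):
            stat += (observed - expected) ** 2 / expected
    return stat, chi2_survival_even_df(stat, len(totals) - 1)


def bonferroni_significant(p_value, n_tests, alpha=0.05):
    """True if p_value passes a Bonferroni-corrected threshold."""
    return p_value < alpha / n_tests


# Toy counts for seven chromosomes; the fourth deviates from the rest.
stat, p = subtree_frequency_test([50, 50, 50, 10, 50, 50, 50], [100] * 7)
print(bonferroni_significant(p, n_tests=217))
```

Here `n_tests=217` mirrors the number of unique subtrees reported in the text; the counts themselves are invented for illustration.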

Polymorphism sharing across the genus

We next polarized variants against an outgroup species (S. adoxoides) to explore the prevalence and depth of polymorphism sharing. Private derived variants accounted for only 21–25% of polymorphic sites in North American species and 36–47% of variants in Eurasian species (Figure 4a). The depth of polymorphism sharing reflected the two geographically-distinct radiations. North American species shared 34–38% of their derived variants within North America, while variants in European and Asian species were commonly shared across two geographic regions (18–22% of polymorphisms, predominantly shared between Europe and Asia; Figure 4b,c; Figure 4—figure supplement 1). Strikingly, a large percentage of derived variants occurred in all three geographic regions (22–32% of polymorphisms, Figure 4d), demonstrating that polymorphism sharing in Aquilegia is extensive and deep.
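The four sharing depths used above (private, within region, two regions, three regions) amount to a simple classification of each derived variant by the set of species that carry it. A sketch under stated assumptions: the species labels and the region map are hypothetical placeholders, and the function name is illustrative.

```python
def sharing_depth(focal, carriers, region_of):
    """Classify a derived variant from the point of view of a focal species.

    carriers  : species in which the derived allele was observed
    region_of : dict mapping each species to its geographic region
    """
    carriers = set(carriers)
    if focal not in carriers:
        raise ValueError("focal species must carry the derived allele")
    if carriers == {focal}:
        return "private"
    n_regions = len({region_of[species] for species in carriers})
    if n_regions == 1:
        return "within region"
    if n_regions == 2:
        return "two regions"
    return "three regions"


# Hypothetical species labels and region assignments (placeholders only).
toy_regions = {
    "na_1": "North America", "na_2": "North America",
    "eu_1": "Europe", "as_1": "Asia",
}
print(sharing_depth("na_1", ["na_1", "na_2"], toy_regions))
```

Tallying these labels over all derived variants in a species, genome-wide or per chromosome, would give proportions of the kind plotted in Figure 4.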

Figure 4 (with 1 supplement)

Sharing patterns of derived polymorphisms.

Proportion of derived variants (a) private to an individual species, (b) shared within the geographic region of origin, (c) shared across two geographic regions, and (d) shared across all three geographic regions. Genomic (black bar) and chromosome (chromosome number) values, for all 10 species.

https://doi.org/10.7554/eLife.36426.011

In all species examined, the proportion of deeply shared variants was higher on chromosome four (Figure 4d), largely due to a reduction in private variants, although sharing at other depths was also reduced in some species. Variant sharing on chromosome four within Asia was higher in both A. oxysepala and A. japonica (Figure 4b), primarily reflecting higher variant sharing between these species (Figure 6a).

Evidence of gene flow

Consider three species, H1, H2, and H3. If H1 and H2 are sister species relative to H3, then, in the absence of gene flow, H3 must be equally related to H1 and H2. The D statistic (Green et al., 2010; Durand et al., 2011) tests this hypothesis by comparing the number of derived variants that H3 shares with H1 to the number that it shares with H2. A non-zero D statistic reflects an asymmetric pattern of allele sharing, implying gene flow between H3 and one of the two sister species, that is, that speciation was not accompanied by complete reproductive isolation. If Aquilegia diversification occurred via a series of bifurcating species splits characterized by reproductive isolation, bifurcations in the species tree should represent combinations of sister and outgroup species with symmetric allele sharing patterns (D = 0). Given the high discordance of gene and species trees at the individual species level, we focused on testing a simplified tree topology based on the three groups whose bifurcation order seemed clear: (1) North American species, (2) European species, and (3) Asian species not including A. oxysepala. In all tests, S. adoxoides was used to determine the ancestral state of alleles.
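The D statistic described here can be sketched with the frequency-based ABBA-BABA form of Durand et al. (2011), under the assumption that the outgroup carries the ancestral allele at every site considered. The function name and the toy sites are illustrative, and the significance testing used in the paper is not reproduced.

```python
def d_statistic(sites):
    """ABBA-BABA D statistic from derived-allele frequencies.

    sites is an iterable of (p1, p2, p3) tuples giving the derived-allele
    frequency in H1, H2 and H3; the outgroup is assumed to be fixed for
    the ancestral allele. Positive D means H3 shares more derived alleles
    with H2 than with H1.
    """
    abba = 0.0
    baba = 0.0
    for p1, p2, p3 in sites:
        abba += (1 - p1) * p2 * p3
        baba += p1 * (1 - p2) * p3
    if abba + baba == 0:
        raise ValueError("no informative sites")
    return (abba - baba) / (abba + baba)


# Toy data: two ABBA sites and one BABA site.
print(d_statistic([(0, 1, 1), (1, 0, 1), (0, 1, 1)]))
```

With two ABBA sites and one BABA site the statistic is (2 - 1) / (2 + 1) = 1/3; equal counts give D = 0, the expectation without gene flow.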

We first tested each North American species as H3 against all combinations of European and Asian (without A. oxysepala) species as H1 and H2 (Figure 5a–c). As predicted, the North American split was closest to resembling speciation with strict reproductive isolation, with little asymmetry in allele sharing between North American and Asian species and low, but significant, asymmetry between North American and European species (Figure 5b). Next, we considered allele sharing between European and Asian (without A. oxysepala) species (Figure 5d,e). Here we found non-zero D statistics for all species combinations. Interestingly, the patterns of asymmetry between these two regions were reticulate: Asian species shared more variants with the European A. vulgaris while European species shared more derived alleles with the Asian A. sibirica. D statistics therefore demonstrate widespread asymmetry in variant sharing between Aquilegia species, suggesting that speciation processes throughout the genus were not characterized by strict reproductive isolation.

Figure 5

D statistics demonstrate gene flow during *Aquilegia* speciation.

D statistics for tests with (a–c) all North American species, (d) both European species, (e) Asian species other than *A. oxysepala*, and (f) *A. oxysepala* as H3 species. All tests use *S. adoxoides* as the outgroup. D statistics outside the green shaded areas are significantly different from zero. In (a–e), each individual dot represents the D statistic for a test done with a unique species combination. In (f), D statistics are presented by chromosome (chromosome number) or by the genome-wide value (black bar). In all panels, E = European and A = Asian without *A. oxysepala*. In some cases, individual species names are given when the geographical region designation consists of a single species. Right hand panels are a graphical representation of the D statistic tests in the corresponding left hand panels. Trees are a simplified version of the genome tree topology (Figure 2b), in which the bold subtree(s) represent the bifurcation considered in each set of tests. H3 species are noted in blue while the H1 and H2 species are specified in black. (Figure 5—source data 1).

https://doi.org/10.7554/eLife.36426.013

Figure 5—source data 1 (D statistics): https://doi.org/10.7554/eLife.36426.014

Although non-zero D statistics are usually interpreted as being due to gene flow in the form of admixture between species, they can also result from gene flow between incipient species. Either way, speciation precedes reproductive isolation. The possibility that different levels of purifying selection in H1 or H2 explain the observed D statistics can probably be ruled out, since D statistics do not differ when calculated with only fourfold degenerate sites (p-value < 2.2 x 10^-16, adjusted R² = 0.9942, Figure 5—source data 1). Non-zero D statistics could also indicate that the bifurcation order tested was incorrect, but even tests based on alternative tree topologies resulted in few D statistics that equal zero (Figure 5—source data 1). Therefore, the non-zero D statistics observed in Aquilegia most likely reflect a pattern of reticulate evolution throughout the genus.

Since variant sharing between A. oxysepala and A. japonica was higher on chromosome four (Figure 6a), and hybridization between these species has been reported (Li et al., 2014), we wondered whether gene flow could explain the discordant placement of A. oxysepala between chromosome four and genome trees (Figure 3b,c). Indeed, when the genome tree was taken as the bifurcation order, D statistics were elevated between these species (Figure 5f). A relatively simple coalescent model allowing for bidirectional gene flow between A. oxysepala and A. japonica (Figure 6b) demonstrated that doubling the population size (N) to reflect chromosome four’s polymorphism level (i.e. halving the coalescence rate) could indeed shift tree topology proportions (Figure 6c, row 2). However, recreating the observed allele sharing ratios on chromosome four (Figure 6a) required some combination of increased migration (m) and/or N (Figure 6c, rows 3–4). It is plausible that gene flow might differentially affect chromosome four, and we will return to this topic in the next section. Although the similarity of the D statistic across chromosomes (Figure 5f) might seem inconsistent with increased migration on chromosome four, the D statistic reaches a plateau in our simulations such that many different combinations of m and N produce similar D values (Figure 6c and Figure 6—figure supplement 1). Therefore, an increase in migration rate and deeper coalescence can explain the tree topology of chromosome four, a result that might explain inconsistencies in A. oxysepala placement in previous phylogenetic studies (Bastida et al., 2010; Fior et al., 2013).
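The effect of population size alone on topology proportions can be illustrated with the textbook three-species result for incomplete lineage sorting without gene flow. This is a deliberate simplification and not the model of Figure 6b, which also includes migration: with an internal branch of length T in coalescent units, each of the two discordant topologies has probability exp(-T)/3. The function name and the example branch lengths are assumptions.

```python
import math


def topology_probabilities(internal_branch):
    """Gene tree topology probabilities for three species, no gene flow.

    internal_branch is the length of the internal species-tree branch in
    coalescent units, so doubling N halves this value for a fixed number
    of generations.
    """
    if internal_branch < 0:
        raise ValueError("branch length must be non-negative")
    discordant = math.exp(-internal_branch) / 3.0
    return {
        "concordant": 1.0 - 2.0 * discordant,
        "discordant_1": discordant,
        "discordant_2": discordant,
    }


# Doubling N halves the branch length measured in coalescent units.
for branch in (1.0, 0.5):
    print(branch, round(topology_probabilities(branch)["concordant"], 3))
```

In this toy setting, halving the branch length lowers the concordant fraction from about 0.755 to about 0.596, which is the direction of the shift described for a doubled N.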

Figure 6 (with 2 supplements)

The effect of differences in coalescence time and gene flow on tree topologies.

(a) The observed proportion of informative derived variants supporting each possible Asian tree topology genome-wide and on chromosome four. Species considered include *A. oxysepala* (oxy), *A. japonica* (japon), and *A. sibirica* (sib). (b) The coalescent model with bidirectional gene flow in which *A. oxysepala* diverges first at time t2, but later hybridizes with *A. japonica* between t = 0 and t1 at a rate determined by per-generation migration rate, m. The population size (N) remains constant at all times. (c) The proportion of each tree topology and estimated D statistic for simulations using four combinations of m and N values (t1 = 1 in units of N generations). The combination presented in the first row (m = 2 x 10^-5 and N = 11667) generates tree topology proportions that match observed allele sharing proportions genome-wide. Simulations with increased m and/or N (rows 3–4) result in proportions which more closely resemble those observed for chromosome four. Colors in proportion plots refer to tree topologies in (a), with black bars representing the residual probability of seeing no coalescence event. While this simulation assumes symmetric gene flow, similar results were seen for models incorporating both unidirectional and asymmetric gene flow (Figure 6—figure supplements 1 and 2).

https://doi.org/10.7554/eLife.36426.015

The pattern of polymorphism on chromosome four

In most of the sequenced Aquilegia species, the level of polymorphism on chromosome four is twice as high as in the rest of the genome (Figure 2b). This unique pattern could be: (1) an artifact of biases in polymorphism detection between chromosomes, (2) the result of a higher neutral mutation rate on chromosome four, or (3) the result of deeper coalescence times on chromosome four (allowing more time for polymorphism to accumulate).

While it is impossible to completely rule out phenomena such as cryptic copy number variants (CNV), for the pattern to be entirely attributable to artifacts would require that half of the polymorphism on chromosome four be spurious. This scenario is extremely unlikely given the robustness of the result to a variety of CNV detection methods (Supplementary file 7). Similarly, the pattern cannot wholly be explained by a higher neutral mutation rate. If this were the case, both divergence and polymorphism would be elevated to the same extent on chromosome four (Kimura, 1983). As noted above, this is not the case (Supplementary file 6). Thus the higher level of polymorphism on chromosome four must to some extent reflect differences in coalescence time, which can only be due to selection.
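The mutation-rate argument can be made concrete with the ratios reported earlier for fourfold degenerate sites (polymorphism = 2.258, divergence = 1.201). The sketch below assumes the simplest neutral reasoning, in which a higher mutation rate scales polymorphism and divergence by the same factor, so dividing one ratio by the other leaves the part of the excess that mutation rate cannot account for. The function name is illustrative and the calculation is a rough guide, not an estimate from the paper.

```python
def excess_beyond_mutation_rate(polymorphism_ratio, divergence_ratio):
    """Polymorphism excess left after removing the factor attributable to
    a shared change in mutation rate (chromosome / genome ratios)."""
    if divergence_ratio <= 0:
        raise ValueError("divergence ratio must be positive")
    return polymorphism_ratio / divergence_ratio


# Chromosome four / genome ratios at fourfold degenerate sites (from the text).
print(round(excess_beyond_mutation_rate(2.258, 1.201), 2))
```

Under these assumptions roughly a 1.88-fold excess of polymorphism remains after accounting for the divergence difference.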

Although it is clear that selection can have a dramatic effect on the history of a single locus, the chromosome-wide pattern we observe (Figure 2—figure supplement 1) is difficult to explain. Chromosome four recombines freely (Figure 7a), suggesting that polymorphism is not due to selection on a limited number of linked loci, such as might be observed if driven by an inversion or large supergene. Selection must thus be acting on a very large number of loci across the chromosome.

Figure 7 (with 2 supplements)

Recombination and selection on chromosome four.

(a) Physical vs. genetic distance for all chromosomes calculated in an *A. formosa* x *A. pubescens* mapping population. High nucleotide diversity on chromosome four was also observed in parental plants of this population (Figure 7—figure supplement 1). (b) Relationship between gene density (proportion exonic) and recombination rate (main effect p-value < 2 x 10^-16, chromosome four effect p-value < 2 x 10^-16, interaction p-value < 1.936 x 10^-11, adjusted R² = 0.8045). (c) Relationship between gene density and D statistic for *A. oxysepala* and *A. japonica* gene flow. (d) Relationship between gene density and mean neutral nucleotide diversity. (Figure 7—source data 1).

https://doi.org/10.7554/eLife.36426.018

Figure 7—source data 1 (Physical and genetic distance for A. formosa x A. pubescens markers): https://doi.org/10.7554/eLife.36426.021

Balancing selection is known to elevate polymorphism, and in a number of plant species, disease resistance (R) genes show signatures of balancing selection (Karasov et al., 2014). While such signatures have not yet been demonstrated in Aquilegia, chromosome four is enriched for the defense gene GO category, which encompasses R genes (Table 1). However, while significant, this enrichment involves a relatively small number of genes (less than 2% of genes on chromosome four) and is therefore unlikely to completely explain the polymorphism pattern (Nordborg and Innan, 2003).

Table 1

GO term enrichment on chromosome four

https://doi.org/10.7554/eLife.36426.022

GO	Corrected P-value	Observed on Chr_04	Expected on Chr_04	Percent of Chr_04 genes	GO term
0043531	$5.61 \times 10^{-79}$	140	9	7.57	ADP binding
0016705	$4.40 \times 10^{-48}$	179	39	9.68	Oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen
0004497	$7.19 \times 10^{-46}$	158	32	8.55	Monooxygenase activity
0005506	$2.73 \times 10^{-41}$	181	46	9.79	Iron ion binding
0020037	$2.57 \times 10^{-37}$	186	53	10.06	Heme binding
0010333	$1.72 \times 10^{-15}$	39	4	2.11	Terpene synthase activity
0016829	$2.08 \times 10^{-13}$	39	5	2.11	Lyase activity
0055114	$9.53 \times 10^{-10}$	247	149	13.36	Oxidation-reduction process
0016747	$6.66 \times 10^{-5}$	44	16	2.38	Transferase activity, transferring acyl groups other than amino-acyl groups
0000287	$1.23 \times 10^{-4}$	42	15	2.27	Magnesium ion binding
0008152	$2.56 \times 10^{-4}$	137	83	7.41	Metabolic process
0006952	$3.60 \times 10^{-4}$	32	10	1.73	Defense response
0004674	$4.52 \times 10^{-4}$	23	5	1.24	Protein serine/threonine kinase activity
0016758	$1.35 \times 10^{-3}$	44	18	2.38	Transferase activity, transferring hexosyl groups
0005622	$4.14 \times 10^{-3}$	14	42	0.76	Intracellular
0008146	$2.68 \times 10^{-2}$	9	1	0.49	Sulfotransferase activity
0016760	$3.72 \times 10^{-2}$	12	2	0.65	Cellulose synthase (UDP-forming) activity

Another potential explanation is reduced purifying selection. In fact, several characteristics of chromosome four suggest that it could experience less purifying selection than the rest of the genome. Gene density is markedly lower (Table 2 and Figure 7b), it harbors a higher proportion of repetitive sites (Table 2), and is enriched for many transposon families, including Copia and Gypsy elements (Supplementary file 8). Additionally, a higher proportion of genes on chromosome four were either not expressed or expressed at a low level (Figure 7—figure supplement 2). Gene models on the chromosome were also more likely to contain variants that could disrupt protein function (Table 2). Taken together, these observations suggest less purifying selection on chromosome four.

Table 2

Content of the A. coerulea v3.1 reference by chromosome

https://doi.org/10.7554/eLife.36426.023

	Chr_01	Chr_02	Chr_03	Chr_04	Chr_05	Chr_06	Chr_07	Genome
Number of genes	5041	4390	4449	3149	4786	3292	4443	29550
Genes per Mb	112	102	104	69	107	108	102	100
Mean gene length (bp)	3629	3641	3689	3020	3712	3620	3708	3580
Percent repetitive	38.9	41.1	39.1	54.2	39.4	39.3	40.6	42.0
Percent genes with HIGH effect variant	25.3	23.8	23.6	32.3	24.1	22.1	23.6	24.7
Percent GC	36.8	37.0	36.9	37.0	37.1	36.8	36.8	37.0

Reduced purifying selection could also explain the putatively higher gene flow between A. oxysepala and A. japonica on chromosome four (Figure 6); the chromosome would be more permeable to gene flow if loci involved in the adaptive radiation were preferentially located on other chromosomes. Indeed, focusing on A. oxysepala/A. japonica gene flow, we found a negative relationship between introgression and gene density in the Aquilegia genome (Figure 7c, p-value = 2.202 x 10^-7, adjusted R² = 0.068), as would be expected if purifying selection limited introgression. Notably, this relationship is the same for chromosome four and the rest of the genome (p-value = 0.051), suggesting that gene flow on chromosome four is higher simply because the gene density is lower.

However, the picture is very different for nucleotide diversity. While there is a negative relationship between gene density and neutral nucleotide diversity genome-wide (p-value = 5.174 x 10^-6, adjusted R² = 0.052), more careful analysis reveals that chromosome four has a completely different distribution from the rest of the genome (Figure 7d, p-value < 2 x 10^-16). In both cases, there is a weak (statistically non-significant) negative relationship between gene density and nucleotide diversity (chromosome four: p-value = 0.0814, adjusted R² = 0.0303; rest of the genome: p-value = 0.315, adjusted R² = 3.373 x 10^-5), but nucleotide diversity is consistently much higher for chromosome four. Thus the genome-wide relationship reflects this systematic difference between chromosome four and the rest of the genome, and gene density differences alone are insufficient to explain higher polymorphism on chromosome four. Therefore, if reduced background selection explains higher polymorphism on this chromosome, something other than gene density must distinguish it from the rest of the genome. As noted above, there is reason to believe that purifying selection, in general, is lower on this chromosome.

For comparison with data from other organisms, we performed the partial correlation analysis of Corbett-Detig et al. (2015). Here we found a significant relationship between neutral diversity and recombination rate (without chromosome four, Kendall’s tau = 0.222, p-value = 3.804 x 10^-6), putting Aquilegia on the higher end of estimates of the strength of linked selection in herbaceous plants.
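A partial rank correlation of this kind can be sketched as follows. The sketch assumes the standard partial Kendall's tau formula with a single control variable and uses the simple tau-a estimator without tie correction. The choice of covariate (for example gene density), the function names, and the toy vectors are assumptions, and the exact procedure of Corbett-Detig et al. (2015) is not reproduced.

```python
import math
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def partial_kendall_tau(x, y, z):
    """Kendall's tau between x and y, controlling for z."""
    t_xy = kendall_tau(x, y)
    t_xz = kendall_tau(x, z)
    t_yz = kendall_tau(y, z)
    denominator = math.sqrt((1 - t_xz ** 2) * (1 - t_yz ** 2))
    if denominator == 0:
        raise ValueError("control variable is perfectly correlated with x or y")
    return (t_xy - t_xz * t_yz) / denominator


# Toy vectors standing in for diversity (x), recombination rate (y)
# and a hypothetical covariate such as gene density (z).
print(round(partial_kendall_tau([1, 2, 3, 4], [1, 3, 2, 4], [2, 1, 4, 3]), 4))
```

For the toy vectors tau(x, y) = 2/3, tau(x, z) = 1/3 and tau(y, z) = 0, so the partial correlation is (2/3) / sqrt(8/9), about 0.7071.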

While selection during the Aquilegia radiation contributes to the pattern of polymorphism on chromosome four, the pattern itself predates the radiation. Divergence between Aquilegia and Semiaquilegia is higher on chromosome four (2.77% on chromosome four, 2.48% genome-wide, Table 3), as is nucleotide diversity within Semiaquilegia (0.16% chromosome four, 0.08% genome-wide, Table 3). This suggests that the variant evolutionary history of chromosome four began before the Aquilegia/Semiaquilegia split.

Table 3

Population genetics parameters for Semiaquilegia by chromosome

https://doi.org/10.7554/eLife.36426.024

Percent pairwise differences, by chromosome (1 to 7) and genome-wide

	Chr 1	Chr 2	Chr 3	Chr 4	Chr 5	Chr 6	Chr 7	Genome
Polymorphism within Semiaquilegia	0.079	0.085	0.081	0.162	0.076	0.078	0.071	0.082
Divergence between Aquilegia and Semiaquilegia	2.46	2.47	2.47	2.77	2.48	2.47	2.47	2.48
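To make the size of the chromosome four excess in Table 3 explicit, the short calculation below compares the chromosome four value with the mean of the other six chromosomes. The use of an unweighted mean is an assumption made here for simplicity (the genome-wide column in the table is presumably weighted by sequence length), and the function name is hypothetical.

```python
# Values copied from Table 3 (percent pairwise differences).
polymorphism = {"1": 0.079, "2": 0.085, "3": 0.081, "4": 0.162,
                "5": 0.076, "6": 0.078, "7": 0.071}
divergence = {"1": 2.46, "2": 2.47, "3": 2.47, "4": 2.77,
              "5": 2.48, "6": 2.47, "7": 2.47}

def chr4_ratio(per_chromosome):
    """Chromosome four value divided by the unweighted mean of the others."""
    others = [value for chrom, value in per_chromosome.items() if chrom != "4"]
    return per_chromosome["4"] / (sum(others) / len(others))

polymorphism_ratio = chr4_ratio(polymorphism)
divergence_ratio = chr4_ratio(divergence)

print(f"polymorphism, chr 4 vs. others: {polymorphism_ratio:.2f}x")
print(f"divergence, chr 4 vs. others:   {divergence_ratio:.2f}x")
```

Under this simplification, polymorphism within Semiaquilegia is roughly twice as high on chromosome four as on the other chromosomes, whereas divergence from Aquilegia is elevated by only about 12%.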

The 35S and 5S rDNA loci are uniquely localized to chromosome four

The observation that one Aquilegia chromosome is different from the others is not novel; previous cytological work described a single nucleolar chromosome that appeared to be highly heterochromatic (Linnert, 1961). Using fluorescence in situ hybridization (FISH) with rDNA and chromosome four-specific bulked oligo probes (Han et al., 2015), we confirmed that both the 35S and 5S rDNA loci were localized uniquely to chromosome four in two Aquilegia species and S. adoxoides (Figure 8). The chromosome contained a single large 35S repeat cluster proximal to the centromeric region in all three species. Interestingly, the 35S locus in A. formosa was larger than that of the other two species and formed variable bubbles and fold-backs on extended pachytene chromosomes similar to structures previously observed in Aquilegia hybrids (Linnert, 1961) (Figure 8, last panels). The 5S rDNA locus was also proximal to the centromere on chromosome four, although slight differences in the number and position of the 5S repeats between species highlight the dynamic nature of this gene cluster. In contrast to the earlier cytological description, no chromosome appeared to be more heterochromatic than the others in our analyses (Figure 8); FISH with 5-methylcytosine antibody showed no evidence for hypermethylation on chromosome four (Figure 8—figure supplement 1) and GC content was similar for all chromosomes (Table 2). Nevertheless, similarities in chromosome four organization across all three species reinforce the idea that the exceptionality of this chromosome predated the Aquilegia/Semiaquilegia split and raise the possibility that rDNA clusters could have played a role in the variant evolutionary history of chromosome four.

Figure 8 (with 1 supplement)

Cytogenetic characterization of chromosome four in Semiaquilegia and Aquilegia species.

Pachytene chromosome spreads were hybridized with probes corresponding to oligoCh4 (red), 35S rDNA (yellow), 5S rDNA (green) and two (peri)centromeric tandem repeats (pink). Chromosomes were counterstained with DAPI. Scale bars = 10 μm.

https://doi.org/10.7554/eLife.36426.025

Discussion

We constructed a reference genome for the horticultural cultivar Aquilegia coerulea ‘Goldsmith’ and resequenced ten Aquilegia species with the goal of understanding the genomics of ecological speciation in this rapidly diversifying lineage. Although our reference genome size is smaller than previous estimates (~300 Mb versus ~500 Mb; Bennett et al., 1982; Bennett and Leitch, 2011), the completeness and accuracy of our assembly (Supplementary file 4), as well as consistency between reference and k-mer based estimates of genome size (Supplementary file 9), suggest that this difference is likely due to highly repetitive content, including the large rDNA loci on chromosome four.

Variant sharing across the Aquilegia genus is widespread and deep, even across exceptionally large geographical distances. Although much of this sharing is presumably due to stochastic processes, as expected given the rapid time-scale of speciation, asymmetry of allele sharing demonstrates that the process of speciation has been reticulate throughout the genus, and that gene flow has been a common feature. Aquilegia species diversity therefore appears to be an example of ecological speciation, rather than being driven by the development of intrinsic barriers to gene flow (Coyne et al., 2004; Schluter and Conte, 2009; Seehausen et al., 2014). In the future, studies incorporating more taxa and/or population-level variation will provide additional insight into the dynamics of this process. Given the extent of variant sharing, it will also be interesting to explore the role of standing variation and admixture in adaptation throughout the genus.

Our analysis also led to the remarkable discovery that the evolutionary history of an entire chromosome differed from that of the rest of the genome. The average level of polymorphism on chromosome four is roughly twice that of the rest of the genome and gene trees on this chromosome appear to reflect a different species relationship (Figure 3). To the best of our knowledge, with the possible exception of sex chromosomes (Toups and Hahn, 2010; Nam et al., 2015), such chromosome-wide patterns have never been observed before (although recombination has been shown to affect hybridization; see Schumer et al., 2018). Importantly, this chromosome is large and appears to be freely recombining, implying that these differences are unlikely to be due to a single evolutionary event, but rather reflect the accumulated effects of evolutionary forces acting differentially on the chromosome.

While no single explanation for the elevated polymorphism on chromosome four has emerged, selection clearly plays a role. Our results demonstrate that chromosome four could be affected by balancing selection as well as by reduced purifying and/or background selection. Future work will focus on clarifying the role and importance of each of these types of selection, and determining whether the rapid adaptive radiation in Aquilegia has played a role in accelerating the differences between chromosome four and the rest of the genome.

The chromosome four patterns appear to predate the Aquilegia adaptive radiation, however, extending at least into the genus Semiaquilegia. Differences in gene content may thus be a proximal explanation for the higher polymorphism levels on chromosome four, but we still lack an explanation for why these differences would have been established on chromosome four in the first place. One possibility is that chromosome four is a reverted sex chromosome, a phenomenon that has been observed in Drosophila (Vicoso and Bachtrog, 2013). Although species with separate sexes exist in the Ranunculaceae, these transitions seem to be recent (Soza et al., 2012), and all Aquilegia and Semiaquilegia species are hermaphroditic. Furthermore, no heteromorphic sex chromosomes have been observed in the Ranunculales (Westergaard, 1958; Ming et al., 2011), making this an unlikely hypothesis. It has also been suggested that chromosome four is a fusion of two homeologous chromosomes (Linnert, 1961), as could result from the ancestral whole genome duplication (Cui et al., 2006; Vanneste et al., 2014; Tiley et al., 2016). However, analysis of synteny blocks shows that this is not the case (Aköz and Nordborg, 2018).

B chromosomes also have evolutionary histories that differ from those of other chromosomes. Like chromosome four, B chromosomes accumulate repetitive sequences and frequently contain rDNA loci (Jones, 1995; Green, 1990; Valente et al., 2017). However, chromosome four does not appear to be supernumerary, and unlike B chromosomes, which seem to have only a few loci, chromosome four contains thousands of coding sequences (Table 2). Again, while it is impossible to rule out the hypothesis that chromosome four has been impacted by the reincorporation of B chromosomes into the A genome, this would be a novel phenomenon.

It is tempting to speculate that the distinct evolutionary history of chromosome four is connected to its large rDNA repeat clusters. Although rDNA clusters in Aquilegia and Semiaquilegia are consistently found on chromosome four, cytology demonstrates that the exact location of these loci is dynamic. Could the movement of these components somehow contribute to an accumulation of structural variants, copy number variants, and repeats that make chromosome four an inhospitable and unreliable place to harbor critical coding sequences? If so, then forces of genome evolution could underlie the more proximal causes (lower gene content and reduced selection) of increased polymorphism on chromosome four.

rDNA clusters could also have played a role in initiating chromosome four’s different evolutionary history. Cytological (Langlet, 1927; Langlet, 1932) and phylogenetic (Ro et al., 1997; Wang et al., 2009; Cossard et al., 2016) work separates the Ranunculaceae into two main subfamilies marked by different base chromosome numbers: the Thalictroideae (T-type, base n = 7, including Aquilegia and Semiaquilegia) and the Ranunculoideae (R-type, predominantly base n = 8). In the three T-type species tested here, the 35S locus is proximal to the centromere, a localization seen for only 3.5% of 35S sites reported in higher plants (Roa and Guerra, 2012). In contrast, all R-type species examined have terminal or subterminal 45S loci (Hizume et al., 2013; Mlinarec et al., 2006; Weiss-Schneeweiss et al., 2007; Liao et al., 2008). Given that 35S repeats can be fragile sites (Huang et al., 2008) and 35S rDNA clusters and rearrangement breakpoints co-localize (Cazaux et al., 2011), a 35S-mediated chromosomal break could explain differences in base chromosome number between R-type and T-type species. If the variant history of chromosome four can be linked to this R- vs T-type split, this could implicate chromosome evolution as the initiator of chromosome four’s variant history. Comparative genomics work within the Ranunculaceae will therefore be useful for understanding the role that rDNA repeats have played in chromosome evolution and could provide additional insight into how rDNA could have contributed to chromosome four’s variant evolutionary history.

In conclusion, the Aquilegia genus is a beautiful example of adaptive radiation through ecological speciation. Although our current genome analyses are based on a limited number of individuals and species, we see evidence that the radiation was shaped by introgression, selection, and the presence of abundant standing variation. Ongoing work focuses on understanding the contributions of each of these factors to adaptation in Aquilegia using population and quantitative genetics. Additionally, the unexpected variant evolutionary history of chromosome four, while still a mystery, illustrates that standard population genetics models are not always sufficient to explain the pattern of variation across the genome. Future studies of chromosome four have the potential to increase our understanding of how genome evolution, chromosome evolution, and population genetics interact to generate organismal diversity.

Author details

Danièle L Filiault
Evangeline S Ballerini
Terezie Mandáková
Gökçe Aköz
Nathan J Derieg
Jeremy Schmutz
Jerry Jenkins
Jane Grimwood
Shengqiang Shu
Richard D Hayes
Uffe Hellsten
Kerrie Barry
Juying Yan
Sirma Mihaltcheva
Miroslava Karafiátová
Viktoria Nizhynska
Elena M Kramer
Martin A Lysak
Scott A Hodges (for correspondence)
Magnus Nordborg (for correspondence)