The Aquilegia genome provides insight into adaptive radiation and reveals an extraordinarily polymorphic chromosome with a unique history

  1. Danièle L Filiault
  2. Evangeline S Ballerini
  3. Terezie Mandáková
  4. Gökçe Aköz
  5. Nathan J Derieg
  6. Jeremy Schmutz
  7. Jerry Jenkins
  8. Jane Grimwood
  9. Shengqiang Shu
  10. Richard D Hayes
  11. Uffe Hellsten
  12. Kerrie Barry
  13. Juying Yan
  14. Sirma Mihaltcheva
  15. Miroslava Karafiátová
  16. Viktoria Nizhynska
  17. Elena M Kramer
  18. Martin A Lysak
  19. Scott A Hodges  Is a corresponding author
  20. Magnus Nordborg  Is a corresponding author
  1. Vienna BioCenter, Austria
  2. University of California, United States
  3. Masaryk University, Czech Republic
  4. Vienna Graduate School of Population Genetics, Austria
  5. Joint Genome Institute, United States
  6. HudsonAlpha Institute of Biotechnology, United States
  7. Centre of the Region Haná for Biotechnological and Agricultural Research, Czech Republic
  8. Harvard University, United States
8 figures, 3 tables and 15 additional files


Figure 1 with 1 supplement
Distribution of Aquilegia species.

There are ~70 species in the genus Aquilegia, broadly distributed across temperate regions of the Northern Hemisphere (grey). The 10 Aquilegia species sequenced here were chosen as representatives spanning this geographic distribution as well as the diversity in ecological habitat and pollinator-influenced floral morphology of the genus. Semiaquilegia adoxoides, generally thought to be the sister taxon to Aquilegia (Fior et al., 2013), was also sequenced. A representative photo of each species is shown and is linked to its approximate distribution.
Figure 1—figure supplement 1
Origin of species samples used for sequencing.
Figure 2 with 2 supplements
Polymorphism and divergence in Aquilegia.

(a) The percentage of pairwise differences within each species (estimated from individual heterozygosity) and between species (divergence). FST values between geographic regions are given on the lower half of the pairwise differences heatmap. Both heatmap axes are ordered according to the neighbor joining tree to the left. This tree was constructed from a concatenated data set of reliably-called genomic positions. (b) Polymorphism within each sample by chromosome. Per-chromosome values are indicated by the chromosome number.
Figure 2—figure supplement 1
Polymorphism across the genome in all ten species samples.
Figure 2—figure supplement 2
Species and chromosome trees of Aquilegia.
Figure 3 with 3 supplements
Discordance between gene and species trees.

(a) Cloudogram of neighbor joining (NJ) trees constructed in 100 kb windows across the genome. The topology of each window-based tree is co-plotted in grey and the whole genome NJ tree shown in Figure 2a is superimposed in black. Blue numbers indicate the percentage of window trees that contain each of the subtrees observed in the whole genome tree. (b) Genome NJ tree topology. Blue letters a-c on the tree denote subtrees a-c in panel (d). (c) Chromosome four NJ tree topology. Blue letters d and e on the tree denote subtrees d and e in panel (d). (d) Prevalence of each subtree that varied significantly by chromosome. Genomic (black bar) and per chromosome (chromosome number) values are given.
Figure 3—figure supplement 1
Proportion of significantly-varying subtrees by chromosome.
Figure 3—figure supplement 2
P-values of proportion tests by chromosome for significantly-different trees.
Figure 3—figure supplement 3
Subtree prevalence across chromosomes for the nine significantly-different subtrees.
Figure 4 with 1 supplement
Sharing patterns of derived polymorphisms.

Proportion of derived variants (a) private to an individual species, (b) shared within the geographic region of origin, (c) shared across two geographic regions, and (d) shared across all three geographic regions. Genomic (black bar) and chromosome (chromosome number) values, for all 10 species.
Figure 4—figure supplement 1
Sharing pattern percentages by pattern type.
D statistics demonstrate gene flow during Aquilegia speciation.

D statistics for tests with (a–c) all North American species, (d) both European species, (e) Asian species other than A. oxysepala, and (f) A. oxysepala as H3 species. All tests use S. adoxoides as the outgroup. D statistics outside the green shaded areas are significantly different from zero. In (a–e), each individual dot represents the D statistic for a test done with a unique species combination. In (f), D statistics are presented by chromosome (chromosome number) or by the genome-wide value (black bar). In all panels, E = European and A = Asian without A. oxysepala. In some cases, individual species names are given when the geographical region designation consists of a single species. Right hand panels are a graphical representation of the D statistic tests in the corresponding left hand panels. Trees are a simplified version of the genome tree topology (Figure 2b), in which the bold sub tree(s) represent the bifurcation considered in each set of tests. H3 species are noted in blue while the H1 and H2 species are specified in black. (Figure 5—source data 1).
Figure 6 with 2 supplements
The effect of differences in coalescence time and gene flow on tree topologies.

(a) The observed proportion of informative derived variants supporting each possible Asian tree topology genome-wide and on chromosome four. Species considered include A. oxysepala (oxy), A. japonica (japon), and A. sibirica (sib). (b) The coalescent model with bidirectional gene flow in which A. oxysepala diverges first at time t2, but later hybridizes with A. japonica between t = 0 and t1 at a rate determined by per-generation migration rate, m. The population size (N) remains constant at all times. (c) The proportion of each tree topology and estimated D statistic for simulations using four combinations of m and Nvalues (t1 = 1 in units of N generations). The combination presented in the first row (m = 2x10-5 and N = 11667) generates tree topology proportions that match observed allele sharing proportions genomewide. Simulations with increased m and/or N (rows 3–4) result in proportions which more closely resemble those observed for chromosome four. Colors in proportion plots refer to tree topologies in (a), with black bars representing the residual probability of seeing no coalescence event. While this simulation assumes symmetric gene flow, similar results were seen for models incorporating both unidirectional and asymmetric gene flow (Figure 6—figure supplements 1 and 2).
Figure 6—figure supplement 1
Model output for all three gene flow scenarios.
Figure 6—figure supplement 2
Tree topology proportions simulated under assymmetric and unidirectional models.
Figure 7 with 2 supplements
Recombination and selection on chromosome four (a) Physical vs.

genetic distance for all chromosomes calculated in an A. formosa x A. pubescens mapping population. High nucleotide diversity on chromosome four was also observed in parental plants of this population (Figure 7—figure supplement 1. (b) Relationship between gene density (proportion exonic) and recombination rate (main effect p-value < 2 x 10-16, chromosome four effect p-value < 2 x 10-16, interaction p-value < 1.936 x 10-11, adjusted R2 = 0.8045). (c) Relationship between gene density and D statistic for A. oxysepala and A. japonica gene flow. (d) Relationship between gene density and mean neutral nucleotide diversity. Figure 7—source data 1.
Figure 7—source data 1

(Physical and genetic distance for A.formosa x A.pubescens markers).
Figure 7—figure supplement 1
Polymorphism in the A. formosa x A. pubescens mapping population.
Figure 7—figure supplement 2
Distribution of gene expression values by chromosome.
Figure 8 with 1 supplement
Cytogenetic characterization of chromosome four in Semiaquilegia and Aquilegia species.

Pachytene chromosome spreads were probed with probes corresponding to oligoCh4 (red), 35S rDNA (yellow), 5S rDNA (green) and two (peri)centromeric tandem repeats (pink). Chromosomes were counterstained with DAPI. Scale bars = 10 μm.
Figure 8—figure supplement 1
Immunodetection of anti-5mC antibody.


Table 1
GO term enrichment on chromosome four
Number on Chr_04Percent of
Chr_04 genes
GO term


14097.57ADP binding


179399.68Oxidoreductase activity, actingon paired donors, withincorporation or reduction of
molecular oxygen


158328.55Monooxygenase activity


181469.79Iron ion binding


1865310.06Heme binding


3942.11Terpene synthase activity


3952.11Lyase activity


24714913.36Oxidation-reduction process


44162.38Transferase activity,transferring acyl groups other
than amino-acyl groups


42152.27Magnesium ion binding


137837.41Metabolic process


32101.73Defense response


2351.24Protein serine/threoninekinase activity


44182.38Transferase activity, transferringhexosyl groups




910.49Sulfotransferase activity


1220.65Cellulose synthase(UDP-forming) activity
Table 2
Content of the A. coerulea v3.1 reference by chromosome

Number of genes504143904449314947863292444329550
Genes per Mb11210210469107108102100
Mean gene length (bp)36293641368930203712362037083580
Percent repetitive38.941.
Percent genes withHIGH effect variant25.323.823.632.324.122.123.624.7
Percent GC36.837.036.937.037.136.836.837.0
Table 3
Population genetics parameters for Semiaquilegia by chromosome

Percent pairwise differences
Polymorphism within Semiaquilegia0.0790.0850.0810.1620.0760.0780.0710.082
Divergence between Aquilegia and Semiaquilegia2.462.472.472.772.482.472.472.48

Additional files

Supplementary file 1

Genomic libraries included in the A. coerulea genome assembly and their respective assembled sequence coverage levels in the A. coerulea v3.1 release
Supplementary file 2

Summary statistics of the output of the whole genome shotgun assembly prior to screening, removal of organelles and contaminating scaffolds and chromosome-scale pseudomolecule construction
Supplementary file 3

Final summary assembly statistics for chromosome-scale assembly
Supplementary file 4

Placement of the individual BAC clones and their contribution to the overall error rate
Supplementary file 5

RNAseq data sets used for gene annotation
Supplementary file 6

Ratio of polymorphism or divergence on each chromosome versus genome-wide for each species
Supplementary file 7

Robustness of nucleotide diversity patterns to copy number variant detection methods
Supplementary file 8

Repeat family prevalence and permutation results in the A. coerulea v3.1 genome release
Supplementary file 9

K-mer based estimates of genome size and repetitive sequence proportion
Supplementary file 10

Mean and median coverage by species
Supplementary file 11

Proportion of sites removed by each filter - initial filtration without Semiaquilegia
Supplementary file 12

Proportion of sites removed by each filter - final filtration with Semiaquilegia
Supplementary file 13

Number of derived variants by species
Supplementary file 14

Transition matrix for the Five-State Markov process
Supplementary file 15

Transition matrix for the Eight-State Markov process

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

  1. Danièle L Filiault
  2. Evangeline S Ballerini
  3. Terezie Mandáková
  4. Gökçe Aköz
  5. Nathan J Derieg
  6. Jeremy Schmutz
  7. Jerry Jenkins
  8. Jane Grimwood
  9. Shengqiang Shu
  10. Richard D Hayes
  11. Uffe Hellsten
  12. Kerrie Barry
  13. Juying Yan
  14. Sirma Mihaltcheva
  15. Miroslava Karafiátová
  16. Viktoria Nizhynska
  17. Elena M Kramer
  18. Martin A Lysak
  19. Scott A Hodges
  20. Magnus Nordborg
The Aquilegia genome provides insight into adaptive radiation and reveals an extraordinarily polymorphic chromosome with a unique history
eLife 7:e36426.