Genetic variation, environment and demography intersect to shape Arabidopsis defense metabolite variation across Europe

  1. Ella Katz
  2. Jia-Jie Li
  3. Benjamin Jaegle
  4. Haim Ashkenazy
  5. Shawn R Abrahams
  6. Clement Bagaza
  7. Samuel Holden
  8. Chris J Pires
  9. Ruthie Angelovici
  10. Daniel J Kliebenstein (corresponding author)

  1. Department of Plant Sciences, University of California, Davis, United States
  2. Gregor Mendel Institute, Austrian Academy of Sciences, Vienna Biocenter (VBC), Austria
  3. Department of Molecular Biology, Max Planck Institute for Developmental Biology, Germany
  4. Division of Biological Sciences, Bond Life Sciences Center, University of Missouri, United States
  5. Division of Biological Sciences, Interdisciplinary Plant Group, Christopher S. Bond Life Sciences Center, University of Missouri, United States
  6. DynaMo Center of Excellence, University of Copenhagen, Denmark
6 figures, 2 tables and 3 additional files

Figures

Figure 1
Parallel and convergent evolution.

The schema describes our use of parallel (A) and convergent (B) evolution for within-species chemotypic variation. The letters in the blue box represent the state of the source/ancestral haplotypes. …

Figure 2
Aliphatic glucosinolate (GSL) biosynthesis pathway.

Short names and structures of the GSLs are in black. Genes encoding the causal enzyme for each reaction (arrow) are in gray. GS-OX is a gene family of five or more genes. OH-But: 2-OH-3-Butenyl.

Figure 3 with 3 supplements
Glucosinolate variation across Europe is dominated by two loci.

(A) The accessions are plotted on the map based on their collection site and colored based on their principal component (PC)1 score. (B) Manhattan plot of genome-wide association analyses using PC1. …

Figure 3—figure supplement 1
Glucosinolate (GSL)-based principal component (PC) analysis.

(A) Percentage of variance explained by each PC. (B, C) Contribution of the individual GSLs to PC1 (B) and PC2 (C). Red bars: contribution of four-carbon GSLs; blue bars: contribution of three …

Figure 3—figure supplement 2
Glucosinolate variation across Europe is dominated by two loci.

(A) The accessions were plotted on the map based on their collection site and colored based on their principal component (PC)2 score. (B) Manhattan plot of genome-wide association analyses using PC2. …

Figure 3—figure supplement 3
Manhattan plots of genome-wide association analyses performed using individual glucosinolate amounts as traits.

Horizontal lines represent 5% significance thresholds using Bonferroni (red) and Benjamini–Hochberg (blue).
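
The two 5% thresholds named in this caption can be illustrated with a minimal sketch. It assumes a plain list of per-SNP p values; the function names and the toy p values are hypothetical and are not taken from the study's analysis pipeline.

```python
def bonferroni_threshold(p_values, alpha=0.05):
    """Per-test cutoff that controls the family-wise error rate at alpha."""
    return alpha / len(p_values)


def benjamini_hochberg_threshold(p_values, alpha=0.05):
    """Largest p value accepted by the Benjamini-Hochberg step-up rule.

    Returns None when no test passes.
    """
    m = len(p_values)
    threshold = None
    for rank, p in enumerate(sorted(p_values), start=1):
        # A p value passes if it lies under its rank-scaled cutoff;
        # the largest passing p value becomes the threshold.
        if p <= alpha * rank / m:
            threshold = p
    return threshold


# Toy p values, invented for illustration only.
toy_p = [0.0001, 0.008, 0.015, 0.03, 0.2, 0.5, 0.7, 0.9]
bonf = bonferroni_threshold(toy_p)
bh = benjamini_hochberg_threshold(toy_p)
print(f"Bonferroni cutoff: {bonf}, Benjamini-Hochberg cutoff: {bh}")
```

In this toy set the Benjamini–Hochberg cutoff is more permissive than the Bonferroni cutoff, which is the usual reason the two lines sit at different heights on a Manhattan plot.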

Figure 4 with 4 supplements
Phenotypic classification based on glucosinolate (GSL) content.

(A) Using the GSL accumulation, each accession was classified into one of seven aliphatic short-chained GSL chemotypes based on the enzyme functions as follows: MAM2, AOP null: classified as 3MSO …

Figure 4—source data 1

Environmental conditions differentially associate with MAM status in the north versus the south.

A linear model for MAM status (carbon side chain length) was fitted with the indicated environmental parameters, separately for the northern and southern collections (for more details, see Methods). The tables show p values for each term from the linear model. To test the interaction with geography, the linear model was run on the total dataset with the geography parameter (north or south) added to the model.

https://cdn.elifesciences.org/articles/67784/elife-67784-fig4-data1-v2.pptx
Figure 4—figure supplement 1
Phenotypic classification based on the dominant MAM enzyme.

Accessions were classified based on the side chain length of the aliphatic short-chained glucosinolates (GSLs). Accessions with a majority of GSLs containing three carbons in their side chains are …
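
A side-chain-length classification of this kind can be sketched as follows. This is only an illustration under stated assumptions: the `CARBONS` lookup, the function name, the `"C3"`/`"C4"` labels and the toy amounts are hypothetical, and the study's actual criteria are those given in its Methods.

```python
# Side-chain carbon count for a few short-chained aliphatic GSLs
# (a small hypothetical lookup for this sketch, not the study's full list).
CARBONS = {
    "3MSO": 3,
    "Allyl": 3,
    "3-Butenyl": 4,
    "2-OH-3-Butenyl": 4,
}


def classify_by_chain_length(amounts):
    """Return "C3" or "C4" depending on which side-chain length dominates.

    amounts maps a GSL short name to its measured amount.
    Returns None when the two classes tie (including when both are zero).
    """
    c3 = sum(v for name, v in amounts.items() if CARBONS.get(name) == 3)
    c4 = sum(v for name, v in amounts.items() if CARBONS.get(name) == 4)
    if c3 == c4:
        return None
    return "C3" if c3 > c4 else "C4"


# Toy amounts, invented for illustration only.
accession_a = {"Allyl": 5.0, "3MSO": 1.0, "3-Butenyl": 2.0}
accession_b = {"Allyl": 0.5, "2-OH-3-Butenyl": 4.0, "3-Butenyl": 1.5}
print(classify_by_chain_length(accession_a), classify_by_chain_length(accession_b))
```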

Figure 4—figure supplement 2
Phenotypic classification based on the dominant AOP enzymes.

Relative amounts of alkenyl glucosinolates (GSLs), alkyl GSLs and methylsulfinyl (MSO) GSLs were calculated relative to the total short-chained aliphatic GSLs as described in the Methods section. …

Figure 4—figure supplement 3
Phenotypic classification based on GS-OH enzyme activity.

The ratio of 2-OH-3-Butenyl to 3-Butenyl glucosinolate (GSL) was calculated only for MAM1-dominant accessions (accessions with GSLs containing four carbons in their side chain). Accessions with …

Figure 4—figure supplement 4
Geographic partitioning of the collection.

(A) The accessions were divided into two collections using the following chain of mountains: the Pyrenees between Spain and France, the Alps between Italy and Germany, and the Carpathians in the …

Figure 5 with 5 supplements
MAM3 phylogeny.

(A) MAM3 phylogeny of Arabidopsis thaliana accessions, rooted by Arabidopsis lyrata MAMb, which is not shown because of distance. Tree tips are colored based on the accession chemotype. (B) The …

Figure 5—figure supplement 1
Support for the MAM3 tree clades classification.

(A) MAM3 phylogeny of 637 Arabidopsis thaliana accessions, rooted by Arabidopsis lyrata MAMb, excluding accessions with low-quality sequences. (B–E) MAM3 phylogeny of different combinations of A. …

Figure 5—figure supplement 2
Genomic structure of the GS-Elong regions.

The GS-Elong locus from different accessions was sequenced, and the MAM1 and MAM2 structures were analyzed. The table indicates the dominant chemotype of each accession, the MAM status of each …

Figure 5—figure supplement 2—source data 1

Sequences of MAM locus.

Sequences were generated for the accessions in Figure 5—figure supplement 2. The extracted regions are 30,000 bp upstream of AT5G23000 (MYB37) and 60,000 bp downstream of AT5G23020 (MAM3).

https://cdn.elifesciences.org/articles/67784/elife-67784-fig2-data2-v2.zip
Figure 5—figure supplement 3
MAM2 is an Arabidopsis thaliana specific gene.

Domain (A) and full sequence (B) amino acid phylogenies of the MAM/IPMS gene family. Sequences were taken from Abrahams et al., 2020, which uses Arabidopsis thaliana Col-0 genome and the MAM2 amino …

Figure 5—figure supplement 4
The Iberian Peninsula presents low phenotypic variability and high genetic variation.

(A) All accessions from Iberia were plotted, colored and shaped based on the side chain length of the aliphatic short-chained GSLs. Accessions with a majority of GSLs containing 3 carbons in their …

Figure 5—figure supplement 5
Geographic distribution of MAM haplotypes.

The MAM phylogeny is split by the major clades/haplotypes and each sub-clade’s phylogeny is reflected on the map. Tree tips are colored based on the accession chemotype.

Figure 6 with 1 supplement
AOP genomic structure.

The genomic structure and causality of the major AOP2/AOP3 haplotypes are illustrated. Pink arrows show the AOP2 gene while yellow arrows represent AOP3. The black arrows represent the direction of …

Figure 6—figure supplement 1
AOP phylogeny.

Separate phylogenies of AOP2 (A) and AOP3 (B) across Arabidopsis thaliana accessions. The trees are rooted by the matching gene in Arabidopsis lyrata, which is not shown because of distance. Tree …

Tables

Table 1
Environmental conditions differentially associate with major chemotypes across geographic location.

A linear model for the two major chemotypes, Allyl and 2-OH-3-Butenyl, was fitted with the indicated environmental parameters, separately for the northern and southern collections (for more …

Environmental parameter | Effect on chemotype – north | Effect on chemotype – south | Interaction with geography
Genomic group | <0.0001 | <0.0001 | <0.0001
Max temperature of warmest month | 0.0382 | <0.0001 | 0.3574
Min temperature of coldest month | 0.0007 | <0.0001 | 0.0049
Precipitation of wettest month | 0.1645 | 0.0003 | 0.0094
Precipitation of driest month | 0.0665 | 0.2026 | 0.47425
Distance to the coast | 0.2781 | 0.02680 | 0.1279

Table 2
GS-OH structure.

The structures of GS-OH in the 3-Butenyl accessions are illustrated. Gray boxes represent exons, and blue lines represent introns. Black line represents a mutation, and gray lines represent unknown …

Accession | Type of mutation | Allele structure | Fraction (out of C4 Alkenyl accessions) | Observed frequency (out of non-C4 Alkenyl accessions)
Sorbo, Pien | Polymorphism at SNP10831302 | [image] | 0.009 (2/226) | 0.067 (38/564)
Cvi-0 | Active site mutation | [image] | 0.004 (1/226) | 0.025 (14/564)
IP-Mot-0, IP-Tri-0 | Gene deletion | [image] | 0.009 (2/226) | 0.055 (31/564)
Multiple accessions (T670, FlyA-3, Ting-1, T880, T710, T850) | Unidentified mutations | [image] | 0.026 (6/226) | Unknown

Additional files

Supplementary file 1

GSLs data.

(A) List of GSLs and structures. (B) Accessions and glucosinolate (GSL) data – raw data. (C) Heritability values. (D) Accessions and GSL data – emmeans.

https://cdn.elifesciences.org/articles/67784/elife-67784-supp1-v2.xlsx
Supplementary file 2

SNPs in glucosinolate (GSL)-related loci under different genome-wide association (GWA) studies: GSL values were used as traits to conduct GWA studies.

The number of significant SNPs in the GSL-related loci (columns c to o) was counted for each GWA study separately. Rows 2–3: common name and AT number of gene/s in the loci. Rows 4–5: upstream and downstream positions of the relevant loci (10 kb were added upstream and downstream of the genes). Rows 6–33: GSL traits used for GWA studies. In black: number of SNPs with p value between 0.00001 and 0.0000001. In red: number of SNPs with p value equal to or smaller than 0.0000001.

https://cdn.elifesciences.org/articles/67784/elife-67784-supp2-v2.xlsx
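
The counting scheme described for Supplementary file 2 can be sketched as below. This is a minimal illustration, not the authors' script: the function name, the tuple layout of the SNP records, the locus coordinates and the p values are all hypothetical, and treating 0.00001 as an inclusive upper bound for the "black" bin is an assumption.

```python
PAD = 10_000  # 10 kb added upstream and downstream of the gene(s), as described


def count_snps_in_locus(snps, chrom, gene_start, gene_end, pad=PAD):
    """Count significant SNPs inside a padded locus window.

    snps: iterable of (chromosome, position, p_value) tuples.
    Returns (black, red): black counts 1e-7 < p <= 1e-5, red counts p <= 1e-7.
    """
    lo, hi = gene_start - pad, gene_end + pad
    black = 0
    red = 0
    for snp_chrom, pos, p in snps:
        if snp_chrom != chrom or not (lo <= pos <= hi):
            continue  # SNP lies outside the padded locus
        if p <= 1e-7:
            red += 1
        elif p <= 1e-5:
            black += 1
    return black, red


# Toy SNP records and locus coordinates, invented for illustration only.
toy_snps = [
    (5, 95_000, 5e-6),   # inside the window, "black" bin
    (5, 101_000, 1e-8),  # inside the window, "red" bin
    (5, 114_999, 1e-7),  # inside the window, equal to 1e-7, so "red"
    (5, 120_000, 1e-9),  # outside the window
    (4, 101_000, 1e-9),  # different chromosome
    (5, 102_000, 0.01),  # not significant
]
counts = count_snps_in_locus(toy_snps, chrom=5, gene_start=100_000, gene_end=105_000)
print(counts)
```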
Transparent reporting form
https://cdn.elifesciences.org/articles/67784/elife-67784-transrepform1-v2.docx
