Research Article

Independent evolution of ancestral and novel defenses in a genus of toxic plants (Erysimum, Brassicaceae)

Institute of Plant Sciences, University of Bern, Switzerland
Boyce Thompson Institute, United States
Division of Biological Sciences, University of Missouri, United States
Institut für Insektenbiotechnologie, Justus-Liebig-Universität Giessen, Germany
Department of Functional and Evolutionary Ecology, Estación Experimental de Zonas Áridas (EEZA-CSIC), Spain
Research Unit Modeling Nature, Department of Genetics, University of Granada, Spain
Department of Chemical Ecology, Bielefeld University, Germany

Apr 7, 2020

Open access

Abstract

Phytochemical diversity is thought to result from coevolutionary cycles as specialization in herbivores imposes diversifying selection on plant chemical defenses. Plants in the speciose genus Erysimum (Brassicaceae) produce both ancestral glucosinolates and evolutionarily novel cardenolides as defenses. Here we test macroevolutionary hypotheses on co-expression, co-regulation, and diversification of these potentially redundant defenses across this genus. We sequenced and assembled the genome of E. cheiranthoides and foliar transcriptomes of 47 additional Erysimum species to construct a phylogeny from 9868 orthologous genes, revealing several geographic clades but also high levels of gene discordance. Concentrations, inducibility, and diversity of the two defenses varied independently among species, with no evidence for trade-offs. Closely related, geographically co-occurring species shared similar cardenolide traits, but not glucosinolate traits, likely as a result of specific selective pressures acting on each defense. Ancestral and novel chemical defenses in Erysimum thus appear to provide complementary rather than redundant functions.

eLife digest

Plants are often attacked by insects and other herbivores. As a result, they have evolved to defend themselves by producing many different chemicals that are toxic to these pests. As producing each chemical costs energy, individual plants often only produce one type of chemical that is targeted towards their main herbivore. Related species of plants often use the same type of chemical defense so, if a particular herbivore gains the ability to cope with this chemical, it may rapidly become an important pest for the whole plant family.

To escape this threat, some plants have gained the ability to produce more than one type of chemical defense. Wallflowers, for example, are a group of plants in the mustard family that produce two types of toxic chemicals: mustard oils, which are common in most plants in this family; and cardenolides, which are an innovation of the wallflowers, and which are otherwise found only in distantly related plants such as foxglove and milkweed. The combination of these two chemical defenses within the same plant may have allowed the wallflowers to escape attacks from their main herbivores and may explain why the number of wallflower species rapidly increased within the last two million years.

Züst et al. have now studied the diversity of mustard oils and cardenolides present in many different species of wallflower. This analysis revealed that almost all of the tested wallflower species produced high amounts of both chemical defenses, while only one species lacked the ability to produce cardenolides. The levels of mustard oils had no relation to the levels of cardenolides in the tested species, which suggests that the regulation of these two defenses is not linked. Furthermore, Züst et al. found that closely related wallflower species produced cardenolides that were more similar to each other, but mustard oils that were less similar. This suggests that mustard oils and cardenolides have evolved independently in wallflowers and have distinct roles in the defense against different herbivores.

The evolution of insect resistance to pesticides and other toxins is an important concern for agriculture. Applying multiple toxins to crops at the same time is an important strategy to slow the evolution of resistance in the pests. The findings of Züst et al. describe a system in which plants have naturally evolved an equivalent strategy to escape their main herbivores. Understanding how plants produce multiple chemical defenses, and the costs involved, may help efforts to breed crop species that are more resistant to herbivores and require fewer applications of pesticides.

Introduction

Plant chemical defenses play a central role in the coevolutionary arms race with herbivorous insects. In response to diverse environmental challenges, plants have evolved a plethora of structurally diverse organic compounds with repellent, antinutritive, or toxic properties (Fraenkel, 1959; Mithöfer and Boland, 2012). Chemical defenses can impose barriers to consumption by herbivores, but in parallel may favor the evolution of specialized herbivores that can tolerate or disable these defenses (Cornell and Hawkins, 2003). Chemical diversity is likely evolving in response to a multitude of plant-herbivore interactions (Salazar et al., 2018), and community-level phytochemical diversity may be a key driver of niche segregation and insect community dynamics (Richards et al., 2015; Sedio et al., 2017).

For individual plants, the production of diverse mixtures of chemicals is often considered advantageous (Romeo, 1996; Firn and Jones, 2003; Gershenzon et al., 2012; Forbey et al., 2013; Richards et al., 2016). For example, different chemicals may target distinct herbivores (Iason et al., 2011; Richards et al., 2015), or may act synergistically to increase overall toxicity of a plant (Steppuhn and Baldwin, 2007). However, metabolic constraints can limit the extent of phytochemical diversity within individual plants (Firn and Jones, 2003). Most defensive metabolites originate from a small group of precursor compounds and conserved biosynthetic pathways, which are modified in a hierarchical process into diverse, species-specific end products (Moore et al., 2014). As constraints are likely strongest for the early stages of these pathways, related plant species commonly share the same functional ‘classes’ of defensive chemicals (Wink, 2003), but vary considerably in the number of compounds within each class (Fahey et al., 2001; Rasmann and Agrawal, 2011).

Functional conservatism in defensive chemicals among related plants should facilitate host expansion and the evolution of tolerance in herbivores (Cornell and Hawkins, 2003), as specialized resistance mechanisms against one type of compound are more likely to be effective against structurally similar than structurally dissimilar compounds. This may result in a seemingly paradoxical scenario, wherein well-defended plants are nonetheless attacked by a diverse community of specialized herbivores (Agrawal, 2005; Bidart-Bouzat and Kliebenstein, 2008). For example, most plants in the Brassicaceae produce glucosinolates as their primary defense, which become potent repellents of many herbivores when activated by myrosinase (thioglucoside glucohydrolase) enzymes upon leaf damage (Fahey et al., 2001). However, despite the potency of this defense system and the large diversity of glucosinolates produced by the Brassicaceae, several specialized herbivores have evolved strategies to overcome this defense, enabling them to consume most Brassicaceae and even to sequester glucosinolates for their own defense against predators (Müller, 2009; Winde and Wittstock, 2011).

Plants may occasionally overcome the constraints on functional diversification and gain the ability to produce new classes of defensive chemicals as a ‘second line of defense’ (Feeny, 1977). Although this phenomenon is likely widespread across the plant kingdom, it has most commonly been reported from the well-studied Brassicaceae. In addition to producing evolutionarily ancestral glucosinolates, plants in this family have gained the ability to produce saponins in Barbarea vulgaris (Shinoda et al., 2002), alkaloids in Cochlearia officinalis (Brock et al., 2006), cucurbitacins in Iberis spp. (Nielsen, 1978b), alliarinoside in Alliaria petiolata (Frisch and Møller, 2012), and cardenolides in the genus Erysimum (Makarevich et al., 1994). These recently-evolved chemical defenses with modes of action distinct from glucosinolates have likely allowed the plants to escape attack from specialized, glucosinolate-adapted herbivores (Nielsen, 1978b; Dimock et al., 1991; Haribal and Renwick, 2001; Shinoda et al., 2002). Gains of novel defenses are expected to result in a release from selective pressures imposed by specialized antagonists, and thus may represent key steps in herbivore-plant coevolution that lead to rapid phylogenetic diversification (Weber and Agrawal, 2014).

The production of cardenolides by species in the genus Erysimum is one of the longest- and best-studied examples of an evolutionarily recent gain of a novel chemical defense (Jaretzky and Wilcke, 1932; Nagata et al., 1957; Singh and Rastogi, 1970; Makarevich et al., 1994). Cardenolides are a type of cardiac glycoside, which act as allosteric inhibitors of Na⁺/K⁺-ATPase, an essential membrane ion transporter that is expressed ubiquitously in animal cells (Agrawal et al., 2012). Cardiac glycosides are produced by plants in approximately sixty genera belonging to twelve plant families, and several cardiac glycoside-producing plants are known for their toxicity or medicinal uses (Agrawal et al., 2012; Züst et al., 2018). Erysimum is a species-rich genus consisting of diploid and polyploid species with diverse morphologies, growth habits, and ecological niches (Al-Shehbaz, 1988; Polatschek and Snogerup, 2002; Al-Shehbaz, 2010; Gómez et al., 2015). Of the Erysimum species evaluated to date, all produce some of the novel cardenolide defenses (Makarevich et al., 1994). Previous phylogenetic studies suggest a recent and rapid diversification of the genus, with most species divergence occurring within the last 2–3 million years (Gómez et al., 2014; Moazzeni et al., 2014), resulting in 150 to 350 extant species (Polatschek and Snogerup, 2002; Al-Shehbaz, 2010). The large uncertainty in species number reflects taxonomic challenges in this genus, which includes many species that readily hybridize, as well as cryptic species with near-identical morphology (Abdelaziz et al., 2011).

In most Erysimum species, cardenolides appear to have enabled an escape from at least some glucosinolate-adapted specialist herbivores. Cardenolides in Erysimum act as oviposition and feeding deterrents for different pierid butterflies (Chew, 1975; Chew, 1977; Wiklund et al., 1978; Renwick et al., 1989; Dimock et al., 1991), and several glucosinolate-adapted beetles (Phaedon spp. and Phyllotreta spp.) were deterred from feeding by dietary cardenolides at levels commonly found in Erysimum (Nielsen, 1978a; Nielsen, 1978b). Nonetheless, Erysimum plants are still attacked by a range of herbivores and seed predators, including some mammals and several glucosinolate-adapted aphids, true bugs, and lepidopteran larvae (Gómez, 2005; Züst et al., 2018). These herbivores likely rely on general detoxification mechanisms for tolerance of the novel defense, while to date there are no reports of de novo gains of specialized cardenolide resistance in Erysimum herbivores. However, the gain of the novel defense may have facilitated host shifts in at least one cardenolide-adapted herbivore: in addition to its main host Digitalis purpurea (Plantaginaceae), the seed-feeding bug Horvathiolus superbus (Lygaeinae) commonly feeds on seeds of E. crepidifolium, and is able to sequester cardenolides from both sources (Georg Petschenka, personal observations).

The gain of a novel chemical defense makes the genus Erysimum an excellent model system to study the causes and consequences of phytochemical diversification (Züst et al., 2018). While an increasing number of studies are beginning to describe taxon-wide patterns of chemical diversity in plants (e.g., Richards et al., 2015; Sedio et al., 2017; Salazar et al., 2018), the Erysimum system is unique in combining two classes of plant metabolites with primarily defensive function – although a broader role of glucosinolates is increasingly recognized (e.g., Katz et al., 2015). The system thus is ideally suited to evaluate the evolutionary consequences of co-expressing two functionally distinct but potentially redundant defenses. Here, we present a high-quality genome sequence assembly and annotation for the short-lived annual E. cheiranthoides as an important resource for future molecular studies in this system. Furthermore, we present a new phylogeny for 48 species constructed from transcriptome sequences (Figure 1), corresponding to 10–30% of species in the genus Erysimum. We combine this phylogeny with a characterization of the full diversity of glucosinolates and cardenolides in leaves to evaluate macroevolutionary patterns in the evolution of phytochemical diversity across the genus. We complemented the characterization of defensive phenotypes by quantifying glucosinolate-activating myrosinase activity, inhibition of animal Na⁺/K⁺-ATPase by leaf extracts, and defense inducibility in response to exogenous application of jasmonic acid (JA). By assessing co-variation of diversity, abundance and inducibility of ancestral and novel defenses, we provide evidence that these two classes of defense metabolites evolved in response to different selective pressures and appear to have specific, non-redundant functions.

Figure 1

Geographic location of *Erysimum* species source populations used for transcriptome sequencing.

(A) Source populations in Europe. Inset: The Canary Islands (28°N, 16°W) are located further westward and southward than drawn in this map. Green symbols are exact collection locations, while blue symbols indicate approximate locations based on species distributions. Seeds of the originally Mediterranean species *E. cheiri* (CHR, orange symbol) were collected from a naturalized population in the Netherlands. (B) Source populations in North America. Five species/accessions (ALI, ER1, ER2, ER3, ER4) could not be placed on the map due to uncertain species identity. See Supplementary file 1 for more details.

Results

E. cheiranthoides genome assembly

A total of 39.5 Gb of PacBio sequences with an average read length of 10,603 bp were assembled into 1087 contigs with an N50 of 1.5 Mbp (Table 1). Hi-C scaffolding oriented 98.5% of the assembly into eight large scaffolds representing pseudomolecules (Table 1, Figure 2—figure supplement 1), while 216 small contigs remained unanchored. The final assembly (v1.2) had a total length of 174.5 Mbp, representing 86% of the estimated genome size of E. cheiranthoides and capturing 99% of the BUSCO gene set (Table 1, Figure 2—figure supplement 2). Sequences were deposited under GenBank project ID PRJNA563696 and additionally are provided at www.erysimum.org. A total of 29,947 gene models were predicted and captured 98% of the BUSCO gene set (Figure 2—figure supplement 3). In the presumed centromere regions of each chromosome, genic sequences were less abundant, whereas repeat sequences were more common (Figure 2A). Repetitive sequences constituted approximately 29% of the genome (Supplementary file 2). Long terminal repeat retrotransposons (LTR-RT) made up the largest proportion of the repeats identified (Figure 2—figure supplement 4). Among these, repeats in the Gypsy superfamily constituted the largest fraction of the genome (Supplementary file 2). The majority of the LTR elements appeared to be relatively young, with most having estimated insertion times of less than 1 MYA (Figure 2—figure supplement 5). Synteny analysis showed evidence of several chromosomal fusions and fissions between the eight chromosomes of E. cheiranthoides and the five chromosomes of Arabidopsis (Figure 2B).

Table 1

Assembly metrics for the E. cheiranthoides genome: v0.9 = Falcon + Arrow assembly results; v1.2 = genome assembly after Hi-C scaffolding and Pilon correction.

	v0.9	v1.2 pseudomolecules and contigs	v1.2 pseudomolecules only
total length (Mbp)	177.4	177.2	174.5
expected size (Mbp)	205	205	205
number of contigs	1087	224	8
N50 (Mbp)	1.5	22.4	22.4
complete BUSCOs (out of 1,375)	1359	1346	1356
complete and single copy BUSCOs (out of 1,375)	1271	1300	1306
complete and duplicated BUSCOs (out of 1,375)	88	46	50
fragmented BUSCOs (out of 1,375)	5	8	6
missing BUSCOs (out of 1,375)	11	21	13
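
The summary percentages quoted for the assembly follow from Table 1 by simple arithmetic. The Python sketch below is illustrative only and is not part of the authors' pipeline: the function names and the toy contig lengths are hypothetical, while the assembly totals, the expected genome size, and the BUSCO counts are copied from Table 1. N50 is computed under its usual definition, the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly.

```python
# Values copied from Table 1 (lengths in Mbp, BUSCO counts out of 1,375).
EXPECTED_GENOME_MBP = 205.0
BUSCO_SET_SIZE = 1375
TABLE1 = {
    "v0.9": {"total_mbp": 177.4, "complete_buscos": 1359},
    "v1.2 pseudomolecules and contigs": {"total_mbp": 177.2, "complete_buscos": 1346},
    "v1.2 pseudomolecules only": {"total_mbp": 174.5, "complete_buscos": 1356},
}


def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def percent_of(part, whole):
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


def summarize(table=TABLE1):
    """Genome fraction and BUSCO completeness (%) for each assembly version."""
    return {
        version: {
            "genome_fraction": percent_of(row["total_mbp"], EXPECTED_GENOME_MBP),
            "busco_complete": percent_of(row["complete_buscos"], BUSCO_SET_SIZE),
        }
        for version, row in table.items()
    }


if __name__ == "__main__":
    # Hypothetical contig lengths, only to exercise n50().
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
    for version, stats in summarize().items():
        print(version, {key: round(value, 1) for key, value in stats.items()})
```

With these inputs, the full v1.2 assembly (177.2 Mbp) corresponds to about 86% of the expected 205 Mbp, the eight pseudomolecules alone (174.5 Mbp) to about 85%, and 1356 of 1375 complete BUSCOs to about 99%.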

Figure 2 with 5 supplements

Visualization of the *E. cheiranthoides* genome assembly.

(A) Circos plot of the *E. cheiranthoides* genome with gene densities (outer circle) and repeat densities (inner circle) shown as histogram tracks. Densities are calculated as percentages for 1 Mb windows. (B) Synteny plot of *E. cheiranthoides* and *A. thaliana*. Lines between chromosomes connect aligned sequences between the two genomes.

Glucosinolate and myrosinase genes in the E. cheiranthoides genome

Three aliphatic glucosinolates – glucoiberverin (3-methylthiopropyl glucosinolate), glucoiberin (3-methylsulfinylpropyl glucosinolate), and glucocheirolin (3-methylsulfonylpropyl glucosinolate) – have been reported as the main glucosinolates in E. cheiranthoides (Cole, 1976; Huang et al., 1993). We confirmed their dominance in E. cheiranthoides var. Elbtalaue, but also identified additional aliphatic and indole glucosinolates at lower concentrations. By making use of the known glucosinolate biosynthetic genes from Arabidopsis (Hull et al., 2000; Mikkelsen et al., 2000; Bak and Feyereisen, 2001; Bak et al., 2001; Kliebenstein et al., 2001b; Kroymann et al., 2001; Reintanz et al., 2001; Chen et al., 2003; Kroymann et al., 2003; Naur et al., 2003; Grubb et al., 2004; Mikkelsen et al., 2004; Piotrowski et al., 2004; Textor et al., 2004; Nozawa et al., 2005; Klein et al., 2006; Schuster et al., 2006; Hansen et al., 2007; Textor et al., 2007; Hansen et al., 2008; Knill et al., 2008; Li et al., 2008; Farquharson, 2009; Geu-Flores et al., 2009; Gigolashvili et al., 2009; He et al., 2009; Klein and Papenbrock, 2009; Knill et al., 2009; Pfalz et al., 2009; Sawada et al., 2009; He et al., 2010; Geu-Flores et al., 2011; He et al., 2011; Pfalz et al., 2011; Lächler et al., 2015; Kong et al., 2016; Pfalz et al., 2016), BLASTn comparisons to the E. cheiranthoides genome, and creating phylogenetic trees to compare nucleotide coding sequences of Arabidopsis, Brassica, and E. cheiranthoides, we identified homologs of genes encoding both indole (Figure 3, Figure 3—figure supplements 1–7) and aliphatic (Figure 4—figure supplements 1–8) glucosinolate biosynthetic enzymes.

Figure 3 with 7 supplements

Identification of known indole glucosinolate biosynthetic genes and glucosinolate-modifying genes from Arabidopsis in *Erysimum cheiranthoides*.

(A) Starting with tryptophan, indole glucosinolates are synthesized using some enzymes that also function in aliphatic glucosinolate biosynthesis (GGP1; SUR1; UGT74B1) while also using indole glucosinolate-specific enzymes. (B) Indole glucosinolates can be modified by hydroxylation and subsequent methylation. Red square brackets indicate where gene copy numbers differ between Arabidopsis and *E. cheiranthoides*. Glucosinolates with names highlighted in blue were identified in *Erysimum cheiranthoides* var. *Elbtalaue*. Abbreviations: cytochrome P450 monooxygenase (CYP); glutathione S-transferase F (GSTF); glutathione (GSH); γ-glutamyl peptidase 1 (GGP1); SUPERROOT 1 C-S lyase (SUR1); UDP-dependent glycosyltransferase (UGT); sulfotransferase (SOT); glucosinolate (GS); indole glucosinolate methyltransferase (IGMT).

Homologs of all genes of the biosynthetic pathway for glucobrassicin (indol-3-ylmethyl glucosinolate) and its 4-hydroxy and 4-methoxy derivatives were present in E. cheiranthoides (Figure 3). Consistent with the absence of neoglucobrassicin (1-methoxy-indol-3-ylmethyl glucosinolate) in E. cheiranthoides var. Elbtalaue, we did not find homologs of the Arabidopsis genes encoding the biosynthesis of this compound.

Genes encoding the complete biosynthetic pathway of the E. cheiranthoides aliphatic glucosinolates glucoiberverin, glucoiberin, glucoerucin (4-methylthiobutyl glucosinolate), and glucoraphanin (4-methylsulfinylbutyl glucosinolate) were present in the genome (Figure 4, Figure 4—figure supplements 1–6). Because the E. cheiranthoides methylsulfonyl glucosinolates glucocheirolin, glucoerysolin (4-methylsulfonylbutyl glucosinolate), and 3-hydroxy-4-methylsulfonylbutyl glucosinolate are not present in Arabidopsis, genes encoding their biosynthesis are unknown and could not be identified as part of this study. The E. cheiranthoides genome contains genes with similarity to Arabidopsis ALKENYL HYDROXALKYL PRODUCING (AOP2 and AOP3) and 3-BUTENYL GLUCOSINOLATE 2-HYDROXYLASE (GS-OH) (Figure 4—figure supplements 7 and 8). However, the apparent absence of sinigrin (2-propenyl glucosinolate), 2-hydroxypropyl glucosinolate, progoitrin (2-hydroxy-3-butenyl glucosinolate), and 4-hydroxybutyl glucosinolate in E. cheiranthoides (Figure 4) suggests that the enzymes encoded by these genes have other functions. CYP79A2, which functions in the biosynthesis of benzyl glucosinolates that are present in very small amounts in seeds of Arabidopsis ecotype Columbia (Wittstock and Halkier, 2000), has a homolog in the E. cheiranthoides genome (Figure 3—figure supplement 1). Although we did not observe benzyl glucosinolates in E. cheiranthoides, this lack of detection could be due to assay sensitivity or not testing all tissue types. Homologs of the Arabidopsis CYP79C1 and CYP79C2 genes, which have unknown functions but are hypothesized to be involved in glucosinolate biosynthesis (Halkier and Gershenzon, 2006), are present in the E. cheiranthoides genome (Figure 3—figure supplement 1). CYP79D2 from cassava catalyzed the formation of valine- and isoleucine-derived glucosinolates in Arabidopsis (Mikkelsen and Halkier, 2003), yet no CYP79D genes appear to be present in E. cheiranthoides (Figure 3—figure supplement 1). Additionally, there was no clear E. cheiranthoides homolog of GLUCORAPHASATIN SYNTHASE 1 (GRS1, Figure 4—figure supplement 9), a 2-oxoglutarate-dependent dioxygenase that contributes to glucoraphasatin (4-methylthio-3-butenyl glucosinolate) biosynthesis in R. sativus (Kakizaki et al., 2017).

Figure 4 with 12 supplements

Identification of known aliphatic glucosinolate biosynthetic genes and glucosinolate-modifying genes from Arabidopsis in *Erysimum cheiranthoides*.

Aliphatic glucosinolates are synthesized from methionine by a series of enzymes, while additional enzymes are responsible for aliphatic glucosinolate modifications (black box). Red square brackets indicate where gene copy numbers differ between Arabidopsis and *E. cheiranthoides*, or where gene copies could not be matched unambiguously between species. Glucosinolates with names highlighted in blue were identified in *Erysimum cheiranthoides* var. *Elbtalaue*. Abbreviations: branched-chain aminotransferase (BCAT); bile acid transporter (BAT); methylthioalkylmalate synthase (MAM); isopropylmalate isomerase (IPMI); large subunit (LSU); small subunit (SSU); isopropylmalate dehydrogenase (IPMDH); cytochrome P450 monooxygenase (CYP); glutathione S-transferase F (GSTF); glutathione S-transferase Tau (GSTU); glutathione (GSH); γ-glutamyl peptidase 1 (GGP1); SUPERROOT 1 C-S lyase (SUR1); UDP-dependent glycosyltransferase (UGT); sulfotransferase (SOT); flavin monooxygenase (FMO); glucosinolate oxoglutarate-dependent dioxygenase (AOP); 3-butenyl glucosinolate 2-hydroxylase (GS-OH).

In response to insect feeding or pathogen infection, glucosinolates are activated by myrosinase enzymes (Halkier and Gershenzon, 2006). Between-gene phylogenetic comparisons revealed that homologs of known Arabidopsis myrosinases, the main foliar myrosinases TGG1 and TGG2 (Xue et al., 1995; Barth and Jander, 2006), root-expressed TGG4 and TGG5 (Andersson et al., 2009), and likely pseudogenes TGG3 and TGG6 (Rask et al., 2000; Zhang et al., 2002), were also present in the E. cheiranthoides genome (Figure 4—figure supplement 10). Additionally, we found homologs of the more distantly related Arabidopsis myrosinases PEN2 (Bednarek et al., 2009; Clay et al., 2009) and PYK10 (Sherameti et al., 2008; Nakano et al., 2017). In Arabidopsis, protein products of epithiospecifier protein (ESP), epithiospecifier modifier (ESM), and nitrile specifier protein (NSP) direct glucosinolate breakdown into nitriles, thiocyanates, or isothiocyanates (Lambrix et al., 2001; Burow et al., 2006; Zhang et al., 2006). Although we did not measure glucosinolate breakdown in Erysimum, we did find ESP, ESM, and NSP homologs in the E. cheiranthoides genome (Figure 4—figure supplements 11 and 12). Therefore, the pathway of glucosinolate activation appears to be largely conserved between Arabidopsis and E. cheiranthoides.

Phylogenetic relationship of 48 Erysimum species

Assemblies of transcriptomes from 48 Erysimum species (including E. cheiranthoides) had N50 values ranging from 574 to 2,160 bp (Supplementary file 3). Transcriptome assemblies contained completed genes from 54–94% of the BUSCO set, and coding sequences were on average shorter than those of E. cheiranthoides (Supplementary file 3). Transcriptome sequences were deposited under GenBank project ID PRJNA563696 and at www.erysimum.org. The large number of orthologous gene sequences identified among the E. cheiranthoides genome and the 48 transcriptomes resulted in an ASTRAL species tree with high posterior probabilities for most nodes (Figure 5—figure supplement 1, Supplementary file 4). To determine divergence times among the 48 species, we generated a chronogram using a concatenated ExaML species tree with branch length information (Figure 5). While we relied on published estimates to constrain ages of several internal nodes, our analysis aligns well with a recent, rapid radiation of the species included in our study within the last 2–4 million years (Figure 5).

Figure 5 with 4 supplements

Genome-guided concatenated phylogeny of 48 *Erysimum* species.

Phylogenetic relationships were inferred from 9868 orthologous genes using ExaML with *Arabidopsis thaliana* as outgroup. Node depth corresponds to divergence time in million years. Pie charts on each internode show concordance factors, with gray segments corresponding to the proportion of gene trees supporting the main topology. Nodes are labelled as 1 to 47; see Supplementary file 5 for concordance factor values and number of decisive trees of each node. Four nodes were constrained using published divergence time estimates, with the range of constraints indicated by gray bars. Known polyploid species are highlighted in red. Approximate geographic range of species is provided in parentheses. The horticultural species *E. cheiri* and the weedy species *E. cheiranthoides* and *E. repandum* are of European origin but are now widespread across the Northern Hemisphere. Clades of species from shared geographic origins are highlighted in different colors. On the right, pictures of rosettes of a representative subset of species are provided to highlight the morphological diversity within this genus. Plants are of the same age and relative size differences are conserved in the pictures.

The concatenated ExaML species tree and the ASTRAL species tree shared overall similar topologies, but very short internal branch lengths on both trees indicated high levels of gene tree discordance. We further dissected this discordance by assessing support of the main topology of the ASTRAL species tree (Figure 5—figure supplement 1) using quartet scores, which compare the main tree topology relative to its first and second alternative topology. For most nodes in the ingroup, each topology had quartet scores near the minimum value of 1/3 (Supplementary file 4), indicating that the possible gene trees were present in almost equal frequency for each topology. For the ExaML tree, we assessed discordance at each node using concordance factors, which are the proportion of gene trees that agree with the main topology (Figure 5). Again, most nodes in the ingroup had very low support for the main topology (<5% of gene tree agreement; Figure 5, Supplementary file 5). This suggests that many internal branches had lengths near 0, indicating polytomies that could not be resolved even with the extensive sampling of gene sequences from transcriptome data. These high levels of discordance were likely caused by frequent polyploidization (Figure 5), incomplete lineage sorting, and high degrees of hybridization. A high prevalence of hybridization was further indicated by a high frequency of gamma scores (hybridization proportion) between 0.3 and 0.7 across all ingroup taxa (Figure 5—figure supplement 2) recovered in the HyDe analysis (Blischak et al., 2018).
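
The two discordance measures described above can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not the computation performed by ASTRAL or by the authors' pipeline: the function names, the tolerance, and all counts are hypothetical. Because the three quartet shares at a node sum to 1, the largest of them cannot fall below 1/3, which is why a main-topology score near 1/3 means that all three resolutions are about equally frequent.

```python
def quartet_shares(n_main, n_alt1, n_alt2):
    """Normalize support counts for the three resolutions around one branch."""
    total = n_main + n_alt1 + n_alt2
    if total <= 0:
        raise ValueError("at least one supporting quartet is required")
    return (n_main / total, n_alt1 / total, n_alt2 / total)


def concordance_factor(n_concordant, n_decisive):
    """Proportion of decisive gene trees that agree with the main topology."""
    if n_decisive <= 0:
        raise ValueError("at least one decisive gene tree is required")
    return n_concordant / n_decisive


def looks_like_polytomy(shares, tolerance=0.05):
    """True if all three shares lie within `tolerance` of 1/3 (hypothetical cutoff)."""
    return all(abs(share - 1.0 / 3.0) <= tolerance for share in shares)


if __name__ == "__main__":
    # Hypothetical counts for a poorly resolved and a well resolved node.
    unresolved = quartet_shares(340, 330, 330)
    resolved = quartet_shares(900, 50, 50)
    print(unresolved, looks_like_polytomy(unresolved))
    print(resolved, looks_like_polytomy(resolved))
    # Hypothetical node where 40 of 1000 decisive gene trees match the main topology.
    print(concordance_factor(40, 1000))
```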

Despite extensive levels of discordance and low agreement of individual gene trees with the species trees, the main topologies of the ExaML and ASTRAL species trees revealed geographic clades that matched the generally limited native species ranges. The three Mediterranean annual species E. incanum (INC), E. repandum (REP), and E. wilczekianum (WIC) formed a well-supported monophyletic sister clade to all other sequenced species (Figure 5, Figure 5—figure supplement 1). The only other annual in the set of sampled species, E. cheiranthoides (ECE), was part of a weakly-supported clade (high posterior probability but low concordance), comprised of several perennial species from Greece and central Europe, including the widespread ornamental E. cheiri (CHR). Species from the Iberian peninsula/Morocco, North America, and Iran formed additional, weakly-supported clades conserved between species trees (Figure 5), while another clade of Turkish and Greek Erysimum species was only monophyletic in the ASTRAL species tree (Figure 5—figure supplement 1). The clear geographic structure in the main topologies of the species trees was confirmed by a strong correlation between the cophenetic and geographic distance matrices for the subset of 43 species with geographic information (Mantel test, p<0.001).
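
A Mantel test of the kind reported above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: it assumes a Pearson correlation between the upper triangles of the two distance matrices and a one-sided permutation p-value, since the text does not state which statistic or software was used. The function names and the toy matrices are hypothetical.

```python
import math
import random


def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with nonzero variance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(dist_a, dist_b, permutations=999, seed=0):
    """Return (r, p) for a one-sided Mantel test between two distance matrices."""
    rng = random.Random(seed)
    n = len(dist_a)
    flat_b = upper_triangle(dist_b)
    observed = pearson(upper_triangle(dist_a), flat_b)
    hits = 1  # the observed arrangement counts as one permutation
    order = list(range(n))
    for _ in range(permutations):
        # Permute rows and columns of dist_a together to keep it a valid matrix.
        rng.shuffle(order)
        permuted = [[dist_a[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(upper_triangle(permuted), flat_b) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)


if __name__ == "__main__":
    # Hypothetical example: five taxa on a line, "cophenetic" distance is
    # exactly twice the "geographic" distance.
    positions = [0, 1, 2, 3, 4]
    geographic = [[abs(a - b) for b in positions] for a in positions]
    cophenetic = [[2 * abs(a - b) for b in positions] for a in positions]
    r, p = mantel(cophenetic, geographic)
    print(r, p)
```

Permuting rows and columns jointly, rather than shuffling individual distances, preserves the dependence among distances that share a taxon, which is the reason a permutation test is used here instead of an ordinary correlation test.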

Glucosinolate diversity and myrosinase activity

Across the 48 Erysimum species, we identified 25 candidate glucosinolate compounds with distinct molecular masses and HPLC retention times (Supplementary file 6). Of these, 24 compounds could be assigned to known glucosinolate structures with high certainty. The last remaining compound appeared to be an unknown isomer of glucocheirolin. Individual Erysimum species produced between 5 and 18 glucosinolates (Figure 6A), and total glucosinolate concentrations were highly variable among species (Figure 6B). The ploidy level of species explained a significant fraction of total variation in the number of glucosinolates produced (F(4,38) = 4.63, p = 0.004), with hexaploid species producing the highest number of compounds (Figure 6—figure supplement 1). However, neither the number of distinct glucosinolate compounds nor their total concentrations exhibited a phylogenetic signal, and related species were less similar than expected under a model of Brownian motion (Table 2).

Figure 6 with 2 supplements

Mean defense traits of 48 *Erysimum* species, grouped by phylogenetic relatedness.

Not all traits could be quantified for all species. (A) Total number of glucosinolate compounds detected in each species. (B) Total glucosinolate concentration found in each species, quantified by total ion intensity in the mass spectrometry analyses. Values are means ±1 SE. (C) Quantification of glucosinolate-activating myrosinase activity. Enzyme kinetics were quantified against the standard glucosinolate sinigrin (2-propenyl glucosinolate) and are expressed per unit fresh plant tissue. Values are means ±1 SE. (D) Total number of cardenolide compounds detected in each species. (E) Total cardenolide concentrations found in each species, quantified by total ion intensity in mass spectrometry analyses. Values are means ±1 SE.

Figure 6—source data 1. Species means and standard errors (where applicable) for number of glucosinolate/cardenolide compounds, total compound concentrations, and myrosinase activity: https://cdn.elifesciences.org/articles/51712/elife-51712-fig6-data1-v2.txt

Table 2

Measure of phylogenetic signal for total defensive traits and principal coordinates of the cardenolide and glucosinolate dissimilarity matrices (PCO) using Blomberg’s K. Significant values (p<0.05) are highlighted in bold.

Plant trait	K statistics	p-value (10,000 simulations)
Glucosinolate PCO1 (18.8%)	0.86	0.038
Glucosinolate PCO2 (13.6%)	0.80	0.090
Total glucosinolate concentrations	0.81	0.076
Number of glucosinolate compounds	0.89	0.014
Myrosinase activity	0.88	0.038
Cardenolide PCO1 (16.6%)	1.79	<0.001
Cardenolide PCO2 (12.2%)	1.04	0.002
Total cardenolide concentrations	1.03	0.015
Number of cardenolide compounds	1.25	<0.001

Clustering species by dissimilarities in glucosinolate profiles mostly resulted in chemotype groups corresponding to known underlying biosynthetic genes, although support for individual species clusters in the chemogram was variable (Figure 7). The majority of all species produced glucoiberin as the primary glucosinolate. Of these, approximately half also produced sinigrin as a second dominant glucosinolate compound. Further chemotypic subdivision, related to the production of glucocheirolin and 2-hydroxypropyl glucosinolate, appeared to be present but only had relatively weak statistical support. However, eight species clearly differed from these general patterns. The species E. allionii (ALI), E. rhaeticum (RHA), and E. scoparium (SCO) mostly lacked glucosinolates with 3-carbon side-chains, but instead accumulated glucosinolates with 4-, 5- and 6-carbon side-chains. The two closely related species E. odoratum (ODO) and E. witmannii (WIT) predominantly accumulated indole glucosinolates, while E. collinum (COL), E. pulchellum (PUL), and accession ER2 predominantly produced glucoerypestrin (3-methoxycarbonylpropyl glucosinolate), a glucosinolate that is exclusively found within Erysimum (Fahey et al., 2001).

Figure 7

Glucosinolate compound diversity and abundance across 48 *Erysimum* species.

(A) Chemogram clustering species by dissimilarities in glucosinolate profiles. Values at nodes are confidence estimates (approximately unbiased probability value, function *pvclust* in R) based on 10,000 iterations of multiscale bootstrap resampling. (B) Heatmap of glucosinolate profiles expressed by the 48 *Erysimum* species. Color intensity corresponds to log-transformed integrated ion counts recorded at the exact parental mass ([M-H]⁻) for each compound, averaged across samples from multiple independent experiments. Compounds are grouped by major biosynthetic classes and labelled using systematic short names. See Supplementary file 6 for full glucosinolate names and additional compound information. (C) Classification of species chemotype based on predominant glucosinolate compounds. 3C/4C/5C = length of carbon side chain, MSI = methylsulfinyl glucosinolate, MSO = methylsulfonyl glucosinolate, OH = side chain with hydroxy group, ALK = side chain with alkenyl group, CARB = carboxylic glucosinolate, IND = indole glucosinolate.

Similar to the lack of phylogenetic signal for compound numbers and concentrations, dissimilarity in glucosinolate profiles was unrelated to phylogenetic relatedness (Mantel test, p=0.331), and neither of the first two principal coordinates of the glucosinolate dissimilarity matrix showed a significant phylogenetic signal (Table 2). The lack of phylogenetic signal was visualized by optimizing vertical matching of tips between the ExaML species tree and the glucosinolate chemogram (Figure 5—figure supplement 3A). For five species pairs, the closest phylogenetic neighbor was also the most chemically similar species, but in general, close relatives more often belonged to chemically distant species clusters. Finally, reconstruction of the ancestral states for total glucosinolate content and the first principal coordinate of the glucosinolate dissimilarity matrix suggests that both traits likely originated at intermediate levels and repeatedly evolved towards opposite extremes in closely related species (Figure 5—figure supplement 4).

As glucosinolates require activation by myrosinase enzymes upon tissue damage by herbivores, myrosinase activity in leaf tissue determines the rate at which toxins are released. We quantified myrosinase activity of Erysimum leaf extracts and found it to be highly variable among species (Figure 6C). After grouping species into nine chemotypes defined by chemical dissimilarity and the production of characteristic glucosinolate compounds (Figure 7C), we found that myrosinase activity significantly differed among these chemotypes (Figure 8, F_8,33 = 8.31, p<0.001). Chemotypes that predominantly accumulated methylsulfonyl glucosinolates, hydroxy glucosinolates, or indole glucosinolates had low to negligible activity against the assayed glucosinolate sinigrin. It is important to note that sinigrin is an alkenyl glucosinolate and Erysimum myrosinases targeting other, structurally dissimilar glucosinolates may not effectively cleave sinigrin. After chemotype differences were accounted for, myrosinase activity was marginally related to total glucosinolate concentrations (F_1,33 = 3.60, p=0.067). Similar to other glucosinolate traits, uncorrected myrosinase activity exhibited no phylogenetic signal (Table 2).

Figure 8

Myrosinase activity of leaf extracts from 43 *Erysimum* species, grouped by glucosinolate chemotype.

Open circles are species means and black diamonds are chemotype means ± 1 SE. See also Figure 7 for chemotype information. 3C/4C/5C = length of carbon side chain, MSI = methylsulfinyl glucosinolate, MSO = methylsulfonyl glucosinolate, OH = side chain with hydroxy group, ALK = side chain with alkenyl group, CARB = carboxylic glucosinolate, IND = indole glucosinolate.

Figure 8—source data 1. Species means for myrosinase activity and glucosinolate chemotype classification: https://cdn.elifesciences.org/articles/51712/elife-51712-fig8-data1-v2.txt

Cardenolide diversity

With the exception of E. collinum (COL), which only contained trace amounts of cardenolides in leaves, all Erysimum species contained diverse mixtures of cardenolide compounds and accumulated considerable amounts of cardenolides (Figure 6D–E). The ploidy level of species again explained a significant fraction of the total variation in the number of cardenolides (F_4,38 = 3.47, p=0.016), with hexaploid species producing the highest average number of compounds (Figure 6—figure supplement 1). To obtain an estimate of biological activity and evaluate quantification from total MS ion counts, we used an established assay that quantifies cardenolide concentrations from specific inhibition of animal Na⁺/K⁺-ATPase by crude Erysimum leaf extracts. We found generally strong enzymatic inhibition, with leaves of Erysimum species on average containing an equivalent of 5.72 ± 0.12 µg mg⁻¹ (±1 SE) of the reference cardenolide ouabain. Despite only producing trace amounts of cardenolides, E. collinum (COL) extracts caused significantly stronger inhibition than the Brassicaceae control plant, S. arvensis (Figure 6—figure supplement 2). Overall, quantification of cardenolide concentrations by Na⁺/K⁺-ATPase inhibition was highly correlated with the total MS ion count (Figure 6—figure supplement 2, r = 0.95, p<0.001). Ion count data were therefore appropriate for cross-species comparisons. Both the total numbers of compounds and the total abundances exhibited a strong phylogenetic signal (Table 2), indicating that closely related species shared similar cardenolide traits.

Cardenolide diversity was considerably higher than that of glucosinolates, with a total of 97 distinguishable candidate cardenolide compounds identified across the 48 Erysimum species (two compounds were later excluded, leaving 95 compounds; Supplementary file 7). Of these, 46 compounds had distinct molecular masses and mass fragments, while the remaining compounds likely were isomers, sharing a molecular mass with other compounds but having distinct HPLC retention times. The 95 putative cardenolides comprised nine distinct genins (Figure 9, Figure 9—figure supplement 1), the majority of which were glycosylated with digitoxose, deoxy hexoses, xylose, or glucose moieties. Only digitoxigenin and cannogenol accumulated as free genins, while all other compounds occurred as either mono- or di-glycosides. A likely major source of isomeric cardenolide compounds was thus the incorporation of different deoxy hexoses of equivalent mass, such as rhamnose, fucose, or gulomethylose. A subset of compounds had molecular masses that were heavier by 42.011 m/z than known mono- or di-glycoside cardenolides. Such a gain in mass corresponds to the gain of an acetyl-group, and mass fragmentation patterns indicated that these compounds were acetylated on the first sugar moiety (Supplementary file 7). Out of the nine detected genins, six had previously been described from Erysimum species (Makarevich et al., 1994). In addition, we identified three previously undescribed mass features with fragmentation patterns characteristic of cardenolide genins (Figure 9—figure supplement 1). Of these three, one matched an acetylated cannogenol, one matched formylated cannogenol, and one matched formylated nigrescigenin, assuming acetylation/formylation of a free OH-group on the precursor molecule. Formyl adducts can sometimes be formed during LC-MS due to the addition of formic acid to solvents, although this is less common with positive ionization. 
To exclude the possibility that these were technical artefacts, we analyzed a subset of extracts by LC-MS without the addition of formic acid and found both formyl-genins at comparable concentrations (Figure 9—figure supplement 2). We therefore assume that all three novel structures are natural variants of cardenolides produced by Erysimum plants, even though we currently lack final structural elucidation.
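
The mass differences that underlie these assignments can be checked with simple arithmetic. The sketch below computes the expected monoisotopic mass gain for acetylation and formylation of a free OH group, assuming singly charged ions so that mass differences equal m/z differences. The helper names and the matching tolerance are hypothetical choices for illustration and do not describe the annotation procedure used in this study.

```python
# Monoisotopic masses (in unified atomic mass units) of the most abundant isotopes.
MONOISOTOPIC_MASS = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}


def monoisotopic_mass(composition):
    """Sum monoisotopic masses for a composition such as {"C": 2, "H": 2, "O": 1}."""
    return sum(MONOISOTOPIC_MASS[element] * count
               for element, count in composition.items())


# Net elemental gain when a free OH group is esterified:
#   acetylation: R-OH -> R-O-CO-CH3, net gain C2H2O
#   formylation: R-OH -> R-O-CHO,    net gain CO
ACETYL_SHIFT = monoisotopic_mass({"C": 2, "H": 2, "O": 1})
FORMYL_SHIFT = monoisotopic_mass({"C": 1, "O": 1})


def classify_shift(delta_mz, tolerance=0.005):
    """Label a mass difference between two singly charged ions (hypothetical helper)."""
    if abs(delta_mz - ACETYL_SHIFT) <= tolerance:
        return "acetyl"
    if abs(delta_mz - FORMYL_SHIFT) <= tolerance:
        return "formyl"
    return None


print(f"acetyl shift: {ACETYL_SHIFT:.3f}")
print(f"formyl shift: {FORMYL_SHIFT:.3f}")
print(classify_shift(42.011))
```

The computed acetyl shift agrees with the 42.011 m/z difference reported above. A mass match alone cannot establish where on the molecule the group is attached, which is why the fragmentation patterns are needed as additional evidence.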

Figure 9

Predicted pathways of cardenolide genin modification in *Erysimum*.

Pathways are linearized for simplicity but more likely form a complex network. Genin diversity likely originates from digitoxigenin, which is transformed into more structurally complex cardenolides by hydroxylases, dehydrogenases, and formyl-, methyl-, or acetyltransferases. Acetyl-cannogenol could be derived directly from cannogenol or from formyl-cannogenol, with the frequent co-occurrence of acetyl-cannogenol and formyl-cannogenol in leaf extracts suggesting the latter. According to their exact mass, frequently detected dihydroxy-digitoxigenin compounds (C₂₃H₃₄O₆) could be either bipindogenin or strophanthidol (grayed out). While bipindogenin cardenolides have commonly been reported for *Erysimum* species in the literature, their structure would require additional intermediates that have not been detected (n.d.). Thus, strophanthidol appears to be the more likely isomer to occur in *Erysimum*. All cardenolide genins are further modified by glycosylation at a conserved position in the molecule (R). Note that all structures are putative, and particularly formyl- and acetyl-modifications could be attached to any free OH-group.

Clustering of species by dissimilarities in cardenolide profiles revealed fewer obvious species clusters in the chemogram than for glucosinolates, and particularly higher-level species clusters had only weak statistical support (Figure 10). A clear exception to this was a species cluster that included E. cheiranthoides (ECE) and E. sylvestre (SYL), which were characterized by a chemotype lacking several otherwise common cannogenol- and strophanthidin-glycosides, while accumulating unique digitoxigenin-glycosides. A second major cluster was visually apparent, yet not statistically significant, and separated groups of species that did or did not produce glycosides of the newly discovered putative formyl-nigrescigenin (Figure 10). Similarity in cardenolide profiles among species was strongly correlated with phylogenetic relatedness (Mantel test, p<0.001), and the first two principal coordinates of the Bray-Curtis dissimilarity matrix exhibited strong phylogenetic signals (Table 2). Closely related species were therefore not only more similar in their total cardenolide concentrations, but also had more similar cardenolide profiles than expected by chance. These results were again visualized by optimizing vertical matching of tips between the ExaML species tree and the cardenolide chemogram (Figure 5—figure supplement 3B). For twelve species pairs (half of all species in our phylogeny), the closest phylogenetic neighbor was also the most chemically similar species, and phylogenetically related species more commonly belonged to chemically similar species clusters, indicated by a significantly lower total length of tip links compared to what was observed for glucosinolates (Figure 5—figure supplement 3B). 
Reconstruction of the ancestral states for total cardenolide content suggests that trait values likely originated at low total concentrations, and increased to intermediate levels in the North American and Spanish/Moroccan clades, and independently to very high levels in the species pair of E. horizontale and E. crepidifolium (Figure 5—figure supplement 4). For the first principal coordinate of cardenolide dissimilarity, trait values originated at intermediate values, but sub-clades more commonly evolved towards shared chemical profiles than was the case for glucosinolates (Figure 5—figure supplement 4).
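
As a reminder of what the Bray-Curtis dissimilarity measures, the sketch below computes it for a few toy abundance profiles. The species labels and abundance values are hypothetical, and whether raw or transformed ion counts enter the calculation is an assumption left open here; the code is not taken from the study's analysis pipeline.

```python
def bray_curtis(profile_a, profile_b):
    """Bray-Curtis dissimilarity between two non-negative abundance profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same length")
    total = sum(profile_a) + sum(profile_b)
    if total == 0:
        return 0.0  # convention chosen here for two empty profiles (an assumption)
    return sum(abs(a - b) for a, b in zip(profile_a, profile_b)) / total


def dissimilarity_matrix(profiles):
    """Pairwise Bray-Curtis dissimilarities as a nested dict keyed by species name."""
    names = list(profiles)
    return {a: {b: bray_curtis(profiles[a], profiles[b]) for b in names}
            for a in names}


# Hypothetical abundances of four compounds in three species.
toy_profiles = {
    "species_1": [10.0, 5.0, 0.0, 1.0],
    "species_2": [8.0, 6.0, 0.0, 2.0],
    "species_3": [0.0, 0.0, 12.0, 4.0],
}
matrix = dissimilarity_matrix(toy_profiles)
for name, row in matrix.items():
    print(name, {other: round(value, 3) for other, value in row.items()})
```

Values range from 0 for identical profiles to 1 for profiles with no shared compounds, so species that produce the same compounds in similar amounts end up close together in a chemogram built from such a matrix.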

Figure 10

Cardenolide compound diversity and abundance across 48 *Erysimum* species.

(A) Chemogram clustering species by dissimilarities in cardenolide profiles. Values at nodes are confidence estimates (approximately unbiased probability value, function *pvclust* in R) based on 10,000 iterations of multiscale bootstrap resampling. (B) Heatmap of cardenolide profiles expressed by the 48 *Erysimum* species. Color intensity corresponds to log-transformed integrated ion counts recorded at the exact parental mass ([M+H]⁺ or [M+Na]⁺, whichever was more abundant) for each compound, averaged across samples from multiple independent experiments. The species *E. collinum* (COL) only expressed trace amounts of cardenolides, which are not visible on the color scale. Compounds are grouped by shared genin structures. Cgi. = Cannogenin, For-can.=Formyl cannogenol, Ac-can.=Acetyl cannogenol. See Supplementary file 7 for additional compound information.

Macroevolutionary patterns in defense and inducibility

Similarity in glucosinolate and cardenolide chemical profiles of the 48 species was not correlated (Mantel test, p=0.171), and neither the number of compounds (PGLS: F_1,46 = 0.09, p=0.771) nor their total concentrations (PGLS: F_1,46 = 0.51, p=0.478) were correlated between compound classes. Tip-specific estimates of speciation rates were not correlated with the number of glucosinolate compounds produced by a species, regardless of speciation rate metric or statistical method used (Table 3). In contrast, we found a significantly positive correlation between the node density (ND) measure and the number of cardenolide compounds, while for the alternate equal split (ES) measure the correlation was marginally significant for the simulation-based method only (Table 3). Given the correlation coefficients of the simulation-based method, variation in the number of cardenolide compounds thus explained 17–28% of the total variation in speciation rate. Variation in total glucosinolate or cardenolide concentrations was not correlated with speciation rates (Table 3).
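
The percentage of variation explained appears to follow from squaring the simulation-based correlation coefficients reported in Table 3 for the number of cardenolide compounds. The short sketch below reproduces that arithmetic from the rounded table values; the variable and function names are illustrative. Squaring the rounded coefficients gives roughly 0.18 and 0.28, so the small difference from the quoted lower bound of 17% presumably reflects rounding of the reported coefficients.

```python
# Pearson correlation coefficients from the simulation-based (SIM) columns of
# Table 3 for the number of cardenolide compounds (rounded values as reported).
r_node_density = 0.53
r_equal_split = 0.42


def variance_explained(r):
    """Coefficient of determination for a simple correlation: r squared."""
    return r ** 2


for label, r in [("ND", r_node_density), ("ES", r_equal_split)]:
    print(f"{label}: r = {r:.2f}, r^2 = {variance_explained(r):.2f}")
```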

Table 3

Correlations between plant traits and tip-specific speciation rates estimated from the main ExaML species tree.

Each trait is correlated against two estimates of speciation rates using phylogenetic generalized least squares (PGLS) and a simulation-based method (SIM). Node density (ND) and equal split (ES) estimates of speciation rates are strongly correlated (r = 0.767, p<0.001) but differ in relative weighting of recent and more distant evolutionary history. For 1000 sets of randomly generated traits, only three resulted in more than one significant correlation, suggesting that multiple significant tests per trait (bold) are unlikely to arise by chance.

Plant trait	ND-PGLS	ND-SIM	ES-PGLS	ES-SIM
Total glucosinolate concentrations	F_1,46 = 0.79, p=0.379	r_Pearson = 0.29, p=0.300	F_1,46 = 0.00, p=0.990	r_Pearson = 0.09, p=0.737
Number of glucosinolate compounds	F_1,46 = 1.16, p=0.286	r_Pearson = 0.23, p=0.412	F_1,46 = 0.87, p=0.356	r_Pearson = 0.14, p=0.603
Total cardenolide concentrations	F_1,46 = 0.29, p=0.593	r_Pearson = 0.25, p=0.373	F_1,46 = 0.01, p=0.908	r_Pearson = 0.23, p=0.398
Number of cardenolide compounds	F_1,46 = 5.87, p=0.019	r_Pearson = 0.53, p=0.030	F_1,46 = 1.93, p=0.17	r_Pearson = 0.42, p=0.093

Foliar application of JA was expected to stimulate accumulation of defensive compounds in plant leaves. Among the 30 tested species, glucosinolate levels responded positively to JA, with the majority of species increasing their foliar glucosinolate concentration (Figure 11). However, the glucosinolate inducibility of a species was independent of constitutive glucosinolate levels (PGLS: F_1,28 = 0.17, p=0.680). By contrast, the majority of species exhibited lower cardenolide levels in response to JA, resulting in a lack of inducibility across species (Figure 11). The species E. crepidifolium (CRE) heavily influenced inducibility patterns, as it not only had three times higher constitutive concentrations of cardenolides than any other Erysimum species, but also markedly increased both glucosinolate and cardenolide concentrations in response to JA treatment (Figure 11). When this outlier species was removed, inducibility (or suppression) of foliar cardenolides was not correlated with constitutive cardenolide levels (PGLS: F_1,27 = 0.20, p=0.657), and inducibilities of glucosinolates and cardenolides were likewise not correlated with each other (PGLS: F_1,27 = 0.36, p=0.551).

Figure 11

Inducibility of foliar glucosinolates and cardenolides in response to exogenous application of jasmonic acid (JA), expressed as absolute differences in total mass intensity between JA-treated and control plants.

Circles are species means, based on single pooled samples of multiple individual plants. The filled triangle is the average inducibility of all measured species with 95% confidence interval. Non-overlap with zero (dashed lines) corresponds to a significant effect. The species in the upper right corner is *E. crepidifolium*, an outlier and strong inducer of both glucosinolates and cardenolides.

Figure 11—source data 1. Species means for absolute inducibility of total glucosinolate and cardenolide concentrations: https://cdn.elifesciences.org/articles/51712/elife-51712-fig11-data1-v2.txt

Discussion

The genus Erysimum is a fascinating model system of phytochemical diversification that combines two potent classes of chemical defenses in the same plants. The assembled genome of the rapid-cycling annual plant E. cheiranthoides allowed us to identify almost the full set of genes involved in E. cheiranthoides glucosinolate biosynthesis, myrosinase expression, and breakdown product modification. This genome (GenBank project ID PRJNA563696, www.erysimum.org) will facilitate further identification of glucosinolate genes unique to Erysimum and represents a central resource for the identification of cardenolide biosynthesis genes in this emerging model system, as well as for future functional and evolutionary studies in the Brassicaceae.

The extant species diversity in the genus Erysimum is the result of a rapid radiation (Moazzeni et al., 2014), with our own estimate supporting an evolutionarily recent onset of radiation within the last 2–4 million years. In fact, as our approach for phylogeny construction did not account for heterozygosity levels of species, coalescence times are included in our divergence time estimates (Edwards and Beerli, 2000), thereby likely inflating age estimates for most speciation events. This may explain why we found no evidence for very recent speciation events (most recent event estimated at 1.24 Mya), while others have estimated significantly younger ages for at least some of the same events (Moazzeni et al., 2014, F. Perfectti, unpublished data).

All but one species in our study produced evolutionarily novel cardenolides, while the likely closest relatives – the genera Malcolmia, Physaria or Arabidopsis (there is some disagreement among studies; Moazzeni et al., 2014; Huang et al., 2016) – almost certainly lack these defenses (Jaretzky and Wilcke, 1932; Hegnauer, 1964). The onset of diversification in Erysimum thus appears to coincide with the gain of the cardenolide defense trait, while the number of cardenolide compounds produced was positively correlated with tip-specific speciation rates of the main species tree, providing at least weak evidence that speciation and cardenolide diversification are linked in this genus. Even though most species co-expressed two different classes of potentially costly defenses, there was no evidence for a trade-off between glucosinolates and cardenolides, and both groups of traits varied independently.

Potentially costly, obsolete defenses are expected to be selected against and should disappear over evolutionary time. For example, cardenolides in the genus Asclepias and alkaloids across the Apocynaceae decrease in concentration with speciation, consistent with co-evolutionary de-escalation in response to specialized, sequestering herbivores (Agrawal and Fishbein, 2008; Livshultz et al., 2018). However, it appears that both glucosinolates and cardenolides provide a defensive function in Erysimum: glucosinolates may be maintained as highly efficient defenses against generalist herbivores (Kerwin et al., 2015), whereas cardenolides may be functionally relevant against glucosinolate-specialized herbivores (Chew, 1975; Chew, 1977; Wiklund et al., 1978; Renwick et al., 1989; Dimock et al., 1991).

As further evidence for the distinct roles of glucosinolates and cardenolides, the two defenses responded differently to exogenous JA application. Glucosinolate concentrations were upregulated in response to JA in the majority of species, with an average 52% increase relative to untreated controls. This is similar to the inducibility of glucosinolates reported for other Brassicaceae species (Textor and Gershenzon, 2009), suggesting that glucosinolate defense signaling remains unaffected by the presence of cardenolides in Erysimum plants. In contrast, cardenolide levels were not inducible or were even suppressed in response to exogenous application of JA in almost all tested species, suggesting that inducibility of cardenolides is not a general strategy of Erysimum. In the more commonly studied milkweeds (Asclepias spp., Apocynaceae), cardenolides are usually inducible in response to herbivore stimuli (Rasmann et al., 2009; Bingham and Agrawal, 2010), but cardenolide suppression is also common, particularly in plants with high constitutive cardenolide concentrations (Bingham and Agrawal, 2010; Rasmann and Agrawal, 2011). Milkweed plants are attacked by a rich community of cardenolide-specialized herbivores (Dobler et al., 2012), likely making this defensive plasticity adaptive for these plants. The lack of cardenolide inducibility in Erysimum could therefore indicate a lack of widespread cardenolide-specialized herbivores that might otherwise select against high constitutive cardenolide levels.

Phylogenetic relationships and phytochemical similarity

The genus Erysimum poses considerable phylogenetic challenges, with its evolutionarily recent radiation resulting in a large number of hybridizing species and high prevalence of polyploidization (Polatschek and Snogerup, 2002; Marhold and Lihová, 2006; Al-Shehbaz, 2010). Both previous partial phylogenies of the genus, constructed from internal transcribed spacer (ITS) or chloroplast sequences, consequently struggled to resolve polytomies among species (Gómez et al., 2014; Moazzeni et al., 2014).

Here, we attempted to construct a better-resolved phylogeny from 9868 orthologous genes extracted from transcriptome sequences. However, while our species tree provided good posterior probabilities for all nodes, it also revealed very high levels of gene discordance. Several internal nodes of our species tree topology were supported by less than 1% of all gene trees, and only the most recent branching events were supported by more than 10% of gene trees. High levels of discordance, likely driven by introgression and incomplete lineage sorting, are common during ongoing species radiations, with many recent plant examples reporting similar findings (Novikova et al., 2016; Pease et al., 2016; Copetti et al., 2017; Wu et al., 2018). The abundance of polyploid species in our phylogeny (at least 21 out of 48, Figure 5) may have further exacerbated levels of discordance. Specifically, if these are allopolyploid rather than autopolyploid species, discordance could be introduced by our methodological approach for gene selection, which randomly retained only a single copy for each identified orthologous gene with multiple copies. This same problem could also have inflated the estimation of hybridization by our HyDe analysis, although a high rate of gene flow is likely, at least among geographically close species (Abdelaziz et al., 2014). More fundamentally, the high levels of gene discordance also highlight the limitations of simple bifurcating species trees to represent the significantly more complicated network of splits and reticulate evolutionary events that is likely the true history of the genus Erysimum (Marhold and Lihová, 2006). However, while we have been unsuccessful in reconstructing the exact evolutionary history of the genus, we nevertheless believe our species tree captures key aspects of its phylogenetic relationships, particularly with respect to closely related species and geographic clades.

In our species tree, the three Mediterranean annual species E. repandum, E. incanum, and E. wilczekianum grouped together as a well-supported monophyletic clade sister to all other Erysimum species. These species co-occur geographically with several perennial Erysimum species, but they are largely isolated by non-overlapping flowering times. Further separate clades were present for species from Iran, North America, and a combined clade of species from Spain, Morocco, and the Canary Islands, while the remaining species grouped into four central and eastern European clades. Within the Spanish clade, species from southeastern Spain exhibited closer relatedness to Moroccan species than to species from northeastern or northwestern Spain, loosely matching more fine-scale evaluations of species relatedness in this region (Abdelaziz et al., 2014). Therefore, even though none of these clades were supported by high gene concordance factors, the main topology of our species tree captured an apparently meaningful pattern of closely related species commonly occurring in close geographic proximity.

Clustering of species by dissimilarities in glucosinolate profiles revealed distinct groups of chemically similar species, largely corresponding to nine chemotypes determined by the predicted function of a few major-effect glucosinolate genes. However, we found no phylogenetic signal for chemical similarity, compound number, or total concentrations of glucosinolates (Blomberg’s K < 1 for all traits). Closely related species thus appear to be less likely to share similar glucosinolate chemotypes. In a comparative study of 30 Streptanthus species, Cacho et al. (2015) reported similarly low values for Blomberg’s K for most glucosinolate traits, suggesting that this may be a general pattern in glucosinolate diversity.

In contrast, clustering of species by dissimilarities in cardenolide profiles revealed fewer distinct groups of chemically similar species, suggesting that the considerably more complex cardenolide chemotypes may be controlled by many minor-effect genes. However, we found strong phylogenetic signals for chemical similarity, compound numbers, and total concentrations of cardenolides (Blomberg’s K > 1). Closely related species were therefore more likely to share similar cardenolide chemotypes. Given the high levels of gene discordance in our phylogeny, it seems probable that at least the glucosinolate results may have been affected by hemiplasy, i.e., a phenomenon where a trait is determined by genes whose topologies do not match the species tree (Avise and Robinson, 2008; Pease et al., 2016; Wu et al., 2018). Hemiplasy may result in the overestimation of a trait’s evolutionary rate (Mendes et al., 2018), and results must be evaluated with caution. However, even though trait evolution within Erysimum almost certainly has been significantly more complex than can be represented by a simple bifurcating species tree (Novikova et al., 2016; Pease et al., 2016; Wu et al., 2018), we believe our results remain robust in three key points. First, geographically close Erysimum species tend to be more closely related, likely because geographic proximity may facilitate gene flow. Indeed, geographic signatures were present in both previously published phylogenies (Gómez et al., 2014; Moazzeni et al., 2014) as well as in our tree. Second, geographically close, related species share similar cardenolide but not glucosinolate phenotypes. Third, these distinct patterns in chemical classes indicate distinct evolutionary mechanisms for glucosinolate and cardenolide defenses, with a small number of major-effect glucosinolate genes evolving independently of a putatively larger set of minor-effect cardenolide genes.

Phytochemical diversity

Despite vast morphological differences among sampled Erysimum species, the diversity in glucosinolate profiles across species was relatively limited, with a total of 25 different glucosinolate compounds detected. Even though this diversity may be further amplified by enzymes that direct glucosinolates into different toxic products upon activation (not measured here; Lambrix et al., 2001; Burow et al., 2006; Zhang et al., 2006), intraspecific glucosinolate diversity of Arabidopsis alone is significantly higher, with more than 30 compounds reported across a range of accessions (Kliebenstein et al., 2001a). In contrast, an evaluation of 30 Streptanthus species revealed a similarly low total number of 35 glucosinolate compounds (Cacho et al., 2015). The intraspecific diversity of Arabidopsis may therefore not be typical for all plants of the Brassicaceae. Additional broadly comparative studies of glucosinolate diversity in other Brassicaceae species are needed to provide a reliable ‘baseline’ for glucosinolate diversity. Importantly, we intentionally ignored intraspecific chemical diversity as we lacked genetically diverse seed material for most species. However, a preliminary screening of multiple E. cheiranthoides accessions suggests that there is little to no variation in glucosinolate profiles within this one species, while there is considerable variation in cardenolide profiles (T. Züst, unpublished data).

The majority of Erysimum species produced glucoiberin as their main glucosinolate. Aliphatic glucosinolates such as glucoiberin are derived from methionine in a process that involves elongation and modification of a variable side-chain (Halkier and Gershenzon, 2006), and in this context the 3-carbon glucosinolate glucoiberin is one of the least biosynthetically complex glucosinolates. However, the potential to produce additional aliphatic glucosinolates with longer side chains clearly exists in the genus, as 4-, 5-, and 6-carbon glucosinolates with more complex modifications were scattered across the phylogeny. A few species produced glucosinolates that are not found in Arabidopsis, including a sub-class of aliphatic glucosinolates, the methylsulfonyl glucosinolates. The homolog of GS-OH, which in Arabidopsis forms 2-hydroxy-but-3-enyl glucosinolate from 3-butenyl glucosinolate, does not have a clear function in E. cheiranthoides due to the lack of alkenyl glucosinolates. It is therefore possible that the GS-OH homolog in E. cheiranthoides may code for the unknown enzyme that hydroxylates 4-methylsulfonylbutyl glucosinolate to form 3-hydroxy-4-methylsulfonylbutyl glucosinolate (Figure 4, Figure 4—figure supplement 8). Methylsulfonyl glucosinolates are found in several Brassicaceae genera (Fahey et al., 2001), and glucocheirolin, the most abundant methylsulfonyl glucosinolate in Erysimum species, is only a weak egg-laying stimulant for the cabbage white butterfly (Pieris rapae), compared to other glucosinolates (Huang et al., 1993). Methylsulfonyl glucosinolates may thus represent a plant response to specialist herbivores that use plant defenses as host-finding cues.

The species E. pulchellum (PUL) and E. collinum (COL) from Turkey and Iran, respectively, accumulated glucoerypestrin as their main glucosinolate compound. This compound was first described in E. rupestre [syn. E. pulchellum, (Polatschek, 2011)] by Kjær et al. (1957) and to date has been found exclusively in plants of the genus Erysimum (Fahey et al., 2001). Radioactive labeling experiments indicated that glucoerypestrin is derived from a dicarboxylic amino acid, possibly 2-amino-5-methoxycarbonyl-pentanoic acid (Chisholm, 1973). Modification of the amino acid side chain during methionine-derived aliphatic glucosinolate biosynthesis as a pathway to glucoerypestrin is less likely, due to the lower specific incorporation of ¹⁴C-labeled methionine compared to ¹⁴C-labeled dicarboxylic acids into this compound (Chisholm, 1973). In any case, the gain of glucoerypestrin represents yet another evolutionary novelty in the Erysimum genus, but its relative toxicity and the adaptive benefits of its production have yet to be elucidated.

Myrosinase activity levels differed among glucosinolate chemotypes, and activity was positively correlated with glucosinolate abundance in plants when controlling for glucosinolate chemotype. Erysimum species that predominantly produced indole glucosinolates or 4-methylsulfinyl glucosinolates had negligible myrosinase activity against the assayed aliphatic glucosinolate sinigrin. Indole glucosinolates can be activated by PEN2 – a thioglucosidase that is more specific for indole glucosinolates (Bednarek et al., 2009; Clay et al., 2009) – or even break down in the absence of plant-derived myrosinase (Kim et al., 2008). The negligible activity in these species could therefore indicate the existence of selective pressures to tailor myrosinase expression to the type and concentrations of glucosinolates that are produced. Mirroring results for glucosinolate defenses, myrosinase activity was not more similar among related species, suggesting that the two components of glucosinolate defense evolve in concert.

We detected considerable amounts of the evolutionarily novel cardenolide defense in 47 out of 48 Erysimum species or accessions. Among the 95 likely cardenolide compounds detected, at least 22 had not been described previously in Erysimum. This metabolic diversity had three main sources: modification of the genin core structure, variation of the glycoside chain, or isomeric variation (e.g., through the incorporation of different isomeric sugars). Structural variation in cardenolides affects the relative inhibition of Na⁺/K⁺-ATPase (Dzimiri et al., 1987; Petschenka et al., 2018) and physicochemical properties such as lipophilicity, which play an important role in uptake and metabolism of plant metabolites by insects (Duffey, 1980). Individual Erysimum species produced between 15 and 50 different cardenolide compounds, and the comparison of quantification by total mass ion counts vs. quantification by inhibition of Na⁺/K⁺-ATPase revealed highly similar results. While both methods of quantification are only approximate, this correlation at least provides no obvious indication of vast differences in Na⁺/K⁺-ATPase inhibitory activity among Erysimum cardenolides.
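As an illustration of the comparison between the two quantification methods, the following minimal sketch computes a Pearson correlation between per-species total ion counts and inhibition-based ouabain equivalents. The five data points are hypothetical placeholders rather than measurements from this study, and the log₁₀ transformation is an assumption made for the example, not necessarily the procedure used in our analyses.

```python
import math


def pearson_r(x, y):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# HYPOTHETICAL values for five species (not data from the study):
# total cardenolide mass ion counts and ouabain equivalents (ug per mg dry weight)
ion_counts = [1.2e6, 3.5e6, 0.8e6, 9.9e6, 2.1e6]
ouabain_equivalents = [2.0, 5.5, 1.4, 17.0, 3.6]

# Assumption: both measures are compared on a log10 scale
r = pearson_r(
    [math.log10(v) for v in ion_counts],
    [math.log10(v) for v in ouabain_equivalents],
)
print(f"Pearson r = {r:.3f}")
```

A coefficient close to 1 in such a comparison would indicate that species rank similarly under both methods, which is the pattern described above; it would not by itself show that individual compounds have identical inhibitory activity.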

The metabolic pathways involved in the biosynthesis and modification of cardenolides have yet to be elucidated (Kreis and Müller-Uri, 2010; Züst et al., 2018). Here, we propose a pathway for the modification of digitoxigenin, commonly assumed to be the least biosynthetically complex cardenolide (Kreis and Müller-Uri, 2010), into the eight structurally more complex genins found within Erysimum (Figure 9). Variation in glycoside chains is likely mediated by glycosyltransferases that act on the different genins. In the Brassicaceae genus Barbarea, plants produce saponin glycosides as an evolutionarily novel defense, and a significant proportion of glycoside diversity in this system has been linked to the action of a small set of UDP glycosyltransferases (Erthmann et al., 2018). Similarly, through the joint action of genin-modifying enzymes and glycosyltransferases, a relatively small set of enzymes and corresponding genes could generate the vast cardenolide diversity found in the Erysimum genus. The identification and manipulation of these genes in different Erysimum species will make it possible to test the adaptive benefits of this structural diversity.

On average, leaves of Erysimum species contained cardenolides equivalent to 6 µg ouabain per mg dry leaf weight (estimated from Na⁺/K⁺-ATPase inhibition), placing them slightly above most species of the well-studied cardenolide-producing genus Asclepias (Rasmann and Agrawal, 2011). However, two species, E. collinum (COL) and E. crepidifolium (CRE), were clear outliers in terms of cardenolide content (Figure 6D–E). The almost complete absence of cardenolides in E. collinum (COL), which clustered phylogenetically with two other Middle Eastern species producing average concentrations of these compounds (E. crassipes [CSS] and E. crassicaule [CRA], Figure 5), likely represents a secondary loss of this trait in the course of evolution. This species also accumulated an evolutionarily novel glucosinolate, glucoerypestrin (see above), which may have resulted in a shift in selective pressures that led to the loss of potentially costly cardenolide production. Conversely, E. crepidifolium (CRE) had cardenolide concentrations more than three times higher than any other tested Erysimum species (Figure 6E). This is consistent with the highly toxic nature of this species, which has the German vernacular name ‘Gänsesterbe’ (geese death) and has been associated with mortality in geese that consume the plant.

Whereas most species did not induce cardenolide accumulation in response to JA, E. crepidifolium (CRE) showed a significant 48% increase. While not as extreme, this observation is similar to the results of Munkert et al. (2014), who reported a three-fold increase in cardenolide levels of E. crepidifolium in response to a high dose of methyl jasmonate. Plants use conserved transcriptional networks to continuously integrate signals from their environment and optimize allocation of resources to growth and defense (Havko et al., 2016). While these networks commonly govern hardwired responses (e.g., an attenuation of growth upon activation of JA signaling), they may nevertheless be altered by mutations at key nodes of the network (Campos et al., 2016). Given this relative flexibility in signaling networks, it is perhaps not surprising that the evolutionarily novel cardenolides have been integrated into the defense signaling of Erysimum species to variable degrees. Investigating gene expression changes in the inducible E. crepidifolium relative to the non-inducing E. cheiranthoides may therefore provide valuable insights into the molecular regulation of this defense.
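To place the two induction estimates for E. crepidifolium on a common scale, the short sketch below converts between percent increase and fold change. It assumes that a "three-fold increase" means treated plants contained three times the control level; the function names are illustrative only.

```python
def percent_increase_to_fold(percent_increase: float) -> float:
    """Convert a percent increase over control into a fold change (treated / control)."""
    return 1.0 + percent_increase / 100.0


def fold_to_percent_increase(fold: float) -> float:
    """Convert a fold change (treated / control) into a percent increase over control."""
    return (fold - 1.0) * 100.0


# 48% increase after JA treatment, as reported in this study
fold_this_study = percent_increase_to_fold(48.0)

# Three-fold increase reported by Munkert et al. (2014);
# ASSUMPTION: treated level = 3 x control level
percent_munkert = fold_to_percent_increase(3.0)

print(f"This study: {fold_this_study:.2f}-fold")
print(f"Munkert et al. (2014): {percent_munkert:.0f}% increase")
```

Under this assumption, the induction reported here corresponds to roughly a 1.5-fold change, compared with a 200% increase in the earlier study.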

Conclusions

The study of the speciose genus Erysimum, which has two co-expressed chemical defense classes, revealed largely independent evolution of the ancestral and the novel defense. With no evidence for trade-offs between the structurally and biosynthetically unrelated defenses, the diversity, abundance, and inducibility of each class of defenses appear to be evolving independently in response to the unique selective environment of each individual species. The evolutionarily recent gain of novel cardenolides has resulted in a system in which no known specific adaptations to cardenolides have yet evolved de novo in insect herbivores, although general adaptations to toxic food may still allow herbivores to consume the plants. Erysimum is thus an excellent model system for phytochemical diversification, as it facilitates the study of coevolutionary adaptations in real time. Our current work provides the foundation for a more mechanistic evaluation of these processes, which promises to greatly improve our understanding of the role of phytochemical diversity in plant-insect interactions.

Geographic location of Erysimum species source populations used for transcriptome sequencing.

Assembly metrics for the E. cheiranthoides genome: v0.9 = Falcon + Arrow assembly results, v1.2 = genome assembly after Hi-C scaffolding and Pilon correction.

Visualization of the E. cheiranthoides genome assembly.

Identification of known indole glucosinolate biosynthetic genes and glucosinolate-modifying genes from Arabidopsis in Erysimum cheiranthoides.

Identification of known aliphatic glucosinolate biosynthetic genes and glucosinolate-modifying genes from Arabidopsis in Erysimum cheiranthoides.

Genome-guided concatenated phylogeny of 48 Erysimum species.

Mean defense traits of 48 Erysimum species, grouped by phylogenetic relatedness.

Figure 6—source data 1

Measure of phylogenetic signal for total defensive traits and principal coordinates of the cardenolide and glucosinolate dissimilarity matrices (PCO) using Blomberg’s K. Significant values (p<0.05) are highlighted in bold.

Glucosinolate compound diversity and abundance across 48 Erysimum species.

Myrosinase activity of leaf extracts from 43 Erysimum species, grouped by glucosinolate chemotype.

Figure 8—source data 1

Predicted pathways of cardenolide genin modification in Erysimum.

Cardenolide compound diversity and abundance across 48 Erysimum species.

Correlations between plant traits and tip-specific speciation rates estimated from the main ExaML species tree.

Inducibility of foliar glucosinolates and cardenolides in response to exogenous application of jasmonic acid (JA), expressed as absolute differences in total mass intensity between JA-treated and control plants.

Figure 11—source data 1

Author details

Tobias Züst

Susan R Strickler

Adrian F Powell

Makenzie E Mabry

Hong An

Mahdieh Mirzaei

Thomas York

Cynthia K Holland

Pavan Kumar

Matthias Erb

Georg Petschenka

José-María Gómez

Francisco Perfectti

Caroline Müller

J Chris Pires

Lukas A Mueller

Georg Jander
