Yerba mate (Ilex paraguariensis) genome provides new insights into convergent evolution of caffeine biosynthesis

  1. Federico A Vignale (corresponding author)
  2. Andrea Hernandez Garcia
  3. Carlos P Modenutti
  4. Ezequiel J Sosa
  5. Lucas A Defelipe
  6. Renato Oliveira
  7. Gisele L Nunes
  8. Raúl M Acevedo
  9. German F Burguener
  10. Sebastian M Rossi
  11. Pedro D Zapata
  12. Dardo A Marti
  13. Pedro Sansberro
  14. Guilherme Oliveira
  15. Emily M Catania
  16. Madeline N Smith
  17. Nicole M Dubs
  18. Satish Nair
  19. Todd J Barkman (corresponding author)
  20. Adrian G Turjanski (corresponding author)
  1. European Molecular Biology Laboratory - Hamburg Unit, Germany
  2. Department of Biochemistry, University of Illinois at Urbana-Champaign, United States
  3. IQUIBICEN-CONICET, Ciudad Universitaria, Pabellón 2, Argentina
  4. Departamento de Química Biológica, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Ciudad Universitaria, Pabellón 2, Argentina
  5. Instituto Tecnológico Vale, Brazil
  6. Laboratorio de Biotecnología Aplicada y Genómica Funcional, Instituto de Botánica del Nordeste (IBONE-CONICET), Facultad de Ciencias Agrarias, Universidad Nacional del Nordeste, Argentina
  7. Department of Plant Sciences, University of California, Davis, United States
  8. Instituto de Biotecnología de Misiones, Facultad de Ciencias Exactas, Químicas y Naturales, Universidad Nacional de Misiones (INBIOMIS-FCEQyN-UNaM), Argentina
  9. Instituto de Biología Subtropical, Universidad Nacional de Misiones (IBS-UNaM-CONICET), Argentina
  10. Department of Biological Sciences, Western Michigan University, United States
  11. Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, United States
  12. Center for Biophysics and Quantitative Biology, University of Illinois at Urbana Champaign, United States
8 figures, 9 tables and 1 additional file

Figures

Figure 1
Biosynthetic routes to caffeine within the xanthine alkaloid network.

CF, caffeine; PX, paraxanthine; TB, theobromine; TP, theophylline; 1X, 1-methylxanthine; 3X, 3-methylxanthine; 7X, 7-methylxanthine; XR, xanthosine; X, xanthine. Nitrogen atoms are coloured to match the arrows corresponding to the enzymes that methylate them. Adapted with permission from O’Donnell et al., 2021.

Figure 2 with 1 supplement
Yerba mate genome duplication history.

(A) Evolutionary scenario of the eudicot genomes of Lactuca sativa, Daucus carota, Ilex paraguariensis, Coffea canephora, and Vitis vinifera, from their ancestor pre-γ. The plastid genome phylogeny is represented with solid black lines, while the multiple nuclear genome phylogeny is represented with green dashed lines. Paleopolyploidizations are shown with coloured dots (duplications) and stars (triplications). Divergence time estimates for the lineages, as well as age estimates for the L. sativa and D. carota paleopolyploidizations were obtained from the literature (Iorizzo et al., 2016; Magallón et al., 2015; Reyes-Chin-Wo et al., 2017; Zhang et al., 2020b). Ma, million years ago. (B) Ks distributions with Gaussian mixture model and SiZer analyses of I. paraguariensis (blue), L. sativa (green), D. carota (yellow), C. canephora (red), and V. vinifera (purple) paralogues. SiZer maps below histograms identify significant peaks at corresponding Ks values. Blue represents significant increases in slope, red indicates significant decreases, purple represents no significant slope change, and grey indicates not enough data for the test. (C) Comparative genomic synteny analyses of I. paraguariensis with C. canephora and V. vinifera.

Figure 2—figure supplement 1
Ks distributions with Gaussian mixture model and SiZer analyses of I. paraguariensis and L. sativa (green), D. carota (yellow), C. canephora (red), and V. vinifera (purple) orthologues.

SiZer maps below histograms identify significant peaks at corresponding Ks values. Blue represents significant increases in slope, red indicates significant decreases, purple represents no significant slope change, and grey indicates not enough data for the test.

Figure 3
The yerba mate (YM) genome encodes three recently duplicated CS-type SABATH proteins that are expressed in caffeine-producing tissues.

(A) SABATH gene tree estimate (LnL = −34,265.473) shows the placement of full-length YM proteins (marked by blue-green dots) within clades that have published functions. GAMT, gibberellin MT; IAMT, indole-3-acetic acid MT; LAMT/FAMT, loganic/farnesoic acid MT; BAMT/BSMT, benzoic/salicylic acid MT; XMT, xanthine alkaloid MT used for caffeine biosynthesis in Coffea and Citrus; SAMT, salicylic acid MT; JMT, jasmonic acid MT; CS, caffeine synthase in Theobroma, Camellia, and Paullinia. Accession numbers for all sequences are provided in Figure 3—source data 1. (B) Gene expression analysis of IpCS1–5 in root (n = 3) and mature leaves (n = 2) as indicated by the relative abundance of YM transcriptome reads mapped to the IpCS1–5 transcripts. RPKM, reads per kilobase per million mapped reads. Error bars indicate standard deviation from the mean. Housekeeping gene: G3PD, glyceraldehyde-3-phosphate dehydrogenase. (C) Synteny-based analysis of the CS genomic region for I. paraguariensis, I. polyneura, and I. latifolia.
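
The RPKM normalisation named in panel B can be illustrated with a minimal sketch. The function name and the read counts in the usage lines are hypothetical and are not values from the study; the caption only states that expression is reported as reads per kilobase per million mapped reads.

```python
def rpkm(mapped_reads: int, transcript_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("transcript length and library size must be positive")
    kilobases = transcript_length_bp / 1_000        # transcript length in kb
    millions = total_mapped_reads / 1_000_000       # library size in millions of reads
    return mapped_reads / (kilobases * millions)


# Hypothetical input: 500 reads on a 2,000 bp transcript
# in a library of 10 million mapped reads.
example_rpkm = rpkm(500, 2_000, 10_000_000)
print(example_rpkm)
```

Dividing by transcript length and by library size makes values comparable across transcripts of different lengths and across libraries of different depths, which is what allows root and leaf samples to be plotted on one axis.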

Figure 3—source data 1

Accession numbers of SABATH sequences used for phylogenetic analysis in Figure 3.

https://cdn.elifesciences.org/articles/104759/elife-104759-fig3-data1-v1.docx
Figure 4 with 1 supplement
SABATH enzymes have evolved to catalyse the biosynthesis of caffeine in yerba mate.

(A) Relative enzyme activity of IpCS1 (n = 4), IpCS2 (n = 3), and IpCS3 (n = 3) SABATH enzymes with eight xanthine alkaloid substrates. (B) High-performance liquid chromatography (HPLC) traces showing products formed by three encoded caffeine synthase (CS)-type enzymes. Absorbance at 254 nm is shown. (C) Proposed biosynthetic pathway for caffeine in yerba mate. X, xanthine; XR, xanthosine; 1X, 1-methylxanthine; 3X, 3-methylxanthine; 7X, 7-methylxanthine; TP, theophylline; TB, theobromine; PX, paraxanthine. Coloured atoms and arrows indicate atoms that act as methyl acceptors for a given reaction.

Figure 4—figure supplement 1
Michaelis–Menten curves used to estimate kinetic parameters for (A) IpCS1, (B) IpCS2, and (C) IpCS3 (n = 2).

m1 is the estimate for Vmax and m2 is the estimate for KM. X, xanthine; 3X, 3-methylxanthine; TB, theobromine.
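
As a reading aid for the m1 and m2 labels, the Michaelis–Menten rate law can be sketched as below. The function name is hypothetical, and the Vmax of 1.0 in the usage line is an arbitrary illustrative value; only the KM of 85.05 μM is taken from Table 4 (IpCS1 with xanthine).

```python
def michaelis_menten(substrate_conc: float, m1: float, m2: float) -> float:
    """Initial rate v = m1 * [S] / (m2 + [S]), with m1 = Vmax and m2 = KM.

    substrate_conc and m2 must share the same concentration unit.
    """
    if substrate_conc < 0 or m2 <= 0:
        raise ValueError("substrate concentration must be >= 0 and KM must be > 0")
    return m1 * substrate_conc / (m2 + substrate_conc)


# At [S] = KM the rate is half of Vmax, whatever the value of Vmax.
half_max = michaelis_menten(85.05, m1=1.0, m2=85.05)
print(half_max)
```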

Figure 5 with 5 supplements
Ancestral sequence resurrection reveals ancestral xanthine alkaloid pathway flux.

(A) Simplified evolutionary history of three yerba mate (YM) xanthine alkaloid-methylating enzymes and their two ancestors, AncIpCS1 and AncIpCS2. Average site-specific posterior probabilities (PP) for each ancestral enzyme estimate are provided. Numbers below each branch of the phylogeny represent the number of amino acid replacements between each enzyme shown. The two ancestral relative activity charts (n = 4) show the averaged activities of two allelic variants of each enzyme. Relative substrate preference is also shown for the AncIpCS2 mutant enzyme (n = 3) in which five amino acid positions, A22G, R23C, T25S, H221N, and Y265C, that are inferred to have been replaced during the evolution of IpCS3, were changed. (B) Inferred pathway flux is shown for the antecedent pathways that could have been catalysed by the ancestral or modern-day combinations of enzymes that would have existed at three time points in the history of the enzyme lineage. Arrows linking metabolites are coloured according to the activities detected from each enzyme shown in panel A. Dotted arrows are shown for AncIpCS1’ because it is unknown what characteristics it would possess; it is assumed that it would have at least catalysed the formation of 3X from X since both its ancestor and descendant enzyme do so. X, xanthine; XR, xanthosine; 1X, 1-methylxanthine; 3X, 3-methylxanthine; 7X, 7-methylxanthine; TP, theophylline; TB, theobromine; PX, paraxanthine.

Figure 5—figure supplement 1
SABATH enzyme family phylogenetic tree used for obtaining ancestral sequence estimates for the clade including IpCS1–3 of Aquifoliales (log-likelihood = −46,631.672).

Clades of enzymes for which at least one sequence has been functionally characterized are labelled. GAMT, gibberellin MT; IAMT, indole-3-acetic acid MT; FAMT, farnesoic acid MT; BAMT, benzoic acid MT; XMT, xanthine alkaloid MT used for caffeine biosynthesis in Coffea and Citrus; SAMT, salicylic acid MT; JMT, jasmonic acid MT; CS, caffeine synthase in Theobroma, Camellia, and Paullinia. Within the CS clade, the orders of rosids and asterids are labelled to show interrelationships.

Figure 5—figure supplement 2
Caffeine synthase enzyme family phylogenetic tree used for obtaining alternative ancestral sequence estimates for AncIpCS1 and 2 (log-likelihood = −7032.8928).
Figure 5—figure supplement 3
Alignment of the two estimated amino acid sequences for AncIpCS1 that were biochemically characterized in Figure 5.
Figure 5—figure supplement 4
Alignment of the two estimated amino acid sequences for AncIpCS2 that were biochemically characterized in Figure 5.
Figure 5—figure supplement 5
High-performance liquid chromatography (HPLC) traces for xanthine alkaloid products formed by ancestral Ilex caffeine synthase (CS) enzymes.

(A) Chromatograms for AncIpCS1 assayed with three substrates. (B) Chromatograms for AncIpCS2 assayed with three substrates. (C) Chromatogram for authentic standards. X, xanthine; XR, xanthosine; 1X, 1-methylxanthine; 3X, 3-methylxanthine; 7X, 7-methylxanthine; TP, theophylline; TB, theobromine; PX, paraxanthine. Absorbance at 254 nm is shown.

Figure 6 with 4 supplements
Crystal structure of IpCS3 in complex with caffeine (CF) and S-adenosyl-homocysteine (SAH) and comparison with the active site of Coffea canephora DXMT.

(A) Overview of the crystal structure of IpCS3 (PDB ID: 8UZD) depicting the active site of the enzyme in complex with CF and SAH. (B) Relevant residues in IpCS3 for ligand recognition. (C) Relevant residues in CcDXMT (PDB ID: 2EFJ) for ligand recognition. Protein residues are displayed as lines with carbon atoms coloured in bluewhite while small molecules – CF, theobromine (TB), and SAH – are drawn as sticks. Colour code for the rest of the atoms: nitrogen (blue), oxygen (red), and sulphur (yellow). Hydrogen bond interactions are indicated as black dotted lines.

Figure 6—figure supplement 1
Crystal structure of IpCS3 displaying a difference Fourier map (Fo − Fc) contoured to 2.0 σ (blue) showing bound SAH and CF.

Relevant residues in IpCS3 for ligand recognition are displayed as lines with carbon atoms coloured in grey, while small molecules – caffeine (CF) and S-adenosyl-homocysteine (SAH) – are drawn as sticks and labelled. Colour code for the rest of the atoms: nitrogen (blue), oxygen (red), and sulphur (yellow).

Figure 6—figure supplement 2
Theobromine and caffeine are oriented the same way in the active site of IpCS3.

(A) IpCS3–CF complex (PDB ID: 8T2G). (B) IpCS3–TB complex (docking model). Protein residues are displayed as lines with carbon atoms coloured in bluewhite while small molecules – theobromine (TB), caffeine (CF), and S-adenosyl-homocysteine (SAH) – are drawn as sticks. Colour code for the rest of the atoms: nitrogen (blue), oxygen (red), and sulphur (yellow). Hydrogen bond interactions are indicated as dotted lines.

Figure 6—figure supplement 3
Comparative amino acid alignment of xanthine methyltransferase (XMT) and caffeine synthase (CS) sequences (1–209 of IpCS1) shows convergent changes predicted to participate in substrate binding and promote methylation preference switches.

Accession numbers are as follows: TcCS1 (EOY17874), PcCS1 (EC766748), IpCS1 (CAK9135737), CisXMT1 (KDO50937), TcCS2 (EOY17880), PcCS2 (EC778019), IpCS2 (CAK9135740), CsTCS2 (AB031281), CcXMT1 (JX978518), CaXMT1 (AB048793), CaXMT2 (AB084127), CsTCS1 (AB031280), TcBTS1 (AB096699), CcMXMT1 (JX978517), CaMXMT1 (AB048794), CaMXMT2 (AB084126), CisXMT2 (XP_006469448), PcCS (BK008796), CaDXMT1 (AB084125), CaDXMT2 (KJ577793), CcDXMT (JX978516), IpCS3 (CAK9135742), and CbSAMT (AAF00108).

Figure 6—figure supplement 4
Comparative amino acid alignment of xanthine methyltransferase (XMT) and caffeine synthase (CS) sequences (210–365 of IpCS1) shows convergent changes predicted to participate in substrate binding and promote methylation preference switches.

Accession numbers are as follows: TcCS1 (EOY17874), PcCS1 (EC766748), IpCS1 (CAK9135737), CisXMT1 (KDO50937), TcCS2 (EOY17880), PcCS2 (EC778019), IpCS2 (CAK9135740), CsTCS2 (AB031281), CcXMT1 (JX978518), CaXMT1 (AB048793), CaXMT2 (AB084127), CsTCS1 (AB031280), TcBTS1 (AB096699), CcMXMT1 (JX978517), CaMXMT1 (AB048794), CaMXMT2 (AB084126), CisXMT2 (XP_006469448), PcCS (BK008796), CaDXMT1 (AB084125), CaDXMT2 (KJ577793), CcDXMT (JX978516), IpCS3 (CAK9135742), and CbSAMT (AAF00108).

Figure 7 with 1 supplement
Docking models of xanthine alkaloids in IpCS1 and IpCS2 active sites.

(A) IpCS1–X complex. (B) IpCS2–3X complex. Protein residues are displayed as lines with carbon atoms coloured in bluewhite while small molecules – xanthine (X), 3-methylxanthine (3X), caffeine (CF), paraxanthine (PX), S-adenosyl-L-methionine (SAM), and S-adenosyl-homocysteine (SAH) – are drawn as sticks. Colour code for the rest of the atoms: nitrogen (blue), oxygen (red), and sulphur (yellow). Hydrogen bond interactions are indicated as black dotted lines.

Figure 7—figure supplement 1
AlphaFold2-ColabFold model quality assessment of IpCS1, IpCS2, and IpCS3 models.

Sequence coverage of the multiple sequence alignment used for IpCS1 (A), IpCS2 (B), and IpCS3 (C). Alignment error for IpCS1 (D), IpCS2 (E), and IpCS3 (F). pLDDT score of IpCS1 (G), IpCS2 (H), and IpCS3 (I).

Figure 8 with 1 supplement
Only CS genes were available for co-option and utilization for xanthine alkaloid biosynthesis in yerba mate whereas coffee only had xanthine methyltransferase (XMT) genes.

Both CS- and XMT-type caffeine biosynthetic enzymes were present in the ancestor of core eudicots, but numerous apparent losses of one, the other, or both have occurred during lineage diversification. Gene loss is represented by a vertical bar on the relevant branches of the cladogram.

Figure 8—figure supplement 1
Only CS genes are available for co-option and utilization for xanthine alkaloid biosynthesis in yerba mate.

(A) Synteny-based analysis of the CS genomic region for seven angiosperm taxa. (B–D) Synteny-based analyses of the XMT genomic regions for seven angiosperm taxa. Angiosperm taxa: GH, Gossypium hirsutum; TC, Theobroma cacao; CiS, Citrus sinensis; IP, Ilex paraguariensis; CC, Coffea canephora; AC, Actinidia chinensis; CS, Camellia sinensis. Genes: CS, caffeine synthase-type enzyme; EIF3F, eukaryotic translation initiation factor 3 subunit F; POT, proton‐dependent oligopeptide transporter; XMT, xanthine methyltransferase-type enzyme; RPIA, ribose 5-phosphate isomerase A; BAO, beta-amyrin 28-oxidase-like; UAE, UDP-arabinose 4-epimerase 1-like; IQM, IQ domain-containing protein; PTC52, protochlorophyllide-dependent translocon component 52; CBP, calcium-binding protein; MYB, MYB transcription factor; RBCMT, ribulose-1,5 bisphosphate carboxylase/oxygenase large subunit N-methyltransferase; NUDT2, nudix hydrolase 2-like; SRT1, NAD-dependent protein deacetylase; RIN4, RPM1 interacting protein 4; NSP5, nitrile specifier protein 5.

Tables

Table 1
Statistics of the genome sequencing data of yerba mate.
| Library | Number of reads | Read length | Total length | Coverage |
|---|---|---|---|---|
| Pair-end 350 bp #1 | 360,653,408 | 101 | 36.4 Gbp | 21.8× |
| Pair-end 350 bp #2 | 368,746,464 | 101 | 37.2 Gbp | 22.3× |
| Pair-end 550 bp | 356,261,246 | 101 | 36 Gbp | 21.5× |
| Mate-pair 3 kbp #1 | 415,398,586 | 101 | 30.3 Gbp | 18.2× |
| Mate-pair 3 kbp #2 | 410,588,934 | 101 | 30 Gbp | 17.9× |
| Mate-pair 3 kbp #3 | 343,059,350 | 101 | 25 Gbp | 15× |
| Mate-pair 8 kbp | 393,202,256 | 101 | 34.6 Gbp | 20.7× |
| Mate-pair 12 kbp | 415,478,776 | 101 | 33.7 Gbp | 20.1× |
| PacBio long reads | 19,514,627 | 50 bp to 61 kbp | 77.5 Gbp | 49.3× |
| Total | | | 341 Gbp | 207.8× |
Table 2
Statistics of the genome assembly of yerba mate.
| Metric | Value |
|---|---|
| # scaffolds (≥1000 bp) | 10,611 |
| # scaffolds (≥5000 bp) | 9343 |
| # scaffolds (≥10,000 bp) | 8951 |
| # scaffolds (≥25,000 bp) | 5944 |
| # scaffolds (≥50,000 bp) | 2595 |
| Total length (≥50,000 bp) | 887,124,725 |
| # scaffolds | 10,611 |
| Largest scaffold | 7,402,063 |
| Total length | 1,064,802,823 |
| GC (%) | 36.33 |
| N50 | 510,878 |
| N75 | 132,523 |
| L50 | 506 |
| L75 | 1461 |
| # N’s per 100 kbp | 1976.99 |
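
For readers unfamiliar with the N50/N75 and L50/L75 metrics in Table 2, the following minimal sketch computes them from a list of scaffold lengths using the usual definitions. The function name and the toy lengths are hypothetical; this is not the assembly-statistics tool used in the study.

```python
from typing import Iterable, Tuple


def nx_lx(lengths: Iterable[int], fraction: float = 0.5) -> Tuple[int, int]:
    """Return (Nx, Lx) for the given fraction of total assembly length.

    Nx is the length of the shortest scaffold in the smallest set of
    longest scaffolds that together cover `fraction` of the assembly;
    Lx is the number of scaffolds in that set.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one scaffold length is required")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    threshold = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise AssertionError("unreachable: the running total always reaches the threshold")


# Toy scaffold lengths (hypothetical, not from the yerba mate assembly).
toy = [80, 70, 50, 40, 30, 20]
n50, l50 = nx_lx(toy, 0.5)
n75, l75 = nx_lx(toy, 0.75)
print(n50, l50, n75, l75)
```

Read this way, the tabulated N50 of 510,878 with L50 of 506 means that the 506 longest scaffolds, each at least 510,878 bp long, together account for at least half of the total assembly length.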
Table 3
Classification and distribution of repetitive DNA elements in yerba mate.
| Element | Number | Length occupied (bp) | Percentage of the genome (%) |
|---|---|---|---|
| Class I retrotransposons | 421,599 | 385,714,532 | 36.22 |
| SINEs | 840 | 154,298 | 0.01 |
| Penelope | 0 | 0 | 0.00 |
| LINEs | 35,433 | 17,109,207 | 1.61 |
| CRE/SLACS | 0 | 0 | 0.00 |
| L2/CR1/Rex | 575 | 135,549 | 0.01 |
| R1/LOA/Jockey | 443 | 76,937 | 0.01 |
| R2/R4/NeSL | 0 | 0 | 0.00 |
| RTE/Bov-B | 8599 | 2,126,765 | 0.20 |
| L1/CIN4 | 25,816 | 14,769,956 | 1.39 |
| LTR retrotransposons | 385,326 | 368,451,027 | 34.60 |
| BEL/Pao | 709 | 266,632 | 0.03 |
| Ty1/Copia | 98,237 | 67,631,136 | 6.35 |
| Gypsy/DIRS1 | 216,472 | 274,526,515 | 25.78 |
| Retroviral | 0 | 0 | 0.00 |
| Class II DNA transposons | 45,427 | 19,116,209 | 1.80 |
| hobo-Activator | 21,335 | 6,378,850 | 0.60 |
| Tc1-IS630-Pogo | 0 | 0 | 0.00 |
| En-Spm | 0 | 0 | 0.00 |
| MuDR-IS905 | 0 | 0 | 0.00 |
| PiggyBac | 0 | 0 | 0.00 |
| Tourist/Harbinger | 5870 | 2,846,548 | 0.27 |
| Others | 0 | 0 | 0.00 |
| Unclassified | 990,080 | 269,430,122 | 25.30 |
| Total interspersed repeats | | 674,260,863 | 63.32 |
| Small RNA | 4362 | 718,762 | 0.07 |
| Satellites | 0 | 0 | 0.00 |
| Simple repeats | 185,507 | 7,911,080 | 0.74 |
| Low complexity | 31,856 | 1,606,255 | 0.15 |
Table 4
Apparent enzyme kinetic parameter estimates for yerba mate caffeine biosynthetic enzymes with selected substrates.
| Enzyme (substrate) | KM (μM) | kcat (s⁻¹) | kcat/KM (s⁻¹ M⁻¹) |
|---|---|---|---|
| IpCS1 (X) | 85.05 | 0.0009 | 10.11 |
| IpCS2 (3X) | 197.08 | 0.0031 | 15.77 |
| IpCS3 (TB) | 151.19 | 0.0029 | 19.36 |
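
The relationship between the three columns of Table 4 can be sketched as follows: catalytic efficiency is kcat divided by KM after converting KM from μM to M. The function name is hypothetical. Because the tabulated kcat values are rounded, recomputing from them reproduces the tabulated kcat/KM only approximately; the small differences are presumably a rounding effect, which the source does not state.

```python
def catalytic_efficiency(kcat_per_s: float, km_micromolar: float) -> float:
    """Return kcat/KM in s^-1 M^-1 from kcat in s^-1 and KM in micromolar."""
    if km_micromolar <= 0:
        raise ValueError("KM must be positive")
    km_molar = km_micromolar * 1e-6  # convert micromolar to molar
    return kcat_per_s / km_molar


# Rounded kcat and KM values as printed in Table 4.
table4 = {
    "IpCS1 (X)": (0.0009, 85.05),
    "IpCS2 (3X)": (0.0031, 197.08),
    "IpCS3 (TB)": (0.0029, 151.19),
}
for enzyme, (kcat, km) in table4.items():
    print(enzyme, round(catalytic_efficiency(kcat, km), 2))
```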
Table 5
Data collection and refinement statistics of IpCS3 structure bound to S-adenosyl-homocysteine (SAH) and caffeine.
| | IpCS3 in complex with SAH and caffeine |
|---|---|
| PDB | 8UZD |
| Data collection | |
| Wavelength (Å) | 0.9786 |
| Resolution (Å) | 2.72 |
| Resolution rangeᵃ | 37.00–2.72 (2.82–2.72) |
| Space group | P 41 21 2 |
| Cell dimensions | |
| a, b, c (Å) | 82.67, 82.67, 226.09 |
| α, β, γ (°) | 90.00, 90.00, 90.00 |
| Total reflections | 43,818 |
| Unique reflections | 21,910 |
| Multiplicityᵃ | 2.0 (2.0) |
| Completeness (%)ᵃ | 99.89 (100.00) |
| <I/σ(I)>ᵃ | 25.79 (2.87) |
| Rmergeᵃ,ᵇ (%) | 0.0223 (0.2168) |
| Rmeas (%)ᵃ | 0.0315 (0.3066) |
| CC1/2ᵃ | 0.999 (0.878) |
| Refinement | |
| Resolution (Å) | 2.72 |
| No. reflections | 21,909 |
| Rworkᶜ/Rfreeᵈ | 0.194/0.248 |
| No. atoms | |
| Protein | 5,216 |
| CFF + SAH | 80 |
| Water | 48 |
| B-factors | |
| Protein | 63.38 |
| CFF + SAH | 84.48 |
| Water | 48.19 |
| Bond lengths (Å) | 0.004 |
| Bond angles (°) | 1.112 |
  a. Numbers in parentheses refer to the highest resolution shell.

  b. Rmerge = Σ|Ii − <Ii>|/ΣIi, where Ii = the intensity of the ith reflection and <Ii> = mean intensity.

  c. Rwork = Σ|Fo − Fc|/Σ|Fo|, where Fo and Fc are the observed and calculated structure factors, respectively.

  d. Rfree was calculated as for Rwork, but on a test set comprising 5% of the data excluded from refinement.

Appendix 1—table 1
Detail of yerba mate tRNA and anti-codon nucleotide sequences.
| tRNA genes | Anti-codon counts | Total No. of tRNAs |
|---|---|---|
| POLAR | | |
| Asparagine (Asn) | GTT (36), ATT (0) | 36 |
| Cysteine (Cys) | GCA (22), ACA (0) | 22 |
| Glutamine (Gln) | TTG (13), CTG (10) | 23 |
| Glycine (Gly) | GCC (32), TCC (11), CCC (8), ACC (0) | 51 |
| Serine (Ser) | GCT (20), TGA (20), AGA (15), CGA (5), GGA (5), ACT (0) | 65 |
| Threonine (Thr) | TGT (11), AGT (16), GGT (6), CGT (2) | 35 |
| Tyrosine (Tyr) | GTA (17), ATA (0) | 17 |
| NON-POLAR | | |
| Alanine (Ala) | AGC (12), CGC (4), TGC (11), GGC (0) | 27 |
| Isoleucine (Ile) | AAT (14), TAT (6), GAT (2) | 22 |
| Leucine (Leu) | CAA (23), AAG (10), CAG (4), TAG (8), TAA (6), GAG (0) | 51 |
| Methionine (Met) | CAT (55) | 55 |
| Phenylalanine (Phe) | GAA (30), AAA (2) | 32 |
| Proline (Pro) | AGG (10), TGG (28), CGG (4), GGG (0) | 42 |
| Tryptophan (Trp) | CCA (31) | 31 |
| Valine (Val) | AAC (11), GAC (10), CAC (9), TAC (7) | 37 |
| POSITIVELY CHARGED | | |
| Arginine (Arg) | ACG (15), TCT (14), CCT (7), CCG (6), TCG (6), GCG (3) | 51 |
| Histidine (His) | GTG (25), ATG (2) | 27 |
| Lysine (Lys) | CTT (10), TTT (17) | 27 |
| NEGATIVELY CHARGED | | |
| Aspartic acid (Asp) | GTC (39), ATC (1) | 40 |
| Glutamic acid (Glu) | CTC (14), TTC (21) | 35 |
| Selenocysteine tRNAs | TCA (0) | 0 |
| Possible suppressor tRNAs | CTA (0), TTA (1), TCA (1) | 2 |
| tRNAs with undetermined isotypes | | 11 |
| Predicted pseudogenes | | 76 |
Appendix 1—table 2
miRNA families predicted in the yerba mate genome.
| miRNA | Functional involvement in other eudicot plants |
|---|---|
| miR156 | Seed growth and development (Chi et al., 2011; Song et al., 2011) |
| | Fruit development (Pantaleo et al., 2010) |
| | Drought/cold stress (Curaba et al., 2012; Zhu and Luo, 2013) |
| miR159 | Growth and development (Varkonyi-Gasic et al., 2010) |
| | Phase change from vegetative to reproductive growth (Han et al., 2014) |
| | Lipid and protein accumulation (Zhao et al., 2010) |
| | Drought stress (Barrera-Figueroa et al., 2011) |
| miR160 | Growth and development (Gu et al., 2013; Wang et al., 2011) |
| | Fibrous root and storage root development (Sun et al., 2015) |
| | Drought stress (Nadarajah and Kumar, 2019) |
| miR162_2 | Storage root initiation and development (Sun et al., 2015) |
| miR164 | Lateral root and leaf development (Deng et al., 2015) |
| | Fibrous root and storage root development (Sun et al., 2015) |
| | Seed development (Song et al., 2011) |
| | Drought stress (Ferreira et al., 2012) |
| miR166 | Seed development (Song et al., 2011) |
| | Fibrous root and storage root development (Sun et al., 2015) |
| | Drought stress (Barrera-Figueroa et al., 2011) |
| | Disease resistance (Guo et al., 2011) |
| miR167_1 | Growth and development (Varkonyi-Gasic et al., 2010) |
| | Drought/cold stress (Barrera-Figueroa et al., 2011; Jeong et al., 2011) |
| miR168 | Development (Gu et al., 2013) |
| | Resistance to fire blight (Kaja et al., 2015) |
| miR169_2; miR169_5 | Drought/cold/salt stress (Carnavale Bottino et al., 2013; Koc et al., 2015; Sheng et al., 2015; Shui et al., 2013) |
| miR171_1; miR171_2 | Development (Chaves et al., 2015; Zhang et al., 2011) |
| | Lipid and protein accumulation (Zhao et al., 2010) |
| miR172 | Development (Sun et al., 2012) |
| | Starch biosynthesis (Chen et al., 2015) |
| | Drought/cold stress (Koc et al., 2015) |
| miR390 | Drought stress (Shui et al., 2013) |
| | Leaf morphology (Karlova et al., 2013) |
| miR394 | Drought/salt stress (Song et al., 2013) |
| miR395 | Low sulfate response (Katiyar et al., 2012) |
| miR396 | Seed development (Gao et al., 2015) |
| | Starch biosynthesis (Chen et al., 2015) |
| | Drought/salt stress (Shui et al., 2013; Xie et al., 2014) |
| miR397 | Drought/cold stress (Koc et al., 2015) |
| miR398 | Fibrous root and storage root development (Sun et al., 2015) |
| | Salt stress (Carnavale Bottino et al., 2013) |
| miR399 | Phosphate homeostasis (Katiyar et al., 2012; Pant et al., 2008) |
| | Shoot to root transport (Pant et al., 2008) |
| miR403 | Drought stress (Shui et al., 2013) |
| miR405 | Transposon derived (Xie et al., 2005) |
| miR408 | Tolerance to boron deficiency (Lu et al., 2015) |
| | Cold stress (Zhang et al., 2014) |
| | Response to wounding and topping (Tang et al., 2012) |
| miR473 | Metabolism (Din et al., 2014) |
| | Stress response (Patanun et al., 2013) |
| miR474 | Drought stress (Kantar et al., 2011) |
| miR475 | Metabolism (Din et al., 2014) |
| miR477 | Starch biosynthesis (Xie et al., 2011) |
| miR530 | Disease resistance (Zhao et al., 2015) |
| miR1023 | Disease resistance (Jiao and Peng, 2018) |
| miR1446 | Stress response (Lu et al., 2008) |
Appendix 1—table 3
miRNA targets predicted in the yerba mate genome.
miRNA columns of the table: miR159, miR164, miR167_1, miR168, miR169_2, miR169_5, miR171_1, miR171_2, miR390, miR394, miR396, miR397, miR398, miR403. The per-target cell entries are not preserved in this text.

| Target ID | Description |
|---|---|
| ILEXPARA_008283 | Uncharacterized protein |
| ILEXPARA_029002 | Uncharacterized protein |
| ILEXPARA_031381 | Uncharacterized protein |
| ILEXPARA_043376 | Uncharacterized protein |
| ILEXPARA_013180 | ileS, isoleucine tRNA ligase |
| ILEXPARA_000910 | myb-like transcription factor |
| ILEXPARA_028644 | Uncharacterized protein |
| ILEXPARA_005969 | Putative membrane protein |
| ILEXPARA_048009 | panC, pantothenate (vitamin B5) synthetase |
| ILEXPARA_018064 | arf, auxin response factor |
| ILEXPARA_019275 | Uncharacterized protein |
| ILEXPARA_024153 | Uncharacterized protein |
| ILEXPARA_035190 | Uncharacterized protein |
| ILEXPARA_016483 | Hypothetical protein |
| ILEXPARA_029421 | GCP4, gamma tubulin complex protein 4 |
| ILEXPARA_047849 | GOLS1, galactinol synthase 1 |
| ILEXPARA_003987 | NACK1, kinesin-like protein |
| ILEXPARA_044341 | Uncharacterized protein |
| ILEXPARA_005359 | Uncharacterized protein |
| ILEXPARA_035716 | RABE1C, ras-related protein |
| ILEXPARA_010316 | MAPK, mitogen activated protein kinase |
| ILEXPARA_032923 | Uncharacterized protein |
| ILEXPARA_008149 | Uncharacterized protein |
| ILEXPARA_048631 | Protein kinase |
| ILEXPARA_008152 | Uncharacterized protein |
| ILEXPARA_023090 | NAGK, N-acetyl-D-glucosamine kinase |
| ILEXPARA_024088 | RNA-binding (RRM/RBD/RNP motif) family protein |
| ILEXPARA_023716 | Endoglucanase |
| ILEXPARA_042182 | Uncharacterized protein |
| ILEXPARA_021515 | Pentatricopeptide repeat (PPR) protein |
| ILEXPARA_004925 | Uncharacterized protein |
| ILEXPARA_045111 | Rotamase FKBP 1 |
| ILEXPARA_013832 | ABCC2, ABC transporter C family member 2 protein |
| ILEXPARA_039828 | guaA, GMP synthase |
| ILEXPARA_028274 | Hypothetical protein |
| ILEXPARA_024538 | RPT6A, regulatory particle triple-A ATPase 6A |
| ILEXPARA_031387 | Uncharacterized protein |
| ILEXPARA_043757 | Uncharacterized protein |
| ILEXPARA_005297 | Uncharacterized protein |
| ILEXPARA_012032 | Uncharacterized protein |
| ILEXPARA_9682 | OST1B, oligosaccharyltransferase 1B |
Appendix 1—key resources table
| Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information |
|---|---|---|---|---|
| Gene (Ilex paraguariensis) | IpCS1 | GenBank | CAK9135737 | Xanthine methyltransferase gene of Ilex paraguariensis |
| Gene (Ilex paraguariensis) | IpCS2 | GenBank | CAK9135740 | 3-Methylxanthine methyltransferase gene of Ilex paraguariensis |
| Gene (Ilex paraguariensis) | IpCS3 | GenBank | CAK9135742 | Theobromine methyltransferase gene of Ilex paraguariensis |
| Strain, strain background (Escherichia coli) | BL21(DE3) | Novagen | 69450-M | Chemically competent cells |
| Biological sample (Ilex paraguariensis) | Ilex paraguariensis A. St.-Hil. var. paraguariensis | INTA-EEA Cerro Azul, Misiones, Argentina | cv CA 8/74 | Used to extract genomic DNA |
| Biological sample (Ilex paraguariensis) | Ilex paraguariensis A. St.-Hil. var. paraguariensis | Establecimiento Las Marías S.A.C.I.F.A., Corrientes, Argentina | cv SI-49 | Used to extract genomic DNA |
| Recombinant DNA reagent | pUC57-IpCS1 (plasmid) | GenScript | | Used to clone IpCS1 gene |
| Recombinant DNA reagent | pTrcHis-IpCS2 (plasmid) | This paper | | Used to clone IpCS2 gene |
| Recombinant DNA reagent | pUC57-IpCS3 (plasmid) | GenScript | | Used to clone IpCS3 gene |
| Recombinant DNA reagent | pUC57-AncIpCS1 (plasmid) | GenScript | | Used to clone AncIpCS1 gene |
| Recombinant DNA reagent | pUC57-AncIpCS2 (plasmid) | GenScript | | Used to clone AncIpCS2 gene |
| Sequence-based reagent | pET-15b-IpCS1 (plasmid) | This paper | | Used to express IpCS1 in E. coli BL21(DE3) |
| Sequence-based reagent | pET-15b-IpCS2 (plasmid) | This paper | | Used to express IpCS2 in E. coli BL21(DE3) |
| Sequence-based reagent | pET-15b-IpCS3 (plasmid) | This paper | | Used to express IpCS3 in E. coli BL21(DE3) |
| Sequence-based reagent | pET-15b-AncIpCS1 (plasmid) | This paper | | Used to express AncIpCS1 in E. coli BL21(DE3) |
| Sequence-based reagent | pET-15b-AncIpCS2 (plasmid) | This paper | | Used to express AncIpCS2 in E. coli BL21(DE3) |
| Sequence-based reagent | IpCS2F | This paper | PCR primers | 5′-ATGGACGTGAAGGAAGCAC-3′ |
| Sequence-based reagent | IpCS2R | This paper | PCR primers | 5′-CTATCCCATGGTCCTGCTAAG-3′ |
| Peptide, recombinant protein | IpCS1 | This paper | | Purified from E. coli BL21(DE3) cells |
| Peptide, recombinant protein | IpCS2 | This paper | | Purified from E. coli BL21(DE3) cells |
| Peptide, recombinant protein | IpCS3 | This paper | | Purified from E. coli BL21(DE3) cells |
| Peptide, recombinant protein | AncIpCS1 | This paper | | Purified from E. coli BL21(DE3) cells |
| Peptide, recombinant protein | AncIpCS2 | This paper | | Purified from E. coli BL21(DE3) cells |
| Commercial assay or kit | DNeasy Plant Mini Kit | QIAGEN | Cat. #: 69104 | Used to extract genomic DNA from Ilex paraguariensis |
| Commercial assay or kit | Quick-DNA HMW MagBead Kit | Zymo Research | Cat. #: D6060 | Used to extract genomic DNA from Ilex paraguariensis |
| Commercial assay or kit | Illumina TruSeq DNA Sample Preparation Kit | Illumina | Cat. #: FC-121-2003 | Used to construct paired-end libraries |
| Commercial assay or kit | Illumina Nextera Mate Pair Library Preparation Kit | Illumina | Cat. #: FC-132-1001 | Used to construct mate-pair libraries |
| Commercial assay or kit | Sequel Binding Kit 1.0 | Pacific Biosciences | Cat. #: 101-365-900 | Used for preparing DNA templates for sequencing on the PacBio Sequel System |
| Commercial assay or kit | Sequel Sequencing Kit 1.0 | Pacific Biosciences | Cat. #: 101-309-500 | Used to perform sequencing reactions on the PacBio Sequel System |
| Commercial assay or kit | SMRT Cell 1M | Pacific Biosciences | Cat. #: 100-171-800 | Consumable microchip used in the PacBio Sequel System for Single Molecule, Real-Time (SMRT) sequencing |
| Commercial assay or kit | pTrcHis TOPO TA Expression Kit | Invitrogen | Cat. #: K4410-01 | Used to clone IpCS2 gene |
| Commercial assay or kit | QIAEX II Gel Extraction Kit | QIAGEN | Cat. #: 20021 | Used to clone IpCS1, IpCS3, AncIpCS1, and AncIpCS2 genes into pET-15b expression vector |
| Commercial assay or kit | Agilent QuikChange Lightning Kit | Agilent Technologies Inc, Santa Clara, CA | Cat. #: 210518 | Used for site-directed mutagenesis of AncIpCS2 |
| Commercial assay or kit | QIAprep Spin Miniprep Kit | QIAGEN | Cat. #: 27104 | Used for the rapid purification of high-quality plasmid DNA |
| Commercial assay or kit | TALON spin columns | Takara Bio | Cat. #: 89068 | Used for the purification of histidine-tagged proteins |
| Chemical compound, drug | Xanthine | Sigma-Aldrich | Cat. #: X0626 | Used to test relative substrate preference of IpCS1–3 and AncIpCS1–2 |
| Chemical compound, drug | Xanthosine | Sigma-Aldrich | Cat. #: X0750 | Used to test relative substrate preference of IpCS1–3 and AncIpCS1–2 |
| Chemical compound, drug | 1-Methylxanthine | Sigma-Aldrich | Cat. #: 69720 | Used to test relative substrate preference of IpCS1–3 and AncIpCS1–2 |
| Chemical compound, drug | 3-Methylxanthine | Sigma-Aldrich | Cat. #: 222526 | Used to test relative substrate preference of IpCS1–3 and AncIpCS1–2 |
| Chemical compound, drug | 7-Methylxanthine | Sigma-Aldrich | Cat. #: 69723 | Used to test relative substrate preference of IpCS1–3 and AncIpCS1–2 |
| Chemical compound, drug | Theobromine | Sigma-Aldrich | Cat. #: T4500 | Used to test relative substrate preference of IpCS1–3 and AncIpCS1–2 |
| Chemical compound, drug | Paraxanthine | Sigma-Aldrich | Cat. #: D5385 | Used to test relative substrate preference of IpCS1–3 and AncIpCS1–2 |
| Chemical compound, drug | Theophylline | Sigma-Aldrich | Cat. #: T1633 | Used to test relative substrate preference of IpCS1–3 and AncIpCS1–2 |
| Software, algorithm | Trimmomatic | DOI: 10.1093/bioinformatics/btu170 | v.0.39 | Used to remove adaptor contaminations and filter low-quality reads |
| Software, algorithm | Quake | DOI: 10.1186/gb-2010-11-11-r116 | v.0.3 | Used to correct clean reads |
| Software, algorithm | SOAPdenovo | DOI: 10.1186/2047-217X-1-18 | v.2 | Used to assemble and scaffold contigs |
| Software, algorithm | DeconSeq | DOI: 10.1371/journal.pone.0017288 | v.0.4.3 | Used to detect and remove sequence contaminants |
| Software, algorithm | Canu | DOI: 10.1101/gr.215087.116 | v.2.2 | Used for self-correction and assembly of long reads |
| Software, algorithm | PurgeHaplotigs | DOI: 10.1186/s12859-018-2441-2 | | Used to separate assembly haplotypes |
| Software, algorithm | Quickmerge | DOI: 10.1101/029306 | v.03 | Used to merge SOAPdenovo and Canu curated assemblies |
| Software, algorithm | SSPACE | DOI: 10.1093/bioinformatics/btq683 | v.2.1.1 | Used to refine scaffolds and contigs |
| Software, algorithm | RepeatMasker | http://repeatmasker.org/ | | Used to mask the genome assembly |
| Software, algorithm | Funannotate | DOI: 10.5281/zenodo.2604804 | v.1.8.13 | Used to predict the protein- and non-coding genes |
| Software, algorithm | Infernal | DOI: 10.1093/bioinformatics/btt509 | v.1.1.4 | Used to improve the prediction of small RNAs and microRNAs |
| Software, algorithm | tRNAScan-SE | DOI: 10.1007/978-1-4939-9173-0_1 | v.2.0 | Used to improve the prediction of transfer RNAs |
| Software, algorithm | TAPIR | http://bioinformatics.psb.ugent.be/webtools/tapir | | Used to identify miRNA targets |
| Software, algorithm | TargetFinder | DOI: 10.1007/978-1-60327-005-2_4 | v.1.7 | Used to identify miRNA targets |
| Software, algorithm | InterProScan | DOI: 10.1093/bioinformatics/btu031 | v.5.55-88.0 | Used to assign function of the predicted genes |
| Software, algorithm | eggNOG-mapper | DOI: 10.1093/nar/gky1085 | v.2.1.7 | Used to assign function to the predicted genes |
| Software, algorithm | Dfam TE Tools | https://github.com/Dfam-consortium/TETools | v.1.5 | Used to estimate the repeat content |
| Software, algorithm | CoGe’s tool SynMap | https://genomevolution.org/ | | Used to estimate rates of synonymous substitution (Ks) between paralogous and orthologous genes |
| Software, algorithm | CoGe’s tool SynFind | https://genomevolution.org/ | | Used to determine the syntenic depth ratio between I. paraguariensis, C. canephora, and V. vinifera |
| Software, algorithm | CoGe’s tool GEvo | https://genomevolution.org/ | | Used to compare CS and XMT syntenic regions |
| Software, algorithm | MAFFT | DOI: 10.1093/molbev/mst010 | v.7.0 | Used to align amino acid sequences |
| Software, algorithm | FastTree | DOI: 10.1371/journal.pone.0009490 | v.2 | Used to perform phylogenetic analysis of SABATH sequences |
| Software, algorithm | IQTree | DOI: 10.1093/nar/gkw256 | | Used to estimate ancestral sequences |
| Software, algorithm | Phenix | DOI: 10.1107/S0907444909052925 | | Used to solve the crystal structure of IpCS3 |
| Software, algorithm | REFMAC5 | DOI: 10.1107/S0907444911001314 | | Used to refine the crystal structure of IpCS3 |
| Software, algorithm | COOT | DOI: 10.1107/S0907444910007493 | v.0.9.8.3 | Used to refine the crystal structure of IpCS3 |
| Other | Ilex paraguariensis transcriptome sequence data | ENA | PRJNA315513 | Used to assess the completeness of Ilex paraguariensis genome |
| Other | Ilex paraguariensis transcriptome sequence data | NCBI | SRP043293 | Used to assess the completeness of Ilex paraguariensis genome |
| Other | Ilex paraguariensis transcriptome sequence data | NCBI | SRP110129 | Used to determine the expression of IpCS1–5 genes |
| Other | Vivaspin columns | Sartorius | Cat. #: VS0101 | Used to remove proteins after enzymatic reaction |
| Other | Kinetex 5 μm EVO C18 column | Phenomenex | Cat. #: 00F-4467-AN | Used for high-performance liquid chromatography |
| Other | Crystal Gryphon robot | Art Robbins Instruments | Cat. #: 100-1010 | Used for automating crystallization |


  1. Federico A Vignale
  2. Andrea Hernandez Garcia
  3. Carlos P Modenutti
  4. Ezequiel J Sosa
  5. Lucas A Defelipe
  6. Renato Oliveira
  7. Gisele L Nunes
  8. Raúl M Acevedo
  9. German F Burguener
  10. Sebastian M Rossi
  11. Pedro D Zapata
  12. Dardo A Marti
  13. Pedro Sansberro
  14. Guilherme Oliveira
  15. Emily M Catania
  16. Madeline N Smith
  17. Nicole M Dubs
  18. Satish Nair
  19. Todd J Barkman
  20. Adrian G Turjanski
(2025)
Yerba mate (Ilex paraguariensis) genome provides new insights into convergent evolution of caffeine biosynthesis
eLife 14:e104759.
https://doi.org/10.7554/eLife.104759