The rise and fall of the Phytophthora infestans lineage that triggered the Irish potato famine

  1. Kentaro Yoshida
  2. Verena J Schuenemann
  3. Liliana M Cano
  4. Marina Pais
  5. Bagdevi Mishra
  6. Rahul Sharma
  7. Christa Lanz
  8. Frank N Martin
  9. Sophien Kamoun
  10. Johannes Krause
  11. Marco Thines
  12. Detlef Weigel
  13. Hernán A Burbano (corresponding author)
  1. The Sainsbury Laboratory, United Kingdom
  2. University of Tübingen, Germany
  3. Biodiversity and Climate Research Centre, Germany
  4. Goethe University, Germany
  5. Senckenberg Gesellschaft für Naturforschung, Germany
  6. Max Planck Institute for Developmental Biology, Germany
  7. United States Department of Agriculture, United States
  8. Centre for Integrated Fungal Research, Germany
Research Article
Cite this article as: eLife 2013;2:e00731 doi: 10.7554/eLife.00731
11 figures, 5 tables and 5 data sets

Figures

Figure 1
Countries of origin of samples used in whole-genome, mtDNA genome or both analyses.

Red indicates the number of historic samples and blue the number of modern samples. More information on the samples is given in Tables 1 and 2.

https://doi.org/10.7554/eLife.00731.004
Figure 2
Ancient DNA-like characteristics of historic samples.

(A) Lengths of merged reads from historic sample M-0182898. (B) Mean lengths of merged reads from historic samples. (C) Nucleotide mis-incorporation in reads from the historic sample M-0182898. (D) Deamination at first 5′ end base in historic samples. (E) Percentage of merged reads that mapped to the P. infestans reference genome.

https://doi.org/10.7554/eLife.00731.006
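
Panel (D) of the figure above reports deamination at the first 5′ base. As a rough illustration only, the fraction of reference cytosines that are read as thymine at that position could be tallied as in the sketch below. The function name and the input format (one pair of reference base and read base per read, taken from the first aligned position) are assumptions made for this example; the caption does not state how the values were computed.

```python
def c_to_t_fraction(first_base_pairs):
    """Fraction of reference C positions that appear as T in the read.

    first_base_pairs: iterable of (reference_base, read_base) tuples, one per
    read, taken from the first aligned 5' position (hypothetical input format).
    """
    ref_c = 0
    c_to_t = 0
    for ref_base, read_base in first_base_pairs:
        if ref_base.upper() == "C":
            ref_c += 1
            if read_base.upper() == "T":
                c_to_t += 1
    # Avoid division by zero when no read starts at a reference C.
    return c_to_t / ref_c if ref_c else 0.0


# Made-up example: four reads start at a reference C, one of them shows T.
pairs = [("C", "C"), ("C", "T"), ("A", "A"), ("C", "C"), ("C", "C")]
print(c_to_t_fraction(pairs))
```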
Figure 3 with 1 supplement
Coverage and SNP statistics.

(A) Mean nuclear genome coverage from historic (red) and modern (blue) samples. (B) Homo- and heterozygous SNPs in each sample. (C) Inverse cumulative coverage for all homozygous SNPs across all samples. (D) Same as (C) for homo- and heterozygous SNPs.

https://doi.org/10.7554/eLife.00731.007
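
Panels (C) and (D) of the figure above show inverse cumulative coverage. One plausible reading, assumed here and not confirmed by the caption, is that for each number of samples k the curve gives the number of SNPs that are covered in at least k samples. A minimal sketch under that assumption, with a hypothetical function name and made-up input:

```python
from collections import Counter


def inverse_cumulative(samples_per_snp, n_samples):
    """Map k -> number of SNPs covered in at least k samples.

    samples_per_snp: one integer per SNP, the number of samples in which
    that SNP is covered (hypothetical input format).
    """
    counts = Counter(samples_per_snp)
    result = {}
    running = 0
    # Walk from the largest k downwards so the running total accumulates
    # every SNP covered in k or more samples.
    for k in range(n_samples, 0, -1):
        running += counts.get(k, 0)
        result[k] = running
    return result


# Made-up example with four SNPs and three samples.
print(inverse_cumulative([1, 2, 2, 3], n_samples=3))
```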
Figure 3—figure supplement 1
Accuracy and sensitivity of SNP calling at different cutoffs for SNP concordance based on 3- and 50-fold coverage of simulated data.

Rescue cov.—minimum coverage required to accept SNP calls in low-coverage genomes based on these SNPs having been found in high-coverage genomes. The cutoffs enclosed in orange rectangles were used for the final analysis.

https://doi.org/10.7554/eLife.00731.008
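
The rescue coverage described in the supplement caption above can be pictured as a two-tier rule: a call in a low-coverage genome is accepted at a relaxed coverage threshold only if the same SNP was already found in a high-coverage genome. The sketch below illustrates that idea. The function name, the position format and both cutoff values are placeholders chosen for the example, not the cutoffs used in the article, and the SNP concordance cutoff is left out.

```python
def accept_snp_call(position, coverage, high_cov_snps, min_cov=10, rescue_cov=3):
    """Decide whether a SNP call in a low-coverage genome is accepted.

    position: any hashable locus identifier, e.g. (contig, coordinate).
    high_cov_snps: set of positions called as SNPs in high-coverage genomes.
    min_cov, rescue_cov: placeholder cutoffs, not values from the article.
    """
    if coverage >= min_cov:
        return True
    # Below the regular cutoff, accept only previously seen SNPs.
    return position in high_cov_snps and coverage >= rescue_cov


known = {("contig1", 100)}
print(accept_snp_call(("contig1", 100), 3, known))  # rescued
print(accept_snp_call(("contig1", 200), 3, known))  # rejected
```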
Figure 4 with 2 supplements
Maximum-parsimony phylogenetic tree of complete mtDNA genomes.

Sites with less than 90% information were not considered, leaving 24,560 sites in the final dataset. Numbers at branches indicate bootstrap support (100 replicates), and scale indicates changes.

https://doi.org/10.7554/eLife.00731.009
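
The site filter mentioned in the caption above (sites with less than 90% information were not considered) can be sketched as a column filter on an alignment. The sketch assumes that "information" means the percentage of sequences with a called base at a site, and that N, - and ? mark missing data; both are assumptions, as is the function name.

```python
def filter_sites(alignment, min_percent=90, missing="N-?"):
    """Keep alignment columns where at least min_percent of sequences are called.

    alignment: list of equal-length sequence strings (hypothetical input).
    """
    if not alignment:
        return []
    n_seqs = len(alignment)
    keep = []
    for i in range(len(alignment[0])):
        called = sum(1 for seq in alignment if seq[i].upper() not in missing)
        # Integer comparison avoids floating-point issues at the threshold.
        if called * 100 >= min_percent * n_seqs:
            keep.append(i)
    return ["".join(seq[i] for i in keep) for seq in alignment]


# Made-up alignment of ten sequences: column 0 is complete, column 1 has one
# missing call (kept at 90%), column 2 has two missing calls (dropped).
seqs = ["ACG"] * 8 + ["ANN", "ACN"]
print(filter_sites(seqs))
```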
Figure 4—figure supplement 1
Maximum-likelihood phylogenetic tree of complete mtDNA genomes.

Sites with less than 90% information were not considered, leaving 24,560 sites in the final dataset. Numbers at branches indicate bootstrap support (100 replicates).

https://doi.org/10.7554/eLife.00731.010
Figure 4—figure supplement 2
mtDNA sequences around the diagnostic Msp1 restriction site (grey) for reference haplotypes, modern strains (blue) and historic strains (red).

The Msp1 (CCGG) restriction site is only present in the Ib haplotype; all other strains have a C-to-T substitution (CTGG).

https://doi.org/10.7554/eLife.00731.011
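
The distinction drawn in the caption above, CCGG (site present) versus CTGG (site absent), amounts to a simple motif search. The sketch below locates the recognition sequence in a sequence window. The function name and the two 12-bp windows are hypothetical; only their central four bases mirror the caption.

```python
MSP1_SITE = "CCGG"


def msp1_site_positions(seq):
    """Return 0-based start positions of every CCGG occurrence in seq."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 3) if seq[i:i + 4] == MSP1_SITE]


# Hypothetical windows, not real P. infestans mtDNA sequence.
with_site = "ATTACCGGTAAT"
without_site = "ATTACTGGTAAT"
print(msp1_site_positions(with_site))     # [4]
print(msp1_site_positions(without_site))  # []
```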
Figure 5
Correlation between the nucleotide distance of mtDNA genomes of the HERB-1/haplotype Ia/haplotype Ib clade from the outgroup P17777 and sample age in calendar years before present.
https://doi.org/10.7554/eLife.00731.012
Figure 6
Divergence estimates of mtDNA genomes.

Bayesian consensus tree from 147,000 inferred trees. Posterior probability support above 50% is shown next to each node. Blue horizontal bars represent the 95% HPD interval for the node height. Light yellow bars indicate major historical events discussed in the text. See Figure 5 and Table 3 for detailed estimates at the four main nodes in P. infestans.

https://doi.org/10.7554/eLife.00731.013
Figure 7 with 1 supplement
Phylogenetic trees of high-coverage nuclear genomes using both homozygous and heterozygous SNPs.

(A) Maximum-parsimony tree, considering only sites with at least 95% information, leaving 4,498,351 sites in the final dataset. Numbers at branches indicate bootstrap support (100 replicates), and scale indicates genetic distance. (B) Maximum-likelihood tree. (C) Heat map of genetic differentiation (color scale indicates SNP differences). US-1 strains DDR7602 and LBUS5 have the genome sequences closest to M-0182896 (asterisks). The two US-1 isolates in turn are outliers compared to all other modern strains (highlighted by a gray box).

https://doi.org/10.7554/eLife.00731.015
Figure 7—figure supplement 1
Phylogenetic trees of high- and low-coverage nuclear genomes.

(A) Neighbor-joining tree of high-coverage genomes using 4,595,012 homo- and heterozygous SNPs. Numbers at branches indicate bootstrap support (100 replicates), and scale indicates genetic distance. (B) Neighbor-joining tree of high- and low-coverage genomes using 2,101,039 homozygous and heterozygous SNPs. Numbers at branches indicate bootstrap support above 50, from 100 replicates. Scale indicates genetic distance. (C) Maximum-parsimony tree of high- and low-coverage genomes using 315,394 homozygous and heterozygous SNPs (using only sites with at least 80% information).

https://doi.org/10.7554/eLife.00731.016
Figure 8 with 1 supplement
Ploidy analysis.

(A) Diagram of expected read frequencies at biallelic SNPs for diploid, triploid and tetraploid genomes. (B) Reference read frequency at biallelic SNPs in gene dense regions (GDRs) for the historic sample M-0182896, two modern samples, and simulated diploid, triploid and tetraploid genomes. The simulated tetraploid genome is assumed to have 20% of pattern 1 and 80% of pattern 3 shown in (A). The shape and kurtosis of the observed distributions are similar to the corresponding simulated ones. (C) Polymorphic positions with more than one allele in the GDR.

https://doi.org/10.7554/eLife.00731.017
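
The expected read frequencies in panel (A) of the figure above follow from simple counting: at a heterozygous biallelic SNP in a genome with n chromosome copies, the reference allele can sit on 1 to n − 1 copies, so the expected reference read fractions are k/n. The sketch below lists these fractions. The function name is hypothetical, and the sketch does not model the mixture of patterns used for the simulated tetraploid genome.

```python
from fractions import Fraction


def expected_ref_fractions(ploidy):
    """Expected reference-read fractions at heterozygous biallelic SNPs."""
    return [Fraction(k, ploidy) for k in range(1, ploidy)]


for ploidy, label in [(2, "diploid"), (3, "triploid"), (4, "tetraploid")]:
    fractions_text = ", ".join(str(f) for f in expected_ref_fractions(ploidy))
    print(f"{label}: {fractions_text}")
```

Observed read frequencies scatter around these values because of sampling noise, which is why the caption compares the shape of whole distributions instead of single peaks.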
Figure 8—figure supplement 1
Reference read frequency at biallelic SNPs in gene dense regions (GDRs) for five modern high-coverage samples.
https://doi.org/10.7554/eLife.00731.018
Figure 9
Read allele frequencies of historic genome M-0182896 and US-1 isolate DDR7602.

Alleles were classified as ancestral or derived using outgroup species P. mirabilis and P. ipomoeae. There were 40,532 segregating sites. (A) Distributions of derived alleles at sites segregating between M-0182896 and DDR7602. (B) Annotation of the different site classes.

https://doi.org/10.7554/eLife.00731.019
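
Classifying alleles as ancestral or derived with outgroup species, as described in the caption above, can be sketched as follows. The rule used here is an assumption for illustration: a site is polarized only when all outgroup alleles agree and match one of the two segregating alleles. The caption does not say how disagreements between P. mirabilis and P. ipomoeae were handled, and the function name is hypothetical.

```python
def polarize(allele_a, allele_b, outgroup_alleles):
    """Return (ancestral, derived), or None if the site cannot be polarized."""
    observed = set(outgroup_alleles)
    if len(observed) != 1:
        return None  # outgroups disagree, or no outgroup data
    ancestral = observed.pop()
    if ancestral == allele_a:
        return allele_a, allele_b
    if ancestral == allele_b:
        return allele_b, allele_a
    return None  # outgroup allele matches neither segregating allele


# Made-up sites.
print(polarize("A", "G", ["A", "A"]))
print(polarize("A", "G", ["A", "G"]))
```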
Figure 10 with 1 supplement
The effector gene Avr3a and its cognate resistance gene R3a.

(A) Diagram of AVR3A effector protein. (B) Frequency of Avr3a alleles in historic and modern P. infestans strains. (C) Neighbor-joining tree of R3a homologs from potato, based on 0.67 kb partial nucleotide sequences of S. tuberosum R3a (blue, accession number AY849382.1) and homologs (dark grey) in GenBank, and de novo assembled contigs from M-0182896 (red). Numbers at branches indicate bootstrap support with 500 replicates. Scale indicates changes.

https://doi.org/10.7554/eLife.00731.023
Figure 10—figure supplement 1
Summary of de novo assembly of RXLR effector genes.

A TBLASTN search was performed with 549 RXLR proteins as queries and the contigs as the database. When the High-scoring Segment Pair (HSP) and matched amino acids both covered ≥99% of the query length, we recorded a hit. Results with the optimal k-mer size are highlighted.

https://doi.org/10.7554/eLife.00731.024
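
The hit criterion in the supplement caption above (HSP and matched amino acids both covering ≥99% of the query length) can be expressed as a small predicate. In this sketch "matched amino acids" is taken to mean identical residues, which is an assumption, and the function and parameter names are hypothetical. The numbers would come from whatever TBLASTN output format is in use.

```python
def is_hit(query_len, hsp_query_span, identities, min_percent=99):
    """True if both the HSP span and the identities cover >= min_percent of the query.

    query_len: length of the query protein in amino acids.
    hsp_query_span: number of query residues covered by the HSP.
    identities: number of identical residues in the HSP (assumed meaning
    of "matched amino acids").
    """
    threshold = min_percent * query_len
    # Integer arithmetic keeps the comparison exact at the boundary.
    return hsp_query_span * 100 >= threshold and identities * 100 >= threshold


# Made-up example for a 200-residue query.
print(is_hit(200, 200, 199))
print(is_hit(200, 200, 197))
```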
Figure 11
Suggested paths of migration and diversification of P. infestans lineages HERB-1 and US-1.

The location of the metapopulation that gave rise to HERB-1 and US-1 remains uncertain; here it is proposed to have been in North America.

https://doi.org/10.7554/eLife.00731.025

Tables

Table 1

Provenance of P. infestans samples

https://doi.org/10.7554/eLife.00731.003
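| Sample group | ID | Country of origin | Collection year | Host species | Reference* |
|---|---|---|---|---|---|
| Herbarium | KM177500 | England | 1845 | Solanum tuberosum | 1 |
| Herbarium | KM177513 | Ireland | 1846 | Solanum tuberosum | 1 |
| Herbarium | KM177502 | England | 1846 | Solanum tuberosum | 1 |
| Herbarium | KM177497 | England | 1846 | Solanum tuberosum | 1 |
| Herbarium | KM177514 | Ireland | 1847 | Solanum tuberosum | 1 |
| Herbarium | KM177548 | England | 1847 | Solanum tuberosum | 1 |
| Herbarium | KM177507 | England | 1856 | Petunia hybrida | 1 |
| Herbarium | M-0182898 | Germany | Before 1863 | Solanum tuberosum | 2 |
| Herbarium | KM177509 | England | 1865 | Solanum tuberosum | 1 |
| Herbarium | M-0182900 | Germany | 1873 | Solanum lycopersicum | 2 |
| Herbarium | M-0182907 | Germany | 1875 | Solanum tuberosum | 1 |
| Herbarium | KM177517 | Wales | 1875 | Solanum tuberosum | 1 |
| Herbarium | M-0182897 | USA | 1876 | Solanum lycopersicum | 2 |
| Herbarium | M-0182906 | Germany | 1877 | Solanum tuberosum | 2 |
| Herbarium | M-0182896 | Germany | 1877 | Solanum tuberosum | 2 |
| Herbarium | M-0182904 | Austria | 1879 | Solanum tuberosum | 2 |
| Herbarium | M-0182903 | Canada | 1896 | Solanum tuberosum | 2 |
| Herbarium | KM177512 | England | NA | Solanum tuberosum | 1 |
| Modern | 06_3928A | England | 2006 | Solanum tuberosum | 3 |
| Modern | DDR7602 | Germany | 1976 | Solanum tuberosum | 4 |
| Modern | P1362 | Mexico | 1979 | Solanum tuberosum | 5 |
| Modern | P6096 | Peru | 1984 | Solanum tuberosum | 5 |
| Modern | P7722 (P. mirabilis) | USA | 1992 | Solanum lycopersicum | 5 |
| Modern | P9464 | USA | 1996 | Solanum tuberosum | 5 |
| Modern | P12204 | Scotland | 1996 | Solanum tuberosum | 5 |
| Modern | P13527 | Ecuador | 2002 | Solanum andreanum | 5 |
| Modern | P10127 | USA | 2002 | Solanum lycopersicum | 5 |
| Modern | P13626 | Ecuador | 2003 | Solanum tuberosum | 5 |
| Modern | P10650 | Mexico | 2004 | Solanum tuberosum | 5 |
| Modern | LBUS5 | South Africa | 2005 | Petunia hybrida | 6 |
| Modern | P11633 | Hungary | 2005 | Solanum lycopersicum | 5 |
| Modern | NL07434 | Netherlands | 2007 | Solanum tuberosum | 3 |
| Modern | P17777 | USA | 2009 | Solanum lycopersicum | 5 |
| Modern | P17721 | USA | 2009 | Solanum tuberosum | 5 |

* 1, Kew Royal Botanical Gardens; 2, Botanische Staatssammlung München; 3, Cooke et al. (2012); 4, Kamoun et al. (1999); 5, World Oomycete Genetic Resource Collection at UC Riverside, CA; 6, Dr Adele McLeod, Univ. of Stellenbosch, South Africa.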

Table 2

Sequencing strategy

https://doi.org/10.7554/eLife.00731.005
| ID | Instrument and read type | Sequencing center | Coverage |
|---|---|---|---|
| M-0182896 | HiSeq 2000 (2 × 101 bp) | MPI | High |
| M-0182897 | HiSeq 2000 (2 × 101 bp) | MPI | Low* |
| M-0182898 | HiSeq 2000 (2 × 101 bp) | MPI | Low |
| M-0182900 | HiSeq 2000 (2 × 101 bp) | MPI | Low |
| M-0182903 | HiSeq 2000 (2 × 101 bp) | MPI | Low |
| M-0182904 | HiSeq 2000 (2 × 101 bp) | MPI | Low* |
| M-0182906 | HiSeq 2000 (2 × 101 bp) | MPI | Low |
| M-0182907 | HiSeq 2000 (2 × 101 bp) | MPI | Low |
| KM177497 | MiSeq (2 × 150 bp) | MPI | Low |
| KM177500 | MiSeq (2 × 150 bp) | MPI | Low* |
| KM177502A | MiSeq (2 × 150 bp) | MPI | Low* |
| KM177507 | MiSeq (2 × 150 bp) | MPI | Low* |
| KM177509 | MiSeq (2 × 150 bp) and HiSeq 2000 (2 × 101 bp) | MPI | Low |
| KM177512 | MiSeq (2 × 150 bp) and HiSeq 2000 (2 × 101 bp) | MPI | Low |
| KM177513 | MiSeq (2 × 150 bp) and HiSeq 2000 (2 × 101 bp) | MPI | Low |
| KM177514 | MiSeq (2 × 150 bp) and HiSeq 2000 (2 × 101 bp) | MPI | Low |
| KM177517 | MiSeq (2 × 150 bp) and HiSeq 2000 (2 × 101 bp) | MPI | Low |
| KM177548 | MiSeq (2 × 150 bp) and HiSeq 2000 (2 × 101 bp) | MPI | Low |
| 06_3928A | GAIIX (2 × 76 bp) | TSL | High |
| DDR7602 | GAIIX (2 × 76 bp) | TSL | High |
| LBUS5 | GAIIX (2 × 76 bp) | TSL | High |
| NL07434 | GAIIX (2 × 76 bp) | TSL | High |
| P10127 | HiSeq 2000 (2 × 101 bp) | MPI | Low |
| P10650 | HiSeq 2000 (2 × 101 bp) | MPI | Low |
| P12204 | HiSeq 2000 (2 × 101 bp) | MPI | Low |
| P13527 | GAIIX (2 × 76 bp) | TSL | High |
| P1362 | HiSeq 2000 (2 × 101 bp) | MPI | Low |
| P13626 | GAIIX (2 × 76 bp) | TSL | High |
| P11633 | HiSeq 2000 (2 × 101 bp) | MPI | Low |
| P17721 | HiSeq 2000 (2 × 101 bp) | MPI | Low |
| P17777 | GAIIX (2 × 76 bp) | TSL | High |
| P6096 | HiSeq 2000 (2 × 101 bp) | MPI | Low |
| P7722 | HiSeq 2000 (2 × 101 bp) | MPI | Low |
| P9464 | HiSeq 2000 (2 × 101 bp) | MPI | Low* |
| PIC99114 | GAIIX (2 × 76 bp) | TSL | High |
| PIC99167 | GAIIX (2 × 76 bp) | TSL | High |

* Samples not included in any analysis due to extremely low coverage.

Samples used only in mtDNA analysis.

Table 3

Inferred time to most recent common ancestor (TMRCA) for different splits in the mtDNA tree

https://doi.org/10.7554/eLife.00731.014
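| Node | TMRCA best estimate (ya) | TMRCA lower 2.5% (ya) | TMRCA upper 2.5% (ya) |
|---|---|---|---|
| I/HERB-1, II | 460 | 300 | 643 |
| Ia/Ib, HERB-1 | 234 | 187 | 290 |
| HERB-1 strains | 182 | 168 | 201 |
| IIa, IIb | 142 | 78 | 214 |
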
Table 4

Presence or absence of avirulence effector genes in historic and modern samples, expressed as percentages of effector genes covered by reads

https://doi.org/10.7554/eLife.00731.020
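| Avr gene | R gene | HERB-1* | US-1† | 20th century non-US-1 | | | | | | Outgroups | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | | EC3527 | EC3626 | P17777 | 06_3928A | NL07434 | Merged | Pm PIC99114 | Pip PIC99167 |
| Avr1 | R1 | 100 | 100 | 0 | 0 | 100 | 0 | 0 | 100 | 98 | 100 |
| Avr2 | R2 | 100 | 100 | 100 | 100 | 100 | 81 | 100 | 77 | 97 | 100 |
| Avr3a | R3a | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 28 |
| Avr3b | R3b | 0 | 0 | 0 | 0 | 100 | 0 | 0 | 100 | 100 | 100 |
| Avr4 | R4 | 100 | 100 | 100 | 100 | 95 | 89 | 100 | 99 | 85 | 92 |
| Avrblb1 | Rpi-blb1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 |
| Avrblb2 | Rpi-blb2 | 100 | 100 | 100 | 100 | 92 | 100 | 100 | 89 | 88 | 0 |
| Avrvnt1 | Rpi-vnt1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| AvrSmira1 | Rpi-Smira1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 97 | 100 |
| AvrSmira2 | Rpi-Smira2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 |

Sequences and polymorphisms are shown in Table 5 and Table 5—source data 1.

* Same sequences obtained for M-0182896 and merged sequences.

† Same sequences obtained for DDR7602 and LBUS5.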

Table 5

Amino acid differences in the avirulence effectors AVR1, AVR2, AVR3a and AVR4 encoded by the T30-4 reference genome, HERB-1 and DDR7602 (US-1)

https://doi.org/10.7554/eLife.00731.021
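| Gene (gene model) | Position | T30-4 | HERB-1 | DDR7602 | Note |
|---|---|---|---|---|---|
| AVR1 (PITG_16663) | 80 | T | T | T, S | HERB-1 polymorphisms shared with T30-4 and DDR7602. |
| AVR1 (PITG_16663) | 142 | I | I, T | T | |
| AVR1 (PITG_16663) | 154 | V | V, A | A | |
| AVR1 (PITG_16663) | 185 | I | I | I, V | |
| AVR2 (PITG_22870) | 31 | N | K | K | HERB-1 identical to DDR7602. |
| AVR3a (PITG_14371) | 19 | S | C | C | HERB-1 identical to DDR7602; both correspond to AVR3aKI isoform. |
| AVR3a (PITG_14371) | 80 | E | K | K | |
| AVR3a (PITG_14371) | 103 | M | I | I | |
| AVR3a (PITG_14371) | 139 | M | L | L | |
| AVR4 (PITG_07387) | 19 | T | T, I | T | HERB-1 polymorphisms shared with T30-4 and DDR7602. |
| AVR4 (PITG_07387) | 139 | L | S | L, S | |
| AVR4 (PITG_07387) | 221 | L | V | L, V | |
| AVR4 (PITG_07387) | 271 | V | F | V, F | |

The T30-4, HERB-1 and DDR7602 columns give the amino acid at each position. IDs in parentheses refer to gene models in the reference genome. Full-length sequences of deduced amino acid sequences of HERB-1 AVR1, AVR2, AVR3a and AVR4 are provided in Table 5—source data 1.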

Table 5—source data 1

Full-length sequences of deduced amino acid sequences of HERB-1 AVR1, AVR2, AVR3a and AVR4.

https://doi.org/10.7554/eLife.00731.022
