Lifestyles shape genome size and gene content in fungal pathogens
Figures
Several genomic traits are correlated with genome size in Sordariomycetes.
(A) Maximum likelihood tree based on 1000 concatenated protein sequence alignments calculated with IQ-TREE using ultrafast bootstrap approximation (n=563 species). The largest orders are indicated with different colors. Bootstrap support for all major clades except one within Ophiostomatales (82%, black dot) reached 100%. (B) Spearman’s rank correlation for phylogenetic independent contrasts of all pairwise combinations of genomic traits. (C) Principal component analysis of genomic traits. Colors correspond to orders depicted in A. (D) Model of genome size. On the y axis are the model coefficients with 95% confidence intervals obtained with phylogenetic generalized least squares model (PGLS) fitted to 555 genomes. Principal components correspond to the three main principal components based on 11 genomic traits. (E) Testing hypotheses of genome size evolution. The first four plots show the correlation of dN/dS as a proxy of Ne with the non-coding elements of the genome. The last plot is a correlation of gene loss rate with genome size. Correlations were tested using Spearman’s rank correlation on phylogenetic independent contrasts. Loadings of each principal component from panel C are shown in Figure 1—figure supplement 1. Data underlying figures are in Figure 1—source data 1–4.
-
Figure 1—source data 1
Spearman’s rank correlation coefficients for pairs of genomic traits.
- https://cdn.elifesciences.org/articles/104975/elife-104975-fig1-data1-v1.xlsx
-
Figure 1—source data 2
Eigenvectors calculated on all genomic traits.
- https://cdn.elifesciences.org/articles/104975/elife-104975-fig1-data2-v1.xlsx
-
Figure 1—source data 3
Loadings calculated on all genomic traits.
- https://cdn.elifesciences.org/articles/104975/elife-104975-fig1-data3-v1.xlsx
-
Figure 1—source data 4
Contrasts calculated for genomic traits.
- https://cdn.elifesciences.org/articles/104975/elife-104975-fig1-data4-v1.xlsx
Loadings of the main PCs.
Principal component analysis of genomic traits. The histogram shows the proportion of variance explained by each PC. Dark gray PCs together explain 66% of the variance. The heatmap shows loadings for each PC, with brown colors depicting positive and green colors depicting negative contributions to the PC.
Genomic traits associated with pathogenicity.
(A) Time-scaled phylogeny of Sordariomycetes colored by the reconstructed genome size displayed in log scale. The outside heatmap indicates species which are pathogenic and insect-associated (IA). Letters and numbers correspond to clades used in subsequent analyses, with representative genera from each clade listed in the legend. The scale bar indicates the branch length in millions of years (My). (B) Association of 13 genomic traits with pathogenicity estimated with BayesTraits, phylogenetic logistic regression (Phyloglm), and Random Forest classifier. In BayesTraits and Phyloglm, colors correspond to statistically significant associations, either positive (brown) or negative (green). In Phyloglm, these are based on coefficient sign; in BayesTraits, they were inferred based on transition rates between binary traits estimated from the model. ‘YES’ indicates that dependent co-evolution was detected with BayesTraits but a uniform direction of association could not be deduced from transition rates. The gray color scale depicts the prevalence of species classified as pathogenic and non-pathogenic with trait size below the median (LOW) or above the median (HIGH). (C) The same three methods repeated for 10 subsets of the data with an equal number of pathogens and non-pathogens from each genome size bin. In the Random Forest analysis, the rank of each trait is given instead of its importance. Distributions of genomic traits are shown in Figure 2—figure supplement 1. Transition rates (modeled gains and losses of traits) are shown in Figure 2—figure supplement 2. Data underlying figures are in Figure 2—source data 1–4.
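The LOW/HIGH prevalence counts described in panel B can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the made-up genome sizes, and the choice to place values equal to the median in LOW are all assumptions made for illustration.

```python
from collections import Counter
from statistics import median


def binarize_by_median(values):
    """Label each value HIGH (above the median) or LOW (otherwise).

    Assumption: values exactly equal to the median are labeled LOW;
    the caption does not say how ties are handled.
    """
    cutoff = median(values)
    return ["HIGH" if v > cutoff else "LOW" for v in values]


def prevalence_table(trait_values, is_pathogen):
    """Count species in each (lifestyle, size class) combination."""
    classes = binarize_by_median(trait_values)
    counts = Counter()
    for size_class, pathogenic in zip(classes, is_pathogen):
        lifestyle = "pathogenic" if pathogenic else "non-pathogenic"
        counts[(lifestyle, size_class)] += 1
    return counts


# Hypothetical genome sizes (Mb) and lifestyle labels for six species.
genome_size_mb = [32.0, 35.5, 38.1, 41.0, 47.3, 60.2]
pathogen = [False, True, False, True, True, True]
table = prevalence_table(genome_size_mb, pathogen)
```

In this toy input, all three species above the median are pathogens, which is the kind of skew the gray scale in panel B would display.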
-
Figure 2—source data 1
Results of three tests of association of genomic traits with pathogenicity.
- https://cdn.elifesciences.org/articles/104975/elife-104975-fig2-data1-v1.xlsx
-
Figure 2—source data 2
Counts of species classified as pathogenic and non-pathogenic with traits of a given size class.
- https://cdn.elifesciences.org/articles/104975/elife-104975-fig2-data2-v1.xlsx
-
Figure 2—source data 3
Results of three tests of association of genomic traits with pathogenicity conducted in random balanced subsets.
- https://cdn.elifesciences.org/articles/104975/elife-104975-fig2-data3-v1.xlsx
-
Figure 2—source data 4
Counts of gains and losses among four transition types obtained in BayesTraits run.
- https://cdn.elifesciences.org/articles/104975/elife-104975-fig2-data4-v1.xlsx
Distributions of genomic traits across lifestyles.
Box plots indicate 25th (first quartile) and 75th percentiles (third quartile) of distributions, and dots indicate mean values. The whiskers indicate the maximum and minimum values within the range measured between the quartile and 1.5 times the interquartile range. IA stands for insect-associated. Sample sizes were 355 for pathogenic and 193 for non-pathogenic species, and 135 for IA and 415 for non-IA species.
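The box plot elements named in this caption (quartiles, whiskers limited to 1.5 times the interquartile range, and the mean) can be computed as in the sketch below. The function name and the example values are hypothetical, and the quartile convention (`method="inclusive"`) is an assumption; plotting libraries may use a slightly different one.

```python
from statistics import mean, quantiles


def box_plot_summary(values):
    """Return quartiles, whisker ends, mean, and points beyond the whiskers."""
    data = sorted(values)
    # Quartile convention is an assumption; other tools may interpolate differently.
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme observations still inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "mean": mean(data),
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


# Hypothetical trait values with one extreme observation.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
```

Here the upper whisker stops at 9 because 30 lies beyond the upper fence, while the mean (the dot in these plots) is still pulled upward by that value.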
Numbers of gains and losses among four transition types obtained in BayesTraits run.
Opaque colors show transitions significantly shifted towards gains or losses. The x axis lists all four possible transitions for the two traits (genomic trait and pathogenicity). Two states (before change, after change) are separated by an underscore. 1 stands for either the value of a trait above median or presence of pathogenicity; 0 stands for the value of a trait below median or absence of pathogenicity. For example, transition 01_11 indicates a gain of a trait (e.g., transition from a small to a large genome size, 0x_1x) and no change in pathogenicity (x1_x1).
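The transition labels can be read mechanically, as this small sketch shows. The function name and the returned dictionary keys are hypothetical; only the label layout (genomic trait first, pathogenicity second, states separated by an underscore) comes from the caption.

```python
def decode_transition(label):
    """Decode a label such as '01_11' into per-trait changes.

    In each two-character state, the first character is the genomic trait
    (1 = above median) and the second is pathogenicity (1 = present).
    """
    before, after = label.split("_")
    if len(before) != 2 or len(after) != 2 or set(before + after) - {"0", "1"}:
        raise ValueError(f"unexpected transition label: {label!r}")

    def change(old, new):
        if old == new:
            return "no change"
        return "gain" if (old, new) == ("0", "1") else "loss"

    return {
        "genomic_trait": change(before[0], after[0]),
        "pathogenicity": change(before[1], after[1]),
    }


example = decode_transition("01_11")
```

For the caption's example, `example` reports a gain of the genomic trait and no change in pathogenicity.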
Genomic traits associated with insect-association (IA).
(A) Association of 13 genomic traits with IA estimated with BayesTraits, phylogenetic logistic regression (Phyloglm), and Random Forest classifier. In BayesTraits and Phyloglm, colors correspond to statistically significant associations, either positive (brown) or negative (green). In Phyloglm, these are based on coefficient sign; in BayesTraits, they were inferred based on transition rates between binary traits estimated from the model. ‘YES’ indicates that dependent co-evolution was detected with BayesTraits but a uniform direction of association could not be deduced from transition rates. The gray color scale depicts the prevalence of species classified as IA and non-IA with trait size below the median (LOW) or above the median (HIGH). (B) Comparison of exon and intron metrics in 38 one-to-one orthologs between pairs of clades from Microascales (M), Ophiostomatales (O), and Diaporthales (D). In each comparison, one clade is IA (M1, O) and the other is non-IA (M2, D). Exon/intron metrics were averaged within each clade and compared using a paired Mann-Whitney U test. (C) Correlation between prevalence of gene families within clades and two exon metrics, in the same four clades. In all clades, more common gene families have longer and more numerous exons (negative binomial generalized linear model, p-values <0.05). The sample sizes per bin (number of gene families) varied between 8 and 3148 in clade O, 12 and 4158 in clade D, 19 and 3764 in clade M1, and 110 and 4740 in clade M2. Box plots in B and C indicate 25th (first quartile) and 75th percentiles (third quartile) of distributions, with median values in between. The whiskers indicate the maximum and minimum values within the range measured between the quartile and 1.5 times the interquartile range. Data underlying figures are in Figure 3—source data 1–4.
-
Figure 3—source data 1
Results of three tests of association of genomic traits with insect-association.
- https://cdn.elifesciences.org/articles/104975/elife-104975-fig3-data1-v1.xlsx
-
Figure 3—source data 2
Counts of species classified as insect-associated and non-insect-associated with traits of a given size class.
- https://cdn.elifesciences.org/articles/104975/elife-104975-fig3-data2-v1.xlsx
-
Figure 3—source data 3
Exon and intron metrics in 38 one-to-one orthologs.
- https://cdn.elifesciences.org/articles/104975/elife-104975-fig3-data3-v1.xlsx
-
Figure 3—source data 4
Prevalence of gene families within clades and two exon metrics, number, and length of exons.
- https://cdn.elifesciences.org/articles/104975/elife-104975-fig3-data4-v1.xlsx
Evolution of pathogens with different lifestyles.
(A) Models of insect-associated (IA) trait fitted with phylogenetic logistic regression using each of the five genomic traits and pathogenicity as predictors. Dots show coefficients of genomic traits for pathogenic and non-pathogenic species with 95% credible intervals; non-transparent colors indicate values whose credible intervals do not overlap zero. The numbers of species used in each model were 317 for clade H (216 pathogenic and 101 non-pathogenic species), 33 for clade M (25 pathogenic and 8 non-pathogenic species), and 106 for clade O/Ma/D/S (53 pathogenic and 53 non-pathogenic species). (B) Rates of gene loss and gain in four groups of species with different pathogenicity and IA status, estimated based on 527 small gene families using the birth and death model in the program CAFE v5. Estimates of gene evolutionary rates (λ) are shown above the box plots. Sample sizes were 267 non-IA pathogens, 147 non-IA non-pathogens, 88 IA pathogens, and 46 IA non-pathogens. Box plots indicate 25th (first quartile) and 75th percentiles (third quartile) of distributions, with median values in between. The whiskers indicate the maximum and minimum values within the range measured between the quartile and 1.5 times the interquartile range. Data underlying figures are in Figure 4—source data 1 and 2.
-
Figure 4—source data 1
Coefficients of genomic traits for pathogenic and non-pathogenic species with credible intervals estimated with phylogenetic logistic regression.
- https://cdn.elifesciences.org/articles/104975/elife-104975-fig4-data1-v1.xlsx
-
Figure 4—source data 2
Rates of gene loss and gain in four groups of species with different pathogenicity and insect-associated (IA) status.
- https://cdn.elifesciences.org/articles/104975/elife-104975-fig4-data2-v1.xlsx
Insect-vectored clades lose genes involved in breaking plant host barriers.
(A) Phylogenetic position of the selected clades. Names correspond to the ones in Figure 2A and colors correspond to general lifestyles explained in the legend in plot B. These broad categories indicate clades dominated by plant pathogenic fungi (H1, G, D), entomopathogens (H2.4, H2.6, H2.3), insect-vectored species (M1, O, H2.8, H2.2), saprotrophs (S), plant symbionts (H2.1, H2.2) or mixed lifestyle groups. The base of each triangle indicates the span between the minimum and maximum branch length for the given clade. The height of the triangles was scaled by a factor of 0.2. Empty branches correspond to smaller clades whose members are not shown. (B) Means (dots) and standard deviations (error bars) of number of genes and genome size for selected clades. Sample sizes for each clade are given in panel C in parentheses under the clade identifier. (C) Heatmap shows the fold change of genes/clusters relative to the ancestral state ((observed - ancestral)/ancestral). Clades are shown in columns with the number of clade members in parentheses; functional classes are shown in rows. The dots indicate significant gain (brown) or loss (green) of genes/clusters across clade members estimated from 100 rounds of bootstrapping of 10 species in clades with ≥ 10 members. SMC: secondary metabolite clusters, CAZy: carbohydrate-active enzymes. The same heatmap but for pathogenic species only is shown in Figure 5—figure supplement 1. Data underlying figures are in Figure 5—source data 1–4.
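The fold-change formula in panel C, together with the bootstrap over clade members, can be sketched as follows. The formula is taken from the caption; everything else is an assumption made for illustration, including sampling with replacement, the approximate 95% percentile interval, the rule that an interval excluding zero counts as significant, and the made-up gene counts.

```python
import random
from statistics import mean


def fold_change(observed, ancestral):
    """Fold change relative to the ancestral state: (observed - ancestral) / ancestral."""
    return (observed - ancestral) / ancestral


def bootstrap_fold_change(observed_counts, ancestral, rounds=100, sample_size=10, seed=0):
    """Bootstrap the mean fold change across clade members.

    Assumptions (not stated in the caption): species are drawn with
    replacement, and a gain or loss is called when an approximate 95%
    percentile interval of the bootstrap means excludes zero.
    """
    if len(observed_counts) < sample_size:
        raise ValueError("clade has fewer members than the bootstrap sample size")
    rng = random.Random(seed)
    means = []
    for _ in range(rounds):
        sample = rng.choices(observed_counts, k=sample_size)
        means.append(mean(fold_change(count, ancestral) for count in sample))
    means.sort()
    low = means[int(0.025 * rounds)]
    high = means[int(0.975 * rounds) - 1]
    if high < 0:
        call = "loss"
    elif low > 0:
        call = "gain"
    else:
        call = "not significant"
    return {"mean": mean(means), "low": low, "high": high, "call": call}


# Hypothetical CAZy gene counts for 11 clade members; ancestral count of 20.
clade_counts = [12, 14, 15, 13, 11, 16, 12, 14, 13, 15, 12]
result = bootstrap_fold_change(clade_counts, ancestral=20)
```

Because every hypothetical member has fewer genes than the ancestor, each bootstrap mean is negative and the sketch calls a loss, which would correspond to a green dot in the heatmap.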
-
Figure 5—source data 1
Fold change of genes/clusters relative to the ancestral state.
- https://cdn.elifesciences.org/articles/104975/elife-104975-fig5-data1-v1.xlsx
-
Figure 5—source data 2
Mean and high and low confidence intervals of fold change from bootstrapping species within clades.
- https://cdn.elifesciences.org/articles/104975/elife-104975-fig5-data2-v1.xlsx
-
Figure 5—source data 3
Fold change of genes/clusters relative to the ancestral state in pathogenic species only.
- https://cdn.elifesciences.org/articles/104975/elife-104975-fig5-data3-v1.xlsx
-
Figure 5—source data 4
Mean and high and low confidence intervals of fold change from bootstrapping species within clades in pathogenic species only.
- https://cdn.elifesciences.org/articles/104975/elife-104975-fig5-data4-v1.xlsx
Insect-associated pathogens lose genes involved in breaking host barriers.
Heatmap shows the fold change in the number of genes/clusters relative to the ancestral state for pathogenic species only. Clades are shown in columns (clade names correspond to the ones in Figure 2A) with the number of clade members in parentheses; functional classes are shown in rows. The dots indicate significant gain (red) or loss (green) of genes/clusters across clade members estimated from 100 rounds of bootstrapping of 10 species in clades with ≥ 10 members. SMC: secondary metabolite clusters, CAZy: carbohydrate-active enzymes.
Distribution of trait values calculated for long-read assembly (orange) and short-read assembly (blue) species.
Vertical lines indicate the position of median values.
Distribution of genomic trait values for short and long-read assemblies separately for pathogenic (1) and non-pathogenic (0) species.
Distribution of genomic trait values for short and long-read assemblies separately for IA (insect-associated) (1) and non-IA (0) species.
Comparison of assembly length and repeat annotations between matching species from our dataset (NCBI) and JGI MycoCosm.
In box plots, categories ‘1’ correspond to pathogens or IA (insect-associated) species, and ‘0’ to non-pathogens or non-IA species.
Comparison of gene annotations between matching species from our dataset (NCBI) and JGI MycoCosm.
In box plots, categories ‘1’ correspond to pathogens or IA (insect-associated) species, and ‘0’ to non-pathogens or non-IA species.
Gene trait values as a function of distance from the five species or genera used in Augustus training.
Distributions and medians of gene traits compared between all species from Hypocreales and Microascales retrieved from JGI MycoCosm, and all species from the same two orders in our dataset annotated with Augustus using ‘fusarium’ for training.
Species were split into three groups with increasing distance from genus Fusarium, namely all Fusarium species in Hypocreales order, all other Hypocreales, and Microascales.
Tables
Proportion of species with different lifestyles among short and long-read assemblies.
| Reads | Non-pathogen | Pathogen | Non-IA | IA |
|---|---|---|---|---|
| long | 0.37 | 0.63 | 0.77 | 0.23 |
| short | 0.35 | 0.65 | 0.75 | 0.25 |
Additional files
-
Supplementary file 1
Supplementary tables.
(A) Information on Sordariomycetes genome assemblies, including assembly identifiers, quality statistics, and estimated trait values. (B) Sequencing information for species sequenced in this study. (C) Log marginal likelihoods and Bayes factors from likelihood ratio tests comparing different models of coevolution of pathogenicity with genomic traits. (D) Log marginal likelihoods and Bayes factors from likelihood ratio tests comparing different models of coevolution of insect-association with genomic traits.
- https://cdn.elifesciences.org/articles/104975/elife-104975-supp1-v1.xlsx
-
MDAR checklist
- https://cdn.elifesciences.org/articles/104975/elife-104975-mdarchecklist1-v1.docx