Several genomic traits are correlated with genome size in Sordariomycetes.

A. Maximum likelihood tree based on 1000 concatenated protein sequence alignments calculated with IQ-TREE using ultrafast bootstrap approximation (n=563 species). The largest orders are indicated with different colors. Bootstrap support for all major clades except one within Ophiostomatales (82%, black dot) reached 100%. B. Spearman’s rank correlation for phylogenetically independent contrasts of all pairwise combinations of genomic traits. C. Principal component analysis of genomic features. Colors correspond to orders depicted in A. D. Model of genome size. The y axis shows the model coefficients with 95% confidence intervals obtained with phylogenetic generalized least squares (PGLS). Principal components correspond to the three main principal components based on 11 genomic traits. E. Testing hypotheses of genome size evolution. The first four plots show the correlation of dN/dS, as a proxy of Ne, with the non-coding elements of the genome. The last plot shows the correlation of gene loss rate with genome size. Correlations were tested using Spearman’s rank correlation on phylogenetically independent contrasts. Loadings of each principal component from panel C are shown in Supplementary Figure 1. Raw data underlying figures are in Figure 1-Source Data 1-3.

Genomic traits associated with pathogenicity.

A. Time-scaled phylogeny of Sordariomycetes colored by the reconstructed genome size displayed in log scale. The outside heatmap indicates species that are pathogenic and insect-associated (IA). Letters and numbers correspond to clades used in subsequent analyses, with representative genera from each clade listed in the legend. B. Association of 13 genomic traits with pathogenicity estimated with BayesTraits, phylogenetic logistic regression (Phyloglm), and random forest classifier. In BayesTraits and Phyloglm, colors correspond to statistically significant associations, either positive (brown) or negative (green). In Phyloglm these are based on coefficient sign; in BayesTraits they were inferred based on transition rates between binary traits estimated from the model. “YES” indicates that dependent co-evolution was detected with BayesTraits but a uniform direction of association could not be deduced from transition rates. Grey color scale depicts prevalence of species classified as pathogenic and non-pathogenic with trait size below median (LOW) or above median (HIGH). C. Same three methods repeated for 10 subsets of the data with an equal number of pathogens and non-pathogens from each genome size bin. In the random forest analysis, the rank of each feature, rather than its importance, is given. Transition rates (modeled gains and losses of traits) are shown in Supplementary Figure 2. Raw data underlying figures are in Figure 2-Source Data 1-2.
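
The LOW/HIGH prevalence shown in the grey color scale can be illustrated with a minimal Python sketch. It assumes that each trait is split at its median and that species are then counted per bin and lifestyle. The function name `prevalence_by_median`, the example values, and the handling of ties (values equal to the median go to LOW) are assumptions made for illustration, not details taken from the analysis.

```python
from statistics import median


def prevalence_by_median(trait_values, is_pathogenic):
    """Count species in each (trait bin, lifestyle) cell.

    Assumption: values equal to the median are placed in the LOW bin.
    """
    cutoff = median(trait_values)
    counts = {
        ("LOW", False): 0,
        ("LOW", True): 0,
        ("HIGH", False): 0,
        ("HIGH", True): 0,
    }
    for value, pathogenic in zip(trait_values, is_pathogenic):
        bin_label = "HIGH" if value > cutoff else "LOW"
        counts[(bin_label, bool(pathogenic))] += 1
    return counts


# Hypothetical genome sizes (Mb) and pathogenicity labels for six species.
genome_sizes = [30.0, 35.0, 40.0, 45.0, 50.0, 55.0]
pathogenic = [False, False, True, False, True, True]

counts = prevalence_by_median(genome_sizes, pathogenic)
for (bin_label, label), n in counts.items():
    print(bin_label, "pathogenic" if label else "non-pathogenic", n)
```

Dividing each count by the total number of species in its lifestyle class would turn the counts into proportions.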

Genomic traits associated with insect association (IA).

A. Association of 13 genomic traits with IA estimated with BayesTraits, phylogenetic logistic regression (Phyloglm), and random forest classifier. In BayesTraits and Phyloglm, colors correspond to statistically significant associations, either positive (brown) or negative (green). In Phyloglm these are based on coefficient sign; in BayesTraits they were inferred based on transition rates between binary traits estimated from the model. “YES” indicates that dependent co-evolution was detected with BayesTraits but a uniform direction of association could not be deduced from transition rates. Grey color scale depicts prevalence of species classified as IA and non-IA with trait size below median (LOW) or above median (HIGH). B. Comparison of exon and intron metrics in 38 one-to-one orthologs in two pairwise clade comparisons involving Microascales (M), Ophiostomatales (O) and Diaporthales (D). In each comparison, one clade is IA (M1, O) and the other is non-IA (M2, D). Exon/intron metrics were averaged within each clade and compared using a paired Mann-Whitney U test. C. Correlation between prevalence of gene families within clades and two exon metrics, in the same four clades. In all clades, more common gene families have longer and more exons (negative binomial generalized linear model, P-values < 0.05). Raw data underlying figures are in Figure 3-Source Data 1-3.

Evolution of pathogens with different lifestyles.

A. Models of the IA trait using phylogenetic logistic regression, with each of the five genomic traits and pathogenicity as predictors. Coefficients of genomic traits for pathogenic and non-pathogenic species with 95% credible intervals are shown. B. Rates of gene loss and gain in four groups of species with different pathogenicity and IA status, estimated based on 527 small gene families using the birth-and-death model in the program CAFE v5. Estimates of gene evolutionary rates (λ) are shown above the boxplots. Raw data underlying figures are in Figure 4-Source Data 1-2.

Insect-vectored clades lose genes involved in breaking plant host barriers.

Heatmap shows the fold change in the number of genes/clusters relative to the ancestral state ((observed - ancestral) / ancestral). Clades are shown in columns (clade names correspond to the ones in Figure 2A) with the number of clade members in parentheses; functional classes are shown in rows. Clades comprise plant pathogenic fungi (H1, G, D), entomopathogens (H2.4, H2.6, H2.3), insect symbionts (H2.2), insect-vectored species (M1, O, H2.8), saprotrophs (S, H2.1), or mixed lifestyle groups. The dots indicate significant gain (brown) or loss (green) of genes/clusters across clade members estimated from 100 rounds of bootstrapping of 10 species in clades with >= 10 members. SMC: secondary metabolite clusters, CAZy: carbohydrate-active enzymes. The same heatmap for pathogenic species only is shown in Supplementary Figure 3. Raw data underlying figures are in Figure 5-Source Data 1-4.
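
The fold change and the resampling scheme described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure used for the figure: each round is assumed to draw 10 species without replacement, the mean fold change is used as the summary statistic, and the names `fold_change` and `bootstrap_mean_fold_changes` as well as the example counts are hypothetical. The legend does not state how significance was derived from the rounds, so the sketch stops at the resampled means.

```python
import random


def fold_change(observed, ancestral):
    """Fold change relative to the ancestral state: (observed - ancestral) / ancestral."""
    if ancestral == 0:
        raise ValueError("fold change is undefined for an ancestral count of zero")
    return (observed - ancestral) / ancestral


def bootstrap_mean_fold_changes(observed_counts, ancestral, rounds=100,
                                sample_size=10, seed=0):
    """Mean fold change in each of `rounds` random subsamples of `sample_size` species.

    Assumption: species are drawn without replacement within a round.
    """
    if len(observed_counts) < sample_size:
        raise ValueError("clade has fewer members than the requested sample size")
    rng = random.Random(seed)
    means = []
    for _ in range(rounds):
        sample = rng.sample(observed_counts, sample_size)
        means.append(sum(fold_change(c, ancestral) for c in sample) / sample_size)
    return means


# Hypothetical gene counts for one functional class in a clade of 12 species,
# with a hypothetical reconstructed ancestral count of 20.
clade_counts = [12, 10, 8, 9, 11, 7, 6, 10, 9, 8, 7, 9]
means = bootstrap_mean_fold_changes(clade_counts, ancestral=20)
print(min(means), max(means))
```

In this toy example every species has fewer genes than the ancestor, so every resampled mean is negative, which corresponds to a consistent loss across clade members.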

Loadings of the main PCs.

Principal component analysis of genomic features. Histogram shows proportion of variance explained by each PC. Dark gray PCs together explain 66% of the variance. Heatmap shows loadings for each PC, with brown colors depicting positive, and green colors depicting negative contributions to the PC.

Numbers of gains and losses among four transition types obtained in the BayesTraits run.

Opaque colors show transitions significantly shifted towards gains or losses. The x axis lists all four possible transitions for the two traits (genomic trait and pathogenicity). Two states (before change, after change) are separated by an underscore. 1 stands for either a value of a trait above median or presence of pathogenicity; 0 stands for a value of a trait below median or absence of pathogenicity. For example, transition 01_11 indicates a gain of a trait (e.g., transition from a small to a large genome size, 0x_1x) and no change in pathogenicity (x1_x1).
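
The label encoding can be made explicit with a short Python sketch. It assumes only what the legend states: the first digit of each state is the genomic trait, the second digit is pathogenicity, and the two states are separated by an underscore. The function name `describe_transition` is hypothetical.

```python
def describe_transition(label):
    """Decode a label such as '01_11' into the change in each of the two traits."""
    before, after = label.split("_")
    if len(before) != 2 or len(after) != 2 or set(before + after) - {"0", "1"}:
        raise ValueError(f"unexpected transition label: {label!r}")
    names = ("genomic trait", "pathogenicity")
    changes = {}
    for name, start, end in zip(names, before, after):
        if start == end:
            changes[name] = "no change"
        elif (start, end) == ("0", "1"):
            changes[name] = "gain"
        else:
            changes[name] = "loss"
    return changes


print(describe_transition("01_11"))
# {'genomic trait': 'gain', 'pathogenicity': 'no change'}
```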

Insect-associated pathogens lose genes involved in breaking host barriers.

Heatmap shows the fold change in the number of genes/clusters relative to the ancestral state for pathogenic species only. Clades are shown in columns (clade names correspond to the ones in Figure 2A) with the number of clade members in parentheses; functional classes are shown in rows. The dots indicate significant gain (red) or loss (green) of genes/clusters across clade members estimated from 100 rounds of bootstrapping of 10 species in clades with >= 10 members. SMC: secondary metabolite clusters, CAZy: carbohydrate-active enzymes.