1. Genomics and Evolutionary Biology
Download icon

Codon optimization underpins generalist parasitism in fungi

  1. Thomas Badet
  2. Remi Peyraud
  3. Malick Mbengue
  4. Olivier Navaud
  5. Mark Derbyshire
  6. Richard P Oliver
  7. Adelin Barbacci
  8. Sylvain Raffaele Is a corresponding author
  1. LIPM, Université de Toulouse, INRA, CNRS, France
  2. Curtin University, Australia
Short Report
Cited
0
Views
1,643
Comments
0
Cite as: eLife 2017;6:e22472 doi: 10.7554/eLife.22472

Abstract

The range of hosts that parasites can infect is a key determinant of the emergence and spread of disease. Yet, the impact of host range variation on the evolution of parasite genomes remains unknown. Here, we show that codon optimization underlies genome adaptation in broad host range parasites. We found that the longer proteins encoded by broad host range fungi likely increase natural selection on codon optimization in these species. Accordingly, codon optimization correlates with host range across the fungal kingdom. At the species level, biased patterns of synonymous substitutions underpin increased codon optimization in a generalist but not a specialist fungal pathogen. Virulence genes were consistently enriched in highly codon-optimized genes of generalist but not specialist species. We conclude that codon optimization is related to the capacity of parasites to colonize multiple hosts. Our results link genome evolution and translational regulation to the long-term persistence of generalist parasitism.

https://doi.org/10.7554/eLife.22472.001

Introduction

The host range of a parasite has a central influence on the emergence and spread of disease (Woolhouse and Gowtage-Sequeria, 2005). There is a clear demarcation between specialist parasites that can only infect one or a few closely related host species and generalists that can infect more than a hundred unrelated host species (Woolhouse et al., 2001; Barrett et al., 2009). The host range is a trait constrained by ecological and physiological factors and determined by the evolutionary history of a parasite and its potential hosts (Poulin and Keeney, 2008). Host specialization is a frequent pattern in living systems that was proposed to result from tradeoffs in performance on multiple hosts (Joshi and Thompson, 1995). At the molecular level, optimization by natural selection of parasite proteins functioning in the interaction with a given host may be detrimental to their function in another species (Dodds and Rathjen, 2010; Dong et al., 2014). Nevertheless, generalists able to thrive on a broad range of hosts are found in most parasite lineages. The molecular mechanisms associated with the evolution of generalist parasitism remain, however, elusive. Notably, the extent to which interaction with multiple hosts affects the evolution of parasite genomes remains unknown.

Genomic analyses of obligate specialist parasites revealed massive losses of enzymes unnecessary for growth and reproduction on living hosts (Schirawski et al., 2010; Spanu et al., 2010). Instead, such specialists rely largely on a sets of small secreted protein (SSPs) subverting host cell function. For instance, the genome of the powdery mildew pathogen Blumeria graminis f. sp. hordei harbors 180 SSPs of about 100–150 amino acids (Pedersen et al., 2012). SSPs of specialist fungi often show evidence of diversifying-selection leading to extensive changes in their protein sequence (Stukenbrock et al., 2011; Pedersen et al., 2012; Hacquard et al., 2013). By contrast, the secretion of a battery of degrading enzymes is crucial for the parasitic success of generalists (Hu et al., 2014; Kubicek et al., 2014). The genomes of generalist fungi notably contain in average ~3 times more carbohydrate activity enzymes (CAZYmes) than specialist fungi (Zhao et al., 2013). For instance, Rhizoctonia solani AG2-2IIIB contains over 1000 CAZYme genes (Wibberg et al., 2016). Protein translation is the largest consumer of energy during cellular proliferation, and because secreted proteins are not recycled like other cellular proteins, strong constraints exist to reduce the synthetic cost of secreted proteins in microbes (Smith and Chapman, 2010). Considering the distinct properties of secreted proteins in specialist and generalist parasites, the impact of protein translation efficiency on the cost of secreted protein synthesis likely differ in these organisms.

The differential use of synonymous codons in a genome, or codon usage bias, affects gene expression level, protein folding, translation efficiency and accuracy (Plotkin and Kudla, 2011). In particular, the co-evolution of codons with the genomic tRNA complement, leading to codon optimization, results from the combination of neutral processes and selection for the optimization of protein translation (Hershberg and Petrov, 2008; Shah and Gilchrist, 2011). Natural selection for the optimization of protein translation, through ribosome pausing time, translation error rates and co-translational protein folding, has been widely associated with codon usage biases (Drummond and Wilke, 2008; Tuller et al., 2010; Shah and Gilchrist, 2011). Because the probability of protein translation error increases with protein length, selection for translation accuracy can be higher in genes encoding long proteins, as was observed in E. coli (Eyre-Walker, 1996). Codon usage bias also increases with the evolutionary age of genes, the frequency of codons optimal for translation being significantly higher at codons for conserved amino acids (Marais and Duret, 2001; Prat et al., 2009). Considering that highly expressed secreted proteins are often longer and more conserved in generalist than in specialist parasites, we expect natural selection on the optimization of codons to be stronger in generalist parasites than in host specialists.

The cost of protein translation leads to a tradeoff between protein production and cell growth rate (Scott et al., 2010; Shah et al., 2013; Kafri et al., 2016). Codon optimization therefore impacts cell growth in selective environments (Botzman and Margalit, 2011; Krisko et al., 2014), and could positively impact on the performances of parasites regardless of the host they encounter. This hypothesis would be consistent with generalist lineages performing better on average on their set of available hosts than co-occurring specialist competitors to persist in the biota, as predicted by the 'jack of all trades, master of none' theoretical model (Futuyma and Moreno, 1988). Here, we have examined the strength of natural selection on the optimization of codons in parasites with contrasted host range across the fungal kingdom. Because the genomes of broad host range parasites generally encode longer proteins, an in silico model of the cell translation machinery predicts that natural selection on codon optimization should be stronger in these species. Consistent with this prediction, we found that codon optimization at the genome scale correlates with fungal parasites host range. High codon optimization and broad host ranges co-evolved multiple times independently through fungal phylogeny. To document the molecular bases of codon optimization at the species level in generalist fungi, we compared single-nucleotide polymorphism (SNP) patterns in natural populations of a generalist and a specialist fungal parasite. We detected signatures of purifying natural selection acting on optimal codons in the generalist parasite only. Finally, we show that genes associated with virulence are better codon-optimized in generalist than in specialist parasite genomes supporting codon optimization as an adaptation to the colonization of multiple hosts. Together, our results reveal patterns of adaptive genome evolution associated with generalism that are conserved through the fungal Kingdom. Furthermore, we establish a link between translational regulation and host range variation through genomic patterns of codon optimization, contributing to our understanding of the dynamics of disease epidemics.

Results

Long proteins encoded by the genome of generalist fungi likely increase natural selection on codon optimization

The efficiency and accuracy of protein translation constrains cell growth rate and depends on sequence properties of expressed proteins (Ingolia et al., 2009; Kafri et al., 2016). To evaluate the impact of translation efficiency on cell growth during host colonization, we calculated theoretical maximal growth rates for cells of generalist and specialist fungal parasites. For this, we designed an in silico model of the cell translation machinery describing the accumulation over time of ribosomal proteins, other intracellular proteins and secreted proteins from which the maximal cell growth rate can be deduced (see ‘Materials and methods’, Figure 1—source data 1). To calculate the typical protein synthesis rate, the model uses the median length of ribosomal, intracellular and secreted proteins as input. We determined these values in the complete predicted proteomes of 13 host-specialist fungi (less than four host genera) and 15 generalist fungi (over 10 host genera) all belonging to different genera and covering most major fungal clades (Figure 1A, Figure 1—source data 1). In average, secreted proteins were ~14.8% longer in generalists than in specialists (365 and 318 codons respectively; Welch’s t-test p<10−08). Intracellular proteins were also longer in average in generalists than in specialists (~10.1% longer with 381 and 346 codons respectively; Welch’s t-test p<10−08), whereas ribosomal proteins were slightly shorter in average in generalists than in specialists (~1.2% shorter with 189 and 192 codons respectively; Welch’s t-test p<10−04). In yeast, codon decoding rate varies from 10 to 28 codons per second, with an average of 20 codons per second (Gardin et al., 2014). We tested the impact of varying the average codon decoding rate from 10 to 28 codons per second on the maximal cell growth rate predicted by our model for generalist and specialist fungal cells (Figure 1B). Cells of specialists achieved higher maximal growth rates than cells of generalists at a given codon decoding rate. Reciprocally, cells of generalists require higher codon decoding rate to reach maximal growth rates similar to cells of specialists. For instance, in order to achieve a growth rate of 4000 cells per 24 hr, the codon decoding rate should be 21.7 codons per second in generalists but only 19.9 codons per second in specialists.

Contrasted length distribution in proteomes is expected to increase selection on codon optimization in generalist fungi.

(A) Distribution of length (number of codons) in the ribosomal, intracellular and predicted secreted proteomes of 13 specialist fungal parasites and 15 generalist fungal parasites. (B) Relationship between codon decoding rate (number of codons translated per second) and cell growth rate of typical specialist and generalist fungi predicted by a cellular model of protein translation. Dotted lines highlight the higher codon decoding rate required in generalist fungi compared to specialist fungi to achieve a growth rate of 4000 cells produced per day. (C) Distribution of length in the proteomes (left) and relationship between codon decoding rate and cell growth rate (right) in related specialist and generalist fungi from the Basidiomycetes, Ascomycetes and Basal Fungi. The width of boxplots is proportional to the number of values. Spe., specialist (green); Gen., generalist (brown). Welch’s t-test p: *<10−01, **<10−04, ***<10−08.

https://doi.org/10.7554/eLife.22472.002

To test the robustness of these observations across the fungal phylogeny, we compared related specialist and generalist species from three major fungal groups: Basidiomycetes, Ascomycetes and Basal Fungi (Figure 1C). In all three groups, secreted proteins were longer in average in generalist species than in specialist species. For instance, secreted proteins of the generalist Botrytis cinerea were in average ~73% longer (394 amino acids) than secreted proteins of the specialist Blumeria graminis f. sp. tritici (228 amino acids; Welch’s t-test p<10−08). Length differences in ribosomal and other intracellular proteins were not consistent through all three clades. Overall, our cell translation machinery model showed in all three cases that generalist species require higher codon decoding rates to achieve similar growth rates as their specialist relatives.

The in silico cell model used here suggests that longer proteins, especially secreted proteins, encoded in the genome of generalist fungi limits maximal cell growth rates compared to specialist fungi. It also shows that codon optimization, leading to higher codon decoding rates, can support the secretion of more complex proteins with limited growth penalty. Although additional factors may contribute, we show that longer protein pools are sufficient to increase the constraints on codon optimization in generalist fungi. We therefore expect natural selection on the optimization of codons to be stronger in generalist parasites than in host specialists.

Codon optimization correlates with fungal parasites host range

To test this hypothesis, we analyzed patterns of codon optimization in the genomes of 36 fungal parasite species, including the 15 generalist parasites and 13 specialists analyzed previously, as well as 8 species with intermediate host range (between 4 and 10 host genera) (Table 1). For this, we first used a model that infers the degree ‘S’ of coadaptation between codon usage and the genomic tRNA complement at the whole genome scale (dos Reis et al., 2004). S values ranged from −0.032 (Puccinia graminis) to 0.843 (Cryptococcus neoformans), with an average of 0.374. We performed a correlation analysis and found that codon optimization at the whole genome scale increased with host range (Spearman rho = 0.82, p=7.1 10−10, Figure 2A). Generalist species had an average S of 0.58, whereas the average S was 0.23 for specialists. We verified the significance of the correlation between host range and genome scale codon optimization using Felsenstein’s phylogenetic independent contrasts, and obtained Spearman rho of 0.6 (p=1.5 10−05).

Codon optimization correlates with host range in fungal parasites.

Genome-scale codon optimization correlates with host range in 36 parasites across the kingdom Fungi. Species considered as specialists (less than four host genera) are shown in green, species considered as generalists (over 10 host genera) are shown in brown. Error bars show 95% confidence interval, dotted line shows logarithmic regression of the data. Codon optimization calculated based on knowledge on the tRNA pool (A), on codon usage in ribosomal protein genes (B) and on self-consistent relative codon adaptation (C) correlated with host range at Spearman ρ≥0.59 (p≤2.7 10−05) under phylogenetic independent contrasts. (D) Codon optimization is stronger in core orthologs from generalist fungal parasites than in core orthologs from specialist fungal parasites. Left: Distribution of tRNA adaptation indices in 1620 core ortholog genes show significantly higher values in generalist fungi (**, Welch’s t-test p<0.01). Right: Codon optimization calculated as the degree of coadaptation of core ortholog genes to the genomic tRNA pool is significantly higher in generalist fungi (***, Welch’s t-test p<10−04).

https://doi.org/10.7554/eLife.22472.004
Table 1

List of fungal species analyzed in this work, their host range and genome-scale codon optimization values. Genome-scale codon optimization was calculated using tRNA adaptation indiced (S), codon adaptation indices (SCAI) or self-consistent normalized relative codon adaptation indices (SscnRCA).

https://doi.org/10.7554/eLife.22472.007

Species

Host range

Class*

S

SCAI

SscnRCA

Cryptococcus neoformans

800

Gen.

0.843

0.870

0.860

Rhizoctonia solani

690

Gen.

0.432

0.624

0.583

Botrytis cinerea

556

Gen.

0.597

0.675

0.653

sclerotinia sclerotiorum

332

Gen.

0.524

0.583

0.553

Beauveria bassiana

269

Gen.

0.616

0.619

0.747

Metarhizium acridum

228

Gen.

0.647

0.595

0.712

Aspergillus fumigatus

175

Gen.

0.609

0.643

0.684

Batrachochytrium dendrobatidis

153

Gen.

0.545

0.521

0.602

Verticilium dahliae

78

Gen.

0.537

0.426

0.627

Fusarium graminearum

72

Gen.

0.788

0.818

0.819

Colletotrichum graminicola

59

Gen.

0.572

0.444

0.644

Rhizopus oryzae

28

Gen.

0.625

0.563

0.691

Penicillium digitatum

17

Gen.

0.545

0.602

0.630

Alternaria brassicicola

16

Gen.

0.400

0.592

0.628

Pyrenophora tritici-repentis

11

Gen.

0.419

0.569

0.570

Pseudogymnoascus destructans

8

0.484

0.513

0.505

Encephalitozoon intestinalis

7

0.313

0.558

0.528

Melampsora larici-populina

7

0.194

0.142

0.055

Stagonospora nodorum

7

0.155

0.431

0.462

Colletotrichum higginsianum

6

0.393

0.295

0.480

Sporisorium reilianum

5

0.476

0.200

0.429

Taphrina deformans

4

0.403

0.679

0.671

Magnaporthe oryzae

4

0.437

0.424

0.575

Puccinia graminis

2

Spe.

−0.032

0.344

0.249

Wolfiporia cocos

2

Spe.

0.217

0.266

0.316

Moniliophthora roreri

2

Spe.

0.406

0.530

0.551

Passalora fulva

2

Spe.

−0.011

0.513

0.514

Rozella allomycis

1

Spe.

0.236

0.096

0.259

Nosema ceranae

1

Spe.

0.021

−0.155

−0.085

Puccinia triticina

1

Spe.

0.204

0.501

0.472

Dothistroma septosporum

1

Spe.

0.211

0.447

0.354

Pseudocercospora fijiensis

1

Spe.

0.227

0.495

0.490

Zymoseptoria tritici

1

Spe.

−0.019

0.355

0.248

Blumeria graminis

1

Spe.

0.116

−0.187

0.164

Erysiphe necator

1

Spe.

0.174

−0.092

0.117

Ophiocordyceps unilateralis

1

Spe.

0.166

0.268

0.458

Gonapodya prolifera

0

np

0.115

0.311

0.363

Rhodotorula toruloides

0

np

0.344

0.385

0.470

Serpula lacrymans

0

np

−0.001

0.379

0.395

Laccaria bicolor

0

np

−0.026

0.237

0.155

Agaricus bisporus

0

np

0.361

0.432

0.400

Tuber melanosporum

0

np

0.230

0.319

0.349

Oidiodendron maius

0

np

0.147

0.442

0.388

Myceliophthora thermophila

0

np

0.246

0.123

0.323

Chaetomium globosum

0

np

0.148

0.204

0.342

  1. *The class column indicates whether species were considered as generalist (Gen.), specialist (Spe.) or non-parasitic (np).

Comparing genes codon usage with codon usage for a reference set of highly expressed genes is a classical approach for estimating codon usage bias independently of knowledge of the tRNA repertoire. We determined the codon adaptation index (CAI) (Sharp and Li, 1987) using ribosomal protein genes as a reference set. To allow comparisons of codon optimization across species, we calculated for each species the degree ‘SCAI’ of correlation between codon usage and CAI at the whole genome scale. SCAI ranged from −0.187 (Blumeria graminis f. sp. tritici) to 0.87 (Cryptococcus neoformans), with an average of 0.438 (Figure 2B). SCAI increased with host range (Spearman rho = 0.61, p=9.0 10−06 under phylogenetic independent contrasts) and generalist species had an average SCAI of 0.61, whereas the average SCAI was 0.26 for specialists. O’Neill et al. recently developed the self-consistent normalized Relative Codon Adaptation index (scnRCA) as a measure of codon usage bias that does not rely on a reference gene set and corrects for mutational biases (O'Neill et al., 2013). We calculated for each species the degree ‘SscnRCA’ of correlation between codon usage and scnRCA at the whole genome scale. SscnRCA ranged from −0.085 (Nosema ceranae) to 0.853 (Cryptococcus neoformans), with an average of 0.495 (Figure 2C). SscnRCA increased with host range (Spearman rho = 0.59, p=2.7 10−05 under phylogenetic independent contrasts) and generalist species had an average SscnRCA of 0.67, whereas the average SscnRCA was 0.32 for specialists.

To control for the confounding effects of heterogeneity in the completeness of genome assemblies, we first verified that codon optimization was related to the number of hosts but not genome assembly parameters at ANOVA p<0.01 (Figure 2—source data 1). Comparing codon usage bias for gene sets conserved across species is another established method to compare codon optimization across species independent of genome completeness. We identified a set of 1620 core ortholog genes (COGs) conserved across all the 36 fungal parasite species (Figure 2—source data 2). The tRNA adaptation indices (tAIs) were significantly higher in the orthologs of generalist genomes (median value is 0.361) than in orthologs of specialist genomes (median value is 0.337; Welch’s t-test p<0.01) (Figure 2D). We calculated S for COGs in each genome and found that S is significantly higher for COGs in generalist genomes (median is 0.774) than in specialist genomes (median is 0.167; Welch’s t-test p=9.9 10−06) (Figure 2D). These different analyses converge toward the conclusion that codon optimization at the whole genome scale correlates with fungal parasites host range.

Codon optimization and host range co-evolved multiple times across fungal phylogeny

To document the evolution of codon optimization in fungi, we reconstructed the ancestral state of codon optimization in fungal phylogeny. To this end, we built a phylogeny of our 36 fungal parasites and 9 non-parasitic fungi distributed among all major clades covered in our study (Figure 3A, Figure 3—source data 1 and 2). We determined the degree of codon optimization in non-parasitic fungal genomes and found that codon optimization was generally lower in non-parasitic fungi than in their generalist relatives (average values for non parasitic fungi were S = 0.157, SCAI = 0.315 and SscnRCA = 0.354). This does not exclude that some lineages of non-parasitic fungi could have evolved high codon optimization, but supports the view that codon optimization increased after the divergence of generalist parasites. To support this conclusion, we inferred the ancestral degree of codon optimization at internal nodes of fungal phylogeny using ML ancestral state estimation (Revell, 2012) (Figure 3A). The fungal parasites species used in this work represented four independent evolutionary paths to extreme generalism (over 100 host genera) with non-parasitic or specialists in the same clade, in the Chytridiomycota, the Agaricomycotina, the Leotiomycetes and the Sordariomycetes (Figure 3A). In all four clades, generalism was associated with an increase in genomic codon adaptation compared to related non-parasitic or specialist species and to the reconstructed ancestral state. For instance, in the Leotiomycetes, the generalist parasites Botrytis cinerea and Sclerotinia sclerotiorum (>300 host genera) both had S > 0.5, whereas the non-parasitic Oidiodendron maius and the host specialists Erysiphe necator and Blumeria graminis f. sp. tritici had S < 0.2. The reconstructed ancestral state for this clade was S = 0.37.

Codon optimization and host range co-evolved multiple times across fungal phylogeny.

(A) Phylogeny, genome-scale codon optimization and host range in 36 parasites across the kingdom Fungi. Nine non-pathogenic species belonging to the major branches of Fungi are shown for comparison. The phylogenetic tree was generated using the TimeTree database (Hedges et al., 2015) and PATHd8 (Britton et al., 2007). Codon optimization shown as the size of terminal nodes corresponds to the degree S of coadaptation of all genes to the genomic tRNA pool (dos Reis et al., 2004). Terminal nodes are sized according to genome-scale codon optimization and colored according to host range (grey was used for non pathogen species). Internal nodes are sized according to reconstructed ancestral genome-scale codon optimization, with ancestral S value indicated as a blue label. (B) Correlogram of genome-scale codon optimization and phylogenetic distance along the tree shown in A. Dotted lines delimit 95% confidence interval.

https://doi.org/10.7554/eLife.22472.008

To test for the robustness of the association between host range and codon optimization across fungal phylogeny, we tested whether it could be detected at the sub-Kingdom level. The correlation coefficient between genome-scale codon optimization and host range were between 0.64 and 0.84 for S, SCAI and SscnRCA at the Phylum level in Basidiomycetes and Ascomycetes, and across phyla in non-Dikarya fungi. To unambiguously exclude the impact of phylogeny on the correlation between host range and codon optimization, we calculated Blomberg’s K and Pagel’s λ as two quantitative measures of phylogenetic signal in genome scale codon optimization along the fungal tree (Pagel, 1999; Blomberg et al., 2003). We obtained Blomberg’s K = 0.28 and Pagel’s λ = 0.62. Both measures indicated that evolution of codon optimization is not correlated with the phylogeny along the tree at p<0.01. Then, we used phylosignal (Keck et al., 2016) to build a correlogram between phylogenetic distance and codon optimization (Figure 3B). Correlations were insignificant (in the range −0.05 to 0.1) along the whole tree. These results support the view that codon optimization is associated with host range in fungal parasites.

Biased SNP patterns underpin with codon optimization in the generalist parasite Sclerotinia sclerotiorum

To get insights into the molecular bases of codon optimization in generalists, we searched for patterns of genome evolution associated with codon optimization at the species level in natural populations of generalist parasites. To this end, we generated genome-wide SNPs data by sequencing five field isolates of the plant pathogenic fungus Sclerotinia sclerotiorum (24,344 coding SNPs). S. sclerotiorum is the causal agent of Sclerotinia rot disease, is notorious for its broad host range encompassing several hundreds of plant genera, and shows strong signatures of codon optimization (S = 0.52). As expected under strong codon optimization, tRNA concentrations determined by small RNA sequencing strongly correlated with the number of tRNA genomic copies, tRNA adaptive values (calculated according to [dos Reis et al., 2004]) and codon usage in S. sclerotiorum (Figure 4—figure supplement 1). We used 131,138 genome-wide coding SNPs of nine isolates of the host-specific plant pathogenic fungus Zymoseptoria tritici (Croll et al., 2013) as a reference species with weak codon optimization (S = −0.02). Overall, the SNP frequency was higher in the Z. tritici population (1.196 SNP.Kb−1.isolate−1) compared to S. sclerotiorum (0.416 SNP.kb−1.isolate−1). To search for signatures of codon optimization in these populations, we determined the frequency of variants for each codon type. We compared frequency of codon variants to the number of cognate tRNA genomic copies, allowing for wobble pairing (Crick, 1966). Frequency of codon variants negatively correlated with the number of tRNA genomic copies in S. sclerotiorum (Pearson = −0.60, p = 4.6 10−07) but not in Z. tritici (Pearson = 0.06, p = 0.62) (Figure 4A, Figure 4—source data 1).

Figure 4 with 2 supplements see all
Biased synonymous substitution patterns underpin codon optimization in local populations of a generalist but not a specialist fungal parasite.

(A) Genome-wide frequencies of variant codons in local populations of the host generalist Sclerotinia sclerotiorum and the host specialist Zymoseptoria tritici, according to the number of genomic copies of cognate tRNAs. The number of cognate tRNAs for each codon type was determined using wobble rules for codon-anticodon pairing. Dotted lines show linear regression of the data (Z. tritici: Pearson ρ = 0.06; p=0.62; S. sclerotiorum ρ = −0.60; p=4.6 10−07). (B) Adjusted variant frequencies for intergenic nucleotide triplets, optimal and non-optimal codons. Synonymous and non-synonymous SNPs are shown separately. Differences between optimal and non-optimal codon rates were assessed by Welch’s t-test (***p<0.001). (C) Predicted evolution of genome-wide content in optimal codons in S. sclerotiorum and Z. tritici based on observed and random mutation patterns.

https://doi.org/10.7554/eLife.22472.011

To further document the molecular bases of codon optimization at the species level, we compared SNP patterns in optimal, intermediate, and non-optimal codons for each amino acid in S. sclerotiorum and Z. tritici populations. In the Z. tritici population, SNP frequencies were not significantly different between optimal and non-optimal codons. By contrast, synonymous SNP frequencies were on average ~1.7-fold lower in optimal codons than in non-optimal codons in S. sclerotiorum (Welch’s t-test p=1.3 10−03) (Figure 4B). These findings identify biased synonymous substitutions as a link between generalism and codon optimization. This analysis is independent of codon usage indices and shows that selection for average performances on multiple hosts is reflected in global trends of genome evolution.

Natural selection drives codon optimization in generalist fungal parasites

Codon bias in a genome can be selectively neutral due to non-randomness in the mutation process. For instance, the identity of favored codons tracks the GC content of the genomes (Chen et al., 2004; Hershberg and Petrov, 2009). Although S. sclerotiorum genome is AT-rich (41.8% GC), synonymous SNPs showed no bias toward AT conversion, suggesting that synonymous SNPs deviated from neutral patterns (Figure 4—figure supplement 2). Furthermore, adjusted variant frequencies in S. sclerotiorum were similar in intergenic nucleotide triplets and in non-optimal codons but significantly lower in optimal codons (~1.53-fold lower, Welch’s t-test p= 1.0 10−11) (Figure 4B), suggesting that synonymous SNPs in optimal codons may be counter-selected in this species, providing evidence of purifying selection acting on optimal codons. To unambiguously demonstrate that observed SNP patterns lead to codon optimization in S. sclerotiorum and are unlikely to result from neutral processes, we simulated the evolution of optimal codon frequencies in S. sclerotiorum and Z. tritici genomes over 1000 generations (Figure 4C). In these simulations, we used either random SNP patterns, or the SNP patterns determined experimentally (Figure 4—source data 2). This demonstrated that the codon SNP patterns determined experimentally converge toward increased codon optimality in S. sclerotiorum genome, whereas SNP patterns observed in Z. tritici genome and random mutation patterns do not. Thus, patterns of evolution toward increased codon optimality were detected in S. sclerotiorum but not in Z. tritici populations, and deviate significantly from neutral evolution.

To estimate the likelihood that observed codon biases result from selective rather than neutral processes in other fungal genomes, we used permutation tests on the number of genomic tRNA copies to calculate probability distribution of S under no selection (dos Reis et al., 2004). A total of 14 species showed signatures of selection for codon optimization with a p<0.1, among which one was a specialist (M. roreri) and one had eight host genera (P. destructans). Twelve out of 15 (80%) generalist species showed selection for codon optimization with a p<0.1, confirming that adaptive translation is stronger in generalist species. We conclude that natural selection on protein translation efficiency or accuracy drives evolution toward codon optimization in generalist parasites through biased synonymous SNPs. These evolutionary patterns are consistent with an improvement of average performances on multiple hosts for generalist parasites.

Codon optimization signatures associate with host colonization

To further support the link between codon optimization and the ability to colonize multiple hosts, we analyzed signatures of codon optimization in host-induced genes of 15 fungal parasite species from multiple families of the Ascomycetes and Basidiomycetes (Figure 5—source data 1) (Amselem et al., 2011; Duplessis et al., 2011a, 2011b; Gao et al., 2011; Morton et al., 2011; O'Connell et al., 2012; Xiao et al., 2012; Zhang et al., 2012; Hacquard et al., 2013; Bailey et al., 2014; Chen et al., 2014; Kellner et al., 2014; de Bekker et al., 2015; Bradshaw et al., 2016). We found that host-induced genes were enriched ~3.1-fold among the top 10% codon-adapted genes in generalist parasites but not in specialists. Overall, genes induced during host colonization distributed predominantly among genes with high codon adaptation indices in generalist but not in specialist parasites (Figure 5A, Figure 5—source data 2). Parasites use secreted proteins to facilitate infection of their hosts. We therefore analyzed codon optimization in secretome genes of 45 fungal species (Figure 5—source data 3). We found that secretome genes were enriched over 1.6-fold among the top 10% codon-adapted genes in generalist parasites but only ~1.2-fold in specialists and non-parasitic fungi. Overall, predicted secretome genes distributed predominantly among genes with high codon adaptation indices in generalist parasites but not in specialist and non parasitic fungi (Figure 5B). We calculated S in each genome for secreted and non-secreted protein genes (Figure 5C, Figure 5—source data 4). In our genome set, non-parasitic fungi had similar S for secreted and non-secreted proteins. The average S for secreted proteins (0.212) was slightly higher than for non-secreted proteins (0.137) in specialists (Student’s paired t-test p=0.054). In generalists, the average S for secreted proteins (0.661) was significantly higher than for non secreted proteins (0.558, Student’s paired t-test p=1.12 10−03). Increased codon optimization in generalists was not only clear for genes encoding secreted proteins and host-induced genes, but also across their entire genome (‘other genes’ in Figure 5C). To get a global overview of cellular functions enriched in codon optimized genes in generalists, we analyzed normalized tRNA adaptation indices for genes annotated with Gene Ontologies (GO) in generalist and specialist genomes (Figure 5D, Figure 5—source data 5). Together with secreted enzymes, GOs related to translation and central metabolism showed higher codon optimization in generalists than in specialists. Conversely, GOs related to transcription, transposable elements and phosphorelay signal transduction were better optimized in specialists than in generalists. These results suggest that high codon optimization is an adaptation to host colonization and particularly to the colonization of multiple hosts.

Codon optimization strongly associates with host colonization in generalist fungal parasites.

(A) Genes induced during host infection are enriched among high tRNA adaptation index genes in generalist but not specialist parasite genomes. Error bars show standard error of the mean. (B) Genes encoding predicted secreted proteins are more strongly enriched among high tRNA adaptation index genes in generalist parasite genomes compared to specialist and non-parasitic fungi genomes. Error bars show standard error of the mean. (C) Degree of coadaptation of secreted protein genes and other genes to the genomic tRNA pool in non parasitic, specialist and generalist fungi (*** Student’s paired t-test p<0.002). (D) Distribution of tRNA adaptation indices according to gene functions in generalist and specialist fungal parasites. For each Gene Ontology (GO), the average normalized tRNA adaptation indices in all genes from generalist and specialist genomes are shown. Bubbles are sized according to the total number of genes, and colored according to the percentage of predicted secreted proteins when values for generalists and specialists differ by over 10%. Selected GOs are labeled.

https://doi.org/10.7554/eLife.22472.017

Discussion

Our study reveals that codon optimization through biased synonymous substitutions is a common feature in the evolution of generalist parasites and is associated with the colonization of multiple hosts. Because host colonization by generalist often requires diverse and relatively complex secreted proteins, these organisms are expected to require higher protein translation efficiency to compete with co-occuring specialist microbes. In agreement with this model, we found that natural selection for codon optimization correlates with host range in fungal parasites. We describe patterns of purifying selection acting on optimal codons in generalist parasites, providing a molecular basis for genome scale adaptation to the colonization of multiple hosts.

The increasing amount of genome sequences available for specialist fungal parasites revealed massive losses of enzymes associated with the ability to thrive and reproduce on living hosts (Schirawski et al., 2010; Spanu et al., 2010). Specialists rely largely on SSPs subverting host cell functions. Functional analyses have shown that SSPs from filamentous pathogens can function as effectors, facilitating the colonization of susceptible hosts while triggering resistance in some host genotypes by activating resistance (R) proteins (Dodds and Rathjen, 2010). In specialist species, the one-to-one relationship, either direct or indirect, between effectors and R proteins favors evolution through non-synonymous mutations, to increase effector diversity and escape R protein recognition (Raffaele and Kamoun, 2012). By contrast, pathogen populations exposed to weaker host selection, caused by quantitative resistance genes, or disruptive selection due to heterogeneity in the host population, are expected to evolve adapted genetic variants less rapidly (McDonald and Linde, 2002; Roux et al., 2014). Our analyses indeed support the view that selection driven by multiple hosts is associated with relatively low rates of evolution of protein variants but instead associates with the directional evolution of synonymous genetic variants. Translational selection may, nevertheless, be active in the genome of specialists but on limited gene sets only, resulting in a low signal in genome-scale analyses (dos Reis et al., 2004).

We noted a trend toward longer proteins encoded in the genome of generalist fungi, notably secreted proteins, consistent with the abundance of SSPs in specialists (Spanu et al., 2010) and previous comparative analyses (Kim et al., 2016). We associated longer proteomes in generalists with higher selection on codon optimization at the whole genome scale. The relative strength of translational selection on individual genes may, however, vary significantly within a given genome. Gene length increases in relation to evolutionary age, with conserved genes under purifying selection generally being longer and harboring higher frequency of optimal codons (Prat et al., 2009; Wolf et al., 2009). Neofunctionalization in duplicated genes has also been associated with increased gene length (He and Zhang, 2005). Duplications frequently underlie domain loss and gain in eukaryotic proteins, which is an important mechanism for the evolution of new functions (Buljan et al., 2010; Peisajovich et al., 2010). Natural selection may promote protein domains recombination to increase the versatility of fungal proteins functions, and thereby contribute to host range expansion.

The adaptive evolution of effectors that increase the colonization of one host species also increases the risk of detection in other host species by triggering their immune system. Furthermore, natural selection that maximizes parasite fitness on one particular host species might result in decreased fitness on an alternative host, leading to specialization (reduced host range) or host shift (Dong et al., 2014). The evolution of generalism has been associated to hybridization events (McMullan et al., 2015; Menardo et al., 2016), horizontal transfer of genetic elements (Ma et al., 2010; Hu et al., 2014), and an overall increase in the repertoire of protein-coding genes (Hu et al., 2014). This observation is consistent with the prominent role in pathogenic success of large repertoires of host-degrading enzymes encoded by the genome of generalist parasites (King et al., 2011; Zhao et al., 2013). Catalytic efficiency of these enzymes is crucial to perform in host environment. Although the catalytic activity of enzymes is typically modulated by adaptation of protein sequences, high reaction rates for efficient host colonization may be reached by modulating the amount of protein synthesized during colonization. However, an increase in proteins production may lead to a metabolic or structural burden of the translation machinery. For instance, the massive production of virulence factors by the generalist bacterial pathogen Ralstonia solanacearum reduces its metabolic versatility and thus its host colonization capacity (Perrier et al., 2016; Peyraud et al., 2016). Natural selection for the optimization of protein translation, through ribosome pausing time, translation error rates and co-translational protein folding, has been widely associated with codon usage biases (Drummond and Wilke, 2008; Tuller et al., 2010; Shah and Gilchrist, 2011). The adaptive evolution of the translation machinery described here is predicted to favor generalists’ performances in heterogeneous host populations, by reducing the cost of large and diverse sets of secreted enzymes production on proliferation. The mitigation of the tradeoff between cell proliferation and protein synthesis (Li et al., 2014) by codon optimization could explain how generalist lineages have maintained or gained the ability to produce diverse sets of secreted enzymes, allowing the colonization of genetically diverse hosts. Codon optimization may affect the expression of effectors and other virulence factors, modifying the degree of infection or enabling the colonization of new hosts opposing quantitative disease resistance mechanisms. These findings reveal that parasite host range variation is a prime example of an adaptive phenotype related to codon optimality and translational regulation.

Materials and methods

Simulation of codon optimization effect on protein biosynthesis

We designed a mathematical model of the cell translation machinery in order to undertake the analysis of the impact of codon optimization parameters on the cell physiology (Figure 1—source data 1). The model describes the variation of intracellular concentration, P(i), of three proteins sets (i): the free ribosomal proteins P(r), the proteins destined to be secreted P(s) and intracellular non-secreted proteins P(n), depending on their synthesis rate and the dilution rate due to cell growth. Crowding effect (Beg et al., 2007), that is the maximal number of proteins that a finite cell can contain, was modeled by adding a limitation function to the elongation rate which is asymptotic to the maximal proteins content. The ordinary differential equations were solved numerically using the COPASI software (RRID:SCR_014260) (Hoops et al., 2006) until a steady-state was reached. mRNA(i) concentrations were assumed to be stable with no dilution due to cell growth. The initiation constants between mRNA and ribosomes were set to a non-limiting value, similar for the three kinds of proteins sets (r, n and s). A stoichiometric constrain of 79 ribosomal proteins bound per mRNA(i) was added in the COPASI model. The initial parameters used for the different analysis are listed in Figure 1—source data 1. These parameters were set to be in the physiological range of a cell accordingly to data from Bremer and Dennis (2008). The concentrations of the biomolecules in the model are in mmol∙ml−1. The maximal growth rate was defined as the dilution rate (µ) above which the rate of the proteins biosynthesis is lower than the dilution rate, leading to the collapse of the proteins concentration. We considered situations in which 14.5% of total mRNAs corresponded to mRNAs for secreted proteins and 12% corresponded to mRNAs for ribosomal proteins, and analyzed the impact of varying codon decoding rate on protein production under our cellular model.

Codon optimization across the kingdom fungi

Genome assemblies and CDS files for 45 fungal species were downloaded from the repositories given in Figure 2—source data 1. For every species, the repertoire of tRNA genes was determined using the tRNAscan-SE 1.21 server (RRID:SCR_010835) (Schattner et al., 2005) with the ‘Source’ parameter set as ‘Eukaryotic’ and ‘Search Mode’ set as ‘Default’. For 42 species, only predicted full length CDS were considered for codon optimization analysis. For this, CDS lacking a start and a stop codon were filtered out. For Puccinia graminis, Serpula lacrymans and Colletotrichum higginsianum, since >15% of CDS lacked a predicted stop codon in these genomes, we filtered out CDS lacking a start codon only. tRNA adaptation indices (tAIs) were calculated for a total of 569,744 genes using the get.tai function in R (dos Reis et al., 2004) with the ‘sking’ parameter set to ‘0’ (Eukaryote) for the get.ws function. The degree of codon optimization in each genome was determined using the get.s function in R (dos Reis et al., 2004). Confidence intervals were determined using the CI function from the package ‘psychometric’ in R. For each genome, the probability of selection acting on codon optimization was determined using the ts.test function in R (dos Reis et al., 2004) with sample size = 500 and n = 1000 and p values were adjusted using Bonferroni correction. The repertoire of ribosomal proteins was identified based on interproscan annotation or de novo gene annotation performed with Blast2GO version 3.3 (RRID:SCR_005828) (Conesa et al., 2005). For each genome, the CDS of full length ribosomal proteins were used to compute a codon usage table with the CUSP function in EMBOSS (RRID:SCR_008493) (Rice et al., 2000) to calculate codon adaptation indices (CAI) for every gene in the corresponding genome, using the CAI function in EMBOSS. The degree of codon optimization at the whole genome scale was determined using the get.s function in R with gene CAI or scnRCA instead of tAI. Host range was determined according to sources listed in Figure 2—source data 2. For most plant pathogens, the Systematic Mycology and Microbiology Laboratory Fungus-Host Database of the United States Department of Agriculture was used (Farr and Rossman, 2016). For Cryptococcus neoformans, Aspergillus fumigatus, Beauveria bassiana, and Metarhizium acridum, a list of infected Orders were retrieved. We collected the list of Genera recorded in these Orders through various sources (Figure 2—source data 2). The likelihood that a parasite can infect two host species decreases continuously with phylogenetic distance between the hosts (Gilbert and Webb, 2007). In their study, Gilbert and Webb reported that the proportion of hosts colonized was about ~20% at a 300 Mya phylogenetic distance, decreasing logarithmically. We considered that 8% of Genera in a given Order were likely to be host for the above-mentioned pathogens to reach a conservative estimate. Core orthologous genes were identified through a BlastP search against a database of 36 complete fungal proteomes, using the CEGMA set downloaded from http://korflab.ucdavis.edu as a query. Only orthologous groups containing a single hit in all fungal species were selected for further analysis. tRNA adaptation indices for each genome were median-normalized before analyzing their distribution in COGs. We used the TimeTree database (Hedges et al., 2015) to obtain ages for a maximum of nodes of the tree (calibration points). Then, we ultrametricized the tree using PATHd8 (Britton et al., 2007). We tested for a potential presence of a phylogenetic signal estimated by Blomberg's K, Pagel's λ and autocorrelogram with the phylosignal R-package (Keck et al., 2016). We calculated phylogenetic independent contrasts (Felsenstein, 1985) and verified the significance of trait correlations using the pic function from the ape package in R.

Sequencing of S. sclerotiorum isolates

Isolates C014, P163, P314 and C104 were provided by Bruno Grezes-Besset (Biogemma, Mondonville, France) and were collected from infected rapeseed fields near Blois (France) in 2010. Isolate FrB5, collected from clover seed lot in Dijon (France) in 2010, was obtained from Vleugels et al. (2013). Isolate 1980 (ATCC18683) was collected in prior 1970 on bean pods near New York (Maxwell and Lumsden, 1970) and was provided by Martin Dickman (Texas A and M University, USA). For IGS phylogeny, Intergenic sequences (IGS) were retrieved on sequenced genomes with the Blast tool of CLC Genomics software (RRID:SCR_011853). The sequences were aligned with MAFFT v7 (RRID:SCR_011811) (Katoh and Standley, 2013), and the phylogenetic tree was build with PhyML 3 (Guindon et al., 2010) with SH-like approximate likelihood ratio test (aLRT) as support statistics. The tree was mid-point rooted using MEGA6 software (RRID:SCR_000667) (Tamura et al., 2013). Sclerotinia sclerotiorum isolates were grown in liquid potato dextrose medium (P6685 - Sigma) for 4 days at 24°C under shaking at 180 rpm. Cultures were filtered using vacuum and a fritted glass column protected with a doubled miracloth membrane and ground in liquid nitrogen. One gram of mycelium powder was used for DNA extraction using a DNA Maxi Kit (Quiagen) and following manufacturer’s instructions. DNA sequencing was outsourced to Fasteris SA (Switzerland) to produce Illumina paired-end reads (2 × 100 bp) using a HiSeq 2500 instrument. Paired-end reads for each S. sclerotiorum isolate were mapped to version 2 of the S. sclerotiorum 1980 reference strain genome (Derbyshire et al., 2017) using the mapping function of the CLC Genomics software. Variant call was performed on mapped reads with the Fixed Ploidy Variant Detection function of the CLC Genomics software. The resulting total variants for each strain were further filtered for homozygous single nucleotide variants.

Analysis of codon polymorphisms in plant pathogen populations

SNPs in gene coding sequences were extracted and mapped to the corresponding codons using R scripts to calculate the percentage of polymorphic codons of each type. The number of tRNAs able to decode each codon type is given as a percentage of the genomic tRNA pool that was determined based on tRNA gene predictions obtained through tRNAscan-SE. Whenever isoacceptor codons had no cognate tRNA predicted, the number of tRNA copies was divided evenly among isoacceptor codons according to general codon-anticodon wobble rules as described in dos Reis et al. (2004). We considered that the 11 codons that did not have isodecoder tRNAs in Zymoseptoria tritici were decoded evenly by the four tRNA genes that had no accepting codons identified by tRNAscan-SE. Codons with the highest percentage of decoding tRNAs were considered as optimal. In cases where the highest percentage of decoding tRNA was shared among several synonymous codons, codons having the highest number of cognate tRNA copies in the genome or codons with the highest usage in the genome were optimal. Codons with the lowest percentage of decoding tRNAs were considered as non-optimal. In cases where the lowest percentage of decoding tRNA was shared among several synonymous codons, codons with the lowest usage in the genome were considered non-optimal. Other codons were considered as intermediate.

Variant frequencies (in % of codons) were calculated by counting the number of codons harboring a SNP divided by the total number of this codon type in the genome. These raw variant frequencies were used in simulation of the evolution of codon optimality. A random base change within a codon is more likely to result in a nonsynonymous or stop-gain mutation than a synonymous mutation, hence we expect the frequency of non-synonymous codon variants to be higher than the frequency of synonymous codon variants under neutral selection. Therefore, to compare synonymous and non-synonymous variant frequencies, we adjusted non-synonymous variant frequencies dividing them by their likelihood to occur if changes were random. Variant frequencies in intergenic nucleotide triplets were used as an estimate for neutral patterns of evolution. For this, we extracted for each intergenic SNP the corresponding triplets. Variant frequencies (% of nucleotide triplets) were calculated by counting the number of triplets of a given type harboring a SNP divided by the total number of this triplet type in intergenic regions of the genome, determined using the COMPSEQ function in EMBOSS. Frequencies of variants in intergenic triplets were adjusted to compare to synonymous and non-synonymous variant frequencies. For this, we use as a correction factor the ratio between the adjusted total codon variant rate (synonymous variant rate + adjusted non-synonymous variant rate) over the unadjusted total codon variant rate (synonymous variant rate + non-synonymous variant rate). SNP frequencies in genes (in Kbp−1) were calculated counting the number of SNPs per genes and divided by the length of gene coding sequence in Kbp.

Determination of tRNA genes expression

Small RNAs were obtained from samples of Arabidopsis thaliana Col-0 ecotype plants infected by S. sclerotiotum strain 1980 and samples of S. Sclerotiorum strain 1980 grown in Potato Dextrose Broth as previously described. Small RNA were prepared using the NucleoSpin miRNA kit (Macherey-Nagel) following instructions of the manufacturer. Small RNAs sequencing was outsourced to Fasteris SA (Switzerland) to produce Illumina reads using a HiSeq 2500 instrument (small RNA libraries with expected insert size from 18 to 30 bp). Reads were imported into CLC Genomics Workbench 8.5 (QUIAGEN), and sequences were mapped on version 2 of the S. sclerotiorum strain 1980 genome (Derbyshire et al., 2017). Read counts at predicted tRNA loci were obtained using the Small RNA Analysis toolkit of the CLC Genomics Workbench 8.5 software. Only perfect matches to the S. sclerotiorum tRNA genomic loci on the correct strand were accepted, and inserts less than 15pb were discarded. Reads collected in planta or in vitro and aligned to the different copies of a given tRNA type in the genome were summed up and compared to tRNA gene copy numbers, codon usage or tRNA adaptive values.

Simulations of the evolution of codon optimality

Codons were classified as optimal, intermediate and non-optimal as described previously, unadjusted variant frequencies were used. The initial state in simulations corresponded to codon usage measured in S. sclerotiorum and Z. tritici reference genomes. To simulate the long-term evolution of codon optimality in S. sclerotiorum and Z. tritici, we developed two python scripts. The ‘observed’ simulator uses the frequencies of all 64x64 = 4096 possible codon substitutions measured in S. sclerotiorum and Z. tritici natural populations, kept constant over the simulation. At each evolutionary step, the frequencies of the 64 codons in each genome are mutated according to the codon substitution frequencies table. The total proportion of optimal, intermediate and non-optimal codons in each genome is computed after all measured codon substitutions have been accounted for (designated as one ‘generation’). To test if the observed long-term evolution of codon optimality deviated significantly from random mutation patterns, a ‘random’ simulator was implemented. In these simulations, bulk frequencies of triplets toward synonymous and not synonymous mutations were kept constant, whereas the outcome of mutations was determined randomly at each iteration. The number of mutations at each generation also varied randomly in order to obtain an uneven distribution. Simulations were repeated until an asymptotic behavior was apparent. The Python source code for these two simulators is provided in Figure 4—source code 1.

Codon optimization and genes function

To normalize tAIs across multiple genomes, tAI values were ordered in ascending order in each genome, and their rank given as a percentage (0 for lowest rank corresponding to the lowest tAI in a given genome up to 1 for the highest rank corresponding to the highest tAI in this genome). The source of data and pipelines used to identify genes significantly induced are listed in (Figure 5—source data 1). The predicted secreted protein genes were determined using SignalP 4.0 (Petersen et al., 2011) with default parameters. For each genome, genes were classified into secreted protein genes (signal peptide identified by SignalP 4.0) and non-secreted protein genes (all other genes). The degree of codon optimization in these two gene subsets were computed for each genome using the get.s function in R (dos Reis et al., 2004). The average normalized tAI for all genes annotated with a given Gene Ontology was computed for generalist and specialist genomes. GO mapping was retrieved from the repositories given in Figure 2—source data 1 or performed de novo with Blast2GO software. Only GOs associated with 200 genes or more were analyzed.

References

  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
    Modulation of chemical composition and other parameters of the cell at different exponential growth rates
    1. H Bremer
    2. PP Dennis
    (2008)
    EcoSal Plus, 3, 10.1128/ecosal.5.2.3, 26443740.
  9. 9
  10. 10
  11. 11
  12. 12
  13. 13
  14. 14
  15. 15
  16. 16
  17. 17
  18. 18
  19. 19
  20. 20
  21. 21
  22. 22
  23. 23
  24. 24
  25. 25
    Fungal databases, systematic mycology and microbiology laboratory, ARS, USDA
    1. DF Farr
    2. AY Rossman
    (2016)
    Fungal databases, systematic mycology and microbiology laboratory, ARS, USDA, http://nt.ars-grin.gov/fungaldatabases/.
  26. 26
  27. 27
  28. 28
  29. 29
  30. 30
  31. 31
  32. 32
  33. 33
  34. 34
  35. 35
  36. 36
  37. 37
  38. 38
  39. 39
  40. 40
  41. 41
  42. 42
  43. 43
  44. 44
  45. 45
  46. 46
  47. 47
  48. 48
  49. 49
  50. 50
  51. 51
  52. 52
  53. 53
  54. 54
  55. 55
  56. 56
  57. 57
  58. 58
  59. 59
  60. 60
  61. 61
  62. 62
  63. 63
  64. 64
  65. 65
  66. 66
  67. 67
  68. 68
  69. 69
  70. 70
  71. 71
  72. 72
  73. 73
  74. 74
  75. 75
  76. 76
  77. 77
  78. 78
  79. 79
  80. 80
  81. 81
  82. 82
  83. 83
  84. 84
  85. 85
  86. 86
  87. 87
  88. 88
  89. 89
  90. 90

Decision letter

  1. Claus O Wilke
    Reviewing Editor; The University of Texas at Austin, United States

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

[Editors’ note: this article was originally rejected after discussions between the reviewers, but the authors were invited to resubmit after an appeal against the decision.]

Thank you for submitting your work entitled "Codon optimization underpins generalist parasitism in fungi" for consideration by eLife. Your article has been favorably evaluated by Detlef Weigel (Senior Editor) and three reviewers, one of whom, Claus Wilke, is a member of our Board of Reviewing Editors.

Our decision has been reached after consultation between the reviewers. Based on these discussions and the individual reviews below, we regret to inform you that your work will not be considered further for publication in eLife. There were three key issues that prompted us to reach this conclusion:

1) The manuscript is lacking a convincing "why". No convincing theory is presented for why codon bias should differ between generalists and specialists.

2) The main finding of the paper is a correlation between codon bias and host breadth. However, this correlation is confounded by the phylogenetic relationship between species, and whether the correlation would remain after this relationship is controlled for is unclear.

3) The manuscript is not well conceived and structured. A lot of potentially interesting material is presented in supporting information, while some of the information in the main manuscript seems tangential to the overall story. For example, the material in Figure 2 only establishes that codon bias is caused by selection pressure (an observation that is already widely accepted) but does not actually support the main story of the paper (that generalists experience different selection pressures than specialists).

Reviewer #1:

1) I am missing a clear, overarching hypothesis. Why do the authors think that codon bias should differ between specialist and generalist species? What is driving this effect?

2) I am not sure that the work of Figure 2 speaks to the overall hypothesis of the paper. Figure 2 demonstrates that two genomes with different codon biases likely experience different synonymous selection pressures. I think it is widely accepted these days that codon bias is often caused by selection, so we haven't learned much new from this analysis. Importantly, this work does not explain why codon bias patterns might be different in a generalist vs. a specialist.

Technical comments:

3) Results and Discussion, first paragraph: "A total of 22 species showed signatures…" Was this corrected for multiple testing?

4) Results and Discussion, second paragraph, Figure 1B: Spearman correlation assumes that the points are independent. However, they are confounded by phylogeny. The authors should calculate the correlation on phylogenetic independent contrasts. Looking at the figure, I would expect the effect to be much weaker under independent contrasts.

5) Subsection “Codon optimization across the kingdom Fungi”: I don't understand this: "taking the average p value from 100 tests of sample size=500 and n=1000." More explanation is needed. Normally, one would do only one test. How is an average p value from multiple tests to be interpreted?

6) What is the point of Figure 2B? I don't understand what this is meant to show.

7) It is unclear what is shown in Figure 3A. I don't think the quantity shown on the y axis is defined.

Reviewer #2:

The manuscript by Badet et al. aims to describe patterns of codon adaptation in parasitic fungi and how they vary as a function of their host ranges. In addition to performing comparative genomic analyses across 45 species, they generate transcriptomic datasets to measure changes in mRNA and tRNA abundances depending on the host for a subset of species as well as perform simulation studies to estimate elongation times of codons across these species.

Overall it is quite a compelling piece of work and I highly recommend its publication. My primary concern is that the bulk of analyses that the authors have performed is tucked away in the supplement and barely described in the main text. In fact, some supplementary figures are not even cited. I feel the authors are shortchanging themselves by submitting this manuscript as a Short report. I would recommend they elaborate their explanations of the analyses already presented in the supplement and discuss them in more detail.

In terms of specific concerns – I was particularly struck by their Figure 2—figure – supplement 1B. A large number of synonymous mutations are derived from mutations in the first codon position. Only Leu and Arg are capable of such mutations. Does this mean that over half the synonymous mutations were occurring at these two amino acids?

Reviewer #3:

In this study, the authors examine the relationship between gene codon optimization and host range variations in parasitic fungi. The study raised the hypothesis that the degree of gene codon optimization in parasitic fungi is related to the host range variation. Parasitic fungi with high degrees of gene codon optimization are usually generalists while genes in specialist fungi are usually less codon-optimized.

Although this is a very interesting hypothesis, the conclusions, however, are not convincing due to a very problematic methodology used and are not consistent with the known mechanisms of host specificity determination of parasitic fungi.

The conclusions of this paper are heavily dependent on a previously developed method (dos Reis et al., 2004) to determine the degree of codon tRNA coadapation. Such a value called 'S' was assumed to reflect the translation selection of genes by codon usage biases. Such a model, however, is very problematic and its analysis results are not consistent with many known experimental results. For example, in the original 2004 study, among the 126 genomes analyzed by this model, the S values of only 36 genomes were found to be statistically different from zero, suggesting that for most genomes, there is no sign of translational selection by codon usage acting on their genomes. Obviously, this is very different from known experimental. For example, while S values of human and Drosophila indicating lack of selection, there are now very strong experimental evidence demonstrating a role for translation selection by codon usage biases. Same for Bacillus. As acknowledged by the authors of the 2004 paper, their model has severe limitations and cannot explain much of the variation observed. In the case of human and mouse genome, which should exhibit a very similar degree of translation selection by codon biases, has quite a big difference in S values. In the context of this paper, three of the non-parasitic fungi analyzed by the 2004 study, S. cerevisiae, S. pombe, and N. crassa, all have very high S values, which are not consistent with the conclusion of this paper. All these indicate that the S value model used is very problematic and do not reflect much of the known experimental results.

A brief reading of the original 2004 method calculating the optimized s-values show that although adenosine deamination was taken into account when calculating base pairing between tRNA and codons, the I:U base pairing was set to be more preferred than I:C based pairing. Obviously, this is not consistent with current knowledge that inosine preferentially base pairs with cytosine. On the other hand, the simple assumption that tRNA copy number will truthfully reflect tRNA expression will certainly introduce additional variations of the model (human vs. mice for example). Current tRNA sequencing methods do not help here due to cloning and sequencing biases. Obviously, there can be more issues with the model.

The host specificity determination of parasitic fungi is mostly known to be determined by expression of different effectors, which frequently are not conserved. Although the codon optimization of genes related to infection was examined in two species, effector genes was not studied. Even if codon optimization may affect the expression of effectors, it should only affect the degree of infection but not the host range. Based on my knowledge, the expression levels of many known effectors are usually low.

Because of these two major deficiencies, I feel that the current study is premature for publication.

[Editors’ note: what now follows is the decision letter after the authors submitted for further consideration.]

Thank you for resubmitting your work entitled "Codon optimization underpins generalist parasitism in fungi" for further consideration at eLife. Your revised article has been favorably evaluated by Detlef Weigel (Senior editor), a Reviewing editor, and two reviewers.

The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:

1) Our major remaining concern with the manuscript lies in the "why" question brought up during the previous review. The arguments made are weak and hand-wavy. You claim that since generalists have longer genes than specialists and as a result might take longer to translate, there is stronger selection to improve their codon usage. This begs the question, what evolutionary forces are influencing gene lengths? Are gene-lengths in specialists under stronger selection to reduce in size or in generalists to increase their sizes and why?

Further, the results in Figure 5 suggest that codon optimization of secreted proteins may explain the host range of fungal parasites. However, the host range can simply be determined by robustness of overall cell metabolism/growth. To strengthen your conclusions, we suggest that you perform pathway-specific analysis and compare to that of the secreted proteins.

2) We appreciate that you have pursued several different approaches to correct for phylogeny, and we consider your results now reliable. However, throughout the manuscript there remain several places where you provide standard correlations, with uncorrected p values. All these should be replaced with correlations over phylogenetic contrasts. Specifically, these are in subsection “Codon optimization correlates with fungal parasites host range”, first paragraph and second paragraphs.

3) In several places throughout the manuscript, you list p values but don't state the test that was performed. Please state the test every time. Also, you are inconsistent in when you do and don't run a test. For example, in the first paragraph of the subsection “Long proteins encoded by the genome of generalist fungi likely increase natural selection on codon optimization” you list a p value but you don't do so in any of the other length comparisons in the following lines.

https://doi.org/10.7554/eLife.22472.119

Author response

[Editors’ note: the author responses to the first round of peer review follow.]

Our decision has been reached after consultation between the reviewers. Based on these discussions and the individual reviews below, we regret to inform you that your work will not be considered further for publication in eLife. There were three key issues that prompted us to reach this conclusion:

1) The manuscript is lacking a convincing "why". No convincing theory is presented for why codon bias should differ between generalists and specialists.

The manuscript has been re-organized and amended to clearly state our “why” hypothesis: we propose that the longer proteins encoded by the genome of generalist fungi increase natural selection on the optimization of codons in these species. The rationale for this hypothesis is presented in the Introduction. Briefly, we observed that the genome of generalists typically encode longer proteins than that of specialists, probably in relation to requirements of their lifestyle. Overall, the time required for protein synthesis by the translation machinery increases with protein length. Therefore, near parasites maximal growth rate, the structural constraints on the translation machinery are higher in generalist than in specialist cells. This structural constraint can be alleviated by increasing protein synthesis rate via codon optimization. We also highlight the consistency of our hypothesis with the “jack of all trades, master of none” theoretical model for the evolution of generalism. Indeed, natural selection on the optimization of protein translation is expected to increase parasites fitness in average on multiple hosts rather than specifically on a single or a few hosts (Introduction, last paragraph).

As a first approach to test this hypothesis, we compared structural constraints on protein synthesis in specialist and generalist fungi and their impact on maximal cell growth rates. This analysis is based on an in silico model of the cellular translation machinery and proteome properties of sequenced fungal parasites. We conclude from this modeling approach that “longer proteins, especially secreted proteins, encoded in the genome of generalist fungi limits maximal cell growth rates compared to specialist fungi. [Our model] also shows that codon optimization can support the secretion of more complex proteins with limited growth penalty. […] We therefore expect natural selection on the optimization of codons to be stronger in generalist parasites than in host specialists.”

2) The main finding of the paper is a correlation between codon bias and host breadth. However, this correlation is confounded by the phylogenetic relationship between species, and whether the correlation would remain after this relationship is controlled for is unclear.

We have included explicit controls for phylogenetic signal in the revised manuscript. First, we used phylogeny and ancestral state reconstruction to show that codon optimization and host range co-evolved multiple times independently in the fungal Kingdom. Second, we show that the correlation between fungal parasites host range and codon optimization is detected both at the Kingdom and at the Phylum level. Our analysis of S. sclerotiorum and Z. tritici natural populations further shows that codon optimization can be detected at the infra-specific level in generalist but not in specialist species. Third, we used Blomberg’s K, Pagel’s λ and the phylosignal correlogram approach as quantitative measures of phylogenetic signal in codon optimization along the fungal phylogeny. Fourth, we verified the significance of the correlation between host range and genome scale codon optimization using Felsenstein’s phylogenetic independent contrasts (doi:10.1086/284325). These four methods allowed to unambiguously exclude an impact of phylogenetic relationships in the correlation we observed between codon optimization and host range.

3) The manuscript is not well conceived and structured. A lot of potentially interesting material is presented in supporting information, while some of the information in the main manuscript seems tangential to the overall story. For example, the material in Figure 2 only establishes that codon bias is caused by selection pressure (an observation that is already widely accepted) but does not actually support the main story of the paper (that generalists experience different selection pressures than specialists).

We have substantially revised the manuscript and improved its structure to clarify the underlying hypotheses and rationale for the analyses presented. We have reduced supporting information to the minimum required to support the conclusions of the paper (mostly source data). We notably clarify that the analyses presented in former Figure 2 aim at deciphering the molecular bases underlying codon optimization in generalist fungi.

We show that natural selection reduced the frequency of synonymous SNPs fixed in optimal codons in the genome of a generalist but not a specialist parasite. This first demonstrates that codon optimization associates with generalism at the infra-specific level, in addition to the Phylum and Kingdom levels. To our knowledge, comparative analyses describing evolutionary processes conserved from the infra-specific to the Kingdom level remain scarce. Second, this section of the manuscript connects codon optimization and broad host range through a molecular mechanism, indicating that the observed correlation is causal. Finally, by highlighting purifying selection acting on optimal codons in generalist but not in specialist parasites, this work provides further evidence that generalists experience different selection pressure than specialists. The significant changes we brought to this revised version of the manuscript should make these conclusions and implications clearer.

Reviewer #1:

1) I am missing a clear, overarching hypothesis. Why do the authors think that codon bias should differ between specialist and generalist species? What is driving this effect?

As mentioned in our response to the Editor, we have significantly edited the manuscript to make our hypothesis clear. We propose that selection on codon optimization would be stronger in generalists than in specialists because the genomes of generalist parasites encode longer proteins increasing the structural constraints on the translation machinery in these species. These structural constraints can be alleviated by codon optimization. This hypothesis derives from previous genomic studies of generalist and specialist fungi (see Introduction) and is supported by our analysis of an in silico model of the cell translation machinery calibrated with the properties of proteins from sequenced fungal parasites (see Results subsection “Long proteins encoded by the genome of generalist fungi likely increase natural selection on codon optimization”).

More specifically, our analysis shows that in average, secreted proteins are ~14.8% longer in generalist than in specialist fungi. Our modeling approach shows that differences in protein lengths imply higher codon decoding rates in generalists to achieve similar cellular growth rates as specialists. Therefore, codon optimization provides a mean for generalists to compete with specialist fungi and be maintained through evolution. This observation does not exclude other possible causes but is sufficient to expect natural selection on the optimization of codons to be stronger in generalist parasites than in host specialists.

2) I am not sure that the work of Figure 2 speaks to the overall hypothesis of the paper. Figure 2 demonstrates that two genomes with different codon biases likely experience different synonymous selection pressures. I think it is widely accepted these days that codon bias is often caused by selection, so we haven't learned much new from this analysis. Importantly, this work does not explain why codon bias patterns might be different in a generalist vs. a specialist.

The analyses reported in former Figure 2 (now Figure 4) aim at unraveling how natural selection drives codon optimization in generalists, rather than why this is the case. We explicitly state this objective in the first paragraph of the subsection “Biased SNP patterns underpin with codon optimization in the generalist parasite Sclerotinia sclerotiorum”. In Figure 4, we show first that codon optimization associates with generalism from the infra-specific to the Kingdom level. The evolutionary processes revealed in our work therefore stand out for their remarkable broad scope in the tree of life. Second, we connect codon optimization and broad host range through the molecular mechanism of biased synonymous SNPs on optimal codons. This supports a causal relationship between codon optimization and generalism. Finally, we detect purifying selection on optimal codon in the genome of generalist but not specialist fungi, reinforcing the finding that natural selection differs in these species. We conclude that “these findings identify biased synonymous substitutions as a link between generalism and codon optimization. This analysis is independent of codon usage indices and shows that selection for average performances on multiple hosts is reflected in global trends of genome evolution.” In the revised manuscript, we address the question why natural selection on codon optimization is expected to differ in generalist and specialists in the Introduction and results reported in Figure 1 (see response to reviewer 1 comment #1).

Technical comments:

3) Results and Discussion, first paragraph: "A total of 22 species showed signatures…" Was this corrected for multiple testing?

We have used Bonferroni correction to control for multiple testing in the revised version of the manuscript.

4) Results and Discussion, second paragraph, Figure 1B: Spearman correlation assumes that the points are independent. However, they are confounded by phylogeny. The authors should calculate the correlation on phylogenetic independent contrasts. Looking at the figure, I would expect the effect to be much weaker under independent contrasts.

In the revised manuscript we have explicitly controlled for phylogenetic signal in the correlation between codon usage and host range. We provide four lines of evidence to demonstrate that this correlation is independent of phylogeny: (i) we used ancestral state reconstruction to show that codon optimization and host range co-evolved multiple times independently in the fungal Kingdom; (ii) we show that the correlation is detected at the Phylum level, (iii) we used three different quantitative measures of phylogenetic signal that all excluded any significant impact of phylogenetic relationships in the correlation between codon optimization and host range, and (iv) we verified the significance of the correlation between host range and codon optimization using Felsenstein’s phylogenetic independent contrasts (doi:10.1086/284325) and found correlation coefficients ~0.6 (p-val<2.8 10-05).

5) Subsection “Codon optimization across the kingdom Fungi”: I don't understand this: "taking the average p value from 100 tests of sample size=500 and n=1000." More explanation is needed. Normally, one would do only one test. How is an average p value from multiple tests to be interpreted?

We have used the p-value reported by a single test in the revised version of the manuscript.

6) What is the point of Figure 2B? I don't understand what this is meant to show.

We have clarified the objectives of this approach as well as its implementation in our revised manuscript (subsection “Natural selection drives codon optimization in generalist fungal parasites”, first paragraph). We conclude from this approach that “SNP patterns determined experimentally converge towards increased codon optimality in S. sclerotiorum genome” and that “patterns of evolution towards increased codon optimality were detected in S. sclerotiorum but not in Z. tritici populations, and deviate significantly from neutral evolution”. This contributes to the unraveling of the molecular mechanisms underlying codon optimization in generalists and to the demonstration of a causal relationship between codon optimization and host range.

7) It is unclear what is shown in Figure 3A. I don't think the quantity shown on the y axis is defined.

Former Figure 3B has been removed from the revised version of the manuscript.

Reviewer #2:

The manuscript by Badet et al. aims to describe patterns of codon adaptation in parasitic fungi and how they vary as a function of their host ranges. In addition to performing comparative genomic analyses across 45 species, they generate transcriptomic datasets to measure changes in mRNA and tRNA abundances depending on the host for a subset of species as well as perform simulation studies to estimate elongation times of codons across these species.

Overall it is quite a compelling piece of work and I highly recommend its publication. My primary concern is that the bulk of analyses that the authors have performed is tucked away in the supplement and barely described in the main text. In fact, some supplementary figures are not even cited. I feel the authors are shortchanging themselves by submitting this manuscript as a Short report. I would recommend they elaborate their explanations of the analyses already presented in the supplement and discuss them in more detail.

We are grateful for the reviewer’s assessment that our study represents a compelling piece of work the publication of which was recommended. We have taken the suggestion of reviewer #2 to remove the less useful supplementary material and to expand on the hypotheses, rationale and interpretation of our analyses by submitting our revised manuscript as an Article.

In terms of specific concerns – I was particularly struck by their Figure 2—figure supplement 1B. A large number of synonymous mutations are derived from mutations in the first codon position. Only Leu and Arg are capable of such mutations. Does this mean that over half the synonymous mutations were occurring at these two amino acids?

This unexpected value was due to a mistake in the way gene orientation was taken into account in our original script. We are sorry that this error escaped our scrutiny in the first version of the manuscript and thank the reviewer for pointing it out. The figure has been corrected accordingly. This information is not used in any other analysis of this work so that all other conclusions regarding SNP patterns remain valid.

Reviewer #3:

In this study, the authors examine the relationship between gene codon optimization and host range variations in parasitic fungi. The study raised the hypothesis that the degree of gene codon optimization in parasitic fungi is related to the host range variation. Parasitic fungi with high degrees of gene codon optimization are usually generalists while genes in specialist fungi are usually less codon-optimized.

Although this is a very interesting hypothesis, the conclusions, however, are not convincing due to a very problematic methodology used and are not consistent with the known mechanisms of host specificity determination of parasitic fungi.

The conclusions of this paper are heavily dependent on a previously developed method (dos Reis et al., 2004) to determine the degree of codon tRNA coadapation. Such a value called 'S' was assumed to reflect the translation selection of genes by codon usage biases. Such a model, however, is very problematic and its analysis results are not consistent with many known experimental results. For example, in the original 2004 study, among the 126 genomes analyzed by this model, the S values of only 36 genomes were found to be statistically different from zero, suggesting that for most genomes, there is no sign of translational selection by codon usage acting on their genomes. Obviously, this is very different from known experimental. For example, while S values of human and Drosophila indicating lack of selection, there are now very strong experimental evidence demonstrating a role for translation selection by codon usage biases. Same for Bacillus. As acknowledged by the authors of the 2004 paper, their model has severe limitations and cannot explain much of the variation observed. In the case of human and mouse genome, which should exhibit a very similar degree of translation selection by codon biases, has quite a big difference in S values.

The calculation of S value involves all predicted genes in a genome. A positive S signal therefore suggests that selection for codon optimization is high and targets a significant proportion of genes in a given genome. As mentioned by Dos Reis et al. (2004), highly expressed genes with biased codon patters represent only 4.6% of Bacillus subtilis genome, probably explaining why the S signal at the whole genome scale was weak in this species. The authors conclude that “small S-value for a whole genome means that translational selection might be negligible at a genomic scale, but it can nonetheless have a strong effect on smaller scales, such as particular gene sets”. A subset of genes is indeed likely under translational selection in host specialists, although this subset is too small to increase S significantly. We have carefully edited the manuscript to clarify that conclusions based on S value apply at the whole genome scale (e.g. subsection “Codon optimization correlates with fungal parasites host range”, first and second paragraphs), and we have added the following word of caution: “Translational selection may nevertheless be active in the genome of specialists but on limited gene sets only, resulting in a low signal in genome-scale analyses” (Discussion).

In the context of this paper, three of the non-parasitic fungi analyzed by the 2004 study, S. cerevisiae, S. pombe, and N. crassa, all have very high S values, which are not consistent with the conclusion of this paper. All these indicate that the S value model used is very problematic and do not reflect much of the known experimental results.

Our work does not imply that non parasitic fungi should have low S values. We have included a small set of non-parasitic fungi in our work to support an increase of S in generalist lineages, which required identifying closely related genomes with lower S values and the reconstruction of ancestral states (Figure 3). We have added the following sentence to clarify this point: “[Lower codon optimization in non parasitic fungi] does not exclude that some lineages of non-parasitic fungi could have evolved high codon optimization, but supports the view that codon optimization increased after the divergence of generalist parasites” (Results).

A brief reading of the original 2004 method calculating the optimized s-values show that although adenosine deamination was taken into account when calculating base pairing between tRNA and codons, the I:U base pairing was set to be more preferred than I:C based pairing. Obviously, this is not consistent with current knowledge that inosine preferentially base pairs with cytosine. On the other hand, the simple assumption that tRNA copy number will truthfully reflect tRNA expression will certainly introduce additional variations of the model (human vs. mice for example). Current tRNA sequencing methods do not help here due to cloning and sequencing biases. Obviously, there can be more issues with the model.

To our knowledge, measures of the adaptation of genes to the tRNA pool (such as S and the tRNA adaptation index) are among the most widely used methods these days to study codon usage bias. In a recent cross-species analysis, Sabi and Tuller (doi: 10.1093/dnares/dsu017) reported that species-specific s-values outperformed the original optimized s-values in non-fungal organisms, but not in fungi. Accordingly, they determined species-specific s-values for fungi very similar to dos Reis et al. 2004 optimized s-values. Besides, we have determined tRNA expression in S. sclerotiorum using Illumina sequencing approach to verify in this organism that tRNA copy number correlate with tRNA expression. For these reasons, we do not expect strong methodological bias in this approach.

Nevertheless, we agree that tRNA based methods have limitations for the analysis of codon optimization. To circumvent them, we used in our revised manuscript three other complimentary approaches: First, we inferred genome-wide codon optimization based on the Codon Adaptation Index (Sharp and Li, Nucleic acids research 1987) using ribosomal proteins from each genome as reference sets. This method does not rely on knowledge of the tRNA pool. Second, we inferred genome-wide codon optimization based on the recently developed self-consistent normalized Relative Codon Adaptation index (scnRCA, O’Neill et al. Plos One 2013). This method does not rely on a reference gene set. Finally, we compared codon optimization in core ortholog gene sets, to exclude possible biases due to the completeness of genome assembly. It should be noted that bias due to genome assembly could be excluded in our other analyses (see subsection “Codon optimization correlates with fungal parasites host range”, last paragraph). The four approaches converged toward the conclusion that codon optimization correlates with fungal parasites host range, supporting the robustness of this finding.

The host specificity determination of parasitic fungi is mostly known to be determined by expression of different effectors, which frequently are not conserved. Although the codon optimization of genes related to infection was examined in two species, effector genes was not studied. Even if codon optimization may affect the expression of effectors, it should only affect the degree of infection but not the host range. Based on my knowledge, the expression levels of many known effectors are usually low.

We agreed that “The host specificity determination of parasitic fungi is mostly known to be determined by expression of different effectors, which frequently are not conserved” in the context of gene-for-gene interactions, such as observed for host specialist parasites. It is however unclear whether effectors are the only determinants of host range and how crucial they are in the case of broad host range parasites. It is notable that the ability to derive nutrients from host tissues is critical for some broad host range parasites (e.g. Gesbert et al. 2014 DOI:10.1111/cmi.12227). For plant pathogens this involves the degradation of the plant cell wall by secreted enzymes. The expression level of such genes may be high during host colonization. Furthermore, the complete repertoire of effectors in not known for most fungal pathogens. For these reasons we chose not to restrict our analysis to effectors (or candidate effectors). Instead we used genes induced during host colonization and predicted secreted protein as genes universally associated with host colonization. Admittedly, codon optimization is unlikely to allow a parasite to evade recognition by a host resistance gene. It may however increase parasites fitness to enable the colonization of different hosts. We have clarified this point in the Discussion (last paragraph), concluding that “Codon optimization may affect the expression of effectors and other virulence factors, modifying the degree of infection or enabling the colonization of new hosts opposing quantitative disease resistance mechanisms.”

[Editors’ note: the author responses to the re-review follow.]

The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:

1) Our major remaining concern with the manuscript lies in the "why" question brought up during the previous review. The arguments made are weak and hand-wavy. You claim that since generalists have longer genes than specialists and as a result might take longer to translate, there is stronger selection to improve their codon usage. This begs the question, what evolutionary forces are influencing gene lengths? Are gene-lengths in specialists under stronger selection to reduce in size or in generalists to increase their sizes and why?

Further, the results in Figure 5 suggest that codon optimization of secreted proteins may explain the host range of fungal parasites. However, the host range can simply be determined by robustness of overall cell metabolism/growth. To strengthen your conclusions, we suggest that you perform pathway-specific analysis and compare to that of the secreted proteins.

The question of the evolution of gene length is an interesting field of research that reaches beyond the scope of our study. We have included elements of discussion on this point in our revised manuscript by stating that “Gene length increases in relation to evolutionary age, with conserved genes under purifying selection generally being longer and harboring higher frequency of optimal codons (Prat et al., 2009; Wolf et al., 2009). Neofunctionalization in duplicated genes has also been associated with increased gene length (He and Zhang, 2005). Duplications frequently underlie domain loss and gain in eukaryotic proteins, which is an important mechanism for the evolution of new functions (Buljan et al., 2010; Peisajovich et al., 2010).” In the context of our work, the possible implication is that “Natural selection may promote protein domains recombination to increase the versatility of fungal proteins functions and thereby contribute to host range expansion”.

Our analyses show that codon optimization of secreted proteins is stronger in the genome of generalist parasites compared to specialists. We show that it is also the case for host-induced genes, regardless of whether they encode secreted or non-secreted proteins (Figure 5A). We agree that additional genomic properties likely contributed to the success of generalist fungi. We have taken the suggestion to test for codon optimization in multiple cellular processes such as metabolism. The result of this analysis is provided in Figure 5D. We have added the following paragraph to describe these results: “Increased codon optimization in generalists was not only clear for genes encoding secreted proteins and host-induced genes, but also across their entire genome (‘other genes’ in Figure 5C). […] Conversely, GOs related to transcription, transposable elements and phosphorelay signal transduction were better optimized in specialists than in generalists.”

The subsection “Codon optimization and genes function” describes the methods used for this analysis. The findings from this analysis may be related to the emergence of new protein functions through domain recombination and gene length expansion as discussed in the third paragraph of the Discussion. The importance of codon optimization in the evolution of these innovations, deduced from our work and current knowledge of molecular host-parasite interactions, is then discussed in the last paragraph of the Discussion.

We are grateful for these suggestions that prompted us to complete our analyses and add elements of context to the Discussion section of the manuscript. These revisions greatly clarified the expected functional implications of codon optimization in generalists.

2) We appreciate that you have pursued several different approaches to correct for phylogeny, and we consider your results now reliable. However, throughout the manuscript there remain several places where you provide standard correlations, with uncorrected p values. All these should be replaced with correlations over phylogenetic contrasts. Specifically, these are in subsection “Codon optimization correlates with fungal parasites host range”, first paragraph and second paragraphs.

We have replaced uncorrected correlations by correlations under phylogenetic independent contrasts in the first and second paragraphs of the subsection “Codon optimization correlates with fungal parasites host range”. To preserve the flow of the manuscript, we have moved the statement “We verified the significance of the correlation between host range and genome scale codon optimization using Felsenstein’s phylogenetic independent contrasts, and obtained Spearman rho of 0.6 (p-val=1.5 10-05)” to the first paragraph of the aforementioned subsection. We updated the correlation coefficients and p-values used in Figure 2 with correlations under phylogenetic independent contrasts.

3) In several places throughout the manuscript, you list p values but don't state the test that was performed. Please state the test every time. Also, you are inconsistent in when you do and don't run a test. For example, in the first paragraph of the subsection “Long proteins encoded by the genome of generalist fungi likely increase natural selection on codon optimization” you list a p value but you don't do so in any of the other length comparisons in the following lines.

We have added statement on the test used and p-values calculated wherever applicable.

https://doi.org/10.7554/eLife.22472.120

Article and author information

Author details

  1. Thomas Badet

    LIPM, Université de Toulouse, INRA, CNRS, Castanet-Tolosan, France
    Contribution
    TB, Data curation, Formal analysis, Investigation, Methodology, Writing—review and editing
    Contributed equally with
    Remi Peyraud and Malick Mbengue
    Competing interests
    The authors declare that no competing interests exist.
  2. Remi Peyraud

    LIPM, Université de Toulouse, INRA, CNRS, Castanet-Tolosan, France
    Contribution
    RP, Data curation, Formal analysis, Methodology, Writing—original draft, Writing—review and editing
    Contributed equally with
    Thomas Badet and Malick Mbengue
    Competing interests
    The authors declare that no competing interests exist.
  3. Malick Mbengue

    LIPM, Université de Toulouse, INRA, CNRS, Castanet-Tolosan, France
    Contribution
    MM, Resources, Formal analysis, Investigation, Methodology, Writing—review and editing
    Contributed equally with
    Thomas Badet and Remi Peyraud
    Competing interests
    The authors declare that no competing interests exist.
  4. Olivier Navaud

    LIPM, Université de Toulouse, INRA, CNRS, Castanet-Tolosan, France
    Contribution
    ON, Formal analysis, Methodology, Writing—review and editing
    Competing interests
    The authors declare that no competing interests exist.
  5. Mark Derbyshire

    Centre for Crop and Disease Management, Department of Environment and Agriculture, Curtin University, Perth, Australia
    Contribution
    MD, Resources, Writing—review and editing
    Competing interests
    The authors declare that no competing interests exist.
  6. Richard P Oliver

    Centre for Crop and Disease Management, Department of Environment and Agriculture, Curtin University, Perth, Australia
    Contribution
    RPO, Resources, Writing—review and editing
    Competing interests
    The authors declare that no competing interests exist.
  7. Adelin Barbacci

    LIPM, Université de Toulouse, INRA, CNRS, Castanet-Tolosan, France
    Contribution
    AB, Software, Formal analysis, Methodology, Writing—review and editing
    Competing interests
    The authors declare that no competing interests exist.
    ORCID icon 0000-0003-3156-272X
  8. Sylvain Raffaele

    LIPM, Université de Toulouse, INRA, CNRS, Castanet-Tolosan, France
    Contribution
    SR, Conceptualization, Formal analysis, Supervision, Funding acquisition, Writing—original draft, Project administration, Writing—review and editing
    For correspondence
    sylvain.raffaele@toulouse.inra.fr
    Competing interests
    The authors declare that no competing interests exist.
    ORCID icon 0000-0002-2442-9632

Funding

European Research Council (ERC-StG 336808)

  • Thomas Badet
  • Remi Peyraud
  • Malick Mbengue
  • Olivier Navaud
  • Sylvain Raffaele

Labex TULIP (ANR-11-IDEX-0002-02)

  • Thomas Badet
  • Remi Peyraud
  • Malick Mbengue
  • Olivier Navaud
  • Adelin Barbacci
  • Sylvain Raffaele

Australian grains research and development corporation

  • Mark Derbyshire
  • Richard P Oliver

Curtin University of Technology

  • Mark Derbyshire
  • Richard P Oliver

Labex TULIP (ANR-11-IDEX-0002-02)

  • Thomas Badet
  • Remi Peyraud
  • Malick Mbengue
  • Olivier Navaud
  • Adelin Barbacci
  • Sylvain Raffaele

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Marielle Barascud, Aline Lacaze and Remy Vincent for technical assistance, Bruno Grezes-Besset and Tim Vleugels for providing S. sclerotiorum isolates. This work was supported by a starting grant of the European Research Council (ERC-StG 336808 project VariWhim) to SR and the French Laboratory of Excellence project TULIP (ANR-10-LABX-41; ANR-11-IDEX-0002–02). MD and RPO are supported by the Australian Grains Research and Development Corporation and Curtin University. Sequences are deposited in GenBank under the BioProject ID PRJNA341340.

Reviewing Editor

  1. Claus O Wilke, Reviewing Editor, The University of Texas at Austin, United States

Publication history

  1. Received: October 18, 2016
  2. Accepted: January 28, 2017
  3. Accepted Manuscript published: February 3, 2017 (version 1)
  4. Version of Record published: February 17, 2017 (version 2)
  5. Version of Record updated: February 24, 2017 (version 3)

Copyright

© 2017, Badet et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 1,643
    Page views
  • 466
    Downloads
  • 0
    Citations

Article citation count generated by polling the highest count across the following sources: PubMed Central, Scopus, Crossref.

Comments

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Download citations (links to download the citations from this article in formats compatible with various reference manager tools)

Open citations (links to open the citations from this article in various online reference manager services)