The protein domains of vertebrate species in which selection is more effective have greater intrinsic structural disorder
Figures

The effectiveness of selection, calculated as the long-term ratio of time spent in fixed deleterious: fixed beneficial allele states given symmetric mutation rates, is a function of the product sN.
Assuming a diploid Wright–Fisher population with s << 1, the y-axis is calculated from the probability of fixation of a new mutation. s is held constant at a value of 0.001 and N is varied. Results for other small-magnitude values of s are superimposable. For small sN, selection is ineffective at producing codon bias. For large sN, selection is highly effective. Only within a relatively narrow range of intermediate values of sN does the degree of codon bias depend quantitatively on sN.
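The dependence on the product of s and N described in this caption can be sketched numerically. The caption's own expressions are not reproduced here, so this sketch assumes Kimura's standard diffusion approximation for the fixation probability of a new mutation in a diploid population, and it plots the long-term fraction of time spent in the beneficial state as one plausible y-axis. The function names and the chosen values of N are illustrative assumptions.

```python
import math

def p_fix(s, n):
    """Fixation probability of a new mutation (initial frequency 1/(2N)).

    Assumption: Kimura's diffusion approximation for a diploid population,
    valid for |s| << 1. Use moderate |4Ns| to avoid floating-point overflow.
    """
    if s == 0:
        return 1.0 / (2 * n)
    return (1 - math.exp(-2 * s)) / (1 - math.exp(-4 * n * s))

def deleterious_to_beneficial(s, n):
    """Closed form of p_fix(-s, n) / p_fix(s, n) under the same assumption.

    With symmetric mutation rates, this is the long-term ratio of time
    spent fixed for the deleterious vs. the beneficial allele.
    """
    return math.exp(-2 * s * (2 * n - 1))

def fraction_beneficial(s, n):
    """Long-term fraction of time spent fixed for the beneficial allele."""
    return 1.0 / (1.0 + deleterious_to_beneficial(s, n))

if __name__ == "__main__":
    s = 0.001  # held constant, as in the caption
    for n in (10, 100, 500, 1000, 10_000):  # illustrative values of N
        print(f"sN = {s * n:6.2f}  fraction beneficial = {fraction_beneficial(s, n):.4f}")
```

Under this assumed form the curve is close to a logistic function of sN, which is one way to see why only intermediate values of sN give a quantitative dependence.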

More highly adapted species (bottom) have a higher proportion of their sites subject to effective selection on codon bias (blue area).
The Codon Adaptation Index (CAI) attempts to compare the intensity of selection (Figure 1, x-axis) in a subset of genes under strong selection (red areas). Given the narrow range of quantitative dependence of codon bias on sN shown in Figure 1, our new metric is intended to capture differences in the proportion of the proteome subject to substantial selection (blue areas).

Codon Adaptation Index (CAI) is seriously confounded with GC content (A), while Effective Number of Codons (ENC) and Codon Adaptation Index of Species (CAIS) are not (B and C).
We control for phylogenetic confounding via Phylogenetic Independent Contrasts (PIC) (Felsenstein, 1985); this yields an unbiased R² estimate (Rohlf, 2006). Each datapoint is one of 118 vertebrate species with ‘Complete’ intergenic genomic sequence (allowing for %GC correction) and TimeTree divergence dates (allowing for PIC correction). Red line shows unweighted lm(y ~ x) with gray region as 95% confidence interval. Figure 3—figure supplement 1 shows in more detail why CAI is not appropriate for species-wide effectiveness of selection measurements. Plots without PIC correction are shown in Figure 3—figure supplement 2. The impact of amino acid frequency correction on CAIS is shown in Figure 3—figure supplement 3.
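The PIC correction cited in this caption (Felsenstein, 1985) can be illustrated with a minimal sketch. It assumes a rooted binary tree encoded as nested tuples, with a hypothetical three-species tree and made-up trait values. It is not the pipeline used for the figures, which relies on the R package ape listed in the key resources table.

```python
import math

def contrasts(node, trait):
    """Felsenstein's independent contrasts on a rooted binary tree.

    A leaf is a species name (str). An internal node is a pair
    ((left_child, left_branch_length), (right_child, right_branch_length)).
    Returns (estimated node value, extra branch length, list of contrasts).
    """
    if isinstance(node, str):
        return trait[node], 0.0, []
    (left, v1), (right, v2) = node
    x1, extra1, c1 = contrasts(left, trait)
    x2, extra2, c2 = contrasts(right, trait)
    # Lengthen branches to account for uncertainty in estimated node values.
    v1 += extra1
    v2 += extra2
    # Standardized contrast between the two daughter lineages.
    contrast = (x1 - x2) / math.sqrt(v1 + v2)
    # Branch-length-weighted estimate of the value at this node.
    value = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)
    extra = v1 * v2 / (v1 + v2)
    return value, extra, c1 + c2 + [contrast]

# Hypothetical tree ((A:1, B:1):1, C:2) and hypothetical trait values.
TREE = (((("A", 1.0), ("B", 1.0)), 1.0), ("C", 2.0))
TRAIT = {"A": 1.0, "B": 3.0, "C": 5.0}

if __name__ == "__main__":
    _, _, pics = contrasts(TREE, TRAIT)
    print(pics)  # n tips give n - 1 contrasts
```

Applying the same function to two traits gives paired contrasts that can then be regressed on each other in place of the raw species values.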

Codon Adaptation Index (CAI) is not appropriate for species-wide effectiveness of selection measurements.
Each CAI value shown is averaged over an entire species’ proteome. (A) The value of CAI is driven by its normalizing denominator term, CAImax. (B) As a result, CAI is inversely proportional to Codon Adaptation Index of Species (CAIS). Each datapoint is one of 118 vertebrate species with ‘Complete’ intergenic genomic sequence available (allowing for %GC correction) and TimeTree divergence dates (allowing for Phylogenetic Independent Contrasts [PIC] correction). p-values shown are for Pearson’s correlation.
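The role of the normalizing term CAImax can be made concrete with a minimal sketch of the classic CAI formulation (Sharp and Li, 1987), in which the geometric mean of observed relative synonymous codon usage (RSCU) is divided by the geometric mean of the maximum RSCU attainable for the same amino acids. The reference codon counts and the two toy genes below are hypothetical, and this is not the code used to produce the figure.

```python
import math

# Hypothetical reference codon counts, grouped by amino acid (toy data).
REFERENCE = {
    "K": {"AAA": 30, "AAG": 70},
    "F": {"TTT": 40, "TTC": 60},
}

def rscu_table(reference):
    """Map each codon to (its RSCU, the maximum RSCU for its amino acid)."""
    table = {}
    for codons in reference.values():
        mean_count = sum(codons.values()) / len(codons)
        max_rscu = max(codons.values()) / mean_count
        for codon, count in codons.items():
            table[codon] = (count / mean_count, max_rscu)
    return table

def cai(gene_codons, reference):
    """Return (CAI, CAI_obs, CAI_max) for a list of codons.

    CAI_obs is the geometric mean of observed RSCU values, and CAI_max is
    the geometric mean of the per-amino-acid maximum RSCU values.
    """
    table = rscu_table(reference)
    n = len(gene_codons)
    log_obs = sum(math.log(table[c][0]) for c in gene_codons) / n
    log_max = sum(math.log(table[c][1]) for c in gene_codons) / n
    cai_obs, cai_max = math.exp(log_obs), math.exp(log_max)
    return cai_obs / cai_max, cai_obs, cai_max

if __name__ == "__main__":
    print(cai(["AAG", "TTC"], REFERENCE))  # only preferred codons
    print(cai(["AAA", "TTT"], REFERENCE))  # only non-preferred codons
```

Because CAImax is computed from the reference set, it varies with how skewed that set's codon usage is, which is why a denominator of this kind can dominate comparisons across species.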

The same relationships are shown as in Figure 3, but without correction for phylogenetic confounding, suggesting GC confounding for the Effective Number of Codons (ENC) but not the Codon Adaptation Index of Species (CAIS).
Codon Adaptation Index (CAI) (A) and ENC (B) both correlate with genomic GC, but CAIS (C) does not. Red line shows lm(y ~ x), with gray region as 95% confidence interval. We use Phylogenetic Independent Contrasts (PIC) corrected results rather than these results because PIC correction removes non-independent errors to produce an unbiased R² estimate (Rohlf, 2006).

Vertebrate Codon Adaptation Index of Species (CAIS) values are not greatly affected by computation for a standardized amino acid composition vs. computation for the amino acid frequencies in the species in question.

Protein domains have higher intrinsic structural disorder (ISD) when found in more exquisitely adapted species, according to (A) the Codon Adaptation Index of Species (CAIS) and (B) the Effective Number of Codons (ENC).
We plot -ENC rather than ENC to more easily compare results with those from CAIS. (C) Correcting for local rather than genome-wide %GC removes the relationship. Each datapoint is one of 118 vertebrate species with ‘Complete’ intergenic genomic sequence available (allowing for %GC correction) and TimeTree divergence dates (allowing for Phylogenetic Independent Contrasts [PIC] correction). ‘Effects’ on ISD shown on the y-axis are fixed effects of species identity in our linear mixed model, after PIC correction. Red line shows unweighted lm(y ~ x) with gray region as 95% confidence interval. Panels without PIC correction are presented in Figure 4—figure supplement 1.

The same relationships are shown as in Figure 4, here without correction for phylogenetic confounding.
As in Figure 4, intrinsic structural disorder (ISD) of protein domains is higher in more highly adapted species, as measured by Codon Adaptation Index of Species (CAIS) (A) and Effective Number of Codons (ENC) (B), but not by CAIS calculated with local GC% rather than genome-wide GC% (C). ISD is calculated as in Figure 4. Red line shows lm(y ~ x), with gray region as 95% confidence interval.

Codon Adaptation Index of Species (CAIS) is not correlated with the degree to which local genomic regions differ in their GC content from global GC content.
If CAIS were driven by GC-biased gene conversion, genomes with more heterogeneous %GC distributions should have higher CAIS scores.

More exquisitely adapted species have higher intrinsic structural disorder (ISD) in both young (A and B) and old (C and D) protein domains, according to both the Codon Adaptation Index of Species (CAIS) (A, C), and the Effective Number of Codons (ENC) (B, D).
Age assignments are taken from James et al., 2021, with vertebrate protein domains that emerged prior to the last eukaryotic common ancestor (LECA) classified as ‘old’, and vertebrate protein domains that emerged after the divergence of animals and fungi from plants as ‘young’. ‘Effects’ on ISD shown on the y-axis are fixed effects of species identity in our linear mixed model. The same n = 118 datapoints are shown as in Figures 3 and 4. Red line shows lm(y ~ x), with gray region as 95% confidence interval. Panels without Phylogenetic Independent Contrasts (PIC) correction are shown in Figure 6—figure supplement 1.

Without correction for phylogenetic confounding, more highly adapted species have higher intrinsic structural disorder (ISD) in both young (A and B) and old (C and D) protein domains, according to both the Codon Adaptation Index of Species (CAIS) (A, C), and the Effective Number of Codons (ENC) (B, D).
Tables
Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information |
---|---|---|---|---|
Software, algorithm | IUPRED2 | DOI: https://doi.org/10.1093/nar/gky384 | RRID:SCR_014632 | |
Software, algorithm | Codon Adaptation Index of Species | This paper | See Materials and methods | |
Software, algorithm | Codon Adaptation Index | DOI: https://doi.org/10.1093/nar/15.3.1281 | | |
Software, algorithm | ape | DOI: https://doi.org/10.1093/bioinformatics/bty633 | RRID:SCR_017343 | R package |
Software, algorithm | Effective Number of Codons | DOI: https://doi.org/10.1093/oxfordjournals.molbev.a004201 | | |