The effectiveness of selection, calculated as the long-term ratio of time spent in fixed deleterious : fixed beneficial allele states given symmetric mutation rates, is a function of the product Ns. Assuming a diploid Wright-Fisher population with s ≪ 1, the probability of fixation of a new mutation is denoted π(N, s), and the y-axis is calculated as π(N, −s)/(π(N, −s) + π(N, s)). s is held constant at a value of 0.001 and N is varied. Results for other small-magnitude values of s are superimposable.
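The dependence on the product Ns can be checked numerically. The sketch below assumes Kimura's diffusion approximation for a new mutation at initial frequency 1/(2N), π(N, s) = (1 − e^(−2s))/(1 − e^(−4Ns)). That specific form is an assumption made for illustration and may differ by constant factors from the expression used for the figure. The function names are hypothetical.

```python
import math

def fixation_probability(n, s):
    """Assumed diffusion approximation; not necessarily the figure's exact formula."""
    if s == 0:
        return 1.0 / (2 * n)           # neutral limit
    exponent = -4.0 * n * s
    if exponent > 700:                 # strongly deleterious: avoid float overflow
        return 0.0
    # expm1(x) = e^x - 1 keeps precision when s is very small
    return math.expm1(-2.0 * s) / math.expm1(exponent)

def deleterious_fraction(n, s):
    """pi(N, -s) / (pi(N, -s) + pi(N, s)), the quantity plotted on the y-axis."""
    p_del = fixation_probability(n, -s)
    p_ben = fixation_probability(n, s)
    return p_del / (p_del + p_ben)

if __name__ == "__main__":
    s = 0.001                          # held constant, as in the legend
    for n in (10, 100, 1000, 10000):
        print(n, n * s, deleterious_fraction(n, s))
```

Under this assumed form the ratio simplifies to 1/(1 + e^(4Ns − 2s)), so for s ≪ 1 it depends on N and s almost entirely through Ns, which is consistent with curves for different small s being superimposable.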

More highly adapted species (bottom) have a higher proportion of their genes subject to effective selection on codon bias (blue area). The CAI, when used correctly, compares the intensity of selection in comparable regions of the right tail (red boxes). Our new metric captures both this and differences in the proportion of the genome subject to substantial selection.

CAIS reflects the expected relationship between effectiveness of selection and body size, while CAI, ENC, and CAIS calculated using local GC content do not. Note that a high ENC value means that more codons are used in a species’ genome, i.e. that the species is less codon adapted, whereas a high CAIS value means that a species is more codon adapted. Body size data are from the PanTHERIA database, originally as log10(mass) in grams prior to PIC correction. Data are shown for 62 species in common between PanTHERIA and our own dataset of 118 vertebrate species that have both a “Complete” genome sequence available for calculating %GC and TimeTree divergence dates available for use in PIC correction. P-values shown are for Pearson’s correlation. Red line shows unweighted lm(y∼x), with grey region indicating the 95% confidence interval.
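The statistics named in this legend can be reproduced in outline as follows. This is a minimal sketch, not the authors' analysis script: the function names are hypothetical, the data are made-up toy values, and lm(y∼x) is taken to mean ordinary least squares as in R. The p-value would come from comparing the t statistic to a t distribution with n − 2 degrees of freedom, a step left out here to keep the sketch free of dependencies.

```python
import math

def _centered_sums(x, y):
    """Return means and centered sums of squares and cross-products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return mx, my, sxx, syy, sxy

def pearson_r(x, y):
    """Pearson's correlation coefficient."""
    _, _, sxx, syy, sxy = _centered_sums(x, y)
    return sxy / math.sqrt(sxx * syy)

def ols_fit(x, y):
    """Slope and intercept of an unweighted least-squares line."""
    mx, my, sxx, _, sxy = _centered_sums(x, y)
    slope = sxy / sxx
    return slope, my - slope * mx

def t_statistic(r, n):
    """t statistic for testing r = 0 with n - 2 degrees of freedom (requires |r| < 1)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

if __name__ == "__main__":
    # Toy values only; these are not data from the paper.
    body_size_contrast = [1.0, 2.0, 3.0, 4.0]
    cais_contrast = [1.0, 3.0, 2.0, 4.0]
    r = pearson_r(body_size_contrast, cais_contrast)
    print(r, t_statistic(r, len(body_size_contrast)))
    print(ols_fit(body_size_contrast, cais_contrast))
```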

Protein domains have higher ISD when found in more exquisitely adapted species. Each datapoint is one of 118 vertebrate species with “Complete” intergenic genomic sequence available (allowing for %GC correction) and TimeTree divergence dates (allowing for PIC correction). “Effects” on ISD shown on the y-axis are fixed effects of species identity in our linear mixed model, after PIC correction. Red line shows unweighted lm(y∼x), with grey region as 95% confidence interval.

More exquisitely adapted species have higher ISD in both ancient and recent protein domains. Age assignments are taken from James et al. (2021), with vertebrate protein domains that emerged prior to LECA classified as “old”, and vertebrate protein domains that emerged after the divergence of animals and fungi from plants as “young”. “Effects” on ISD shown on the y-axis are fixed effects of species identity in our linear mixed model. The same n=118 datapoints are shown as in Figures 3 and 4. P-values shown are for Pearson’s correlation. Red line shows lm(y∼x), with grey region as 95% confidence interval, using a weighted model for non-PIC-corrected figures and unweighted for PIC-corrected figures. Weighted models make the quantitative estimation of slopes more accurate. Dot size on left indicates weight.
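The difference between the weighted and unweighted fits can be illustrated with a short sketch. It assumes that "weighted" means weighted least squares in the sense of R's lm(y∼x, weights = w). How the weights are derived is not stated in this legend, so they are taken as given, and the function name and toy numbers are hypothetical.

```python
def weighted_fit(x, y, w):
    """Slope and intercept minimizing sum(w * (y - a - b*x)**2)."""
    total = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total   # weighted mean of x
    my = sum(wi * yi for wi, yi in zip(w, y)) / total   # weighted mean of y
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return slope, my - slope * mx

if __name__ == "__main__":
    # Toy values only; the third point is an outlier with a small weight.
    x = [0.0, 1.0, 2.0]
    y = [0.0, 1.0, 5.0]
    print(weighted_fit(x, y, [1.0, 1.0, 1.0]))   # equal weights: ordinary least squares
    print(weighted_fit(x, y, [1.0, 1.0, 0.1]))   # down-weighting the outlier
```

With equal weights the function reduces to the unweighted fit, so the two model types in the legend differ only in how much each species contributes to the slope.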

Codon Adaptation Index is not appropriate for species-wide effectiveness of selection measurements. Each CAI value shown is averaged over an entire species’ proteome. A) The value of CAI is driven by its normalizing denominator term, CAImax. B) As a result, CAI is inversely proportional to CAIS. Each datapoint is one of 118 vertebrate species with “Complete” intergenic genomic sequence available (allowing for %GC correction) and TimeTree divergence dates (allowing for PIC correction). P-values shown are for Pearson’s correlation.

CAIS without correction for amino acid composition still reflects the expected relationship between effectiveness of selection and body size. Body size data are from the PanTHERIA database, originally as log10(mass) in grams; data are shown for 62 species in common between PanTHERIA and our own dataset of 118 vertebrate species with a “Complete” intergenic genomic sequence and TimeTree divergence dates. P-value is shown for Pearson’s correlation. Red line shows unweighted lm(y∼x), with grey region as 95% confidence interval.

Our Figure 4 finding that more exquisitely adapted species have protein domains with higher ISD is confirmed by ENC. The same n=118 datapoints are shown as in Figure 4. P-values shown are for Pearson’s correlation. Red line shows unweighted lm(y∼x), with grey region as 95% confidence interval.

More exquisitely adapted species have higher ISD in both ancient and recent protein domains, as confirmed by ENC. Protein domains that emerged prior to LECA are identified as “old”, and protein domains that emerged after the divergence of animals and fungi from plants and found in vertebrates are identified as “young”. Age assignments are taken from James et al. (2021). The same n=118 datapoints are shown as in Figure 3. P-values shown are for Pearson’s correlation. Red line shows lm(y∼x), with grey region as 95% confidence interval, using a weighted model for non-PIC-corrected figures and unweighted for PIC-corrected figures.

Effective Number of Codons (ENC) and Codon Adaptation Index of Species (CAIS) are uncorrelated with total genomic GC content. Each datapoint is one of 118 vertebrate species with “Complete” intergenic genomic sequence and TimeTree divergence dates. Statistical significance is only supported if it survives controlling for phylogenetic confounding via Phylogenetic Independent Contrasts (PIC). P-values shown are for Pearson’s correlation.
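For readers unfamiliar with the PIC correction used throughout these legends, the following is a minimal sketch of Felsenstein's contrast algorithm on a fully bifurcating tree with positive branch lengths. It is not the authors' pipeline. The tree encoding, the function names, and the three-species toy tree are all hypothetical.

```python
import math

def pic(node, trait, contrasts):
    """Post-order traversal returning (node value, adjusted branch length).

    A leaf is (name, branch_length); an internal node is
    (left, right, branch_length). Standardized contrasts are appended
    to `contrasts` in traversal order.
    """
    if len(node) == 2:                       # leaf
        name, length = node
        return trait[name], length
    left, right, length = node
    x1, v1 = pic(left, trait, contrasts)
    x2, v2 = pic(right, trait, contrasts)
    contrasts.append((x1 - x2) / math.sqrt(v1 + v2))
    # Ancestral value: average of the children weighted by inverse branch length
    value = (x1 / v1 + x2 / v2) / (1.0 / v1 + 1.0 / v2)
    # Lengthen the branch above this node to reflect estimation uncertainty
    return value, length + v1 * v2 / (v1 + v2)

def origin_correlation(u, v):
    """Correlation through the origin, since the sign of each contrast is arbitrary."""
    suv = sum(a * b for a, b in zip(u, v))
    return suv / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))

if __name__ == "__main__":
    # Toy tree ((A:1, B:1):1, C:2) with made-up trait values.
    tree = ((("A", 1.0), ("B", 1.0), 1.0), ("C", 2.0), 0.0)
    gc = {"A": 1.0, "B": 3.0, "C": 5.0}
    cais = {"A": 2.0, "B": 2.5, "C": 4.0}
    gc_contrasts, cais_contrasts = [], []
    pic(tree, gc, gc_contrasts)
    pic(tree, cais, cais_contrasts)
    print(gc_contrasts, cais_contrasts)
    print(origin_correlation(gc_contrasts, cais_contrasts))
```

Because both traits are traversed on the same tree in the same order, their contrasts are paired, and n species yield n − 1 contrasts for the correlation.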