The protein domains of vertebrate species in which selection is more effective have greater intrinsic structural disorder

  1. Catherine A Weibel
  2. Andrew L Wheeler
  3. Jennifer E James
  4. Sara M Willis
  5. Hanon McShea
  6. Joanna Masel (corresponding author)

  1. Department of Mathematics, University of Arizona, United States
  2. Department of Physics, University of Arizona, United States
  3. Genetics Graduate Interdisciplinary Program, University of Arizona, United States
  4. Department of Ecology and Evolutionary Biology, University of Arizona, United States
  5. Department of Earth System Science, Stanford University, United States
7 figures, 1 table and 1 additional file

Figures

Figure 1
The effectiveness of selection, calculated as the long-term ratio of time spent in fixed deleterious: fixed beneficial allele states given symmetric mutation rates, is a function of the product sN.

Assuming a diploid Wright–Fisher population with $s \ll 1$, the probability of fixation of a new mutation is $\pi(N,s) = \frac{1 - e^{-s/2}}{1 - e^{-Ns}}$, and the y-axis is calculated as $\pi(N,-s) / (\pi(N,-s) + \pi(N,s))$. s is held constant at a value of 0.001 and N is varied. Results for other small magnitude values of s are superimposable. For small sN, selection is ineffective at producing codon bias. For large sN, selection is highly effective. For only a relatively narrow range of intermediate values of sN, the degree of codon bias depends quantitatively on sN.
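The curve described in this caption can be sketched numerically. The block below is a minimal illustration, not the authors' plotting code: it assumes the fixation probability takes the form (1 - exp(-s/2)) / (1 - exp(-N s)), whose neutral limit is 1/(2N), and the function names `fixation_probability` and `deleterious_fraction` are hypothetical.

```python
import math


def fixation_probability(n: float, s: float) -> float:
    """Fixation probability of a new mutation (assumed diffusion form).

    Assumption: pi(N, s) = (1 - exp(-s/2)) / (1 - exp(-N*s)),
    which tends to 1/(2N) as s -> 0.
    """
    if s == 0:
        return 1.0 / (2.0 * n)
    # expm1(x) = exp(x) - 1; the two sign flips cancel in the ratio.
    return math.expm1(-s / 2.0) / math.expm1(-n * s)


def deleterious_fraction(n: float, s: float) -> float:
    """y-axis value: pi(N, -s) / (pi(N, -s) + pi(N, s))."""
    p_del = fixation_probability(n, -s)
    p_ben = fixation_probability(n, s)
    return p_del / (p_del + p_ben)


if __name__ == "__main__":
    s = 0.001  # held constant, as in the caption
    for n in (10, 100, 1_000, 3_000, 10_000):
        print(f"N={n:>6}  sN={s * n:>6.2f}  y={deleterious_fraction(n, s):.5f}")
```

Under this assumed form the ratio simplifies to 1 / (1 + exp((N - 1/2) s)), so it sits near 0.5 when sN is small and falls toward 0 when sN is large. Very large values of N s would overflow `math.expm1`, so the sketch is only meant for the moderate range printed here.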

Figure 2
More highly adapted species (bottom) have a higher proportion of their sites subject to effective selection on codon bias (blue area).

The Codon Adaptation Index (CAI) attempts to compare the intensity of selection (Figure 1, x-axis) in a subset of genes under strong selection (red areas). Given the narrow range of quantitative dependence of codon bias on sN shown in Figure 1, our new metric is intended to capture differences in the proportion of the proteome subject to substantial selection (blue areas).

Figure 3 with 3 supplements
Codon Adaptation Index (CAI) is seriously confounded with GC content (A), while Effective Number of Codons (ENC) and Codon Adaptation Index of Species (CAIS) are not (B and C).

We control for phylogenetic confounding via Phylogenetic Independent Contrasts (PIC) (Felsenstein, 1985); this yields an unbiased R² estimate (Rohlf, 2006). Each datapoint is one of 118 vertebrate species with ‘Complete’ intergenic genomic sequence (allowing for %GC correction) and TimeTree divergence dates (allowing for PIC correction). Red line shows unweighted lm(y ~ x) with gray region as 95% confidence interval. Figure 3—figure supplement 1 shows in more detail why CAI is not appropriate for species-wide effectiveness of selection measurements. Plots without PIC correction are shown in Figure 3—figure supplement 2. The impact of amino acid frequency correction on CAIS is shown in Figure 3—figure supplement 3.
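For readers unfamiliar with PIC, the following is a minimal sketch of Felsenstein's (1985) contrast calculation on a toy tree. It is not the pipeline used for these figures (the key resources table lists the R package ape): the three-tip tree, the trait values, and the function names are all hypothetical. The R² through the origin is a common convention for contrasts and is shown only as an assumption, not as the authors' regression.

```python
import math


def independent_contrasts(node, traits):
    """Return (node value, adjusted branch length, list of contrasts).

    A tip is ("name", branch_length); an internal node is
    ((left, right), branch_length). Branch lengths must be positive
    below the root.
    """
    label, length = node
    if isinstance(label, str):  # tip: value comes straight from the data
        return traits[label], length, []
    left, right = label
    x1, v1, c1 = independent_contrasts(left, traits)
    x2, v2, c2 = independent_contrasts(right, traits)
    # Standardized contrast between the two daughter lineages.
    contrast = (x1 - x2) / math.sqrt(v1 + v2)
    # Ancestral value: average weighted by inverse branch length.
    value = (x1 / v1 + x2 / v2) / (1.0 / v1 + 1.0 / v2)
    # Lengthen the parent branch to account for estimation uncertainty.
    adjusted = length + v1 * v2 / (v1 + v2)
    return value, adjusted, c1 + c2 + [contrast]


def r_squared_through_origin(cx, cy):
    """Squared correlation of two contrast vectors, forced through the origin."""
    sxy = sum(a * b for a, b in zip(cx, cy))
    sxx = sum(a * a for a in cx)
    syy = sum(b * b for b in cy)
    return sxy * sxy / (sxx * syy)


# Hypothetical toy tree ((A:1, B:1):1, C:2) and made-up trait values.
CLADE_AB = ((("A", 1.0), ("B", 1.0)), 1.0)
TREE = ((CLADE_AB, ("C", 2.0)), 0.0)
TRAIT_X = {"A": 1.0, "B": 3.0, "C": 5.0}
TRAIT_Y = {"A": 2.0, "B": 6.0, "C": 10.0}

if __name__ == "__main__":
    _, _, cx = independent_contrasts(TREE, TRAIT_X)
    _, _, cy = independent_contrasts(TREE, TRAIT_Y)
    print("contrasts in x:", cx)
    print("contrasts in y:", cy)
    print("R^2 through origin:", r_squared_through_origin(cx, cy))
```

A tree with n tips yields n - 1 contrasts, and both traits must be traversed in the same order so that the arbitrary sign of each contrast is consistent between them.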

Figure 3—figure supplement 1
Codon Adaptation Index (CAI) is not appropriate for species-wide effectiveness of selection measurements.

Each CAI value shown is averaged over an entire species’ proteome. (A) The value of CAI is driven by its normalizing denominator term, CAImax. (B) As a result, CAI is inversely proportional to Codon Adaptation Index of Species (CAIS). Each datapoint is one of 118 vertebrate species with ‘Complete’ intergenic genomic sequence available (allowing for %GC correction) and TimeTree divergence dates (allowing for Phylogenetic Independent Contrasts [PIC] correction). p-values shown are for Pearson’s correlation.

Figure 3—figure supplement 2
The same relationships are shown as in Figure 3, but without correction for phylogenetic confounding, suggesting GC confounding for the Effective Number of Codons (ENC) but not the Codon Adaptation Index of Species (CAIS).

Codon Adaptation Index (CAI) (A) and ENC (B) both correlate with genomic GC, but CAIS (C) does not. Red line shows lm(y ~ x), with gray region as 95% confidence interval. We use Phylogenetic Independent Contrasts (PIC) corrected results rather than these results because PIC correction removes non-independent errors to produce an unbiased R² estimate (Rohlf, 2006).

Figure 3—figure supplement 3
Vertebrate Codon Adaptation Index of Species (CAIS) values are not greatly affected by computation for a standardized amino acid composition vs. computation for the amino acid frequencies in the species in question.

Figure 4 with 1 supplement
Protein domains have higher intrinsic structural disorder (ISD) when found in more exquisitely adapted species, according to (A) the Codon Adaptation Index of Species (CAIS) and (B) the Effective Number of Codons (ENC).

We plot -ENC rather than ENC to more easily compare results with those from CAIS. (C) Correcting for local rather than genome-wide %GC removes the relationship. Each datapoint is one of 118 vertebrate species with ‘complete’ intergenic genomic sequence available (allowing for %GC correction), and TimeTree divergence dates (allowing for Phylogenetic Independent Contrasts [PIC] correction). ‘Effects’ on ISD shown on the y-axis are fixed effects of species identity in our linear mixed model, after PIC correction. Red line shows unweighted lm(y ~ x) with gray region as 95% confidence interval. Panels without PIC correction are presented in Figure 4—figure supplement 1.

Figure 4—figure supplement 1
The same relationships are shown as in Figure 4, here without correction for phylogenetic confounding.

As in Figure 4, intrinsic structural disorder (ISD) of protein domains is higher in more highly adapted species, as measured by Codon Adaptation Index of Species (CAIS) (A) and Effective Number of Codons (ENC) (B), but not by CAIS calculated with local GC% rather than genome-wide GC% (C). ISD is calculated as in Figure 4. Red line shows lm(y ~ x), with gray region as 95% confidence interval.

Figure 5
Codon Adaptation Index of Species (CAIS) is not correlated with the degree to which local genomic regions differ in their GC content from global GC content.

If CAIS were driven by GC-biased gene conversion, genomes with more heterogeneous %GC distributions should have higher CAIS scores.
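One way to picture "the degree to which local genomic regions differ in their GC content from global GC content" is sketched below. This caption does not specify the metric, so the measure used here (mean absolute deviation of non-overlapping window %GC from genome-wide %GC), the window size, and the function names are all assumptions made purely for illustration.

```python
import math


def gc_fraction(seq: str) -> float:
    """Fraction of G or C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else float("nan")


def gc_heterogeneity(seq: str, window: int) -> float:
    """Mean absolute deviation of window GC from global GC.

    Hypothetical metric: windows are non-overlapping, and windows with
    no unambiguous bases are skipped.
    """
    global_gc = gc_fraction(seq)
    deviations = []
    for start in range(0, len(seq) - window + 1, window):
        local_gc = gc_fraction(seq[start:start + window])
        if not math.isnan(local_gc):
            deviations.append(abs(local_gc - global_gc))
    return sum(deviations) / len(deviations) if deviations else float("nan")


if __name__ == "__main__":
    # Toy sequences with the same global GC but different local structure.
    print(gc_heterogeneity("GGGGAAAA", 4))
    print(gc_heterogeneity("GAGAGAGA", 4))
```

The two toy sequences share a global GC fraction of 0.5, but only the first has windows that depart from it, which is the kind of heterogeneity the caption's prediction refers to.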

Figure 6 with 1 supplement
More exquisitely adapted species have higher intrinsic structural disorder (ISD) in both young (A and B) and old (C and D) protein domains, according to both the Codon Adaptation Index of Species (CAIS) (A, C), and the Effective Number of Codons (ENC) (B, D).

Age assignments are taken from James et al., 2021, with vertebrate protein domains that emerged prior to last eukaryotic common ancestor (LECA) classified as ‘old’, and vertebrate protein domains that emerged after the divergence of animals and fungi from plants as ‘young’. ‘Effects’ on ISD shown on the y-axis are fixed effects of species identity in our linear mixed model. The same n = 118 datapoints are shown as in Figures 3 and 4. Red line shows lm(y ~ x), with gray region as 95% confidence interval. Panels without Phylogenetic Independent Contrasts (PIC) correction are shown in Figure 6—figure supplement 1.

Figure 6—figure supplement 1
Without correction for phylogenetic confounding, more highly adapted species have higher intrinsic structural disorder (ISD) in both young (A and B) and old (C and D) protein domains, according to both the Codon Adaptation Index of Species (CAIS) (A, C), and the Effective Number of Codons (ENC) (B, D).

Age assignments and ISD effects are calculated as in Figure 6. The same n = 118 datapoints are shown as in Figures 3–5. Red line shows lm(y ~ x), with gray region as 95% confidence interval.

Author response image 1

Tables

Key resources table
Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information
Software, algorithm | IUPRED2 | DOI: https://doi.org/10.1093/nar/gky384 | RRID:SCR_014632 |
Software, algorithm | Codon Adaptation Index of Species | This paper | | See Materials and methods
Software, algorithm | Codon Adaptation Index | DOI: https://doi.org/10.1093/nar/15.3.1281 | |
Software, algorithm | ape | DOI: https://doi.org/10.1093/bioinformatics/bty633 | RRID:SCR_017343 | R package
Software, algorithm | Effective Number of Codons | DOI: https://doi.org/10.1093/oxfordjournals.molbev.a004201 | |


Catherine A Weibel, Andrew L Wheeler, Jennifer E James, Sara M Willis, Hanon McShea, Joanna Masel (2024)
The protein domains of vertebrate species in which selection is more effective have greater intrinsic structural disorder
eLife 12:RP87335.
https://doi.org/10.7554/eLife.87335.3