Universal and taxon-specific trends in protein sequences as a function of age

  1. Jennifer E James (corresponding author)
  2. Sara M Willis
  3. Paul G Nelson
  4. Catherine Weibel
  5. Luke J Kosinski
  6. Joanna Masel (corresponding author)
  1. Department of Ecology and Evolutionary Biology, University of Arizona, United States
  2. Department of Physics, University of Arizona, United States
  3. Department of Mathematics, University of Arizona, United States
  4. Department of Molecular and Cellular Biology, University of Arizona, United States
5 figures and 3 additional files

Figures

Figure 1 with 4 supplements
Young domains have high intrinsic structural disorder (ISD), but this trend is driven exclusively by recent animal domains.

Results are for non-transmembrane pfams. (A) The brown linear regression was calculated for recent animal pfams (slope = −0.062, R² = 0.0097, p = 6 × 10⁻⁷, n (number of pfams) = 2456), green for …

Figure 1—figure supplement 1
Phylogenetic tree of all species used in this analysis.

Lineages have been color coded as follows: black, protists; green, plants; blue, fungi; brown, animals. Labels are omitted for clarity; the full species list and phylogenetic tree are available at https://github…

Figure 1—figure supplement 2
Intrinsic structural disorder (ISD) depends on age for whole genes, as previously reported by Foy et al., 2019.

Each data point consists of the average across all instances of a homologous gene family (see Materials and methods) across all species (A) or just in mouse (B), dated according to the oldest pfam …

Figure 1—figure supplement 3
Young animal domains have high intrinsic structural disorder (ISD), while we have limited power to detect trends in young plant domains.

Linear regressions are calculated over recent animal (A, brown line) or recent plant domains (B, green line), and over ancient pfams (black lines), specific to occurrences in either animals (A) or …

Figure 1—figure supplement 4
Recalculating intrinsic structural disorder (ISD) after excising cysteine residues has very little effect on our results.

Details are as in Figure 1. Regression calculated for recent animal pfams (brown slope = −0.073, R² = 0.014, p = 3 × 10⁻⁹), recent plant pfams (green slope = 0.013, p = 0.7), and ancient pfams in …

Figure 2 with 1 supplement
Trends in amino acid frequencies as a function of age differ across lineages.

Results are shown for non-transmembrane pfams. Phylostratigraphy slopes are in units of the change in percentage points of an amino acid per billion years. ‘Ancient’ refers to pfams older than 2101 …
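To make the units of a phylostratigraphy slope concrete, the following is a minimal sketch, not the authors' pipeline. It assumes a plain ordinary least-squares fit of one amino acid's percentage composition against pfam age, with ages supplied in millions of years and converted to billions. The function name `phylostratigraphy_slope` and the toy data are hypothetical, and the published analysis may use a different model or weighting.

```python
def phylostratigraphy_slope(ages_my, percent_composition):
    """Ordinary least-squares slope of amino acid percentage against age.

    ages_my: pfam ages in millions of years (assumed input unit).
    percent_composition: percentage of one amino acid in each pfam.
    Returns the slope in percentage points per billion years.
    """
    n = len(ages_my)
    if n < 2 or n != len(percent_composition):
        raise ValueError("need at least two paired observations")

    # Convert ages from millions to billions of years.
    ages_by = [age / 1000.0 for age in ages_my]

    mean_x = sum(ages_by) / n
    mean_y = sum(percent_composition) / n

    # Sum of squares of x, and sum of cross-products of x and y.
    sxx = sum((x - mean_x) ** 2 for x in ages_by)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(ages_by, percent_composition))
    if sxx == 0:
        raise ValueError("all ages are identical; slope is undefined")

    return sxy / sxx


# Toy data (hypothetical): composition rises by one percentage point
# for every additional billion years of age.
slope = phylostratigraphy_slope([500, 1500, 2500], [5.0, 6.0, 7.0])
```

Under this convention, a positive slope means the amino acid is more frequent in older pfams. Whether that matches the sign convention of the figure is an assumption of the sketch.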

Figure 2—figure supplement 1
Phylostratigraphy slopes are not significantly correlated with hydrophobicity, as measured by 1-relative solvent accessibility (RSA) (Tien et al., 2013).

Phylostratigraphy slopes are in units of percentage points of composition per billion years. Lines indicate the standard errors on the slopes. ‘Ancient’ refers to pfams older than 2101 MY …

Figure 3 with 2 supplements
Ancient domains exhibit similar amino acid trends, whether transmembrane or non-transmembrane.

Phylostratigraphy slopes are in units of percentage point change in composition per billion years. Taxonomic and temporal subsets of the data are the same as in Figure 2. Lines indicate the …

Figure 3—figure supplement 1
Phylostratigraphy slopes are not significantly correlated (after correction for multiple testing) with relative amino acid changeability.

Changeability scores are relative to the least changeable amino acid, tryptophan (W), to which Tourasse and Li, 2000 assigned a value of 1 in both the transmembrane and the non-transmembrane cases. …

Figure 3—figure supplement 2
Phylostratigraphy slopes are not significantly correlated with the amino acid flux estimates of Jordan et al., 2005.

Phylostratigraphy slopes are in units of percentage points of composition per billion years. Lines indicate the standard errors on the slopes. ‘Ancient’ refers to pfams older than 2101 MY …

Figure 4 with 2 supplements
Ancient amino acid phylostratigraphy slopes reflect the order of recruitment of amino acids into the genetic code.

Phylostratigraphy slopes for non-transmembrane (A) and transmembrane (B) pfams are in units of percentage points of composition per billion years, with lines indicating the standard errors on the …

Figure 4—figure supplement 1
Order of amino acid recruitment does not affect domain composition in more recent lineages.

Phylostratigraphy slopes are in units of percentage points of composition per billion years. Lines indicate the standard errors on the slopes. ‘Animal’ (non-transmembrane and transmembrane Spearman’s …

Figure 4—figure supplement 2
Phylostratigraphy slopes are not significantly correlated (after correction for multiple testing) with the cost of production of amino acids (aerobic metabolic cost, as estimated in yeast [Raiford et al., 2008]).

Phylostratigraphy slopes are in units of percentage points of composition per billion years, with lines indicating the standard errors on the slopes. ‘Ancient’ refers to pfams older than 2101 MY …

Figure 5 with 2 supplements
Young domains have more clustered hydrophobic amino acids (A), and the trend in clustering with age is consistent across time and taxonomic groups (B).

n (number of pfams) = 8002, 3100, 2456, and 183 for all, ancient, animal, and plant groups, respectively. Clustering has an expected value of 1 for randomly distributed amino acids. Results are …

Figure 5—figure supplement 1
Hydrophobic clustering of complete genes depends on age, as previously reported by Foy et al., 2019.

Clustering has an expected value of 1 for randomly distributed amino acids. Each data point consists of the average across all instances of homologous pfams, across all species in which it occurs. …

Figure 5—figure supplement 2
Young animal domains have more clustered hydrophobic amino acids, continuing the trend among ancient domains that can be seen in both animals and plants.

There is limited power to see trends among young plant domains. Linear regressions are calculated over recent animal (A, brown line) or recent plant domains (B, green line), and over ancient pfams …

Additional files

Supplementary file 1

Pfam amino acid frequency phylostratigraphy slopes, calculated over different subsets of the data.

https://cdn.elifesciences.org/articles/57347/elife-57347-supp1-v2.xlsx

Supplementary file 2

The full set of species used in the analysis.

https://cdn.elifesciences.org/articles/57347/elife-57347-supp2-v2.xlsx

Transparent reporting form
https://cdn.elifesciences.org/articles/57347/elife-57347-transrepform-v2.docx
