Molecular function limits divergent protein evolution on planetary timescales

  1. Mariam M Konaté
  2. Germán Plata (corresponding author)
  3. Jimin Park
  4. Dinara R Usmanova
  5. Harris Wang
  6. Dennis Vitkup (corresponding author)
  1. Columbia University, United States
  2. Division of Cancer Treatment and Diagnosis, National Cancer Institute, United States
6 figures, 1 table and 5 additional files

Figures

Figure 1 with 7 supplements
Sequence divergence of enzyme orthologs as a function of time.

The global pairwise sequence identities between pairs of orthologous enzymes are shown as a function of divergence times between the corresponding species. Three models of amino acid substitution …

https://doi.org/10.7554/eLife.39705.002
Figure 1—figure supplement 1
Sequence divergence of enzyme orthologs as a function of time.

The global pairwise sequence identities between pairs of orthologous enzymes as a function of divergence times between the corresponding species. The colored lines indicate fits to the data using …

https://doi.org/10.7554/eLife.39705.003
Figure 1—figure supplement 1—source data 1

Sequence identity versus divergence times for 64 enzyme families.

https://doi.org/10.7554/eLife.39705.004
Figure 1—figure supplement 2
Projected long-term sequence identity of metabolic orthologs.

The distribution, across 64 EC numbers, of the projected global sequence identity after 7 billion years of sequence divergence according to Model 3 (Equation 3). The vertical dashed red line …

https://doi.org/10.7554/eLife.39705.005
Figure 1—figure supplement 3
Divergence of orthologs with experimentally validated functional annotations.

Sequence identity as a function of divergence time for orthologous enzymes annotated with the same EC number and experimentally validated molecular functions. The colored lines indicate fits to the …

https://doi.org/10.7554/eLife.39705.006
Figure 1—figure supplement 3—source data 1

Sequence identity versus divergence times for experimentally validated enzymes.

https://doi.org/10.7554/eLife.39705.007
Figure 1—figure supplement 4
Enzyme divergence rates at 30% sequence identity.

The distribution of sequence divergence rates between orthologous enzymes when their sequence identity reaches 30%. Divergence rates were defined as the decrease in percent sequence identity per …

https://doi.org/10.7554/eLife.39705.008
Figure 1—figure supplement 5
Sequence divergence of non-enzyme orthologs as a function of divergence time.

The global pairwise sequence identities between pairs of orthologous proteins that are not part of the EC nomenclature are shown as a function of divergence times between the corresponding species. …

https://doi.org/10.7554/eLife.39705.009
Figure 1—figure supplement 5—source data 1

Sequence identity versus divergence times for 29 non-enzyme families.

https://doi.org/10.7554/eLife.39705.010
Figure 1—figure supplement 6
Sequence divergence of mitochondrial ribosomal orthologs as a function of divergence time.

Global sequence identity as a function of divergence time for orthologous proteins with substantially different divergence rates between eukaryotic orthologs (red) and pairs of orthologs involving …

https://doi.org/10.7554/eLife.39705.011
Figure 1—figure supplement 6—source data 1

Sequence identity versus divergence times for mitochondrial ribosomal orthologs.

https://doi.org/10.7554/eLife.39705.012
Figure 1—figure supplement 7
Effect of uncertainty in the estimation of species divergence times on the model fits.

Effect of different estimates of species’ divergence times on the predicted long-term sequence identity between orthologous enzymes with the same molecular function. The data on the left shows the …

https://doi.org/10.7554/eLife.39705.013
Figure 2
The limit of long-term protein sequence divergence between orthologous proteins.

(a) The distribution of Y0 parameter values across 64 EC numbers for fits based on Model 2 (Equation 2). The Y0 parameter represents the minimum percentage of protein sites that remain identical at …

https://doi.org/10.7554/eLife.39705.014
Figure 3 with 1 supplement
Conservation of protein sites in phylogenetically independent lineages.

To identify the fractions of protein sites that are universally conserved (defined as sites that are identical in at least 90% of orthologs), we considered phylogenetically independent lineages. (a)…

https://doi.org/10.7554/eLife.39705.015
Figure 3—figure supplement 1
Distribution of enzyme sites according to their conservation frequency.

The panels show the distribution of protein sites according to their probability of being identical in phylogenetically independent lineages; the distributions are shown for 30 different enzyme …

https://doi.org/10.7554/eLife.39705.016
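
The 90% conservation threshold in the Figure 3 caption and the per-site identity probability in its supplement can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes, as one reading of the captions, that each phylogenetically independent lineage contributes one pair of orthologs, that all sequences are already placed in a common alignment, and that a gap never counts as an identity. The function names `identity_frequency` and `conserved_fraction` are hypothetical.

```python
from typing import List, Tuple

GAP = "-"  # assumption: gaps are written as '-' and never count as identical


def identity_frequency(pairs: List[Tuple[str, str]]) -> List[float]:
    """Per alignment column, the fraction of ortholog pairs with the same residue.

    Each element of `pairs` is assumed to be one pair of aligned orthologs
    drawn from one phylogenetically independent lineage.
    """
    if not pairs:
        raise ValueError("need at least one ortholog pair")
    length = len(pairs[0][0])
    for a, b in pairs:
        if len(a) != length or len(b) != length:
            raise ValueError("all sequences must share one alignment length")
    freqs = []
    for i in range(length):
        same = sum(1 for a, b in pairs if a[i] == b[i] and a[i] != GAP)
        freqs.append(same / len(pairs))
    return freqs


def conserved_fraction(pairs: List[Tuple[str, str]], threshold: float = 0.9) -> float:
    """Fraction of columns whose identity frequency reaches the threshold."""
    freqs = identity_frequency(pairs)
    return sum(1 for f in freqs if f >= threshold) / len(freqs)


if __name__ == "__main__":
    # Toy data only: two lineages, three aligned sites.
    toy_pairs = [("MKV", "MKL"), ("MKA", "MRA")]
    print(identity_frequency(toy_pairs))
    print(conserved_fraction(toy_pairs))
```

With real data, the list returned by `identity_frequency` corresponds to the kind of per-site distribution described for Figure 3—figure supplement 1, and `conserved_fraction` summarizes it at the chosen threshold.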
Figure 4 with 2 supplements
Sequence divergence of protein sites with different fitness effects of mutations measured in E. coli.

(a) The divergence of sequence identity for FolA protein sites with different average fitness effects of mutations (measured in E. coli) is shown using different colors. The average sequence …

https://doi.org/10.7554/eLife.39705.017
Figure 4—figure supplement 1
Distribution of average fitness effects of amino acid substitutions.

The panels show the distribution of average growth defects due to amino acid substitutions across FolA (a) and InfA (b) protein sites. The average growth (fitness) effect at each site was …

https://doi.org/10.7554/eLife.39705.018
Figure 4—figure supplement 2
Reproducibility of experimentally measured average fitness effects of amino acid substitutions across FolA sites.

The similarity between the average fitness effects of substitutions across FolA sites was calculated using two non-overlapping sets of substitutions (see Materials and methods). Each dot in the …

https://doi.org/10.7554/eLife.39705.019
Figure 5
Long-term structural evolution of enzymes with the same molecular function.

(a) The pairwise C-alpha root mean square deviation (RMSD) as a function of the divergence time between pairs of orthologs annotated with the same EC number. RMSD values were calculated based on …

https://doi.org/10.7554/eLife.39705.020
Figure 5—source data 1

RMSD versus divergence times for proteins with the same enzymatic function.

https://doi.org/10.7554/eLife.39705.021
Figure 6
Effect of functional specificity on long-term sequence and structural divergence between orthologs.

(a) Sequence identities between orthologous pairs of enzymes that diverged over two billion years ago. The long-term sequence divergence between pairs of orthologs sharing the same EC number (gray, …

https://doi.org/10.7554/eLife.39705.022
Figure 6—source data 1

Sequence identities between ancient orthologs sharing the same EC number and between those sharing only the first three digits of their EC numbers.

https://doi.org/10.7554/eLife.39705.023

Tables

Key resources table
| Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information |
|---|---|---|---|---|
| Strain, strain background (Escherichia coli EcNR2) | MG1655, bla, bio-, λ-Red+, mutS-::cmR | PMID: 19633652 | Addgene #26931 | |
| Sequence-based reagent | 90 bp DNA oligos with phosphorothioated bases | This paper | See Supplementary file 4 | 100 nmole DNA Plate oligo, Integrated DNA Technologies |
| Commercial assay or kit | MiSeq Reagent Kit V2 | Illumina | MS-102-2002 | |
| Commercial assay or kit | SYBR Green | ThermoFisher | S7567 | |
| Commercial assay or kit | Qubit HS DNA kit | ThermoFisher | Q32854 | |
| Commercial assay or kit | Q5 Hot Start High-Fidelity Mastermix | NEB | M0494S | |
| Commercial assay or kit | DNA clean and concentration kit 5 | Zymo Research | D4013 | |
| Commercial assay or kit | illustra bacteria genomicPrep Mini Spin kit | GE Life Sciences | 28904259 | |
| Commercial assay or kit | Agilent DNA 1000 kit | Agilent Genomics | 5067-1504 | |
| Software, algorithm | SeqPrep v1.1 | John St. John | https://github.com/jstjohn/SeqPrep | |
| Software, algorithm | Bowtie2 | PMID: 22388286 | | |
| Software, algorithm | Perl scripts to count mutant reads | This paper | https://github.com/platyias/count-MAGE-seq (copy archived at https://github.com/elifesciences-publications/count-MAGE-seq) | |
| Other | Turbidostat for growth competition assay | PMID: 23429717 | | |

Additional files

Supplementary file 1

Considered model species and pairwise average divergence times.

https://doi.org/10.7554/eLife.39705.024
Supplementary file 2

Fitted model parameters, statistical test results and non-enzymatic protein annotations.

Supplementary file 2A: Fitted model parameters and test results for the 64 considered activities (EC numbers).
Supplementary file 2B: Estimated rates of sequence divergence for pairs of orthologs according to Model 3 fits.
Supplementary file 2C: Fitted model parameters and test results for 29 sets of orthologs not annotated with EC numbers.
Supplementary file 2D: UniProt annotations of representative sequences from E. coli and H. sapiens for sets of orthologs not annotated with EC numbers.
Supplementary file 2E: UniProt annotations of representative sequences from E. coli and H. sapiens for sets of orthologs not annotated with EC numbers and with fast divergence rates in eukaryotes.

https://doi.org/10.7554/eLife.39705.025
Supplementary file 3

Average relative growth rate effect of amino acid substitutions in the FolA protein of E. coli.

https://doi.org/10.7554/eLife.39705.026
Supplementary file 4

DNA oligomers used to introduce amino acid substitutions along the FolA protein.

https://doi.org/10.7554/eLife.39705.027
Transparent reporting form
https://doi.org/10.7554/eLife.39705.028
