The global pairwise sequence identities between pairs of orthologous enzymes are shown as a function of divergence times between the corresponding species. Three models of amino acid substitution …
The global pairwise sequence identities between pairs of orthologous enzymes are shown as a function of divergence times between the corresponding species. The colored lines indicate fits to the data using …
Sequence identity versus divergence times for 64 enzyme families.
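As a rough illustration of the y-axis quantity, the sketch below computes percent identity from two sequences that have already been globally aligned, with gaps written as `-`. The function name `global_identity` and both normalization options are assumptions for illustration. The captions do not state which denominator was used, and identity values can shift noticeably depending on whether gap-free columns or the shorter sequence length is used.

```python
def global_identity(aligned_a: str, aligned_b: str, normalize: str = "aligned") -> float:
    """Percent identity between two equal-length, gapped, globally aligned sequences.

    Hypothetical helper. normalize="aligned" divides by the number of columns
    in which neither sequence has a gap; normalize="shorter" divides by the
    ungapped length of the shorter sequence.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences contain a residue.
    gap_free = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    identical = sum(1 for x, y in gap_free if x == y)
    if normalize == "aligned":
        denominator = len(gap_free)
    elif normalize == "shorter":
        denominator = min(len(aligned_a.replace("-", "")), len(aligned_b.replace("-", "")))
    else:
        raise ValueError("normalize must be 'aligned' or 'shorter'")
    return 100.0 * identical / denominator if denominator else 0.0


# Toy pair with gaps in both sequences: the two normalizations disagree.
print(global_identity("MKV-LAG-", "MRVEL-GW"))
print(global_identity("MKV-LAG-", "MRVEL-GW", normalize="shorter"))
```

In the toy pair, 4 of the 5 gap-free columns are identical, while the shorter ungapped sequence has 6 residues, so the same alignment yields two different identity values.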
The distribution, across 64 EC numbers, of the projected global sequence identity after 7 billion years of sequence divergence according to Model 3 (Equation 3). The vertical dashed red line …
Sequence identity as a function of divergence time for orthologous enzymes annotated with the same EC number and experimentally validated molecular functions. The colored lines indicate fits to the …
Sequence identity versus divergence times for experimentally validated enzymes.
The distribution of sequence divergence rates between orthologous enzymes when their sequence identity reaches 30%. Divergence rates were defined as the decrease in percent sequence identity per …
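The caption defines a divergence rate as the decrease in percent sequence identity per unit time at the point where identity reaches 30%. The sketch below shows that calculation for a hypothetical curve that decays exponentially from 100% toward a floor `y0`, Y(t) = y0 + (100 - y0) * exp(-r * t). This functional form, the parameter names `r` and `y0`, and the example values are assumptions for illustration; they are not the paper's fitted Model 2 or Model 3. Under this form the rate at identity Y is r * (Y - y0), so no explicit time is needed.

```python
import math


def identity(t: float, r: float, y0: float) -> float:
    """Hypothetical model: percent identity decays from 100% toward a floor y0."""
    return y0 + (100.0 - y0) * math.exp(-r * t)


def _check(target: float, y0: float) -> None:
    # The curve only passes through identities strictly between y0 and 100%.
    if not y0 < target < 100.0:
        raise ValueError("target identity must lie strictly between y0 and 100%")


def time_to_identity(target: float, r: float, y0: float) -> float:
    """Divergence time at which the hypothetical curve reaches `target` percent identity."""
    _check(target, y0)
    return math.log((100.0 - y0) / (target - y0)) / r


def divergence_rate_at(target: float, r: float, y0: float) -> float:
    """Decrease in percent identity per unit time when identity equals `target`.

    For this hypothetical model, -dY/dt = r * (Y - y0).
    """
    _check(target, y0)
    return r * (target - y0)


# Illustrative values only: r = 0.5 per billion years, floor y0 = 20%.
t30 = time_to_identity(30.0, r=0.5, y0=20.0)
print(t30, divergence_rate_at(30.0, r=0.5, y0=20.0))
```

A central finite difference of `identity` around `t30` should reproduce the same rate, which is a quick sanity check for any replacement model.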
The global pairwise sequence identities between pairs of orthologous proteins that are not part of the EC nomenclature are shown as a function of divergence times between the corresponding species. …
Sequence identity versus divergence times for 29 non-enzyme families.
Global sequence identity as a function of divergence time for orthologous proteins with substantially different divergence rates between eukaryotic orthologs (red) and pairs of orthologs involving …
Sequence identity versus divergence times for mitochondrial ribosomal orthologs.
Effect of different estimates of species’ divergence times on the predicted long-term sequence identity between orthologous enzymes with the same molecular function. The data on the left shows the …
(a) The distribution of Y0 parameter values across 64 EC numbers for fits based on Model 2 (Equation 2). The Y0 parameter represents the minimum percentage of protein sites that remain identical at …
To identify the fractions of protein sites that are universally conserved (defined as sites that are identical in at least 90% of orthologs), we considered phylogenetically independent lineages. (a)…
The panels show the distribution of protein sites according to their probability of being identical in phylogenetically independent lineages; the distributions are shown for 30 different enzyme …
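Using the definition stated for universally conserved sites (identical in at least 90% of orthologs), the per-site fraction of identical orthologs also gives a simple estimate of the probability of a site being identical across independent lineages. The sketch assumes one aligned reference sequence (for example, the E. coli ortholog) and a list of orthologs from phylogenetically independent lineages, all aligned to the same columns. The choice of reference, the gap handling, and the function names are assumptions for illustration.

```python
def site_identity_fractions(reference: str, orthologs: list[str]) -> list[float]:
    """Fraction of orthologs identical to the reference at each non-gap reference site."""
    if not orthologs or any(len(seq) != len(reference) for seq in orthologs):
        raise ValueError("need at least one ortholog, aligned to the reference length")
    fractions = []
    for i, ref_residue in enumerate(reference):
        if ref_residue == "-":
            continue  # skip columns that are gaps in the reference
        # A gap in an ortholog counts as non-identical.
        same = sum(1 for seq in orthologs if seq[i] == ref_residue)
        fractions.append(same / len(orthologs))
    return fractions


def fraction_universally_conserved(reference: str, orthologs: list[str], threshold: float = 0.9) -> float:
    """Fraction of reference sites identical in at least `threshold` of orthologs."""
    fractions = site_identity_fractions(reference, orthologs)
    return sum(f >= threshold for f in fractions) / len(fractions)


# Toy example: five aligned orthologs from hypothetical independent lineages.
toy_orthologs = ["ACDE", "ACDF", "ACEE", "ACDE", "GCDE"]
print(site_identity_fractions("ACDE", toy_orthologs))
print(fraction_universally_conserved("ACDE", toy_orthologs))
```

With only five orthologs, a 90% threshold effectively requires identity in all five, which is why small lineage sets make the conserved fraction sensitive to a single divergent sequence.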
(a) The divergence of sequence identity for FolA protein sites with different average fitness effects of mutations (measured in E. coli) is shown using different colors. The average sequence …
The panels show the distribution of average growth defects due to amino acid substitutions across FolA (a) and InfA (b) protein sites. The average growth (fitness) effect at each site was …
The similarity between the average fitness effects of substitutions across FolA sites was calculated using two non-overlapping sets of substitutions (see Materials and methods). Each dot in the …
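The reproducibility check described in this caption can be sketched as a split-half comparison: average each site's substitution effects within two disjoint subsets, then correlate those averages across sites. The actual partition is described in the paper's Materials and methods; the even/odd split, the use of a Pearson coefficient, and the toy growth-effect values below are assumptions for illustration.

```python
import math
from statistics import mean


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def split_half_site_means(effects_by_site: list[list[float]]) -> tuple[list[float], list[float]]:
    """Per-site mean effects from two non-overlapping substitution sets.

    Hypothetical split: even-indexed versus odd-indexed substitutions at each site.
    """
    first, second = [], []
    for effects in effects_by_site:
        set_a, set_b = effects[0::2], effects[1::2]
        if not set_a or not set_b:
            continue  # a site needs at least one substitution in each set
        first.append(mean(set_a))
        second.append(mean(set_b))
    return first, second


# Toy relative growth effects (hypothetical) for substitutions at three sites.
toy_effects = [
    [-0.9, -0.8, -0.85, -0.95],  # strongly deleterious site
    [-0.1, 0.0, -0.05, -0.02],   # nearly neutral site
    [-0.5, -0.4, -0.45, -0.6],   # intermediate site
]
half_a, half_b = split_half_site_means(toy_effects)
print(pearson(half_a, half_b))
```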
(a) The pairwise C-alpha root mean square deviation (RMSD) as a function of the divergence time between pairs of orthologs annotated with the same EC number. RMSD values were calculated based on …
RMSD versus divergence times for proteins with the same enzymatic function.
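For reference, the C-alpha RMSD between two orthologs is the square root of the mean squared distance between matched C-alpha atoms. The sketch assumes that the residue correspondence comes from a sequence or structure alignment and that the coordinates have already been optimally superposed (for example, with the Kabsch algorithm); it does not perform the superposition itself, and the function name is hypothetical.

```python
import math

Coord = tuple[float, float, float]


def ca_rmsd(coords_a: list[Coord], coords_b: list[Coord]) -> float:
    """RMSD over matched, already superposed C-alpha coordinates (hypothetical helper)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need the same, non-zero number of matched C-alpha atoms")
    # Sum of squared Euclidean distances between paired atoms.
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy coordinates in angstroms, assumed to be superposed already.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
b = [(0.0, 1.0, 0.0), (1.0, -1.0, 0.0), (2.0, 0.0, 0.0)]
print(ca_rmsd(a, b))
```

Because the value depends on the superposition, RMSDs computed on unsuperposed coordinates overstate structural divergence.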
(a) Sequence identities between orthologous pairs of enzymes that diverged over two billion years ago. The long-term sequence divergence between pairs of orthologs sharing the same EC number (gray, …
Sequence identities between ancient orthologs sharing the same EC number and only the first three digits of their EC numbers.
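EC numbers have four dot-separated levels, so the two categories compared here (identical EC numbers versus EC numbers that share only their first three digits) can be assigned with a small helper. The function below is a hypothetical sketch; rejecting incomplete or preliminary EC numbers such as `1.1.1.-` is an assumption about how such entries would be handled.

```python
def ec_relationship(ec_a: str, ec_b: str) -> str:
    """Classify a pair of EC numbers (hypothetical helper)."""
    parts_a, parts_b = ec_a.split("."), ec_b.split(".")
    for parts, ec in ((parts_a, ec_a), (parts_b, ec_b)):
        # Require four fully specified numeric levels, e.g. "2.7.1.1".
        if len(parts) != 4 or not all(p.isdigit() for p in parts):
            raise ValueError(f"incomplete or malformed EC number: {ec}")
    if parts_a == parts_b:
        return "same EC number"
    if parts_a[:3] == parts_b[:3]:
        return "same first three digits"
    return "different"


print(ec_relationship("1.1.1.1", "1.1.1.2"))
```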
Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information |
---|---|---|---|---|
Strain, strain background (Escherichia coli EcNR2) | MG1655, bla, bio-, λ-Red+, mutS-::cmR | PMID: 19633652 | Addgene #26931 | |
Sequence-based reagent | 90 bp DNA oligos with phosphorothioated bases | This paper | See Supplementary file 4 | 100 nmole DNA Plate oligo, Integrated DNA Technologies |
Commercial assay or kit | MiSeq Reagent Kit v2 | Illumina | MS-102-2002 | |
Commercial assay or kit | SYBR Green | ThermoFisher | S7567 | |
Commercial assay or kit | Qubit dsDNA HS Assay Kit | ThermoFisher | Q32854 | |
Commercial assay or kit | Q5 Hot Start High-Fidelity Master Mix | NEB | M0494S | |
Commercial assay or kit | DNA Clean & Concentrator-5 | Zymo Research | D4013 | |
Commercial assay or kit | illustra bacteria genomicPrep Mini Spin kit | GE Life Sciences | 28904259 | |
Commercial assay or kit | Agilent DNA 1000 kit | Agilent Genomics | 5067-1504 | |
Software, algorithm | SeqPrep v1.1 | John St. John | https://github.com/jstjohn/SeqPrep | |
Software, algorithm | Bowtie2 | PMID: 22388286 | ||
Software, algorithm | Perl scripts to count mutant reads | This paper | https://github.com/platyias/count-MAGE-seq (copy archived at https://github.com/elifesciences-publications/count-MAGE-seq). | |
Other | Turbidostat for growth competition assay | PMID: 23429717 | | |
Considered model species and pairwise average divergence times.
Fitted model parameters, statistical test results and non-enzymatic protein annotations.
Supplementary file 2A: Fitted model parameters and test results for the 64 considered activities (EC numbers).
Supplementary file 2B: Estimated rates of sequence divergence for pairs of orthologs according to Model 3 fits.
Supplementary file 2C: Fitted model parameters and test results for 29 sets of orthologs not annotated with EC numbers.
Supplementary file 2D: UniProt annotations of representative sequences from E. coli and H. sapiens for sets of orthologs not annotated with EC numbers.
Supplementary file 2E: UniProt annotations of representative sequences from E. coli and H. sapiens for sets of orthologs not annotated with EC numbers and with fast divergence rates in eukaryotes.
Average relative growth rate effect of amino acid substitutions in the FolA protein of E. coli.
DNA oligomers used to introduce amino acid substitutions along the FolA protein.