Coenzyme-Protein Interactions since Early Life

Alma Carolina Sanchez-Rocha; Mikhail Makarov; Lukáš Pravda; Marian Novotný; Klára Hlouchová

doi:10.7554/eLife.94174.1

eLife assessment

This study presents a useful examination of the prevalence of interactions between amino acids from different periods of Earth's history and coenzymes. While the premise of this work is well founded, the data lend themselves to alternative interpretations, suggesting that the main conclusions might be incompletely supported by the findings. The work would benefit from the inclusion of additional supplementary data and further analysis. This manuscript would be of interest to evolutionary biologists and biophysicists.

https://doi.org/10.7554/eLife.94174.1.sa2

Significance of findings

useful: Findings that have focused importance and scope

Strength of evidence

incomplete: Main claims are only partially supported

During the peer-review process the editor and reviewers write an eLife assessment that summarises the significance of the findings reported in the article (on a scale ranging from landmark to useful) and the strength of the evidence (on a scale ranging from exceptional to inadequate).

Abstract

Recent findings in protein evolution and peptide prebiotic plausibility have been setting the stage for reconsidering the role of peptides in the early stages of life’s origin. Ancient protein families have been found to share common themes and proteins reduced in composition to prebiotically plausible amino acids have been reported capable of structure formation and key functions, such as binding to RNA. While this may suggest peptide relevance in early life, their functional repertoire when composed of a limited number of early residues (missing some of the most sophisticated functional groups of today’s alphabet) has been debated.

Cofactors enrich the functional scope of about half of extant enzymes but whether they could also bind to peptides lacking the evolutionarily late amino acids remains speculative. The aim of this study was to resolve the early peptide propensity to bind organic cofactors by analysis of protein-coenzyme interactions across the Protein Data Bank (PDB). We find that the prebiotically plausible amino acids are more abundant in the binding sites of the most ancient coenzymes and that such interactions rely more frequently on the involvement of the protein backbone atoms and metal ion cofactors. Moreover, we have identified a few select examples in today’s enzymes where coenzyme binding is supported solely by prebiotically available amino acids. These results imply the plausibility of a coenzyme-peptide functional collaboration preceding the establishment of the Central Dogma and full protein alphabet evolution.

Introduction

Organic and inorganic cofactors occupy about half of all known protein structures, spanning all the enzyme E.C. classes (Putignano, 2018; Mukhopadhyay et al., 2019). While their role in current life is indisputable, some of the cofactors were present and apparently crucial also during life’s early evolution (Chu and Zhang, 2020; Goldman and Kacar, 2021; Kirschning et al., 2021; Fried et al., 2022; Kirschning, 2022). The significance of metal ions has been broadly discussed, regardless of the different origins-of-life scenarios, and has somewhat overshadowed that of organic cofactors (e.g. Wächtershäuser, 1992; Russell and Hall, 1997; Lane and Martin, 2012; Chu and Zhang, 2020; Fried et al., 2022).

Diverse lines of evidence have however indicated that many of the extant organic cofactors (coenzymes) date back to the earliest life while their core chemistries have been detected in abiotic material, as recently reported by the Hayabusa2 mission (Holliday et al., 2007; Fried et al., 2022; Naraoka et al., 2023). At the same time, these ancient coenzymes, often of nucleotide origin, have been traced to the most ancient protein folds (such as P-loop NTPases, TIM beta/alpha-barrels, OB and Rossmann folds) that date before the Last Universal Common Ancestor (LUCA) (Goldman and Kacar, 2021; Caetano-Anollés et al., 2007; Goldman et al., 2013; Longo et al., 2020 (a); Kessel and Ben-Tal, 2022). Within the most ancient folds, tens of peptide fragments/themes have been identified throughout seemingly unrelated structural domains, and frequently found to mediate ligand binding (Söding and Lupas, 2003; Alva et al., 2015; Narunsky et al., 2020; Kolodny et al., 2021). Such themes may well represent the remnants of protoenzymes in a peptide-nucleotide world (Fried et al., 2022). Chu and Zhang recently proposed that cofactors could initially “select” the earliest primitive proteins from the vast sequence space by the ability to bind them (Chu and Zhang, 2020). More generally, binding of cofactors to peptides could thus determine the evolution of both protoenzyme function and folding preferences (Tokuriki and Tawfik, 2009).

Prior to the fixation of the Central Dogma and ribosomal synthesis, peptides would condense from amino acids (or their alternatives) prebiotically abundant in the environment (Frenkel-Pinter et al., 2019; Frenkel-Pinter et al., 2020; Fried et al., 2022). Independent meta-analyses of the amino acid alphabet evolution based on different possible sources of organic material and different disciplines point towards an “early alphabet” of ∼10 residues (Ala, Asp, Glu, Gly, Ile, Leu, Pro, Ser, Thr and Val) (Higgs and Pudritz, 2009; Trifonov, 2000; Cleaves, 2010). These could be supplemented by other prebiotically plausible non-canonical amino acids while the other half of the canonical alphabet is assumed to be the product of later biosynthesis (Wong and Bronskill, 1979; Weber and Miller, 1981; Burton et al., 2012; Zaia et al., 2008). Typically, the early amino acids (canonical as well as non-canonical) are smaller and less complex, missing e.g. sulfur groups and aromatics. Additionally, the canonical early alphabet lacks positively charged residues. An emerging question therefore is whether coenzymes could bind to small proteins of prebiotic relevance and whether they could be bound by the prebiotically available residues. In such a scenario, cofactors would provide a palette of functional groups to the early peptide world, which would make them relatively sophisticated structural and catalytic hubs (Milner-White and Russell, 2011). Alternatively, if coenzymes could not be bound by these simple amino acids, this would suggest that their pairing with peptide molecules would become relevant only after the evolution of the full amino acid alphabet.

Work from our group and others has recently demonstrated that in select cases, protein sequences re-engineered from the early amino acids can still bind to nucleic acid and nucleotide-based cofactors (Longo et al., 2020 (b); Makarov et al., 2021; Giacobelli et al., 2022). Whether this phenomenon is still seen in today’s biology, and how abundant it is and what principles govern it, remain open questions. Here, we present a systematic survey of coenzyme binding throughout the PDB. The outcomes of our study support that the coenzyme binding characteristics of amino acids differ by their evolutionary age. Early amino acids are enriched in binding pockets of the most ancient coenzymes and the interaction relies predominantly on the protein backbone groups. Selected examples show that, unlike evolutionarily younger cofactors, the ancient cofactors can still be bound in proteins by early amino acids alone. Our analysis therefore points to an early peptide-coenzyme significance, preceding evolution of proteosynthesis and fixation of the Central Dogma.

Results

Identification of coenzymes in PDB

We identified all the available structures from the PDB that interact with the 27 coenzyme classes as defined in Fischer et al., 2010. In addition, ATP (that was not included in that study) was included here. Using these parameters, we found 25,822 protein structures and 81 nucleic acid macromolecules (Supplementary Table 1, Supplementary File 1). The protein structures were assigned to 8194 UniProt (The UniProt Consortium, 2023) codes. Those UniProt sequences were clustered by 90% identity and resulted in 7399 unique UniProt entries, corresponding to 21,317 protein structures (Fig. 1). In parallel, the clustering was also performed for 30% sequence identity, resulting in 3544 UniProt codes and 9645 PDB structures.

Workflow of this study. All available coenzymes in the PDB were identified according to the CoFactor database (Fischer et al., 2010). The PDB entries of structures bound to coenzymes were downloaded programmatically through the PDBe REST API (pdbe.org/api), including the interatomic cofactor-protein interactions, calculated by Arpeggio (Jubb et al., 2017). The coenzyme-binding amino acids were mapped to UniProt sequences via SIFTS (Velankar et al., 2013; Dana et al., 2019). PDB entries were grouped by UniProt code; redundancy was removed by clustering the UniProt sequences by 90% (and in parallel also 30%) sequence identity.

The interaction ratio method was adopted to identify the most relevant residues in coenzyme binding sites. For each protein (unique UniProt ID) we defined the cofactor binding site as a subset of amino acids that appeared to interact with the cofactor in at least 50% of the structures of that particular protein within our dataset to pinpoint the amino acids that are important for the interaction. This methodology does not consider any qualitative criteria (e.g. resolution, R-factor, Clashscore).
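The interaction ratio criterion described above can be illustrated with a minimal sketch. It assumes that, for one UniProt entry, the contacts of each PDB structure have already been reduced to a set of UniProt residue positions; the function name `binding_site`, the `min_ratio` parameter and the example data are hypothetical and are not taken from the original pipeline.

```python
from collections import Counter


def binding_site(structures, min_ratio=0.5):
    """Return residue positions that contact the cofactor in at least
    `min_ratio` of the structures available for one protein.

    `structures` holds one set of UniProt residue positions per PDB
    structure (an assumed input format, for illustration only).
    """
    if not structures:
        return set()
    # Count in how many structures each position is seen in contact.
    counts = Counter(pos for contacts in structures for pos in set(contacts))
    n = len(structures)
    return {pos for pos, seen in counts.items() if seen / n >= min_ratio}


# Hypothetical contacts for one UniProt entry seen in four PDB structures.
example = [{10, 11, 45}, {10, 45, 80}, {10, 11}, {10, 45}]
print(sorted(binding_site(example)))  # -> [10, 11, 45]
```

Position 80 is dropped because it is observed in only one of the four structures (25%), whereas position 11 is kept because it reaches exactly 50%.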

Our database is composed of protein structures from all three cellular domains as well as viruses: Bacteria (54.3%), Archaea (6.2%), Eukaryota (37.8%), Viruses (1.5%), and metagenomes and unassigned entries (0.3%) (Supplementary Table 1).

Evolutionary classification of coenzymes

To differentiate the evolutionary age of the analyzed coenzymes, we further adapted the classification system from Fried et al., 2022. This system encompasses four primary categories and one additional subcategory: i) “Ancient” coenzymes, including the subcategory “Nucleotide derived”; ii) “LUCA” coenzymes; iii) “Post-LUCA” coenzymes; and iv) “Unclassified” coenzymes (Fig.2).

Classification of coenzymes and amino acids by their assumed evolutionary temporality. The “Unclassified” coenzymes Thiamine diphosphate, Coenzyme M, Factor F430 and Glutathione are not shown in the scheme.

“Ancient” coenzymes comprise those that could be prebiotically synthesized, according to available studies (Miller and Schlesinger, 1993; Keefe et al., 1995; Holliday et al., 2007; Kirschning, 2021; Menor-Salván et al., 2022; Pinna et al., 2022), while the subcategory “Nucleotide derived” includes cofactors chemically derived from nucleotides (White, 1975; Monteverde et al., 2017). “LUCA” coenzymes were presumably present in the last universal common ancestor (LUCA) and exhibit a universal distribution among Bacteria, Archaea, and Eukarya, although their prebiotically feasible synthesis has not been established. “Post-LUCA” coenzymes likely originated only after the divergence of the three cellular domains, as mirrored in their non-universal distribution. “Unclassified” coenzymes do not conform to the classification scheme. As a typical representative of this last category, Coenzyme M has been synthesized under prebiotic conditions (Miller and Schlesinger, 1993; Kirschning, 2021); nonetheless, its biosynthetic pathways in Archaea and Bacteria have been shown to arise through convergent evolution and it is mainly prevalent in methanogens (Wu et al., 2022). Factor F430 is a coenzyme only distributed in methanogens (Thauer and Bonacker, 1994), although its precursors have been synthesized prebiotically (Seitz et al., 2021). Glutathione is another example of a coenzyme with restricted biological distribution, being found mainly in eukaryotes, Gram-negative bacteria, and one archaeal phylum (Copley and Dhillon, 2002), and the feasibility of its prebiotic synthesis remains unclear (Bonfio et al., 2017). Thiamine diphosphate was also designated as unclassified. Although the definitive prebiotic synthesis of thiamine diphosphate remains unclear, preliminary investigations conducted by Aylward (2006) and Aylward & Bofinger (2006) suggest its presence in the prebiotic world. Its nucleotide nature (White, 1975) and the existence of its universal riboswitch (Barrick and Breaker, 2007) provide compelling evidence of its potential status as an ancient coenzyme.

The Ancient coenzymes represent the most abundant class of our PDB dataset, dominated by ATP, NAD, Heme, FAD, SAM, and Coenzyme A structures and amounting to 94% of all analyzed structures in our database grouped by UniProt codes. Within the enzyme E.C. classification, oxidoreductases and transferases represent the classes with most abundant coenzyme content. While the LUCA, Post-LUCA and Unclassified coenzymes are typically found in specific enzyme classes, the Ancient coenzymes are distributed across all the E.C. classes (Supplementary Fig. 1).

Distribution of amino acids in the coenzyme binding sites

We hypothesized that the evolutionary significance of individual coenzyme classes would be reflected in distinct amino acid binding propensities as a smaller “early” protein alphabet apparently preceded its canonical version. The abundance of residues that compose each coenzyme binding site was analyzed and examined with respect to the order by which individual amino acids have been reported to enter the protein canonical alphabet (Higgs and Pudritz, 2009) (Fig. 3; Supplementary Table 2). The binding site composition for both 90% and 30% identity datasets revealed that the occupancy of early amino acids is higher in ancient coenzyme binding sites and tends to decrease in LUCA and Post-LUCA cofactor sites (Fig. 3, Supplementary Table 2). Overall, for the 90% dataset the average occupancy of early vs. late amino acids in the ancient sites is 61% vs 39% while this ratio decreases to 53% vs 47% for the LUCA and 47% vs 53% in post-LUCA sites. These numbers follow the same trend for the 30% identity dataset. Throughout the rest of the analysis, the 90% identity dataset, which includes a higher number of proteins, was evaluated for more robust statistical analysis. To examine the impact that the distribution of amino acids in the coenzyme binding sites has on the binding modes, the interactions between coenzymes and individual amino acids were further inspected.

Early versus late amino acid composition of the coenzyme binding sites, categorized according to the evolutionary ages of coenzymes. Early amino acids are shown in blue and late residues in red. The dashed line corresponds to the proportion of early vs. late amino acids within the UniProt composition of the sequences derived from our database (67% early and 33% late residues). The statistical significance of the early versus late amino acid composition was assessed by a Chi-squared test (P < 0.0001). Detailed statistical data are listed in Supplementary Table 6.
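The early versus late comparison can be sketched in a few lines. The early alphabet is the set of ten residues listed in the Introduction, and the background proportion (67% early) is the UniProt value quoted in the caption above. How exactly the Chi-squared test was set up is not stated here, so the goodness-of-fit form against the UniProt background used below is an assumption, as are the function names.

```python
import math

# Early alphabet as listed in the Introduction (three-letter codes).
EARLY = {"ALA", "ASP", "GLU", "GLY", "ILE", "LEU", "PRO", "SER", "THR", "VAL"}


def early_late_counts(residues):
    """Count early and late residues in a list of three-letter codes."""
    early = sum(1 for r in residues if r.upper() in EARLY)
    return early, len(residues) - early


def chi2_vs_background(early, late, background_early=0.67):
    """Chi-squared goodness-of-fit (1 degree of freedom) of observed
    early/late counts against a background early fraction.

    Assumption: the background is the UniProt composition (67% early).
    """
    n = early + late
    if n == 0:
        raise ValueError("no residues to test")
    exp_early = n * background_early
    exp_late = n * (1.0 - background_early)
    chi2 = (early - exp_early) ** 2 / exp_early + (late - exp_late) ** 2 / exp_late
    # Survival function of the chi-squared distribution with 1 d.o.f.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical binding site with 61 early and 39 late residues.
chi2, p_value = chi2_vs_background(61, 39)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

With only 100 residues, a 61% vs 39% split is not distinguishable from the 67% background; the very small P values reported in the caption reflect the much larger number of binding-site residues in the full dataset.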

Interaction types between coenzymes and proteins

First, backbone vs side chain protein interactions of all the coenzyme classes were mapped (Fig. 4A). As expected, most of the interactions with coenzymes are mediated by amino acid side chains (61%), frequently in combination with the backbone (24%) (Fig. 4). Nevertheless, purely backbone interactions are most frequent in ancient coenzymes (24%) (Fig. 4A). When backbone interactions are present throughout the different coenzyme classes, they are dominated by the early amino acids (Fig. 4B).

Binding of coenzymes with early and late amino acids by backbone and side chain atoms. “Backbone” interactions refer to residues in the coenzyme binding sites that interact purely through amino acid backbone atoms. “Side chain” interactions involve residues that interact solely via side chain atoms. “Backbone & Side chain” residues are those that interact with the coenzyme using both their backbone and side chain atoms. (A) Abundance of amino acids in individual studied coenzymes. “Backbone & Side chain” interactions are not depicted. Unclassified cofactors are in gray, Post-LUCA in yellow, LUCA in cyan and Ancient in purple. Amino acids are ranked by the order of addition of amino acids to the genetic code (Higgs and Pudritz, 2009). (B) Proportion of early versus late residues in coenzyme categories by interaction type. In each coenzyme category, the individual proportions add up to 100%. The amino acid composition was normalized by the percentage of late residues from the UniProt sequences retrieved from our database. The statistical significance of early versus late amino acid composition for each interaction type per coenzyme temporality was determined by a Chi-squared test (*, P < 0.05; **, P < 0.01; ***, P < 0.001; ****, P < 0.0001). For detailed statistical analysis, refer to Supplementary Table 7.
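The three interaction categories defined in the caption can be expressed as a simple rule on the atoms of a residue that contact the coenzyme. The sketch below assumes standard PDB atom names, with N, CA, C, O (and the terminal OXT) treated as backbone; this atom set and the function name `interaction_class` are illustrative assumptions rather than the definition used in the analysis.

```python
# Assumed backbone atom names (standard PDB nomenclature).
BACKBONE_ATOMS = {"N", "CA", "C", "O", "OXT"}


def interaction_class(atom_names):
    """Classify one residue by the atoms it uses to contact the coenzyme."""
    names = {a.strip().upper() for a in atom_names}
    if not names:
        raise ValueError("residue has no contacting atoms")
    has_backbone = bool(names & BACKBONE_ATOMS)
    has_side_chain = bool(names - BACKBONE_ATOMS)
    if has_backbone and has_side_chain:
        return "Backbone & Side chain"
    return "Backbone" if has_backbone else "Side chain"


# Hypothetical contacts: a glycine amide, a serine hydroxyl, and a serine using both.
print(interaction_class(["N"]))        # -> Backbone
print(interaction_class(["OG"]))       # -> Side chain
print(interaction_class(["N", "OG"]))  # -> Backbone & Side chain
```

Under this rule glycine can only ever fall into the “Backbone” category, which is one reason why backbone-only contacts are expected to be biased towards small early residues.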

Next, we inspected the interaction types for each amino acid-coenzyme binding event employing Arpeggio (Jubb et al., 2017). The analysis revealed that electrostatic interactions are dominant in all coenzyme ages (Supplementary Fig. 2). In ancient cofactors, electrostatic interactions are more frequently mediated by early residues. This trend is more significant for the nucleotide-derived ancient coenzymes. The second most prevalent interaction type is van der Waals for ancient cofactors, while hydrophobic interactions are about as frequent as van der Waals interactions in the LUCA and post-LUCA classes.

Structural properties of cofactor binding sites

Structural properties of proteins have been observed to change during eons of life’s evolution (Edwards et al., 2013; Lupas and Alva, 2017; Kovacs et al., 2017). To map their interdependence with binding of cofactors, the secondary structure and fold classes of individual coenzyme binding sites were analyzed here.

There are detectable differences in the binding site secondary structure content among the coenzyme classes (Fig. 5, Supplementary Fig. 3). While loops and helices dominate all the binding sites, they are less represented in the Ancient and LUCA coenzyme sites, which are richer in beta-sheet structures (Fig. 5). This distinction is found only on the level of the binding sites and is not preserved on the level of the overall protein structure.

Secondary structure content in coenzyme binding sites. Composition of secondary structural elements in amino acids interacting with coenzymes. The PDB category represents secondary structure content across the dataset for comparison with coenzyme binding sites. Additional statistical analyses are shown in Supplementary Table 8.

To explore the fold diversity of domains containing the coenzyme binding sites, we assigned their ECOD X-groups (Cheng et al., 2014) at a residue level (Supplementary Table 3). In total, 101 groups were identified. The ancient coenzymes are associated with higher numbers of different X-groups than the LUCA and post-LUCA cofactors (Fig. 6). Some coenzymes stand out by their large number of associated folds: the ancient ATP (74); Coenzyme A (34); NAD (30); Heme (27) and the unclassified cofactor Glutathione (25).

Fold diversity of coenzyme binding sites. (A) Folds represented by ECOD X-groups, according to numbers of coenzyme binding sites. (B) Comparison of numbers of ECOD X-groups vs. UniProt entries per cofactor class.

The most frequently observed X-groups include Rossmann-like, Alpha-beta plaits, TIM beta/alpha-barrel, Flavodoxin-like, cradle loop barrel, HUP domain-like and beta-Grasp (Fig. 6). Among these, Rossmann-like, TIM beta/alpha-barrel and Flavodoxin-like bind to coenzymes of all ages.

Coenzyme early vs late binding sites

To further explore whether extant proteins can bind coenzymes only by early or only by late residues (featuring early vs late binding sites), we looked for these specific cases and analyzed their evolutionary conservation.

We found 25 PDB entries that contain at least one chain bound to coenzymes solely by early amino acids (Fig. 7; Supplementary Fig. 4). Those structures correspond to 17 different proteins, represented by unique UniProt codes. All of those proteins bind exclusively ancient coenzymes: ATP, NAD and Phosphopantetheine. In comparison, 15 PDB entries, representing 12 unique proteins, bind coenzymes only by late amino acids (Fig. 7; Supplementary Fig. 4). These examples span coenzymes of all ages from Ancient to Post-LUCA: ATP, CoA, NAD, PLP, biotin, and ascorbic acid.

Examples of coenzyme binding solely through early or late amino acids. (A) Coenzyme bound exclusively by early residues (AMP bound by ATP-phosphoribosyltransferase; PDB code 6czm, chain B), created by LIGPLOT (Laskowski and Swindells, 2011). (B) Coenzyme bound entirely by late residues (ascorbic acid bound by hyaluronate lyase; PDB code 1f9g, chain A), created by LIGPLOT.
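The search for binding sites composed solely of early or solely of late residues amounts to a set comparison per binding site. The following sketch assumes each site is given as a list of three-letter residue codes drawn from the 20 canonical amino acids; the labels and the function name `site_category` are hypothetical.

```python
# Early alphabet as listed in the Introduction (three-letter codes).
EARLY = {"ALA", "ASP", "GLU", "GLY", "ILE", "LEU", "PRO", "SER", "THR", "VAL"}


def site_category(residues):
    """Label a binding site as 'early', 'late' or 'mixed'.

    Assumes only canonical amino acids occur, so every non-early
    residue is counted as late.
    """
    names = {r.upper() for r in residues}
    if not names:
        raise ValueError("empty binding site")
    if names <= EARLY:
        return "early"
    if names.isdisjoint(EARLY):
        return "late"
    return "mixed"


# Hypothetical binding sites.
print(site_category(["GLY", "SER", "ASP", "GLY"]))  # -> early
print(site_category(["ARG", "TYR", "HIS"]))         # -> late
print(site_category(["GLY", "LYS"]))                # -> mixed
```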

To assess the conservation of amino acids in these specific binding sites we used ConSurf (Ashkenazy et al., 2010; Ashkenazy et al., 2016). According to the analysis, both the early and late binding sites are relatively highly conserved. Around 60% of the residues from both cases have conservation scores ≥7. Furthermore, we employed the MAX AA parameter, which represents the most abundant residue in the multiple sequence alignment of all homologs. In the early and late binding sites, 76% and 72% of the residues, respectively, are identical to the MAX AA residue, which suggests their evolutionary conservation in both cases.
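The two conservation summaries used above (the share of residues with a ConSurf score ≥7 and the share identical to MAX AA) can be computed as follows. The sketch assumes that the ConSurf results have already been parsed into tuples of residue, conservation grade (1 to 9) and MAX AA; this input format and the function name are hypothetical.

```python
def conservation_summary(site, min_grade=7):
    """Return (fraction with grade >= min_grade, fraction equal to MAX AA).

    `site` is a list of (residue, consurf_grade, max_aa) tuples, an
    assumed pre-parsed representation of ConSurf output.
    """
    if not site:
        raise ValueError("empty binding site")
    n = len(site)
    conserved = sum(1 for _, grade, _ in site if grade >= min_grade)
    same_as_max = sum(1 for residue, _, max_aa in site if residue == max_aa)
    return conserved / n, same_as_max / n


# Hypothetical binding site of five residues.
example_site = [
    ("G", 9, "G"),
    ("S", 7, "T"),
    ("A", 5, "A"),
    ("D", 8, "D"),
    ("L", 3, "I"),
]
print(conservation_summary(example_site))  # -> (0.6, 0.6)
```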

Coenzyme binding mediated by metal ions

Because of the significance of metal ions in both extant and early life, we also analyzed coenzyme binding via metal ions (Supplementary Table 4). Notably, this phenomenon is more frequent in ancient cofactors, where approximately 24% of coenzyme binding sites contain at least one metal ion. Younger cofactors exhibit a lower requirement for metal ion binding: metal ions are involved in approximately 13% of instances for LUCA coenzymes and 11% for post-LUCA coenzymes, while the value for Unclassified coenzymes is about 23%. Certain coenzymes exhibit a notably high percentage of cases reliant on at least one Mg2+ ion (76% in the case of Thiamine diphosphate and 55% in the case of ATP binding). The next most prevalent mediating ion is Ca2+, found in 65% of cases of the LUCA coenzyme Pyrroloquinoline Quinone. Following Ca2+, the next most frequent metal ions mediating coenzyme binding are Mn2+ and Fe2+.

Discussion

Enzymatic activities rely heavily on interplay with organic cofactors. Those are found at the very heart of cellular metabolism and some of them quite possibly branch deep to life’s early start (White, 1976). The core chemistries of the most abundant and ancient coenzymes have been repeatedly detected in material and experiments mimicking prebiotic environments (Miller and Schlesinger, 1993; Keefe et al., 1995; Holliday et al., 2007; Kirschning, 2021; Menor-Salván et al., 2022; Pinna et al., 2022). Along with metal ions and minerals, some of the extant coenzymes could probably catalyze metabolic reactions in the absence of enzymes, before their emergence (Muchowska et al. 2020; Henriques Pereira et al. 2022; Cvjetan et al., 2023; Dherbassy et al. 2023). When and how coenzymes seeded the functional hubs of today’s enzymes represents a fundamental bridge between prebiotic chemistry and biochemistry and therefore one of the central questions in the study of life’s origins (Preiner et al. 2020).

The aim of this study was to resolve this conundrum by analyzing protein-coenzyme interactions throughout PDB with respect to amino acid and coenzyme evolutionary age (Fig. 2). While no direct inferences about distant evolutionary past can be drawn from the analysis of extant proteins, the principles guiding these interactions can imply their potential prebiotic feasibility and significance.

Ancient coenzymes are abundant in extant life and bind more frequently to early amino acids

We find that an absolute majority (94%) of the coenzymes that appear in PDB structures are conserved across all domains of life and available already through prebiotic syntheses (i.e. ancient). This class of coenzymes includes many molecules that are derived from nucleotides (such as ATP, FADH, NADH) and is spread throughout all the E.C. classes of protein catalysts. This is the first obvious distinction from the other (less-populated) classes of coenzymes and supports the coenzyme-peptide significance from the earliest life until today (White, 1976).

There are several outstanding differences in the properties of binding to proteins among the different coenzyme classes. First, the ancient coenzymes bind to proteins more frequently via early amino acids. On average, this interaction represents 61% of all ancient coenzyme bonds to proteins. It is only 53 and 38% for the LUCA and post-LUCA coenzymes, respectively.

While the ancient nucleotide-derived and unclassified coenzyme binding sites are dominated by residues in the loop conformation (Fig. 5; Supplementary Fig. 3), there is also a substantially higher frequency of residues in beta-sheet conformations when compared to post-LUCA coenzyme sites. Those, on the other hand, are dominated by alpha-helical conformations. Loops exhibit greater sequence variability compared to ordered structures, and their flexible nature enables them to undergo structural changes (Kessel and Ben-Tal, 2018; Tokuriki and Tawfik, 2009). Such properties of protein active sites were previously associated with evolvability and promiscuity (Tokuriki and Tawfik, 2009; Corbella et al., 2023). The role of loops could thus be important for the flexibility and versatility of early peptide-coenzyme binding sites. It has been noted that sequences able to adopt closed loop conformations would offer evolutionary benefits, providing stability and protection to early coenzymes, and that such hubs could truly transition the peptide-coenzyme world towards primordial enzymes (Goncearenco and Berezovsky, 2011; Gamiz-Arco et al., 2021; Toledo-Patiño et al., 2022; Gutierrez-Rus et al., 2023). Besides sequences without regular secondary structure elements, beta-sheets have been considered a more prevalent and significant motif during early stages of protein evolution than alpha helices. The beta-sheet represents the first structural motif in models of ribosomal evolution, and it has also been observed as a mildly enriched motif in sequences formed from early amino acids (Brack and Orgel, 1975; Lupas and Alva, 2017; Kovacs et al., 2017; Tretyachenko et al., 2022). Ancient coenzyme-peptide binding properties support the scenario of its significance during early stages of protein evolution.

Ancient coenzymes bind to proteins through more backbone interactions, typically assigned to early amino acids

While all the coenzymes bind preferentially to protein residue sidechains, more backbone interactions appear in the ancient coenzyme class when compared to others. This supports an earlier hypothesis that functions of the earliest peptides (possibly of variable compositions and lengths) would be performed with the assistance of the main chain atoms rather than their sidechains (Milner-White and Russell, 2011). A specific example of such a scenario was recently reported, where a dihydrofolate reductase activity was supported purely by protein backbone-coenzyme interactions (Lemay-St-Denis and Pelletier, 2023). Finally, Longo et al. recently analyzed binding sites of different phosphate-containing ligands which were arguably of high relevance during the earliest stages of life, connecting all of today’s core metabolism (Longo et al., 2020 (b)). They observed that unlike the evolutionarily younger binding motifs (which rely on sidechain binding), the most ancient lineages indeed bind to phosphate moieties predominantly via the protein backbone.

Our analysis assigns this phenomenon primarily to interactions via early amino acids that (as mentioned above) are generally enriched in the binding interface of the ancient coenzymes. This implies that late amino acids would not be necessarily needed for the sovereignty of coenzyme-peptide interplay. To address this intriguing possibility, we next searched whether there are such examples in the PDB dataset, where coenzymes would be bound exclusively by early amino acids. We found 17 such proteins where all the coenzymes belonged to the ancient class (such as ATP and NAD). Together with all of the above, this finding supports the possibility that peptide-coenzyme functional hubs could have originated before the evolution of the full canonical amino acid alphabet.

Reinforcing this, we have recently demonstrated on a select example of a ribosomal RNA-binding domain that a negatively charged variant of the protein composed of only early amino acids is indeed capable of binding to RNA (Giacobelli et al., 2022). In that case, the interaction is further supported by metal ions that were not present at the binding interface of the wild-type protein. Interestingly, the same trend was observed here throughout the PDB dataset. Binding of 24% of the ancient coenzymes in the PDB is additionally mediated by at least one metal ion. LUCA and post-LUCA coenzymes involved metal ions in only 13 and 11% of cases, respectively. It is quite probable that not all the metal ion densities are recognized or fully resolved in all the PDB structures that were used in our analysis. Nevertheless, we hypothesize that the overall trend can be attributed to the inherent negative charge of many of the ancient coenzymes, necessitating engagement of positively charged metal ions. Along with the general adaptive properties of late amino acids in expanding the chemistry space, the late amino acids would supplement the positive charge in the residue side chains (Ilardo and Freeland, 2014).

Coenzymes could have served as bridges towards protein structural and functional sovereignty from the peptide-nucleotide world

Our study further revealed that ancient coenzymes stand out in the variety of protein structures that they bind to, as represented by the ECOD X-groups. While this may partly be caused by their general over-representation in our dataset, the significance of coenzymes has been pointed out previously throughout the most ancient protein folds, such as P-loop NTPases, TIM beta/alpha-barrels, OB and Rossmann folds (Caetano-Anollés et al., 2017; Goldman et al. 2013; Longo et al., 2020(a); Longo et al., 2020(b)). Phosphate-containing coenzymes truly stand out by their large number of associated folds (e.g. 74 different ECOD X-groups in case of ATP).

It has been postulated that phosphate-binding loops served as the most significant precursors for contemporary enzymes (Romero Romero et al., 2018). Combining ancestral sequence reconstruction and selection/protein design, short polypeptide motifs capable of poly/nucleotide binding have been recovered from the P-loop NTPase and HhH motifs, relying primarily on early amino acids (Longo et al., 2020 (b); Romero Romero et al., 2018). Both studies demonstrated that these ancient motifs are highly robust to sequence variations, implying that such interactions can be encountered more easily than previously thought (Longo et al., 2020 (b); Keefe and Szostak, 2001). Provocatively, it has also been implied that many of these motifs emerged initially as polynucleotide binders and started serving catalysis only after gaining higher structural complexity (Romero Romero et al., 2018).

Our entry search of PDB coenzymes also retrieved 80 RNA structures. While these were not the primary subject of our analysis, the majority of those were found to interact with ancient coenzymes, mainly the nucleotide derived ones, such as ATP, FMN, NAD, SAM (Table 5). While some of these structures are assigned as riboswitches, other coenzyme-RNA complexes belong to ribozymes, representing the potential of polynucleotides in early catalysis as discussed by many previous studies (White, 1976; Gilbert, 1986; Reyes-Prieto et al., 2012; Goldman and Kacar 2021). If peptide-polynucleotide interactions were initially more feasible and dominant (in a putative peptide-polynucleotide world, as implied above), coenzymes could have played a key role in resolving the sovereignty of these molecules towards their tertiary structures and catalytic functions. Polypeptide-coenzyme catalysts would soon dominate in performance (enabling more efficient catalysis) and functional repertoires, especially those that would be hard to facilitate in their absence, such as oxidoreductases and transferases (Fried et al., 2022; Goldman and Kacar 2021; Kessel and Ben-Tal 2018).

Limitations of our work

The first obvious drawback of this analysis is the ambiguity that accompanies the division of coenzymes into evolutionary age classes. While the prebiotic availability (or lack thereof) is largely a matter of consensus in some cases, there are contradictory studies and opinions for other coenzymes. Some coenzymes, for example, have prebiotic precursors but are not present across all kingdoms of life. This may suggest that the coenzyme became important only post-LUCA, but it can also mean that its importance was preserved only in specific branches of life. An “Unclassified” category has been included for such cases (represented, e.g., by glutathione). Several properties of glutathione identified here (such as protein backbone vs. sidechain interaction, ubiquity across all E.C. enzyme classes and a high number of associated X-groups) would suggest that it is closer to the ancient coenzymes. The other, “classified” coenzymes were categorized based on the prevailing studies, although some ambiguity remains. Despite our efforts, the evolutionary ratings of coenzymes (and amino acids) are therefore not always clear-cut, and not all members of a category carry the same weight (e.g. some are likely to be more ancient than others). For example, the nucleotide-derived coenzymes probably predate the others of the ancient class; it has been proposed that S-adenosylmethionine emerged before the more complex heme-related porphyrin adenosylcobalamin or coenzyme B12 functionalities (Lazcano, 2013).

Another possible bias of our study stems from the population differences among the three coenzyme classes. The ancient coenzymes are by far the most abundant class in the PDB dataset. It can be argued that this is the result of their ∼4 billion years of essentiality to life (Goldman and Kacar 2021). Nevertheless, it may equally reflect a bias in the structures deposited in the PDB, and it most probably does not reflect the true distribution of coenzymes in the biological protein space. Additionally, comparing differentially populated classes complicated some aspects of the analysis presented here, although care has been taken to perform all the appropriate statistical tests.

Conclusions

The findings presented here suggest that early (the less complex and prebiotically plausible) amino acids are sufficient for binding to ancient coenzymes. Consequently, coenzyme-peptide interactions might have been conceivable at a time when the amino acid alphabet had not yet evolved to its current form. In such interactions, binding modes would rely more on the protein backbone atoms and on the involvement of metal ions, both of which are less frequent in interactions with evolutionarily young coenzymes.

Methodology

Identification of organic cofactors

We systematically identified all available cofactor and cofactor-like molecules in the Protein Data Bank in Europe (wwPDB Consortium, 2020) programmatically through the PDBe REST API (pdbe.org/api) (Mukhopadhyay et al., 2019). All cofactor molecules were classified into 27 classes based on the CoFactor database (Fischer et al., 2010). Furthermore, we included ATP and its analogs as an additional cofactor class.

The identification of all the available ligand codes from the PDB chemical component dictionary for each cofactor class was achieved by programmatic access through the “Cofactors” endpoint (https://www.ebi.ac.uk/pdbe/api/pdb/compound/cofactors) using the PDBe REST API (pdbe.org/api) and all responses were in JSON format.

Structural database and classifications

We retrieved the PDB entries associated with each chemical component from every cofactor class using the “PDB entries containing the compound” endpoint (https://www.ebi.ac.uk/pdbe/api/pdb/compound/in_pdb/:id) via the Entry-based API. The count of PDBe entries for each cofactor class is provided in the supplementary information (Supplementary file 1). The REST API returned no information for two coenzyme classes, MIO and Orthoquinone, so they were excluded from the analysis.

The secondary structure assignments and Enzyme Commission (EC) numbers for all PDB structures analyzed were determined through residue-level cross-references obtained from the SIFTS XML files (Velankar et al., 2013; Dana et al., 2019). Secondary structure elements are encoded as “h” for helix, “b” for strand, and “c” for coil, corresponding to the information available on the PDBe website. Only observed residues were examined.

Furthermore, we assigned all our PDB entries to the ECOD hierarchical system groups “X”, “H” and “F” (Cheng et al., 2014).

UniProt assignment and interaction ratio

The assignment of UniProt codes to our structural dataset was achieved by mapping the information with the SIFTS (Velankar et al., 2013; Dana et al., 2019) file “pdb_chain_uniprot.tsv.gz” (https://www.ebi.ac.uk/pdbe/docs/sifts/quick.html). Next, we mapped the UniProt residue to each of the PDB structures with the residue-level cross-reference data of SIFTS by retrieving the XML files (https://www.ebi.ac.uk/pdbe/docs/sifts/quick.html).

Each UniProt code represents a unique protein sequence that encompasses one or more associated PDBe entries. To filter the residues relevant to the interaction sites at the level of protein sequence, we used the interaction ratio: for a given UniProt residue and ligand, the fraction of the PDB entries associated with that protein in which the residue-ligand interaction is observed. Only residue-ligand interactions preserved in more than 50% of the associated PDB entries were selected.

After assigning UniProt residues to each residue of the PDB structures, we downloaded the precalculated interaction ratios from the endpoint “UniProt-Get ligand binding residues for a UniProt accession” (https://www.ebi.ac.uk/pdbe/graph-api/uniprot/ligand_sites/:accession).
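The interaction ratio is retrieved precalculated from PDBe, but its definition can be illustrated with a minimal sketch. The input layout (a dictionary from PDB entry to the set of residue-ligand pairs observed in it), the function names and the example identifiers are all assumptions made for illustration; this is not the PDBe implementation.

```python
from collections import Counter


def interaction_ratios(entries):
    """Fraction of PDB entries of one protein in which each interaction occurs.

    entries: dict mapping a PDB id to the set of (uniprot_residue, ligand)
    pairs observed in that entry (hypothetical input layout).
    """
    n_entries = len(entries)
    if n_entries == 0:
        return {}
    counts = Counter()
    for pairs in entries.values():
        counts.update(set(pairs))  # count each pair at most once per entry
    return {pair: count / n_entries for pair, count in counts.items()}


def preserved_interactions(entries, threshold=0.5):
    """Keep interactions seen in MORE than `threshold` of the entries."""
    ratios = interaction_ratios(entries)
    return {pair: ratio for pair, ratio in ratios.items() if ratio > threshold}


# Made-up example: four structures of the same (hypothetical) protein.
example = {
    "1abc": {(45, "NAD"), (46, "NAD"), (120, "NAD")},
    "2abc": {(45, "NAD"), (46, "NAD")},
    "3abc": {(45, "NAD"), (200, "NAD")},
    "4abc": {(45, "NAD"), (46, "NAD"), (200, "NAD")},
}
kept = preserved_interactions(example)
print(kept)
```

In this toy case residue 200 interacts in exactly half of the entries and is therefore dropped, because the criterion is strictly "more than 50%".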

The redundancy of our database was removed by clustering the UniProt sequences using CD-HIT (Li and Godzik, 2006) with a 90% sequence identity parameter.
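A clustering step of this kind could be launched as sketched below. The file names and the helper function are hypothetical, and the sketch assumes that the `cd-hit` executable is installed and on the PATH; the word size of 5 is the value that the CD-HIT documentation recommends for identity thresholds between 0.7 and 1.0, not a setting reported by the authors.

```python
import shlex


def cd_hit_command(fasta_in, fasta_out, identity=0.90):
    """Build a CD-HIT command line for clustering at a given sequence identity."""
    if not 0.7 <= identity <= 1.0:
        # Word size 5 is only appropriate for thresholds in this range.
        raise ValueError("this sketch only covers identity thresholds 0.7-1.0")
    return [
        "cd-hit",
        "-i", fasta_in,            # input FASTA with the UniProt sequences
        "-o", fasta_out,           # output FASTA with cluster representatives
        "-c", f"{identity:.2f}",   # sequence identity threshold
        "-n", "5",                 # word size for thresholds 0.7-1.0
    ]


# Hypothetical file names, for illustration only.
cmd = cd_hit_command("uniprot_sequences.fasta", "uniprot_nr90.fasta")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```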

Analysis of coenzyme interactions

To analyze the amino acid-coenzyme interactions, we downloaded information on all bound molecules found in a given PDB entry using the “Get bound molecules” endpoint (https://www.ebi.ac.uk/pdbe/graph-api/pdb/bound_molecule_interactions/:pdbId/:bmid) of the Aggregated API. Then, we retrieved the ligand interactions for each bound molecule in every entry through the “PDB-Get bound ligand interactions” endpoint (https://www.ebi.ac.uk/pdbe/graph-api/pdb/bound_ligand_interactions/:pdbId/:chain/:seqId), which calculates these interactions with Arpeggio (Jubb et al., 2017). The retrieved interactions included the standard amino acid codes, water molecules and metal ions.
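The endpoint addresses quoted in this section contain placeholders such as `:pdbId` and `:bmid` that must be replaced with concrete values before a request is sent. The helper below is a minimal sketch of that substitution; the function name and the example values ("1abc", "bm1") are illustrative assumptions, and no request is actually made.

```python
import re
from urllib.parse import quote

# A placeholder is a path segment of the form "/:name".
_PLACEHOLDER = re.compile(r"/:([A-Za-z_][A-Za-z0-9_]*)")


def fill_endpoint(template, **params):
    """Replace every '/:name' placeholder in an endpoint template."""
    def substitute(match):
        name = match.group(1)
        if name not in params:
            raise KeyError(f"missing value for placeholder ':{name}'")
        # Percent-encode the value so it is safe inside a URL path segment.
        return "/" + quote(str(params[name]), safe="")
    return _PLACEHOLDER.sub(substitute, template)


BOUND_MOLECULE = (
    "https://www.ebi.ac.uk/pdbe/graph-api/pdb/"
    "bound_molecule_interactions/:pdbId/:bmid"
)

# Illustrative identifiers only.
url = fill_endpoint(BOUND_MOLECULE, pdbId="1abc", bmid="bm1")
print(url)
```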

We classified all the interactions reported by Arpeggio into nine distinct interaction types. The classification scheme aligns with the one used by PDBe and encompasses the following categories: i) “covalent”; ii) “electrostatic”, which combines “ionic”, “hbond”, “weak_hbond”, “polar”, “weak_polar”, “xbond” and “carbonyl”; iii) “amide”, consisting of “AMIDEAMIDE” and “AMIDERING”; iv) “vdw”, denoting van der Waals interactions; v) “hydrophobic”; vi) “aromatic”, grouping “aromatic”, “FF”, “OF”, “EE”, “FT”, “OT”, “ET”, “FE”, “OE” and “EF” contacts; vii) “atom-pi”, comprising “CARBONPI”, “CATIONPI”, “DONORPI”, “HALOGENPI”, and “METSULPHURPI”; viii) “metal” and ix) “clashes”, including “clash” and “vdw_clash” contacts. We omitted this last category because of the limited number of such interactions, most of which result from experimental errors during X-ray diffraction.
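The grouping described above can be written down as a simple lookup table. The sketch below is an illustration rather than the authors' code; it assumes that the raw labels for the single-member categories (“covalent”, “vdw”, “hydrophobic”, “metal”) are identical to the category names, and the function name is hypothetical.

```python
# Category -> raw Arpeggio contact labels, as listed in the text.
INTERACTION_CLASSES = {
    "covalent": ["covalent"],
    "electrostatic": ["ionic", "hbond", "weak_hbond", "polar",
                      "weak_polar", "xbond", "carbonyl"],
    "amide": ["AMIDEAMIDE", "AMIDERING"],
    "vdw": ["vdw"],
    "hydrophobic": ["hydrophobic"],
    "aromatic": ["aromatic", "FF", "OF", "EE", "FT", "OT",
                 "ET", "FE", "OE", "EF"],
    "atom-pi": ["CARBONPI", "CATIONPI", "DONORPI",
                "HALOGENPI", "METSULPHURPI"],
    "metal": ["metal"],
    "clashes": ["clash", "vdw_clash"],
}

# The "clashes" category is left out of the analysis.
OMITTED_CLASSES = {"clashes"}

# Inverted lookup: raw label -> category.
RAW_TO_CLASS = {
    raw: category
    for category, raw_labels in INTERACTION_CLASSES.items()
    for raw in raw_labels
}


def classify(raw_label):
    """Return the category of a raw contact label, or None if it is omitted."""
    if raw_label not in RAW_TO_CLASS:
        raise ValueError(f"unknown contact label: {raw_label!r}")
    category = RAW_TO_CLASS[raw_label]
    return None if category in OMITTED_CLASSES else category


print(classify("weak_hbond"), classify("CATIONPI"), classify("vdw_clash"))
```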

Backbone and side-chain interactions were identified based on the atom identities in the coenzyme binding sites. Atoms corresponding to the backbone of standard amino acids were identified as “N”, “C”, “CA” and “O”. Glycine has only a hydrogen atom as its side chain, and no interaction mediated by a glycine side-chain atom was identified.
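The backbone/side-chain assignment amounts to a membership test on the atom name. The following sketch assumes PDB-style atom names as input; the function names are hypothetical, and under this simple rule every atom name other than the four listed (including a terminal “OXT”) is counted as side chain.

```python
from collections import Counter

# Backbone atom names of standard amino acids, as given in the text.
BACKBONE_ATOMS = frozenset({"N", "C", "CA", "O"})


def interaction_site(atom_name):
    """Label a protein atom as 'backbone' or 'side chain' by its name."""
    if atom_name.strip().upper() in BACKBONE_ATOMS:
        return "backbone"
    return "side chain"


def summarize_sites(atom_names):
    """Count backbone vs. side-chain atoms among the interacting atoms."""
    return Counter(interaction_site(name) for name in atom_names)


# Made-up list of interacting atoms from one binding site.
summary = summarize_sites(["N", "O", "OG", "CA", "NE2"])
print(dict(summary))
```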

Secondary structure analysis

Statistical analysis of secondary structure content was conducted at the UniProt level. For each residue within every UniProt entry, we considered all potential secondary structure elements derived from the PDB structures associated with each UniProt code. Subsequently, we eliminated redundancy on a per-residue basis. This methodological approach enabled us to comprehensively encompass the structural diversity at each position of the protein.
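The per-residue redundancy removal can be pictured as collecting, for every UniProt position, the set of distinct secondary structure codes seen across all associated PDB structures. The sketch below assumes a flat list of (accession, residue number, code) observations; this layout, the function name and the accession “P00000” are illustrative assumptions.

```python
from collections import defaultdict


def nonredundant_secondary_structure(observations):
    """Collect the distinct secondary structure codes seen at each residue.

    observations: iterable of (uniprot_accession, residue_number, ss_code),
    one item per residue per PDB structure; ss_code is "h", "b" or "c".
    """
    per_residue = defaultdict(set)
    for accession, residue_number, ss_code in observations:
        # A set keeps each code once, however many structures report it.
        per_residue[(accession, residue_number)].add(ss_code)
    return dict(per_residue)


# Made-up observations from three structures of one hypothetical protein.
observations = [
    ("P00000", 10, "h"), ("P00000", 11, "h"),   # structure 1
    ("P00000", 10, "h"), ("P00000", 11, "c"),   # structure 2
    ("P00000", 10, "h"), ("P00000", 11, "c"),   # structure 3
]
result = nonredundant_secondary_structure(observations)
print(result)
```

Residue 11 keeps both “h” and “c”, so the conformational variability at that position is retained while repeated observations are counted once.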

Interactions mediated exclusively by early or late amino acids

To examine proteins that interacted with cofactors solely through early or late amino acids, we filtered the data to include only proteins interacting with at least two amino acids.
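The filter can be sketched as follows. Two points are assumptions rather than statements from this section: the set of early amino acids (Ala, Asp, Glu, Gly, Ile, Leu, Pro, Ser, Thr, Val, a set commonly used in the literature and passed here as a parameter so that it can be changed), and the reading of "at least two amino acids" as at least two distinct residue positions. Every standard amino acid outside the early set is treated as late.

```python
# Assumed early set (three-letter codes); not taken from this section.
EARLY_AMINO_ACIDS = frozenset(
    {"ALA", "ASP", "GLU", "GLY", "ILE", "LEU", "PRO", "SER", "THR", "VAL"}
)


def binding_site_class(residues, early=EARLY_AMINO_ACIDS, min_residues=2):
    """Classify a binding site by the evolutionary age of its residues.

    residues: iterable of (position, three_letter_code) for the interacting
    residues of one protein. Returns "early-only", "late-only", "mixed",
    or None when fewer than `min_residues` positions interact.
    """
    by_position = {position: code.upper() for position, code in residues}
    if len(by_position) < min_residues:
        return None
    is_early = {code in early for code in by_position.values()}
    if is_early == {True}:
        return "early-only"
    if is_early == {False}:
        return "late-only"
    return "mixed"


print(binding_site_class([(10, "GLY"), (12, "SER"), (15, "ASP")]))
print(binding_site_class([(10, "HIS"), (40, "TRP")]))
print(binding_site_class([(10, "GLY"), (40, "TRP")]))
print(binding_site_class([(10, "GLY")]))
```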

For the assessment of the evolutionary conservation of coenzyme-binding amino acids, we employed ConSurf (Ashkenazy et al., 2010; Ashkenazy et al., 2016). Specifically, we analyzed the msa_positional_aa_frequency files generated for each PDB structure.

Supporting information

Supplementary material

Supplementary file 1

Supplementary Table 1

Supplementary Table 2

Supplementary Table 3

Supplementary Table 4

Supplementary Tables 5-8

Acknowledgements

This work was supported by the Human Frontier Science Program grant HFSP-RGEC27/2023 and was carried out with the support of ELIXIR CZ Research Infrastructure (ID LM2023055, MEYS CR). A.C.S.R. and M.M. acknowledge support by the project the “Grant Schemes at CU” (reg. no. CZ.02.2.69/0.0/0.0/19_073/0016935), project no. START/SCI/148. Finally, we would like to thank Prof. Stephen Freeland and Prof. Janet Thornton for helpful discussions on this manuscript.

References

Alva V, Söding J, Lupas AN. 2015. A vocabulary of ancient peptides at the origin of folded proteins. eLife 4:e09410. https://doi.org/10.7554/eLife.09410
Aylward N. 2006. An ab initio computational study of thiamin synthesis from gaseous reactants of the interstellar medium. Biophysical Chemistry 121:185–193. https://doi.org/10.1016/j.bpc.2005.12.018
Aylward N, Bofinger N. 2006. A plausible prebiotic synthesis of pyridoxal phosphate: Vitamin B-6 - A computational study. Biophysical Chemistry 123:113–121. https://doi.org/10.1016/j.bpc.2006.04.014
Ashkenazy H, Erez E, Martz E, Pupko T, Ben-Tal N. 2010. ConSurf 2010: calculating evolutionary conservation in sequence and structure of proteins and nucleic acids. Nucleic Acids Res 38:W529–33. https://doi.org/10.1093/nar/gkq399
Ashkenazy H, Abadi S, Martz E, Chay O, Mayrose I, Pupko T, Ben-Tal N. 2016. ConSurf 2016: an improved methodology to estimate and visualize evolutionary conservation in macromolecules. Nucleic Acids Res 44:W344–50. https://doi.org/10.1093/nar/gkw408
Barrick JE, Breaker R. 2007. The distributions, mechanisms, and structures of metabolite-binding riboswitches. Genome Biology 8:R239. https://doi.org/10.1186/gb-2007-8-11-r239
Bonfio C, Valer L, Scintilla S, Shah S, Evans D, Jin L, Szostak J, Sasselov D, Sutherland J, Mansy S. 2017. UV-light-driven prebiotic synthesis of iron-sulfur clusters. Nature Chemistry 9:1229. https://doi.org/10.1038/nchem.2817
Burton AS, Stern JC, Elsila JE, Glavin DP, Dworkin JP. 2012. Understanding prebiotic chemistry through the analysis of extraterrestrial amino acids and nucleobases in meteorites. Chemical Society Reviews 41:5459–72. https://doi.org/10.1039/c2cs35109a
Brack A, Orgel LE. 1975. Beta structures of alternating polypeptides and their possible prebiotic significance. Nature 256:383–7. https://doi.org/10.1038/256383a0
Caetano-Anollés G, Hee SK, Mittenthal JE. 2007. The origin of modern metabolic networks inferred from phylogenomic analysis of protein architecture. Proceedings of the National Academy of Sciences of the United States of America 104:9358–9363. https://doi.org/10.1073/pnas.0701214104
Cheng H, Schaeffer RD, Liao Y, Kinch LN, Pei J, Shi S, Kim BH, Grishin NV. 2014. ECOD: an evolutionary classification of protein domains. PLoS Computational Biology 10:e1003926. https://doi.org/10.1371/journal.pcbi.1003926
Chu XY, Zhang HY. 2020. Cofactors as molecular fossils to trace the origin and evolution of proteins. ChemBioChem 21:3161–3168. https://doi.org/10.1002/cbic.202000027
Cvjetan N, Schuler L, Ishikawa T, Walde P. 2023. Optimization and Enhancement of the Peroxidase-like Activity of Hemin in Aqueous Solutions of Sodium Dodecylsulfate. ACS Omega 8:42878–42899. https://doi.org/10.1021/acsomega.3c05915
Cleaves HJ. 2010. The origin of the biologically coded amino acids. Journal of Theoretical Biology 263:490–498. https://doi.org/10.1016/j.jtbi.2009.12.014
Copley SD, Dhillon JK. 2002. Lateral gene transfer and parallel evolution in the history of glutathione biosynthesis genes. Genome Biology 3:1–16. https://doi.org/10.1186/gb-2002-3-5-research0025
Corbella M, Pinto GP, Kamerlin SCL. 2023. Loop dynamics and the evolution of enzyme activity. Nature Reviews Chemistry 7:536–547. https://doi.org/10.1038/s41570-023-00495-w
Dana JM, Gutmanas A, Tyagi N, Qi G, O’Donovan C, Martin M, Velankar S. 2019. SIFTS: updated Structure Integration with Function, Taxonomy and Sequences resource allows 40-fold increase in coverage of structure-based annotations for proteins. Nucleic Acids Res 47:D482–D489. https://doi.org/10.1093/nar/gky1114
Dherbassy Q, Mayer R, Muchowska K, Moran J. 2023. Metal-Pyridoxal Cooperativity in Nonenzymatic Transamination. Journal of the American Chemical Society 145:13357–13370. https://doi.org/10.1021/jacs.3c03542
Edwards H, Abeln S, Deane CM. 2013. Exploring Fold Space Preferences of New-born and Ancient Protein Superfamilies. PLoS Computational Biology 9:e1003325. https://doi.org/10.1371/journal.pcbi.1003325
Fischer JD, Holliday GL, Thornton JM. 2010. The CoFactor database: Organic cofactors in enzyme catalysis. Bioinformatics 26:2496–2497. https://doi.org/10.1093/bioinformatics/btq442
Frenkel-Pinter M, Haynes JW, Martin C, Petrov AS, Burcar BT, Krishnamurthy R, Hud N, Leman L, Williams LD. 2019. Selective incorporation of proteinaceous over nonproteinaceous cationic amino acids in model prebiotic oligomerization reactions. Proceedings of the National Academy of Sciences of the United States of America 116:16338–16346. https://doi.org/10.1073/pnas.1904849116
Frenkel-Pinter M, Mousumi S, Ashkenasy G, Leman L. 2020. Prebiotic Peptides: Molecular Hubs in the Origin of Life. Chemical Reviews 120:4707–4765. https://doi.org/10.1021/acs.chemrev.9b00664
Fried SD, Fujishima K, Makarov M, Cherepashuk I, Hlouchova K. 2022. Peptides before and during the nucleotide world: An origins story emphasizing cooperation between proteins and nucleic acids. Journal of the Royal Society Interface 19:20210641. https://doi.org/10.1098/rsif.2021.0641
Gamiz-Arco G, Gutierrez-Rus LI, Risso VA, Ibarra-Molero B, Hoshino Y, Petrović D, Justicia J, Cuerva JM, Romero-Rivera A, Seelig B, Gavira JA, Kamerlin SCL, Gaucher EA, Sanchez-Ruiz JM. 2021. Heme-binding enables allosteric modulation in an ancient TIM-barrel glycosidase. Nature Communications 12:380. https://doi.org/10.1038/s41467-020-20630-1
Giacobelli VG, Fujishima K, Lepšík M, Tretyachenko V, Kadavá T, Makarov M, Hlouchová K. 2022. In Vitro Evolution Reveals Noncationic Protein-RNA Interaction Mediated by Metal Ions. Molecular Biology and Evolution 39:1–11. https://doi.org/10.1093/molbev/msac032
Gilbert W. 1986. The RNA world. Nature 319:618
Goldman AD, Bernhard TM, Dolzhenko E, Landweber LF. 2013. LUCApedia: A database for the study of ancient life. Nucleic Acids Res 41:1079–1082. https://doi.org/10.1093/nar/gks1217
Goldman AD, Kacar B. 2021. Cofactors are Remnants of Life’s Origin and Early Evolution. Journal of Molecular Evolution 89:127–133. https://doi.org/10.1007/s00239-020-09988-4
Goncearenco A, Berezovsky IN. 2011. Prototypes of elementary functional loops unravel evolutionary connections between protein functions. Bioinformatics 27:i497. https://doi.org/10.1093/bioinformatics/btq374
Gutierrez-Rus LI, Gamiz-Arco G, Gavira JA, Gaucher EA, Risso VA, Sanchez-Ruiz JM. 2023. Protection of Catalytic Cofactors by Polypeptides as a Driver for the Emergence of Primordial Enzymes. Molecular Biology and Evolution 40:1–8. https://doi.org/10.1093/molbev/msad126
Henriques DP, Leethaus J, Beyazay T, do Nascimento A, Kleinermanns K, Tüysüz H, Martin W, Preiner M. 2022. Role of geochemical protoenzymes (geozymes) in primordial metabolism: specific abiotic hydride transfer by metals to the biological redox cofactor NAD+. FEBS Journal 289:3148–3162. https://doi.org/10.1111/febs.16329
Higgs PG, Pudritz RE. 2009. A thermodynamic basis for prebiotic amino acid synthesis and the nature of the first genetic code. Astrobiology 9:483–490. https://doi.org/10.1089/ast.2008.0280
Holliday GL, Thornton JM, Marquet A, Smith AG, Rébeillé F, Mendel R, Schubert HL, Lawrence AD, Warren MJ. 2007. Evolution of enzymes and pathways for the biosynthesis of cofactors. Natural Product Reports 24:972–987. https://doi.org/10.1039/b703107f
Huang F, Bugg CW, Yarus M. 2000. RNA-Catalyzed CoA, NAD, and FAD synthesis from phosphopantetheine, NMN, and FMN. Biochemistry 39:15548–55. https://doi.org/10.1021/bi002061f
Ilardo MA, Freeland SJ. 2014. Testing for adaptive signatures of amino acid alphabet evolution using chemistry space. Journal of Systems Chemistry 5:1–9. https://doi.org/10.1186/1759-2208-5-1
Ji HF, Chen L, Zhang HY. 2008. Organic cofactors participated more frequently than transition metals in redox reactions of primitive proteins. BioEssays 30:766–771. https://doi.org/10.1002/bies.20788
Jubb HC, Higueruelo AP, Ochoa-Montaño B, Pitt WR, Ascher DB, Blundell TL. 2017. Arpeggio: A Web Server for Calculating and Visualising Interatomic Interactions in Protein Structures. J Mol Biol 429:365–371. https://doi.org/10.1016/j.jmb.2016.12.004
Keefe AD, Newton GL, Miller SL. 1995. A possible prebiotic synthesis of pantetheine, a precursor to coenzyme A. Nature 373:683–685. https://doi.org/10.1038/373683a0
Keefe AD, Szostak JW. 2001. Functional proteins from a random-sequence library. Nature 410:715–8. https://doi.org/10.1038/35070613
Kessel A, Ben-Tal N. 2018. Introduction to proteins: structure, function, and motion. CRC Press (Taylor & Francis Group). https://doi.org/10.1201/9781315113876
Kessel A, Ben-Tal N. 2022. From Molecules to Cells: The Origin of Life on Earth. Kindle E-Book
Kirschning A. 2021. Coenzymes and Their Role in the Evolution of Life. Angewandte Chemie - International Edition 60:6242–6269. https://doi.org/10.1002/anie.201914786
Kirschning A. 2022. On the Evolutionary History of the Twenty Encoded Amino Acids. Chemistry - A European Journal 28:e202201419. https://doi.org/10.1002/chem.202201419
Kolodny R, Nepomnyachiy S, Tawfik DS, Ben-Tal N. 2021. Bridging Themes: Short Protein Segments Found in Different Architectures. Molecular Biology and Evolution 38:2191–2208. https://doi.org/10.1093/molbev/msab017
Kovacs NA, Petrov AS, Lanier KA, Williams LD. 2017. Frozen in Time: The History of Proteins. Molecular Biology and Evolution 34:1252–1260. https://doi.org/10.1093/molbev/msx086
Lane N, Martin WF. 2012. The origin of membrane bioenergetics. Cell 151:1406–1416. https://doi.org/10.1016/j.cell.2012.11.050
Laurino P, Tóth-Petróczy Á, Meana-Pañeda R, Lin W, Truhlar DG, Tawfik DS. 2016. An Ancient Fingerprint Indicates the Common Ancestry of Rossmann-Fold Enzymes Utilizing Different Ribose-Based Cofactors. PLoS Biology 14:1–23. https://doi.org/10.1371/journal.pbio.1002396
Laskowski RA, Swindells MB. 2011. LigPlot+: multiple ligand-protein interaction diagrams for drug discovery. J Chem Inf Model 51:2778–86. https://doi.org/10.1021/ci200227u
Lazcano A. 2013. Planetary change and biochemical adaptation: Molecular evolution of corrinoid and heme biosyntheses. Hematology 17:s7–s10. https://doi.org/10.1179/102453312X13336169155015
Lemay-St-Denis C, Pelletier J. 2023. From a binding module to essential catalytic activity: how nature stumbled on a good thing. Chem. Commun 59:12560–12572. https://doi.org/10.1039/D3CC04209J
Li W, Godzik A. 2006. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22:1658–9. https://doi.org/10.1093/bioinformatics/btl158
Longo LM, Jabłońska J, Vyas P, Kanade M, Kolodny R, Ben-Tal N, Tawfik DS. 2020. On the emergence of P-loop NTPase and Rossmann enzymes from a beta-alpha-beta ancestral fragment. eLife 9:1–16. https://doi.org/10.7554/ELIFE.64415
Longo LM, Petrovic D, Kamerlin SCL, Tawfik DS. 2020. Short and simple sequences favored the emergence of N-helix phospho-ligand binding sites in the first enzymes. Proceedings of the National Academy of Sciences of the United States of America 117:5310–5318. https://doi.org/10.1073/pnas.1911742117
Lupas AN, Alva V. 2017. Ribosomal proteins as documents of the transition from unstructured (poly)peptides to folded proteins. Journal of Structural Biology 198:74–81. https://doi.org/10.1016/j.jsb.2017.04.007
Makarov M, Meng J, Tretyachenko V, Srb P, Březinová A, Giacobelli VG, Bednárová L, Vondrášek J, Dunker K, Hlouchová K. 2021. Enzyme catalysis prior to aromatic residues: Reverse engineering of a dephospho-CoA kinase. Protein Science 30:1022–1034. https://doi.org/10.1002/pro.4068
Menor-Salván C, Burcar BT, Bouza M, Fialho DM, Fernández FM, Hud NV. 2022. A Shared Prebiotic Formation of Neopterins and Guanine Nucleosides from Pyrimidine Bases. Chemistry (Weinheim an Der Bergstrasse, Germany) 28:e202200714. https://doi.org/10.1002/chem.202200714
Miller SL, Schlesinger G. 1993. Prebiotic syntheses of vitamin coenzymes: I. Cysteamine and 2-mercaptoethanesulfonic acid (coenzyme M). Journal of Molecular Evolution 36:302–7. https://doi.org/10.1007/BF00182177
Milner-White EJ, Russell MJ. 2011. Functional capabilities of the earliest peptides and the emergence of life. Genes 2:671–88. https://doi.org/10.3390/genes2040671
Monteverde DR, Gómez-Consarnau L, Suffridge C, Sañudo-Wilhelmy SA. 2017. Life’s utilization of B vitamins on early Earth. Geobiology 15:3–18. https://doi.org/10.1111/gbi.12202
Muchowska KB, Varma SJ, Moran J. 2020. Nonenzymatic Metabolic Reactions and Life’s Origins. Chemical Reviews 120:7708–7744. https://doi.org/10.1021/acs.chemrev.0c00191
Mukhopadhyay A, Borkakoti N, Pravda L, Tyzack JD, Thornton JM, Velankar S. 2019. Finding enzyme cofactors in Protein Data Bank. Bioinformatics 35:3510–3511. https://doi.org/10.1093/bioinformatics/btz115
Naraoka H, Takano Y, Dworkin JP, Oba Y, Hamase K, Furusho A, Ogawa NO, Hashiguchi M, Fukushima K, Aoki D, Schmitt-Kopplin P, Aponte JC, Parker ET, Glavin DP, McLain HL, Elsila JE, Graham HV, Eiler JM, Orthous-Daunay FR, Wolters C, Isa J, Vuitton V, Thissen R, Sakai S, Yoshimura T, Koga T, Ohkouchi N, Chikaraishi Y, Sugahara H, Mita H, Furukawa Y, Hertkorn N, Ruf A, Yurimoto H, Nakamura T, Noguchi T, Okazaki R, Yabuta H, Sakamoto K, Tachibana S, Connolly HC, Lauretta DS, Abe M, Yada T, Nishimura M, Yogata K, Nakato A, Yoshitake M, Suzuki A, Miyazaki A, Furuya S, Hatakeda K, Soejima H, Hitomi Y, Kumagai K, Usui T, Hayashi T, Yamamoto D, Fukai R, Kitazato K, Sugita S, Namiki N, Arakawa M, Ikeda H, Ishiguro M, Hirata N, Wada K, Ishihara Y, Noguchi R, Morota T, Sakatani N, Matsumoto K, Senshu H, Honda R, Tatsumi E, Yokota Y, Honda C, Michikami T, Matsuoka M, Miura A, Noda H, Yamada T, Yoshihara K, Kawahara K, Ozaki M, Iijima YI, Yano H, Hayakawa M, Iwata T, Tsukizaki R, Sawada H, Hosoda S, Ogawa K, Okamoto C, Hirata N, Shirai K, Shimaki Y, Yamada M, Okada T, Yamamoto Y, Takeuchi H, Fujii A, Takei Y, Yoshikawa K, Mimasu Y, Ono G, Ogawa N, Kikuchi S, Nakazawa S, Terui F, Tanaka S, Saiki T, Yoshikawa M, Watanabe SI, Tsuda Y. 2023. Soluble organic molecules in samples of the carbonaceous asteroid (162173) Ryugu. Science 379:eabn9033. https://doi.org/10.1126/science.abn9033
Narunsky A, Kessel A, Solan R, Alva V, Kolodny R, Ben-Tal N. 2020. On the evolution of protein-adenine binding. Proceedings of the National Academy of Sciences of the United States of America 117:4701–4709. https://doi.org/10.1073/pnas.1911349117
Nepomnyachiy S, Ben-Tal N, Kolodny R. 2017. Complex evolutionary footprints revealed in an analysis of reused protein segments of diverse lengths. Proceedings of the National Academy of Sciences of the United States of America 114:11703–11708. https://doi.org/10.1073/pnas.1707642114
PDBe-KB consortium. 2020. PDBe-KB: a community-driven resource for structural and functional annotations. Nucleic Acids Res 48:D344–D353. https://doi.org/10.1093/nar/gkz853
Pinna S, Kunz C, Halpern A, Harrison SA, Jordan SF, Ward J, Werner F, Lane N. 2022. A prebiotic basis for ATP as the universal energy currency. PLoS Biology 20:1–25. https://doi.org/10.1371/journal.pbio.3001437
Putignano V, Rosato A, Banci L, Andreini C. 2018. MetalPDB in 2018: A database of metal sites in biological macromolecular structures. Nucleic Acids Res 46:D459–D464. https://doi.org/10.1093/nar/gkx989
Preiner M, Asche S, Becker S, Betts HC, Boniface A, Camprubi E, Chandru K, Erastova V, Garg SG, Khawaja N, Kostyrka G, Machné R, Moggioli G, Muchowska KB, Neukirchen S, Peter B, Pichlhöfer E, Radványi Á, Rossetto D, Salditt A, Schmelling NM, Sousa FL, Tria FDK, Vörös D, Xavier JC. 2020. The future of origin of life research: Bridging decades-old divisions. Life 10:20. https://doi.org/10.3390/life10030020
Qiu K, Ben-Tal N, Kolodny R. 2022. Similar protein segments shared between domains of different evolutionary lineages. Protein Science 31:e4407. https://doi.org/10.1002/pro.4407
Reyes-Prieto F, Hernández-Morales R, Jácome R, Becerra A, Lazcano A. 2012. Coenzymes, viruses and the RNA world. Biochimie 94:1467–1473. https://doi.org/10.1016/j.biochi.2012.01.004
Romero Romero ML, Yang F, Lin YR, Toth-Petroczy A, Berezovsky IN, Goncearenco A, Yang W, Wellner A, Kumar-Deshmukh F, Sharon M, Baker D, Varani G, Tawfik DS. 2018. Simple yet functional phosphate-loop proteins. Proceedings of the National Academy of Sciences of the United States of America 115:E11943–E11950. https://doi.org/10.1073/pnas.1812400115
Russell MJ, Hall AJ. 1997. The emergence of life from iron monosulphide bubbles at a submarine hydrothermal redox and pH front. J Geol Soc London 154:377–402. https://doi.org/10.1144/gsjgs.154.3.0377
Seitz C, Eisenreich W, Huber C. 2021. The abiotic formation of pyrrole under volcanic, hydrothermal conditions - an initial step towards life’s first breath? Life 11:1–10. https://doi.org/10.3390/life11090980
Söding J, Lupas AN. 2003. More than the sum of their parts: On the evolution of proteins from peptides. BioEssays 25:837–846. https://doi.org/10.1002/bies.10321
Thauer RK, Bonacker LG. 1994. Biosynthesis of coenzyme F430, a nickel porphinoid involved in methanogenesis. Ciba Found Symp 180:210–22. https://doi.org/10.1002/9780470514535.ch12
The UniProt Consortium. 2023. UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Res 51:D523–D531. https://doi.org/10.1093/nar/gkac1052
Toledo-Patiño S, Pascarelli S, Uechi GI, Laurino P. 2022. Insertions and deletions mediated functional divergence of Rossmann fold enzymes. Proceedings of the National Academy of Sciences of the United States of America 119:e2207965119. https://doi.org/10.1073/pnas.2207965119
Tokuriki N, Tawfik DS. 2009. Protein dynamism and evolvability. Science 324:203–7. https://doi.org/10.1126/science.1169375
Tretyachenko V, Vymětal J, Neuwirthová T, Vondrášek J, Fujishima K, Hlouchová K. 2022. Modern and prebiotic amino acids support distinct structural profiles in proteins. Open Biol 12:220040. https://doi.org/10.1098/rsob.220040
Trifonov EN. 2000. Consensus temporal order of amino acids and evolution of the triplet code. Gene 261:139–151. https://doi.org/10.1016/S0378-1119(00)00476-5
Velankar S, Dana JM, Jacobsen J, van Ginkel G, Gane PJ, Luo J, Oldfield TJ, O’Donovan C, Martin MJ, Kleywegt GJ. 2013. SIFTS: Structure Integration with Function, Taxonomy and Sequences resource. Nucleic Acids Res 41:D483–9. https://doi.org/10.1093/nar/gks1258
Wächtershäuser G. 1992. Groundworks for an evolutionary biochemistry: The iron-sulphur world. Progress in Biophysics and Molecular Biology 58:85–201. https://doi.org/10.1016/0079-6107(92)90022-X
Weber AL, Miller SL. 1981. Reasons for the occurrence of the twenty coded protein amino acids. Journal of Molecular Evolution 17:273–84. https://doi.org/10.1007/BF01795749
White HB. 1976. Coenzymes as fossils of an earlier metabolic state. Journal of Molecular Evolution 7:101–104. https://doi.org/10.1007/BF01732468
White HB. 1982. Evolution of Coenzymes and the Origin of Pyridine Nucleotides. In: The Pyridine Nucleotide Coenzymes, 1–17. https://doi.org/10.1016/b978-0-12-244750-1.50010-5
Wong JT, Bronskill PM. 1979. Inadequacy of prebiotic synthesis as origin of proteinous amino acids. Journal of Molecular Evolution 13:115–25. https://doi.org/10.1007/BF01732867
Wu HH, Pun MD, Wise CE, Streit BR, Mus F, Berim A, Kincannon WM, Islam A, Partovi SE, Gang DR, DuBois JL, Lubner CE, Berkman CE, Lange BM, Peters JW. 2022. The pathway for coenzyme M biosynthesis in bacteria. Proceedings of the National Academy of Sciences of the United States of America 119:e2207190119. https://doi.org/10.1073/pnas.220719011
Zaia DA, Zaia CT, De Santana H. 2008. Which amino acids should be used in prebiotic chemistry studies? Orig Life Evol Biosph 38:469–88. https://doi.org/10.1007/s11084-008-9150-5

Article and author information

Author information

Alma Carolina Sanchez-Rocha
Department of Cell Biology, Faculty of Science, Charles University, BIOCEV, Prague, 12843, Czech Republic
Mikhail Makarov
Department of Cell Biology, Faculty of Science, Charles University, BIOCEV, Prague, 12843, Czech Republic
Lukáš Pravda
Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Exscientia, Oxford, UK
Marian Novotný
Department of Cell Biology, Faculty of Science, Charles University, BIOCEV, Prague, 12843, Czech Republic
ORCID iD: 0000-0001-8788-3202
- co-corresponding authors; email: klara.hlouchova@natur.cuni.cz
Klára Hlouchová
Department of Cell Biology, Faculty of Science, Charles University, BIOCEV, Prague, 12843, Czech Republic, Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences, Prague, 16610, Czech Republic
ORCID iD: 0000-0002-5651-4874
- co-corresponding authors; email: klara.hlouchova@natur.cuni.cz

Version history

Sent for peer review: December 4, 2023
Preprint posted: December 5, 2023
Reviewed Preprint version 1: February 26, 2024
Reviewed Preprint version 2: October 16, 2024

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, public reviews, and a provisional response from the authors.

Reviewing Editor
Donald Hamelberg
Georgia State University, Atlanta, United States of America
Senior Editor
David Ron
University of Cambridge, Cambridge, United Kingdom

Reviewer #1 (Public Review):

Summary:
By examining the prevalence of interactions with ancient amino acids of coenzymes in ancient versus recent folds, the authors noticed an increased interaction propensity for ancient interactions. They infer from this that coenzymes might have played an important role in prebiotic proteins.

Strengths:
(1) The analysis, which is very straightforward, is technically correct. However, the conclusions might not be as strong as presented.

(2) This paper presents an excellent summary of contemporary thought on what might have constituted prebiotic proteins and their properties.

(3) The paper is clearly written.

Weaknesses:
(1) The conclusions might not be as strong as presented. First of all, while ancient amino acids interact less frequently in late with a given coenzyme, maybe this just reflects the fact that proteins that evolved later might be using residues that have a more favorable binding free energy.

(2) What about other small molecules that existed in the prebiotic soup? Do they also prefer such ancient amino acids? If so, this might reflect the interaction propensity of specific amino acids rather than the inferred important role of coenzymes.

(3) Perhaps the conclusions just reflect the types of active sites that evolved first and nothing more.

https://doi.org/10.7554/eLife.94174.1.sa1

Reviewer #2 (Public Review):

I enjoyed reading this paper and appreciate the careful analysis performed by the investigators examining whether 'ancient' cofactors are preferentially bound by the first-available amino acids, and whether later 'LUCA' cofactors are bound by the late-arriving amino acids. I've always found this question fascinating as there is a contradiction in inorganic metal-protein complexes (not what is focused on here). Metal coordination of Fe, Ni heavily relies on softer ligands like His and Cys - which are by most models latecomer amino acids. There are no traces of thiols or imidazoles in meteorites - although work by Dvorkin has indicated that could very well be due to acid degradation during extraction. Chris Dupont (PNAS 2005) showed that metal speciation in the early earth (such as proposed by Anbar and prior RJP Williams) matched the purported order of fold emergence.

As such, cofactor-protein interactions as a driving force for evolution has always made sense to me and I admittedly read this paper biased in its favor. But to make sure, I started to play around with the data that the authors kindly and importantly shared in the supplementary files. Here's what I found:

Point 1: The correlation between abundance of amino acids and protein age is dominated by glycine.

There is a small, but visible difference in old vs new amino acid fractional abundance between Ancient and LUCA proteins (Figure 3, Supplementary Table 3). However, the bias is not evenly distributed among the amino acids - which Figure 4A shows but is hard to digest as presented. So instead I used the spreadsheet in Supplement 3 to calculate the fractional difference FDaa = F(old aa)-F(new aa). As expected from Figure 3, the mean FD for Ancient is greater than the mean FD for LUCA. But when you look at the same table for each amino acid FDcofactor = F(ancient cofactor) - F(LUCA cofactor), you now see that the bias is not evenly distributed between older and newer amino acids at all. In fact, most of the difference can be explained by glycine (FDcofactor = 3.8) and the rest by also including tryptophan (FDcofactor = -3.8). If you remove these two amino acids from the analysis, the trend seen in Figure 3 all but disappears.

Troubling - so you might argue that Gly is the oldest of the old and Trp is the newest of the new so the argument still stands. Unfortunately, Gly is a lot of things - flexible, small, polar - so what is the real correlation, age, or chemistry? This leads to point 2.

Point 2 - The correlation is dominated by phosphate.

In the ancient cofactor list, all but 4 comprise at least one phosphate (SAM, tetrahydrofolic acid, biopterin, and heme). Except for SAM, the rest have very low Gly abundance. The overall high Gly abundance in the ancient enzymes is due to the chemical property of glycine that can occupy the right-hand side of the Ramachandran plot. This allows it to make the alternating αL/αR conformation of the P-loop forming Milner-White's anionic nest. If you remove phosphate binding folds from the analysis the trend in Figure 3 vanishes.

Likewise, Trp is an important functional residue for binding quinones and tuning its redox potential. The LUCA cofactor set is dominated by quinone and derivatives, which likely drives up the new amino acid score for this class of cofactors.

In summary, while I still believe the premise that cofactors drove the shape of peptides and the folds that came from them - and that Rossmann folds are ancient phosphate-binding proteins, this analysis does not really bring anything new to these ideas that have already been stated by Tawfik/Longo, Milner-White/Russell, and many others.

I did this analysis ad hoc on a slice of the data the authors provided and could easily have missed something and I encourage the authors to check my work. If it holds up it should be noted that negative results can often be as informative as strong positive ones. I think the signal here is too weak to see in the noise using the current approach.

https://doi.org/10.7554/eLife.94174.1.sa0

Author Response

Reviewer #1 (Public Review):

Summary:

By examining the prevalence of interactions with ancient amino acids of coenzymes in ancient versus recent folds, the authors noticed an increased interaction propensity for ancient interactions. They infer from this that coenzymes might have played an important role in prebiotic proteins.

Strengths:

(1) The analysis, which is very straightforward, is technically correct. However, the conclusions might not be as strong as presented.

(2) This paper presents an excellent summary of contemporary thought on what might have constituted prebiotic proteins and their properties.

(3) The paper is clearly written.

We are grateful for the kind comments of the reviewer on our manuscript. However, we would like to clarify a possible misunderstanding in the summary of our study. Specifically, analysis of "ancient versus recent folds" was not really reported in our results. Our analysis concerned "coenzyme age" rather than the "protein folds age" and was focused mainly on interaction with early vs. late amino acids in protein sequence. While structural propensities of the coenzyme binding sites were also analyzed, no distinction on the level of ancient vs. recent folds was assumed and this was only commented on in the discussion, based on previous work of others.

Weaknesses:

(1) The conclusions might not be as strong as presented. First of all, while ancient amino acids interact less frequently in late with a given coenzyme, maybe this just reflects the fact that proteins that evolved later might be using residues that have a more favorable binding free energy.

We would like to point out that our dataset of coenzyme-binding proteins makes no distinction between proteins that evolved early and those that evolved late. The aim of our analysis was purely to observe trends in the age of amino acids vs. the age of coenzymes. While no direct inference about early life can be made from this, as all the proteins are from extant life (as highlighted in the discussion of our work), our goal was to look for intrinsic propensities of early vs. late amino acids in binding to the different coenzyme entities. Indeed, very early interactions would be smeared by the eons of evolutionary history (perhaps also towards more favourable binding free energy, as the reviewer also points out). Nevertheless, significant trends have been recorded across the PDB dataset, pointing to different propensities and mechanistic properties of the binding events. Rather than to a specific evolutionary past, our data therefore point to a “capacity” of the early amino acids to bind certain coenzymes, and we believe that this is the major (and standing) conclusion of our work, along with the properties of such interactions. In our revised version, we will carefully go through all the conclusions and make sure that this message stands out, but we are confident that the following concluding sentences copied from the abstract and the discussion of our manuscript fully comply with our data:

“These results imply the plausibility of a coenzyme-peptide functional collaboration preceding the establishment of the Central Dogma and full protein alphabet evolution”

“While no direct inferences about distant evolutionary past can be drawn from the analysis of extant proteins, the principles guiding these interactions can imply their potential prebiotic feasibility and significance.”

“This implies that late amino acids would not be necessarily needed for the sovereignty of coenzyme-peptide interplay.”

We would also like to add that proteins that evolved later might not always have a more favourable binding free energy. Musil et al., 2021 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8294521/) showed, using the haloalkane dehalogenase DhaA as an example, that ancestral sequence reconstruction is a powerful tool for designing proteins that are more stable and also more active. Ancestral sequence reconstruction relies on finding ancient states of protein families to suggest mutations that will lead to proteins more stable than those that currently exist. Their study did not explore ligand-protein interactions specifically, but showed that ancient states often have more favourable properties than modern proteins.

(2) What about other small molecules that existed in the prebiotic soup? Do they also prefer such ancient amino acids? If so, this might reflect the interaction propensity of specific amino acids rather than the inferred important role of coenzymes.

We appreciate the reviewer's comment on other small molecules, which we assume refers mainly to metal ions (i.e. inorganic cofactors). We completely agree with the reviewer that such interactions are of utmost importance to the origins of life. They were intentionally left out of our study, as they have already been studied by others (e.g. Bromberg et al., 2022; reviewed in Frenkel-Pinter et al., 2020) and by us (Fried et al., 2022). For example, it is noteworthy that prebiotically relevant metal binding sites (e.g. of Mg2+) are enriched in early amino acids such as Asp and Glu, while the binding sites of more recent metals (e.g. Cu and Zn) are enriched in the late amino acids His and Cys (Fried et al., 2022). At the same time, comparable analyses of amino acid - coenzyme trends were not available.

Nevertheless, the involvement of metal ions in the coenzyme binding sites was also studied here and was found to be greater for the Ancient coenzymes. In the revised version of the manuscript, we will be happy to expand the discussion of the studies concerning inorganic cofactors.

(3) Perhaps the conclusions just reflect the types of active sites that evolved first and nothing more.

We partly agree with the reviewer on this point, but not with its listing as a weakness of our study or with the “nothing more” notion. Understanding the properties of the earliest binding sites is key to bridging the gap between prebiotic chemistry and biochemistry. The potential of peptides preceding ribosomal synthesis (and the full alphabet evolution), together with prebiotically plausible coenzymes, addresses exactly this gap, which is currently not understood.

Reviewer #2 (Public Review):

I enjoyed reading this paper and appreciate the careful analysis performed by the investigators examining whether 'ancient' cofactors are preferentially bound by the first-available amino acids, and whether later 'LUCA' cofactors are bound by the late-arriving amino acids. I've always found this question fascinating as there is a contradiction in inorganic metal-protein complexes (not what is focused on here). Metal coordination of Fe, Ni heavily relies on softer ligands like His and Cys - which are by most models latecomer amino acids. There are no traces of thiols or imidazoles in meteorites - although work by Dvorkin has indicated that could very well be due to acid degradation during extraction. Chris Dupont (PNAS 2005) showed that metal speciation in the early earth (such as proposed by Anbar and prior RJP Williams) matched the purported order of fold emergence.

As such, cofactor-protein interactions as a driving force for evolution has always made sense to me and I admittedly read this paper biased in its favor. But to make sure, I started to play around with the data that the authors kindly and importantly shared in the supplementary files. Here's what I found:

Point 1: The correlation between abundance of amino acids and protein age is dominated by glycine. There is a small, but visible difference in old vs new amino acid fractional abundance between Ancient and LUCA proteins (Figure 3, Supplementary Table 3). However, the bias is not evenly distributed among the amino acids - which Figure 4A shows but is hard to digest as presented. So instead I used the spreadsheet in Supplement 3 to calculate the fractional difference FDaa = F(old aa)-F(new aa). As expected from Figure 3, the mean FD for Ancient is greater than the mean FD for LUCA. But when you look at the same table for each amino acid FDcofactor = F(ancient cofactor) - F(LUCA cofactor), you now see that the bias is not evenly distributed between older and newer amino acids at all. In fact, most of the difference can be explained by glycine (FDcofactor = 3.8) and the rest by also including tryptophan (FDcofactor = -3.8). If you remove these two amino acids from the analysis, the trend seen in Figure 3 all but disappears.

Troubling - so you might argue that Gly is the oldest of the old and Trp is the newest of the new so the argument still stands. Unfortunately, Gly is a lot of things - flexible, small, polar - so what is the real correlation, age, or chemistry? This leads to point 2.

We truly acknowledge the effort that the reviewer made in the revision of the data and for the thoughtful, deeper analysis. We agree that this deserves further discussion of our data. As invited by the reviewer, we indeed repeated the analysis on the whole dataset. First, we would like to point out that the reviewer was most probably referring to the Supplementary Fig. 2 (and not 3, which concerns protein folds). While the difference between Ancient and LUCA coenzyme binding is indeed most pronounced for Gly and Trp, we failed to confirm that the trend disappears if those two amino acids are removed from the analysis (additional FDcofactors of 3.2 and -3.2 are observed for the early and late amino acids, resp.), as seen in Table I below. The main additional contributors to this effect are Asp (FD of 2.1) and Ser (FD of 1.8) from the early amino acids and Arg (FD of -2.6) and Cys (FD of -1.7) of the late amino acids. Hence, while we agree with the reviewer that Gly and Trp (the oldest and the youngest) contribute to this effect the most, we disagree that the trend reduces to these two amino acids.

In addition, the most recent coenzyme temporality (Post-LUCA) was neglected in the reviewer’s analysis. Relative to Ancient, the difference between F(old) and F(new) is even more pronounced for Post-LUCA than for LUCA (Table II) and depends much less on Trp. Instead, Asp, Ser, Leu, Phe, and Arg dominate the observed phenomenon (Table I). This further supports our disagreement with the reviewer’s point. Nevertheless, we remain grateful for this discussion and we will happily include this additional analysis in the Supplementary Material of our revised manuscript.

Author response table 1.

Amino acid fractional difference of all coenzymes at residue level

Author response table 2.

Amino acid fractional difference of all coenzymes
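
A minimal sketch of the fractional-difference bookkeeping discussed under Point 1, assuming per-residue binding-site frequencies (in percent) are available for each coenzyme class. The frequencies below are made-up placeholders, not values from Tables I or II, and the ten/ten early/late split is the commonly cited one, assumed here rather than taken from the manuscript. The function names are hypothetical.

```python
# Commonly cited early/late amino acid split (an assumption for this sketch;
# substitute the classification used in the manuscript).
EARLY = {"Ala", "Asp", "Glu", "Gly", "Ile", "Leu", "Pro", "Ser", "Thr", "Val"}
LATE = {"Arg", "Asn", "Cys", "Gln", "His", "Lys", "Met", "Phe", "Trp", "Tyr"}


def fractional_difference(freq_a, freq_b):
    """Per-residue difference F_a(aa) - F_b(aa), in percentage points."""
    residues = set(freq_a) | set(freq_b)
    return {aa: freq_a.get(aa, 0.0) - freq_b.get(aa, 0.0) for aa in residues}


def group_sums(fd, exclude=()):
    """Sum the per-residue differences over early and late residues,
    optionally leaving some residues out (e.g. Gly and Trp)."""
    early = sum(v for aa, v in fd.items() if aa in EARLY and aa not in exclude)
    late = sum(v for aa, v in fd.items() if aa in LATE and aa not in exclude)
    return early, late


# Placeholder frequencies (percent of binding-site residues); NOT real data.
ancient = {"Gly": 12.0, "Asp": 7.0, "Ser": 6.0, "Trp": 1.0, "Arg": 5.0, "Cys": 1.0}
luca = {"Gly": 8.0, "Asp": 5.0, "Ser": 4.0, "Trp": 5.0, "Arg": 8.0, "Cys": 2.0}

fd = fractional_difference(ancient, luca)
with_all = group_sums(fd)
without_gly_trp = group_sums(fd, exclude={"Gly", "Trp"})
print(with_all, without_gly_trp)
```

With these toy numbers the early and late sums are equal and opposite because both frequency sets add up to the same total. Whether a residual signal survives the removal of Gly and Trp depends entirely on the real frequencies.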

Point 2 - The correlation is dominated by phosphate.

In the ancient cofactor list, all but 4 comprise at least one phosphate (SAM, tetrahydrofolic acid, biopterin, and heme). Except for SAM, the rest have very low Gly abundance. The overall high Gly abundance in the ancient enzymes is due to the chemical property of glycine that can occupy the right-hand side of the Ramachandran plot. This allows it to make the alternating αL/αR conformation of the P-loop forming Milner-White's anionic nest. If you remove phosphate binding folds from the analysis the trend in Figure 3 vanishes.

Likewise, Trp is an important functional residue for binding quinones and tuning its redox potential. The LUCA cofactor set is dominated by quinone and derivatives, which likely drives up the new amino acid score for this class of cofactors.

Once again, we are thankful to the reviewer for raising this point. The role of Gly in the anionic nests proposed by Milner-White and Russell, as well as the role of Trp in quinone binding, are important points that we would be happy to highlight more in the discussion of the revised manuscript.
Nevertheless, we disagree that the trends reduce only to the phosphate-containing coenzymes and, importantly, that “the trend in Figure 3 vanishes” upon their removal. Tables III and IV (below) show the data for coenzymes excluding those with a phosphate moiety; the trend in Fig. 3 remains, albeit less pronounced.

Author response table 3.

Amino acid fractional difference of non-phosphate containing coenzymes

Author response table 4.

Amino acid fractional difference of non-phosphate containing coenzymes at residue level
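
As an illustration of the kind of subset check described under Point 2, the sketch below recomputes the share of early amino acids per coenzyme class after dropping phosphate-containing coenzymes. Everything in it is assumed for the example: the record layout, the coenzyme labels (`anc_phos`, `anc_nophos`, `luca_nophos`), the contacts themselves, and the early amino acid list (the commonly cited ten). It is not the pipeline used for Tables III and IV.

```python
from collections import Counter

# Commonly cited early amino acids (assumption; use the manuscript's list).
EARLY = {"Ala", "Asp", "Glu", "Gly", "Ile", "Leu", "Pro", "Ser", "Thr", "Val"}

# Hypothetical contact records: (coenzyme label, class, has phosphate, residue).
contacts = [
    ("anc_phos", "Ancient", True, "Gly"),
    ("anc_phos", "Ancient", True, "Gly"),
    ("anc_phos", "Ancient", True, "Ser"),
    ("anc_phos", "Ancient", True, "Arg"),
    ("anc_nophos", "Ancient", False, "Asp"),
    ("anc_nophos", "Ancient", False, "Leu"),
    ("anc_nophos", "Ancient", False, "His"),
    ("anc_nophos", "Ancient", False, "Tyr"),
    ("luca_nophos", "LUCA", False, "Trp"),
    ("luca_nophos", "LUCA", False, "Phe"),
    ("luca_nophos", "LUCA", False, "Ala"),
    ("luca_nophos", "LUCA", False, "Arg"),
]


def early_fraction_by_class(records, include_phosphate=True):
    """Fraction of contacting residues that are early amino acids,
    per coenzyme class, optionally skipping phosphate-containing coenzymes."""
    totals, early = Counter(), Counter()
    for _name, age, has_phosphate, residue in records:
        if has_phosphate and not include_phosphate:
            continue
        totals[age] += 1
        if residue in EARLY:
            early[age] += 1
    return {age: early[age] / totals[age] for age in totals}


all_coenzymes = early_fraction_by_class(contacts)
non_phosphate = early_fraction_by_class(contacts, include_phosphate=False)
print(all_coenzymes, non_phosphate)
```

In this toy set the Ancient class keeps a higher early-residue share than LUCA after filtering, only by a smaller margin. The real outcome depends on the actual contact data.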

In summary, while I still believe the premise that cofactors drove the shape of peptides and the folds that came from them - and that Rossmann folds are ancient phosphate-binding proteins, this analysis does not really bring anything new to these ideas that have already been stated by Tawfik/Longo, Milner-White/Russell, and many others.

I did this analysis ad hoc on a slice of the data the authors provided and could easily have missed something and I encourage the authors to check my work. If it holds up it should be noted that negative results can often be as informative as strong positive ones. I think the signal here is too weak to see in the noise using the current approach.

We are grateful to the reviewer for encouraging a further look at our data. While we hope that the analysis on the whole dataset (listed in Tables I - IV) will change the reviewer’s standpoint on our work, we would still like to comment on the questioned novelty of our results. In fact, the extraordinary works by Tawfik/Longo and Milner-White/Russell (which were cited in our manuscript multiple times) presented one of the motivations for this study. We take the opportunity to copy the part of our discussion that specifically highlights the relevance of their studies, and points out the contribution of our work with respect to theirs.

“While all the coenzymes bind preferentially to protein residue sidechains, more backbone interactions appear in the ancient coenzyme class when compared to others. This supports an earlier hypothesis that functions of the earliest peptides (possibly of variable compositions and lengths) would be performed with the assistance of the main chain atoms rather than their sidechains (Milner-White and Russell 2011). Longo et al., recently analyzed binding sites of different phosphate-containing ligands which were arguably of high relevance during earliest stages of life, connecting all of today’s core metabolism (Longo et al., 2020 (b)). They observed that unlike the evolutionary younger binding motifs (which rely on sidechain binding), the most ancient lineages indeed bind to phosphate moieties predominantly via the protein backbone. Our analysis assigns this phenomenon primarily to interactions via early amino acids that (as mentioned above) are generally enriched in the binding interface of the ancient coenzymes. This implies that late amino acids would not be necessarily needed for the sovereignty of coenzyme-peptide interplay.”

Unlike any previous work, our study involves all the major coenzymes (not just the phosphate-containing ones) and is based on their evolutionary age, as well as the age of amino acids. It is the first PDB-wide systematic evolutionary analysis of coenzyme-amino acid binding. Besides confirming some earlier theoretical assertions (such as the role of backbone interactions in early peptide-coenzyme evolution) and observations (such as the occurrence of the ancient phosphate-containing coenzymes in the oldest protein folds), it uncovers substantial novel knowledge. For example, (i) enrichment of early amino acids in the binding of ancient coenzymes, vs. enrichment of late amino acids in the binding of LUCA and Post-LUCA coenzymes, (ii) the trends in secondary structure content of the binding sites of coenzymes of different temporalities, (iii) increased involvement of metal ions in the ancient coenzyme binding events, and (iv) the capacity of only early amino acids to bind ancient coenzymes. In our humble opinion, all of these points make important contributions to closing the peptide-coenzyme knowledge gap, which has been discussed in a number of previous studies.

https://doi.org/10.7554/eLife.94174.1.sa3
