Coenzyme-Protein Interactions since Early Life

Alma Carolina Sanchez-Rocha; Mikhail Makarov; Lukáš Pravda; Marian Novotný; Klára Hlouchová

doi:10.7554/eLife.94174.1

eLife assessment

This study presents a useful examination of the prevalence of interactions between amino acids from different periods of Earth's history and coenzymes. While the premise of this work is well founded, the data lend themselves to alternative interpretations, suggesting that the main conclusions might be incompletely supported by the findings. The work would benefit from the inclusion of additional supplementary data and further analysis. This manuscript would be of interest to evolutionary biologists and biophysicists.

https://doi.org/10.7554/eLife.94174.1.sa2

Significance of findings

useful: Findings that have focused importance and scope

Strength of evidence

incomplete: Main claims are only partially supported

During the peer-review process the editor and reviewers write an eLife assessment that summarises the significance of the findings reported in the article (on a scale ranging from landmark to useful) and the strength of the evidence (on a scale ranging from exceptional to inadequate).

Abstract

Recent findings in protein evolution and peptide prebiotic plausibility have been setting the stage for reconsidering the role of peptides in the early stages of life’s origin. Ancient protein families have been found to share common themes and proteins reduced in composition to prebiotically plausible amino acids have been reported capable of structure formation and key functions, such as binding to RNA. While this may suggest peptide relevance in early life, their functional repertoire when composed of a limited number of early residues (missing some of the most sophisticated functional groups of today’s alphabet) has been debated.

Cofactors enrich the functional scope of about half of extant enzymes, but whether they could also bind to peptides lacking the evolutionarily late amino acids remains speculative. The aim of this study was to resolve the early peptide propensity to bind organic cofactors by analysis of protein-coenzyme interactions across the Protein Data Bank (PDB). We find that the prebiotically plausible amino acids are more abundant in the binding sites of the most ancient coenzymes and that such interactions rely more frequently on the involvement of the protein backbone atoms and metal ion cofactors. Moreover, we have identified a few select examples in today’s enzymes where coenzyme binding is supported solely by prebiotically available amino acids. These results imply the plausibility of a coenzyme-peptide functional collaboration preceding the establishment of the Central Dogma and full protein alphabet evolution.

Introduction

Organic and inorganic cofactors occupy about half of all known protein structures, spanning all the enzyme E.C. classes (Putignano, 2018; Mukhopadhyay et al., 2019). While their role in current life is indisputable, some of the cofactors were present and apparently also crucial during life’s early evolution (Chu and Zhang 2020; Goldman and Kacar 2021; Kirschning et al., 2021; Fried et al., 2022; Kirschning, 2022). The significance of metal ions has been broadly discussed, regardless of the different origins-of-life scenarios, and has somewhat overshadowed that of organic cofactors (e.g. Wächtershäuser, 1992; Russell and Hall, 1997; Lane and Martin, 2012; Chu and Zhang, 2020; Fried et al., 2022).

Diverse lines of evidence have, however, indicated that many of the extant organic cofactors (coenzymes) date back to the earliest life, while their core chemistries have been detected in abiotic material, such as that recently reported by the Hayabusa2 mission (Holliday et al., 2007; Fried et al., 2022; Naraoka et al., 2023). At the same time, these ancient coenzymes, often of nucleotide origin, have been traced to the most ancient protein folds (such as P-loop NTPases, TIM beta/alpha-barrels, OB and Rossmann folds) that date before the Last Universal Common Ancestor (LUCA) (Goldman and Kacar, 2021; Caetano-Anollés et al., 2007; Goldman et al., 2013; Longo et al., 2020 (a); Kessel and Ben-Tal, 2022). Within the most ancient folds, tens of peptide fragments/themes have been identified throughout seemingly unrelated structural domains, and frequently found to mediate ligand binding (Söding and Lupas, 2003; Alva et al., 2015; Narunsky et al., 2020; Kolodny et al., 2021). Such themes may well represent the remnants of protoenzymes in a peptide-nucleotide world (Fried et al., 2022). Chu and Zhang recently proposed that cofactors could initially “select” the earliest primitive proteins from the vast sequence space by the ability to bind them (Chu and Zhang, 2020). More generally, binding of cofactors to peptides could thus determine the evolution of both protoenzyme function and folding preferences (Tokuriki and Tawfik, 2009).

Prior to the fixation of the Central Dogma and ribosomal synthesis, peptides would condense from amino acids (or their alternatives) prebiotically abundant in the environment (Frenkel-Pinter et al. 2019; Frenkel-Pinter et al. 2020; Fried et al., 2022). Independent meta-analyses of the amino acid alphabet evolution based on different possible sources of organic material and different disciplines point towards an “early alphabet” of ∼10 residues (Ala, Asp, Glu, Gly, Ile, Leu, Pro, Ser, Thr and Val) (Higgs and Pudritz, 2009; Trifonov, 2000; Cleaves, 2010). These could be supplemented by other prebiotically plausible non-canonical amino acids, while the other half of the canonical alphabet is assumed to be the product of later biosynthesis (Wong and Bronskill, 1979; Weber and Miller, 1981; Burton et al., 2012; Zaia et al., 2008). Typically, the early amino acids (canonical as well as non-canonical) are smaller and less complex, missing e.g. sulfur groups and aromatics. Additionally, the canonical early alphabet lacks positively charged residues. An emerging question therefore is whether coenzymes could bind to small proteins of prebiotic relevance and whether they could be bound by the prebiotically available residues. In such a scenario, cofactors would provide a palette of functional groups to the early peptide world, which would make them relatively sophisticated structural and catalytic hubs (Milner-White and Russell, 2011). Alternatively, if coenzymes could not be bound by these simple amino acids, this would suggest that their pairing with peptide molecules would become relevant only after the evolution of the full amino acid alphabet.

Work from our group and others has recently demonstrated that in select cases, protein sequences re-engineered from the early amino acids can still bind to nucleic acid and nucleotide-based cofactors (Longo et al., 2020 (b); Makarov et al., 2021; Giacobelli et al., 2022). Whether this phenomenon is still seen in today’s biology, how abundant it is, and what rules govern it remain open questions. Here, we present a systematic survey of coenzyme binding throughout the PDB database. Our results indicate that the coenzyme binding characteristics of amino acids differ with their evolutionary age. Early amino acids are enriched in binding pockets of the most ancient coenzymes and the interaction relies predominantly on the protein backbone groups. Selected examples show that, unlike evolutionarily younger cofactors, the ancient cofactors can still be bound in proteins only by early amino acids. Our analysis therefore points to an early peptide-coenzyme significance, preceding evolution of proteosynthesis and fixation of the Central Dogma.

Results

Identification of coenzymes in PDB

We identified all the available structures from the PDB that interact with the 27 coenzyme classes as defined in Fischer et al., 2010. In addition, ATP (that was not included in that study) was included here. Using these parameters, we found 25,822 protein structures and 81 nucleic acid macromolecules (Supplementary Table 1, Supplementary File 1). The protein structures were assigned to 8194 UniProt (The UniProt Consortium, 2023) codes. Those UniProt sequences were clustered by 90% identity and resulted in 7399 unique UniProt entries, corresponding to 21,317 protein structures (Fig. 1). In parallel, the clustering was also performed for 30% sequence identity, resulting in 3544 UniProt codes and 9645 PDB structures.

Workflow of this study. All available coenzymes in the PDB were identified according to the CoFactor database (Fischer et al., 2010). The PDB entries of structures bound to coenzymes were downloaded programmatically through the PDBe REST API (pdbe.org/api), including the interatomic cofactor-protein interactions, calculated by Arpeggio (Jubb et al., 2017). The coenzyme binding amino acids were mapped to UniProt databases via SIFTS (Velankar et al., 2013; Dana et al., 2019). PDB entries were grouped by UniProt code; redundancy was removed by clustering the UniProt sequences by 90% (and in parallel also 30%) sequence identity.

The interaction ratio method was adopted to identify the most relevant residues in coenzyme binding sites. For each protein (unique UniProt ID), we defined the cofactor binding site as the subset of amino acids that interact with the cofactor in at least 50% of the structures of that particular protein within our dataset, in order to pinpoint the amino acids that are important for the interaction. This methodology does not consider any structure quality criteria (e.g. resolution, R-factor, Clashscore).
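The 50% rule described above can be illustrated with a minimal Python sketch. The function name `binding_site`, the input layout (one set of interacting UniProt residue numbers per PDB structure) and the example contacts are assumptions made for illustration only; they are not taken from the study's pipeline.

```python
from collections import defaultdict


def binding_site(structures, threshold=0.5):
    """Return residues that contact the cofactor in at least `threshold`
    of the structures available for one protein (one UniProt ID).

    `structures` is a list of sets; each set holds the UniProt residue
    numbers seen interacting with the cofactor in one PDB structure.
    """
    if not structures:
        return set()
    counts = defaultdict(int)
    for contacts in structures:
        for residue in contacts:
            counts[residue] += 1
    n_structures = len(structures)
    return {res for res, c in counts.items() if c / n_structures >= threshold}


# Hypothetical contacts for one protein observed in four structures.
example = [{10, 11, 45}, {10, 45, 80}, {10, 11}, {10, 45}]
site = binding_site(example)
```

In this made-up example, residue 80 is seen in only one of four structures and is therefore excluded, while residue 11 (two of four) sits exactly on the threshold and is retained.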

Our database is composed of protein structures from all three cellular domains as well as viruses: Bacteria (54.3%), Archaea (6.2%), Eukaryota (37.8%), Viruses (1.5%), and metagenomes or unassigned entries (0.3%) (Supplementary Table 1).

Evolutionary classification of coenzymes

To differentiate the evolutionary age of the analyzed coenzymes, we further adapted the classification system from Fried et al., 2022. This system encompasses four primary categories and one additional subcategory: i) “Ancient” coenzymes, including the subcategory “Nucleotide derived”; ii) “LUCA” coenzymes; iii) “Post-LUCA” coenzymes; and iv) “Unclassified” coenzymes (Fig. 2).

Classification of coenzymes and amino acids by their assumed evolutionary temporality. The “Unclassified” coenzymes Thiamine diphosphate, Coenzyme M, Factor F430 and Glutathione are not shown in the scheme.

“Ancient” coenzymes comprise those that could be prebiotically synthesized, according to available studies (Miller and Schlesinger, 1993; Keefe et al., 1995; Holliday et al., 2007; Kirschning, 2021; Menor-Salván et al., 2022; Pinna et al., 2022), while the subcategory “Nucleotide derived” includes cofactors chemically derived from nucleotides (White, 1975; Monteverde et al., 2017). “LUCA” coenzymes were presumably present in the last universal common ancestor (LUCA) and exhibit a universal distribution among Bacteria, Archaea, and Eukarya, although their prebiotically feasible synthesis has not been established. “Post-LUCA” coenzymes likely originated only after the divergence of the three cellular domains, mirrored in their non-universal distribution. “Unclassified” coenzymes do not conform to the classification scheme. As a typical representative of this last category, Coenzyme M has been synthesized under prebiotic conditions (Miller and Schlesinger, 1993; Kirschning, 2021); nonetheless, its biosynthetic pathways in Archaea and Bacteria have been shown to arise through convergent evolution and it is mainly prevalent in methanogens (Wu et al., 2022). Factor F430 is a coenzyme only distributed in methanogens (Thauer and Bonacker, 1994), although its precursors have been synthesized prebiotically (Seitz et al., 2021). Glutathione is another example of a coenzyme with restricted biological distribution, being found mainly in eukaryotes, Gram-negative bacteria, and one archaeal phylum (Copley and Dhillon, 2002), and the feasibility of its prebiotic synthesis remains unclear (Bonfio et al., 2017). Thiamine diphosphate was also designated as unclassified. Although the definitive prebiotic synthesis of thiamine diphosphate remains unclear, preliminary investigations conducted by Aylward (2006) and Aylward & Bofinger (2006) suggest its presence in the prebiotic world. Its nucleotide nature (White, 1975) and the existence of its universal riboswitch (Barrick and Breaker, 2007) provide compelling evidence of its potential status as an ancient coenzyme.

The Ancient coenzymes represent the most abundant class of our PDB dataset, dominated by ATP, NAD, Heme, FAD, SAM, and Coenzyme A structures and amounting to 94% of all analyzed structures in our database grouped by UniProt codes. Within the enzyme E.C. classification, oxidoreductases and transferases represent the classes with most abundant coenzyme content. While the LUCA, Post-LUCA and Unclassified coenzymes are typically found in specific enzyme classes, the Ancient coenzymes are distributed across all the E.C. classes (Supplementary Fig. 1).

Distribution of amino acids in the coenzyme binding sites

We hypothesized that the evolutionary significance of individual coenzyme classes would be reflected in distinct amino acid binding propensities, as a smaller “early” protein alphabet apparently preceded its canonical version. The abundance of residues that compose each coenzyme binding site was analyzed and examined with respect to the order in which individual amino acids have been reported to enter the protein canonical alphabet (Higgs and Pudritz 2009) (Fig. 3; Supplementary Table 2). The binding site composition for both 90% and 30% identity datasets revealed that the occupancy of early amino acids is higher in ancient coenzyme binding sites and tends to decrease in LUCA and Post-LUCA cofactor sites (Fig. 3, Supplementary Table 2). Overall, for the 90% dataset the average occupancy of early vs. late amino acids in the ancient sites is 61% vs 39%, while this ratio decreases to 53% vs 47% for the LUCA and 47% vs 53% in post-LUCA sites. These numbers follow the same trend for the 30% identity dataset. Throughout the rest of the analysis, the 90% identity dataset, which includes a higher number of proteins, was used for more robust statistical analysis. To examine the impact that the distribution of amino acids in the coenzyme binding sites has on the binding modes, the interactions between coenzymes and individual amino acids were further inspected.

Early versus late amino acid composition of the coenzyme binding sites, categorized according to the evolutionary ages of coenzymes. Early amino acids are shown in blue and late residues in red. The dashed line corresponds to the proportion of early vs. late amino acids within the UniProt composition of the sequences derived from our database (67% early and 33% late residues). The statistical significance of the early versus late amino acid composition was assessed by a Chi-squared test (P < 0.0001). Detailed statistical data are listed in Supplementary Table 6.
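One plausible reading of the Chi-squared comparison is a goodness-of-fit test of binding-site counts against the UniProt background proportion. The sketch below is written under that assumption and is not a reproduction of the authors' procedure, which is detailed in Supplementary Table 6. The function name and the residue counts are hypothetical. With one degree of freedom, the p-value follows exactly from the complementary error function, so only the standard library is needed.

```python
import math


def chi2_goodness_of_fit(observed_early, observed_late, expected_early_frac):
    """Chi-squared goodness-of-fit test for two categories (1 degree of freedom).

    Compares observed early/late residue counts with the counts expected
    from a background early fraction (e.g. 0.67 for the UniProt sequences).
    """
    total = observed_early + observed_late
    expected_early = total * expected_early_frac
    expected_late = total - expected_early
    chi2 = ((observed_early - expected_early) ** 2 / expected_early
            + (observed_late - expected_late) ** 2 / expected_late)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 610 early and 390 late residues in a set of binding
# sites, tested against a background of 67% early residues.
chi2, p = chi2_goodness_of_fit(610, 390, 0.67)
```

For real data with more than two categories, or for contingency tables comparing coenzyme classes with each other, a statistics library would be the more appropriate tool.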

Interaction types between coenzymes and proteins

First, backbone vs side chain protein interactions of all the coenzyme classes were mapped (Fig. 4A). As expected, most of the interactions with coenzymes are mediated by amino acid side chains (61%), frequently in combination with backbone (24%) (Fig. 4). Nevertheless, purely backbone interactions are most frequent in ancient coenzymes (24%) (Fig. 4A). When backbone interactions are present throughout the different coenzyme classes, they are dominated by the early amino acids (Fig. 4B).

Binding of coenzymes with early and late amino acids by backbone and side chain atoms. “Backbone” interactions refer to residues in the coenzyme binding sites that interact purely through amino acid backbone atoms. “Side chain” interactions involve residues that interact solely via side chain atoms. “Backbone & Side chain” residues are those that interact with the coenzyme using both their backbone and side chain atoms. (A) Abundance of amino acids in individual studied coenzymes. “Backbone & Side chain” interactions are not depicted. Unclassified cofactors are in gray, Post-LUCA in yellow, LUCA in cyan and Ancient in purple. Amino acids are ranked by the order of addition of amino acids to the genetic code (Higgs and Pudritz, 2009). (B) Proportion of early versus late residues in coenzyme categories by interaction type. In each coenzyme category, the individual proportions add up to 100%. The amino acid composition was normalized by the percentage of late residues from the UniProt sequences retrieved from our database. The statistical significance of early versus late amino acid composition for each interaction type per coenzyme temporality was determined by a Chi-squared test (*, P < 0.05; **, P < 0.01; ***, P < 0.001; ****, P < 0.0001). For detailed statistical analysis, refer to Supplementary Table 7.
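The three residue categories defined in this caption can be expressed as a small classification function. The sketch below assumes standard PDB atom naming and treats N, CA, C, O and OXT as backbone atoms; both the atom set and the function name are illustrative assumptions, not the study's own definitions.

```python
# Assumed backbone atom names (standard PDB nomenclature); every other
# heavy atom of a residue is treated as a side chain atom.
BACKBONE_ATOMS = {"N", "CA", "C", "O", "OXT"}


def interaction_class(atom_names):
    """Classify one residue by which of its atoms contact the coenzyme."""
    atoms = set(atom_names)
    if not atoms:
        raise ValueError("residue has no interacting atoms")
    has_backbone = bool(atoms & BACKBONE_ATOMS)
    has_side_chain = bool(atoms - BACKBONE_ATOMS)
    if has_backbone and has_side_chain:
        return "Backbone & Side chain"
    return "Backbone" if has_backbone else "Side chain"


# Hypothetical contacts: a glycine amide nitrogen, a serine hydroxyl,
# and a residue contacting the ligand through both N and OG.
labels = [interaction_class(a) for a in (["N"], ["OG"], ["N", "OG"])]
```

Whether CA should count as backbone for glycine, which has no side chain heavy atoms, is a design choice that a real analysis would need to state explicitly.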

Next, we inspected the interaction types for each amino acid-coenzyme binding event employing Arpeggio (Jubb et al., 2017). The analysis revealed that electrostatic interactions are dominant in all coenzyme ages (Supplementary Fig. 2). In ancient cofactors, electrostatic interactions are more frequently mediated by early residues. This trend is more significant for the nucleotide-derived ancient coenzymes. The second most prevalent interaction type is van der Waals for ancient cofactors, while hydrophobic interactions are about as frequent as van der Waals interactions in the LUCA and post-LUCA classes.

Structural properties of cofactor binding sites

Structural properties of proteins have been observed to change during eons of life’s evolution (Edwards et al., 2013; Lupas and Alva, 2017; Kovacs et al., 2017). To map their interdependence with binding of cofactors, the secondary structure and fold classes of individual coenzyme binding sites were analyzed here.

There are detectable differences in the binding site secondary structure content among the coenzyme classes (Fig. 5, Supplementary Fig. 3). While loops and helices dominate all the binding sites, they are less represented in the Ancient and LUCA coenzyme sites, which are richer in beta-sheet structures (Fig. 5). This distinction is found only at the level of the binding sites and is not preserved at the level of the overall protein structure.

Secondary structure content in coenzyme binding sites. Composition of secondary structural elements in amino acids interacting with coenzymes. The PDB category represents secondary structure content across the dataset for comparison with coenzyme binding sites. Additional statistical analyses are shown in Supplementary Table 8.

To explore the fold diversity of domains containing the coenzyme binding sites, we assigned their ECOD X-groups (Cheng et al., 2014) at a residue level (Supplementary Table 3). In total, 101 groups were identified. The ancient coenzymes are associated with higher numbers of different X-groups than the LUCA and post-LUCA cofactors (Fig. 6). Some coenzymes stand out by their large number of associated folds: the ancient ATP (74); Coenzyme A (34); NAD (30); Heme (27) and the unclassified cofactor Glutathione (25).

Fold diversity of coenzyme binding sites. (A) Folds represented by ECOD X-groups, according to numbers of coenzyme binding sites. (B) Comparison of numbers of ECOD X-groups vs. UniProt entries per cofactor class.

The most frequently observed X-groups include Rossmann-like, Alpha-beta plaits, TIM beta/alpha-barrel, Flavodoxin-like, cradle loop barrel, HUP domain-like and beta-Grasp (Fig. 6). Among these, Rossmann-like, TIM beta/alpha-barrel and Flavodoxin-like bind to coenzymes of all ages.

Coenzyme early vs late binding sites

To further explore whether extant proteins can bind coenzymes only by early or only by late residues (featuring early vs late binding sites), we looked for these specific cases and analyzed their evolutionary conservation.

We found 25 PDB entries that contain at least one chain bound to coenzymes solely by early amino acids (Fig. 7; Supplementary Fig. 4). Those structures correspond to 17 different proteins, represented by unique UniProt codes. All of those proteins bind exclusively ancient coenzymes: ATP, NAD and Phosphopantetheine. In comparison, 15 PDB entries, representing 12 unique proteins, bind coenzymes only by late amino acids (Fig. 7; Supplementary Fig. 4). These examples span coenzymes of all ages, from Ancient to Post-LUCA: ATP, CoA, NAD, PLP, biotin, and ascorbic acid.

Examples of coenzyme binding solely through early or late amino acids. (A) Coenzyme bound exclusively by early residues (AMP bound by ATP-phosphoribosyltransferase, PDB code 6czm (chain B), created by LIGPLOT (Laskowski and Swindells, 2011)). (B) Coenzyme bound entirely by late residues (Ascorbic acid bound by Hyaluronate lyase, PDB code 1f9g (chain A), created by LIGPLOT).
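The search for binding sites made up solely of early or solely of late residues can be sketched as a set-membership check. The early alphabet below follows the ten residues listed in the Introduction (Ala, Asp, Glu, Gly, Ile, Leu, Pro, Ser, Thr and Val); the function names and the example sites are hypothetical.

```python
# One-letter codes: the ten early residues and the remaining ten late ones.
EARLY = set("ADEGILPSTV")
LATE = set("CFHKMNQRWY")


def site_type(residues):
    """Label a binding site given as a string of one-letter amino acid codes."""
    letters = set(residues)
    if not letters or not letters <= (EARLY | LATE):
        raise ValueError("expected non-empty canonical one-letter codes")
    if letters <= EARLY:
        return "early"
    if letters <= LATE:
        return "late"
    return "mixed"


def early_fraction(residues):
    """Fraction of binding-site residues that belong to the early alphabet."""
    return sum(r in EARLY for r in residues) / len(residues)


# Hypothetical binding sites, one string of residues per site.
types = [site_type(s) for s in ("GSTGA", "KRHY", "GKS")]
```

Applied per chain, a filter such as `site_type(site) == "early"` would pick out candidates comparable to the early-only cases described above, although the actual selection criteria used in the study may differ.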

To assess the conservation of amino acids in these specific binding sites we used ConSurf (Ashkenazy et al., 2010; Ashkenazy et al., 2016). According to the analysis, both the early and late binding sites are relatively highly conserved. Around 60% of the residues in both cases have conservation scores ≥7. Furthermore, we employed the MAX AA parameter, which represents the most abundant residue in the multiple sequence alignment of all homologs. In the early and late binding sites, 76% and 72% of the residues, respectively, are identical to the MAX AA residue, which suggests their evolutionary conservation in both cases.
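The two conservation summaries reported here (the share of residues with a score of at least 7 and the share matching the MAX AA residue) reduce to simple proportions. The sketch below assumes that each binding-site residue has already been extracted as a tuple of residue, ConSurf grade and MAX AA; the tuple layout, function name and values are illustrative and do not reflect the ConSurf output format.

```python
def conservation_summary(site, min_grade=7):
    """Summarize conservation for one group of binding-site residues.

    `site` is a list of (residue, grade, max_aa) tuples, where `grade` is
    the conservation score and `max_aa` is the most abundant residue at
    that alignment position.
    Returns (fraction with grade >= min_grade, fraction equal to max_aa).
    """
    n = len(site)
    conserved = sum(1 for _, grade, _ in site if grade >= min_grade)
    same_as_max = sum(1 for res, _, max_aa in site if res == max_aa)
    return conserved / n, same_as_max / n


# Hypothetical values for a five-residue binding site.
example_site = [("G", 9, "G"), ("S", 8, "S"), ("T", 5, "A"),
                ("A", 7, "A"), ("L", 4, "L")]
frac_conserved, frac_max = conservation_summary(example_site)
```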

Coenzyme binding mediated by metal ions

Because of the significance of metal ions in both extant and early life, we also analyzed coenzyme binding via metal ions (Supplementary Table 4). Notably, this phenomenon is more frequent in ancient cofactors: approximately 24% of their binding sites include at least one metal ion. Younger cofactors exhibit a lower requirement for metal ion binding. LUCA coenzymes exhibit metal ion binding in approximately 13% of instances, post-LUCA coenzymes in 11% and Unclassified coenzymes in about 23%. Certain coenzymes exhibit a notably high percentage of cases reliant on at least one Mg2+ ion (76% in the case of Thiamine diphosphate and 55% in the case of ATP binding). The next most prevalent mediating ion is Ca2+, found in 65% of cases of the LUCA coenzyme Pyrroloquinoline Quinone. Following Ca2+, the next most frequent metal ions mediating coenzyme binding are Mn2+ and Fe2+.

Discussion

Enzymatic activities rely heavily on interplay with organic cofactors. These are found at the very heart of cellular metabolism, and some of them quite possibly reach deep into life’s earliest stages (White, 1976). The core chemistries of the most abundant and ancient coenzymes have been repeatedly detected in material and experiments mimicking prebiotic environments (Miller and Schlesinger, 1993; Keefe et al., 1995; Holliday et al., 2007; Kirschning, 2021; Menor-Salván et al., 2022; Pinna et al., 2022). Along with metal ions and minerals, some of the extant coenzymes could probably catalyze metabolic reactions in the absence of enzymes, before their emergence (Muchowska et al. 2020; Henriques Pereira et al. 2022; Cvjetan et al., 2023; Dherbassy et al. 2023). When and how coenzymes seeded the functional hubs of today’s enzymes represents a fundamental bridge between prebiotic chemistry and biochemistry and therefore one of the central questions in the study of life’s origins (Preiner et al. 2020).

The aim of this study was to resolve this conundrum by analyzing protein-coenzyme interactions throughout the PDB with respect to amino acid and coenzyme evolutionary age (Fig. 2). While no direct inferences about the distant evolutionary past can be drawn from the analysis of extant proteins, the principles guiding these interactions can imply their potential prebiotic feasibility and significance.

Ancient coenzymes are abundant in extant life and bind more frequently to early amino acids

We find that an absolute majority (94%) of extant coenzymes that appear in PDB structures are conserved across all domains of life and were already available through prebiotic syntheses (i.e. ancient). This class of coenzymes includes many molecules that are derived from nucleotides (such as ATP, FADH, NADH) and is spread throughout all the E.C. classes of protein catalysts. This is the first obvious distinction from the other (less-populated) classes of coenzymes and supports the coenzyme-peptide significance from the earliest life until today (White, 1976).

There are several outstanding differences in the properties of binding to proteins among the different coenzyme classes. First, the ancient coenzymes bind to proteins more frequently via early amino acids. On average, early amino acids account for 61% of the binding-site residues of ancient coenzymes, compared with only 53% and 47% for the LUCA and post-LUCA coenzymes, respectively.

While the ancient nucleotide-derived and unclassified coenzyme binding sites are dominated by residues in the loop conformation (Fig. 5; Supplementary Fig. 3), there is also a substantially higher frequency of residues in beta-sheet conformations when compared to post-LUCA coenzyme sites. Those, on the other hand, are dominated by alpha-helical conformations. Loops exhibit greater sequence variability compared to ordered structures, and their flexible nature enables them to undergo structural changes (Kessel and Ben-Tal, 2018, Tokuriki and Tawfik, 2009). Such properties of protein active sites were previously associated with evolvability and promiscuity (Tokuriki and Tawfik, 2009, Corbella et al., 2023). The role of loops could thus be important for the flexibility and versatility of early peptide-coenzyme binding sites. It has been noted that evolutionary benefits would be presented by sequences that could adopt closed loop conformations, providing stability and protection to early coenzymes and such hubs could truly transition the peptide-coenzyme world towards primordial enzymes (Goncearenco and Berezovsky 2011, Gamiz-Arco et al., 2021; Toledo-Patiño et al., 2022; Gutierrez-Rus et al., 2023). Besides sequences without regular secondary structure elements, beta-sheets have been considered a more prevalent and significant motif during early stages of protein evolution than alpha helices. Beta-sheet represents the first structural motif in models of the ribosomal evolution, and it has also been observed as a mildly enriched motif in sequences formed from early amino acids (Brack and Orgel 1975, Lupas and Alva 2017, Kovacs et al., 2017, Tretyachenko et al., 2022). Ancient coenzyme-peptide binding properties support the scenario of its significance during early stages of protein evolution.

Ancient coenzymes bind to proteins through more backbone interactions, typically assigned to early amino acids

While all the coenzymes bind preferentially to protein residue sidechains, more backbone interactions appear in the ancient coenzyme class when compared to others. This supports an earlier hypothesis that functions of the earliest peptides (possibly of variable compositions and lengths) would be performed with the assistance of the main chain atoms rather than their sidechains (Milner-White and Russell 2011). A specific example of such a scenario was recently reported, where a dihydrofolate reductase activity was supported purely by protein backbone-coenzyme interactions (Lemay-St-Denis and Pelletier, 2023). Finally, Longo et al. recently analyzed binding sites of different phosphate-containing ligands which were arguably of high relevance during the earliest stages of life, connecting all of today’s core metabolism (Longo et al., 2020 (b)). They observed that unlike the evolutionarily younger binding motifs (which rely on sidechain binding), the most ancient lineages indeed bind to phosphate moieties predominantly via the protein backbone.

Our analysis assigns this phenomenon primarily to interactions via early amino acids that (as mentioned above) are generally enriched in the binding interface of the ancient coenzymes. This implies that late amino acids would not necessarily be needed for the sovereignty of coenzyme-peptide interplay. To address this intriguing possibility, we next searched the PDB dataset for examples in which coenzymes are bound exclusively by early amino acids. We found 17 such proteins, in all of which the coenzymes belonged to the ancient class (such as ATP and NAD). Together with all of the above, this finding supports the possibility that peptide-coenzyme functional hubs could have originated before the evolution of the full canonical amino acid alphabet.

Reinforcing this, we have recently demonstrated on a select example of a ribosomal RNA-binding domain that a negatively charged variant of the protein composed of only early amino acids is indeed capable of binding to RNA (Giacobelli et al., 2022). In that case, the interaction is further supported by metal ions that were not present at the binding interface of the wild-type protein. Interestingly, the same trend was observed here throughout the PDB dataset. Binding of 24% of the ancient coenzymes in the PDB is additionally mediated by at least one metal ion. LUCA and post-LUCA coenzymes involved metal ions in only 13% and 11% of cases, respectively. It is quite probable that not all the metal ion densities are recognized or fully resolved in all the PDB structures that were used in our analysis. Nevertheless, we hypothesize that the overall trend can be attributed to the inherent negative charge of many of the ancient coenzymes, necessitating engagement of positively charged metal ions. Along with the general adaptive properties of late amino acids in expanding the chemistry space, the late amino acids would supplement the positive charge in the residue side chains (Ilardo and Freeland, 2014).

Coenzymes could have served as bridges towards protein structural and functional sovereignty from the peptide-nucleotide world

Our study further revealed that ancient coenzymes stand out in the variety of protein structures that they bind to, as represented by the ECOD X-groups. While this may partly be caused by their general over-representation in our dataset, the significance of coenzymes has been pointed out previously throughout the most ancient protein folds, such as P-loop NTPases, TIM beta/alpha-barrels, OB and Rossmann folds (Caetano-Anollés et al., 2017; Goldman et al. 2013; Longo et al., 2020(a); Longo et al., 2020(b)). Phosphate-containing coenzymes truly stand out by their large number of associated folds (e.g. 74 different ECOD X-groups in case of ATP).

It has been postulated that phosphate-binding loops served as the most significant precursors for contemporary enzymes (Romero Romero et al., 2018). Combining ancestral sequence reconstruction and selection/protein design, short polypeptide motifs capable of poly/nucleotide binding have been recovered from the P-loop NTPase and HhH motifs, relying primarily on early amino acids (Longo et al., 2020 (b); Romero Romero et al., 2018). Both studies demonstrated that these ancient motifs are highly robust to sequence variations, implying that such interactions can be encountered more easily than previously thought (Longo et al., 2020 (b); Keefe and Szostak, 2001). Provocatively, it has also been implied that many of these motifs emerged initially as polynucleotide binders and started serving catalysis only after gaining higher structural complexity (Romero Romero et al., 2018).

Our entry search of PDB coenzymes also retrieved 80 RNA structures. While these were not the primary subject of our analysis, the majority of those were found to interact with ancient coenzymes, mainly the nucleotide derived ones, such as ATP, FMN, NAD, SAM (Supplementary Table 5). While some of these structures are assigned as riboswitches, other coenzyme-RNA complexes belong to ribozymes, representing the potential of polynucleotides in early catalysis as discussed by many previous studies (White, 1976; Gilbert, 1986; Reyes-Prieto et al., 2012; Goldman and Kacar 2021). If peptide-polynucleotide interactions were initially more feasible and dominant (in a putative peptide-polynucleotide world, as implied above), coenzymes could have played a key role in resolving the sovereignty of these molecules towards their tertiary structures and catalytic functions. Polypeptide-coenzyme catalysts would soon dominate in performance (enabling more efficient catalysis) and functional repertoires, especially those that would be hard to facilitate in their absence, such as oxidoreductases and transferases (Fried et al., 2022; Goldman and Kacar 2021; Kessel and Ben-Tal 2018).

Limitations of our work

The first obvious drawback of this analysis is the ambiguity that accompanies the division of coenzymes into evolutionary age classes. While prebiotic availability (or the lack of it) is largely a matter of consensus in some cases, studies and opinions conflict for other coenzymes. For example, some coenzymes have prebiotic precursors but are not present across all kingdoms of life. This may suggest that the coenzyme became important only post-LUCA, but it can also mean that its importance was preserved only in specific branches of life. An “Unclassified” category has been included for such cases (represented e.g. by glutathione). Several properties of glutathione identified here (such as protein backbone vs. sidechain interaction, ubiquity across all E.C. enzyme classes and a high number of associated X-groups) would suggest that it is closer to the ancient coenzymes. The other, “classified” coenzymes were categorized based on prevailing studies, although some ambiguity remains. Despite our efforts, the evolutionary ratings of coenzymes (and amino acids) are therefore not always clear-cut, and not all members belong to their categories with the same weight (e.g. some are likely to be more ancient than others). For example, the nucleotide-derived coenzymes probably predate the others of the ancient class; it has been proposed that S-adenosylmethionine emerged before the more complex heme-related porphyrin adenosylcobalamin or coenzyme B12 functionalities (Lazcano, 2013).

Another possible bias of our study stems from the population differences among the three coenzyme classes. The ancient coenzymes are by far the most abundant class in the PDB dataset. It can be argued that this is the result of their ∼4 billion years of essentiality to life (Goldman and Kacar 2021). Nevertheless, it may equally result from a bias in the structures deposited in the PDB, and it most probably does not reflect the true distribution of coenzymes in the biological protein space. Additionally, comparing differentially populated classes complicated some aspects of the analysis presented here, although care has been taken to perform appropriate statistical tests.

Conclusions

The findings presented here suggest that early (less complex and prebiotically plausible) amino acids are sufficient for binding to ancient coenzymes. Consequently, coenzyme-peptide interactions might have been conceivable at a time when the amino acid alphabet had not yet evolved to its current form. In such interactions, binding would rely more on protein backbone atoms and on the involvement of metal ions, both of which are less frequent in interactions with evolutionarily young coenzymes.

Methodology

Identification of organic cofactors

We systematically identified all available cofactor and cofactor-like molecules in the Protein Data Bank in Europe (wwPDB Consortium, 2020) programmatically through the PDBe REST API (pdbe.org/api) (Mukhopadhyay et al., 2019). All cofactor molecules were classified into 27 classes based on the CoFactor database (Fischer et al., 2010). Furthermore, we included ATP and its analogs as an additional cofactor class.

For each cofactor class, all available ligand codes from the PDB chemical component dictionary were retrieved programmatically through the “Cofactors” endpoint (https://www.ebi.ac.uk/pdbe/api/pdb/compound/cofactors) of the PDBe REST API (pdbe.org/api); all responses were in JSON format.
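As an illustration of this retrieval step (a minimal sketch, not the script used in this study), the Python below queries the “Cofactors” endpoint quoted above and collects the ligand codes of each cofactor class. The JSON layout assumed here (each class name mapped to a list of records that carry a "cofactors" list of chemical component codes) and the function names are assumptions for illustration and should be checked against the live response.

```python
import json
from urllib.request import urlopen

# Endpoint quoted in the text above.
COFACTORS_URL = "https://www.ebi.ac.uk/pdbe/api/pdb/compound/cofactors"


def fetch_cofactor_classes(url=COFACTORS_URL):
    """Download the cofactor annotation and return the parsed JSON object."""
    with urlopen(url, timeout=60) as response:
        return json.load(response)


def ligand_codes_by_class(payload):
    """Map each cofactor class to a sorted list of its ligand codes.

    Assumed (hypothetical) layout of the payload:
    {class_name: [{"cofactors": ["ATP", ...], ...}, ...]}
    """
    codes = {}
    for class_name, records in payload.items():
        found = set()
        for record in records:
            # Records without a "cofactors" key contribute nothing.
            found.update(record.get("cofactors", []))
        codes[class_name] = sorted(found)
    return codes
```

Calling ligand_codes_by_class(fetch_cofactor_classes()) would then yield one list of ligand codes per cofactor class, which can be counted or passed on to the entry search.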

Structural database and classifications

We retrieved the PDB entries associated with each chemical component from every cofactor class using the “PDB entries containing the compound” endpoint (https://www.ebi.ac.uk/pdbe/api/pdb/compound/in_pdb/:id) via the Entry-based API. The count of PDBe entries for each cofactor class is provided in the supplementary information (Supplementary file 1). The REST API returned no information for two coenzyme classes, MIO and Orthoquinone, so they were excluded from the analysis.

The secondary structure assignments and Enzyme Commission (EC) numbers for all PDB structures analyzed were determined through residue-level cross references obtained from the SIFTS XML files (Velankar et al., 2013; Dana et al., 2019). Secondary structure elements are coded as “h” for helix, “b” for strand and “c” for coil, and they correspond to the information available on the PDBe website. Only observed residues were examined.

Furthermore, we assigned all our PDB entries to the ECOD hierarchical system groups “X”, “H” and “F” (Cheng et al., 2014).

UniProt assignment and interaction ratio

UniProt codes were assigned to our structural dataset by mapping it against the SIFTS (Velankar et al., 2013; Dana et al., 2019) file “pdb_chain_uniprot.tsv.gz” (https://www.ebi.ac.uk/pdbe/docs/sifts/quick.html). Next, we mapped the UniProt residues to each of the PDB structures using the residue-level cross-reference data of SIFTS, retrieved as XML files (https://www.ebi.ac.uk/pdbe/docs/sifts/quick.html).

Each UniProt code represents a unique protein sequence that encompasses one or more associated PDBe entries. To filter for residues relevant to the interaction sites at the level of protein sequence, we used the interaction ratio, defined as the fraction of the PDB entries associated with one unique protein in which a given residue-ligand interaction is preserved. The ratio is calculated for each ligand and each UniProt residue across all associated PDB entries. Only residue-ligand interactions preserved in more than 50% of the associated PDB entries were selected.
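The interaction ratio can be sketched in a few lines of Python. This is an illustrative reimplementation under stated assumptions, not the PDBe calculation itself: the input is assumed to be a list of (PDB entry, ligand code, UniProt residue number) contacts for a single UniProt accession, and the denominator is assumed to be the number of that protein's entries in which the ligand makes at least one contact. The function names are hypothetical.

```python
from collections import defaultdict


def interaction_ratios(contacts):
    """Return {(ligand, residue): ratio} for one UniProt accession.

    contacts: iterable of (pdb_id, ligand_code, uniprot_residue) tuples.
    """
    entries_with_ligand = defaultdict(set)   # ligand -> PDB entries
    entries_with_contact = defaultdict(set)  # (ligand, residue) -> PDB entries
    for pdb_id, ligand, residue in contacts:
        entries_with_ligand[ligand].add(pdb_id)
        entries_with_contact[(ligand, residue)].add(pdb_id)
    return {
        key: len(pdb_ids) / len(entries_with_ligand[key[0]])
        for key, pdb_ids in entries_with_contact.items()
    }


def conserved_contacts(contacts, threshold=0.5):
    """Keep contacts preserved in more than `threshold` of the entries."""
    ratios = interaction_ratios(contacts)
    return {key: ratio for key, ratio in ratios.items() if ratio > threshold}
```

With three entries of one protein that all contain NAD, a residue contacting the ligand in two of them has a ratio of 2/3 and is kept, whereas a residue seen in only one entry (ratio 1/3) is discarded. The comparison is strict, so a ratio of exactly 0.5 does not pass.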

After assigning UniProt residues to each residue of the PDB structures, we downloaded the calculated interaction ratio from the endpoint “UniProt-Get ligand binding residues for a UniProt accession” (https://www.ebi.ac.uk/pdbe/graph-api/uniprot/ligand_sites/:accession).

The redundancy of our database was removed by clustering the UniProt sequences using CD-HIT (Li and Godzik, 2006) with a 90% sequence identity parameter.

Analysis of coenzyme interactions

To analyze the amino acid-coenzyme interactions, we downloaded information on all bound molecules found in a given PDB entry using the “Get bound molecules” endpoint (https://www.ebi.ac.uk/pdbe/graph-api/pdb/bound_molecule_interactions/:pdbId/:bmid) of the Aggregated API. Then, we retrieved the ligand interactions for each bound molecule in every entry through the “PDB-Get bound ligand interactions” endpoint (https://www.ebi.ac.uk/pdbe/graph-api/pdb/bound_ligand_interactions/:pdbId/:chain/:seqId), which calculates these interactions with Arpeggio (Jubb et al., 2017). The retrieved interaction partners included standard amino acids, water molecules and metal ions.

We classified all the interactions reported by Arpeggio into nine distinct interaction types. The classification scheme follows the one used by PDBe and comprises the following categories: i) “covalent”; ii) “electrostatic”, which combines “ionic”, “hbond”, “weak_hbond”, “polar”, “weak_polar”, “xbond” and “carbonyl”; iii) “amide”, consisting of “AMIDEAMIDE” and “AMIDERING”; iv) “vdw”, denoting van der Waals interactions; v) “hydrophobic”; vi) “aromatic”, grouping “aromatic”, “FF”, “OF”, “EE”, “FT”, “OT”, “ET”, “FE”, “OE” and “EF” contacts; vii) “atom-pi”, comprising “CARBONPI”, “CATIONPI”, “DONORPI”, “HALOGENPI” and “METSULPHURPI”; viii) “metal”; and ix) “clashes”, including “clash” and “vdw_clash” contacts. We omitted this last category from the analysis because of its limited number of interactions, most of which result from experimental errors during X-ray diffraction.
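The grouping described above amounts to a lookup table from raw contact labels to categories. The Python sketch below encodes the nine categories exactly as listed in the text and counts contacts per category while dropping “clashes”. The raw spellings for the single-label categories (“covalent”, “vdw”, “hydrophobic”, “metal”) are assumed to match the category names, and the function name is hypothetical.

```python
from collections import Counter

# Category -> raw contact labels, as listed in the text.
GROUPS = {
    "covalent": ["covalent"],
    "electrostatic": ["ionic", "hbond", "weak_hbond", "polar",
                      "weak_polar", "xbond", "carbonyl"],
    "amide": ["AMIDEAMIDE", "AMIDERING"],
    "vdw": ["vdw"],
    "hydrophobic": ["hydrophobic"],
    "aromatic": ["aromatic", "FF", "OF", "EE", "FT", "OT",
                 "ET", "FE", "OE", "EF"],
    "atom-pi": ["CARBONPI", "CATIONPI", "DONORPI",
                "HALOGENPI", "METSULPHURPI"],
    "metal": ["metal"],  # assumed raw spelling
    "clashes": ["clash", "vdw_clash"],
}

# Inverted table: raw label -> category.
RAW_TO_CATEGORY = {raw: category
                   for category, raws in GROUPS.items()
                   for raw in raws}


def count_categories(raw_labels, omitted=("clashes",)):
    """Count contacts per category, skipping omitted and unknown labels."""
    counts = Counter()
    for raw in raw_labels:
        category = RAW_TO_CATEGORY.get(raw)
        if category is None or category in omitted:
            continue
        counts[category] += 1
    return counts
```

Unknown labels are silently skipped in this sketch; a stricter variant could raise an error instead so that new contact types are not overlooked.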

Backbone and side-chain interactions were identified based on the atom identities in the coenzyme binding sites. The atoms corresponding to the backbone of standard amino acids were identified as “N”, “C”, “CA” and “O”. Glycine has only a hydrogen atom as its side chain, and no glycine side-chain atom was found to mediate any interaction.
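A minimal sketch of this atom-based rule in Python is given below. It assumes that each contact is described by the PDB atom name of the interacting protein atom; the function names are hypothetical.

```python
# Backbone atom names of standard amino acids, as given in the text.
BACKBONE_ATOMS = frozenset({"N", "C", "CA", "O"})


def contact_origin(atom_name):
    """Label a protein atom as 'backbone' or 'side_chain' by its name."""
    if atom_name.strip().upper() in BACKBONE_ATOMS:
        return "backbone"
    return "side_chain"


def backbone_fraction(atom_names):
    """Fraction of contacts made by backbone atoms (None if no contacts)."""
    names = list(atom_names)
    if not names:
        return None
    n_backbone = sum(contact_origin(name) == "backbone" for name in names)
    return n_backbone / len(names)
```

Under this simple rule any atom name outside the four backbone names, including the C-terminal “OXT”, is counted as side chain, which may or may not match the handling in the original analysis.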

Secondary structure analysis

Statistical analysis of secondary structure content was conducted at the UniProt level. For each residue within every UniProt entry, we considered all secondary structure elements observed in the PDB structures associated with that UniProt code. Subsequently, we eliminated redundancy on a per-residue basis. This approach captures the structural diversity observed at each position of the protein.
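The per-residue redundancy removal can be illustrated as follows. This Python sketch assumes the input is a list of (UniProt residue number, secondary structure code) observations pooled from all PDB structures of one UniProt entry, with the codes “h”, “b” and “c” used in the text; the function name is hypothetical.

```python
from collections import defaultdict


def per_residue_secondary_structure(observations):
    """Collapse pooled observations to the unique states of each residue.

    observations: iterable of (uniprot_residue, ss_code) pairs.
    Returns {residue: string of unique codes in alphabetical order}.
    """
    states = defaultdict(set)
    for residue, ss_code in observations:
        states[residue].add(ss_code)
    return {residue: "".join(sorted(codes))
            for residue, codes in states.items()}
```

A residue seen as helix in two structures and as coil in a third is thus recorded once with both states ("ch"), so that heavily sampled proteins do not dominate the statistics.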

Interactions mediated exclusively by early or late amino acids

To examine proteins that interacted with cofactors solely through early or late amino acids, we filtered the data to include only proteins interacting with the cofactor through at least two amino acids.
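The filter can be sketched in Python as shown below. The assignment of residues to the early and late sets is not given in this section; the ten-residue early set used here (Ala, Asp, Glu, Gly, Ile, Leu, Pro, Ser, Thr, Val) is an assumption made for illustration and should be replaced by the classification defined elsewhere in the paper. The function name is likewise hypothetical.

```python
# ASSUMPTION: illustrative early/late split, not taken from this section.
EARLY = frozenset({"ALA", "ASP", "GLU", "GLY", "ILE",
                   "LEU", "PRO", "SER", "THR", "VAL"})
LATE = frozenset({"ARG", "ASN", "CYS", "GLN", "HIS",
                  "LYS", "MET", "PHE", "TRP", "TYR"})
STANDARD = EARLY | LATE


def binding_site_class(residue_names, min_residues=2):
    """Classify a binding site as 'early_only', 'late_only' or 'mixed'.

    residue_names: three-letter codes of the interacting residues, one
    per residue position. Non-standard names (waters, metal ions) are
    ignored. Returns None if fewer than `min_residues` amino acids remain.
    """
    residues = [name.upper() for name in residue_names
                if name.upper() in STANDARD]
    if len(residues) < min_residues:
        return None
    kinds = {"early" if name in EARLY else "late" for name in residues}
    if kinds == {"early"}:
        return "early_only"
    if kinds == {"late"}:
        return "late_only"
    return "mixed"
```

Sites returning None are excluded by the two-residue minimum described above, so a cofactor touched by a single glycine is not counted as an early-only binder.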

For the assessment of the evolutionary conservation of coenzyme-binding amino acids, we employed ConSurf (Ashkenazy et al., 2010; Ashkenazy et al., 2016). Specifically, we analyzed the msa_positional_aa_frequency files generated for each PDB structure.

Supporting information

Supplementary material

Supplementary file 1

Supplementary Table 1

Supplementary Table 2

Supplementary Table 3

Supplementary Table 4

Supplementary Tables 5-8

Acknowledgements

This work was supported by the Human Frontier Science Program grant HFSP-RGEC27/2023 and was carried out with the support of ELIXIR CZ Research Infrastructure (ID LM2023055, MEYS CR). A.C.S.R. and M.M. acknowledge support by the project “Grant Schemes at CU” (reg. no. CZ.02.2.69/0.0/0.0/19_073/0016935), project no. START/SCI/148. Finally, we would like to thank Prof. Stephen Freeland and Prof. Janet Thornton for helpful discussions on this manuscript.

References

Alva V, Söding J, Lupas AN (2015) A vocabulary of ancient peptides at the origin of folded proteins. eLife 4:e09410. https://doi.org/10.7554/eLife.09410
Aylward N (2006) An ab initio computational study of thiamin synthesis from gaseous reactants of the interstellar medium. Biophysical Chemistry 121:185–193. https://doi.org/10.1016/j.bpc.2005.12.018
Aylward N, Bofinger N (2006) A plausible prebiotic synthesis of pyridoxal phosphate: Vitamin B-6 - A computational study. Biophysical Chemistry 123:113–121. https://doi.org/10.1016/j.bpc.2006.04.014
Ashkenazy H, Erez E, Martz E, Pupko T, Ben-Tal N (2010) ConSurf 2010: calculating evolutionary conservation in sequence and structure of proteins and nucleic acids. Nucleic Acids Res 38:W529–33. https://doi.org/10.1093/nar/gkq399
Ashkenazy H, Abadi S, Martz E, Chay O, Mayrose I, Pupko T, Ben-Tal N (2016) ConSurf 2016: an improved methodology to estimate and visualize evolutionary conservation in macromolecules. Nucleic Acids Res 44:W344–50. https://doi.org/10.1093/nar/gkw408
Barrick JE, Breaker R (2007) The distributions, mechanisms, and structures of metabolite-binding riboswitches. Genome Biology 8:R239. https://doi.org/10.1186/gb-2007-8-11-r239
Bonfio C, Valer L, Scintilla S, Shah S, Evans D, Jin L, Szostak J, Sasselov D, Sutherland J, Mansy S (2017) UV-light-driven prebiotic synthesis of iron-sulfur clusters. Nature Chemistry 9:1229. https://doi.org/10.1038/nchem.2817
Burton AS, Stern JC, Elsila JE, Glavin DP, Dworkin JP (2012) Understanding prebiotic chemistry through the analysis of extraterrestrial amino acids and nucleobases in meteorites. Chemical Society Reviews 41:5459–72. https://doi.org/10.1039/c2cs35109a
Brack A, Orgel LE (1975) Beta structures of alternating polypeptides and their possible prebiotic significance. Nature 256:383–7. https://doi.org/10.1038/256383a0
Caetano-Anollés G, Kim HS, Mittenthal JE (2007) The origin of modern metabolic networks inferred from phylogenomic analysis of protein architecture. Proceedings of the National Academy of Sciences of the United States of America 104:9358–9363. https://doi.org/10.1073/pnas.0701214104
Cheng H, Schaeffer RD, Liao Y, Kinch LN, Pei J, Shi S, Kim BH, Grishin NV (2014) ECOD: an evolutionary classification of protein domains. PLoS Computational Biology 10:e1003926. https://doi.org/10.1371/journal.pcbi.1003926
Chu XY, Zhang HY (2020) Cofactors as molecular fossils to trace the origin and evolution of proteins. ChemBioChem 21:3161–3168. https://doi.org/10.1002/cbic.202000027
Cvjetan N, Schuler L, Ishikawa T, Walde P (2023) Optimization and Enhancement of the Peroxidase-like Activity of Hemin in Aqueous Solutions of Sodium Dodecylsulfate. ACS Omega 8:42878–42899. https://doi.org/10.1021/acsomega.3c05915
Cleaves HJ (2010) The origin of the biologically coded amino acids. Journal of Theoretical Biology 263:490–498. https://doi.org/10.1016/j.jtbi.2009.12.014
Copley SD, Dhillon JK (2002) Lateral gene transfer and parallel evolution in the history of glutathione biosynthesis genes. Genome Biology 3:1–16. https://doi.org/10.1186/gb-2002-3-5-research0025
Corbella M, Pinto GP, Kamerlin SCL (2023) Loop dynamics and the evolution of enzyme activity. Nature Reviews Chemistry 7:536–547. https://doi.org/10.1038/s41570-023-00495-w
Dana JM, Gutmanas A, Tyagi N, Qi G, O’Donovan C, Martin M, Velankar S (2019) SIFTS: updated Structure Integration with Function, Taxonomy and Sequences resource allows 40-fold increase in coverage of structure-based annotations for proteins. Nucleic Acids Res 47:D482–D489. https://doi.org/10.1093/nar/gky1114
Dherbassy Q, Mayer R, Muchowska K, Moran J (2023) Metal-Pyridoxal Cooperativity in Nonenzymatic Transamination. Journal of the American Chemical Society 145:13357–13370. https://doi.org/10.1021/jacs.3c03542
Edwards H, Abeln S, Deane CM (2013) Exploring Fold Space Preferences of New-born and Ancient Protein Superfamilies. PLoS Computational Biology 9:e1003325. https://doi.org/10.1371/journal.pcbi.1003325
Fischer JD, Holliday GL, Thornton JM (2010) The CoFactor database: Organic cofactors in enzyme catalysis. Bioinformatics 26:2496–2497. https://doi.org/10.1093/bioinformatics/btq442
Frenkel-Pinter M, Haynes JW, Martin C, Petrov AS, Burcar BT, Krishnamurthy R, Hud N, Leman L, Williams LD (2019) Selective incorporation of proteinaceous over nonproteinaceous cationic amino acids in model prebiotic oligomerization reactions. Proceedings of the National Academy of Sciences of the United States of America 116:16338–16346. https://doi.org/10.1073/pnas.1904849116
Frenkel-Pinter M, Samanta M, Ashkenasy G, Leman L (2020) Prebiotic Peptides: Molecular Hubs in the Origin of Life. Chemical Reviews 120:4707–4765. https://doi.org/10.1021/acs.chemrev.9b00664
Fried SD, Fujishima K, Makarov M, Cherepashuk I, Hlouchova K (2022) Peptides before and during the nucleotide world: An origins story emphasizing cooperation between proteins and nucleic acids. Journal of the Royal Society Interface 19:20210641. https://doi.org/10.1098/rsif.2021.0641
Gamiz-Arco G, Gutierrez-Rus LI, Risso VA, Ibarra-Molero B, Hoshino Y, Petrović D, Justicia J, Cuerva JM, Romero-Rivera A, Seelig B, Gavira JA, Kamerlin SCL, Gaucher EA, Sanchez-Ruiz JM (2021) Heme-binding enables allosteric modulation in an ancient TIM-barrel glycosidase. Nature Communications 12:380. https://doi.org/10.1038/s41467-020-20630-1
Giacobelli VG, Fujishima K, Lepšík M, Tretyachenko V, Kadavá T, Makarov M, Hlouchová K (2022) In Vitro Evolution Reveals Noncationic Protein-RNA Interaction Mediated by Metal Ions. Molecular Biology and Evolution 39:1–11. https://doi.org/10.1093/molbev/msac032
Gilbert W (1986) The RNA world. Nature 319:618.
Goldman AD, Bernhard TM, Dolzhenko E, Landweber LF (2013) LUCApedia: A database for the study of ancient life. Nucleic Acids Res 41:1079–1082. https://doi.org/10.1093/nar/gks1217
Goldman AD, Kacar B (2021) Cofactors are Remnants of Life’s Origin and Early Evolution. Journal of Molecular Evolution 89:127–133. https://doi.org/10.1007/s00239-020-09988-4
Goncearenco A, Berezovsky IN (2011) Prototypes of elementary functional loops unravel evolutionary connections between protein functions. Bioinformatics 27:i497. https://doi.org/10.1093/bioinformatics/btq374
Gutierrez-Rus LI, Gamiz-Arco G, Gavira JA, Gaucher EA, Risso VA, Sanchez-Ruiz JM (2023) Protection of Catalytic Cofactors by Polypeptides as a Driver for the Emergence of Primordial Enzymes. Molecular Biology and Evolution 40:1–8. https://doi.org/10.1093/molbev/msad126
Henriques DP, Leethaus J, Beyazay T, do Nascimento A, Kleinermanns K, Tüysüz H, Martin W, Preiner M (2022) Role of geochemical protoenzymes (geozymes) in primordial metabolism: specific abiotic hydride transfer by metals to the biological redox cofactor NAD+. FEBS Journal 289:3148–3162. https://doi.org/10.1111/febs.16329
Higgs PG, Pudritz RE (2009) A thermodynamic basis for prebiotic amino acid synthesis and the nature of the first genetic code. Astrobiology 9:483–490. https://doi.org/10.1089/ast.2008.0280
Holliday GL, Thornton JM, Marquet A, Smith AG, Rébeillé F, Mendel R, Schubert HL, Lawrence AD, Warren MJ (2007) Evolution of enzymes and pathways for the biosynthesis of cofactors. Natural Product Reports 24:972–987. https://doi.org/10.1039/b703107f
Huang F, Bugg CW, Yarus M (2000) RNA-Catalyzed CoA, NAD, and FAD synthesis from phosphopantetheine, NMN, and FMN. Biochemistry 39:15548–55. https://doi.org/10.1021/bi002061f
Ilardo MA, Freeland SJ (2014) Testing for adaptive signatures of amino acid alphabet evolution using chemistry space. Journal of Systems Chemistry 5:1–9. https://doi.org/10.1186/1759-2208-5-1
Ji HF, Chen L, Zhang HY (2008) Organic cofactors participated more frequently than transition metals in redox reactions of primitive proteins. BioEssays 30:766–771. https://doi.org/10.1002/bies.20788
Jubb HC, Higueruelo AP, Ochoa-Montaño B, Pitt WR, Ascher DB, Blundell TL (2017) Arpeggio: A Web Server for Calculating and Visualising Interatomic Interactions in Protein Structures. J Mol Biol 429:365–371. https://doi.org/10.1016/j.jmb.2016.12.004
Keefe AD, Newton GL, Miller SL (1995) A possible prebiotic synthesis of pantetheine, a precursor to coenzyme A. Nature 373:683–685. https://doi.org/10.1038/373683a0
Keefe AD, Szostak JW (2001) Functional proteins from a random-sequence library. Nature 410:715–8. https://doi.org/10.1038/35070613
Kessel A, Ben-Tal N (2018) Introduction to proteins: structure, function, and motion. CRC Press (Taylor & Francis Group). https://doi.org/10.1201/9781315113876
Kessel A, Ben-Tal N (2022) From Molecules to Cells: The Origin of Life on Earth. Kindle E-Book.
Kirschning A (2021) Coenzymes and Their Role in the Evolution of Life. Angewandte Chemie - International Edition 60:6242–6269. https://doi.org/10.1002/anie.201914786
Kirschning A (2022) On the Evolutionary History of the Twenty Encoded Amino Acids. Chemistry - A European Journal 28:e202201419. https://doi.org/10.1002/chem.202201419
Kolodny R, Nepomnyachiy S, Tawfik DS, Ben-Tal N (2021) Bridging Themes: Short Protein Segments Found in Different Architectures. Molecular Biology and Evolution 38:2191–2208. https://doi.org/10.1093/molbev/msab017
Kovacs NA, Petrov AS, Lanier KA, Williams LD (2017) Frozen in Time: The History of Proteins. Molecular Biology and Evolution 34:1252–1260. https://doi.org/10.1093/molbev/msx086
Lane N, Martin WF (2012) The origin of membrane bioenergetics. Cell 151:1406–1416. https://doi.org/10.1016/j.cell.2012.11.050
Laurino P, Tóth-Petróczy Á, Meana-Pañeda R, Lin W, Truhlar DG, Tawfik DS (2016) An Ancient Fingerprint Indicates the Common Ancestry of Rossmann-Fold Enzymes Utilizing Different Ribose-Based Cofactors. PLoS Biology 14:1–23. https://doi.org/10.1371/journal.pbio.1002396
Laskowski RA, Swindells MB (2011) LigPlot+: multiple ligand-protein interaction diagrams for drug discovery. J Chem Inf Model 51:2778–86. https://doi.org/10.1021/ci200227u
Lazcano A (2013) Planetary change and biochemical adaptation: Molecular evolution of corrinoid and heme biosyntheses. Hematology 17:s7–s10. https://doi.org/10.1179/102453312X13336169155015
Lemay-St-Denis C, Pelletier J (2023) From a binding module to essential catalytic activity: how nature stumbled on a good thing. Chem Commun 59:12560–12572. https://doi.org/10.1039/D3CC04209J
Li W, Godzik A (2006) Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22:1658–9. https://doi.org/10.1093/bioinformatics/btl158
Longo LM, Jabłońska J, Vyas P, Kanade M, Kolodny R, Ben-Tal N, Tawfik DS (2020) On the emergence of P-loop NTPase and Rossmann enzymes from a beta-alpha-beta ancestral fragment. eLife 9:1–16. https://doi.org/10.7554/ELIFE.64415
Longo LM, Petrovic D, Kamerlin SCL, Tawfik DS (2020) Short and simple sequences favored the emergence of N-helix phospho-ligand binding sites in the first enzymes. Proceedings of the National Academy of Sciences of the United States of America 117:5310–5318. https://doi.org/10.1073/pnas.1911742117
Lupas AN, Alva V (2017) Ribosomal proteins as documents of the transition from unstructured (poly)peptides to folded proteins. Journal of Structural Biology 198:74–81. https://doi.org/10.1016/j.jsb.2017.04.007
Makarov M, Meng J, Tretyachenko V, Srb P, Březinová A, Giacobelli VG, Bednárová L, Vondrášek J, Dunker K, Hlouchová K (2021) Enzyme catalysis prior to aromatic residues: Reverse engineering of a dephospho-CoA kinase. Protein Science 30:1022–1034. https://doi.org/10.1002/pro.4068
Menor-Salván C, Burcar BT, Bouza M, Fialho DM, Fernández FM, Hud NV (2022) A Shared Prebiotic Formation of Neopterins and Guanine Nucleosides from Pyrimidine Bases. Chemistry - A European Journal 28:e202200714. https://doi.org/10.1002/chem.202200714
Miller SL, Schlesinger G (1993) Prebiotic syntheses of vitamin coenzymes: I. Cysteamine and 2-mercaptoethanesulfonic acid (coenzyme M). Journal of Molecular Evolution 36:302–7. https://doi.org/10.1007/BF00182177
Milner-White EJ, Russell MJ (2011) Functional capabilities of the earliest peptides and the emergence of life. Genes 2:671–88. https://doi.org/10.3390/genes2040671
Monteverde DR, Gómez-Consarnau L, Suffridge C, Sañudo-Wilhelmy SA (2017) Life’s utilization of B vitamins on early Earth. Geobiology 15:3–18. https://doi.org/10.1111/gbi.12202
Muchowska KB, Varma SJ, Moran J (2020) Nonenzymatic Metabolic Reactions and Life’s Origins. Chemical Reviews 120:7708–7744. https://doi.org/10.1021/acs.chemrev.0c00191
Mukhopadhyay A, Borkakoti N, Pravda L, Tyzack JD, Thornton JM, Velankar S (2019) Finding enzyme cofactors in Protein Data Bank. Bioinformatics 35:3510–3511. https://doi.org/10.1093/bioinformatics/btz115
Naraoka H, Takano Y, Dworkin JP, Oba Y, Hamase K, Furusho A, Ogawa NO, Hashiguchi M, Fukushima K, Aoki D, Schmitt-Kopplin P, Aponte JC, Parker ET, Glavin DP, McLain HL, Elsila JE, Graham HV, Eiler JM, Orthous-Daunay FR, Wolters C, Isa J, Vuitton V, Thissen R, Sakai S, Yoshimura T, Koga T, Ohkouchi N, Chikaraishi Y, Sugahara H, Mita H, Furukawa Y, Hertkorn N, Ruf A, Yurimoto H, Nakamura T, Noguchi T, Okazaki R, Yabuta H, Sakamoto K, Tachibana S, Connolly HC, Lauretta DS, Abe M, Yada T, Nishimura M, Yogata K, Nakato A, Yoshitake M, Suzuki A, Miyazaki A, Furuya S, Hatakeda K, Soejima H, Hitomi Y, Kumagai K, Usui T, Hayashi T, Yamamoto D, Fukai R, Kitazato K, Sugita S, Namiki N, Arakawa M, Ikeda H, Ishiguro M, Hirata N, Wada K, Ishihara Y, Noguchi R, Morota T, Sakatani N, Matsumoto K, Senshu H, Honda R, Tatsumi E, Yokota Y, Honda C, Michikami T, Matsuoka M, Miura A, Noda H, Yamada T, Yoshihara K, Kawahara K, Ozaki M, Iijima YI, Yano H, Hayakawa M, Iwata T, Tsukizaki R, Sawada H, Hosoda S, Ogawa K, Okamoto C, Hirata N, Shirai K, Shimaki Y, Yamada M, Okada T, Yamamoto Y, Takeuchi H, Fujii A, Takei Y, Yoshikawa K, Mimasu Y, Ono G, Ogawa N, Kikuchi S, Nakazawa S, Terui F, Tanaka S, Saiki T, Yoshikawa M, Watanabe SI, Tsuda Y (2023) Soluble organic molecules in samples of the carbonaceous asteroid (162173) Ryugu. Science 379:eabn9033. https://doi.org/10.1126/science.abn9033
Narunsky A, Kessel A, Solan R, Alva V, Kolodny R, Ben-Tal N (2020) On the evolution of protein-adenine binding. Proceedings of the National Academy of Sciences of the United States of America 117:4701–4709. https://doi.org/10.1073/pnas.1911349117
Nepomnyachiy S, Ben-Tal N, Kolodny R (2017) Complex evolutionary footprints revealed in an analysis of reused protein segments of diverse lengths. Proceedings of the National Academy of Sciences of the United States of America 114:11703–11708. https://doi.org/10.1073/pnas.1707642114
PDBe-KB consortium (2020) PDBe-KB: a community-driven resource for structural and functional annotations. Nucleic Acids Res 48:D344–D353. https://doi.org/10.1093/nar/gkz853
Pinna S, Kunz C, Halpern A, Harrison SA, Jordan SF, Ward J, Werner F, Lane N (2022) A prebiotic basis for ATP as the universal energy currency. PLoS Biology 20:1–25. https://doi.org/10.1371/journal.pbio.3001437
Putignano V, Rosato A, Banci L, Andreini C (2018) MetalPDB in 2018: A database of metal sites in biological macromolecular structures. Nucleic Acids Res 46:D459–D464. https://doi.org/10.1093/nar/gkx989
Preiner M, Asche S, Becker S, Betts HC, Boniface A, Camprubi E, Chandru K, Erastova V, Garg SG, Khawaja N, Kostyrka G, Machné R, Moggioli G, Muchowska KB, Neukirchen S, Peter B, Pichlhöfer E, Radványi Á, Rossetto D, Salditt A, Schmelling NM, Sousa FL, Tria FDK, Vörös D, Xavier JC (2020) The future of origin of life research: Bridging decades-old divisions. Life 10:20. https://doi.org/10.3390/life10030020
Qiu K, Ben-Tal N, Kolodny R (2022) Similar protein segments shared between domains of different evolutionary lineages. Protein Science 31:e4407. https://doi.org/10.1002/pro.4407
Reyes-Prieto F, Hernández-Morales R, Jácome R, Becerra A, Lazcano A (2012) Coenzymes, viruses and the RNA world. Biochimie 94:1467–1473. https://doi.org/10.1016/j.biochi.2012.01.004
Romero Romero ML, Yang F, Lin YR, Toth-Petroczy A, Berezovsky IN, Goncearenco A, Yang W, Wellner A, Kumar-Deshmukh F, Sharon M, Baker D, Varani G, Tawfik DS (2018) Simple yet functional phosphate-loop proteins. Proceedings of the National Academy of Sciences of the United States of America 115:E11943–E11950. https://doi.org/10.1073/pnas.1812400115
Russell MJ, Hall AJ (1997) The emergence of life from iron monosulphide bubbles at a submarine hydrothermal redox and pH front. J Geol Soc London 154:377–402. https://doi.org/10.1144/gsjgs.154.3.0377
Seitz C, Eisenreich W, Huber C (2021) The abiotic formation of pyrrole under volcanic, hydrothermal conditions—an initial step towards life’s first breath? Life 11:1–10. https://doi.org/10.3390/life11090980
Söding J, Lupas AN (2003) More than the sum of their parts: On the evolution of proteins from peptides. BioEssays 25:837–846. https://doi.org/10.1002/bies.10321
Thauer RK, Bonacker LG (1994) Biosynthesis of coenzyme F430, a nickel porphinoid involved in methanogenesis. Ciba Found Symp 180:210–22. https://doi.org/10.1002/9780470514535.ch12
The UniProt Consortium (2023) UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Res 51:D523–D531. https://doi.org/10.1093/nar/gkac1052
Toledo-Patiño S, Pascarelli S, Uechi GI, Laurino P (2022) Insertions and deletions mediated functional divergence of Rossmann fold enzymes. Proceedings of the National Academy of Sciences of the United States of America 119:e2207965119. https://doi.org/10.1073/pnas.2207965119
Tokuriki N, Tawfik DS (2009) Protein dynamism and evolvability. Science 324:203–7. https://doi.org/10.1126/science.1169375
Tretyachenko V, Vymětal J, Neuwirthová T, Vondrášek J, Fujishima K, Hlouchová K (2022) Modern and prebiotic amino acids support distinct structural profiles in proteins. Open Biol 12:220040. https://doi.org/10.1098/rsob.220040
Trifonov EN (2000) Consensus temporal order of amino acids and evolution of the triplet code. Gene 261:139–151. https://doi.org/10.1016/S0378-1119(00)00476-5
Velankar S, Dana JM, Jacobsen J, van Ginkel G, Gane PJ, Luo J, Oldfield TJ, O’Donovan C, Martin MJ, Kleywegt GJ (2013) SIFTS: Structure Integration with Function, Taxonomy and Sequences resource. Nucleic Acids Res 41:D483–9. https://doi.org/10.1093/nar/gks1258
Wächtershäuser G (1992) Groundworks for an evolutionary biochemistry: The iron-sulphur world. Progress in Biophysics and Molecular Biology 58:85–201. https://doi.org/10.1016/0079-6107(92)90022-X
Weber AL, Miller SL (1981) Reasons for the occurrence of the twenty coded protein amino acids. Journal of Molecular Evolution 17:273–84. https://doi.org/10.1007/BF01795749
White HB (1976) Coenzymes as fossils of an earlier metabolic state. Journal of Molecular Evolution 7:101–104. https://doi.org/10.1007/BF01732468
White HB (1982) Evolution of Coenzymes and the Origin of Pyridine Nucleotides. In: The Pyridine Nucleotide Coenzymes, pp. 1–17. https://doi.org/10.1016/b978-0-12-244750-1.50010-5
Wong JT, Bronskill PM (1979) Inadequacy of prebiotic synthesis as origin of proteinous amino acids. Journal of Molecular Evolution 13:115–25. https://doi.org/10.1007/BF01732867
Wu HH, Pun MD, Wise CE, Streit BR, Mus F, Berim A, Kincannon WM, Islam A, Partovi SE, Gang DR, DuBois JL, Lubner CE, Berkman CE, Lange BM, Peters JW (2022) The pathway for coenzyme M biosynthesis in bacteria. Proceedings of the National Academy of Sciences of the United States of America 119:e2207190119. https://doi.org/10.1073/pnas.220719011
Zaia DA, Zaia CT, De Santana H (2008) Which amino acids should be used in prebiotic chemistry studies? Orig Life Evol Biosph 38:469–88. https://doi.org/10.1007/s11084-008-9150-5

Article and author information

Author information

Alma Carolina Sanchez-Rocha
Department of Cell Biology, Faculty of Science, Charles University, BIOCEV, Prague, 12843, Czech Republic
Mikhail Makarov
Department of Cell Biology, Faculty of Science, Charles University, BIOCEV, Prague, 12843, Czech Republic
Lukáš Pravda
Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Exscientia, Oxford, UK
Marian Novotný
Department of Cell Biology, Faculty of Science, Charles University, BIOCEV, Prague, 12843, Czech Republic
ORCID iD: 0000-0001-8788-3202
- co-corresponding authors; email: klara.hlouchova@natur.cuni.cz
Klára Hlouchová
Department of Cell Biology, Faculty of Science, Charles University, BIOCEV, Prague, 12843, Czech Republic, Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences, Prague, 16610, Czech Republic
ORCID iD: 0000-0002-5651-4874
- co-corresponding authors; email: klara.hlouchova@natur.cuni.cz

Version history

Sent for peer review: December 4, 2023
Preprint posted: December 5, 2023
Reviewed Preprint version 1: February 26, 2024
Reviewed Preprint version 2: October 16, 2024
Version of Record published: December 4, 2025

Cite all versions

You can cite all versions using the DOI https://doi.org/10.7554/eLife.94174. This DOI represents all versions, and will always resolve to the latest one.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
