On the emergence of P-Loop NTPase and Rossmann enzymes from a Beta-Alpha-Beta ancestral fragment
Abstract
This article is dedicated to the memory of Michael G. Rossmann. Dating back to the last universal common ancestor, P-loop NTPases and Rossmanns comprise the most ubiquitous and diverse enzyme lineages. Despite similarities in their overall architecture and phosphate binding motif, a lack of sequence identity and some fundamental structural differences currently designate them as independent emergences. We systematically searched for structure and sequence elements shared by both lineages. We detected homologous segments that span the first βαβ motif of both lineages, including the phosphate binding loop and a conserved aspartate at the tip of β2. The latter ligates the catalytic metal in P-loop NTPases, while in Rossmanns it binds the nucleotide’s ribose moiety. Tubulin, a Rossmann GTPase, demonstrates the potential of the β2-Asp to take either one of these two roles. While convergence cannot be completely ruled out, we show that both lineages likely emerged from a common βαβ segment that comprises the core of these enzyme families to this very day.
Introduction
In 1970, Michael Rossmann reported the structure of the first αβα sandwich protein, lactate dehydrogenase (Adams et al., 1970). This NAD-utilizing enzyme would later become representative of what is now known as the ‘Rossmann fold’ (Rossmann et al., 1974). About a decade later, on the basis of a sequence analysis, another major αβα sandwich domain that utilizes phosphorylated nucleosides was proposed (Walker et al., 1982), which is now known as the P-loop NTPase, or ‘P-loop’ for short. The importance of these two evolutionary lineages, Rossmanns and P-loops, cannot be overstated: Both lineages have diversified extensively, and each is individually associated with more than 120 families and 75 different enzymatic reactions (see Methods). Furthermore, these two lineages are ubiquitous across the tree of life (Leipe et al., 2003). Accordingly, essentially all studies aimed at unraveling the history of protein evolution concluded that these enzymes emerged well before the last universal common ancestor (LUCA), and were among the very first, if not the first, enzyme families (Leipe et al., 2003; Ma et al., 2008; Edwards et al., 2013; Alva et al., 2015; Aravind et al., 2002a; Goncearenco and Berezovsky, 2015). Indeed, both P-loops and Rossmanns are dubbed nucleotide-binding domains because they both make use of phosphorylated ribonucleosides such as ATP or NAD, as well as of other pre-LUCA cofactors such as SAM (Laurino et al., 2016).
As elaborated in the next section, the P-loop and Rossmann domains share a number of similar features, but also some distinct differences. Given their pre-LUCA origin, a common P-loop/Rossmann ancestor – even if it did exist at some point – is surely lost to time. Both lineages emerged during the so-called ‘big bang’ of protein evolution, an event that marks the birth of the major protein classes, yet occurred too early to be reconstructed by phylogenetic means (Aravind et al., 2002a). Thus, a fundamental enigma surrounding the birth of the first enzymes is whether the Rossmann and the P-loop lineages diverged from a common ancestor, or perhaps, given that they both make use of phosphorylated ribonucleosides, have converged to similar structural and functional features. The former is the more evolutionarily appealing scenario, yet the latter is as common and tangible (Galperin and Koonin, 2012; Elias and Tawfik, 2012).
To address this longstanding question, we performed a detailed analysis looking for indications of common ancestry with respect to the core elements of these two classes, namely their most conserved and functionally critical structural elements. Indeed, global sequence homology between these lineages, or even shared short sequence motifs, cannot be detected. As such, large-scale analyses of protein homology (Alva et al., 2015), including SCOPe (Chandonia et al., 2017) and the Evolutionary Classifications of Protein Domains (ECOD) database (Cheng et al., 2014) classify P-loop NTPases and Rossmanns as independent evolutionary emergences. However, loss of detectable sequence homology would be expected between lineages that split in the distant past, especially if both have diverged extensively, as is the case for P-loops and Rossmanns. Nonetheless, structural anatomy (Laurino et al., 2016) and sophisticated ways of detecting sequence homology may assign common ancestry in highly diverged lineages on the basis of a few common sequence-structure features (Galperin and Koonin, 2012; Elias and Tawfik, 2012; Hildebrand et al., 2009; Nepomnyachiy et al., 2017). Further, parallel evolution may operate, with relics of an ancient common ancestor surfacing sporadically in contemporary proteins, thus resulting in detectable sequence and/or structural homology. Thus, if P-loops and Rossmanns do share common ancestry, we might expect the existence of ‘bridge proteins’; that is, proteins belonging to one lineage with features that are distinct for the other lineage.
Here, we report the detection and analysis of common features and bridge proteins between P-loops and Rossmanns. The existence of such common features and bridge proteins supports common ancestry, though it does not rule out convergence. Nonetheless, our results suggest what the key features of the ancestor(s) might have been, and indicate that even if these lineages emerged independently, their ancestors shared the very same features. To best frame this analysis, however, we must first dissect the canonical features of Rossmann and P-loop proteins.
Results and discussion
P-loop and Rossmann – similar but distinctly different
Both P-loops and Rossmanns adopt the αβα 3-layer sandwich fold (Figure 1). This fold, which comprises a parallel β-sheet sandwiched between two layers of α-helices, is among the most ancient, if not the most ancient, protein folds known (Ma et al., 2008; Aravind et al., 2002a; Bukhari and Caetano-Anollés, 2013; Winstanley et al., 2005). In essence, αβα sandwich proteins consist of a tandem repeat of β-loop-α elements, where the loops form the active-site (hereafter referred to as the ‘functional’ or ‘top’ loops; Figure 1A). The minimal P-loop or Rossmann domain comprises five β-loop-α elements linked via short ‘connecting’ or ‘bottom’ loops that generally have no functional role. Although many domains have six strands, and sometimes more, we will hereafter consider the minimal 5-stranded core domain for simplicity.
While the overall fold is conserved, the topology – specifically, the strand order of the interior β-sheet – differs between Rossmanns and P-loops. The Rossmann topology (β3-β2-β1-β4-β5) has a pseudo-2-fold axis of symmetry between β1-β3 and β4-β5 (Figure 1B; or β1-β3 and β4-β6 in the common 6-stranded domains). However, in the P-loop topology, at least two strands are swapped (Figure 1C). The most common P-loop topology is β2-β3-β1-β4-β5 (Leipe et al., 2003); although, as discussed below, P-loops can adopt several different strand topologies.
The second shared hallmark is that both P-loops and Rossmanns bind phosphorylated ribonucleoside ligands as substrates, co-substrates or cofactors (hereafter, phospho-ligands). While the overall mode of ligand binding differs, the binding modes of their phosphate moieties share a few similarities (Figure 2A,B): (i) The phosphate is bound by the first β-loop-α element which resides in the center of the domain (hereafter, β1-(phosphate binding loop)-α1); (ii) both phosphate binding loops mediate binding via a ‘nest’ of hydrogen bonds formed by backbone amides at the N-terminus of the first canonical α-helix (α1) as well as via residues from the loop itself; and (iii) both phosphate binding loops are glycine-rich sequences with similar patterns: the canonical Rossmann motif is GxGxxG, while the canonical P-loop motif, dubbed Walker A, is GxxGxGK[T/S]. To avoid confusion, P-loop is used here to refer to the evolutionary lineage of P-loop NTPases only. When referring to the phosphate binding element of a protein, with no relation to a specific protein lineage, phosphate binding loop (or PBL) is used. Hence, P-loop PBL relates to the phosphate binding loop of P-loop NTPases (the Walker A motif), and Rossmann PBL to the Rossmann’s phosphate binding loop. The structural segment in which the phosphate binding loop resides is accordingly dubbed β1-PBL-α1 or, more simply, β-PBL-α.
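The spacing difference between the two glycine-rich motifs named above becomes concrete when they are written as regular expressions. The following is a minimal sketch, assuming that 'x' stands for any single residue; the function name and both test sequences are invented for illustration and are not real protein sequences. In practice a PBL is identified in its structural context, and many members of both lineages deviate from the canonical patterns.

```python
import re

# 'x' in the motif notation is taken to mean any single residue (assumption).
ROSSMANN_PBL = re.compile(r"G.G..G")      # canonical Rossmann motif, GxGxxG
WALKER_A = re.compile(r"G..G.GK[TS]")     # canonical P-loop motif, GxxGxGK[T/S]


def find_pbl_motifs(sequence):
    """Return (motif_name, start_index, matched_text) for every motif hit."""
    hits = []
    for name, pattern in (("Rossmann GxGxxG", ROSSMANN_PBL),
                          ("Walker A", WALKER_A)):
        for match in pattern.finditer(sequence):
            hits.append((name, match.start(), match.group()))
    return hits


# Hypothetical toy sequences, invented for illustration only.
toy_rossmann = "MKVAVIGAGSMGHA"
toy_ploop = "MSLIGPSGSGKTTL"

print(find_pbl_motifs(toy_rossmann))
print(find_pbl_motifs(toy_ploop))
```

A pattern match of this kind only flags candidate loops; it says nothing about whether the segment sits between β1 and α1.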
However, despite similar phosphate binding elements, the mode of phospho-ligand binding by Rossmanns and P-loops is fundamentally different, and this difference relates to important functional differences between the two lineages. Although Rossmann and P-loop proteins both utilize phosphorylated nucleosides, the phosphate groups of these metabolites play a fundamentally different role. P-loops primarily catalyze phosphoryl transfer (including to water, i.e., hydrolysis) and thus most often operate on ATP and GTP with the help of a metal dication. Rossmanns, on the other hand, primarily use NAD(P), with the phosphate moieties serving only as a handle for binding, while the redox chemistry occurs elsewhere (e.g., the nicotinamide base in NAD+). These functional differences are accompanied by a number of structural differences in the mode of phosphate binding: The P-loop Walker A is a relatively long, surface-exposed loop that extends beyond the protein’s core and wraps, like the palm of a hand, around the phosphate moieties of the ligand (Figure 2A). The Rossmann PBL, however, is short and forms a flat interaction surface, with the phosphate groups interacting mostly with the N-terminus of α1 via a highly conserved and ordered water molecule (Figure 2B; Bottoms et al., 2002). Foremost, the orientation of the phospho-ligand being bound is different: The nucleoside moiety in Rossmanns is oriented ‘inside’, that is, in the direction of the β-sheet core, whereas in P-loops it points ‘outside’, i.e., away from the protein interior – an approximately 180-degree rotation compared to Rossmanns.
The above difference in orientation relates to differences in the interactions that Rossmanns and P-loops make with parts of the ribonucleotide ligands other than their phosphate moieties. In the canonical Rossmann binding site, both the phosphate moiety and the ribose moiety are bound. The phosphate interacts with the Rossmann PBL at the N-terminus of α1 while the ribose moiety is held in place by an Asp/Glu residue at the tip of β2 (Figure 2B). This acidic residue forms a unique bidentate interaction with the 2’ and 3’ hydroxyls of the ribose moiety, and was shown to be present as Asp in the earliest Rossmann ancestor (hereafter β2-Asp) (Laurino et al., 2016). In P-loops, on the other hand, the core of the αβα domain does not interact with the ribose, instead making more extensive, catalytically-oriented interactions with the phosphate moieties (via the Walker A PBL, Figure 2A, as well as other key residues). Foremost, phospho-ligand binding also involves coordination of a metal cation, mostly Mg2+, but also Ca2+, by two key conserved residues: the hydroxyl of the canonical serine or threonine of the Walker A motif (GxxGxGK[S/T]) and an Asp/Glu residing on the tip of an adjacent β-strand. This Asp/Glu residue is the crux of the so-called ‘Walker B’ motif, which is typically located at the tip of either β3 or β4 (see Shalaeva et al., 2018 for a detailed analysis).
A shared β-(phosphate binding loop)-α evolutionary seed?
Individually, any one of the shared features described above may relate to convergence. The αβα sandwich fold has likely emerged multiple times, independently (Medvedev et al., 2019). The key shared functional features – namely, a phosphate binding site at the N-terminus of an α-helix and a Gly-rich phosphate binding motif – were likely favored early in protein evolution because they effectively comprise the only mode of phosphate binding that can be realized with short and simple peptides (Longo et al., 2020). Thus, that Rossmanns and P-loops share Gly-rich loops and the same mode of phosphate binding may also be the outcome of convergence, especially because the overall mode of binding of their phospho-ligands fundamentally differs (Figure 2A,B).
Curiously, however, the phosphate binding site is located in the first β-loop-α element of both Rossmanns and P-loops. In fact, the β1-α1 location of the PBL is seen not only in P-loop and Rossmann proteins, but also in Rossmann-like protein classes such as flavodoxin and HUP. However, a closer examination reveals that, although rare, phosphate binding in αβα sandwich folds can occur at alternative locations, suggesting that there is no inherent, physical constraint on its location. An illustrative example can be found in the HUP lineage (ECOD X-group 2005) – a monophyletic group of 3-layer αβα sandwich, Rossmann-like proteins that includes Class I aminoacyl tRNA synthetases (Aravind et al., 2002b). Most families within this lineage achieve phosphate binding at the tip of α1, as do Rossmann and P-loop proteins. However, two families, the universal stress protein (Usp) family (F-group 2005.1.1.145) and electron transport flavoprotein (ETF; F-group 2005.1.1.132), both use the tip of α4 (Figure 3). Intriguingly, α4 resides on the other side of the β-sheet, just opposite to α1. Accordingly, this change in the PBL’s location results in a flip of ATP’s phosphate groups, while preserving all other features of the binding site, including the adenine’s location and the anchoring of the ribose moiety to β1 and β4 (Figure 3 mid panel). Thus, from a purely biophysical point of view, α1 and α4 are equivalent locations for phosphate binding. Nonetheless, the ancestral phosphate binding site in both P-loops and Rossmanns resides at the tip of α1 (as judged by α4 being a rare exception). This suggests that the positioning of the PBL at the tip of α1 in both Rossmann and P-loop proteins is a signal of shared ancestry rather than convergence. At a minimum, the identification of α4 as a feasible alternative supports a model of emergence of both lineages from a seed β-PBL-αβ fragment, as outlined further below. By this scenario, α4 only emerged at a later stage, well after phosphate binding had been established at α1.
A shared β2-Asp motif?
As outlined above, the β1-PBL-α1 segment of P-loops and Rossmanns likely represents a primordial polypeptide that could later be extended to give the modern αβα sandwich domains (Alva et al., 2015; Zheng et al., 2016). However, there are several indications that the ancestral, seeding peptide(s) of both P-loops and Rossmanns also contained β2 (Alva et al., 2015). In the case of the Rossmann, β2 of the seeding primordial peptide plays a functional role: an Asp at its tip forms a bidentate interaction with the hydroxyls of the nucleotide’s ribose (Figure 2B; Laurino et al., 2016). The putative Rossmann common ancestor thus comprises a β-PBL-α-β-Asp fragment. Might such a fragment also be the P-loop ancestor?
In fact, both families make use of an Asp residue at the tip of the β-strand just next to β1 – in P-loops this residue is the above-mentioned Walker B motif (Figure 2A). Is this feature also a sought-after signature of shared ancestry? In the simplest P-loop topology, the Walker B-Asp resides at the tip of the β-strand that is adjacent to β1, as it is in Rossmanns. Thus, putting aside the connectivity of the strands, both P-loop and Rossmann possess a functional core of two adjacent strands, one from which the PBL extends and the other with an Asp at its tip (Figure 2A,B). However, because in P-loops the strand topology is swapped, the Walker B-Asp resides at the tip of β3 in the primary sequence (in the simplest topology described in Figure 1C, and at the tip of β4 in another common topology as detailed below). As elaborated later, variations in the topology of P-loops support a model by which additional β-loop-α elements got inserted into the ancestral β-PBL-α-β-Asp seed fragment such that what was initially β2 became β3 (or even β4 in other P-loop families).
However, even if we put the question of topology aside for the moment, there remains the fundamental functional difference between the P-loop Walker B-Asp (binding of a catalytic dication) and the Rossmann β2-Asp (ribose binding; Figure 2A and B, respectively). Can this difference be reconciled? This question might be answered by identifying cases of parallel evolution, or ‘bridging proteins’. Specifically, we searched for examples of Rossmanns acting as NTPases, and examined whether they use a catalytic metal and, if so, whether this metal cation is bound by the β2-Asp.
Tubulin – a parallelly evolved Rossmann NTPase
As explained above, Rossmanns typically use the ligand’s phosphate moiety as a binding handle, whereas P-loops perform chemistry on the ligands’ phosphate groups. Thus, to discover bridging proteins, we looked at the minority of Rossmann families that do act as NTPases. In all but one of these, the NTP is bound in the canonical Rossmann mode, namely with the NTP’s ribose moiety bound to the β2-Asp (Figure 2—figure supplement 1). However, one family, tubulin, is an outlier. Tubulin is a GTPase first discovered in eukaryotes. With time, bacterial and archaeal tubulins were discovered, indicating that this lineage originated in the LUCA (Yutin and Koonin, 2012; Margolin et al., 1996). Tubulin has undisputable hallmarks of a Rossmann (Nogales et al., 1998a), as noted originally (Nogales et al., 1998b), and is categorized as such (ECOD family: 2003.1.6.1). The strand topology is distinctly Rossmann (3-2-1-4-5), with a phosphate binding loop located between β1 and α1. We further note that binding of GTP’s phosphate groups is mediated by a water molecule bound to the N-terminus of α1 (Figure 2C), as in canonical Rossmanns (Bottoms et al., 2002; Figure 2B) and in contrast to P-loops. However, as noticed by those who solved the first tubulin structures, GTP is oriented differently compared to the nucleotide cofactors bound by other Rossmanns (Nogales et al., 1998a). Our examination reveals that tubulin binds GTP in the P-loop NTPase mode – namely, with the nucleoside pointing away from the domain’s core (Figure 2C). Indeed, tubulin’s phosphate binding loop is truncated relative to other Rossmanns and adopts a conformation akin to a tight hairpin (Figure 2, Figure 2—figure supplement 2A). In fact, tubulin has a second phosphate binding loop that resides at the tip of α4 and has a critical role in catalysis, indicating that α4 can readily take the role of phosphate binding as seen in the HUP families described above (Figure 3).
Foremost, the β2-Asp interaction with the ribose, a hallmark of Rossmanns, is absent in tubulin (Figure 2C). Rather, the canonical Asp at the tip of β2 ligates a catalytic dication (Figure 2C). Further, tubulin’s binding of the dication adopts a P-loop-like octahedral geometry (Kanade et al., 2020), with both the β2-Asp of tubulin and the Walker B-Asp of P-loops interacting with water molecules that occupy equivalent coordination sites (Figure 2—figure supplement 3). The β2-Asp is essential for tubulin’s catalytic activity (Farr and Sternlicht, 1992), and its Walker B-like mode of action is seen across multiple tubulin structures (in many tubulins Asn is seen at the β2 tip position, though this β2-Asn also ligates the dication, either directly or via a water molecule [Figure 2—figure supplement 2B; Supplementary file 1]).
Tubulin therefore comprises an intriguing case of a Rossmann that evolved an NTPase function by reorienting the NTP substrate to bind in the P-loop NTPases mode and repurposing the canonical Rossmann β2-Asp to ligate a catalytic metal cation. Put differently, tubulin shows that the functional differences between the P-loop Walker B and the Rossmann β2-Asp can be reconciled.
Shared themes between Rossmanns and P-loops
Encouraged by tubulin, we endeavored to look for additional evidence for bridging proteins, ideally with respect to not only structure but also sequence. To this end, we employed the concept of a bridging theme – short stretches for which alignments are statistically significant (≥20 residues; HHSearch E-value < 10⁻³) yet with the flanking regions showing no detectable sequence homology (Kolodny et al., 2020). In the context of this work, we specifically searched for shared themes in structures that belong to Rossmanns (X-group 2003) on the one hand and P-loops (X-group 2004) on the other. By focusing the sequence homology search on evolutionarily-distinct domains, and by using bait sequences derived from validated sequence themes (Nepomnyachiy et al., 2017), the sensitivity and accuracy of this approach exceed those of standard HMM-based searches (further details about themes detected between other X-groups are described in a forthcoming manuscript; Kolodny et al., 2020). Given a stringent statistical threshold, only a few shared themes were detected, all involving the P-loop enzyme HPr kinase/phosphatase (F-group 2004.1.2.1; PDB: 1ko7; Figure 4A). A few different Rossmann F-groups share a theme with this P-loop, with sorbitol dehydrogenase (F-group 2003.1.1.417; PDB: 1k2w) and short chain dehydrogenase (F-group 2003.1.1.332; PDB: 3tjr) showing the greatest overlap (Figure 4).
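The acceptance criteria stated above (an alignment of at least 20 residues with an HHSearch E-value below 10⁻³, linking a Rossmann X-group 2003 domain to a P-loop X-group 2004 domain) can be written as a simple filter. This is a sketch only: the record layout, the field names and the numerical values of the example hits are hypothetical, it is not the published search pipeline, and it omits the check that the flanking regions lack detectable homology.

```python
from dataclasses import dataclass

MIN_THEME_LENGTH = 20     # residues, threshold quoted in the text
MAX_E_VALUE = 1e-3        # HHSearch E-value threshold quoted in the text
ROSSMANN_X_GROUP = "2003"
P_LOOP_X_GROUP = "2004"


@dataclass
class AlignmentHit:
    """Hypothetical record for one pairwise alignment between two domains."""
    query_f_group: str    # ECOD F-group identifier, e.g. "2004.1.2.1"
    target_f_group: str
    aligned_length: int
    e_value: float


def x_group(f_group):
    """The X-group is the first field of an ECOD F-group identifier."""
    return f_group.split(".")[0]


def is_bridging_theme(hit):
    """True if the hit links a Rossmann to a P-loop and passes both thresholds."""
    groups = {x_group(hit.query_f_group), x_group(hit.target_f_group)}
    return (groups == {ROSSMANN_X_GROUP, P_LOOP_X_GROUP}
            and hit.aligned_length >= MIN_THEME_LENGTH
            and hit.e_value < MAX_E_VALUE)


# Lengths and E-values are invented; only the F-group identifiers come from the text.
hits = [
    AlignmentHit("2004.1.2.1", "2003.1.1.417", 30, 1e-4),   # passes
    AlignmentHit("2004.1.2.1", "2003.1.1.332", 15, 1e-5),   # too short
    AlignmentHit("2004.1.2.1", "2004.1.1.166", 40, 1e-6),   # same X-group
    AlignmentHit("2003.1.1.417", "2004.1.2.1", 25, 5e-2),   # E-value too high
]
themes = [hit for hit in hits if is_bridging_theme(hit)]
print(themes)
```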
HPr kinase/phosphatase is a bifunctional bacterial enzyme that catalyzes the phosphorylation and dephosphorylation of a signaling protein (HPr; Márquez et al., 2002). The P-loop domain comprises its C-terminal domain and carries the kinase function (hereafter Hpr kinase). Remarkably, the Walker B-Asp of Hpr kinase resides at the tip of β2, rather than at β3 or β4 as in the canonical P-loops. Consequently, although no such constraint or steering was applied to the search algorithm, the detected shared theme encompasses an intact β1-PBL-α1-β2-Asp element in the Rossmann proteins (where this element is canonical) as well as in this unique P-loop family (Figure 4A). As expected, this element is conserved in both the P-loop Hpr kinase and in the Rossmann families, with the Gly residues of the phosphate binding motifs, and the β2-Asp’s being almost entirely conserved (Figure 4B). This result underscores the significance of the β1-PBL-α1-β2-Asp motif as the shared evolutionary seed of both Rossmanns and P-loops (detailed in the next section).
Consistent with the idea of parallel evolution, these bridging P-loop and Rossmann proteins seem to be at the fringes of their respective lineages. In the case of HPr kinase, the active site is characterized by a canonical Walker A motif but the Walker B-Asp is uncharacteristically situated at the tip of β2 (Figure 4D). Further, in P-loop families with the simplest topology, the Walker B-Asp resides at the tip of a β-strand that structurally resides next to β1 (β3, Figure 2A). However, in most P-loops, another strand, typically β4, is inserted between β1 and the strand carrying the Walker B-Asp (Figure 4—figure supplement 1; Leipe et al., 2003). Hpr kinase belongs to this second category; however, its intervening strand is highly unusual – an anti-parallel β-strand inserted between β1 and β2. In the primary sequence, the intervening strand is an N-terminal extension, and thus upstream of β1 (Figure 4C; annotated as β−1). Indeed, HPr kinase is classified as an outlier with respect to the greater space of P-loop proteins. The P-loop X-group in ECOD (X-group 2004) is split into two topology groups (T-groups): P-loop containing nucleoside triphosphate hydrolases, which includes 196 F-groups that represent the abundant, canonical P-loop proteins, and PEP carboxykinase catalytic C-terminal domain, which is comprised of just three F-groups. HPr kinase is classified as part of the latter. As discussed below, the variation in topology of HPr also highlights the structural plasticity of the P-loop fold with respect to insertions.
The sorbitol dehydrogenase and short chain dehydrogenase both have the canonical Rossmann strand topology and β2-Asp. Homology modeling of the enzyme-NAD+ complex, and inspection of closely related structures, suggest that binding of the NAD+ cofactor is also canonical (Figure 4D). However, the PBL of these two Rossmann proteins is nonstandard (GxxxGxG instead of the canonical Rossmann which is GxGxxG; Figure 4A). Further, although the structural positioning of the last two glycine residues is rather similar to that of canonical Rossmann proteins, the extended GxxxGxG motif results in an extended PBL with higher resemblance to the P-loop (Figure 4F). Indeed, a sequence alignment reveals that Hpr’s P-loop (which is canonical) and these nonstandard Rossmann PBLs are only a few mutations away from each other (Figure 4A and F, overlay in right panel).
An ancestral βαβ seed of both Rossmanns and P-loops
The above findings support the notion of a common Rossmann/P-loop ancestor, the minimal structure of which is βαβ. This ancestral polypeptide includes just two functional motifs: a phosphate binding loop and an Asp, which could play a dual role of either binding the ribose moiety of various nucleotides or of ligating a dication such as Mg2+ or Ca2+. Previously, such a polypeptide (i.e., β-PBL-α-β-Asp) has been proposed as the seed from which Rossmann enzymes emerged (Alva et al., 2015; Laurino et al., 2016 and references therein). In contrast, a βα element was assigned as the P-loop ancestral seed (i.e., a β-P-loop-α segment; Alva et al., 2015; Laurino et al., 2016, and references therein). Here, we argue that ancestral polypeptide(s) comprising a βαβ element gave rise to both lineages, and possibly that a single polypeptide served as a common ancestor of both lineages.
From an ancestral seed to intact domains
We further hypothesize that the above seed fragment was subsequently expanded by addition of β-loop-α and/or α-loop-β elements. Such expansion would have enabled a functional split, or sub-specialization, of the two separate lineages, Rossmann and P-loop, which then further evolved and massively diversified. In essence, this split regards two key elements – phospho-ligand binding and β-strand topology. Our analysis indicates the feasibility of both.
The plausibility of common descent of the P-loop Walker A and the Rossmann phosphate binding loops is indicated by the detected shared theme described above (Figure 4). Although the canonical motifs of both lineages differ, there still exist – particularly among Rossmann proteins – alternative motifs that could diverge via a few mutations to a Walker A P-loop. Other Rossmanns possess a GxGGxG motif that also represents a potential jumping board to a Walker A P-loop (Zheng et al., 2016). Given that it mediates binding rather than catalysis, and owing to its lower conservation, we speculate that a Rossmann PBL could be replaced by a Walker A-like P-loop. However, at present, how permissive the Rossmann PBL is to sequence changes that will render it Walker A-like is unclear. Future experimental work, possibly using the Rossmann enzymes indicated here (Figure 4), might lend support to the hypothesis that these two PBLs are indeed evolutionarily related. The identified shared themes also indicate that the β2-Asp of the presumed ancestral fragment could not only bind the ribose moiety as in Rossmanns, but also serve as a Walker B, as in P-loops. Tubulin lends further support, indicating that a β2-Asp can indeed play a dual role.
Expansion of the ancestral βαβ fragment would enable not only the sub-functionalization of the two functional elements described above, but also the fixation of two separate β-strand topologies – the sequential Rossmann topology versus the swapped P-loop one. Both folds are, in essence, a tandem repeat of β-loop-α elements (Figure 1). The evolutionary history of other repeat folds tells us that expansion typically occurs by duplication of the ancestral fragment (Eck and Dayhoff, 1966; Romero Romero et al., 2018; Zhu et al., 2016; Longo, 2020), or parts of it, but also by fusion of independently emerging fragments (Grishin, 2001; Setiyaputra et al., 2011). Regardless of the origin of the extending fragment(s), given a βαβ ancestor, similar processes could have given rise to both of these folds. As summarized in Figure 5, in both P-loops and Rossmanns, the first newly added β-strand would pack against the ancestral β-strand bearing the Asp residue – irrespective of whether the incoming β-strand was the result of a sequence insertion or a C-terminal extension (Step 2). In the case of a C-terminal extension by an αβ fragment, the strand bearing the β2-Asp is unperturbed and the strand topology is Rossmann-like; conversely, a sequence insertion of a βα element between the two ancestral β-strands results in a P-loop topology, with the β-strand that carries the Walker B-Asp becoming β3. In the next extension, the newly added fourth strand packs against the other side of the growing domain, next to β1 (Step 3; β4 added with its preceding helix, α3). Subsequent strand(s), β5 and onward, could in principle be added sequentially, one next to the other, to yield the intact domains as we know them today. Indeed, evidence that extensions at the edges of the core β-sheet can happen in both folds is provided by the existence of Rossmann and P-loop proteins with either five or six strands. Further, circular permutations are common in Rossmanns, as are non-canonical additions of a seventh strand, including anti-parallel strands, at either end of the β-sheet (Grishin, 2001).
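The expansion steps described above (a first new strand packing against the Asp-bearing strand, then later strands packing on the other edge, next to β1) can be traced with a small bookkeeping exercise. The sketch below is an illustration of that logic only, not a structural model. The labels `P`, `D`, `N1` to `N3` and the function name are invented here, and it is an assumption of the sketch that the fourth and fifth strands are both added at the C-terminus.

```python
def grow_domain(first_addition):
    """Trace strand order in the sheet for the expansion model described above.

    first_addition: "c_terminal" (an alpha-beta fragment appended after the
    Asp-bearing strand) or "insertion" (a beta-alpha element inserted between
    the two ancestral strands).
    Returns (spatial strand order numbered by sequence position,
             sequence number of the Asp-bearing strand).
    """
    # Seed: strand "P" carries the PBL, strand "D" carries the Asp.
    sequence_order = ["P", "D"]     # N- to C-terminus
    sheet_order = ["D", "P"]        # left to right across the beta-sheet

    # Step 2: the first new strand packs against the Asp-bearing strand.
    if first_addition == "c_terminal":
        sequence_order.append("N1")
    elif first_addition == "insertion":
        sequence_order.insert(1, "N1")
    else:
        raise ValueError("first_addition must be 'c_terminal' or 'insertion'")
    sheet_order.insert(0, "N1")

    # Step 3 onward (assumed C-terminal): later strands pack on the other
    # edge of the sheet, starting next to the PBL-bearing strand.
    for label in ("N2", "N3"):
        sequence_order.append(label)
        sheet_order.append(label)

    number = {label: i + 1 for i, label in enumerate(sequence_order)}
    topology = "-".join(str(number[label]) for label in sheet_order)
    return topology, number["D"]


print(grow_domain("c_terminal"))   # Rossmann-like outcome
print(grow_domain("insertion"))    # P-loop-like outcome
```

The only difference between the two runs is where the first new strand falls in the sequence; its position in the sheet is the same in both.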
Upon establishment of the core domain, transitions in the topology of β-sheets that result from strand swaps, or strand invasions, as has been documented (Grishin, 2001), were likely key drivers of divergence and sub-functionalization. In particular, the P-loop lineage seems to have undergone various strand swaps and insertions that gave rise to a variety of topologies (Leipe et al., 2003), including the noncanonical one, with an antiparallel strand, seen in Hpr kinases (Figure 4C). Indeed, a survey of P-loop F-groups reveals multiple strand topologies (Figure 4—figure supplement 1 and Leipe et al., 2003). In general, families that catalyze phosphoryl transfer, namely kinases such as thymidylate kinase (F-group 2004.1.1.166), but also GTPases such as elongation factor Tu (F-group 2004.1.1.258), tend to have the simplest 2-3-1-4-5 topology illustrated in Figure 1C (see Leipe et al., 2003). In these proteins, the Walker A P-loop and the Walker B-Asp reside on adjacent strands, with the Walker B motif on the tip of β3 (as illustrated in Figure 2A). On the other hand, ‘motor proteins’, in which ATPase activity drives a large conformational change that turns into some further action, such as helicases or the ATP cassette of ABC transporters, tend to have a strand inserted between β1 and β3, to yield a 2-3-4-1-5 topology. Here, the Walker B-Asp is also situated on the tip of β3, yet with an intervening strand (β4) between the Walker A and Walker B motifs. The split between these topologies is ancient, likely predating the LUCA (Leipe et al., 2003), and supports the hypothesis that early events of fusions as well as insertions were associated with the functional radiation of the P-loop lineage.
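The topologies quoted above differ in how many strands separate β1 (which carries the phosphate binding loop) from the Asp-bearing strand. A short sketch makes the comparison explicit; the helper name and the dictionary layout are invented for illustration, and the strand orders are those given in the text.

```python
def intervening_strands(topology, strand_a, strand_b):
    """Count strands lying between two strands in the sheet.

    topology: spatial strand order written as in the text, e.g. "2-3-1-4-5".
    """
    order = [int(label) for label in topology.split("-")]
    return abs(order.index(strand_a) - order.index(strand_b)) - 1


# (topology, strand carrying the Asp) for the cases discussed in the text.
CASES = {
    "Rossmann": ("3-2-1-4-5", 2),
    "P-loop, simplest topology": ("2-3-1-4-5", 3),
    "P-loop, motor proteins": ("2-3-4-1-5", 3),
}

gaps = {name: intervening_strands(topology, 1, asp_strand)
        for name, (topology, asp_strand) in CASES.items()}
print(gaps)
```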
In contrast to the P-loop's variable topologies, the pseudo-symmetrical Rossmann topology seems highly conserved (in a previous analysis of the Rossmann fold, we did not detect a single structure annotated as Rossmann with a swapped strand topology; Laurino et al., 2016). Further, the very same topology appears in other domains, so-called Rossmann-like, or Rossmannoid domains (foremost, flavodoxin, 2-1-3-4-5, and HUP, 3-2-1-4-5). The latter two also represent ancient, pre-LUCA phospho-ligand binding domains that likely evolved independently of the Rossmann (Medvedev et al., 2019) and converged to the same topology. That convergence to the Rossmann topology occurred frequently may relate to the higher thermodynamic and/or kinetic stability of the symmetric strand topology. Indeed, the design of Rossmann-like proteins is readily realized compared to P-loop-like proteins with the swapped strand topology (Romero Romero et al., 2018). Furthermore, systematic assays of refoldability of the E. coli proteome, and a comparison of the folds to which these proteins belong, indicated that Rossmann is among the most refoldable folds while P-loop NTPases are among the poorly refolding ones (To et al., 2020).
Concluding remarks
Protein evolution spans nearly 4 billion years, with the founding events occurring pre-LUCA. As such, for many protein families, definitive assignment of homologous versus analogous relationships (shared ancestry versus convergent evolution) may never be possible (Aravind et al., 2002a). Confounding matters further, early constraints on protein sequence and structure limited the number of possible solutions to a subset of structures and binding motifs (Longo et al., 2020), making convergence a more likely scenario, particularly in the most ancient proteins. Thus, although these lineages were discovered several decades ago, whether the P-loop NTPase and Rossmann lineages diverged or converged remains an open question. The availability of thousands of structures, of highly curated databases that catalogue them (Chandonia et al., 2017; Cheng et al., 2014), and of sensitive search methods (Hancock et al., 2004) and algorithms (Nepomnyachiy et al., 2014) allows this question to be reexamined. Here, we provide evidence in favor of common ancestry between these lineages, though convergence cannot and should not be entirely ruled out. Whether by convergence or divergence, our analysis suggests that both lineages emerged from a polypeptide comprising a β-PBL-α-β-Asp fragment. Such a polypeptide was likely the ancestor of both P-loops and Rossmanns, be it the same polypeptide or two (or more) independently emerged ones. Reconstruction of ancestral polypeptides (Longo, 2020), including 40-residue polypeptides that relate to the P-loop NTPase ancestor (Vyas, 2020), may allow us to further examine the common versus independent emergence scenarios.
Materials and methods
The functional diversity of the P-loop and Rossmann lineages
In total, three X-groups comprising 663 ECOD F-groups were analyzed (Supplementary file 2): P-loop-like (X-group 2004; 157 F-groups), Rossmann-like (X-group 2003; 168 F-groups) and Rossmann-like structures with the crossover (X-group 2111; 338 F-groups). For this analysis, ECOD version develop210 was used and X-groups 2003 and 2111 were merged. The sequences of each F-group (70% identity cutoff) were mapped to a SUPERFAMILY (Wilson et al., 2009) entry with HMMsearch (Hancock et al., 2004) using the HMM profiles provided by the SUPERFAMILY database. The SUPERFAMILY EC2Domain mapping file was used to collect the Enzyme Commission (EC) classes associated with each family. In total, we identified 75 EC classes associated with P-loops (X-group 2004) and 727 with Rossmanns (X-groups 2003 and 2111). Within all three X-groups, the majority of families exhibit transferase activity (2.-.-.-). Within the Rossmann-like X-group, oxidoreductases (1.-.-.-) are also common. For both P-loop and Rossmann-like structures with the crossover, the second most common enzyme activity is hydrolase (3.-.-.-), while for Rossmann-like families, hydrolase activity is the least common.
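The tallying of top-level EC classes described above can be illustrated with a short Python sketch. It is not the authors' pipeline: the function name `tally_ec_classes` and the example EC numbers are illustrative assumptions, and the real analysis drew its annotations from the SUPERFAMILY EC2Domain mapping file. Only the F-group counts are taken from the text.

```python
from collections import Counter

# Names of the top-level EC classes (first digit of an EC number).
EC_TOP_LEVEL = {
    "1": "oxidoreductase", "2": "transferase", "3": "hydrolase",
    "4": "lyase", "5": "isomerase", "6": "ligase", "7": "translocase",
}


def tally_ec_classes(ec_numbers):
    """Count top-level EC classes, e.g. '2.7.4.9' is counted as 'transferase'."""
    return Counter(EC_TOP_LEVEL[ec.split(".")[0]] for ec in ec_numbers)


# F-group counts per X-group, as stated in the text.
f_groups = {"2004": 157, "2003": 168, "2111": 338}
print(sum(f_groups.values()))  # 663 F-groups in total

# Hypothetical annotations for one X-group (illustration only).
example = ["2.7.4.9", "2.7.1.21", "3.6.5.3", "1.1.1.27"]
for ec_class, count in tally_ec_classes(example).most_common():
    print(ec_class, count)
```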
Identification of shared themes between P-loops and Rossmanns
We used HHSearch (version 3.0.0) (Hildebrand et al., 2009) to compare a set of previously curated themes (Nepomnyachiy et al., 2017) to a 70% non-redundant set of ECOD domains (version develop210). Using an E-value threshold of 10⁻³, a coverage threshold of 85% (for the local alignment), and a minimal length of 20 residues, we identified 267 themes with significant hits to proteins belonging to both ECOD X-groups 2004.1 (P-loop domains-like) and 2003.1 (Rossmann-like). All of these themes matched the same P-loop domain (e1ko7A1) with various Rossmann domains. To reduce the extensive redundancy among the themes, which in turn leads to redundancy in the detected proteins, we kept only two representative Rossmann domains per theme. To identify the representative domains, we re-aligned the parts matching the theme using a Smith-Waterman (SW) or Needleman-Wunsch (NW) alignment, and the parts before and after the theme using an SW alignment. The representative domains are the ones with the most similar matching parts and the most dissimilar flanking parts. A p-value for the aligned parts was calculated from the significance of the alignment score relative to scores from alignments of random segments. Here, we estimated the parameters of the extreme value distribution (EVD) from the scores of the alignments between one of the two well-aligned segments and 1000 randomly chosen segments drawn from a multinomial distribution estimated from the other of the two well-aligned segments. We kept only cases where the matching parts have a score with a p-value lower than 0.05. This procedure resulted in a set of 57 Rossmann domains, each aligned to the Hpr kinase (PDB: 1ko7). For 50 of these 57 hits, the matching parts were aligned with the SW local alignment and the rest were aligned by NW global alignment.
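The three hit thresholds, and the requirement that a theme match both X-groups, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the `Hit` record, the function names, the inclusive comparisons, and the example hits are all hypothetical, and the sketch does not parse HHSearch output.

```python
from dataclasses import dataclass

E_VALUE_MAX = 1e-3    # E-value threshold from the text
COVERAGE_MIN = 0.85   # coverage threshold from the text (as a fraction)
LENGTH_MIN = 20       # minimal aligned length in residues, from the text


@dataclass
class Hit:
    """One theme-to-domain match (hypothetical record layout)."""
    theme_id: str
    x_group: str      # e.g. "2004.1" or "2003.1"
    e_value: float
    coverage: float   # fraction covered by the local alignment
    length: int       # aligned length in residues


def passes(hit):
    """Apply the three thresholds; inclusive bounds are an assumption."""
    return (hit.e_value <= E_VALUE_MAX
            and hit.coverage >= COVERAGE_MIN
            and hit.length >= LENGTH_MIN)


def bridging_themes(hits, groups=("2004.1", "2003.1")):
    """Return themes with passing hits in every X-group listed in `groups`."""
    seen = {}
    for hit in hits:
        if passes(hit):
            seen.setdefault(hit.theme_id, set()).add(hit.x_group)
    return sorted(t for t, found in seen.items() if set(groups) <= found)


# Hypothetical hits, for illustration only.
hits = [
    Hit("theme_A", "2004.1", 1e-5, 0.90, 32),
    Hit("theme_A", "2003.1", 5e-4, 0.88, 30),
    Hit("theme_B", "2004.1", 1e-6, 0.95, 40),
    Hit("theme_B", "2003.1", 1e-2, 0.95, 40),  # fails the E-value threshold
]
print(bridging_themes(hits))
```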
Here, we report the 51 cases that match the β1-α1-β2 regions (26 from ECOD F-group 2003.1.1.417, 18 from 2003.1.1.332, 5 from 2003.1.1.410, and 2 from 2003.1.1.11; Supplementary file 3). The alignments presented in the manuscript are the global alignments recalculated for the β-PBL-α-β elements in themes that bridge the two evolutionary lineages.
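The significance estimate used in this section (an alignment score compared against scores for 1000 random segments, with an extreme value distribution fitted to the random scores) can be sketched in Python. Several choices below are assumptions made only for illustration, not the authors' settings: the match/mismatch/gap scores, the method-of-moments Gumbel fit, the function names, and the use of the two PBL sequences quoted elsewhere in the Methods as the pair of segments.

```python
import math
import random
from collections import Counter

EULER_GAMMA = 0.5772156649015329


def sw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
        best = max(best, max(cur))
        prev = cur
    return best


def random_segments(template, n, rng):
    """Draw n segments from the residue composition of `template`."""
    counts = Counter(template)
    letters = sorted(counts)
    weights = [counts[c] for c in letters]
    return ["".join(rng.choices(letters, weights=weights, k=len(template)))
            for _ in range(n)]


def gumbel_pvalue(score, null_scores):
    """P(S >= score) under a Gumbel EVD fitted by the method of moments."""
    n = len(null_scores)
    mean = sum(null_scores) / n
    var = sum((s - mean) ** 2 for s in null_scores) / (n - 1)
    beta = math.sqrt(6.0 * var) / math.pi
    if beta == 0.0:  # degenerate null distribution
        return 0.0 if score > mean else 1.0
    mu = mean - EULER_GAMMA * beta
    z = (score - mu) / beta
    return 1.0 - math.exp(-math.exp(min(-z, 700.0)))  # cap avoids overflow


rng = random.Random(0)  # fixed seed so the sketch is reproducible
seg_a, seg_b = "FGLSGTGKTTL", "VGPNGSGKSTV"  # illustrative segments
null = [sw_score(seg_a, r) for r in random_segments(seg_b, 1000, rng)]
p = gumbel_pvalue(sw_score(seg_a, seg_b), null)
print(f"p-value = {p:.3g}; kept at the 0.05 cutoff: {p < 0.05}")
```

A substitution matrix and affine gap penalties would be the more usual choice for protein alignments; the simple scoring here keeps the sketch self-contained.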
Modeling ligand placements in unliganded structures
Hpr kinase (e1ko7A1) does not have a ligand bound. The conformation of the PBL, however, is canonical. Thus, although no structure from the same F-group has a relevant ligand bound, the overall positioning of the ligand can be estimated by overlaying a canonical PBL with a bound ligand from a different P-loop F-group. To generate Figure 4C, the PBL (residues 247–257) and ligand from ECOD domain e6at2A2 were aligned to residues 150–160 in chain A of 1ko7. This structure was chosen because it is in the same T-group as Hpr kinase (2004.1.2.-) and, although the two domains have nearly undetectable sequence identity, they share the same general topology, including the inserted anti-parallel β-strand adjacent to β1. The sequences of the PBLs (FGLSGTGKTTL and VGPNGSGKSTV for 6at2 and 1ko7, respectively; five of the 11 positions are identical) show high similarity, as do the structures of the PBLs that were aligned to generate the modeled ligand (Cα RMSD of 0.49 Å).
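The similarity of the two PBL sequences can be checked position by position. The short Python sketch below (the function name `identical_positions` is illustrative) uses only the two sequences and residue ranges quoted above, and reports which of the equal-length positions carry the same residue.

```python
def identical_positions(a, b):
    """Return 1-based positions at which two equal-length sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b), start=1) if x == y]


pbl_6at2 = "FGLSGTGKTTL"  # residues 247-257 of e6at2A2, as quoted in the text
pbl_1ko7 = "VGPNGSGKSTV"  # residues 150-160 of 1ko7 chain A, as quoted

# Both residue ranges span 11 residues, matching the sequence lengths.
assert (257 - 247 + 1) == (160 - 150 + 1) == len(pbl_6at2) == len(pbl_1ko7)

positions = identical_positions(pbl_6at2, pbl_1ko7)
print(positions)                                  # [2, 5, 7, 8, 10]
print("".join(pbl_6at2[i - 1] for i in positions))  # GGGKT
print(f"{len(positions)}/{len(pbl_6at2)} identical")
```

The identical residues are the glycines and the lysine and threonine of the phosphate binding loop.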
Calculating consensus sequences and residue conservation scores
The relevant ECOD F-groups (Figure 4) were mapped to the corresponding Pfam families. Since 2003.1.1.417 and 2003.1.1.332 are associated with one Pfam family, they were analyzed jointly. Seed alignments were extracted from Pfam, clustered at 70% sequence identity using CD-HIT (Fu et al., 2012), and the consensus sequence and conservation scores were calculated for the shared region (theme) using Jalview.
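To illustrate what a consensus sequence and per-column conservation score look like, the following Python sketch computes both for a toy alignment. It is an assumption-laden stand-in, not a reimplementation of Jalview: the conservation measure used here (one minus the Shannon entropy normalized by log 20) is a simple alternative chosen for this sketch, and the toy alignment and function name are hypothetical.

```python
import math
from collections import Counter


def consensus_and_conservation(alignment, gap="-"):
    """Return (consensus string, per-column scores in [0, 1]).

    The score is 1 - H / log(20), where H is the Shannon entropy of the
    non-gap residues in the column (an assumed, simplified measure).
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    consensus, scores = [], []
    for column in zip(*alignment):
        counts = Counter(c for c in column if c != gap)
        total = sum(counts.values())
        if total == 0:  # column made only of gaps
            consensus.append(gap)
            scores.append(0.0)
            continue
        consensus.append(counts.most_common(1)[0][0])
        entropy = -sum((n / total) * math.log(n / total)
                       for n in counts.values())
        scores.append(1.0 - entropy / math.log(20))
    return "".join(consensus), scores


# Toy alignment (hypothetical sequences, for illustration only).
toy = ["GSGKT", "GTGKS", "GSGKS"]
cons, cons_scores = consensus_and_conservation(toy)
print(cons)
print([round(s, 2) for s in cons_scores])
```

Fully conserved columns score 1.0, and columns with mixed residues score lower.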
Data availability
All data analysed in this manuscript are from publicly available archives such as the Protein Data Bank.
References
- Trends in protein evolution inferred from sequence and structure analysis. Current Opinion in Structural Biology 12:392–399. https://doi.org/10.1016/S0959-440X(02)00334-2
- Monophyly of class I aminoacyl tRNA synthetase, USPA, ETFP, Photolyase, and PP-ATPase nucleotide-binding domains: implications for protein evolution in the RNA world. Proteins: Structure, Function, and Genetics 48:1–14. https://doi.org/10.1002/prot.10064
- SCOPe: manual curation and artifact removal in the structural classification of proteins - extended database. Journal of Molecular Biology 429:348–355. https://doi.org/10.1016/j.jmb.2016.11.023
- ECOD: an evolutionary classification of protein domains. PLOS Computational Biology 10:e1003926. https://doi.org/10.1371/journal.pcbi.1003926
- Exploring fold space preferences of new-born and ancient protein superfamilies. PLOS Computational Biology 9:e1003325. https://doi.org/10.1371/journal.pcbi.1003325
- Divergence and convergence in enzyme evolution: parallel evolution of paraoxonases from Quorum-quenching lactonases. Journal of Biological Chemistry 287:11–20. https://doi.org/10.1074/jbc.R111.257329
- Site-directed mutagenesis of the GTP-binding domain of beta-tubulin. Journal of Molecular Biology 227:307–321. https://doi.org/10.1016/0022-2836(92)90700-T
- Divergence and convergence in enzyme evolution. Journal of Biological Chemistry 287:21–28. https://doi.org/10.1074/jbc.R111.241976
- Fold change in evolution of protein structures. Journal of Structural Biology 134:167–185. https://doi.org/10.1006/jsbi.2001.4335
- Fast and accurate automatic structure prediction with HHpred. Proteins: Structure, Function, and Bioinformatics 77:128–132. https://doi.org/10.1002/prot.22499
- A distinct motif in a prokaryotic small Ras-Like GTPase highlights unifying features of walker B motifs in P-Loop NTPases. Journal of Molecular Biology 432:5544–5564. https://doi.org/10.1016/j.jmb.2020.07.024
- Evolution and classification of P-loop kinases and related proteins. Journal of Molecular Biology 333:781–815. https://doi.org/10.1016/j.jmb.2003.08.040
- Characters of very ancient proteins. Biochemical and Biophysical Research Communications 366:607–611. https://doi.org/10.1016/j.bbrc.2007.12.014
- Functional analysis of Rossmann-like domains reveals convergent evolution of topology and reaction pathways. PLOS Computational Biology 15:e1007569. https://doi.org/10.1371/journal.pcbi.1007569
- Tubulin and FtsZ form a distinct family of GTPases. Nature Structural Biology 5:451–458. https://doi.org/10.1038/nsb0698-451
- Simple yet functional phosphate-loop proteins. PNAS 115:E11943–E11950. https://doi.org/10.1073/pnas.1812400115
- SUPERFAMILY—sophisticated comparative genomics, data mining, visualization and phylogeny. Nucleic Acids Research 37:D380–D386. https://doi.org/10.1093/nar/gkn762
Article and author information
Author details
Funding
Volkswagen Foundation (94747)
- Dan S Tawfik
- Rachel Kolodny
- Nir Ben-Tal
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
This research has been supported by Grant 94747 by the Volkswagen Foundation. NB-T’s research is supported in part by the Abraham E Kazan Chair in Structural Biology, Tel Aviv University. DST is the Nella and Leon Benoziyo Professor of Biochemistry. We are grateful to Ita Gruic-Sovulj for her role in the analysis of the HUP domain that led to Figure 3, and to Andrei Lupas for insightful and critical comments.
Copyright
© 2020, Longo et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.