Abstract
Wnt proteins are critical signaling molecules in developmental processes across animals. Despite intense study, their evolutionary roots have remained enigmatic. Using sensitive sequence analysis and structure modeling, we establish that the Wnts are part of a vast assemblage of domains, the Lipocone superfamily, defined here for the first time. It includes previously studied enzymatic domains like the phosphatidylserine synthases (PTDSS1/2) and the TelC toxin domain from Streptococcus intermedius, the enigmatic VanZ proteins, the animal Serum Amyloid A (SAA) and a further host of uncharacterized proteins in a total of 30 families. Although the metazoan Wnts are catalytically inactive, we present evidence for a conserved active site across this superfamily, versions of which are consistently predicted to operate on head groups of either phospholipids or polyisoprenoid lipids, catalyzing transesterification and phosphate-containing head group cleavage reactions. We argue that this superfamily originated as membrane proteins, with one branch (including Wnt and SAA) evolving into diffusible versions. By comprehensively analyzing contextual information networks derived from comparative genomics, we establish that they act in varied functional contexts, including regulation of membrane lipid composition, extracellular polysaccharide biosynthesis, and biogenesis of bacterial outer-membrane components, like lipopolysaccharides. On multiple occasions, members of this superfamily, including the bacterial progenitors of Wnt and SAA, have been recruited as effectors in biological conflicts spanning inter-organismal interactions and anti-viral immunity in both prokaryotes and eukaryotes. These findings establish a unifying theme in lipid biochemistry, explain the origins of Wnt signaling and provide new leads regarding immunity across the tree of life.
Graphical abstract

Introduction
The canonical Wnt signaling network is central to developmental decisions across animals relating to axis patterning, cell fate, cell migration and proliferation, and systems morphogenesis at many levels (1–7). Other crucial pathways, dubbed non-canonical Wnt signaling pathways, include those that regulate planar cell polarity and intracellular calcium levels (8–10). With these roles in development and homeostasis, dysfunction of Wnt signaling is causally associated with a range of diseases, including diverse cancer types and type II diabetes (11,12). Wnt signaling networks are centered on the secreted Wnt proteins acting as both paracrine and autocrine diffusible, extracellular messenger molecules (13). Wnt proteins are ligands for the N-terminal, cysteine-rich CBD/Fz domains of the Frizzled class of G-protein coupled receptors (GPCRs) (14,15). Modifications of the Wnt proteins via palmitoleoylation and glycosylation at internal sites are associated with their secretion (16). Palmitoleoylation of Wnt occurs at a conserved serine residue and is also required for recognition by the Frizzled receptors (17). Binding of the Frizzled receptor by Wnt recruits the Disheveled (Dsh) protein to its cytoplasmic face, in turn triggering a bevy of downstream responses, resulting in β-catenin stabilization in canonical pathways (18,19). When β-catenin concentrations cross a threshold, it is translocated into the nucleus, where it acts as a transcriptional coactivator, usually with an HMG domain transcription factor, to stimulate multiple transcriptional programs (20–22).
Although the Wnt protein was discovered over 40 years ago, its evolutionary origins have, until recently, been mysterious (23). In 2020, our group reported the discovery of the first prokaryotic versions of the Wnt domain (24). Using comparative genomics, we showed that these bacterial Wnt domains present contexts characteristic of toxins or effectors in biological conflict systems (24,25). Prompted by these initial observations, we set out to comprehensively identify and computationally characterize the evolutionary relationships of these newly identified Wnt homologs in an effort to understand their evolutionary history and predict their functions.
Consequently, we were able to unify the Wnt family with several other domains into a large superfamily described for the first time herein. These include two biochemically characterized families that were hitherto not known as Wnt homologs: the phosphatidylserine synthase (PTDSS1/2, EC: 2.7.8.29) (26–28) and the toxin domain of TelC from Streptococcus intermedius (29). However, the majority of the families we unify are either reported for the first time or are functionally poorly understood, including the animal Serum Amyloid A (SAA) (30) and the vancomycin resistance protein VanZ families (31,32). Our comparative genomics analyses, paired with existing experimental evidence, suggest that the superfamily is broadly comprised of enzymes operating on lipid head groups (e.g., transesterification reactions) in a diversity of biochemical contexts, notably including the regulation of membrane composition, extracellular biopolymer metabolism and as effectors in biological conflicts. Thus, we identify a unifying theme across diverse aspects of lipid metabolism.
Results
Identification of the structural core of the Wnt domain
Although the structure of Wnt was described over a decade prior (33–35), its origins have been a mystery as it is phyletically restricted to Metazoa. Much attention has been focused on the three extended β-hairpins and a poorly structured loop extruding out of the core, their stabilizing cysteine residues, and the absolutely conserved serine residue, the site of palmitoleoylation (34,36) (Figure 1A). Our discovery of the first prokaryotic Wnt domains helped define its ancestral α-helical core, revealing the cysteine-rich extensions as Metazoa-specific insertions. Comparison of the core of the metazoan Wnt with AlphaFold structural models of the prokaryotic versions (24) revealed a shared globular domain composed of five α-helices (Figure 1A). As the prokaryotic homologs retained just the conserved core of the Wnt proteins, we named these the minimal Wnt (Min-Wnt) family. The core helices of the Min-Wnt family contained absolutely conserved sequence motifs (Figure 2), consistent with the enzymatic function we had earlier proposed for them (24) (see below).

(A) The four individual helices forming the core of the Lipocone superfamily are consistently colored across the illustrated representatives. The inter-helix linkers are colored gray, and lineage-specific synapomorphic insertions and extensions are colored light brown. Active site and other residues of interest are rendered as ball-and-stick. Protein Data Bank (PDB) IDs or Genbank accessions used to generate AF3 models are provided. (B) Relationship network of the Lipocone families. The thickness of the edges is scaled by the negative-log HHalign p-values. Families are colored according to the community identified by the Leiden algorithm (37) (see Methods). (C) Box plots displaying core helix transmembrane propensity scores of individual sequences within different Lipocone families. The horizontal divider represents the boundary between typical TM and soluble sequences.

Sequence logo of conserved core elements of the Lipocone families.
These correspond to the core helices H2, H3, and H4. The three conserved active site residue positions are boxed in dotted lines with the inferred ancestral residue indicated at the top of the alignment. Families are grouped and labeled on the left in their higher-order clades.
Having defined this shared core, we initiated sequence-based homology searches in an effort to identify remote homologs. Iterative position-specific scoring matrix (PSSM)-based searches (see Methods) initially recovered animal and bacterial versions of the Serum Amyloid A (SAA) proteins, and further rounds of searching initiated from this set of sequences recovered a vast collection of additional homologous families. As an example, a search initiated with a bacterial SAA-like sequence from Bdellovibrio bacteriovorus (GenBank acc: AHZ84906.1) retrieved sequences overlapping with the Pfam models for the “Domain of unknown function” DUF2279 (acc: WP_146898260.1, iteration: 5, e-value: 0.004) and the DUF4056 domain (acc: MBW8016507.1, iteration: 5, e-value: 0.005), as well as sequences automatically annotated as “YfiM” in the GenBank database (acc: WP_019077413.1, iteration: 4, e-value: 0.004). Sequence profile-profile searches with HHpred confirmed these relationships and captured more distant ones. For instance, an HHpred search initiated with the Bacteriovorax stolpii Min-Wnt domain (acc: WP_102242990.1, residues 1-109) recovered the Pfam Wnt profile (PF: PF00110.23, p-value: 1.5e-6) and the Pfam SAA profile (PF: PF00277.22, p-value: 3.7e-5). Similarly, an HHpred search initiated with a Gemmatimonadetes sequence (acc: PYP94660.1, residues 75-170) recovered the DUF2279 Pfam profile (PF: PF10043.12, p-value: 5.4e-21), the DUF2238 profile (PF: PF09997.12, p-value: 1.3e-07), and the DUF4056 Pfam profile (PF: PF13265.9, p-value: 2.5e-05), among others.
Exhaustion of these searches, followed by clustering and manual inspection of the multiple sequence alignments of the retrieved sequences (see Methods), revealed a shared four-helix core across all of them, hereinafter referred to as H1 through H4 (Figure 1A). This 4-helix core of the domain was further confirmed by inspection of AlphaFold structural models constructed for representatives of the individual families, along with the rare instances of experimentally determined structures. These comparisons established that the above-mentioned fifth C-terminal helix in the Wnt core is a synapomorphy (shared derived character) restricted to the Wnts and closely related families like SAA (Figure 1A). In all, the results of our clustering analysis tallied 30 distinct families constituting a large superfamily. Remarkably, of these, 17 families had no pre-existing annotations. Phyletic analysis of individual families revealed a range of distributions, from broad conservation across multiple superkingdoms of Life to restriction to a small number of lineages (see below, Figure S1). A relationship network for the superfamily was constructed based on p-value and e-value scores using alignments of each family as a query in HHalign profile-profile searches against the rest (see Methods, Figure 1B). The Leiden community detection algorithm (37) was then applied to this network to identify higher-order assemblages (see Methods). These groupings were also supported by structural synapomorphies, such as a circular permutation and versions with a two-stranded ‘handle’ (see below).
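The construction of such a relationship network can be illustrated with a minimal sketch. It assumes that each pairwise profile-profile comparison yields a single p-value, keeps pairs below an assumed cutoff, and weights each edge by the negative log10 of the p-value. The family names, p-values, and cutoff are made-up placeholders, not results from this study, and the grouping step uses plain connected components as a simple stand-in; it is not the Leiden algorithm.

```python
import math
from collections import defaultdict

def build_network(pairs, max_p=1e-3):
    """Return {(a, b): weight}, with weight = -log10(p), for pairs passing the cutoff.

    max_p is an assumed significance cutoff chosen for illustration only.
    """
    edges = {}
    for a, b, p in pairs:
        if p <= max_p:
            edges[tuple(sorted((a, b)))] = -math.log10(p)
    return edges

def components(edges):
    """Group nodes into connected components (a stand-in for community detection)."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, groups = set(), []
    for node in sorted(adj):
        if node in seen:
            continue
        stack, group = [node], set()
        while stack:
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n)
            group.add(n)
            stack.extend(adj[n] - seen)
        groups.append(sorted(group))
    return groups

# Hypothetical toy input: (family, family, p-value); values are invented.
toy_pairs = [
    ("famA", "famB", 1e-6),
    ("famB", "famC", 1e-4),
    ("famD", "famE", 1e-8),
    ("famA", "famD", 0.5),   # not significant, so no edge is drawn
]
edges = build_network(toy_pairs)
groups = components(edges)
```

In this toy input the non-significant famA-famD pair is dropped, so the two groups remain separate. A modularity-based method such as Leiden can additionally split a single connected component into several communities, which connected components cannot do.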
The four helices conserved across the superfamily constitute a cone-like structure (Figure 1A), with the helices tending to coalesce on one end and opening out into a pocket on the other, lined by the conserved sequence positions (Figures 1A, S2). The core is also marked by a linker between H1 and H2, which adopts characteristic extended conformations in certain families and higher-order groups. While the linkers joining H2 and H3 and H3 and H4 tend to be more constrained, there are some exceptions; for example, the extended loop insert housing the palmitoleoylated serine residue between H2 and H3 in the metazoan Wnt family (Figure 1A).
Dramatic variability in hydrophobicity of the conserved core across the superfamily
We observed that these Wnt-related families dramatically varied in their hydrophobicity. Using an index for transmembrane propensity (38) (see Methods) and comparing that to known transmembrane (TM) segments, we predict that the α-helices in 17 of the 30 families are hydrophobic enough to qualify as TM domains, and show a statistically significant tendency to group to the exclusion of the other families (Figure 1C, S3). Thus, these are predicted to be integral membrane domains. Further, these ‘hydrophobic families’ often evince a broader and deeper phyletic distribution pattern than the less-hydrophobic families (Figure S1, Methods), implying that the ancestral version of the superfamily was likely an integral membrane domain. Their association with the lipid membrane, combined with the cone-like shape of the conserved core (Figure 1A), leads us to refer to the whole superfamily hereinafter as the Lipocone superfamily.
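The idea of scoring core helices for membrane propensity can be sketched as follows. This is only an illustration: it substitutes the widely used Kyte-Doolittle hydropathy scale for the transmembrane propensity index used here (38), and the threshold of 1.6 is the conventional Kyte-Doolittle rule of thumb for roughly 19-residue windows, not the boundary shown in Figure 1C. The example helices are invented sequences.

```python
# Kyte-Doolittle hydropathy values (assumed stand-in scale, not the index from ref. 38).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def mean_hydropathy(segment):
    """Average hydropathy over the standard residues of a helix segment."""
    values = [KD[aa] for aa in segment.upper() if aa in KD]
    if not values:
        raise ValueError("segment contains no standard residues")
    return sum(values) / len(values)

def is_tm_like(segment, threshold=1.6):
    """True if the segment's mean hydropathy reaches the assumed TM threshold."""
    return mean_hydropathy(segment) >= threshold

def family_summary(helices, threshold=1.6):
    """Fraction of helices in a family that score as TM-like."""
    flags = [is_tm_like(h, threshold) for h in helices]
    return sum(flags) / len(flags)

# Hypothetical helices: one hydrophobic, one highly polar.
toy_helices = ["LLIVALLGAVLFLLIAVLL", "DEKRSNQTDEKRSNQTDEK"]
fraction_tm = family_summary(toy_helices)
```

Computing such a score per sequence and then plotting its distribution per family gives the kind of comparison summarized by box plots, with a horizontal divider at the chosen threshold.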
AlphaFold3-assisted transmembrane topology prediction (39) revealed that 14 of the 17 integral membrane families are consistently oriented with the aperture of the cone-like structure opening toward the outer face of the membrane. This predicted TM topology is also generally consistent with the domain fusions when present: e.g., domains that are typically cytoplasmic and those that have extracellular or periplasmic functions are respectively predicted as projecting either inside or outside the membrane (see below). However, three families in the cpCone clade (see below) did not yield consistent orientation predictions, potentially owing to the diversity of structural variations observed in the clade, including a circular permutation event.
A unified biochemistry for the Lipocone superfamily
Of the 30 identified families, 26 display a striking conservation pattern of polar residues associated with the pocket of the Lipocone domain (Figure 2, S2). Of these, a set of three positions, one mapping to each of H2, H3, and H4, can be inferred as being ancestrally present and were likely occupied by a histidine (H2), glutamate (H3), and aspartate (H4), though in some families their identities have secondarily changed (Figures 2, S2). A fourth well-conserved polar position is observed at or near the end of H3; while its ancestral identity is difficult to establish, it is frequently an aspartate or glutamate (Figure 2). Two further well-conserved positions are often seen in H4: a polar position downstream of the broadly conserved aspartate residue and a glycine residue near the C-terminus of H4 (Figure 2) that likely caps the said helix. Although the ancestral pattern is noticeably degraded in the metazoan Wnt (Met-Wnt) family, it is strongly preserved in the prokaryotic Min-Wnt family (Figure 1A). In experimentally determined and modeled structures, the above set of 4 conserved positions forms a predicted active site in the aperture of the Lipocone domain. This, in turn, implies a shared biochemistry across the superfamily, with secondary inactivation in some families like Met-Wnt (see below, Figure 2). At the same time, the differences in the specific residues in the conserved positions between different families point to a range of distinct but related activities across the superfamily (40–42).
Consistent with these observations, two of the families with intact active sites, the PTDSS1/2 (28,43) and TelC (29), which we identified in this work as members of the Lipocone superfamily, have been characterized as active enzymes operating on different lipid substrates (Figure 3A). The eukaryotic PTDSS1/2 localizes to the endoplasmic reticulum (ER) membrane and catalyzes a reaction on the polar head group of phosphatidylcholine or phosphatidylethanolamine (44–46). PTDSS1 and PTDSS2, respectively, exchange the phosphate-linked choline or ethanolamine head groups with L-serine (28) (Figure 3A). The toxin domain of TelC acts on lipid II (29), the final intermediate in peptidoglycan biosynthesis, which couples an undecaprenyl diphosphate tail to a head group comprised of a N-acetylmuramic acid-N-acetylglucosamine disaccharide, with a pentapeptide further linked to the former sugar (47,48). TelC cleaves the bond between the undecaprenol and the diphosphate coupled to the head group (29) (Figure 3B). The reaction is comparable to that catalyzed by PTDSS1/2, as both attack phosphate linkages in lipid head groups. However, TelC apparently directs a water molecule for the attack in lieu of the hydroxyl group of serine directed by PTDSS1/2 (Figure 3B).

Known and predicted Lipocone reaction mechanisms.
Experimentally supported reactions are boxed in blue (A-B), while a predicted reaction based on genome displacement by a Lipocone domain of an experimentally characterized enzyme is boxed in orange (C). The remaining reactions (D-G) are suggested based on the contextual inferences in this work. Attacking and leaving groups are denoted by dashed green and red circles, respectively.
Combining the above observations, we infer the unified biochemistry for the catalytically active families thus: 1) They act on the head groups of lipids either by removing or swapping phosphate-linked head groups (Figure 3A-B). These would be comparable to the phospholipase D (PLD), transphosphatidylation or polyisoprenol phosphoesterase reactions (49). 2) Given the cone-like cavity and the hydrophobicity of the helices, the lipid tail is predicted to be housed within the Lipocone with the head group positioned in the active site. 3) In the case of the integral membrane versions, their orientation would predict the targeting of the head groups of the outer leaflet of the bilayer.
Major clades of the Lipocone superfamily
The extreme sequence divergence of the superfamily, coupled with the small size of the domain, prevents the use of simple phylogenetic tree analyses to resolve its deep evolutionary history. Hence, we combined community finding algorithms applied on profile-profile similarity networks, comparison of structural features and motifs, and phyletic patterns (Figures 1B, 2, S1) to reconstruct the most parsimonious evolutionary scenario for the diversification of the Lipocone superfamily (Figure 4, see Methods). In the below sections, we survey the higher-order clades, highlighting their specific features.

Reconstructed evolutionary scenario for the Lipocone superfamily.
The relative temporal epochs are demarcated by vertical lines and labeled at the bottom. The clades are represented by colored lines indicating the maximum depth to which the families listed to the right can be traced. Colors track the superkingdom-level phyletic distribution of the family. Dashed-line circles indicate uncertainty in the origin of lineage(s). Inferred or experimentally characterized functions for families are indicated to the left of family names. Asterisks denote newly described families.
SAW (SAA-Wnt) clade
This clade consists of four families, with the two prokaryotic families (Min-Wnt and prok-SAA) (24,50), respectively, giving rise to their counterpart eukaryotic families (Met-Wnt and Met-SAA; Figures 1A,4). This clade is structurally unified by the presence of a fifth helix that stacks in the space between the H2 and H4 helices (Figure 1A, S2). In the Wnt families, this helix is comparable in length to the core helices, while in the SAA families it is usually shorter (Figure 1A). The clade is further unified by the pronounced conservation of a sNxxGR motif (where ‘s’ is a small residue) encompassing the conserved active site position in H4 (Figure 2). SAW clade Lipocones show low overall hydrophobicity and are known or predicted to be soluble domains. Outside of the clearly inactive eukaryotic Wnt family, the remaining three families largely conserve the core active site residues (Figure 2).
VanZ-Skillet clade
This clade unites seven families: the two VanZ families, VanZ-1 and VanZ-2, prototyped by the bacterial VanZ protein originally identified in the context of vancomycin resistance and the five Skillet families, which form a distinct subclade. These are unified by a “handle”-like structure (hence, “Skillet”), adopting a helical conformation in the H1-H2 linker (Figure 1A, S2). Strikingly, a symmetric helical handle is present in the H3-H4 loop of the Skillet-DUF2809 and Skillet-3 families (Figure S2) of this clade. VanZ-1 features a conserved asparagine residue in the H2 position and a DxDDxxxN motif in H4, while VanZ-2 features RKxxH and DxxxD motifs in these respective positions (Figure 2). The Skillet families are largely unifiable in their conservation of an ExxQ motif in H3, an aspartate three positions upstream of the canonical H4 aspartate, and another aspartate in the H2 contributing to the active site. These first two features specifically ally them with the VanZ-1 family (Figure 2).
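Motifs written in this shorthand, such as DxDDxxxN, RKxxH, DxxxD, or ExxQ, can be located in sequences with a short helper like the sketch below. It assumes that uppercase letters are literal residues, 'x' is any residue, and that the lowercase class symbols 's' (small) and 'h' (hydrophobic) map to the residue sets given in the code; those sets are an assumption for illustration and may differ from the definitions used for Figure 2. The test sequences are invented.

```python
import re

# Assumed residue classes for lowercase motif symbols (illustrative definitions only).
CLASSES = {
    "x": ".",              # any residue
    "s": "[AGSCTDNPV]",    # small residues (assumed set)
    "h": "[AVILMFWYC]",    # hydrophobic residues (assumed set)
}

def motif_to_regex(motif):
    """Translate shorthand such as 'DxDDxxxN' into a regular expression string."""
    parts = []
    for ch in motif:
        if ch in CLASSES:
            parts.append(CLASSES[ch])
        elif ch.isupper():
            parts.append(ch)
        else:
            raise ValueError(f"unsupported motif symbol: {ch!r}")
    return "".join(parts)

def find_motif(motif, sequence):
    """Return 0-based start positions of all (possibly overlapping) motif matches."""
    pattern = re.compile("(?=" + motif_to_regex(motif) + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]

# Hypothetical sequences used only to exercise the helper.
vanz1_like_hits = find_motif("DxDDxxxN", "MKDADDLLVNQ")
saw_like_hits = find_motif("sNxxGR", "AANKLGRW")
```

The lookahead wrapper lets overlapping occurrences be reported, which matters for short motifs such as DxxxD. Alternation written with a slash, as in S/GxxSxx, is deliberately not handled and raises an error.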
While the VanZ domain was previously reported as including a fifth TM helix, which is C-terminal to the 4-helix Lipocone core defined here (51,52), our survey instead reveals a striking diversity of configurations around the core 4-helix Lipocone domain (Supplemental Material). These range from standalone Lipocone configurations to one or more TM-helices adorning the domain at its N- and/or C-terminus. This variation is consistent with a further tendency for the VanZ families to feature an extensive diversity of domain fusions to both soluble globular domains and discrete TM modules (see below).
The VanZ families are deep-branching, as suggested by their wide phyletic spread (Figure S1). VanZ-2 is the most widespread individual Lipocone family in bacteria, with several genomes encoding multiple paralogs (Supplemental Material) (51,53). It is also found in certain eukaryotes, including a pan-fungal presence and in some representatives of the SAR clade. Both VanZ-1 and VanZ-2 are particularly well-represented in Gram-positive bacterial lineages like Actinomycetota and Firmicutes, while VanZ-2 is nearly universally conserved in the Bacteroidetes/Chlorobi lineage (Figure S1). In contrast, only one of the Skillet families, Skillet-DUF2809, is widely but sporadically distributed, with the four others being more restricted (Figures 4, S1).
YfiM clade
This clade includes three families that are consistently centrally located in the profile-profile similarity network (Figure 1B). This is likely due to their being close in sequence conservation to the ancestral state of the superfamily (Figure 2). Consistent with this, the YfiM-1 family also presents a structurally minimal Lipocone domain, comprised of just the 4-helix configuration with no further elaborations. Notably, this also extends to a lack of domain fusions in this family. In contrast, YfiM-DUF2279 and YfiM-Griddle (DUF3943) are structurally distinguished by an unusual H1-H2 linker (Figure 1A), which wraps around the outside and stacks against the H3-H4 linker (Figure S2). The YfiM-Griddle family further features a unique ‘flattened’ surface around the aperture of the Lipocone formed by protruding loops (hence, “Griddle”; Figure S2). This leaves the active site pocket more accessible relative to families with more elaborately structured inter-helix linkers. The Griddle family also features a C-terminal extension with a two-helix hairpin (with a hhsP motif in the turn between the two helices, where ‘h’ is a hydrophobic residue and ‘s’ is a small residue) (Figure S2). The three YfiM families straddle the membrane-propensity boundary in the plot (Figure 1C). Further, the YfiM-DUF2279 and Griddle families are strikingly absent in Gram-positive bacterial lineages (Figure S1). Consistent with these features, they are often predicted by the deep-learning-based localization predictor DeepTMHMM as outer-membrane proteins, suggesting a role in this subcellular location (see below).
ClaspCone-CapCone-TelC clade
Members of this clade are unified by an elaborated H1-H2 linker that often contains one or more helical segments that are typically predicted to guard the aperture of the Lipocone domain (Figure S2). This linker ends in a “clasp”-like element, which forms a range of structures in different families of the clade before leading into H2 (Figure S2). The clade is also unified by a striking reduction of overall hydrophobicity, predicting that the members of this clade are soluble domains (Figure 1C). Outside of the divergent TelC subclade, most of the families in this clade conserve a serine residue three positions upstream of the active site aspartate in H4, often preceded by an aromatic residue, which is typically phenylalanine. H4 also usually features a conserved asparagine four positions downstream of the conserved aspartate active site position, immediately preceded by a small residue (Figure 2). The second H3 active site position is generally poorly conserved, though when present, it is usually an aspartate residue. Finally, H2 contains either a DK or xD motif four positions upstream of the canonical H2 active site histidine residue (Figure 2).
The most rudimentary clasps are found in the ClaspCone-1, -2, and -3 families, where it is little more than a rounded loop, though, in ClaspCone-1, a small β-hairpin emerges within it. The three ClaspCone families are further unified by the presence of a two-helix insert leading into H2 that stacks against the Lipocone core (Figure S2). The three CapCone families, CapCone-DUF4056, CapCone-1, and CapCone-2, are named so for an encasing structure over the active site resembling a cap (Figure S2). They share a conserved glycine residue six positions upstream of the active site H2 histidine and a S/GxxSxx motif upstream of the conserved H4 aspartate (Figure 2). They are further unified by a pronounced β-hairpin clasp augmented by an additional strand (Figure S2). They also display varying degrees of degeneration of H1, along with family-specific structural elaborations.
The TelC group of this clade, prototyped by the streptococcal TelC toxin (29), is divided into two families featuring prokaryotic (prok-TelC) (29) and metazoan versions (Met-TelC) (54). Both TelC families feature a “cap” with contributions from inserts in the H1-H2 and H3-H4 loops (Figures 1A,S2). Unique to these families is the conservation of an aspartate residue located six positions downstream of the canonical active site aspartate of H4 (Figure 2). This aspartate points away from the center of the Lipocone and interacts with a conserved arginine from a synapomorphic C-terminal helical extension.
cpCone clade
A widespread yet sporadically distributed clade of seven families emerging as a stable community in the profile-profile similarity network (Figure 1B) is defined by a unique structural synapomorphy: a circular permutation (55) (hence, cpCone) placing the normally N-terminal H1 at the C-terminus of H4 (Figure S2, S4). This clade is also united by unique sequence features, viz., a polar residue (typically aspartate) six positions upstream of the conserved H2 histidine and a second glutamate three positions downstream of the conserved H3 glutamate (Figure 2). While the circular permutation is shared across the clade, several structural variations are seen, often within the same family (Figure S4). These include: 1) Versions containing a duplication of the Lipocone domain. While the second copy in these versions is catalytically inactive, the H1’ from the second duplicate displaces the H1 from the first copy, suggestive of an intermediate to the circular permutation. 2) Versions retaining a candidate H1 that has been displaced by H1’ in a five-helix arrangement. 3) Those containing just the circularly permuted core. 4) Versions showing a degradation of the H1 helix, preserving just a 3-helix core (Figure S4). Despite this propensity for structural variation, the active site residues are strongly conserved, with the exception of the cpCONE-i family, which we infer to be catalytically inactive (Figure 2). The core helices of the cpCone clade are strongly hydrophobic, and they are all predicted to be integral membrane domains (Figure 1C). Consistent with this, the eukaryotic PTDSS1/2 domains reside in the ER membrane (44,45).
Wok family
The Wok family (partly covered by the Pfam DUF2238 model) shows a higher order grouping with the above circularly permuted clade (Figures 1B,4) but has a phyletic distribution only rivaled by the VanZ-2 family (Figure S1), suggesting a deep-branching origin. The shape of this family is reminiscent of a wok formed by two distinguishing structural synapomorphies: a 2-TM helix N-terminal extension and a unique “handle” formed by the linker between the H3 and H4 (Figure S2). It additionally features a C-terminal, rapidly diversifying cytoplasmic tail. Despite these elaborations, it retains the inferred ancestral active site configuration (Figure 2). The strongly hydrophobic core helices of the Wok family predict it to be an integral membrane enzyme (Figure 1C).
Functional themes in the Lipocone superfamily
Given our inference of shared general biochemistry across the Lipocone superfamily in targeting phosphate-containing linkages in head groups of both classic phospholipids and polyisoprenoid lipids, we next used contextual information from conserved gene-neighborhoods, domain architectures and phyletic pattern vectors, a powerful means of deciphering gene function (56), to narrow down the predictions for specific families (Figure 5, Table S1, Supplementary Data). To this end, we constructed a graph (network) wherein the nodes are individual domains and edges indicate adjacency in domain architectures or conserved gene-neighborhoods (Figure 6, see Methods). We then identified cliques in these networks and merged the individual cliques containing a particular Lipocone domain to define its dense subgraph (Figures S5-S7). We then analyzed these subgraphs to identify statistically significant functional categories represented in them (Table S2; see Methods). This data was combined with existing experimental results and the sequence and structure analyses outlined above to arrive at the functional themes surveyed in the below sections.
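The clique-merging step described above can be sketched in a few lines. The sketch enumerates maximal cliques with the basic Bron-Kerbosch procedure and then takes the union of those cliques that contain a chosen Lipocone node. The node names and edges are hypothetical, and the minimum clique size of 3 is an assumed filter for illustration; the actual parameters are given in Methods.

```python
from collections import defaultdict

def build_adjacency(edge_list):
    """Undirected adjacency map from (domain, domain) co-occurrence edges."""
    adj = defaultdict(set)
    for a, b in edge_list:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def maximal_cliques(adj):
    """Enumerate maximal cliques (Bron-Kerbosch without pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in sorted(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques

def dense_subgraph(adj, target, min_size=3):
    """Union of maximal cliques containing target (min_size is an assumed filter)."""
    members = set()
    for clique in maximal_cliques(adj):
        if target in clique and len(clique) >= min_size:
            members |= clique
    return members

# Hypothetical contextual edges (adjacency in architectures or gene neighborhoods).
toy_edges = [
    ("Lipocone-X", "domA"), ("Lipocone-X", "domB"), ("domA", "domB"),
    ("Lipocone-X", "domC"), ("Lipocone-X", "domD"), ("domC", "domD"),
    ("Lipocone-X", "domG"), ("domA", "domE"), ("domE", "domF"),
]
adj = build_adjacency(toy_edges)
subgraph_nodes = dense_subgraph(adj, "Lipocone-X")
```

Here domG is linked to the Lipocone node only through a pairwise edge and is excluded by the size filter, while domE and domF never share a clique with it. Functional-category enrichment would then be tested on the retained node set.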

Representative contexts for the Lipocone superfamily, grouped by shared functional themes.
Genes are depicted by box arrows, with the arrowhead indicating the 3’ end of genes. Genes encoding proteins with multiple domains are broken into labeled sections corresponding to them. Domain architectures are depicted by the individual domains represented by distinct shapes. TMs, lipoboxes (LPs), and SPs are depicted as unlabeled, narrow yellow, blue, and red rectangles, respectively. All Lipocone domains are consistently colored in orange. Genes marked with asterisks are labeled by the Genbank accession number below each context. Colored labels above genes denote well-known gene names or gene cluster modules. Abbreviations: PTase, peptidase; TFase, transferase; GlycosylTFase, Glycosyltransferase; MPTase, metallopeptidase; TGase, transglycosylase; SLP, serine-containing lipobox; cNMPBD, cNMP-binding domain; NCPBM, novel putative carbohydrate binding module; (w)HTH, (winged) helix-turn-helix; ZnR, Zinc ribbon; PPTs, pentapeptide repeats; Imm, immunity protein; βPs, β-propeller repeats; Cystatin-FD, Cystatin fold domain; MTase, methylase; PGBD, peptidoglycan-binding domain; MβL, metallo-β-lactamase; L12-ClpS, ClpS-ribosomal L7/L12 domain; TA, teichoic acid.

Lipocone contextual network.
The network represents the conserved contextual associations of Lipocone domains (hexagonal nodes). Nodes and edges are colored based on known or inferred functional categories of the domains. The nodes are scaled by their degree. Gray coloring indicates domains without specific functional assignments. Examples of conserved gene neighborhoods and domain architectures supplementing those in Figure 5 illustrate contexts that bridge functional themes. Here, individual domains are colored to match network coloring. Additional abbreviations to those in Figure 5: APH-Pkinase, aminoglycoside phosphotransferase-like kinase; HUP, HIGH, UspA and PP-ATPase superfamily-like domain; Alk-phosphatase, Alkaline phosphatase; dehyd, dehydrogenase; TPRs, tetratricopeptide repeats; PMM/PGM, phosphomannomutase/phosphoglucomutase; ZnF, zinc finger; APC-transporter, amino acid-polyamine-organocation transporter; LPS, lipopolysaccharide.
Lipocone domains in membrane lipid, peptidoglycan and exopolysaccharide modifications
Across different Lipocone families, we found statistically significant connections to roles in modifying lipid head groups in various membranes and in lipids involved in the synthesis of extracellular matrix polymers such as peptidoglycan and lipopolysaccharides (Figure 6, Table S2, Supplementary Data).
Archetypal lipid head group exchange reactions catalyzed by the cpCone clade
One of the few experimentally characterized Lipocone families is the eukaryotic PTDSS1/2 family of the cpCone clade, members of which exchange the head group of essential membrane phospholipids to generate phosphatidylserine from phosphatidylethanolamine or phosphatidylcholine (Figure 3A) (28,57). Given the pervasive presence of this clade in archaea (Figure S1), it is tempting to speculate that these archaeal cpCones may play a role in the modification of Archaea-specific lipids (58–60) through a comparable head group exchange reaction (see below).
In bacteria, the related cpCone-1 family shows operonic association with a LolA-like lipoprotein, which shuttles lipoproteins to the outer membrane (61), and a novel 4TM protein (Figure 5A). This raises the possibility that cpCone-1 might mediate the formation of membrane domains featuring lipids with a modified head group that act as foci for the trafficking of lipoproteins. Curiously, the cpCone-1 gene might also be inserted between the genes for two subunits of the bacterial chromosome segregation and condensation complex, the Kleisin ScpA and the wHTH protein ScpB (62–65). The bacterial cpCone-DUF2585 is operonically coupled to a GNAT family NH2-group-acetyltransferase and further linked to genes for the glycolate oxidase subunits GlcE and GlcF (66) and the bacterial proteasome subunits HslV and HslU (67) (Figure 5A, Supplementary Data). These associations might point to the coupling of membrane lipid head group modifications with disparate processes, such as chromosome segregation during cell division or different responses to stress (68–70).
The Wok and YfiM-1 families in cardiolipin and modified isoprenoid lipid pathways
We observed a set of conserved gene neighborhoods in which a synaptojanin-like phosphatase gene is paired with a gene encoding either a member of the Wok family or a cardiolipin synthase of the HKD superfamily (71), with these two alternatives occurring in a mutually exclusive manner (Figure 5B, Supplementary Data). This suggested that the latter two are analogous enzymes catalyzing equivalent reactions. The cardiolipin synthase utilizes two phosphatidyl glycerol molecules as substrates to generate cardiolipin with the release of one of the glycerol head groups (72). This is comparable to the head group exchange reaction catalyzed by PTDSS1/2 from the cpCone clade (Figure 3A). Hence, we propose that these members of the Wok clade are cardiolipin synthases (Figure 3C). Distinct phosphoesterases, namely the synaptojanin-like, calcineurin-like (73) and HAD (74) enzymes, are also observed in gene-neighborhood associations with Wok, suggesting that they might together regulate membrane lipid composition by acting on the phospholipids or their precursors (Figure 5C). In a distinct neighborhood, the Wok clade enzyme is coupled to carotenoid biosynthesis genes (75,76) (Figures 5D,S5). This raises the possibility that these members might also catalyze a comparable reaction to the above on isoprenoid lipids: for instance, they could synthesize a carotenoid from two geranylgeranyl-diphosphate molecules (77,78). In both these contexts, the actinobacterial operons often include genes for GT-A family glycosyltransferases, suggesting the further synthesis of glycosylated derivatives of the lipids or carotenoids (79) (Supplementary Data).
In several bacteria, a YfiM-1 family Lipocone is operonically coupled to a UbiA-like prenyltransferase (80). This gene neighborhood additionally codes for a slew of enzymes, such as an amidophosphoribosyltransferase (81), a RidA-like deaminase (82), and a pair of structurally distinct phosphoesterases containing an HD and a PHP domain, respectively (73,83) (Figures 5E,S5, Supplementary Data). This suggests a role for the YfiM-1 Lipocone and the associated enzymes in generating a modified polyisoprenoid metabolite.
VanZ families modifying lipid head groups in peptidoglycan and exopolysaccharide metabolism
The widespread VanZ-1 and VanZ-2 families (Figure 1A) frequently show either gene-neighborhood associations with, or direct domain fusions to, diverse genes involved in both peptidoglycan and other extracellular polysaccharide pathways. Chief among these are the lipid carrier flippase (Pfam: MviN_MATE clan) (84–86), the UDP-GlcNAc/MurNAc lipid transferases, which generate the lipid-linked exopolysaccharide precursors (lipid I) (48,87), and UDP-N-acetylglucosamine (UDP-GlcNAc) biosynthesis enzymes (88,89). Despite certain examples of crossover in functional themes, the gene-neighborhood contexts of VanZ-1 and VanZ-2 suggest a metabolic partitioning, with VanZ-2 significantly associating specifically with peptidoglycan-related genes and VanZ-1 significantly linking with biosynthesis genes for other exopolysaccharides (e.g., the outer-membrane-associated lipopolysaccharide) (90) (Figures 5F,6,S6, Table S2). The latter include WaaL-like lipid A transferase (91), the polysaccharide chain-length determination domain Wzz (92), the Wzc kinase and the “extracellular antigen”-regulating ElyC-like domain (Pfam: DUF218) (93), and numerous nucleotide-diphosphate sugar biosynthesis and modification enzymes (94) (Figures 5F,6,S6).
The precursors of both peptidoglycan and exopolysaccharides are synthesized in the cytosol, linked to lipid carriers via a diphosphate linkage, e.g., the polyisoprenoid lipid undecaprenol (bactoprenol) (90,94–97). A key step in their maturation is the flipping by the flippase of the lipid-linked intermediates associated with the inner membrane to the outer membrane. These flipped units are then incorporated into the maturing chain (98,99) by the peptidoglycan glycosyltransferase (GTase) (100) and the chain length determination protein, WzzE/polymerase (WzyE) (92,101), in peptidoglycan and other exopolysaccharide maturation pathways, respectively. Based on the precedent of the TelC-catalyzed reaction (Figure 3B), we propose that VanZ-1 and VanZ-2 comparably act on the flipped lipid II head groups bearing the modified sugar intermediates to release the undecaprenol via phosphoester cleavage (Figure 3F). Such activity could modulate the concentration of available peptidoglycan intermediates and allow formation of peptidoglycan with varying thickness and composition during different phases of the life cycle, e.g., sporulation versus vegetative growth in Bacillota. Such a reaction could also possibly modulate exopolysaccharide biosynthesis by comparably acting on their precursors.
The terminal transfer from the lipid carrier of the Gram-negative bacterial O-antigen (as well as other exopolysaccharides attached to the lipid A carrier) has been attributed to the WaaL-like enzymes (91,102). However, bacteria generate further lineage-specific polysaccharide decorations, capsule structures, and other exopolysaccharides (e.g., xanthan, enterobacterial common antigen (ECA), alginate, colanic acid), as well as teichoic acids (e.g., wall teichoic acids, WTA) (103,104). Notably, the analogs of WaaL, i.e., the terminal transferases for several exopolysaccharides, including ECA and WTA, have to date escaped identification (93). Hence, it is possible that, by analogy to the PTDSS1/2 reaction (Figure 3A), the VanZ families act on the lipid carrier-linked sugar head groups to catalyze either the extension of the polysaccharide chains through transesterification or the terminal release of the mature chain through phosphoester cleavage (Figure 3E).
Atypical VanZ domains in uncharacterized modifications of peptidoglycan and the outer membrane
Certain representatives of the two VanZ families also show operonic associations indicative of outer membrane-associated or peptidoglycan modification functions distinct from those described above (Figures 5G,6, Supplementary Data): 1) An operon in FCB group bacteria couples a VanZ-2 gene with those coding for a SprA secretin-like channel protein (105), a glycine cleavage H (GCVH)-like lipoyl-group carrier protein (106), a 2TM protein fused via a proline-rich linker to a C-terminal TonB-C domain (107), and a secreted, second TonB-C domain fused to a Wzi-like outer membrane protein (OMP) superfamily β-barrel (108) (Figures 5G,6). 2) In betaproteobacteria, certain VanZ-1 domains are duplicated with the C-terminal copy being inactive (VanZ-i) and found in an unusual four-gene operon with a thioredoxin-fold [2Fe-2S] ferredoxin (109), a possible lipase of the α/β-hydrolase superfamily (110), and a metallo-β-lactamase (MβL) fold D-Ala-D-Ala cross-linking transpeptidase (111,112). 3) A patescibacterial operon encodes a VanZ-2 domain with an ABC ATPase transporter system, either of two structurally distinct peptidases, namely a Papain-like or glycine-glycine peptidase (113,114), fused to the same membrane-anchored N-terminal coiled-coil region, and a further TM protein containing one or more external Lamin-Tail domains (LTDs) predicted to bind extracellular DNA or polysaccharides (115) (Figures 5G,6,S6). The associations in the first of the above neighborhoods point to a distinct outer membrane-associated lipid modification, while the other two might be involved in lineage-specific decorations/modifications of peptidoglycan, accompanied by peptide-crosslinking or cleavage activities.
Lipocone domains operating in the outer membrane
Contextual associations, phyletic patterns, and localization predictions support the action of two Lipocone families directly in the outer membrane. Notably, the YfiM-Griddle and YfiM-DUF2279 families are found nearly obligately directly fused or operonically linked to several distinct OMP β-barrels (116,117) (Figures 5H,6,S5). Up to three YfiM-Griddle Lipocones, usually with a cognate OMP β-barrel, might be encoded next to each other in the genome. Additionally, YfiM-Griddle family genes are often encoded in operons with several components of the outer membrane lipid and protein trafficking apparatus, including the LolA-like chaperone (118), the POTRA domain (119,120), the channel-blocking Plug domains (121), and the TolA-binding TolB-N domain (122). Further, these operons might encode a Patatin-like lipase (123), GT-B family glycosyltransferases (79), and a range of phosphoesterases (e.g., an integral membrane phosphatidic acid phosphatase PAP2 (124), a lipobox-containing synaptojanin superfamily phosphoesterase (125) and a secreted R-P phosphatase (126) (see Figures 5H,6, and Supplementary Data)). In addition to the fusion to the OMP β-barrel, the YfiM-DUF2279 family (Figure 5H) shows operonic associations with a secreted MltG-like peptidoglycan lytic transglycosylase (127,128), a lipid-anchored cytochrome c heme-binding domain (129), a phosphoglucomutase/phosphomannomutase enzyme (130), a GNAT acyltransferase (131), a diaminopimelate (DAP) epimerase (132), and a lysozyme-like enzyme (133). In a distinct operon, YfiM-DUF2279 is combined with a GT-A glycosyltransferase domain (79), a further OMP β-barrel, and a secreted PDZ-like domain fused to a ClpP-like serine protease (134,135) (Figure 5H).
The strong linkage to the OMP β-barrel, together with their predicted localization, suggests that these YfiM-Griddle and YfiM-DUF2279 Lipocone domains operate in the outer membrane, potentially in concert with both cytoplasmic carbohydrate biosynthetic modules and periplasmic lipid- and carbohydrate-processing enzymes. As with the inner membrane lipids, they could potentially catalyze modifications of head groups through transesterification and/or linkage/release of outer membrane-associated polysaccharide chains through action on lipid-head group phosphoesters.
Lipocone domains acting on lipids in transit to the outer membrane
The ClaspCone-1 and ClaspCone-3 families lack the hydrophobicity indicative of direct residence in the membrane (Figure 1C); instead, they are predicted to localize to the periplasmic space. In the ClaspCone-1 family, the Lipocone domain is fused at the extreme N-terminus to either a single TM or a 5TM domain predicted to anchor it to the cell membrane. Between this TM element and the Lipocone domain, we detected a previously uncharacterized version of the Tubular lipid binding protein (TULIP) domain (136,137) or an Ig-like and a Zincin-like metallopeptidase (MPTase) domain (138) (Figures 5I, 6). These ClaspCone-1 genes may also show operonic associations with genes encoding a lipase of the SGNH family (139) and a membrane-bound O-acyltransferase (MBOAT; Figure 5I, Supplementary Data) (140). The TULIP domain superfamily has recently been characterized as a lipid-binding domain (136,137), which in proteobacteria functions in outer membrane lipid transport (141,142). Thus, we propose that the ClaspCone-1 family is likely to act in the periplasmic space on the head groups of outer-membrane targeted lipids bound to the TULIP or potentially to the Ig-like domains occupying an equivalent position in the domain architecture.
A Lipocone domain catalyzing a predicted lipoprotein lipid linkage reaction
The Skillet-1 Lipocone is strongly coupled in an operon with a downstream gene coding for a protein with an unusual lipobox-like sequence followed by one of several extracellular domains (e.g., concanavalin, β-jelly roll, OB-fold, Ig-like, β-propeller) predicted to bind carbohydrates or other ligands (143–147) (Figures 5J,6,S6, Supplementary Data). The lipobox-like sequence features a conserved GS motif at its C-terminus instead of the usual GC of the classic lipobox of bacterial lipoproteins (148) (Figure S8). In the canonical lipoprotein processing pathway, a thioether linkage is formed between the sulfhydryl of the cysteine and a diacylglycerol lipid embedded in the inner membrane by the lipoprotein diacylglyceryl transferase (lgt) enzyme, followed by the cleavage of the signal peptide at the GC motif junction by the signal peptidase (149,150). Given the serine in place of the cysteine in these lipobox-like sequences, we propose that this serine undergoes non-canonical lipidation by the associated Skillet-1 Lipocone protein in lieu of lgt. Specifically, comparable to PTDSS1/2, which act on free serine, the Skillet-1 family would link the conserved serine from the lipobox-like sequence to a phospholipid (Figure 3A,D).
Lipocone domains in predicted lipid-associated signaling systems
Systems defined by standalone proteins with Lipocone domains
Several representatives of the two VanZ and Skillet-3 families are fused to a diverse array of known or predicted extracellular ligand-binding domains (Figure 5K), where the architecture takes the form of SP+X+TM+Lipocone or Lipocone+TM+X, where ‘X’ is the extracellular ligand-binding domain and SP is a signal peptide. The ligand-binding domains include: (i) carbohydrate-binding lectin domains such as jelly-roll, concanavalin-like, NPCBM-like, CBD9-like, and other β-sandwiches (143,144,151–153); (ii) a lipid-binding helix-grip superfamily domain (154); (iii) those binding other potential ligands (e.g., Ig, OB-fold, YycI-like, DUF498-like, PepSY-like, β-helix, TPR, MORN, and β-propeller repeats (145–147,153,155–158)) (Figure 5K, Supplementary Data). We interpret these architectures as implying signaling, wherein the binding of the cognate ligand by one of the above domains regulates the catalytic activity of the associated Lipocone domain. Among these, the extracellular domains fused to the Skillet-3 family are particularly notable for their extreme variability (Figure 5K). This suggests their diversification under an arms race scenario (also see below) in a biological conflict. Further, the genes coding for the above are sporadically associated with exopolysaccharide metabolism genes (Supplementary Data). Hence, it is conceivable that this signaling is associated with exopolysaccharide variation (e.g., O-antigen phase-variation (159,160)), which might play a role in evading bacteriophage attachment.
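The two architectural patterns described above, SP+X+TM+Lipocone and Lipocone+TM+X, can be expressed as a simple classification rule over an ordered list of domain labels. The following Python sketch is a hypothetical illustration only: the label strings, the requirement of exactly one TM segment, and the function name are assumptions made for the example and do not describe the procedure actually used in this analysis.

```python
# Lipocone families named in the text as showing these fusions (labels are illustrative)
LIPOCONES = {"VanZ-1", "VanZ-2", "Skillet-3"}

def classify_architecture(domains):
    """Classify an N-to-C ordered list of domain labels.

    Returns "SP+X+TM+Lipocone", "Lipocone+TM+X", or None if neither pattern fits.
    Assumes exactly one Lipocone domain and exactly one TM segment (a simplification).
    """
    labels = list(domains)
    lipocone_idx = [i for i, d in enumerate(labels) if d in LIPOCONES]
    tm_idx = [i for i, d in enumerate(labels) if d == "TM"]
    if len(lipocone_idx) != 1 or len(tm_idx) != 1:
        return None
    li, ti = lipocone_idx[0], tm_idx[0]
    # Signal peptide, one or more extracellular domains, TM, then a C-terminal Lipocone
    if labels[0] == "SP" and ti >= 2 and li == ti + 1 and li == len(labels) - 1:
        return "SP+X+TM+Lipocone"
    # N-terminal Lipocone, TM, then one or more extracellular domains
    if li == 0 and ti == 1 and len(labels) >= 3 and "SP" not in labels:
        return "Lipocone+TM+X"
    return None

examples = {
    "a": ["SP", "Concanavalin-like", "TM", "VanZ-1"],
    "b": ["Skillet-3", "TM", "beta-propeller", "Ig"],
    "c": ["VanZ-2"],
}
results = {name: classify_architecture(arch) for name, arch in examples.items()}
```

In both patterns the ligand-binding domain X and the Lipocone domain lie on opposite sides of the TM segment in the linear order, which is the feature this rule captures.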
Additionally, VanZ-1 Lipocone domains are also fused to several known signaling domains confidently predicted to reside in the cytoplasm, including the cyclic nucleotide-binding domain (cNMPBD), phosphopeptide-binding FHA, and DNA-binding RHH and HTH domains (65,161–163) (Figure 5K). These associations suggest potential VanZ regulation via a cytoplasmic cyclic nucleotide (sensed by cNMPBD) or, conversely, VanZ acting as an allosteric regulator of a transcriptional program via the HTH or RHH domain. One of the most common yet enigmatic fusions to VanZ is with the integral membrane RDD domain (53). The role of this domain is unknown; however, our analysis indicates that it contains a conserved intra-membrane binding site oriented towards the cytoplasmic face of the membrane (Nicastro GN, Burroughs AM, Aravind L, manuscript in preparation). The VanZ-RDD fusion is sometimes further fused to other domains (Figure 5K), the most notable being a highly derived but active novel Histidine Kinase domain (Figure 5K). Together, these associations point to the coupling of lipid modification with a signaling event on the cytoplasmic face of the membrane, which might relate to the dynamic regulation of lipid-carrier-bound exopolysaccharide precursors.
Multi-component associations of the Lipocone proteins in signaling
These systems resemble the above-discussed versions but are encoded by conserved gene neighborhoods that separate the Lipocone and the signaling elements (typically predicted transcription regulators) into distinct genes. Our analysis recovered at least three such systems: 1) A VanZ-1 Lipocone in the recently described HAAS/PadR-HTH two-component systems, which sometimes replace classical Histidine kinase-Receiver two-component systems (164). In these systems, the detection of an extracellular or intramembrane stimulus by a sensor domain releases the PadR-HTH transcription regulator bound to the sensor-fused HAAS domain. Here, VanZ-1 occupies the sensor position (Figure 5L). 2) A Skillet-2 Lipocone is coupled in a core two-gene system to a conserved upstream gene (Figures 5L,S5). That gene encodes a single TM protein with either a zinc ribbon (ZnR) fused to a conserved helix or an HTH domain fused to a ClpS-ribosomal L7/L12 domain in its cytoplasmic region (165). These neighborhoods might also code for an HMG-CoA reductase and GHMP kinase that catalyze successive reactions in the production of phosphomevalonate, a precursor of isoprenoid lipids (166,167). 3) A Skillet-DUF2809 Lipocone protein is operonically coupled with a 6TM protein and a further predicted transcription factor with a wHTH domain. These operons are further elaborated via additional embedded and flanking genes, coding for components of either isoprenoid lipid (e.g., undecaprenol) (168,169) or exopolysaccharide (e.g., ECA and related polysaccharides) metabolism (94,97) (Figures 5L,6,S6, Supplementary Data).
The Lipocone domains in these systems are predicted to be active enzymes, which, together with their operonic associations, point to functions involving the modification or transesterification of isoprenoid lipid head groups, sometimes in the context of exopolysaccharide biosynthesis. However, their associations with the intracellular HTH domains suggest that the Lipocone enzymatic activity is potentially coupled with the transcriptional regulation of the production of precursors of the lipids or exopolysaccharides. Given the high variability in the associated genes related to exopolysaccharide/lipopolysaccharide biosynthesis, we anticipate that the associated transcriptional regulation potentially relates to functional categories showing high diversity across bacteria, such as responses to environmental stress, phages, predatory bacteria attacks, or host immune response.
Lipocone domains as effectors in biological conflicts
Lipocone domains in antiviral immunity
The Min-Wnt domains (Figure 1A) that we originally identified were predicted to play a role in biological conflicts with invasive selfish elements, such as viruses (24). In this work, we better explain their potential mechanism of action. These versions show no fusions to extracellular domains or secretory signals, suggesting that they are deployed from within the bacterial cell (Figure 1C). These Min-Wnts are typically fused to the DUF3892, which displays a fold characterized by a three-stranded meander followed by a helix also seen in the dsRNA-binding domain and the ribosome hibernation factors (HPF) (170,171) (Figure 5M). Hence, we propose that these versions might potentially act to sense virally induced RNAs or modified ribosomes (24) to trigger a dormancy or suicide response to limit viral infection via the Min-Wnt effector. Specifically, the Min-Wnt might attack peptidoglycan precursors, such as lipid II, prior to their ‘flipping’ to restrict cell wall synthesis (85,172,173) or other such carrier lipids.
One other Min-Wnt domain, N-terminally fused to a three-stranded β-meander, is pervasive in the Bacteroidetes clade. This is operonically coupled with genes encoding a TM-linked run of pentapeptide repeats and two structurally distinct, secreted glycosyl hydrolase enzymes containing a TIM barrel domain and a run of β-helix repeats, respectively (Figure 5M, Supplementary Data). Further, cyanobacteria show a standalone prok-TelC domain without any secretory signals. These could again act as effectors targeting lipid-linked precursors of peptidoglycan or exopolysaccharides in response to intracellular invaders or stress (Figure 5M). Interestingly, some tailed bacteriophages also code for intracellular Min-Wnt domains, suggesting that they might also be deployed on the virus side in biological conflicts, such as in limiting superinfection (Supplementary Data).
Lipocones as toxin domains in polymorphic and allied conflict systems
Polymorphic toxins and related systems, widespread across bacteria and certain archaea, are characterized by a highly variable C-terminal toxin domain (“toxin tip”) that is preceded by a range of more conserved domains typically required for autoproteolytic processing of the toxin, its packaging and trafficking (e.g., RHS repeats), adhesion and secretion via one of several secretory systems (174,175). The toxin might be delivered via one of the secretory systems into a target cell or else via direct contact between interacting cells. Classical polymorphic toxins are usually involved in kin discrimination and are accompanied by genomically linked cognate immunity proteins that protect against self-intoxication (174,176). Keeping with the principle of effector sharing between systems involved in distinct types of biological conflicts, we had originally identified a Min-Wnt domain closely related to those described in the above subsection as a toxin tip in polymorphic toxin systems (24). In the current work, we extend these findings to show that several distinct Lipocone families have been independently recruited as toxin tips of polymorphic toxins and related systems, namely Min-Wnt, prok-SAA, prok-TelC, CapCone-1, CapCone-2, ClaspCone-2 and VanZ-1 (Figures 4,5N,6,S7).
Certain CapCone-2 and Min-Wnt toxins from Gram-positive bacteria define some of the simplest of these toxin systems. Here, a standalone Lipocone domain is coupled to a signal peptide or lipobox via a poorly structured linker. These are usually encoded in a two-gene configuration with their cognate immunity protein (Figure 5O). More complex versions present, in addition to adhesion, peptidoglycan-binding, lipid-binding and proteolytic processing domains, multiple hallmarks of delivery through specific secretion systems. These include T4SS (VirD4-binding domain), T6SS (PAAR domain), T7SS (WXG/LXG domain), T9SS, and MuF domains (174,176) (Figure 5P, Supplementary Data). Additionally, we recovered standalone CapCone-1 domains encoded in an operon with a PsbP/MOG1 superfamily domain diagnostic of secretion via the T6SS (174,177) (Figure 5P). Further, we also found Min-Wnt domains fused to the N-termini of RTX-like β-roll repeats, suggestive of T1SS-mediated export (178) (Figure 5P).
Our analysis also uncovered multiple, previously uncharacterized trafficking/packaging systems associated with different Lipocone polymorphic toxins. Several Min-Wnt and CapCone-1 domains with lipoboxes are fused to an N-terminal Cystatin-like superfamily domain (179) (Figure 5Q). The same domain is also comparably fused to several other C-terminal toxin domains in related organisms, some of which are also predicted to target lipid head groups: (i) a novel toxin domain we unified with the lipid-targeting Colicin M fold (180); (ii) a lipid-binding START-domain-like helix-grip fold domain (154); (iii) a papain-like fold fatty acyltransferase (181); (iv) a domain related to the VanY-like D-Ala-D-Ala carboxypeptidase (182) (Figure 5Q, Supplementary Data). In all these cases, the toxins are coupled to a related immunity protein (see below), suggesting that they define a distinct polymorphic toxin system. We propose that this Cystatin-like domain specifies a novel packaging or deployment system upon secretion for the C-terminal toxin domain, analogous to the way Cystatin domains function with eukaryotic proteases (183). The prok-TelC family Lipocones are found in distinctive architectures in two poorly characterized, predicted polymorphic toxin systems. In one of them, they are fused to an N-terminal glucan-binding GbpC β-sandwich domain (184) and repeats of MucBP-like Ig domains (185), which might anchor them to exopolysaccharides (Figures 5Q,S7, Supplementary Data). The second variant, found in association with T9SS components (186), shows fusions to one or more copies of a previously undetected TPM domain (Figure 5Q). While the domain has been claimed to be a phosphatase (187), our recent analysis indicates that this is unlikely to be the case (164). Instead, we propose that the TPM domain might assist in assembling membrane-linked protein complexes, a role that might be relevant to the trafficking of these toxins (164).
To date, the only experimentally characterized Lipocone domain from polymorphic toxins belongs to the prok-TelC family, members of which are secreted via the T7SS (29,188) (Figures 5P,6). Notably, prok-TelC has been shown to be active only outside the cell and not in the cytoplasm (29). As noted above, it attacks lipid II to cleave off the peptide-linked disaccharide pyrophosphate head group from the undecaprenol tail (Figure 3B). Prok-TelC has also been speculated to similarly attack WTA-lipid II linkages (29). These findings provide a template for other Lipocone superfamily effectors in potentially targeting lipid carrier linkages in peptidoglycan and exopolysaccharide intermediates. However, given the diversity within the family (Figure 3F), it is conceivable that they also target other lipids.
Immunity proteins of Lipocone polymorphic toxins indicate periplasmic/intramembrane action
To date, only a single immunity protein has been reported for Lipocone toxins, viz., TipC, which counters prok-TelC toxin in the periplasm (29,189) (Figures 5P,6). Here, we uncovered a range of immunity proteins belonging to structurally distinct folds that counter the remaining Lipocone toxins (Figures 5N-Q,S9, Supplementary Data). The most widespread of these is a rapidly evolving, membrane-anchored member of the BamE-like superfamily that associates with not only Min-Wnt and CapCone toxins but also other above-mentioned lipid-head-group targeting toxins (e.g., the novel Colicin M-like domain). The BamE-like fold features a core two-helix hairpin followed by a run of three β-strands (Figure S9). The classical BamE operates in a pathway for the assembly of OMP β-barrels (190,191), suggesting that these immunity proteins emerged from an ancestral BamE and, like it, function in the periplasm. Additional candidate immunity proteins with more restricted phyletic spreads include (Figure S9): (i) a β-jelly-roll fold-containing protein (144); (ii) an integral membrane protein with a 4-TM core (these two are observed with Min-Wnt toxins); (iii) a novel domain combining an α-helix with a run of 4 β-strands stabilized by four absolutely conserved cysteine residues, which is coupled to both Min-Wnt and prok-SAA toxins; (iv) a protein with an OB-fold domain (145); (v) a protein with a β-sandwich related to the eukaryotic centriolar assembly SAS-6 N-terminal domain (192). The last two are coupled to CapCone-2 toxins (Supplementary Data). Notably, despite their structural diversity, these immunity proteins are all TM or lipoproteins and, like TipC (29,189), are predicted to operate at the membrane or in the periplasm (Figures 5N-Q, Supplementary Data). This suggests that they intercept their cognate Lipocone toxin domain outside cells or in the membrane rather than within the cell.
Lipocone toxins in predator-prey and other interspecific conflicts
In contrast to polymorphic toxins, which are typically deployed in intraspecific conflict between competing strains of the same species, other toxin systems are deployed against more distantly related target cells, such as prey and eukaryotic hosts (193). While some of these closely parallel polymorphic toxins in their domain architecture, they are usually distinguished by the lack of an accompanying immunity protein. The simplest of these systems are secreted Min-Wnt proteins from bacteria and fungi. These present just a standalone Min-Wnt domain or one fused to a novel domain with a half β-barrel wrapping around a helix (Figure 5R, Supplementary Data). These are probably deployed as diffusible toxins that target rival organisms in the environment.
Another architectural theme is defined by Min-Wnt and prok-SAA Lipocones fused to an enigmatic, novel, short C-terminal domain, which is composed of a long β-hairpin with a characteristic “break” in its central region, causing it to acquire an arch-like appearance (Figures 5S,S10). Hence, we refer to this domain as the broken-hairpin. We found the broken-hairpin domain to be fused to a wide array of predicted toxin domains across the bacterial superkingdom. These include effector domains otherwise found in polymorphic toxin and allied systems that target peptidoglycan, carrier lipids and the membrane, such as members of the Colicin M (180), Zeta toxin-kinase (194), lysozyme (195), and α/β-hydrolase (110) superfamilies, as well as nuclease toxins such as members of the HNH, HipA, SNase, and BECR superfamilies (174) (Figures 5S,S10C). Remarkably, these proteins with the broken-hairpin tend to lack a signal peptide or association with any other secretion system or immunity proteins (Figure 5S, Supplemental Material). Hence, we propose that the broken-hairpin domain itself serves as a trafficking mechanism for the externalization of these toxins in conflicts with rival environmental organisms.
Some predicted secreted Lipocones are found predominantly in predatory bacteria. The first of these are CapCone-2 domains from lineages like Bdellovibrionota, which are encoded in two-gene systems, with the second gene coding for a further secreted effector such as an α/β-hydrolase, Patatin, or acyltransferase or an OMP β-barrel domain (110,116,117,123) (Figures 5T,S7). Myxobacteria and some other lineages code for secreted prok-SAA domains fused to an N-terminal Zincin-like metallopeptidase domain and the first bacterial example of the von Willebrand Factor D (vWD) and Ig domains at the C-terminus (196) (Figure 5T, Supplemental Material). In the recently described predatory Patescibacterial branch of Omnitrophota species, Skillet-clade Lipocone domains are found in gigantic proteins combined with several other domains and TM segments. Domains found in these proteins include polysaccharide biosynthesis enzymes (94,97), signaling proteins involved in Histidine kinase-Receiver relays (197), peptidases of the MPTase and Papain-like superfamilies (113,138), diverse methylases, and extracellular ligand-binding domains like the peptidoglycan-binding LysM domain (198) (Figures 5T,6). Given the concentration of the above systems in predatory bacteria (Supplementary Data), we posit that the above Lipocones might function as toxins targeting prey membranes alongside a battery of effectors targeting other cellular components. In particular, the CapCone-2 systems might play a role in the breaching of outer membranes by Bdellovibrionota. Animal vWD domains are involved in adhesion (199); hence, the bacterial versions might play a similar role in adhering to prey cells, while the MPTase in these proteins potentially releases the associated prok-SAA toxin through autoproteolysis. Finally, the giant proteins from the Patescibacteria are likely to combine signaling prey presence with overcoming prey defenses and breaching prey membranes.
Certain prok-TelC proteins are observed as part of several distinctive systems that could be involved in as-yet-undiscovered predatory interactions or in targeting environmental competitors. One such system, defined by large proteins from spore-forming Bacillota, combines a diversifying set of extracellular ligand-binding domains (e.g., Ig-like, Cell-wall-binding β-hairpins and β-propellers (146,147,200)) with a two-enzyme core formed by a prok-TelC and an N-acetylglucosamine (GlcNAc)-1-phosphodiester alpha-N-acetylglucosaminidase (NAGPA). NAGPA catalyzes phosphoric-diester hydrolysis to release phosphodiester-linked sugars (Figures 5U,6,S7) (201). Some of these proteins feature an additional NlpC/p60 superfamily peptidase domain predicted to target peptidoglycan (181). The recombinational diversity of ligand-binding domains in this system, even among closely related Bacillota species, supports a possible arms race and involvement in a biological conflict. Other TelC domains in some Bacillota, Actinomycetota, and fungi are fused to peptidoglycan-binding domains (PGBD) (202) and an Rv2525c-like TIM-barrel (203) (Figures 5U,S7). In Actinomycetota, this protein is further combined in operons with either of two mutually exclusive genes coding for rapidly evolving proteins (Figure 5U): (i) a secreted protein containing a pair of Ig domains (200); (ii) a 3-TM protein (3TM-CCDN) with two conserved cysteines, an aspartate and an asparagine residue predicted to be located between the TM segments outside the cell. This version is further coupled to a gene for a secreted VanY superfamily peptidase (182) (Figure 5U, Supplementary Data). Common to these contexts are rapidly evolving and variable domains on the one hand and peptidoglycan/exopolysaccharide binding or degrading domains on the other. Hence, we interpret these as potential conflict systems that engage the cell wall and target it and associated membranes in rival bacteria.
Lipocone domains in resistance to antimicrobial agents
VanZ-1 proteins (Figure 1A) were initially identified as encoded by a gene linked to that coding for the VanY D-alanyl-D-alanine carboxypeptidase involved in resistance to glycopeptide antibiotics like vancomycin and teicoplanin (31,204–206) (Figure 6). These antibiotics bind the terminal D-Ala-D-Ala in the peptide moiety of peptidoglycan, preventing the transpeptidase cross-linking reaction necessary for its maturation. Upon detection of these antibiotics, enzymes encoded by the core vancomycin resistance operon re-engineer the exported peptidoglycan by inserting a D-Ala-D-Lac in place of the D-Ala-D-Ala linkage, precluding antibiotic binding (53). The VanY peptidase, while not strictly required for antibiotic resistance, acts as an accessory to this system by cleaving any remaining D-Ala-D-Ala linkages generated via the canonical pathway (53,205). However, the role of VanZ in this system has so far remained unknown. While only a small fraction of the VanZ-1 genes are found in these antibiotic resistance contexts (Supplementary Data), interestingly, other Lipocone genes, namely those of the VanZ-2 and the Skillet families, might also be linked to VanY in lieu of VanZ-1. Further, VanY might be replaced by a structurally unrelated secreted D-Ala-D-Ala carboxypeptidase of the metallo-beta-lactamase fold (111) in operonic contexts with VanZ-1 (Figure 5V). Hence, given our above prediction regarding VanZ acting in peptidoglycan and/or exopolysaccharide metabolism, VanZ-1 and the Lipocones displacing it might indeed play an accessory role with VanY at the membrane (204,205) in antibiotic resistance. We posit that, in these contexts, it likely acts on the head group of Lipid II to recycle canonical peptidoglycan intermediates for their accelerated or more thorough replacement with the resistant versions (Figure 3G).
We also identified a conserved five-gene operon featuring a YfiM-1 family Lipocone that might play a role in resistance to antibacterial agents (Figures 5W,S5). Other than YfiM-1, this operon contains genes for: (i) a thioredoxin domain protein (109); (ii) a DTW clade RNA modifying enzyme of the SPOUT superfamily (207,208); (iii) a protein with acyl-CoA ligase, GNAT superfamily N-acetyltransferase and ATP-grasp domains (131,209,210); (iv) a PssA-like phosphatidylserine synthetase of the HKD superfamily (211) (Figure 5W, Supplementary Data). Of these enzymes, the phosphatidylserine synthetase is predicted to act in its usual capacity to generate a lipid with a serine head group (211). We propose that this would then function as a substrate for the YfiM-1 Lipocone domain, which might exchange the serine for another moiety via a reaction paralleling PTDSS1/2 (Figure 3A). This moiety could then be modified by aminoacylation, further acylation and a redox modification by the third protein listed above, together with the thioredoxin. Indeed, such peptide modifications of lipid head groups by lysine, alanine, or arginine aminoacylation catalyzed by derived tRNA synthetases fused to GNATs have been shown to be a key resistance mechanism against breaching of the membrane by antibacterial peptides (212,213). Hence, we predict the modifications catalyzed by this system might play a comparable role. The presence of a tRNA-modifying DTW domain suggests that in parallel to the tRNA synthetases, the GNAT in this system might use a tRNA-linked acyl group as a substrate, as seen in peptidoglycan biosynthesis (214,215).
Eukaryotic recruitments of the Lipocone superfamily
Lipocone domains have been transferred on several occasions from bacteria to eukaryotes (Figures 4,S1). While there is predicted functional overlap with the above-described, predominantly bacterial versions, we discuss these separately as the inferred biological contexts of their deployment are often distinct from the above.
Plant YfiM-1 and eukaryotic VanZ-2 proteins
A conserved YfiM-1 family protein typified by the Arabidopsis AT1G15900 was acquired from the Bacteroidetes lineage of bacteria at the base of the plant lineage prior to the chlorophyte-streptophyte (including land plants) split and is predicted to be catalytically active (Figure S2, Supplementary Data). In Arabidopsis, this gene is widely expressed across different tissue types, developmental stages, and other tested conditions (216,217). Given the above-predicted roles for bacterial YfiM-1 proteins, it is conceivable that the plant version plays a comparable role in the metabolism of a conserved plant-specific lipid. In a similar vein, a distinct clade of standalone VanZ-2 domains typified by the Saccharomyces cerevisiae YJR112W-A was acquired early in the fungal lineage. A similar transfer is also seen in the SAR clade of eukaryotes (Figure S1). Since these eukaryotes lack peptidoglycan and other bacterial-type isoprenoid lipid-borne exopolysaccharide intermediates, we suggest that this version was recruited for modifications of a fungus-specific lipid (e.g., highly oxygenated isoprenoid lipids) (218).
The Met-TelC proteins
The Met-TelC clade is comprised of versions of the TelC family with a reconfigured active site transferred from bacteria to Metazoa prior to the divergence of the cnidarians, and most members are predicted to be catalytically inactive (Figure 2). In cnidarians and arthropods, the Met-TelC domain is found in a secreted protein fused to C-terminal adhesion-related vWA (219) and Ig domains, followed by a TM helix (Figure 5X, Supplementary Data). The chordate version, typified by human PGLYRP2 (220), is also secreted and is fused to a C-terminal Amidase targeting the N-acetylmuramoyl-L-alanine linkage (Figure 5X, Supplementary Data). PGLYRP2 is a key innate immunity factor against bacterial pathogens degrading sugar-peptide linkages in peptidoglycan via the Amidase domain (221–223). As most Met-TelC proteins lack the active site residues but are modeled to retain the substrate-binding pocket, we propose that they participate in anti-bacterial immunity as a Pathogen-Associated Molecular Pattern (PAMP) receptor (224). Specifically, they could recognize polyisoprenoid pyrophosphate-linkage-containing lipid intermediates of bacterial cell-surface molecules like peptidoglycan or exopolysaccharides.
Eukaryotic Wnt proteins
Wnt family Lipocones were transferred on multiple occasions to eukaryotes. The best-known of these are Met-Wnt proteins, which were acquired from bacteria at the base of Metazoa after they had separated from their closest sister group, the choanoflagellates. These lost the ancestral active site residues and function as well-studied secreted signaling molecules and will not be detailed further in this work (for review, see (1,225)). Independently of the Met-Wnt proteins, catalytically active, secreted versions closely related to the bacterial Min-Wnt proteins were transferred to fungi and, within Metazoa, to the rotifers and the hemichordate acorn worm Saccoglossus kowalevskii, where they are lineage-specifically expanded (Figure S1, Supplementary Data). These versions are primarily standalone versions of the Min-Wnt domain, lacking the large inserts typical of the Met-Wnt proteins (Figure 1A). We predict that these eukaryotic Min-Wnt proteins retain their ancestral toxin role and might participate in anti-bacterial immunity.
Met-SAA proteins
Met-SAA proteins (Figure 1A) were acquired from bacteria prior to the divergence of the cnidarians from the rest of Metazoa. However, unlike the Met-Wnt and Met-TelC proteins, they often conserve the ancestral active site residues, indicating that they are usually enzymatically active (Figure 3). Human SAA has been recognized as a key immune marker that dramatically increases in blood during the Acute Phase Response (226). It has been reported to bind the E. coli outer membrane protein OmpA (227) and claimed to function as an opsonin in innate immunity (228). Like Met-TelC, but in contrast to Met-Wnts, Met-SAAs appear to have been lost or pseudogenized in several animal lineages (229–231) (Figure 3). This is consistent with an arms-race scenario in immunity and the development of pathogen resistance against the Met-SAAs, leading to loss. In keeping with an immune role for the Met-SAAs, we propose a catalytic function for the active versions in severing lipid head groups of outer-membrane lipids or of isoprenoid lipid carrier intermediates. Such action could also generate PAMPs that could explain the activation of neutrophil- and macrophage-based immunity by SAA (228). Pertinent to these observations, diverse OMP β-barrels have been linked to the translocation of polymorphic toxin domains across the outer membrane of target cells (232–234). Given this and the origin of Met-SAA from bacterial polymorphic toxin-related systems (Figure 4), its interaction with OmpA might help it cross over into the periplasmic space and act on maturing peptidoglycan or teichoic acid intermediates.
SAA was first reported as a component of secondary amyloid deposits (235), and its capacity to form amyloid fibrils upon protease cleavage was theorized as a potential PAMP activating the immune response (30). Indeed, bacteria produce their own secreted amyloids, such as Curli and Fap, believed to contribute to biofilm formation (236,237), and might be PAMPs recognized by animal immune systems (238). Further, other animal amyloids, such as the β-amyloid, have been proposed to play a role as physical barriers in immunity against bacteria (239). Thus, amyloid formation by protease cleavage (including potentially by bacterial proteases) may represent a second line of defense mediated by Met-SAA proteins.
Discussion
Early Evolution of the Lipocone superfamily
No single well-defined Lipocone clade is universally conserved across the three superkingdoms of Life (Figures 4, S1). However, the VanZ and Wok clades are both found across all major bacterial phyla (notwithstanding sporadic losses in certain lineages) and in some archaeal lineages (Figure S1). At the same time, the cpCone clade is found across most major archaeal lineages and is nearly universally conserved in the eukaryotes (absent in Ascomycota and some choanoflagellates) (Figure S1). Notably, the cpCone and Wok clades tend to group together in the profile-profile similarity network (Figure 1B). These observations suggest that at least a single version of the Lipocone superfamily was likely present in the Last Universal Common Ancestor (LUCA). The phyletic patterns suggest that the LUCA Lipocone gave rise to the VanZ/Wok precursor in the bacterial lineage on the one hand and the cpCone clade via a circular permutation event in the archaeo-eukaryotic lineage on the other (Figure 4). Based on the features of these deep-branching clades, the LUCA version is inferred to feature a hydrophobic domain with a 4TM helix core, with the active site facing the outer leaf of the lipid bilayer (Figure 1C). Given that extant versions operate both on classic phospholipids and isoprenoid lipids, it is difficult to infer which of these might have been substrates for the LUCA version. It is possible that this early version had a generic specificity that became specialized in the descendant clades.
Subsequent diversification of the Lipocone domain
The early diversification of the Lipocone domain appears to have had different drivers in the two prokaryotic superkingdoms. The presence of an extensive repertoire of exopolysaccharides in the cell wall (peptidoglycan, teichoic acids), cell surface (e.g., ECA), and outer membrane (e.g., lipopolysaccharide), synthesized via isoprenoid lipid-linked intermediates, like lipid-II, was the primary driver in the bacterial superkingdom (240). Here, this diversification yielded 4 monophyletic groups: the VanZs, Wok, YfiM and Skillet (Figure 4). The deeper VanZ and Wok branches, which were likely recruited first for lipid-II-related functions, were probably the predecessors of the more restricted bacterial families with specialized functions. For instance, the emergence of the outer membrane in certain bacteria was potentially coupled with the origin of the YfiM-like clade (Figure 4). Similarly, our predictions suggest that within these clades, further diversification accompanied the acquisition of specialized functional roles in antibiotic resistance, secondary sensor roles in single and multicomponent signaling and lipoprotein processing. The interoperability of Lipocone domains on lipid carriers shared across different biosynthetic pathways (see above, Figures 3,S5-S6) appears to have been a key factor leading to this versatility.
In the ancestral archaeo-eukaryotic lineage, the absence of peptidoglycan and an apparently lower diversity of structures with exopolysaccharides was reflected in the lesser diversification of the Lipocone clades (Figure 4). There are open questions regarding the biochemical functions of the primary archaeo-eukaryotic Lipocone clade, the cpCone. Although the eukaryotic cpCone PTDSS1/2 family has been shown to swap serine for ethanolamine or choline in lipid head groups (28,57), their archaeal counterparts remain uncharacterized. Archaea have their own lipid with a serine in the head group (archaeophosphatidylserine), but to date, its synthesis has been shown to depend on a patchwork of different CDP-alcohol phosphatidyltransferase enzymes (CaPs) in different archaeal species (58,241,242). While the CaPs are also integral membrane enzymes with a 6TM helix core, catalyzing reactions on lipid head groups in archaea and eukaryotes comparable to those of the Lipocones (242), they are evolutionarily unrelated. Nevertheless, we suggest that the archaeal cpCones, like their eukaryotic counterparts, could contribute to distinct, as yet uncharacterized, pathways for the generation of cell membrane phospholipids like archaeophosphatidylserine or those with other head groups.
Emergence of diffusible versions of the Lipocone domain and their repeated recruitment in biological conflicts
One of the remarkable aspects of the Lipocone superfamily is the loss of ancestral hydrophobicity in several families (Figure 1C), transforming them from integral membrane proteins to diffusible domains. While unexpected, such a transition in integral membrane enzymes acting on lipid substrates is not unprecedented. The PAP2 superfamily of integral membrane enzymes (e.g., diacylglycerol diphosphate phosphatase) (124,243) also contains several soluble versions (244) that appear to have emerged from an integral membrane ancestor (AMB and LA, unpublished observations). Most of the soluble Lipocone domains retain their active site conservation (Figure 2) and, at least in one experimentally characterized case, catalyze a reaction comparable to that of the TM version (29) (Figure 3B). The weight of the evidence presented here, including the profile-profile similarity network (Figure 1B), phyletic patterns (Figure S1), functional contexts (Figures 5,6), and the broadly shared structural features (Figures 1A, S2, S4), suggests that the loss of hydrophobicity occurred on a single occasion in the Lipocone superfamily, followed by diversification of these diffusible versions.
Our analysis of the diffusible Lipocone families reveals repeated recruitment as toxins/effectors in anti-viral and polymorphic toxin and allied systems (174), suggesting that their diversification was driven by the arms races arising from the biological conflicts where they are deployed. Recruitment of a representative of the VanZ-1 family as a polymorphic toxin on rare occasions (Figures 5N,6) suggests a possible evolutionary pathway for their recruitment as toxins: the effector version of Lipocones attacking lipids in competing bacteria likely emerged from an ancestral version that catalyzed endogenous lipid-head-group modifications on the same lipids in metabolic pathways. Once versions with reduced hydrophobicity emerged, they could be deployed as diffusible effectors that were shared across extracellular and intracellular conflict systems, a trend previously recognized in many other effector domains (25).
Repeated acquisition of Lipocones of bacterial origin by eukaryotes
Unlike bacteria, eukaryotes as a whole do not possess a rich repertoire of Lipocone domains. The PTDSS1/2 family, vertically inherited from the archaeal progenitor, is the only version that can be inferred as being present in the Last Eukaryotic Common Ancestor (Figure 4). However, distinct Lipocone families of ultimately bacterial provenance were acquired early and fixed in certain eukaryotic lineages: (i) YfiM-1 in the plant lineage; (ii) the fungal VanZ-2 domains typified by the Saccharomyces cerevisiae YJR112W-A; (iii) Met-Wnt (discussed further below) (Figure S1). The early fixation of these versions in the eukaryotic lineages possessing them suggests that they were recruited for definitive “housekeeping” or developmental functions in the respective lineages. Beyond these, the fungal and metazoan lineages show more sporadically distributed versions, which have all been acquired from bacterial secreted-toxin or antiviral systems: (i) Min-Wnt independently in fungi and certain Metazoa; (ii) SAA; (iii) TelC; the latter two are absent in the basal-most metazoans, the sponges, but are present in Cnidaria, suggesting a relatively early acquisition (Figure 4). The weight of the evidence suggests that they have retained certain aspects of the ancestral bacterial effector function for anti-pathogen immunity in eukaryotes. This is consistent with both their episodic loss and lineage-specific expansion, the tendency to show rapid sequence divergence and, in the Met-TelC family, loss of catalytic activity (Figures 2,S1, Supplementary Data).
This independent acquisition of at least 3 distinct Lipocone families in metazoan immunity from polymorphic and allied effector systems of prokaryotes points to a persistent evolutionary trend. Notably, the Lipocone domains participating in animal immunity have been drawn from secreted effectors rather than the intracellular versions (bacterial intracellular Min-Wnts) predicted to participate in bacterial anti-selfish element immunity. More generally, this adds to a growing list of components drawn from secreted effector systems of prokaryotes in eukaryotic immune systems (174,193,245). For example, this closely parallels another structurally unrelated effector domain, the Zn-dependent deaminase (e.g., metazoan AID/APOBEC deaminases) (246). Hence, these observations add further support to our hypothesis that the extensive expansion of effectors in diverse prokaryotic inter-organismal conflict systems served as a reservoir from which eukaryotic immune systems repeatedly acquired components (193,245). We propose that symbiotic associations between the early animals and bacteria resulted in potential interactions via secreted effectors of the latter that aided the former against antagonistic bacteria. This probably led to their eventual acquisition by animals and incorporation into their immune processes.
Origin of Wnt as a signaling molecule
Earlier considerations on the evolution of Wnt signaling indicated that it emerged at the base of the metazoan lineage and incorporated a wide range of components of different origins (e.g., the HMG domain transcription factor TCF/LEF, the HEAT repeat protein β-catenin and the 7TM receptor Frizzled) (1). However, the provenance of Met-Wnt itself had been mysterious and was seen as a possible example of a metazoan innovation (225). While the Met-Wnt domains possess peculiar structural elaborations (34), their conserved core is a Lipocone domain (Figure 1A). We establish that the progenitor of Met-Wnt emerged as part of the radiation of Lipocone domains in bacteria as effectors deployed in both intracellular and inter-organismal conflict – the Min-Wnt proteins.
Whereas the Min-Wnt proteins are predicted to be secreted toxins, the Met-Wnts underwent an ancestral inactivation through loss of the catalytic residues (Figure 2). However, they retained their ancient involvement in cell-cell interactions as secreted agents. The Met-Wnt residues recognized as essential for receptor (Frizzled) binding, including the absolutely conserved palmitoleoylated serine residue, are found in the aforementioned Metazoa-specific hairpins and loops (34,36). However, despite their inactivation, the Met-Wnts retain the ancestral substrate-binding pocket (Figures 1A, S2). This raises the possibility that they might be involved in as-yet unexplored interactions with ligands such as lipids.
Our tracing of the provenance of Wnt back to an effector in secreted bacterial toxin systems adds it to a growing list of components in metazoan signaling networks that have been acquired from such systems. For instance, this is also the case with components of the other key metazoan signaling pathway, Hedgehog (247). Here, the Hedgehog protein itself contains an autoproteolytic HINT peptidase domain that was likely drawn from a structurally and functionally cognate domain observed in polymorphic toxin systems (174,247). Further, an intracellular component of the same signaling pathway, Suppressor of Fused (SuFu), was derived from a common immunity protein found in polymorphic toxin systems (247). Similarly, the Teneurin/Odd Oz proteins mediating signaling in cell migration, neuronal pathfinding, and fasciculation in Metazoa descended from a polymorphic toxin protein with a C-terminal HNH endonuclease toxin tip (174). In a similar vein, the immunity protein of certain CapCone toxins identified in this study might have given rise to the β-sandwich domain in the eukaryotic centriolar assembly factor SAS-6. These observations suggest that, in addition to immune system components, interactions with symbiotic bacteria also potentially furnished the progenitors of components of eukaryotic signaling and cytoskeletal networks that were central to the emergence of Metazoa as a clade of multicellular eukaryotes (248,249).
Conclusions
Using sensitive sequence and structure analysis, we unify a large, hitherto unrecognized superfamily of enzymatic domains, the Lipocone. By combining analysis of the active site and the structure of the Lipocone domain with contextual information from conserved gene-neighborhoods and domain architectures, we present evidence that members of this superfamily target phosphate linkages in head groups of both classical phospholipids and polyisoprenoid lipids. Specifically, they catalyze reactions such as head group exchange or severing of the head group-diphosphate linkage from the polyisoprenol. We present evidence that these activities have been recruited in a wide range of biochemical contexts, including cell membrane lipid modification, metabolism of peptidoglycan and exopolysaccharide lipid-carrier linked intermediates, lipoprotein modifications, bacterial outer membrane modification, sensing of membrane-associated signals, effector activity in antiviral and inter-organismal conflicts and resistance to antimicrobials. Further, catalytically inactive versions like Met-Wnt have been recruited for signaling roles in Metazoa. We predict the catalytic activity and potential biochemical pathways of numerous representatives for the first time, including some proteins that have remained enigmatic for over two decades, like VanZ.
We identify three notable trends in Lipocone evolution. First, although we reconstruct the ancestral member of the superfamily as being a 4TM integral membrane domain, a large monophyletic subset underwent a dramatic loss of hydrophobicity, transforming them into diffusible versions, including the Wnts and the SAAs (Figure 1C). Second, the superfamily expanded in two major functional niches in bacteria, namely peptidoglycan/exopolysaccharide metabolism and effector domains of both secreted toxins and immune systems (Figure 4). Finally, members of the Lipocone superfamily were acquired on multiple occasions from bacteria by Metazoa and were reused in new functional contexts as signaling messengers and immune factors (Figure 4).
Importantly, our predictions in this regard underscore that much remains unexplored in terms of lineage-specific cell wall and membrane metabolism in prokaryotes. We present several testable biochemical, functional hypotheses for the many poorly understood branches of the superfamily, several of which are being recognized as enzymatic for the first time here. We hope this will also open new avenues of research to fill key gaps in our understanding of lipid metabolism.
Methods
Sequence analysis
Sequence similarity searches were performed using PSI-BLAST (250) and JackHMMER (251) against the NCBI non-redundant protein database (nr) (252) or a version clustered down to 50% sequence identity (nr50). The searches were initiated using the previously identified prokaryotic Wnt (24), with multiple rounds of searches conducted, each using seeds collected from the preceding searches. Clusters based on sequence similarity (percentage identity or bit-score) were generated using MMseqs (253). The clustering parameters were adjusted according to specific goals, enabling redundancy removal, the definition of homologous groups, and the creation of new profiles. Multiple sequence alignments (MSA) were generated using the MAFFT program (254) with the local-pair algorithm, combined with the parameters --maxiterate 3000, --op 1.5, and --ep 0.2, and were manually refined based on structural superpositions and profile-profile comparisons.
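As a rough illustration of the alignment step, the MAFFT invocation described above could be assembled as follows. This is only a sketch: the helper `mafft_command` and the input file name are hypothetical, and only the options named in the text are used.

```python
import shlex

def mafft_command(fasta_in, maxiterate=3000, gap_open=1.5, gap_ext=0.2):
    """Build the argument list for a MAFFT local-pair alignment run."""
    return [
        "mafft",
        "--localpair",                   # local-pair algorithm
        "--maxiterate", str(maxiterate), # iterative refinement cycles
        "--op", str(gap_open),           # gap opening penalty
        "--ep", str(gap_ext),            # offset (gap extension-like) value
        fasta_in,
    ]

# Hypothetical input file; the real seed sets are not reproduced here.
cmd = mafft_command("lipocone_seeds.fasta")
print(shlex.join(cmd))
```

MAFFT writes the alignment to standard output, so in practice the list would be passed to `subprocess.run(cmd, stdout=handle, check=True)` with `handle` being an open output file.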
Sequence similarity network analysis
The HHalign program (255) was used to perform profile-profile comparisons, with the resulting p-value and e-value scores serving as edges for constructing a superfamily relationship network. This was then analyzed using the Leiden community finding algorithm (37) to detect sub-networks. Network analysis and visualization were performed using the R igraph (256) or Python NetworkX libraries (257).
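The network construction can be sketched in a few lines of plain Python. The hits, the e-value cutoff and the function names below are all invented for illustration, and connected components are used here as a deliberately crude stand-in for the Leiden community detection applied in the study.

```python
# Toy profile-profile hits: (family A, family B, e-value). Values are invented
# for illustration and are not taken from the study.
hits = [
    ("Wok", "cpCone", 1e-12),
    ("VanZ-1", "VanZ-2", 1e-20),
    ("Wok", "VanZ-1", 0.5),       # weak hit, dropped by the cutoff below
    ("Min-Wnt", "prok-SAA", 1e-8),
]

def build_network(hits, evalue_cutoff=1e-3):
    """Return an undirected adjacency map, keeping hits at or below the cutoff."""
    graph = {}
    for a, b, evalue in hits:
        graph.setdefault(a, set())  # register nodes even if the edge is dropped
        graph.setdefault(b, set())
        if evalue <= evalue_cutoff:
            graph[a].add(b)
            graph[b].add(a)
    return graph

def connected_components(graph):
    """Group nodes that are reachable from one another (depth-first search)."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(graph[node] - members)
        seen |= members
        components.append(sorted(members))
    return components

components = connected_components(build_network(hits))
print(components)
```

With real data, the adjacency map would be handed to a library that implements Leiden community detection rather than relying on connected components, which cannot split densely linked groups.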
Comparative genomics, domain identification, and phylogenetic analysis
Genomic neighborhoods were obtained from genomes available in the NCBI Genome database (252) using in-house scripts written in Perl and Python. Conservation analysis of these genomic neighborhoods was performed by clustering the protein products of neighboring genes. Domain identification was conducted using a collection of HMMs and PSSMs maintained by the Aravind lab, along with HMMs from the Pfam database (258), utilizing the RPSBLAST (259) and HMMSCAN (260) programs. To further refine detection, domain identification was extended through remote homology analysis using the HHpred (261) program, against profiles built from the Pfam (258) and PDB70 (262) databases. Phylogenetic analyses were performed using FastTree (263) and iqTREE2 (264). Experimental functional data for characterized members of the superfamily were collected with the assistance of the ChatGPT language model (https://chat.openai.com). Structural comparisons, along with shared genomic associations, were used to further refine the interrelationships within and between the groups of the superfamily.
Families with broader presence across multiple major lineages (“phyla”) and deeper conservation within each of those lineages were inferred to be more ancient. In contrast, those with a more limited phyletic spread and/or limited depth of occurrence within each major lineage were likely later derivations (Figures 4,S1). We formalized this inference by calculating a phyletic metric for the Lipocone clades (Figure S1) comprising both the phyletic spread and depth. The phyletic spread $S_i$ of the $i$th Lipocone clade was computed thus:
$$S_i = \frac{m_i}{M}$$
where $m_i$ is the number of lineages with at least one representative of the $i$th Lipocone clade, and $M$ is the total number of lineages examined. The phyletic depth $D_i$ of the $i$th Lipocone clade was computed as a weighted average of its occurrence within each lineage in the form of the mediant:
$$D_i = \frac{\sum_j n_j}{\sum_j N_j}$$
where $n_j$ is the number of species in lineage $j$ with a Lipocone domain of the $i$th Lipocone clade and $N_j$ is the total number of species sampled in lineage $j$. $S_i$ and $D_i$ are plotted as a bar graph with $S_i$ as its width and $D_i$ as its height.
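The phyletic spread and depth described above can be computed with a short sketch such as the following. The lineage names and counts are invented, and summing over every lineage examined when forming the mediant is an assumption about how the depth was taken.

```python
def phyletic_spread(counts):
    """Spread: fraction of lineages with at least one representative (m / M)."""
    m = sum(1 for n_j, _ in counts.values() if n_j > 0)
    return m / len(counts)

def phyletic_depth(counts):
    """Depth as a mediant: sum of n_j divided by sum of N_j.

    Assumption: the sums run over all lineages examined, including those
    where the clade is absent.
    """
    total_n = sum(n_j for n_j, _ in counts.values())
    total_N = sum(N_j for _, N_j in counts.values())
    return total_n / total_N

# Hypothetical clade: lineage -> (species with the clade, species sampled).
toy_clade = {
    "LineageA": (8, 10),
    "LineageB": (0, 20),
    "LineageC": (5, 10),
}

spread = phyletic_spread(toy_clade)  # 2 of 3 lineages
depth = phyletic_depth(toy_clade)    # 13 of 40 species
print(spread, depth)
```

Using the mediant rather than the mean of the per-lineage fractions weights each lineage by the number of species sampled in it, so sparsely sampled lineages do not dominate the depth.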
Contextual network construction
Each domain architecture and conserved gene neighborhood was decomposed into its constituent domains. These domains were then labeled for their biochemical function and stored as a YAML file (Supplementary Data). The contextual connections were then rendered as a graph with the domains as its nodes and the adjacency relationships as its edges. Cliques containing a given Lipocone domain were detected in this graph and merged to constitute their respective dense subgraphs. These subgraphs were then examined for the statistically significant prevalence of particular labeled functions using the Fisher exact test. Network analysis was performed using the functions of the R igraph or Python NetworkX libraries.
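The enrichment test on a dense subgraph can be illustrated with a minimal, standard-library sketch. The domain labels and the subgraph below are invented, and the use of a one-sided (enrichment) test is an assumption, since the text does not state the sidedness.

```python
from math import comb

def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher exact p-value P(X >= k) from the hypergeometric tail.

    N: domains in the whole network, K: of those carrying the label,
    n: domains in the subgraph, k: labeled domains in the subgraph.
    """
    total = comb(N, n)
    tail = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, min(n, K) + 1))
    return tail / total

# Hypothetical functional labels for the domains of a toy contextual network.
labels = {
    "Lipocone": "lipid", "VanY": "peptidoglycan", "PGBD": "peptidoglycan",
    "NlpC/p60": "peptidoglycan", "HNH": "nuclease", "SNase": "nuclease",
    "HipA": "kinase", "Ig": "binding",
}
# Hypothetical dense subgraph obtained by merging cliques around a Lipocone.
subgraph = {"Lipocone", "VanY", "PGBD", "NlpC/p60"}

label = "peptidoglycan"
K = sum(1 for v in labels.values() if v == label)
k = sum(1 for d in subgraph if labels[d] == label)
p_value = fisher_enrichment_p(k, len(subgraph), K, len(labels))
print(round(p_value, 4))
```

In a real analysis one test would be run per label and per subgraph, so some correction for multiple testing would normally follow.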
Structure analysis
Protein structures were modeled using AlphaFold3 (265), with visualization and manipulation performed using either Mol* (266) or PyMOL. Structural similarity searches were conducted using the DaliLite (267) and Foldseek (268) programs. DaliLite was also used to generate structural alignments.
Hydrophobicity analysis
To create the membrane propensity plots, for each protein $P_i$ in a given family, we computed the average TM propensity of its amino acids using the TM tendency scale (38). This score $H_i$ for $P_i$ is calculated as:
$$H_i = \frac{1}{n}\sum_{j=1}^{n} h_j$$
where $h_j$ is the TM tendency of the $j$-th amino acid in the protein $P_i$, and $n$ is its total length in amino acids. The Kruskal–Wallis nonparametric test was applied to assess whether TM propensity scores differed across the 30 groups. As the Kruskal–Wallis test indicated a significant difference (p<0.05), we performed post-hoc pairwise comparisons using Dunn’s test with Bonferroni correction to control for multiple testing. Group-wise visualizations were presented using critical difference diagrams, where groups not connected by horizontal bars are significantly different (adjusted p<0.05) (Figure S3).
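The per-protein score is simply a mean over residues, as in the sketch below. The scale values are placeholders invented for illustration and are not the published TM tendency scale (38); the sequences are likewise hypothetical.

```python
# Placeholder per-residue values for a few amino acids. These are NOT the
# published TM tendency scale; substitute the real scale for actual use.
TOY_SCALE = {"L": 2.0, "F": 2.0, "A": 0.5, "G": 0.0, "K": -3.0, "D": -3.0}

def mean_tm_propensity(sequence, scale):
    """Average per-residue TM tendency over the full length of the protein."""
    if not sequence:
        raise ValueError("empty sequence")
    return sum(scale[aa] for aa in sequence) / len(sequence)

# Hypothetical sequences: one hydrophobic, one polar.
scores = {
    "membrane_like": mean_tm_propensity("LLFA", TOY_SCALE),
    "soluble_like": mean_tm_propensity("KDGA", TOY_SCALE),
}
print(scores)
```

Scores collected per family in this way could then be compared across groups, for example with `scipy.stats.kruskal` for the Kruskal–Wallis step.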
Acknowledgements
This research was supported by the Division of Intramural Research at the National Library of Medicine (NLM), National Institutes of Health (NIH). This research was supported in part by an appointment to the NLM Research Participation Program administered by the Oak Ridge Institute for Science and Education (ORISE) through an interagency agreement between the U.S. Department of Energy (DOE) and the NLM.
References
- 1. The dawn of developmental signaling in the metazoa. Cold Spring Harb Symp Quant Biol 74:81–90
- 2. Pygopus and the Wnt signaling pathway: a diverse set of connections. Bioessays 30:448–456
- 3. The development of wingless, a homeotic mutation of Drosophila. Dev Biol 56:227–240
- 4. The Drosophila homolog of the mouse mammary oncogene int-1 is identical to the segment polarity gene wingless. Cell 50:649–657
- 5. Wingless a new mutant in Drosophila melanogaster. Drosophila information service 134
- 6. Many tumors induced by the mouse mammary tumor virus contain a provirus integrated in the same region of the host genome. Cell 31:99–109
- 7. Wnt genes. Cell 69:1073–1087
- 8. Wnt signal transduction pathways. Organogenesis 4:68–75
- 9. Cell division orientation and planar cell polarity pathways. Semin Cell Dev Biol 20:972–977
- 10. Calcium signaling in vertebrate embryonic patterning and morphogenesis. Dev Biol 307:1–13
- 11. The Wnt signaling pathway in development and disease. Annu Rev Cell Dev Biol 20:781–810
- 12. Wnt signaling: relevance to beta-cell biology and diabetes. Trends Endocrinol Metab 19:349–355
- 13. BMP, Wnt and Hedgehog signals: how far can they go? Curr Opin Cell Biol 12:244–249
- 14. The Frizzled family of unconventional G-protein-coupled receptors. Trends Pharmacol Sci 28:518–525
- 15. A new member of the frizzled family from Drosophila functions as a Wingless receptor. Nature 382:225–230
- 16. Monounsaturated fatty acid modification of Wnt protein: its role in Wnt secretion. Dev Cell 11:791–801
- 17. Post-translational palmitoylation and glycosylation of Wnt-5a are necessary for its signalling. Biochem J 402:515–523
- 18. Dishevelled: The hub of Wnt signaling. Cell Signal 22:717–727
- 19. The Drosophila segment polarity gene dishevelled encodes a novel protein required for response to the wingless signal. Genes Dev 8:118–130
- 20. Wnt/wingless signaling requires BCL9/legless-mediated recruitment of pygopus to the nuclear beta-catenin-TCF complex. Cell 109:47–60
- 21. Constitutive scaffolding of multiple Wnt enhanceosome components by Legless/BCL9. eLife 6
- 22. How do they do Wnt they do?: Regulation of transcription by the Wnt/beta-catenin pathway. Acta Physiol (Oxf) 204:74–109
- 23. The evolution of the Wnt pathway. Cold Spring Harb Perspect Biol 4:a007922
- 24. Identification of Uncharacterized Components of Prokaryotic Immune Systems and Their Diverse Eukaryotic Reformulations. J Bacteriol 202
- 25. Discovering Biological Conflict Systems Through Genome Analysis: Evolutionary Principles and Biochemical Novelty. Annu Rev Biomed Data Sci 5:367–391
- 26. Cloning of a Chinese hamster ovary (CHO) cDNA encoding phosphatidylserine synthase (PSS) II, overexpression of which suppresses the phosphatidylserine biosynthetic defect of a PSS I-lacking mutant of CHO-K1 cells. J Biol Chem 272:19133–19139
- 27. A Chinese hamster cDNA encoding a protein essential for phosphatidylserine synthase I activity. J Biol Chem 266:24184–24189
- 28. Cloning and expression of murine liver phosphatidylserine synthase (PSS)-2: differential regulation of phospholipid metabolism by PSS1 and PSS2. Biochem J 342:57–64
- 29. A broadly distributed toxin family mediates contact-dependent antagonism between gram-positive bacteria. eLife 6
- 30. Serum amyloid A - a review. Mol Med 24:46
- 31. The vanZ gene of Tn1546 from Enterococcus faecium BM4147 confers resistance to teicoplanin. Gene 154:87–92
- 32. Moderate-level resistance to glycopeptide LY333328 mediated by genes of the vanA and vanB clusters in enterococci. Antimicrob Agents Chemother 43:1875–1880
- 33. Structural architecture and functional evolution of Wnts. Dev Cell 23:227–232
- 34. Structural basis of Wnt recognition by Frizzled. Science 337:59–64
- 35. Structural studies of Wnts and identification of an LRP6 binding site. Structure 21:1235–1242
- 36. Cryo-EM structure of human Wntless in complex with Wnt3a. Nat Commun 12:4541
- 37. From Louvain to Leiden: guaranteeing well-connected communities. Sci Rep 9:5233
- 38. An amino acid “transmembrane tendency” scale that approaches the theoretical limit to accuracy for prediction of transmembrane helices: relationship to biological hydrophobicity. Protein Sci 15:1987–2001
- 39. Highly accurate protein structure prediction with AlphaFold. Nature 596:583–589
- 40. Revealing the hidden functional diversity of an enzyme family. Nat Chem Biol 10:42–49
- 41. Evolution of enzyme superfamilies. Curr Opin Chem Biol 10:492–497
- 42. Resilience of biochemical activity in protein domains in the face of structural divergence. Curr Opin Struct Biol 26:92–103
- 43. Purification and characterization of human phosphatidylserine synthases 1 and 2. Biochem J 418:421–429
- 44. Immunochemical identification of the pssA gene product as phosphatidylserine synthase I of Chinese hamster ovary cells. FEBS Lett 395:262–266
- 45. Phosphatidylserine synthase-1 and -2 are localized to mitochondria-associated membranes. J Biol Chem 275:34534–34540
- 46. Topology of phosphatidylserine synthase 1 in the endoplasmic reticulum membrane. Protein Sci 30:2346–2353
- 47. Biosynthesis of the peptidoglycan of bacterial cell walls. II. Phospholipid carriers in the reaction sequence. J Biol Chem 242:3180–3190
- 48. Structure of a lipid intermediate in cell wall peptidoglycan synthesis: a derivative of a C55 isoprenoid alcohol. Proc Natl Acad Sci U S A 57:1878–1884
- 49. Phosphatidic acid: a lipid messenger involved in intracellular and extracellular signalling. Cell Signal 8:341–347
- 50. Discovering the deep evolutionary roots of serum amyloid A protein family. Int J Biol Macromol 252:126537
- 51. Examination of the Clostridioides (Clostridium) difficile VanZ ortholog, CD1240. Anaerobe 53:108–115
- 52. Specific Inhibition of VanZ-Mediated Resistance to Lipoglycopeptide Antibiotics. Int J Mol Sci 23
- 53. Molecular mechanisms of vancomycin resistance. Protein Sci 29:654–669
- 54. The peptidoglycan recognition proteins (PGRPs). Genome Biol 7:232
- 55. Circular permutation in proteins. PLoS Comput Biol 8:e1002445
- 56. Guilt by association: contextual information in genome analysis. Genome Res 10:1074–1077
- 57. Historical perspective: phosphatidylserine and phosphatidylethanolamine from the 1800s to the present. J Lipid Res 59:923–944
- 58. Recent advances in structural research on ether lipids from archaea including comparative and physiological aspects. Biosci Biotechnol Biochem 69:2019–2034
- 59. Archaeal phospholipids: Structural properties and biosynthesis. Biochim Biophys Acta Mol Cell Biol Lipids 1862:1325–1339
- 60. Archaeal lipids. Prog Lipid Res 91:101237
- 61. Sorting of bacterial lipoproteins to the outer membrane by the Lol system. Methods Mol Biol 619:117–129
- 62. Molecular basis of SMC ATPase activation: role of internal structural changes of the regulatory subcomplex ScpAB. Structure 21:581–594
- 63. Kleisins: a superfamily of bacterial and eukaryotic SMC protein partners. Mol Cell 11:571–575
- 64. Discovery of two novel families of proteins that are proposed to interact with prokaryotic SMC proteins, and characterization of the Bacillus subtilis family members ScpA and ScpB. Mol Microbiol 45:59–71
- 65. The many faces of the helix-turn-helix domain: transcription regulation and beyond. FEMS Microbiol Rev 29:231–262
- 66. glc locus of Escherichia coli: characterization of genes encoding the subunits of glycolate oxidase and the glc regulator protein. J Bacteriol 178:2051–2059
- 67. Functional interactions of HslV (ClpQ) with the ATPase HslU (ClpY). Proc Natl Acad Sci U S A 99:7396–7401
- 68. Lipid Cell Biology: A Focus on Lipids in Cell Division. Annu Rev Biochem 87:839–869
- 69. The role of lipid domains in bacterial cell processes. Int J Mol Sci 14:4050–4065
- 70. Oxidative stress and lipotoxicity. J Lipid Res 57:1976–1986
- 71. A second Escherichia coli protein with CL synthase activity. Biochim Biophys Acta 1483:263–274
- 72. Discovery of a cardiolipin synthase utilizing phosphatidylethanolamine and phosphatidylglycerol as substrates. Proc Natl Acad Sci U S A 109:16504–16509
- 73. Phosphoesterase domains associated with DNA polymerases of diverse origins. Nucleic Acids Res 26:3746–3752
- 74. Evolutionary genomics of the HAD superfamily: understanding the structural adaptations and catalytic diversity in a superfamily of phosphoesterases and allied enzymes. J Mol Biol 361:1003–1034
- 75. Biological functions of carotenoids--diversity and evolution. Biofactors 10:99–104
- 76. Carotenoids. Handbook. Photosynthetica 42:186
- 77. New functional assignment of the carotenogenic genes crtB and crtE with constructs of these genes from Erwinia species. FEMS Microbiol Lett 69:253–257
- 78. Molecular cloning and expression in Escherichia coli of a cyanobacterial gene coding for phytoene synthase, a carotenoid biosynthesis enzyme. FEBS Lett 296:305–310
- 79. Three monophyletic superfamilies account for the majority of the known glycosyltransferases. Protein Sci 12:1418–1431
- 80. Endogenous synthesis of coenzyme Q in eukaryotes. Mitochondrion 7:S62–71
- 81. The mechanism of glutamine-dependent amidotransferases. Cell Mol Life Sci 54:205–222
- 82. Crystal structures of RidA, an important enzyme for the prevention of toxic side products. Sci Rep 6:30494
- 83. The HD domain defines a new superfamily of metal-dependent phosphohydrolases. Trends Biochem Sci 23:469–472
- 84. Analysis of the Rhizobium meliloti genes exoU, exoV, exoW, exoT, and exoI involved in exopolysaccharide biosynthesis and nodule invasion: exoU and exoW probably encode glucosyltransferases. Mol Plant Microbe Interact 6:735–744
- 85. Lipid Flippases for Bacterial Peptidoglycan Biosynthesis. Lipid Insights 8:21–31
- 86. Bioinformatics identification of MurJ (MviN) as the peptidoglycan lipid II flippase in Escherichia coli. Proc Natl Acad Sci U S A 105:15553–15557
- 87. The initial stage in peptidoglycan synthesis. IV. Solubilization of phospho-N-acetylmuramyl-pentapeptide translocase. Biochemistry 8:1474–1481
- 88. Identification of the glmU gene encoding N-acetylglucosamine-1-phosphate uridyltransferase in Escherichia coli. J Bacteriol 175:6150–6157
- 89. Characterization of the essential gene glmM encoding phosphoglucosamine mutase in Escherichia coli. J Biol Chem 271:32–39
- 90. Lipid intermediates in the biosynthesis of bacterial peptidoglycan. Microbiol Mol Biol Rev 71:620–635
- 91. Structural basis of lipopolysaccharide maturation by the O-antigen ligase. Nature 604:371–376
- 92. The wzz (cld) protein in Escherichia coli: amino acid sequence variation determines O-antigen chain length specificity. J Bacteriol 180:2670–2675
- 93. ElyC and Cyclic Enterobacterial Common Antigen Regulate Synthesis of Phosphoglyceride-Linked Enterobacterial Common Antigen. mBio 12:e0284621
- 94. Bacterial exopolysaccharides--a perception. J Basic Microbiol 47:103–117
- 95. Formation of the glycan chains in the synthesis of bacterial peptidoglycan. Glycobiology 11:25–36
- 96. Repeat-Unit Elongations To Produce Bacterial Complex Long Polysaccharide Chains, an O-Antigen Perspective. EcoSal Plus 11:eesp00202022
- 97. Enterobacterial Common Antigen: Synthesis and Function of an Enigmatic Molecule. mBio 11
- 98. Bacterial cell wall. MurJ is the flippase of lipid-linked precursors for peptidoglycan biogenesis. Science 345:220–222
- 99. Insight into Elongation Stages of Peptidoglycan Processing in Bacterial Cytoplasmic Membranes. Sci Rep 8:17704
- 100. The glycosyltransferase domain of penicillin-binding protein 2a from Streptococcus pneumoniae catalyzes the polymerization of murein glycan chains. J Bacteriol 185:4418–4423
- 101. The lipid linked oligosaccharide polymerase Wzy and its regulating co-polymerase, Wzz, from enterobacterial common antigen biosynthesis form a complex. Open Biol 13:220373
- 102. Defining function of lipopolysaccharide O-antigen ligase WaaL using chemoenzymatically synthesized substrates. J Biol Chem 287:5357–5365
- 103. Bacterial carbohydrate diversity - a Brave New World. Curr Opin Chem Biol 53:1–8
- 104. Diversity-Generating Machines: Genetics of Bacterial Sugar-Coating. Trends Microbiol 26:1008–1021
- 105. Identification of a Porphyromonas gingivalis novel protein sov required for the secretion of gingipains. Microbiol Immunol 51:483–491
- 106. X-ray structure determination at 2.6-A resolution of a lipoate-containing protein: the H-protein of the glycine decarboxylase complex from pea leaves. Proc Natl Acad Sci U S A 91:4850–4853
- 107. Outer membrane active transport: structure of the BtuB:TonB complex. Science 312:1396–1399
- 108. A novel outer membrane protein, Wzi, is involved in surface assembly of the Escherichia coli K30 group 1 capsule. J Bacteriol 185:5882–5890
- 109. Structural classification of thioredoxin-like fold proteins. Proteins 58:376–388
- 110. Bioinformatic analysis of alpha/beta-hydrolase fold enzymes reveals subfamily-specific positions responsible for discrimination of amidase and lipase activities. Protein Eng Des Sel 25:689–697
- 111. Amino acid sequence of the penicillin-binding protein/DD-peptidase of Streptomyces K15. Predicted secondary structures of the low Mr penicillin-binding proteins of class A. Biochem J 279:223–230
- 112. An evolutionary classification of the metallo-beta-lactamase fold proteins. In Silico Biol 1:69–91
- 113. Papain-like peptidases: structure, function, and evolution. Biomol Concepts 4:287–308
- 114. One fold, many functions-M23 family of peptidoglycan hydrolases. Front Microbiol 13:1036964
- 115. Comparative genomics, evolution and origins of the nuclear envelope and nuclear pore complex. Cell Cycle 3:1612–1637
- 116. The versatile beta-barrel membrane protein. Curr Opin Struct Biol 13:404–411
- 117. The structural biology of beta-barrel membrane proteins: a summary of recent reports. Curr Opin Struct Biol 21:523–531
- 118. Sorting of lipoproteins to the outer membrane in E. coli. Biochim Biophys Acta 1693:5–13
- 119. POTRA: a conserved domain in the FtsQ family and a class of beta-barrel outer membrane proteins. Trends Biochem Sci 28:523–526
- 120. Structure and function of an essential component of the outer membrane protein assembly machine. Science 317:961–964
- 121. The plug domain of a neisserial TonB-dependent transporter retains structural integrity in the absence of its transmembrane beta-barrel. FEBS Lett 564:294–300
- 122. The structure of TolB, an essential component of the tol-dependent translocation system, and its protein-protein interaction with the translocation domain of colicin E9. Structure 8:57–66
- 123. Properties of the Group IV phospholipase A2 family. Prog Lipid Res 45:487–510
- 124. Identification of a novel phosphatase sequence motif. Protein Sci 6:469–472
- 125. The inositol polyphosphate 5-phosphatases and the apurinic/apyrimidinic base excision repair endonucleases share a common mechanism for catalysis. J Biol Chem 275:37055–37061
- 126. New biochemistry in the Rhodanese-phosphatase superfamily: emerging roles in diverse metabolic processes, nucleic acid modifications, and biological conflicts. NAR Genom Bioinform 5:lqad029
- 127. Novel type of murein transglycosylase in Escherichia coli. J Bacteriol 124:1067–1076
- 128. Identification of MltG as a potential terminase for peptidoglycan polymerization in bacteria. Mol Microbiol 99:700–718
- 129. Structure of cytochrome c nitrite reductase. Nature 400:476–480
- 130. Functional diversity of the phosphoglucomutase superfamily: structural implications. Protein Eng 12:737–746
- 131. The crystal structure of leucyl/phenylalanyl-tRNA-protein transferase from Escherichia coli. Protein Sci 16:528–534
- 132. The stereoisomers of alpha epsilon-diaminopimelic acid. III. Properties and distribution of diaminopimelic acid racemase, an enzyme causing interconversion of the LL and meso isomers. Biochem J 65:448–459
- 133. Lytic transglycosylases in macromolecular transport systems of Gram-negative bacteria. Cell Mol Life Sci 60:2371–2388
- 134. PDZ Domains Across the Microbial World: Molecular Link to the Proteases, Stress Response, and Protein Synthesis. Genome Biol Evol 11:644–659
- 135. Cloning, mapping, and characterization of the Escherichia coli prc gene, which is involved in C-terminal processing of penicillin-binding protein 3. J Bacteriol 173:4799–4813
- 136. Tubular lipid binding proteins (TULIPs) growing everywhere. Biochim Biophys Acta Mol Cell Res 1864:1439–1449
- 137. Remote homology searches identify bacterial homologues of eukaryotic lipid transfer proteins, including Chorein-N domains in TamB and AsmA and Mdm31p. BMC Mol Cell Biol 20:43
- 138. X-ray structure of a hydroxamate inhibitor complex of stromelysin catalytic domain and its comparison with members of the zinc metalloproteinase superfamily. Structure 4:375–386
- 139. Crystal structure of Escherichia coli thioesterase I/protease I/lysophospholipase L1: consensus sequence blocks constitute the catalytic center of SGNH-hydrolases through a conserved hydrogen bond network. J Mol Biol 330:539–551
- 140. A superfamily of membrane-bound O-acyltransferases with implications for wnt signaling. Trends Biochem Sci 25:111–112
- 141. The cell envelope-associated phospholipid-binding protein LmeA is required for mannan polymerization in mycobacteria. J Biol Chem 292:17407–17417
- 142. Of zones, bridges and chaperones - phospholipid transport in bacterial outer membrane assembly and homeostasis. Microbiology (Reading) 168
- 143. Involvement of water in carbohydrate-protein binding: concanavalin A revisited. J Am Chem Soc 130:16933–16942
- 144. Ligand-mediated dimerization of a carbohydrate-binding molecule reveals a novel mechanism for protein-carbohydrate recognition. J Mol Biol 337:417–426
- 145. OB(oligonucleotide/oligosaccharide binding)-fold: common structural and functional solution for non-homologous sequences. EMBO J 12:861–867
- 146. The immunoglobulin superfamily--domains for cell surface recognition. Annu Rev Immunol 6:381–405
- 147. The many blades of the beta-propeller proteins: conserved but versatile. Trends Biochem Sci 36:553–561
- 148. A database of bacterial lipoproteins (DOLOP) with functional assignments to predicted lipoproteins. J Bacteriol 188:2761–2773
- 149. Lipid modification of bacterial prolipoprotein. Transfer of diacylglyceryl moiety from phosphatidylglycerol. J Biol Chem 269:19701–19706
- 150. The potential active site of the lipoprotein-specific (type II) signal peptidase of Bacillus subtilis. J Biol Chem 274:28191–28197
- 151. Crystal structures of the family 9 carbohydrate-binding module from Thermotoga maritima xylanase 10A in native and ligand-bound forms. Biochemistry 40:6248–6256
- 152. Analysis of glycoside hydrolase family 98: catalytic machinery, mechanism and a novel putative carbohydrate binding module. FEBS Lett 579:5466–5472
- 153. Carbohydrate-binding modules: fine-tuning polysaccharide recognition. Biochem J 382:769–781
- 154. Adaptations of the helix-grip fold for ligand binding and catalysis in the START domain superfamily. Proteins 43:134–144
- 155. The crystal structure of Bacillus subtilis YycI reveals a common fold for two members of an unusual class of sensor histidine kinase regulatory proteins. J Bacteriol 189:3290–3295
- 156. X-ray crystal structure of MTH938 from Methanobacterium thermoautotrophicum at 2.2 A resolution reveals a novel tertiary protein fold. Proteins 45:486–488
- 157. Human junctophilin-2 undergoes a structural rearrangement upon binding PtdIns(3,4,5)P3 and the S101R mutation identified in hypertrophic cardiomyopathy obviates this response. Biochem J 456:205–217
- 158. Ligand binding by TPR domains. Protein Sci 15:1193–1198
- 159. Phase variable O antigen biosynthetic genes control expression of the major protective antigen and bacteriophage receptor in Vibrio cholerae O1. PLoS Pathog 8:e1002917
- 160. Three Capsular Polysaccharide Synthesis-Related Glucosyltransferases, GT-1, GT-2 and WcaJ, Are Associated With Virulence and Phage Sensitivity of Klebsiella pneumoniae. Front Microbiol 10:1189
- 161. Cyclic nucleotide-gated channels: an expanding new family of ion channels. Proc Natl Acad Sci U S A 91:3481–3483
- 162. The FHA domain is a modular phosphopeptide recognition motif. Mol Cell 4:387–394
- 163. Ribbon-helix-helix transcription factors: variations on a theme. Nat Rev Microbiol 5:710–720
- 164. The phage shock protein (PSP) envelope stress response: discovery of novel partners and evolutionary history. mSystems 9:e0084723
- 165. Structure of a putative ClpS N-end rule adaptor protein from the malaria pathogen Plasmodium falciparum. Protein Sci 25:689–701
- 166. The reduction of beta-hydroxy-beta-methyl-glutaryl coenzyme A to mevalonic acid. J Biol Chem 235:2572–2578
- 167. Mevalonic kinase: purification and properties. J Biol Chem 233:1100–1103
- 168. Isoprenoid biosynthesis via the methylerythritol phosphate pathway: the (E)-4-hydroxy-3-methylbut-2-enyl diphosphate reductase (LytB/IspH) from Escherichia coli is a [4Fe-4S] protein. FEBS Lett 541:115–120
- 169. Crystal structures of undecaprenyl pyrophosphate synthase in complex with magnesium, isopentenyl pyrophosphate, and farnesyl thiopyrophosphate: roles of the metal ion and conserved residues in catalysis. J Biol Chem 280:20762–20774
- 170. RNA damage in biological conflicts and the diversity of responding RNA repair systems. Nucleic Acids Res 44:8525–8555
- 171. Role of HPF (hibernation promoting factor) in translational activity in Escherichia coli. J Biochem 143:425–433
- 172. Ribosome dimerization is essential for the efficient regrowth of Bacillus subtilis. Microbiology (Reading) 162:448–458
- 173. Structure and Mechanism of the Lipid Flippase MurJ. Annu Rev Biochem 91:705–729
- 174. Polymorphic toxin systems: Comprehensive characterization of trafficking modes, processing, mechanisms of action, immunity and ecology using comparative genomics. Biol Direct 7:18
- 175. Evolution of the deaminase fold and multiple origins of eukaryotic editing and mutagenic nucleic acid deaminases from bacterial toxin systems. Nucleic Acids Res 39:9473–9497
- 176. Polymorphic Toxins and Their Immunity Proteins: Diversity, Evolution, and Mechanisms of Delivery. Annu Rev Microbiol 74:497–520
- 177. Intraspecies Competition in Serratia marcescens Is Mediated by Type VI-Secreted Rhs Effectors and a Conserved Effector-Associated Accessory Protein. J Bacteriol 197:2350–2360
- 178. Structure and function of MARTX toxins and other large repetitive RTX proteins. Annu Rev Microbiol 65:71–90
- 179. Phylogenomic analysis of the cystatin superfamily in eukaryotes and prokaryotes. BMC Evol Biol 9:266
- 180. The Biology of Colicin M and Its Orthologs. Antibiotics (Basel) 10
- 181. Evolutionary history, structural features and biochemical diversity of the NlpC/P60 superfamily of enzymes. Genome Biol 4:R11
- 182. Sequence of the vanY gene required for production of a vancomycin-inducible D,D-carboxypeptidase in Enterococcus faecium BM4147. Gene 120:111–114
- 183. The cystatins: protein inhibitors of cysteine proteinases. FEBS Lett 285:213–219
- 184. Cloning and sequence analysis of the gbpC gene encoding a novel glucan-binding protein of Streptococcus mutans. Infect Immun 65:668–675
- 185. Crystal structure of a mucus-binding protein repeat reveals an unexpected functional immunoglobulin binding activity. J Biol Chem 284:32444–32453
- 186. Proteinaceous determinants of surface colonization in bacteria: bacterial adhesion and biofilm formation from a protein secretion perspective. Front Microbiol 4:303
- 187. Structural and functional assays of AtTLP18.3 identify its novel acid phosphatase activity in thylakoid lumen. Plant Physiol 157:1015–1025
- 188. The type VII secretion system of Staphylococcus aureus secretes a nuclease toxin that targets competitor bacteria. Nat Microbiol 2:16183
- 189. Molecular Basis for Immunity Protein Recognition of a Type VII Secretion System Exported Antibacterial Toxin. J Mol Biol 430:4344–4358
- 190. Structure and function of BamE within the outer membrane and the beta-barrel assembly machine. EMBO Rep 12:123–128
- 191. The reconstituted Escherichia coli Bam complex catalyzes multiple rounds of beta-barrel assembly. Biochemistry 50:7444–7446
- 192. Structural basis of the 9-fold symmetry of centrioles. Cell 144:364–375
- 193. Gene flow and biological conflict systems in the origin and evolution of eukaryotes. Front Cell Infect Microbiol 2:89
- 194. A novel mechanism of programmed cell death in bacteria by toxin-antitoxin systems corrupts peptidoglycan synthesis. PLoS Biol 9:e1001033
- 195. Chitinases, chitosanases, and lysozymes can be divided into procaryotic and eucaryotic families sharing a conserved core. Nat Struct Biol 3:133–140
- 196. The von Willebrand factor D’D3 assembly and structural principles for factor VIII binding and concatemer biogenesis. Blood 133:1523–1533
- 197. Histidine kinases and response regulator proteins in two-component signaling systems. Trends Biochem Sci 26:369–376
- 198. Interaction between coat morphogenetic proteins SafA and SpoVID. J Bacteriol 188:7731–7741
- 199. The physical spacing between the von Willebrand factor D’D3 and A1 domains regulates platelet adhesion in vitro and in vivo. J Thromb Haemost 16:571–582
- 200. Structural basis for selective recognition of pneumococcal cell wall by modular endolysin from phage Cp-1. Structure 11:1239–1249
- 201.Structure and function of the DUF2233 domain in bacteria and in the human mannose 6-phosphate uncovering enzymeJ Biol Chem 288:16789–16799Google Scholar
- 202.Structure of a Zn2+-containing D-alanyl-D-alanine-cleaving carboxypeptidase at 2.5 A resolutionNature 299:469–470Google Scholar
- 203.Structural studies suggest a peptidoglycan hydrolase function for the Mycobacterium tuberculosis Tat-secreted protein Rv2525cJ Struct Biol 188:156–164Google Scholar
- 204.Characterization of vanY, a DD-carboxypeptidase from vancomycin-resistant Enterococcus faecium BM4147Antimicrob Agents Chemother 36:1514–1518Google Scholar
- 205.Requirement of the VanY and VanX D,D-peptidases for glycopeptide resistance in enterococciMol Microbiol 30:819–830Google Scholar
- 206.Contribution of VanY D,D-carboxypeptidase to glycopeptide resistance in Enterococcus faecalis by hydrolysis of peptidoglycan precursorsAntimicrob Agents Chemother 38:1899–1903Google Scholar
- 207.Ribosome biogenesis factor Tsr3 is the aminocarboxypropyl transferase responsible for 18S rRNA hypermodification in yeast and humansNucleic Acids Res 44:4304–4316Google Scholar
- 208.Analysis of two domains with novel RNA-processing activities throws light on the complex evolution of ribosomal RNA biogenesisFront Genet 5:424Google Scholar
- 209.Two glutamate residues, Glu 208 alpha and Glu 197 beta, are crucial for phosphorylation and dephosphorylation of the active-site histidine residue in succinyl-CoA synthetaseBiochemistry 41:537–546Google Scholar
- 210.Amidoligases with ATP-grasp, glutamine synthetase-like and acetyltransferase-like domains: synthesis of novel metabolites and peptide modifications of proteinsMol Biosyst 5:1636–1660Google Scholar
- 211.Partial purification and properties of phosphatidylserine synthetase from Escherichia coliJ Biol Chem 249:5083–5045Google Scholar
- 212.Deciphering the tRNA-dependent lipid aminoacylation systems in bacteria: Novel components and structural advancesRNA Biol 15:480–491Google Scholar
- 213.Peptide antibioticsLancet 349:418–422Google Scholar
- 214.X-ray crystal structure of Staphylococcus aureus FemAStructure 10:1107–1115Google Scholar
- 215.In vitro assembly of a complete, pentaglycine interpeptide bridge containing cell wall precursor (lipid II-Gly5) of Staphylococcus aureusMol Microbiol 53:675–685Google Scholar
- 216.Dissection of floral induction pathways using global expression analysisDevelopment 130:6001–6012Google Scholar
- 217.The Arabidopsis information resource: Making and mining the “gold standard” annotated reference plant genomeGenesis 53:474–485Google Scholar
- 218.Highly oxygenated isoprenoid lipids derived from fungi and fungal endophytes: Origin and biological activitiesSteroids 140:114–124Google Scholar
- 219.Crystal structure of the A domain from the alpha subunit of integrin CR3 (CD11b/CD18)Cell 80:631–638Google Scholar
- 220.Identification of serum N-acetylmuramoyl-l-alanine amidase as liver peptidoglycan recognition protein 2Biochim Biophys Acta 1752:34–46Google Scholar
- 221.Role of mouse peptidoglycan recognition protein PGLYRP2 in the innate immune response to Salmonella enterica serovar Typhimurium infection in vivoInfect Immun 80:2645–2654Google Scholar
- 222.Review: Mammalian peptidoglycan recognition proteins (PGRPs) in innate immunityInnate Immun 16:168–174Google Scholar
- 223.Mammalian PGRPs: novel antibacterial proteinsCell Microbiol 8:1059–1069Google Scholar
- 224.The plant immune systemNature 444:323–329Google Scholar
- 225. The origin and evolution of Wnt signalling. Nat Rev Genet 25:500–512
- 226. Induction of hepatic synthesis of serum amyloid A protein and actin. Proc Natl Acad Sci U S A 78:4718–4722
- 227. Serum amyloid A protein binds to outer membrane protein A of gram-negative bacteria. J Biol Chem 280:18562–18567
- 228. Serum amyloid A is an innate immune opsonin for Gram-negative bacteria. Blood 108:1751–1757
- 229. Molecular analysis of the human serum amyloid A (SAA) gene family. Scand J Immunol 29:113–119
- 230. Evolution of the serum amyloid A (SAA) protein superfamily. Genomics 19:228–235
- 231. Serum amyloid A1: Structure, function and gene polymorphism. Gene 583:48–57
- 232. Contact-dependent growth inhibition requires the essential outer membrane protein BamA (YaeT) as the receptor and the inner membrane transport protein AcrB. Mol Microbiol 70:323–340
- 233. Class II contact-dependent growth inhibition (CDI) systems allow for broad-range cross-species toxin delivery within the Enterobacteriaceae family. Mol Microbiol 111:1109–1125
- 234. CdiA Effectors Use Modular Receptor-Binding Domains To Recognize Target Bacteria. mBio 8
- 235. The amino acid sequence of a major nonimmunoglobulin component of some amyloid fibrils. J Clin Invest 51:2773–2776
- 236. Diversity, biogenesis and function of microbial amyloids. Trends Microbiol 20:66–73
- 237. Ecology and Biogenesis of Functional Amyloids in Pseudomonas. J Mol Biol 430:3685–3695
- 238. Responses to amyloids of microbial and host origin are mediated through toll-like receptor 2. Cell Host Microbe 6:45–53
- 239. Mechanistic insights into the role of amyloid-beta in innate immunity. Sci Rep 14:5376
- 240. Regulation of peptidoglycan synthesis and remodelling. Nat Rev Microbiol 18:446–460
- 241. Biosynthesis of ether-type polar lipids in archaea and evolutionary considerations. Microbiol Mol Biol Rev 71:97–120
- 242. A study of archaeal enzymes involved in polar lipid synthesis linking amino acid sequence information, genomic contexts and lipid composition. Archaea 1:399–410
- 243. Roles of phosphatidate phosphatase enzymes in lipid metabolism. Trends Biochem Sci 31:694–699
- 244. An unexpected structural relationship between integral membrane phosphatases and soluble haloperoxidases. Protein Sci 6:1764–1767
- 245. The Prokaryotic Roots of Eukaryotic Immune Systems. Annu Rev Genet 58:365–389
- 246. Diversification of AID/APOBEC-like deaminases in metazoa: multiplicity of clades and widespread roles in immunity. Proc Natl Acad Sci U S A 115:E3201–E3210
- 247. A novel immunity system for bacterial nucleic acid degrading toxins and its recruitment in various eukaryotic and DNA viral systems. Nucleic Acids Res 39:4532–4552
- 248. Highly regulated, diversifying NTP-dependent biological conflict systems with implications for the emergence of multicellularity. eLife 9
- 249. Bacterial death and TRADD-N domains help define novel apoptosis and immunity mechanisms shared by prokaryotes and metazoans. eLife 10
- 250. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389–3402
- 251. Hidden Markov model speed heuristic and iterative HMM search procedure. BMC Bioinformatics 11:431
- 252. Database resources of the national center for biotechnology information. Nucleic Acids Res 50:D20–D26
- 253. MMseqs software suite for fast and deep clustering and searching of large protein sequence sets. Bioinformatics 32:1323–1330
- 254. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol 30:772–780
- 255. HH-suite3 for fast remote homology detection and deep protein annotation. BMC Bioinformatics 20:473
- 256. The Igraph Software Package for Complex Network Research. InterJournal Complex Systems
- 257. Exploring Network Structure, Dynamics, and Function using NetworkX. In: Proceedings of the Python in Science Conference
- 258. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res 44:D279–285
- 259. IMPALA: matching a protein sequence against a collection of PSI-BLAST-constructed position-specific score matrices. Bioinformatics 15:1000–1011
- 260. Accelerated Profile HMM Searches. PLoS Comput Biol 7:e1002195
- 261. The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Res 33:W244–248
- 262. The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data. Nucleic Acids Res 35:D301–303
- 263. FastTree 2--approximately maximum-likelihood trees for large alignments. PLoS One 5:e9490
- 264. IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era. Mol Biol Evol 37:1530–1534
- 265. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630:493–500
- 266. Mol* Viewer: modern web app for 3D visualization and analysis of large biomolecular structures. Nucleic Acids Res 49:W431–W437
- 267. Benchmarking fold detection by DaliLite v.5. Bioinformatics 35:5326–5327
- 268. Fast and accurate protein structure search with Foldseek. Nat Biotechnol 42:243–246
Article and author information
Cite all versions
You can cite all versions using the DOI https://doi.org/10.7554/eLife.108061. This DOI represents all versions, and will always resolve to the latest one.
Copyright
This is an open-access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.