Introduction

The canonical Wnt signaling network is central to developmental decisions across animals relating to axis patterning, cell fate, cell migration and proliferation, and systems morphogenesis at many levels (1–7). Other crucial pathways, dubbed non-canonical Wnt signaling pathways, include those that regulate planar cell polarity and intracellular calcium levels (8–10). With these roles in development and homeostasis, dysfunction of Wnt signaling is causally associated with a range of diseases, including diverse cancer types and type II diabetes (11,12). Wnt signaling networks are centered on the secreted Wnt proteins acting as both paracrine and autocrine diffusible, extracellular messenger molecules (13). Wnt proteins are ligands for the N-terminal, cysteine-rich CBD/Fz domains of the Frizzled class of G-protein coupled receptors (GPCRs) (14,15). Modifications of the Wnt proteins via palmitoleoylation and glycosylation at internal sites are associated with their secretion (16). Palmitoleoylation of Wnt occurs at a conserved serine residue and is also required for recognition by the Frizzled receptors (17). Binding of the Frizzled receptor by Wnt recruits the Disheveled (Dsh) protein to its cytoplasmic face, in turn triggering a bevy of downstream responses, resulting in β-catenin stabilization in canonical pathways (18,19). When β-catenin concentrations cross a threshold, it is translocated into the nucleus, where it acts as a transcriptional coactivator, usually with an HMG domain transcription factor, to stimulate multiple transcriptional programs (20–22).

Despite its initial discovery over 40 years ago, the evolutionary origins of the Wnt protein have, until recently, been mysterious (23). In 2020, our group reported the discovery of the first prokaryotic versions of the Wnt domain (24). Using comparative genomics, we showed that these bacterial Wnt domains present contexts characteristic of toxins or effectors in biological conflict systems (24,25). Prompted by these initial observations, we set out to comprehensively identify and computationally characterize the evolutionary relationships of these newly identified Wnt homologs in an effort to understand their evolutionary history and predict their functions.

Consequently, we were able to unify the Wnt family with several other domains into a large superfamily described for the first time herein. These include two biochemically characterized families that were hitherto not known as Wnt homologs: the phosphatidylserine synthase (PTDSS1/2, EC: 2.7.8.29) (26–28) and the toxin domain of TelC from Streptococcus intermedius (29). However, the majority of the families we unify are either reported for the first time or are functionally poorly understood, including the animal Serum Amyloid A (SAA) (30) and the vancomycin resistance protein VanZ families (31,32). Our comparative genomics analyses, paired with existing experimental evidence, suggest that the superfamily is broadly composed of enzymes operating on lipid head groups (e.g., transesterification reactions) in a diversity of biochemical contexts, notably including the regulation of membrane composition, extracellular biopolymer metabolism and as effectors in biological conflicts. Thus, we identify a unifying theme across diverse aspects of lipid metabolism.

Results

Identification of the structural core of the Wnt domain

Although the structure of Wnt was described over a decade prior (33–35), its origins have been a mystery as it is phyletically restricted to Metazoa. Much attention has been focused on the three extended β-hairpins and a poorly structured loop extruding out of the core, their stabilizing cysteine residues, and the absolutely conserved serine residue, the site of palmitoleoylation (34,36) (Figure 1A). Our discovery of the first prokaryotic Wnt domains helped define its ancestral α-helical core, revealing the cysteine-rich extensions as Metazoa-specific insertions. Comparison of the core of the metazoan Wnt with AlphaFold structural models of the prokaryotic versions (24) revealed a shared globular domain composed of five α-helices (Figure 1A). As the prokaryotic homologs retained just the conserved core of the Wnt proteins, we named these the minimal Wnt (Min-Wnt) family. The core helices of the Min-Wnt family contained absolutely conserved sequence motifs (Figure 2), consistent with the enzymatic function we had earlier proposed for them (24) (see below).

(A) The four individual helices forming the core of the Lipocone superfamily are consistently colored across the illustrated representatives. The inter-helix linkers are colored gray, and lineage-specific synapomorphic insertions and extensions are colored light brown. Active site and other residues of interest are rendered as ball-and-stick. Protein Data Bank (PDB) IDs or Genbank accessions used to generate AF3 models are provided. (B) Relationship network of the Lipocone families. The thickness of the edges is scaled by the negative-log HHalign p-values. Families are colored according to the community identified by the Leiden algorithm (37) (see Methods). (C) Box plots displaying core helix transmembrane propensity scores of individual sequences within different Lipocone families. The horizontal divider represents the boundary between typical TM and soluble sequences.

Sequence logo of conserved core elements of the Lipocone families.

These correspond to the core helices H2, H3, and H4. The three conserved active site residue positions are boxed in dotted lines with the inferred ancestral residue indicated at the top of the alignment. Families are grouped and labeled on the left in their higher-order clades.

Having defined this shared core, we initiated sequence-based homology searches in an effort to identify remote homologs. Iterative position-specific scoring matrix (PSSM)-based searches (see Methods) initially recovered animal and bacterial versions of the Serum Amyloid A (SAA) proteins, and additional rounds of searching initiated from this set of sequences recovered a vast collection of further homologous families. As an example, a search initiated with a bacterial SAA-like sequence from Bdellovibrio bacteriovorus (GenBank acc: AHZ84906.1) retrieved sequences overlapping with the Pfam “Domain of unknown function” models DUF2279 (acc: WP_146898260.1, iteration: 5, e-value: 0.004) and DUF4056 (acc: MBW8016507.1, iteration: 5, e-value: 0.005), as well as sequences automatically annotated as “YfiM” in the GenBank database (acc: WP_019077413.1, iteration: 4, e-value: 0.004). Sequence profile-profile searches with HHpred confirmed these relationships and captured more distant ones. For instance, an HHpred search initiated with the Bacteriovorax stolpii Min-Wnt domain (acc: WP_102242990.1, residues 1-109) recovered the Pfam Wnt profile (PF: PF00110.23, p-value: 1.5e-6) and the Pfam SAA profile (PF: PF00277.22, p-value: 3.7e-5). Similarly, an HHpred search initiated with a Gemmatimonadetes sequence (acc: PYP94660.1, residues 75-170) recovered the DUF2279 Pfam profile (PF: PF10043.12, p-value: 5.4e-21), the DUF2238 profile (PF: PF09997.12, p-value: 1.3e-7), and the DUF4056 Pfam profile (PF: PF13265.9, p-value: 2.5e-5), among others.

Exhaustion of these searches, followed by clustering and manual inspection of the multiple sequence alignments of the retrieved sequences (see Methods), revealed a shared four-helix core across all of them, hereinafter referred to as H1 through H4 (Figure 1A). This 4-helix core of the domain was further confirmed by inspection of AlphaFold structural models constructed for representatives of the individual families, along with the rare instances of experimentally determined structures. These comparisons established that the above-mentioned fifth C-terminal helix in the Wnt core is a synapomorphy (shared derived character) restricted to the Wnts and closely related families like SAA (Figure 1A). In all, the results of our clustering analysis tallied 30 distinct families constituting a large superfamily. Remarkably, of these, 17 families had no pre-existing annotations. Phyletic analysis of individual families revealed a range of distributions, ranging from broad conservation in multiple superkingdoms of Life to those restricted to a small number of lineages (see below, Figure S1). A relationship network for the superfamily was constructed based on p-value and e-value scores using alignments of each family as a query in HHalign profile-profile searches against the rest (see Methods, Figure 1B). The Leiden community detection algorithm (37) was then applied to this network to identify higher-order assemblages (see Methods). These groupings were also supported by structural synapomorphies, such as a circular permutation and versions with a two-stranded ‘handle’ (see below).
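As a rough illustration of how such a relationship network can be assembled, the following Python sketch converts profile-profile p-values into edge weights by taking their negative logarithm, in the spirit of the edge scaling described for Figure 1B. The p-values are the HHpred values quoted above and the node labels are shorthand for the corresponding queries and Pfam profiles; the function names, the `max_p` cutoff and the use of a base-10 logarithm are assumptions made for this example and are not taken from the Methods. Community detection (e.g., with the Leiden algorithm) would then be run on the resulting weighted graph using a dedicated library and is not reproduced here.

```python
import math
from collections import defaultdict

# (query, target, p-value) triples; values are the HHpred p-values quoted in the text.
hits = [
    ("Min-Wnt", "Wnt", 1.5e-6),
    ("Min-Wnt", "SAA", 3.7e-5),
    ("PYP94660.1", "DUF2279", 5.4e-21),
    ("PYP94660.1", "DUF2238", 1.3e-7),
    ("PYP94660.1", "DUF4056", 2.5e-5),
]


def edge_weight(p_value):
    """Negative log10 of a p-value: smaller p-values give heavier edges."""
    return -math.log10(p_value)


def build_network(hits, max_p=1e-3):
    """Build an undirected weighted adjacency dict from pairwise hits.

    max_p is a hypothetical significance cutoff (an assumption of this sketch).
    When a pair is reported more than once, the strongest edge is kept.
    """
    graph = defaultdict(dict)
    for query, target, p in hits:
        if query == target or p > max_p:
            continue
        w = edge_weight(p)
        if w > graph[query].get(target, 0.0):
            graph[query][target] = w
            graph[target][query] = w
    return dict(graph)


network = build_network(hits)
for node, neighbours in sorted(network.items()):
    for other, weight in sorted(neighbours.items()):
        print(f"{node} -- {other}: {weight:.2f}")
```

Keeping only the strongest edge per pair makes the graph symmetric even when the two directions of a profile-profile comparison return different p-values, which is one of several reasonable design choices.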

The four helices conserved across the superfamily constitute a cone-like structure (Figure 1A), with the helices tending to coalesce on one end and opening out into a pocket on the other, lined by the conserved sequence positions (Figures 1A, S2). The core is also marked by a linker between H1 and H2, which adopts characteristic extended conformations in certain families and higher-order groups. While the linkers joining H2 and H3 and H3 and H4 tend to be more constrained, there are some exceptions; for example, the extended loop insert housing the palmitoleoylated serine residue between H2 and H3 in the metazoan Wnt family (Figure 1A).

Dramatic variability in hydrophobicity of the conserved core across the superfamily

We observed that these Wnt-related families dramatically varied in their hydrophobicity. Using an index for transmembrane propensity (38) (see Methods) and comparing that to known transmembrane (TM) segments, we predict that the α-helices in 17 of the 30 families are hydrophobic enough to qualify as TM domains, and show a statistically significant tendency to group to the exclusion of the other families (Figure 1C, S3). Thus, these are predicted to be integral membrane domains. Further, these ‘hydrophobic families’ often evince a broader and deeper phyletic distribution pattern than the less-hydrophobic families (Figure S1, see Methods), implying that the ancestral version of the superfamily was likely an integral membrane domain. Thus, their association with the lipid membrane, combined with the cone-like shape of the conserved core (Figure 1A), leads us to refer to the whole superfamily hereinafter as the Lipocone superfamily.
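To make the idea of scoring helices for transmembrane propensity concrete, the following Python sketch averages a per-residue hydropathy scale over a helix and over sliding windows. It is not the index cited above (38): the widely used Kyte-Doolittle hydropathy scale is swapped in as a stand-in, and the 19-residue window and the cutoff of 1.6 are the values conventionally used with that scale, adopted here as assumptions. The function names and the two test sequences are hypothetical.

```python
# Kyte-Doolittle hydropathy values (a stand-in for the index used in the study).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(segment):
    """Average hydropathy of a peptide segment (standard residues only)."""
    if not segment:
        raise ValueError("empty segment")
    return sum(KD[aa] for aa in segment.upper()) / len(segment)


def best_window_score(sequence, window=19):
    """Highest mean hydropathy over all windows of the given length."""
    if len(sequence) <= window:
        return mean_hydropathy(sequence)
    return max(
        mean_hydropathy(sequence[i:i + window])
        for i in range(len(sequence) - window + 1)
    )


def looks_transmembrane(sequence, window=19, cutoff=1.6):
    """True if some window exceeds the (assumed) hydropathy cutoff."""
    return best_window_score(sequence, window) > cutoff


# Hypothetical helices: one strongly hydrophobic, one strongly charged.
print(looks_transmembrane("LLVIAGLLVAFLLGIAVLL"))
print(looks_transmembrane("DEKRDEKRDEKRDEKRDEK"))
```

Applied to the core helices of each family member, per-sequence scores of this kind could be summarized as box plots and compared against a boundary separating typical TM and soluble sequences, analogous to Figure 1C.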

AlphaFold3-assisted transmembrane topology prediction (39) revealed that 14 of the 17 integral membrane families are consistently oriented with the aperture of the cone-like structure opening toward the outer face of the membrane. This predicted TM topology is also generally consistent with the domain fusions when present: e.g., domains that are typically cytoplasmic and those that have extracellular or periplasmic functions are respectively predicted as projecting either inside or outside the membrane (see below). However, three families in the cpCone clade (see below) did not yield consistent orientation predictions, potentially owing to the diversity of structural variations observed in the clade, including a circular permutation event.

A unified biochemistry for the Lipocone superfamily

Of the 30 identified families, 26 display a striking conservation pattern of polar residues associated with the pocket of the Lipocone domain (Figure 2, S2). Of these, a set of three positions, one mapping to each of H2, H3, and H4, can be inferred as being ancestrally present and were likely occupied by a histidine (H2), glutamate (H3), and aspartate (H4), though in some families their identities have secondarily changed (Figures 2, S2). A fourth well-conserved polar position is observed at or near the end of H3; while its ancestral identity is difficult to establish, it is frequently an aspartate or glutamate (Figure 2). Two further well-conserved positions are often seen in H4: a polar position downstream of the broadly conserved aspartate residue and a glycine residue near the C-terminus of H4 (Figure 2) that likely caps the said helix. Although the ancestral pattern is noticeably degraded in the metazoan Wnt (Met-Wnt) family, it is strongly preserved in the prokaryotic Min-Wnt family (Figure 1A). In experimentally determined and modeled structures, the above set of 4 conserved positions forms a predicted active site in the aperture of the Lipocone domain. This, in turn, implies a shared biochemistry across the superfamily, with secondary inactivation in some families like Met-Wnt (see below, Figure 2). At the same time, the differences in the specific residues in the conserved positions between different families point to a range of distinct but related activities across the superfamily (40–42).

Consistent with these observations, two of the families with intact active sites, the PTDSS1/2 (28,43) and TelC (29), which we identified in this work as members of the Lipocone superfamily, have been characterized as active enzymes operating on different lipid substrates (Figure 3A). The eukaryotic PTDSS1/2 localizes to the endoplasmic reticulum (ER) membrane and catalyzes a reaction on the polar head group of phosphatidylcholine or phosphatidylethanolamine (44–46). PTDSS1 and PTDSS2, respectively, exchange the phosphate-linked choline or ethanolamine head groups with L-serine (28) (Figure 3A). The toxin domain of TelC acts on lipid II (29), the final intermediate in peptidoglycan biosynthesis, which couples an undecaprenyl diphosphate tail to a head group composed of an N-acetylmuramic acid-N-acetylglucosamine disaccharide, with a pentapeptide further linked to the former sugar (47,48). TelC cleaves the bond between the undecaprenol and the diphosphate coupled to the head group (29) (Figure 3B). The reaction is comparable to that catalyzed by PTDSS1/2, as both attack phosphate linkages in lipid head groups. However, TelC apparently directs a water molecule for the attack in lieu of the hydroxyl group of serine directed by PTDSS1/2 (Figure 3B).

Known and predicted Lipocone reaction mechanisms.

Experimentally supported reactions are boxed in blue (A-B), while a predicted reaction based on genome displacement by a Lipocone domain of an experimentally characterized enzyme is boxed in orange (C). The remaining reactions (D-G) are suggested based on the contextual inferences in this work. Attacking and leaving groups are denoted by dashed green and red circles, respectively.

Combining the above observations, we infer the unified biochemistry for the catalytically active families thus: 1) They act on the head groups of lipids either by removing or swapping phosphate-linked head groups (Figure 3A-B). These would be comparable to the phospholipase D (PLD), transphosphatidylation or polyisoprenol phosphoesterase reactions (49). 2) Given the cone-like cavity and the hydrophobicity of the helices, the lipid tail is predicted to be housed within the lipocone with the head group positioned in the active site. 3) In the case of the integral membrane versions, their orientation would predict the targeting of the head groups of the outer leaf of the bilayer.

Major clades of the Lipocone superfamily

The extreme sequence divergence of the superfamily, coupled with the small size of the domain, prevents the use of simple phylogenetic tree analyses to resolve its deep evolutionary history. Hence, we combined community finding algorithms applied on profile-profile similarity networks, comparison of structural features and motifs, and phyletic patterns (Figures 1B, 2, S1) to reconstruct the most parsimonious evolutionary scenario for the diversification of the Lipocone superfamily (Figure 4, see Methods). In the below sections, we survey the higher-order clades, highlighting their specific features.

Reconstructed evolutionary scenario for the Lipocone superfamily.

The relative temporal epochs are demarcated by vertical lines and labeled at the bottom. The clades are represented by colored lines indicating the maximum depth to which the families listed to the right can be traced. Colors track the superkingdom-level phyletic distribution of the family. Dashed-line circles indicate uncertainty in the origin of lineage(s). Inferred or experimentally characterized functions for families are indicated to the left of family names. Asterisks denote newly described families.

SAW (SAA-Wnt) clade

This clade consists of four families, with the two prokaryotic families (Min-Wnt and prok-SAA) (24,50), respectively, giving rise to their counterpart eukaryotic families (Met-Wnt and Met-SAA; Figures 1A,4). This clade is structurally unified by the presence of a fifth helix that stacks in the space between the H2 and H4 helices (Figure 1A, S2). In the Wnt families, this helix is comparable in length to the core helices, while in the SAA families it is usually shorter (Figure 1A). The clade is further unified by the pronounced conservation of a sNxxGR motif (where ‘s’ is a small residue) encompassing the conserved active site position in H4 (Figure 2). SAW clade Lipocones show low overall hydrophobicity and are known or predicted to be soluble domains. Outside of the clearly inactive eukaryotic Wnt family, the remaining three families largely conserve the core active site residues (Figure 2).

VanZ-Skillet clade

This clade unites seven families: the two VanZ families, VanZ-1 and VanZ-2, prototyped by the bacterial VanZ protein originally identified in the context of vancomycin resistance and the five Skillet families, which form a distinct subclade. These are unified by a “handle”-like structure (hence, “Skillet”), adopting a helical conformation in the H1-H2 linker (Figure 1A, S2). Strikingly, a symmetric helical handle is present in the H3-H4 loop of the Skillet-DUF2809 and Skillet-3 families (Figure S2) of this clade. VanZ-1 features a conserved asparagine residue in the H2 position and a DxDDxxxN motif in H4, while VanZ-2 features RKxxH and DxxxD motifs in these respective positions (Figure 2). The Skillet families are largely unifiable in their conservation of an ExxQ motif in H3, an aspartate three positions upstream of the canonical H4 aspartate, and another aspartate in the H2 contributing to the active site. These first two features specifically ally them with the VanZ-1 family (Figure 2).
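The motif shorthand used in these descriptions can be scanned for mechanically. The Python sketch below translates such a motif into a regular expression, under the convention that uppercase letters are literal residues, ‘x’ is any residue, ‘s’ is a small residue and ‘h’ is a hydrophobic residue. The exact residue sets chosen for the ‘s’ and ‘h’ classes, the function names and the test peptides are assumptions made for this example; alternations written with a slash (such as S/G) are not handled.

```python
import re

# Residue classes for lowercase symbols. The membership of the "small" and
# "hydrophobic" sets is an assumption; definitions vary between authors.
CLASSES = {
    "x": ".",
    "s": "[GASCT]",
    "h": "[AVILMFWY]",
}


def motif_to_regex(motif):
    """Translate shorthand such as 'DxDDxxxN' or 'sNxxGR' into a compiled regex."""
    parts = []
    for symbol in motif:
        if symbol in CLASSES:
            parts.append(CLASSES[symbol])
        elif symbol.isalpha() and symbol.isupper():
            parts.append(symbol)
        else:
            raise ValueError(f"unsupported motif symbol: {symbol!r}")
    return re.compile("".join(parts))


def find_motif(motif, sequence):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    pattern = motif_to_regex(motif).pattern
    # A lookahead lets overlapping occurrences be reported.
    return [m.start() for m in re.finditer(f"(?={pattern})", sequence.upper())]


# Hypothetical peptides, used only to exercise the functions.
print(find_motif("DxDDxxxN", "AADLDDAAANAA"))
print(find_motif("ExxQ", "MEAAQ"))
print(motif_to_regex("sNxxGR").pattern)
```

A scan of this kind only flags candidate positions; whether a hit corresponds to the conserved helix position still depends on the alignment and structural context.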

While the VanZ domain was previously reported as including a fifth TM helix, which is C-terminal to the 4-helix Lipocone core defined here (51,52), our survey instead reveals a striking diversity of configurations around the core 4-helix Lipocone domain (Supplemental Material). These range from standalone Lipocone configurations to one or more TM-helices adorning the domain at its N- and/or C-terminus. This variation is consistent with a further tendency for the VanZ families to feature an extensive diversity of domain fusions to both soluble globular domains and discrete TM modules (see below).

The VanZ families are deep-branching, as suggested by their wide phyletic spread (Figure S1). VanZ-2 is the most widespread individual Lipocone family in bacteria, with several genomes encoding multiple paralogs (Supplemental Material) (51,53). It is also found in certain eukaryotes, including a pan-fungal presence and in some representatives of the SAR clade. Both VanZ-1 and VanZ-2 are particularly well-represented in Gram-positive bacterial lineages like Actinomycetota and Firmicutes, while VanZ-2 is nearly universally conserved in the Bacteroidetes/Chlorobi lineage (Figure S1). In contrast, only one of the Skillet families, Skillet-DUF2809, is widely but sporadically distributed, with the four others being more restricted (Figures 4, S1).

YfiM clade

This clade includes three families that are consistently centrally located in the profile-profile similarity network (Figure 1B). This is likely due to their being close in sequence conservation to the ancestral state of the superfamily (Figure 2). Consistent with this, the YfiM-1 family also presents a structurally minimal Lipocone domain, comprised of just the 4-helix configuration with no further elaborations. Notably, this also extends to a lack of domain fusions in this family. In contrast, YfiM-DUF2279 and YfiM-Griddle (DUF3943) are structurally distinguished by an unusual H1-H2 linker (Figure 1A), which wraps around the outside and stacks against the H3-H4 linker (Figure S2). The YfiM-Griddle family further features a unique ‘flattened’ surface around the aperture of the Lipocone formed by protruding loops (hence, “Griddle”; Figure S2). This leaves the active site pocket more accessible relative to families with more elaborately structured inter-helix linkers. The Griddle family also features a C-terminal extension with a two-helix hairpin (with a hhsP motif in the turn between the two helices, where ‘h’ is a hydrophobic residue and ‘s’ is a small residue) (Figure S2). The three YfiM families straddle the membrane-propensity boundary in the plot (Figure 1C). Further, the YfiM-DUF2279 and Griddle families are strikingly absent in Gram-positive bacterial lineages (Figure S1). Consistent with these features, they are often predicted by the deep-learning-based localization predictor DeepTMHMM as outer-membrane proteins, suggesting a role in this subcellular location (see below).

ClaspCone-CapCone-TelC clade

Members of this clade are unified by an elaborated H1-H2 linker that often contains one or more helical segments that are typically predicted to guard the aperture of the Lipocone domain (Figure S2). This linker ends in a “clasp”-like element, which forms a range of structures in different families of the clade before leading into H2 (Figure S2). The clade is also unified by a striking reduction of overall hydrophobicity, predicting that the members of this clade are soluble domains (Figure 1C). Outside of the divergent TelC subclade, most of the families in this clade conserve a serine residue three positions upstream of the active site aspartate in H4, often preceded by an aromatic residue, which is typically phenylalanine. H4 also usually features a conserved asparagine four positions downstream of the conserved aspartate active site position, immediately preceded by a small residue (Figure 2). The second H3 active site position is generally poorly conserved, though when present, it is usually an aspartate residue. Finally, H2 contains either a DK or xD motif four positions upstream of the canonical H2 active site histidine residue (Figure 2).

The most rudimentary clasps are found in the ClaspCone-1, -2, and -3 families, where it is little more than a rounded loop, though, in ClaspCone-1, a small β-hairpin emerges within it. The three ClaspCone families are further unified by the presence of a two-helix insert leading into H2 that stacks against the Lipocone core (Figure S2). The three CapCone families, CapCone-DUF4056, CapCone-1, and CapCone-2, are named so for an encasing structure over the active site resembling a cap (Figure S2). They share a conserved glycine residue six positions upstream of the active site H2 histidine and a S/GxxSxx motif upstream of the conserved H4 aspartate (Figure 2). They are further unified by a pronounced β-hairpin clasp augmented by an additional strand (Figure S2). They also display varying degrees of degeneration of H1, along with family-specific structural elaborations.

The TelC group of this clade, prototyped by the streptococcal TelC toxin (29), is divided into two families featuring prokaryotic (prok-TelC) (29) and metazoan versions (Met-TelC) (54). Both TelC families feature a “cap” with contributions from inserts in the H1-H2 and H3-H4 loops (Figures 1A,S2). Unique to these families is the conservation of an aspartate residue located six positions downstream of the canonical active site aspartate of H4 (Figure 2). This aspartate points away from the center of the Lipocone and interacts with a conserved arginine from a synapomorphic C-terminal helical extension.

cpCone clade

A widespread yet sporadically distributed clade of seven families emerging as a stable community in the profile-profile similarity network (Figure 1B) is defined by a unique structural synapomorphy: a circular permutation (55) (hence, cpCone) placing the normally N-terminal H1 at the C-terminus of H4 (Figure S2, S4). This clade is also united by unique sequence features, viz., a polar residue (typically aspartate) six positions upstream of the conserved H2 histidine and a second glutamate three positions downstream of the conserved H3 glutamate (Figure 2). While the circular permutation is shared across the clade, several structural variations are seen, often within the same family (Figure S4). These include: 1) Versions containing a duplication of the Lipocone domain. While the second copy in these versions is catalytically inactive, the H1’ from the second duplicate displaces the H1 from the first copy, suggestive of an intermediate to the circular permutation. 2) Versions retaining a candidate H1 that has been displaced by H1’ in a five-helix arrangement. 3) Versions containing just the circularly permuted core. 4) Versions showing a degradation of the H1 helix, preserving just a 3-helix core (Figure S4). Despite this propensity for structural variation, the active site residues are strongly conserved, with the exception of the cpCone-i family, which we infer to be catalytically inactive (Figure 2). The core helices of the cpCone clade are strongly hydrophobic, and they are all predicted to be integral membrane domains (Figure 1C). Consistent with this, the eukaryotic PTDSS1/2 domains reside in the ER membrane (44,45).

Wok family

The Wok family (partly covered by the Pfam DUF2238 model) shows a higher order grouping with the above circularly permuted clade (Figures 1B,4) but has a phyletic distribution only rivaled by the VanZ-2 family (Figure S1), suggesting a deep-branching origin. The shape of this family is reminiscent of a wok formed by two distinguishing structural synapomorphies: a 2-TM helix N-terminal extension and a unique “handle” formed by the linker between the H3 and H4 (Figure S2). It additionally features a C-terminal, rapidly diversifying cytoplasmic tail. Despite these elaborations, it retains the inferred ancestral active site configuration (Figure 2). The strongly hydrophobic core helices of the Wok family predict it to be an integral membrane enzyme (Figure 1C).

Functional themes in the Lipocone superfamily

Given our inference of shared general biochemistry across the Lipocone superfamily in targeting phosphate-containing linkages in head groups of both classic phospholipids and polyisoprenoid lipids, we next used contextual information from conserved gene-neighborhoods, domain architectures and phyletic pattern vectors, a powerful means of deciphering gene function (56), to narrow down the predictions for specific families (Figure 5, Table S1, Supplementary Data). To this end, we constructed a graph (network) wherein the nodes are individual domains and edges indicate adjacency in domain architectures or conserved gene-neighborhoods (Figure 6, see Methods). We then identified cliques in these networks and merged the individual cliques containing a particular Lipocone domain to define its dense subgraph (Figures S5-S7). We then analyzed these subgraphs to identify statistically significant functional categories represented in them (Table S2; see Methods). This data was combined with existing experimental results and the sequence and structure analyses outlined above to arrive at the functional themes surveyed in the below sections.
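The clique-merging step described above can be sketched in a few lines of Python. The example below enumerates maximal cliques with a basic Bron-Kerbosch recursion and then takes the union of those cliques that contain a chosen Lipocone node to give its dense subgraph. The toy edge list is hypothetical and only loosely borrows domain names mentioned in this work; the function names and the `min_size` parameter are likewise assumptions of this sketch rather than details taken from the Methods.

```python
from collections import defaultdict


def adjacency_from_edges(edges):
    """Undirected adjacency sets from (domain, domain) pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def maximal_cliques(adj):
    """Enumerate maximal cliques (Bron-Kerbosch without pivoting)."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(frozenset(current))
            return
        for v in list(candidates):
            expand(current | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adj), set())
    return cliques


def dense_subgraph(adj, focus, min_size=3):
    """Union of all maximal cliques of at least min_size that contain focus."""
    members = set()
    for clique in maximal_cliques(adj):
        if focus in clique and len(clique) >= min_size:
            members |= clique
    return members


# Hypothetical contextual associations (not taken from the actual network).
edges = [
    ("Wok", "Synaptojanin-like"), ("Wok", "GT-A"), ("Synaptojanin-like", "GT-A"),
    ("Wok", "HAD"), ("Wok", "Calcineurin-like"), ("HAD", "Calcineurin-like"),
    ("HAD", "Transporter"),
]
adj = adjacency_from_edges(edges)
print(sorted(dense_subgraph(adj, "Wok")))
```

In this toy graph the "Transporter" node is linked only to "HAD", so it falls outside every clique that includes "Wok" and is excluded from the dense subgraph. For large networks, a graph library with an optimized clique enumerator would be a more practical choice than this plain recursion.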

Representative contexts for the Lipocone superfamily, grouped by shared functional themes.

Genes are depicted by box arrows, with the arrowhead indicating the 3’ end of genes. Genes encoding proteins with multiple domains are broken into labeled sections corresponding to them. Domain architectures are depicted by the individual domains represented by distinct shapes. TMs, lipoboxes (LPs), and SPs are depicted as unlabeled, narrow yellow, blue, and red rectangles, respectively. All Lipocone domains are consistently colored in orange. Genes marked with asterisks are labeled by the Genbank accession number below each context. Colored labels above genes denote well-known gene names or gene cluster modules. Abbreviations: PTase, peptidase; TFase, transferase; GlycosylTFase, Glycosyltransferase; MPTase, metallopeptidase; TGase, transglycosylase; SLP, serine-containing lipobox; cNMPBD, cNMP-binding domain; NCPBM, novel putative carbohydrate binding module; (w)HTH, (winged) helix-turn-helix; ZnR, Zinc ribbon; PPTs, pentapeptide repeats; Imm, immunity protein; βPs, β-propeller repeats; Cystatin-FD, Cystatin fold domain; MTase, methylase; PGBD, peptidoglycan-binding domain; MβL, metallo-β-lactamase; L12-ClpS, ClpS-ribosomal L7/L12 domain; TA, teichoic acid.

Lipocone contextual network.

The network represents the conserved contextual associations of Lipocone domains (hexagonal nodes). Nodes and edges are colored based on known or inferred functional categories of the domains. The nodes are scaled by their degree. Gray coloring indicates domains without specific functional assignments. Examples of conserved gene neighborhoods and domain architectures supplementing those in Figure 5 illustrate contexts that bridge functional themes. Here, individual domains are colored to match network coloring. Additional abbreviations to those in Figure 5: APH-Pkinase, aminoglycoside phosphotransferase-like kinase; HUP, HIGH, UspA and PP-ATPase superfamily-like domain; Alk-phosphatase, Alkaline phosphatase; dehyd, dehydrogenase; TPRs, tetratricopeptide repeats; PMM/PGM, phosphomannomutase/phosphoglucomutase; ZnF, zinc finger; APC-transporter, amino acid-polyamine-organocation transporter; LPS, lipopolysaccharide.

Lipocone domains in membrane lipid, peptidoglycan and exopolysaccharide modifications

Across different Lipocone families, we found statistically significant connections to roles in modifying lipid head groups in various membranes and in lipids involved in the synthesis of extracellular matrix polymers such as peptidoglycan and lipopolysaccharides (Figure 6, Table S2, Supplementary Data).

Archetypal lipid head group exchange reactions catalyzed by the cpCone clade

One of the few experimentally characterized Lipocone families is the eukaryotic PTDSS1/2 family of the cpCone clade, members of which exchange the head group of essential membrane phospholipids to generate phosphatidylserine from phosphatidylethanolamine or phosphatidylcholine (Figure 3A) (28,57). Given the pervasive presence of this clade in archaea (Figure S1), it is tempting to speculate that these archaeal cpCones may play a role in the modification of Archaea-specific lipids (58–60) through a comparable head group exchange reaction (see below).

In bacteria, the related cpCone-1 family shows operonic association with a LolA-like lipoprotein which shuttles lipoproteins to the outer membrane (61) and a novel 4TM protein (Figure 5A). This raises the possibility that cpCone-1 might mediate the formation of membrane domains featuring lipids with a modified head group that act as foci for the trafficking of lipoproteins. Curiously, the cpCone-1 gene might also be inserted between the bacterial chromosome segregation and condensation complex subunits the Kleisin ScpA and the wHTH ScpB (62–65). The bacterial cpCone-DUF2585 is operonically coupled to a GNAT family NH2-group-acetyltransferase and further linked to genes for the glycolate oxidase GlcE and GlcF (66) and the bacterial proteasome subunits HslV and HslU (67) (Figure 5A, Supplementary Data). These might point to the coupling of membrane lipid head group modifications with disparate processes, such as chromosome segregation during cell division or different responses to stress (68–70).

The Wok and YfiM-1 families in cardiolipin and modified isoprenoid lipid pathways

We observed a set of conserved gene neighborhoods in which a synaptojanin-like phosphatase gene is accompanied by a gene encoding either a member of the Wok family or a cardiolipin synthase of the HKD superfamily (71), with the latter two being mutually exclusive (Figure 5B, Supplementary Data). This suggests that the Wok family member and the cardiolipin synthase are analogous enzymes catalyzing equivalent reactions. The cardiolipin synthase utilizes two phosphatidylglycerol molecules as substrates to generate cardiolipin with the release of one of the glycerol head groups (72). This is comparable to the head group exchange reaction catalyzed by PTDSS1/2 from the cpCone clade (Figure 3A). Hence, we propose that these members of the Wok clade are cardiolipin synthases (Figure 3C). Distinct phosphoesterases, namely the synaptojanin-like, calcineurin-like (73) and HAD (74) enzymes, are also observed in gene-neighborhood associations with the Wok, suggesting that they might together regulate membrane lipid composition by acting on the phospholipids or their precursors (Figure 5C). In a distinct neighborhood, the Wok clade enzyme is coupled to carotenoid biosynthesis genes (75,76) (Figures 5D,S5). This raises the possibility that these members might also catalyze a comparable reaction to the above on isoprenoid lipids: for instance, they could synthesize a carotenoid from two geranylgeranyl-diphosphate molecules (77,78). In both these contexts, the actinobacterial operons often include genes for GT-A family glycosyltransferases, suggesting the further synthesis of glycosylated derivatives of the lipids or carotenoids (79) (Supplementary Data). In several bacteria, a YfiM-1 family Lipocone is operonically coupled to a UbiA-like prenyltransferase (80). This gene neighborhood additionally codes for a slew of enzymes, such as an amidophosphoribosyltransferase (81), a RidA-like deaminase (82), and a pair of structurally distinct phosphoesterases, respectively containing an HD and a PHP domain (73,83) (Figures 5E,S5, Supplementary Data). This suggests a role for the YfiM-1 Lipocone and the associated enzymes in generating a modified polyisoprenoid metabolite.

VanZ families modifying lipid head groups in peptidoglycan and exopolysaccharide metabolism

The widespread VanZ-1 and VanZ-2 families (Figure 1A) frequently show either gene neighborhood associations or direct domain fusions, with diverse genes involved in both peptidoglycan and other extracellular polysaccharide pathways. Chief among these are the lipid carrier flippase (Pfam: MviN_MATE clan) (84–86), the UDP-GlcNAc/MurNAc lipid transferases, which generate the lipid-linked exopolysaccharide precursors (lipid I) (48,87), and UDP-N-acetylglucosamine (UDP-GlcNAc) biosynthesis enzymes (88,89). Despite certain examples of crossover in functional themes, the gene-neighborhood contexts of VanZ-1 and VanZ-2 suggest a metabolic partitioning, with VanZ-2 significantly associating specifically with peptidoglycan-related genes and VanZ-1 significantly linking with biosynthesis genes for other exopolysaccharides (e.g., the outer-membrane-associated lipopolysaccharide) (90) (Figures 5F,6,S6, Table S2). The latter include WaaL-like lipid A transferase (91), the polysaccharide chain-length determination domain Wzz (92), the Wzc kinase and the “extracellular antigen”-regulating ElyC-like domain (Pfam: DUF218) (93), and numerous nucleotide-diphosphate sugar biosynthesis and modification enzymes (94) (Figures 5F,6,S6).

The precursors of both peptidoglycan and exopolysaccharides are synthesized in the cytosol, linked to lipid carriers via a diphosphate linkage, e.g., the polyisoprenoid lipid undecaprenol (bactoprenol) (90,94–97). A key step in their maturation is the flipping by the flippase of the lipid-linked intermediates associated with the inner membrane to the outer membrane. These flipped units are then incorporated into the maturing chain (98,99) by the peptidoglycan glycosyltransferase (GTase) (100) and the chain length determination protein, WzzE/polymerase (WzyE) (92,101), in peptidoglycan and other exopolysaccharide maturation pathways, respectively. Based on the precedent of the TelC-catalyzed reaction (Figure 3B), we propose that VanZ-1 and VanZ-2 comparably act on the flipped lipid II head groups bearing the modified sugar intermediates to release the undecaprenol via phosphoester cleavage (Figure 3F). Such activity could modulate the concentration of available peptidoglycan intermediates and allow formation of peptidoglycan with varying thickness and composition during different phases of the life cycle, e.g., sporulation versus vegetative growth in Bacillota. Such a reaction could also possibly modulate exopolysaccharide biosynthesis by comparably acting on their precursors.

The terminal transfer from the lipid carrier of the Gram-negative bacterial O-antigen (as well as other exopolysaccharides attached to the lipid A carrier) has been attributed to the WaaL-like enzymes (91,102). However, bacteria generate further lineage-specific polysaccharide decorations, capsule structures, and other exopolysaccharides (e.g., xanthan, enterobacterial common antigen (ECA), alginate, colanic acid), as well as teichoic acids (e.g., wall teichoic acids, WTA) (103,104). Notably, the analogs of WaaL, i.e., the terminal transferases for several exopolysaccharides, including ECA and WTA, have to date escaped identification (93). Hence, it is possible that, by analogy to the PTDSS1/2 reaction (Figure 3A), the VanZ families act on the lipid carrier-linked sugar head groups to catalyze either the extension of the polysaccharide chains through transesterification or the terminal release of the mature chain through phosphoester cleavage (Figure 3E).

Atypical VanZ domains in uncharacterized modifications of peptidoglycan and the outer membrane

Certain representatives of the two VanZ families also show operonic associations indicative of outer membrane-associated or peptidoglycan modification functions distinct from those described above (Figures 5G,6, Supplementary Data): 1) An operon in FCB group bacteria couples a VanZ-2 gene with those coding for a SprA secretin-like channel protein (105), a glycine cleavage H (GCVH)-like lipoyl-group carrier protein (106), a 2TM protein fused via a proline-rich linker to a C-terminal TonB-C domain (107), and a secreted, second TonB-C domain fused to a Wzi-like outer membrane protein (OMP) superfamily β-barrel (108) (Figures 5G,6). 2) In betaproteobacteria, certain VanZ-1 domains are duplicated with the C-terminal copy being inactive (VanZ-i) and found in an unusual four-gene operon with a thioredoxin-fold [2Fe-2S] ferredoxin (109), a possible lipase of the α/β-hydrolase superfamily (110), and a metallo-β-lactamase (MβL) fold D-Ala-D-Ala cross-linking transpeptidase (111,112). 3) A patescibacterial operon encodes a VanZ-2 domain with an ABC ATPase transporter system, either of two structurally distinct peptidases, namely a Papain-like or glycine-glycine peptidase (113,114), fused to the same membrane-anchored N-terminal coiled-coil region, and a further TM protein containing one or more external Lamin-Tail domains (LTDs) predicted to bind extracellular DNA or polysaccharides (115) (Figures 5G,6,S6). The associations in the first of the above neighborhoods point to a distinct outer membrane-associated lipid modification, while the other two might be involved in lineage-specific decorations/modifications of peptidoglycan, accompanied by peptide-crosslinking or cleavage activities.

Lipocone domains operating in the outer membrane

Contextual associations, phyletic patterns, and localization predictions support the action of two Lipocone families directly in the outer membrane. Notably, the YfiM-Griddle and YfiM-DUF2279 families are found nearly obligately directly fused or operonically linked to several distinct OMP β-barrels (116,117) (Figures 5H,6,S5). Up to three YfiM-Griddle Lipocones, usually with a cognate OMP β-barrel, might be encoded next to each other in the genome. Additionally, YfiM-Griddle family genes are often encoded in operons with several components of the outer membrane lipid and protein trafficking apparatus, including the LolA-like chaperone (118), the POTRA domain (119,120), the channel-blocking Plug domains (121), and the TolA-binding TolB-N domain (122). Further, these operons might encode a Patatin-like lipase (123), GT-B family glycosyltransferases (79), and a range of phosphoesterases (e.g., an integral membrane phosphatidic acid phosphatase PAP2 (124), a lipobox-containing synaptojanin superfamily phosphoesterase (125) and a secreted R-P phosphatase (126) (see Figures 5H,6, and Supplementary Data)). In addition to the fusion to the OMP β-barrel, the YfiM-DUF2279 family (Figure 5H) shows operonic associations with a secreted MltG-like peptidoglycan lytic transglycosylase (127,128), a lipid-anchored cytochrome c heme-binding domain (129), a phosphoglucomutase/phosphomannomutase enzyme (130), a GNAT acyltransferase (131), a diaminopimelate (DAP) epimerase (132), and a lysozyme-like enzyme (133). In a distinct operon, YfiM-DUF2279 is combined with a GT-A glycosyltransferase domain (79), a further OMP β-barrel, and a secreted PDZ-like domain fused to a ClpP-like serine protease (134,135) (Figure 5H).

The strong linkage to the OMP β-barrel, together with their predicted localization, suggests that these YfiM-Griddle and YfiM-DUF2279 Lipocone domains operate in the outer membrane, potentially in concert with both cytoplasmic carbohydrate biosynthetic modules and periplasmic lipid- and carbohydrate-processing enzymes. As with the inner membrane lipids, they could potentially catalyze modifications of head groups through transesterification and/or linkage/release of outer membrane-associated polysaccharide chains through action on lipid-head group phosphoesters.

Lipocone domains acting on lipids in transit to the outer membrane

The ClaspCone-1 and ClaspCone-3 families lack the hydrophobicity indicative of direct residence in the membrane (Figure 1C); instead, they are predicted to localize to the periplasmic space. In the ClaspCone-1 family, the Lipocone domain is fused at the extreme N-terminus to either a single TM or a 5TM domain predicted to anchor it to the cell membrane. Between this TM element and the Lipocone domain, we detected a previously uncharacterized version of the Tubular lipid binding protein (TULIP) domain (136,137) or an Ig-like and a Zincin-like metallopeptidase (MPTase) domain (138) (Figures 5I, 6). These ClaspCone-1 genes may also show operonic associations with genes encoding a lipase of the SGNH family (139) and a membrane-bound O-acyltransferase (MBOAT; Figure 5I, Supplementary Data) (140). The TULIP domain superfamily has recently been characterized as a lipid-binding domain (136,137), which in proteobacteria functions in outer membrane lipid transport (141,142). Thus, we propose that the ClaspCone-1 family is likely to act in the periplasmic space on the head groups of outer-membrane targeted lipids bound to the TULIP or potentially to the Ig-like domains occupying an equivalent position in the domain architecture.

A Lipocone domain catalyzing a predicted lipoprotein lipid linkage reaction

The Skillet-1 Lipocone is strongly coupled in an operon with a downstream gene coding for a protein with an unusual lipobox-like sequence followed by one of several extracellular domains (e.g., concanavalin, β-jelly roll, OB-fold, Ig-like, β-propeller) predicted to bind carbohydrates or other ligands (143–147) (Figures 5J,6,S6, Supplementary Data). The lipobox-like sequence features a conserved GS motif at its C-terminus instead of the usual GC of the classic lipobox of bacterial lipoproteins (148) (Figure S8). In the canonical lipoprotein processing pathway, a thioether linkage is formed between the sulfhydryl of the cysteine and a diacylglycerol lipid embedded in the inner membrane by the lipoprotein diacylglyceryl transferase (lgt) enzyme, followed by the cleavage of the signal peptide at the GC motif junction by the signal peptidase (149,150). Given the serine in place of the cysteine in these lipobox-like sequences, we propose that it undergoes non-canonical lipidation by the associated Skillet-1 Lipocone protein in lieu of lgt. We propose that, comparable to PTDSS1/2, which act on free serine, the Skillet-1 family links the conserved serine from the lipobox-like sequence to a phospholipid (Figure 3A,D).
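The distinction between a classic GC lipobox and the GS-terminated lipobox-like sequence can be sketched as a simple pattern scan of a protein's N-terminal region. The regular expressions below are deliberately simplified assumptions loosely modeled on the commonly cited lipobox consensus; they are not the profiles used in this study, and the function name, window size, and example sequences are hypothetical.

```python
import re

# Simplified, assumed patterns: two hydrophobic/small residues, then G,
# then either the canonical cysteine or the variant serine.
CANONICAL_LIPOBOX = re.compile(r"[LVI][ASTVI]GC")
SERINE_VARIANT = re.compile(r"[LVI][ASTVI]GS")


def classify_lipobox(seq, window=40):
    """Label the N-terminal region as 'canonical', 'serine-variant', or None."""
    n_term = seq[:window].upper()
    if CANONICAL_LIPOBOX.search(n_term):
        return "canonical"
    if SERINE_VARIANT.search(n_term):
        return "serine-variant"
    return None


# Hypothetical toy sequences, not taken from any real protein.
print(classify_lipobox("MKKTLLALSLLAGCSSN"))  # canonical
print(classify_lipobox("MKKTLLALSLLAGSSSN"))  # serine-variant
```

A four-residue pattern ending in GS can easily occur by chance, so a match of this kind would only be meaningful alongside other evidence, such as a preceding signal peptide and conservation of the motif across homologs.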

Lipocone domains in predicted lipid-associated signaling systems

Systems defined by standalone proteins with Lipocone domains

Several representatives of the two VanZ and Skillet-3 families are fused to a diverse array of known or predicted extracellular ligand-binding domains (Figure 5K), where the architecture takes the form of SP+X+TM+Lipocone or Lipocone+TM+X, where ‘X’ is the extracellular ligand-binding domain and SP is a signal peptide. The ligand binding domains include: (i) carbohydrate-binding lectin domains such as jelly-roll, concanavalin-like, NPCBM-like, CBD9-like, and other β-sandwiches (143,144,151–153); (ii) a lipid-binding helix-grip superfamily domain (154); (iii) those binding other potential ligands (e.g., Ig, OB-fold, YycI-like, DUF498-like, PepSY-like, β-helix, TPR, MORN, and β-propeller repeats (145–147,153,155–158)) (Figure 5K, Supplementary Data). We interpret these architectures as implying signaling, wherein the binding of the cognate ligand by one of the above domains regulates the catalytic activity of the associated Lipocone domain. Among these, the extracellular domains fused to the Skillet-3 family are particularly notable for their extreme variability (Figure 5K). This suggests their diversification under an arms race scenario (also see below) in a biological conflict. Further, the genes coding for the above are sporadically associated with exopolysaccharide metabolism genes (Supplementary Data). Hence, it is conceivable that this signaling is associated with exopolysaccharide variation (e.g., O-antigen phase-variation (159,160)), which might play a role in evading bacteriophage attachment.
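The two architectural templates named above, SP+X+TM+Lipocone and Lipocone+TM+X, can be recognized mechanically from a domain-architecture string. The following is a minimal sketch that assumes architectures are written as "+"-separated domain names in N- to C-terminal order; the function name and the example domain labels are hypothetical and serve only to illustrate the two templates.

```python
def classify_architecture(arch):
    """Return (orientation, sensor_domains) for the two templates, else None.

    Assumes `arch` lists domains N- to C-terminally, joined by '+',
    with 'SP' = signal peptide and 'TM' = transmembrane segment.
    """
    domains = [d.strip() for d in arch.split("+")]
    if "Lipocone" not in domains or "TM" not in domains:
        return None
    lipocone_pos = domains.index("Lipocone")
    tm_pos = domains.index("TM")
    if tm_pos < lipocone_pos:
        # SP+X+TM+Lipocone: ligand-binding domain(s) precede the TM segment.
        sensors = [d for d in domains[:tm_pos] if d != "SP"]
        orientation = "N-terminal sensor"
    else:
        # Lipocone+TM+X: ligand-binding domain(s) follow the TM segment.
        sensors = domains[tm_pos + 1:]
        orientation = "C-terminal sensor"
    return (orientation, sensors) if sensors else None


print(classify_architecture("SP+Concanavalin-like+TM+Lipocone"))
print(classify_architecture("Lipocone+TM+OB-fold"))
```

Tabulating the sensor domains returned for each family in this way would be one simple route to quantifying the variability noted for the Skillet-3 fusions.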

Additionally, VanZ-1 Lipocone domains are also fused to several known signaling domains confidently predicted to reside in the cytoplasm, including the cyclic nucleotide-binding domain (cNMPBD), phosphopeptide-binding FHA, and DNA-binding RHH and HTH domains (65,161–163) (Figure 5K). These associations suggest potential VanZ regulation via a cytoplasmic cyclic nucleotide (sensed by cNMPBD) or, conversely, VanZ acting as an allosteric regulator of a transcriptional program via the HTH or RHH domain. One of the most common yet enigmatic fusions to VanZ is with the integral membrane RDD domain (53). The role of this domain is unknown; however, our analysis indicates that it contains a conserved intra-membrane binding site oriented towards the cytoplasmic face of the membrane (Nicastro GN, Burroughs AM, Aravind L, manuscript in preparation). The VanZ-RDD fusion is sometimes further fused to other domains (Figure 5K), the most notable being a highly derived but active novel Histidine Kinase domain (Figure 5K). Together, these associations point to the coupling of lipid modification with a signaling event on the cytoplasmic face of the membrane, which might relate to the dynamic regulation of lipid-carrier-bound exopolysaccharide precursors.

Multi-component associations of the Lipocone proteins in signaling

These systems resemble the above-discussed versions but are encoded by conserved gene neighborhoods that separate the Lipocone and the signaling elements (typically predicted transcription regulators) into distinct genes. Our analysis recovered at least three such systems: 1) A VanZ-1 Lipocone in the recently described HAAS/PadR-HTH two-component systems, which sometimes replace classical Histidine kinase-Receiver two-component systems (164). In these systems, the detection of an extracellular or intramembrane stimulus by a sensor domain releases the PadR-HTH transcription regulator bound to the sensor-fused HAAS domain. Here, VanZ-1 occupies the sensor position (Figure 5L). 2) A Skillet-2 Lipocone is coupled in a core two-gene system to a conserved upstream gene (Figures 5L,S5). That gene encodes a single TM protein with either a zinc ribbon (ZnR) fused to a conserved helix or an HTH domain fused to a ClpS-ribosomal L7/L12 domain in its cytoplasmic region (165). These neighborhoods might also code for an HMG-CoA reductase and GHMP kinase that catalyze successive reactions in the production of phosphomevalonate, a precursor of isoprenoid lipids (166,167). 3) A Skillet-DUF2809 Lipocone protein is operonically coupled with a 6TM protein and a further predicted transcription factor with a wHTH domain. These operons are further elaborated via additional embedded and flanking genes, either coding for components of isoprenoid lipid (e.g., undecaprenol) (168,169) or exopolysaccharide (e.g., ECA and related polysaccharides) metabolism (94,97) (Figures 5L,6,S6, Supplementary Data).

The Lipocone domains in these systems are predicted to be active enzymes, which, together with their operonic associations, point to functions involving the modification or transesterification of isoprenoid lipid head groups, sometimes in the context of exopolysaccharide biosynthesis. However, their associations with the intracellular HTH domains suggest that the Lipocone enzymatic activity is potentially coupled with the transcriptional regulation of the production of precursors of the lipids or exopolysaccharides. Given the high variability in the associated genes related to exopolysaccharide/lipopolysaccharide biosynthesis, we anticipate that the associated transcriptional regulation potentially relates to functional categories showing high diversity across bacteria, such as responses to environmental stress, phages, predatory bacteria attacks, or host immune response.

Lipocone domains as effectors in biological conflicts

Lipocone domains in antiviral immunity

The Min-Wnt domains (Figure 1A) that we originally identified were predicted to play a role in biological conflicts with invasive selfish elements, such as viruses (24). In this work, we better explain their potential mechanism of action. These versions show no fusions to extracellular domains or secretory signals, suggesting that they are deployed from within the bacterial cell (Figure 1C). These Min-Wnts are typically fused to the DUF3892, which displays a fold characterized by a three-stranded meander followed by a helix also seen in the dsRNA-binding domain and the ribosome hibernation factors (HPF) (170,171) (Figure 5M). Hence, we propose that these versions might potentially act to sense virally induced RNAs or modified ribosomes (24) to trigger a dormancy or suicide response to limit viral infection via the Min-Wnt effector. Specifically, the Min-Wnt might attack peptidoglycan precursors, such as lipid II, prior to their ‘flipping’ to restrict cell wall synthesis (85,172,173) or other such carrier lipids.

One other Min-Wnt domain, N-terminally fused to a three-stranded β-meander, is pervasive in the Bacteroidetes clade. This is operonically coupled with genes encoding a TM-linked run of pentapeptide repeats and two structurally distinct, secreted glycosyl hydrolase enzymes, respectively containing a TIM barrel domain and a run of β-helix repeats (Figure 5M, Supplementary Data). Further, cyanobacteria show a standalone prok-TelC domain without any secretory signals. These could again act as effectors targeting lipid-linked precursors of peptidoglycan or exopolysaccharides in response to intracellular invaders or stress (Figure 5M). Interestingly, some tailed bacteriophages also code for intracellular Min-Wnt domains, suggesting that they might also be deployed on the virus side in biological conflicts, such as in limiting superinfection (Supplementary Data).

Lipocones as toxin domains in polymorphic and allied conflict systems

Polymorphic toxins and related systems, widespread across bacteria and certain archaea, are characterized by a highly variable C-terminal toxin domain (“toxin tip”) that is preceded by a range of more conserved domains typically required for autoproteolytic processing of the toxin, its packaging and trafficking (e.g., RHS repeats), adhesion and secretion via one of several secretory systems (174,175). The toxin might be delivered via one of the secretory systems into a target cell or else via direct contact between interacting cells. Classical polymorphic toxins are usually involved in kin discrimination and are accompanied by genomically linked cognate immunity proteins that protect against self-intoxication (174,176). Keeping with the principle of effector sharing between systems involved in distinct types of biological conflicts, we had originally identified a Min-Wnt domain closely related to those described in the above subsection as a toxin tip in polymorphic toxin systems (24). In the current work, we extend these findings to show that several distinct Lipocone families have been independently recruited as toxin tips of polymorphic toxins and related systems, namely Min-Wnt, prok-SAA, prok-TelC, CapCone-1, CapCone-2, ClaspCone-2 and VanZ-1 (Figures 4,5N,6,S7).

Certain CapCone-2 and Min-Wnt toxins from Gram-positive bacteria define some of the simplest of these toxin systems. Here, a standalone Lipocone domain is coupled to a signal peptide or lipobox via a poorly structured linker. These are usually encoded in a two-gene configuration with their cognate immunity protein (Figure 5O). More complex versions present, in addition to adhesion, peptidoglycan-binding, lipid-binding and proteolytic processing domains, multiple hallmarks of delivery through specific secretion systems. These include T4SS (VirD4-binding domain), T6SS (PAAR domain), T7SS (WXG/LXG domain), T9SS, and MuF domains (174,176) (Figure 5P, Supplementary Data). Additionally, we recovered standalone CapCone-1 domains encoded in an operon with a PsbP/MOG1 superfamily domain diagnostic of secretion via the T6SS (174,177) (Figure 5P). Further, we also found Min-Wnt domains fused to the N-termini of RTX-like β-roll repeats, suggestive of T1SS-mediated export (178) (Figure 5P).

Our analysis also uncovered multiple, previously uncharacterized trafficking/packaging systems associated with different Lipocone polymorphic toxins. Several Min-Wnt and CapCone-1 domains with lipoboxes are fused to an N-terminal Cystatin-like superfamily domain (179) (Figure 5Q). The same domain is also comparably fused to several other C-terminal toxin domains in related organisms, some of which are also predicted to target lipid head groups: (i) a novel toxin domain we unified with the lipid-targeting Colicin M fold (180); (ii) a lipid-binding START-domain-like helix-grip fold domain (154); (iii) a papain-like fold fatty acyltransferase (181); (iv) a domain related to the VanY-like D-Ala-D-Ala carboxypeptidase (182) (Figure 5Q, Supplementary Data). In all these cases, the toxins are coupled to a related immunity protein (see below), suggesting that they define a distinct polymorphic toxin system. We propose that this Cystatin-like domain specifies a novel packaging or deployment system upon secretion for the C-terminal toxin domain, analogous to the way Cystatin domains function together with eukaryotic proteases (183). The prok-TelC family Lipocones are found in distinctive architectures in two poorly characterized, predicted polymorphic toxin systems. In one of them, they are fused to an N-terminal glucan-binding GbpC β-sandwich domain (184) and repeats of MucBP-like Ig domains (185), which might anchor them to exopolysaccharides (Figures 5Q,S7, Supplementary Data). The second variant, found in association with T9SS components (186), shows fusions to one or more copies of a previously undetected TPM domain (Figure 5Q). While the domain has been claimed to be a phosphatase (187), our recent analysis indicates that this is unlikely to be the case (164). Instead, we propose that the TPM domain might assist in assembling membrane-linked protein complexes, a role that might be relevant to the trafficking of these toxins (164).

To date, the only experimentally characterized Lipocone domains from polymorphic toxins are those of the prok-TelC family, which are secreted via the T7SS (29,188) (Figures 5P,6). Notably, prok-TelC has been shown to be active only outside the cell and not in the cytoplasm (29). As noted above, it attacks lipid II to cleave off the peptide-linked disaccharide pyrophosphate head group from the undecaprenol tail (Figure 3B). Prok-TelC has also been speculated to similarly attack WTA-lipid II linkages (29). These findings provide a template for other Lipocone superfamily effectors in potentially targeting lipid carrier linkages in peptidoglycan and exopolysaccharide intermediates. However, given the diversity within the family (Figure 3F), it is conceivable that they also target other lipids.

Immunity proteins of Lipocone polymorphic toxins indicate periplasmic/intramembrane action

To date, only a single immunity protein has been reported for Lipocone toxins, viz., TipC, which counters prok-TelC toxin in the periplasm (29,189) (Figures 5P,6). Here, we uncovered a range of immunity proteins belonging to structurally distinct folds that counter the remaining Lipocone toxins (Figures 5N-Q,S9, Supplementary Data). The most widespread of these is a rapidly evolving, membrane-anchored member of the BamE-like superfamily that associates with not only Min-Wnt and CapCone toxins but also other above-mentioned lipid-head-group targeting toxins (e.g., the novel Colicin M-like domain). The BamE-like fold features a core two-helix hairpin followed by a run of three β-strands (Figure S9). The classical BamE operates in a pathway for the assembly of OMP β-barrels (190,191), suggesting that these immunity proteins emerged from an ancestral BamE and, like it, function in the periplasm. Additional candidate immunity proteins with more restricted phyletic spreads include (Figure S9): (i) a β-jelly-roll fold-containing protein (144); (ii) an integral membrane protein with a 4-TM core. These two are observed with Min-Wnt toxins. (iii) A novel domain combining an α-helix with a run of 4 β-strands stabilized by four absolutely conserved cysteine residues. This is coupled to both Min-Wnt and prok-SAA toxins; (iv) a protein with an OB-fold domain (145); (v) a protein with a β-sandwich related to the eukaryotic centriolar assembly SAS-6 N-terminal domain (192). The last two are coupled to CapCone-2 toxins (Supplementary Data). Notably, despite their structural diversity, these immunity proteins are all TM or lipoproteins and, like TipC (29,189), are predicted to operate at the membrane or in the periplasm (Figures 5N-Q, Supplementary Data). This suggests that they intercept their cognate Lipocone toxin domain outside cells or in the membrane rather than within the cell.

Lipocone toxins in predator-prey and other interspecific conflicts

In contrast to polymorphic toxins, which are typically deployed in intraspecific conflict between competing strains of the same species, other toxin systems are deployed against more distantly related target cells, such as prey and eukaryotic hosts (193). While some of these closely parallel polymorphic toxins in their domain architecture, they are usually distinguished by the lack of an accompanying immunity protein. The simplest of these systems are secreted Min-Wnt proteins from bacteria and fungi. These present just a standalone Min-Wnt domain or one fused to a novel domain with a half β-barrel wrapping around a helix (Figure 5R, Supplementary Data). These are probably deployed as diffusible toxins that target rival organisms in the environment.

Another architectural theme is defined by Min-Wnt and prok-SAA Lipocones fused to an enigmatic, novel, short C-terminal domain, which is comprised of a long β-hairpin with a characteristic “break” in its central region, causing it to acquire an arch-like appearance (Figures 5S,S10). Hence, we refer to this domain as the broken-hairpin. We found the broken-hairpin domain to be fused to a wide array of predicted toxin domains across the bacterial superkingdom. These include effector domains otherwise found in polymorphic toxin and allied systems that target peptidoglycan, carrier lipids and the membrane, such as members of the Colicin M (180), Zeta toxin-kinase (194), lysozyme (195), and α/β-hydrolase superfamilies (110), and nuclease toxins such as members of the HNH, HipA, SNase, and BECR superfamilies (174) (Figures 5S,S10C). Remarkably, these proteins with the broken-hairpin tend to lack a signal peptide or association with any other secretion system or immunity proteins (Figure 5S, Supplemental Material). Hence, we propose that the broken-hairpin domain itself serves as a trafficking mechanism for the externalization of these toxins in conflicts with rival environmental organisms.

Some predicted secreted Lipocones are found predominantly in predatory bacteria. The first of these are CapCone-2 domains from lineages like Bdellovibrionota, which are encoded in two-gene systems, with the second gene coding for a further secreted effector such as an α/β-hydrolase, Patatin, or acyltransferase or an OMP β-barrel domain (110,116,117,123) (Figures 5T,S7). Myxobacteria and some other lineages code for secreted prok-SAA domains fused to an N-terminal Zincin-like metallopeptidase domain, and the first bacterial example of the von Willebrand Factor D (vWD) and Ig domains at the C-terminus (196) (Figure 5T, Supplemental Material). In the recently described predatory Patescibacterial branch of Omnitrophota species, Skillet-clade Lipocone domains are found in gigantic proteins combined with several other domains and TM segments. Domains found in these proteins include polysaccharide biosynthesis enzymes (94,97), signaling proteins involved in Histidine kinase-Receiver relays (197), peptidases of the MPTase and Papain-like superfamily (113,138), diverse methylases, and extracellular ligand-binding domains like the peptidoglycan-binding LysM domain (198) (Figures 5T,6). Given the concentration of the above systems in predatory bacteria (Supplementary Data), we posit that the above Lipocones might function as toxins targeting prey membranes alongside a battery of effectors targeting other cellular components. In particular, the CapCone-2 systems might play a role in the breaching of outer membranes by Bdellovibrionota. Animal vWD domains are involved in adhesion (199); hence, the bacterial versions might play a similar role in adhering to prey cells, while the MPTase in these proteins potentially releases the associated prok-SAA toxin through autoproteolysis. Finally, the giant proteins from the Patescibacteria are likely to combine signaling prey presence with overcoming prey defenses and breaching prey membranes.

Certain prok-TelC proteins are observed as part of several distinctive systems that could be involved in as-yet-undiscovered predatory interactions or in targeting environmental competitors. One such, defined by large proteins from spore-forming Bacillota, combines a diversifying set of extracellular ligand-binding domains (e.g., Ig-like, Cell-wall-binding β-hairpins and β-propellers (146,147,200)) with a two-enzyme core formed by a prok-TelC and an N-acetylglucosamine (GlcNAc)-1-phosphodiester alpha-N-acetylglucosaminidase (NAGPA). NAGPA catalyzes phosphoric-diester hydrolysis to release phosphodiester-linked sugars (Figures 5U,6,S7) (201). Some of these proteins feature an additional NlpC/p60 superfamily peptidase domain predicted to target peptidoglycan (181). The recombinational diversity of ligand-binding domains in this system, even among closely related Bacillota species, supports a possible arms race and involvement in a biological conflict. Other TelC domains in some Bacillota, Actinomycetota, and fungi are fused to peptidoglycan-binding domains (PGBD) (202) and an Rv2525c-like TIM-barrel (203) (Figures 5U,S7). In Actinomycetota, this protein is further combined in operons with either of two mutually exclusive genes coding for rapidly evolving proteins (Figure 5U): (i) a secreted protein containing a pair of Ig domains (200); (ii) a 3-TM protein (3TM-CCDN) with two conserved cysteines, an aspartate, and an asparagine residue predicted to be located between the TM segments outside the cell. This version is further coupled to a gene for a secreted VanY superfamily peptidase (182) (Figure 5U, Supplementary Data). Common to these contexts are rapidly evolving and variable domains on the one hand and peptidoglycan/exopolysaccharide binding or degrading domains on the other. Hence, we interpret these as potential conflict systems that engage the cell wall and target it and associated membranes in rival bacteria.

Lipocone domains in resistance to antimicrobial agents

VanZ-1 proteins (Figure 1A) were initially identified as encoded by a gene linked to that coding for the VanY D-alanyl-D-alanine carboxypeptidase involved in resistance to glycopeptide antibiotics like vancomycin and teicoplanin (31,204–206) (Figure 6). These antibiotics bind the terminal D-Ala-D-Ala in the peptide moiety of peptidoglycan, preventing the transpeptidase cross-linking reaction necessary for its maturation. Upon detection of these antibiotics, enzymes encoded by the core vancomycin resistance operon re-engineer the exported peptidoglycan by inserting a D-Ala-D-Lac in place of the D-Ala-D-Ala linkage, precluding antibiotic binding (53). The VanY peptidase, while not strictly required for antibiotic resistance, acts as an accessory to this system by cleaving any remaining D-Ala-D-Ala linkages generated via the canonical pathway (53,205). However, the role of VanZ in this system has so far remained unknown. While only a small fraction of the VanZ-1 genes are found in these antibiotic resistance contexts (Supplementary Data), interestingly, other Lipocone genes, namely those of the VanZ-2 and the Skillet families, might also be linked to VanY in lieu of VanZ-1. Further, VanY might be replaced by a structurally unrelated secreted D-Ala-D-Ala carboxypeptidase of the metallo-beta-lactamase fold (111) in operonic contexts with VanZ-1 (Figure 5V). Hence, given our above prediction regarding VanZ acting in peptidoglycan and/or exopolysaccharide metabolism, VanZ-1 and the Lipocones displacing it might indeed play an accessory role with VanY at the membrane (204,205) in antibiotic resistance. We posit that, in these contexts, the Lipocone likely acts on the head group of Lipid II to recycle canonical peptidoglycan intermediates for their accelerated or more thorough replacement with the resistant versions (Figure 3G).

We also identified a conserved five-gene operon featuring a YfiM-1 family Lipocone that might play a role in resistance to antibacterial agents (Figures 5W,S5). Other than YfiM-1, this operon contains genes for: (i) a thioredoxin domain protein (109); (ii) a DTW clade RNA-modifying enzyme of the SPOUT superfamily (207,208); (iii) a protein with acyl-CoA ligase, GNAT superfamily N-acetyltransferase and ATP-grasp domains (131,209,210); (iv) a PssA-like phosphatidylserine synthetase of the HKD superfamily (211) (Figure 5W, Supplementary Data). Of these enzymes, the phosphatidylserine synthetase is predicted to act in its usual capacity to generate a lipid with a serine head group (211). We propose that this would then function as a substrate for the YfiM-1 Lipocone domain, which might exchange the serine for another moiety via a reaction paralleling PTDSS1/2 (Figure 3A). This moiety could then be modified by aminoacylation, further acylation and a redox modification by the third protein listed above, together with the thioredoxin. Indeed, such peptide modifications of lipid head groups by lysine, alanine, or arginine aminoacylation catalyzed by derived tRNA synthetases fused to GNATs have been shown to be a key resistance mechanism against breaching of the membrane by antibacterial peptides (212,213). Hence, we predict the modifications catalyzed by this system might play a comparable role. The presence of a tRNA-modifying DTW domain suggests that, in parallel to the tRNA synthetases, the GNAT in this system might use a tRNA-linked acyl group as a substrate, as seen in peptidoglycan biosynthesis (214,215).

Eukaryotic recruitments of the Lipocone superfamily

Lipocone domains have been transferred on several occasions from bacteria to eukaryotes (Figures 4,S1). While there is predicted functional overlap with the above-described, predominantly bacterial versions, we discuss these separately as the inferred biological contexts of their deployment are often distinct from the above.

Plant YfiM-1 and eukaryotic VanZ-2 proteins

A conserved YfiM-1 family protein typified by the Arabidopsis AT1G15900 was acquired from the Bacteroidetes lineage of bacteria at the base of the plant lineage prior to the chlorophyte-streptophyte (including land plants) split and is predicted to be catalytically active (Figure S2, Supplementary Data). In Arabidopsis, this gene is widely expressed across different tissue types, developmental stages, and other tested conditions (216,217). Given the above-predicted roles for bacterial YfiM-1 proteins, it is conceivable that the plant version plays a comparable role in the metabolism of a conserved plant-specific lipid. In a similar vein, a distinct clade of standalone VanZ-2 domains typified by the Saccharomyces cerevisiae YJR112W-A was acquired early in the fungal lineage. A similar transfer is also seen in the SAR clade of eukaryotes (Figure S1). Since these eukaryotes lack peptidoglycan and other bacterial-type isoprenoid lipid-borne exopolysaccharide intermediates, we suggest that this version was recruited for modifications of a fungus-specific lipid (e.g., highly oxygenated isoprenoid lipids) (218).

The Met-TelC proteins

The Met-TelC clade comprises versions of the TelC family with a reconfigured active site that were transferred from bacteria to Metazoa prior to the divergence of the cnidarians, and most members are predicted to be catalytically inactive (Figure 2). In cnidarians and arthropods, the Met-TelC domain is found in a secreted protein fused to C-terminal adhesion-related vWA (219) and Ig domains, followed by a TM helix (Figure 5X, Supplementary Data). The chordate version, typified by human PGLYRP2 (220), is also secreted and is fused to a C-terminal Amidase targeting the N-acetylmuramoyl-L-alanine linkage (Figure 5X, Supplementary Data). PGLYRP2 is a key innate immunity factor against bacterial pathogens that degrades sugar-peptide linkages in peptidoglycan via the Amidase domain (221–223). As most Met-TelC proteins lack the active site residues but are modeled to retain the substrate-binding pocket, we propose that they participate in anti-bacterial immunity as a Pathogen-Associated Molecular Pattern (PAMP) receptor (224). Specifically, they could recognize polyisoprenoid pyrophosphate-linkage-containing lipid intermediates of bacterial cell-surface molecules like peptidoglycan or exopolysaccharides.

Eukaryotic Wnt proteins

Wnt family Lipocones were transferred on multiple occasions to eukaryotes. The best-known of these are Met-Wnt proteins, which were acquired from bacteria at the base of Metazoa after they had separated from their closest sister group, the choanoflagellates. These lost the ancestral active site residues and function as well-studied secreted signaling molecules and will not be detailed further in this work (for review, see (1,225)). Independently of the Met-Wnt proteins, catalytically active, secreted versions closely related to the bacterial Min-Wnt proteins were transferred to fungi and, within Metazoa, to the rotifers and the hemichordate acorn worm Saccoglossus kowalevskii, where they are lineage-specifically expanded (Figure S1, Supplementary Data). These versions are primarily standalone versions of the Min-Wnt domain, lacking the large inserts typical of the Met-Wnt proteins (Figure 1A). We predict that these eukaryotic Min-Wnt proteins retain their ancestral toxin role and might participate in anti-bacterial immunity.

Met-SAA proteins

Met-SAA proteins (Figure 1A) were acquired from bacteria prior to the divergence of the cnidarians from the rest of Metazoa. However, unlike the Met-Wnt and Met-TelC proteins, they often conserve the ancestral active site residues, indicating that they are usually enzymatically active (Figure 3). Human SAA has been recognized as a key immune marker that dramatically increases in blood during the Acute Phase Response (226). It has been reported to bind the E. coli outer membrane protein OmpA (227) and claimed to function as an opsonin in innate immunity (228). Like Met-TelC, but in contrast to Met-Wnts, Met-SAAs appear to have been lost or pseudogenized in several animal lineages (229–231) (Figure 3). This is consistent with an arms-race scenario in immunity and the development of pathogen resistance against the Met-SAAs, leading to loss. Keeping with an immune role for the Met-SAAs, we propose a catalytic function for the active versions in severing lipid head groups of outer-membrane lipids or of isoprenoid lipid carrier intermediates. Such action could also generate PAMPs that could explain the activation of neutrophil- and macrophage-based immunity by SAA (228). Pertinent to these observations, diverse OMP β-barrels have been linked to the translocation of polymorphic toxin domains across the outer membrane of target cells (232–234). Given this and the origin of Met-SAA from bacterial polymorphic toxin-related systems (Figure 4), its interaction with OmpA might help it cross over into the periplasmic space and act on maturing peptidoglycan or teichoic acid intermediates.

SAA was first reported as a component of secondary amyloid deposits (235), and its capacity to form amyloid fibrils upon protease cleavage was theorized as a potential PAMP activating the immune response (30). Indeed, bacteria produce their own secreted amyloids, such as Curli and Fap, believed to contribute to biofilm formation (236,237), and might be PAMPs recognized by animal immune systems (238). Further, other animal amyloids, such as the β-amyloid, have been proposed to play a role as physical barriers in immunity against bacteria (239). Thus, amyloid formation by protease cleavage (including potentially by bacterial proteases) may represent a second line of defense mediated by Met-SAA proteins.

Discussion

Early Evolution of the Lipocone superfamily

No single well-defined Lipocone clade is universally conserved across the three superkingdoms of Life (Figures 4, S1). However, the VanZ and Wok clades are both found across all major bacterial phyla (notwithstanding sporadic losses in certain lineages) and in some archaeal lineages (Figure S1). At the same time, the cpCone clade is found across most major archaeal lineages and is nearly universally conserved in the eukaryotes (absent in Ascomycota and some choanoflagellates) (Figure S1). Notably, the cpCone and Wok clades tend to group together in the profile-profile similarity network (Figure 1B). These observations suggest that at least a single version of the Lipocone superfamily was likely present in the Last Universal Common Ancestor (LUCA). The phyletic patterns suggest that the LUCA Lipocone gave rise to the VanZ/Wok precursor in the bacterial lineage on the one hand and the cpCone clade via a circular permutation event in the archaeo-eukaryotic lineage on the other (Figure 4). Based on the features of these deep-branching clades, the LUCA version is inferred to feature a hydrophobic domain with a 4TM helix core, with the active site facing the outer leaf of the lipid bilayer (Figure 1C). Given that extant versions operate both on classic phospholipids and isoprenoid lipids, it is difficult to infer which of these might have been substrates for the LUCA version. It is possible that this early version had a generic specificity that became specialized in the descendant clades.

Subsequent diversification of the Lipocone domain

The early diversification of the Lipocone domain appears to have had different drivers in the two prokaryotic superkingdoms. The presence of an extensive repertoire of exopolysaccharides in the cell wall (peptidoglycan, teichoic acids), cell surface (e.g., ECA), and outer membrane (e.g., lipopolysaccharide), synthesized via isoprenoid lipid-linked intermediates, like lipid-II, was the primary driver in the bacterial superkingdom (240). Here, this diversification yielded 4 monophyletic groups: the VanZs, Wok, YfiM and Skillet (Figure 4). The deeper VanZ and Wok branches, which were likely recruited first for lipid-II-related functions, were probably the predecessors of the more restricted bacterial families with specialized functions. For instance, the emergence of the outer membrane in certain bacteria was potentially coupled with the origin of the YfiM-like clade (Figure 4). Similarly, our predictions suggest that within these clades, further diversification accompanied the acquisition of specialized functional roles in antibiotic resistance, secondary sensor roles in single and multicomponent signaling and lipoprotein processing. The interoperability of Lipocone domains on lipid carriers shared across different biosynthetic pathways (see above, Figures 3,S5-S6) appears to have been a key factor leading to this versatility.

In the ancestral archaeo-eukaryotic lineage, the absence of peptidoglycan and an apparently lower diversity of structures with exopolysaccharides was reflected in the lesser diversification of the Lipocone clades (Figure 4). There are open questions regarding the biochemical functions of the primary archaeo-eukaryotic Lipocone clade, the cpCone. Although the eukaryotic cpCone PTDSS1/2 family has been shown to swap serine for ethanolamine or choline in lipid head groups (28,57), their archaeal counterparts remain uncharacterized. Archaea have their own lipid with a serine in the head group (archaeophosphatidylserine), but to date, its synthesis has been shown to depend on a patchwork of different CDP-alcohol phosphatidyltransferase enzymes (CaPs) in different archaeal species (58,241,242). While the CaPs are also integral membrane enzymes with a 6TM helix core, catalyzing reactions on lipid head groups in archaea and eukaryotes comparable to those of the Lipocones (242), they are evolutionarily unrelated. Nevertheless, we suggest that the archaeal cpCones, like their eukaryotic counterparts, could contribute to distinct, as yet uncharacterized, pathways for the generation of cell membrane phospholipids like archaeophosphatidylserine or those with other head groups.

Emergence of diffusible versions of the Lipocone domain and their repeated recruitment in biological conflicts

One of the remarkable aspects of the Lipocone superfamily is the loss of ancestral hydrophobicity in several families (Figure 1C), transforming them from integral membrane proteins to diffusible domains. While unexpected, such a transition in integral membrane enzymes acting on lipid substrates is not unprecedented. The PAP2 superfamily of integral membrane enzymes (e.g., diacylglycerol diphosphate phosphatase) (124,243) also contains several soluble versions (244) that appear to have emerged from an integral membrane ancestor (AMB and LA, unpublished observations). Most of the soluble Lipocone domains retain their active site conservation (Figure 2) and, at least in one experimentally characterized case, catalyze a comparable reaction as the TM version (29) (Figure 3B). The weight of the evidence presented here, including the profile-profile similarity network (Figure 1B), phyletic patterns (Figure S1), functional contexts (Figure 5-6), and the broadly shared structural features (Figures 1A, S2, S4), suggests that the loss of hydrophobicity occurred on a single occasion in the Lipocone superfamily, followed by diversification of these diffusible versions.

Our analysis of the diffusible Lipocone families reveals repeated recruitment as toxins/effectors in anti-viral and polymorphic toxin and allied systems (174), suggesting that their diversification was driven by the arms races arising from the biological conflicts where they are deployed. Recruitment of a representative of the VanZ-1 family as a polymorphic toxin on rare occasions (Figures 5N,6) suggests a possible evolutionary pathway for their recruitment as toxins: the effector version of Lipocones attacking lipids in competing bacteria likely emerged from an ancestral version that catalyzed endogenous lipid-head-group modifications on the same lipids in metabolic pathways. Once versions with reduced hydrophobicity emerged, they could be deployed as diffusible effectors that were shared across extracellular and intracellular conflict systems, a trend previously recognized in many other effector domains (25).

Repeated acquisition of Lipocones of bacterial origin by eukaryotes

Unlike bacteria, eukaryotes as a whole do not possess a rich repertoire of Lipocone domains. The PTDSS1/2 family, vertically inherited from the archaeal progenitor, is the only version that can be inferred as being present in the Last Eukaryotic Common Ancestor (Figure 4). However, distinct Lipocone families of ultimately bacterial provenance were acquired early and fixed in certain eukaryotic lineages: (i) YfiM-1 in the plant lineage; (ii) the fungal VanZ-2 domains typified by the Saccharomyces cerevisiae YJR112W-A; (iii) Met-Wnt (discussed further below) (Figure S1). The early fixation of these versions in the eukaryotic lineages possessing them suggests that they were recruited for definitive “housekeeping” or developmental functions in the respective lineages. Beyond these, the fungal and metazoan lineages show more sporadically distributed versions, which have all been acquired from bacterial secreted-toxin or antiviral systems: (i) Min-Wnt independently in fungi and certain Metazoa; (ii) SAA; (iii) TelC; the latter two are absent in the basal-most metazoans, the sponges, but are present in Cnidaria, suggesting a relatively early acquisition (Figure 4). The weight of the evidence suggests that they have retained certain aspects of the ancestral bacterial effector function for anti-pathogen immunity in eukaryotes. This is consistent with both their episodic loss and lineage-specific expansion, the tendency to show rapid sequence divergence and, in the Met-TelC family, loss of catalytic activity (Figures 2,S1, Supplementary Data).

This independent acquisition of at least 3 distinct Lipocone families in metazoan immunity from polymorphic and allied effector systems of prokaryotes points to a persistent evolutionary trend. Notably, the Lipocone domains participating in animal immunity have been drawn from secreted effectors rather than the intracellular versions (bacterial intracellular Min-Wnts) predicted to participate in bacterial anti-selfish element immunity. More generally, this adds to a growing list of components drawn from secreted effector systems of prokaryotes in eukaryotic immune systems (174,193,245). For example, this closely parallels another structurally unrelated effector domain, the Zn-dependent deaminase (e.g., metazoan AID/APOBEC deaminases) (246). Hence, these observations add further support to our hypothesis that the extensive expansion of effectors in diverse prokaryotic inter-organismal conflict systems served as a reservoir from which eukaryotic immune systems repeatedly acquired components (193,245). We propose that symbiotic associations between the early animals and bacteria resulted in potential interactions via secreted effectors of the latter that aided the former against antagonistic bacteria. This probably led to their eventual acquisition by animals and incorporation into their immune processes.

Origin of Wnt as a signaling molecule

Earlier considerations on the evolution of Wnt signaling indicated that it emerged at the base of the metazoan lineage and incorporated a wide range of components of different origins (e.g., the HMG domain transcription factor TCF/LEF, the HEAT repeat protein β-catenin and the 7TM receptor Frizzled) (1). However, the provenance of Met-Wnt itself had been mysterious and was seen as a possible example of a metazoan innovation (225). While the Met-Wnt domains possess peculiar structural elaborations (34), its conserved core is a Lipocone domain (Figure 1A). We establish that the progenitor of Met-Wnt emerged as part of the radiation of Lipocone domains in bacteria as effectors deployed in both intracellular and inter-organismal conflict – the Min-Wnt proteins.

Whereas the Min-Wnt proteins are predicted to be secreted toxins, the Met-Wnts underwent an ancestral inactivation through loss of the catalytic residues (Figure 2). However, they retained their ancient involvement in cell-cell interactions as secreted agents. The Met-Wnt residues recognized as essential for the receptor (Frizzled) binding, including the absolutely conserved palmitoleoylated serine residue, are found in the aforementioned Metazoa-specific hairpins and loops (34,36). However, despite their inactivation, the Met-Wnts retain the ancestral substrate-binding pocket (Figures 1A, S2). This raises the possibility that they might be involved in as-yet unexplored interactions with ligands such as lipids.

Our tracing of the provenance of Wnt back to an effector in secreted bacterial toxin systems adds it to a growing list of components in metazoan signaling networks that have been acquired from such systems. For instance, this is also the case with components of the other key metazoan signaling pathway, Hedgehog (247). Here, the Hedgehog protein itself contains an autoproteolytic HINT peptidase domain that was likely drawn from a structurally and functionally cognate domain observed in polymorphic toxin systems (174,247). Further, an intracellular component of the same signaling pathway, Suppressor of Fused (SuFu), was derived from a common immunity protein found in polymorphic toxin systems (247). Similarly, the Teneurin/Odd Oz proteins mediating signaling in cell migration, neuronal pathfinding, and fasciculation in Metazoa descended from a polymorphic toxin protein with a C-terminal HNH endonuclease toxin tip (174). In a similar vein, the immunity protein of certain CapCone toxins identified in this study might have given rise to the β-sandwich domain in the eukaryotic centriolar assembly factor SAS-6. These observations suggest that, in addition to immune system components, interactions with symbiotic bacteria also potentially furnished the progenitors of components of eukaryotic signaling and cytoskeletal networks that were central to the emergence of Metazoa as a clade of multicellular eukaryotes (248,249).

Conclusions

Using sensitive sequence and structure analysis, we unify a large, hitherto unrecognized superfamily of enzymatic domains, the Lipocone. By combining analysis of the active site and the structure of the Lipocone domain with contextual information from conserved gene-neighborhoods and domain architectures, we present evidence that members of this superfamily target phosphate linkages in head groups of both classical phospholipids and polyisoprenoid lipids. Specifically, they catalyze reactions such as head group exchange or severing of the head group-diphosphate linkage from the polyisoprenol. We present evidence that these activities have been recruited in a wide range of biochemical contexts, including cell membrane lipid modification, metabolism of peptidoglycan and exopolysaccharide lipid-carrier linked intermediates, lipoprotein modifications, bacterial outer membrane modification, sensing of membrane-associated signals, effector activity in antiviral and inter-organismal conflicts and resistance to antimicrobials. Further, catalytically inactive versions like Met-Wnt have been recruited for signaling roles in Metazoa. We predict the catalytic activity and potential biochemical pathways of numerous representatives for the first time, including some proteins that have remained enigmatic for over two decades, like VanZ.

We identify three notable trends in Lipocone evolution. First, although we reconstruct the ancestral member of the superfamily as being a 4TM integral membrane domain, a large monophyletic subset underwent a dramatic loss of hydrophobicity, transforming them into diffusible versions, including the Wnts and the SAAs (Figure 1C). Second, the superfamily expanded in two major functional niches in bacteria, namely peptidoglycan/exopolysaccharide metabolism and effector domains of both secreted toxins and immune systems (Figure 4). Finally, members of the Lipocone superfamily were acquired on multiple occasions from bacteria by Metazoa and were reused in new functional contexts as signaling messengers and immune factors (Figure 4).

Importantly, our predictions in this regard underscore that much remains unexplored in terms of lineage-specific cell wall and membrane metabolism in prokaryotes. We present several testable biochemical, functional hypotheses for the many poorly understood branches of the superfamily, several of which are being recognized as enzymatic for the first time here. We hope this will also open new avenues of research to fill key gaps in our understanding of lipid metabolism.

Methods

Sequence analysis

Sequence similarity searches were performed using PSI-BLAST (250) and JackHMMER (251) against the NCBI non-redundant protein database (nr) (252) or a version clustered down to 50% sequence identity (nr50). The searches were initiated using the previously identified prokaryotic Wnt (24), with multiple rounds of searches conducted, each using seeds collected from the preceding searches. Clusters based on sequence similarity (percentage identity or bit-score) were generated using MMseqs (253). The clustering parameters were adjusted according to specific goals, enabling redundancy removal, the definition of homologous groups, and the creation of new profiles. Multiple sequence alignments (MSA) were generated using the MAFFT program (254) with the local-pair algorithm, combined with the parameters --maxiterate 3000, --op 1.5, and --ep 0.2, and were manually refined based on structural superpositions and profile-profile comparisons.
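As an illustration of the alignment settings stated above, the following minimal Python sketch assembles the corresponding MAFFT command line. The function name and the input file name are hypothetical, and the sketch only builds and prints the command; it does not reproduce the in-house workflow or the manual refinement step.

```python
import shlex


def build_mafft_command(fasta_path):
    """Return the argument list for a MAFFT local-pair run with the stated parameters."""
    return [
        "mafft",
        "--localpair",           # local-pair algorithm
        "--maxiterate", "3000",  # maximum number of iterative refinement cycles
        "--op", "1.5",           # gap opening penalty
        "--ep", "0.2",           # offset (gap extension) value
        fasta_path,
    ]


# Hypothetical input file; MAFFT writes the alignment to standard output.
cmd = build_mafft_command("lipocone_seeds.fasta")
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run` with standard output redirected to a file if the command is to be executed.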

Sequence similarity network analysis

The HHalign program (255) was used to perform profile-profile comparisons, with the resulting p-value and e-value scores serving as edge weights for constructing a superfamily relationship network. This was then analyzed using the Leiden community finding algorithm (37) to detect sub-networks. Network analysis and visualization were performed using the R igraph (256) or Python NetworkX libraries (257).
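The construction of such a network can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the pipeline used here: the family names and e-values are toy placeholders, the e-value cutoff and the -log10 edge weighting are assumed choices, and connected components are used as a simple stand-in for Leiden community detection, which requires a dedicated library.

```python
import math
from collections import defaultdict


def build_network(hits, max_evalue=1e-3):
    """Keep hits at or below max_evalue; weight each edge by -log10(e-value)."""
    graph = defaultdict(dict)
    for a, b, evalue in hits:
        if a == b or evalue > max_evalue:
            continue
        weight = -math.log10(max(evalue, 1e-300))  # guard against log(0)
        # Keep the strongest edge if a pair is reported more than once.
        if weight > graph[a].get(b, 0.0):
            graph[a][b] = weight
            graph[b][a] = weight
    return graph


def connected_components(graph, nodes):
    """Group nodes into connected components (a stand-in for Leiden communities)."""
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph.get(node, {}).keys())
        seen |= component
        components.append(sorted(component))
    return components


# Toy profile-profile hits: (profile A, profile B, e-value). Placeholder values only.
hits = [
    ("famA", "famB", 1e-12),
    ("famB", "famC", 1e-5),
    ("famC", "famD", 0.5),   # too weak, dropped by the cutoff
    ("famD", "famE", 1e-8),
]
nodes = {name for a, b, _ in hits for name in (a, b)}
graph = build_network(hits)
components = connected_components(graph, nodes)
print(components)
```

With these toy values, the weak famC-famD hit is discarded, so the profiles fall into two groups.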

Comparative genomics, domain identification, and phylogenetic analysis

Genomic neighborhoods were obtained from genomes available in the NCBI Genome database (252) using in-house scripts written in Perl and Python. Conservation analysis of these genomic neighborhoods was performed by clustering the protein products of neighboring genes. Domain identification was conducted using a collection of HMMs and PSSMs maintained by the Aravind lab, along with HMMs from the Pfam database (258), utilizing the RPSBLAST (259) and HMMSCAN (260) programs. To further refine detection, domain identification was extended through remote homology analysis using the HHpred (261) program, against profiles built from the Pfam (258) and PDB70 (262) databases. Phylogenetic analyses were performed using FastTree (263) and iqTREE2 (264). Experimental functional data for characterized members of the superfamily were collected with the assistance of the ChatGPT language model (https://chat.openai.com). Structural comparisons, along with shared genomic associations, were used to further refine the interrelationships within and between the groups of the superfamily.

Families with broader presence across multiple major lineages (“phyla”) and deeper conservation within each of those lineages were inferred to be more ancient. In contrast, those with a more limited phyletic spread and/or limited depth of occurrence within each major lineage were likely later derivations (Figures 4,S1). We formalized this inference by calculating a phyletic metric for the Lipocone clades (Figure S1) comprising both the phyletic spread and depth. The phyletic spread $S_i$ of the $i$th Lipocone clade was computed thus:

\[ S_i = \frac{m_i}{M} \]

where $m_i$ is the number of lineages with at least one representative of the $i$th Lipocone clade, and $M$ is the total number of lineages examined. The phyletic depth $D_i$ of the $i$th Lipocone clade was computed as a weighted average of its occurrence within each lineage in the form of the mediant:

\[ D_i = \frac{\sum_j n_j}{\sum_j N_j} \]

where $n_j$ is the number of species in lineage $j$ with a Lipocone domain of the $i$th Lipocone clade and $N_j$ is the total number of species sampled in lineage $j$. $S_i$ and $D_i$ are plotted as a bar graph with $S_i$ as its width and $D_i$ as its height.
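The two quantities can be computed as in the minimal Python sketch below. It reads the spread as the fraction of examined lineages containing the clade and the depth as the mediant of the per-lineage fractions (summed counts over summed sample sizes). Taking the sums over all examined lineages, including those lacking the clade, is an assumption of this sketch, and the lineage names and counts are toy placeholders.

```python
def phyletic_spread_and_depth(counts):
    """Compute (spread, depth) for one clade.

    counts maps each lineage to (n_j, N_j): the number of species carrying
    the clade and the total number of species sampled in that lineage.
    """
    total_lineages = len(counts)
    lineages_with_clade = sum(1 for n, _ in counts.values() if n > 0)
    spread = lineages_with_clade / total_lineages
    # Mediant of the fractions n_j / N_j: summed numerators over summed denominators.
    # Assumption: the sums run over every examined lineage.
    depth = sum(n for n, _ in counts.values()) / sum(N for _, N in counts.values())
    return spread, depth


# Toy counts for a hypothetical clade across four hypothetical lineages.
toy_counts = {
    "lineage_1": (8, 10),
    "lineage_2": (1, 20),
    "lineage_3": (0, 10),
    "lineage_4": (3, 10),
}
spread, depth = phyletic_spread_and_depth(toy_counts)
print(spread, depth)
```

Because the mediant weights each lineage by its sample size, a densely sampled lineage with few occurrences lowers the depth more than a sparsely sampled one.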

Contextual network construction

Each domain architecture and conserved gene neighborhood was decomposed into its constituent domains. These domains were then labeled for their biochemical function and stored as a YAML file (Supplementary Data). The contextual connections were then rendered as a graph with the domains as its nodes and the adjacency relationships as its edges. Cliques containing a given Lipocone domain were detected in this graph and merged to constitute their respective dense subgraphs. These subgraphs were then examined for the statistically significant prevalence of particular labeled functions using the Fisher exact test. Network analysis was performed using the functions of the R igraph or Python networkX libraries.
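The enrichment step can be illustrated with a self-contained Fisher exact test on a 2x2 table, written here in plain Python. This is a sketch under assumptions: the function name is hypothetical, the one-sided (enrichment) alternative is an assumed choice since the sidedness is not stated above, and the counts in the example are arbitrary.

```python
from math import comb


def fisher_exact_enrichment(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    a: domains in the subgraph carrying the functional label
    b: domains in the subgraph without the label
    c: domains outside the subgraph carrying the label
    d: domains outside the subgraph without the label
    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    in_subgraph = a + b
    labeled = a + c
    total = a + b + c + d
    denominator = comb(total, in_subgraph)
    upper = min(in_subgraph, labeled)
    numerator = sum(
        comb(labeled, k) * comb(total - labeled, in_subgraph - k)
        for k in range(a, upper + 1)
    )
    return numerator / denominator


# Arbitrary example table: 3 of 4 subgraph domains carry the label, versus 1 of 4 outside.
p_value = fisher_exact_enrichment(3, 1, 1, 3)
print(round(p_value, 4))
```

In practice, an equivalent test is available in standard statistics libraries; the explicit sum is shown only to make the hypergeometric tail calculation visible.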

Structure analysis

Protein structures were modeled using AlphaFold3 (265), with visualization and manipulation performed using either Mol* (266) or PyMOL. Structural similarity searches were conducted using the DaliLite (267) and Foldseek (268) programs. DaliLite was also used to generate structural alignments.

Hydrophobicity analysis

To create the membrane propensity plots, for each protein $P_i$ in a given family, we compute the average TM-propensity of its amino acids using the TM tendency scale (38). This score $H_i$ for $P_i$ is calculated as:

\[ H_i = \frac{1}{n} \sum_{j=1}^{n} h_j \]

where $h_j$ is the TM tendency of the $j$-th amino acid in the protein $P_i$, and $n$ is its total length in amino acids. The Kruskal–Wallis nonparametric test was applied to assess whether TM propensity scores differed across the 30 groups. As the Kruskal–Wallis test indicated a significant difference (p<0.05), we performed post-hoc pairwise comparisons using Dunn’s test with Bonferroni correction to control for multiple testing. Group-wise visualizations were presented using critical difference diagrams, where groups not connected by horizontal bars are significantly different (adjusted p<0.05) (Figure S3).
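The per-protein score, i.e., the mean TM tendency over all residues of a protein, can be sketched in Python as follows. The scale values below are toy placeholders chosen for illustration and are not the published TM tendency values; the function name and the example sequence are likewise hypothetical.

```python
# Toy per-residue values for illustration only; NOT the published TM tendency scale.
TOY_SCALE = {"L": 1.0, "A": 0.5, "K": -1.0, "D": -1.5}


def mean_tm_propensity(sequence, scale):
    """Average the per-residue TM tendency over the full length of the sequence."""
    if not sequence:
        raise ValueError("empty sequence")
    missing = sorted(set(sequence) - set(scale))
    if missing:
        # Refuse to silently skip residues, so that n stays the full protein length.
        raise ValueError(f"residues missing from scale: {missing}")
    return sum(scale[aa] for aa in sequence) / len(sequence)


score = mean_tm_propensity("LLAK", TOY_SCALE)
print(score)
```

Scores computed this way for every protein in each family would form the groups compared by the nonparametric tests described above.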

Acknowledgements

This research was supported by the Division of Intramural Research at the National Library of Medicine (NLM), National Institutes of Health (NIH). This research was supported in part by an appointment to the NLM Research Participation Program administered by the Oak Ridge Institute for Science and Education (ORISE) through an interagency agreement between the U.S. Department of Energy (DOE) and the NLM.

Additional files

Supplementary tables and figure