Sequence clustering reveals similarities among the 7TMs of GPCRs from choanoflagellates, metazoans, and other eukaryotes.

(A) Choanoflagellates are the closest living relatives of metazoans. Shown is a consensus phylogeny of the major lineages analyzed in this study. We use the term “close relatives of metazoans” (CRM) to denote the paraphyletic group of non-metazoan holozoans that includes choanoflagellates, filastereans, ichthyosporeans, and corallochytreans. Organism silhouettes are from PhyloPic (http://phylopic.org/). (B) Most choanoflagellate GPCRs cluster with GPCRs from metazoans and other eukaryotes. The 918 choanoflagellate GPCRs (circles) identified in this study were sorted into clusters based on sequence similarity of their 7TM domains and the 7TM domains of metazoan, amoebozoan, chlorophyte, stramenopile, and alveolate GPCRs. Connecting lines (light grey) correspond to pairwise BLAST scores of p-value <1e-6. With the exceptions of RSF, GPCR TKL/K, GPRch1, and GPRch2, all choanoflagellate GPCR clusters contained metazoan GPCRs and, in most cases, GPCRs from other eukaryotes. No choanoflagellate GPCRs clustered with metazoan Secretin GPCRs, Frizzled GPCRs, or GPR143 GPCRs. The collection of choanoflagellate GPCRs shown as open circles did not meet the statistical threshold for designation as clusters. All the GPCR sequences used in this analysis are provided in Supplementary File 6.

© 2021, Thibaut Brunet. The Syssomonas multiformis organism silhouette is available under a CC BY-SA 3.0 license. Reproduction of this icon must abide by the terms of this license.

List of 18 GPCR clusters identified in choanoflagellates.

Evolutionary history of GPCR families detected in choanoflagellates, metazoans, and other eukaryotes.

(Left) Taxonomic distribution of GPCR families. Shown in the table are the numbers of GPCR family members (columns) detected in each lineage (rows). The GPCR families are grouped based on their inferred phylogenetic distributions in choanoflagellates: pan-choanoflagellate GPCRs (light blue), GPCR families found in craspedids but not acanthoecids (orange), GPCR families found in acanthoecids but not craspedids (pink), and GPCRs found only in S. macrocollata and S. punica (dark blue). Adhesion, Glutamate, Rhodopsin, and Frizzled, four of the five GRAFS GPCR families, are indicated in bold. Asterisks indicate GPCR families that were not known to exist in choanoflagellates prior to this study. For those species in which only transcriptomes are available, gene numbers represent a minimum. nd = not detected in lineages for which only transcriptome data are available. For species with both transcriptome and genome data (S. rosetta, B. monosierra, and M. brevicollis; G enclosed within a black circle) (King et al. 2008; Fairclough et al. 2013; Hake et al. 2024), failure to detect a GPCR subfamily member is indicated with a “0”. (a) only found in nucleariids, (b) lost in vertebrates, (c) only found in chlorophytes, (d) only found in sponges, (e) only found in amoebozoans, (f) only found in Syssomonas multiformis. (Right) A consensus phylogeny shows the relationships among the 23 choanoflagellates included in this study, metazoans, filastereans, ichthyosporeans, corallochytreans, holomycotans, and diverse other eukaryotes. The inferred origins (vertical green rectangle) and subsequent losses (red cross) of GPCR families are indicated at relevant branches on the consensus phylogeny. GPCR families inferred to have originated in the Last Eukaryotic Common Ancestor (LECA) are represented at the root of the phylogeny (box). The presence of additional GPCR families in LECA, not covered in our study, is depicted by three dots.
“GRA” indicates three of the GRAFS GPCR families: Glutamate, Rhodopsin, and Adhesion. “GOST” indicates subfamilies of GOST GPCRs. Additional GPCR families are listed under “Other.” Uncertainty about the ancestry of GPRch2 is indicated with a question mark.

Evolution of Glutamate Receptor protein domain architecture.

(A) Schematic representation of the main protein domains found in the metazoan Glutamate Receptor family. GluR/T1R/CaSR receptors have a conserved extracellular module containing an ANF ligand-binding domain fused to a cysteine-rich NCD3G domain. GABABR GPCRs contain only the ANF ligand-binding domain. mGlyR/GPR179 GPCRs contain a Cache ligand-binding domain fused to an EGF-like domain. (B) Phylogenetic analysis of the 7TM domains of metazoan, choanoflagellate, filasterean, ichthyosporean, and amoebozoan Glutamate Receptors yielded three well-supported clades of GPCRs corresponding to the receptors depicted in panel A. Filasterean and ichthyosporean sequences (orange) either clustered as the sister group to metazoan mGluR/CaSR/T1R, mGlyR/GPR179, and GABAB GPCRs or branched separately from the rest of the GPCRs used in this analysis. Choanoflagellate sequences (light blue) branched as the sister group to the metazoan mGluR/CaSR/T1R GPCRs. No amoebozoan sequence (pink) clustered with metazoan or CRM sequences. UFboot support values are indicated for each of the three major nodes of this unrooted maximum-likelihood phylogenetic tree. All the sequences used to build this phylogeny are listed in Supplementary Files 8 and 9. (C) Phylogenetic distribution of ECDs detected in Glutamate Receptors from diverse opisthokonts. While the ANF and EGF protein domains evolved before the diversification of opisthokonts, the NCD3G and Cache protein domains were not detected in the GPCRs of any non-metazoan. Similarly, the ANF/NCD3G and Cache/EGF protein domain modules were only detected in metazoans. Glutamate Receptors from non-metazoans contain diverse extracellular protein domains that are not found in metazoan GPCRs. Protein domains (left) or modules containing pairs of protein domains (right) are indicated in the columns. Taxonomic distribution is indicated in the rows. Protein domains found in metazoan Glutamate Receptors – ANF, EGF, NCD3G, and Cache – are indicated in bold.
See Supplementary File 10 for a list of all Glutamate Receptors screened in this analysis.

Sequence and structural similarities and differences of choanoflagellate Rhodopsins and metazoan opsins.

(A) Rhodopsin diversity in metazoans; adapted from (Fredriksson et al. 2003; Cardoso et al. 2012; Lv et al. 2016). Rhodopsins cluster into four main groups – α (light blue), β (green), γ (yellow), and δ (maroon). The Somatostatin/Opioid/Galanin group of Rhodopsins is indicated as SOG. Opsins and SOG receptors are indicated in bold for reference. (B) Choanoflagellate Rhodopsins cluster with opsins and SOG receptors from metazoans. All-against-all pairwise comparison of the choanoflagellate Rhodopsins with the complete Rhodopsin repertoire of representative metazoans revealed that choanoflagellate Rhodopsins are most similar to opsins and SOG receptors. Shown is the local sequence similarity network, with nodes indicating Rhodopsin sequences and lines indicating BLAST connections of p-value < 1e-12. Choanoflagellate Rhodopsins are shown in grey (S.macrocollata_m.143379 and S.punica_m.44256), metazoan opsins in blue (Placopsin 1_T.adherens (A0A369S8C3), Placopsin 2_Trichoplax sp (XP_002114592.1), Tmt opsin a_D.rerio (A0A2R8Q4C0), Tmt opsin b 1_M.mola (ENSMMOP00000004804.1), Tmt opsin b 2_D.rerio (A0A2R8Q4C0), Opsin_3 A.platyrhynchos (XP_012958902.3), Arthropsin 8_D.pulex (EFX84032.1), Go Opsin_B.floridae (DAC74052.1)), and metazoan SOG receptors in orange (Somatostatin receptor_B.floridae (XP_032829507.1), Galanin 1_H.sapiens (P47211)). The complete dataset of 6149 Rhodopsins used in this analysis is provided in FASTA format in Supplementary File 11. See Supplementary File 12 for the full analysis. (C and D) Protein structure predictions link choanoflagellate Rhodopsins to metazoan opsins. (C) The predicted structure of S. macrocollata Rhodopsin most closely matches that of opn4a/Melanopsin-A from Danio rerio. Shown is the predicted structural similarity of S. macrocollata Rhodopsin (m.143379; pink) with Foldseek top hit (E-value: 6.26e-13) opn4a/Melanopsin-A from Danio rerio (AF-Q2KNE5-F1-model_v4; blue). Low confidence regions (<70 pLDDT) were removed for clarity.
Shown are views of the superimposed models from the plane of the membrane (top) and from the extracellular perspective (bottom). (D) The predicted structure of S. punica Rhodopsin most closely matches that of opn4/Melanopsin from humans. Predicted structural similarity of S. punica Rhodopsin (m.44256; pink) with Foldseek top hit (E-value: 8.34e-13) opn4/Melanopsin from humans (AF-Q9UHM6-F1-model_v4; blue). Low confidence regions (<70 pLDDT) were removed for clarity. Shown are views of the superimposed models from the plane of the membrane (top) and from the extracellular perspective (bottom). (E) Alignment showing the conservation of functionally important motifs in metazoan, choanoflagellate, ichthyosporean, and holomycotan Rhodopsins. The alignment includes diverse metazoan opsins, three consensus sequences of human non-opsin Rhodopsins (Aminergic, Peptide, and Lipid Rhodopsins; Supplementary File 14), the two choanoflagellate Rhodopsins identified in this study (highlighted in bold), one ichthyosporean Rhodopsin, and three representative Rhodopsins from holomycotans. Residues identified as being critical for Rhodopsin protein structure and function are shown. These include: a conserved Aspartic acid (D) at position 83 in transmembrane helix 2 (TM2); two conserved Cysteines (C; orange) at positions 110 and 187 that are involved in disulfide bond formation; and the conserved E/DRY, CWxPY, and NpxxY(x)5,6FR motifs (green), located in TM3, TM6, and TM7/H8, respectively (Davies et al. 2010; Nagata and Inoue 2021). These three motifs are essential for G protein interaction and for controlling Rhodopsin activity. In addition, residues that are specific to opsins are also depicted: Glutamic acid (E)181, which acts as a counterion to the protonated Schiff base (Davies et al. 2010; Hankins et al. 2014; Nagata and Inoue 2021), Serine (S)186 in extracellular loop 2 (EL2), and the highly conserved Lysine (K) at position 296 (blue) in TM7, which is almost universally found across metazoan opsins (Gühmann et al. 2022; McCulloch et al. 2023). Lys(K)296 is required for covalent binding to the 11-cis retinal chromophore (Devine et al. 2013). Notably, the two choanoflagellate Rhodopsins carry Lys296Ser (Salpingoeca_macrocollata_m.143379) and Lys296Val (Salpingoeca_punica_m.44256) substitutions, respectively, suggesting that these Rhodopsins may not have light-responsive functions. Canonically conserved functional residues and positions follow bovine Rhodopsin numbering (Nathans and Hogness 1983). The consensus sequences of the three non-opsin subfamilies of human Rhodopsins (Aminergic R, Peptide R, and Lipid R) were downloaded from GPCRdb (https://gpcrdb.org/). A (+) symbol is used when the consensus sequence cannot be resolved at a given position (ambiguity). The 36 human Aminergic receptors, 76 human Peptide receptors, and 36 human Lipid receptors aligned to build these consensus sequences are provided in Supplementary File 14.

Evolution of aGPCR protein domain architecture.

(A) Although aGPCRs were present in LECA (Fig. 2), phylogenetic analysis of holozoan aGPCRs revealed that they diversified independently in metazoans and CRMs. Most metazoan (blue), choanoflagellate (orange), and filasterean (red) aGPCRs formed distinct clades in our analysis, suggesting an absence of orthologous relationships among the 7TM regions of the aGPCRs from these three clades. A notable exception was the metazoan ADhesion G protein-coupled Receptor V (ADGRV) GPCRs, which grouped with choanoflagellate 7TM sequences (dotted grey box; 83% bootstrap support for ancestral node, magenta circle), suggesting they are orthologous. In addition, we observed that members of the metazoan ADhesion G protein-coupled Receptor A (ADGRA) subfamily (asterisks) tended to group with choanoflagellate aGPCRs, but this grouping either lacked reliable support values or was not systematically recovered in all the inferred phylogenies. A subset of filasterean aGPCRs clustered with a set of uncharacterized cnidarian and cephalochordate receptors (bootstrap support 98%, green circle). This maximum-likelihood phylogenetic tree infers the evolutionary history of the 7TM domain of 329 choanoflagellate aGPCRs identified in this study, along with 76 filasterean aGPCRs and 253 metazoan aGPCRs. The 7TM sequence of a ciliate aGPCR (Stentor_coeruleus_OMJ80129.1_19670) was also included in the analysis and used as an outgroup to root the tree. The width of branches scales with UFboot support for the ancestral node. Branch lengths do not scale with evolutionary distance in this rendering. All the 7TM sequences used in this phylogenetic reconstruction are found in Supplementary File 17, and the fully annotated version of this phylogenetic tree, including bootstrap values, branch lengths, and all species names, is found in Supplementary File 18. (B) Protein domain architecture of metazoan aGPCRs.
Like all GPCRs, aGPCRs contain a 7TM domain (represented by seven barrels) that anchors the protein in a lipid bilayer. What separates aGPCRs from other GPCRs is their possession of a diagnostic autoproteolytic GAIN domain (dotted oval) containing a proteolysis site (GPS, blue) and a hydrophobic tethered agonist peptide (TA, orange). The cleavage site in the GPS represents the boundary between the N-terminal fragment (NTF) and the C-terminal fragment (CTF). Many, but not all, aGPCRs also contain an HRM domain (salmon) within a few hundred amino acids of the GPS (Fig. S10C) (Prömel et al. 2012b; Araç and Sträter 2016). In metazoans, the NTF often contains a diversity of additional extracellular protein domains (ECDs, represented by grey star, hexagon, and square) that likely contribute to the diversity of ligands bound by aGPCRs (Araç and Leon 2019; Knierim et al. 2019). (C) Diversity and conservation of protein domains in the NTFs of different choanoflagellate aGPCR families. Like metazoan aGPCRs, nearly all choanoflagellate aGPCRs contain a GAIN domain with conserved GPS and TA motifs (Fig. S9). Many aGPCR subfamilies identified through their 7TM were also distinguishable through their characteristic combinations of N-terminal protein domains (Fig. S7A). Shown here are six subfamilies – I, V, VI, VIII, XIII, and XV – whose members frequently share one or more protein domains. While subfamilies I, V, VI, XIII, and XV are seemingly unique to choanoflagellates, subfamily VIII is notable because the full-length proteins (both the NTFs and CTFs) are homologous to those of the ADGRV subfamily in metazoans (Fig. 5A and Fig. S8). Percentages indicate the proportion of members within a given subfamily possessing a conserved protein domain (subfamilies I, V, VI, VIII, and XV) or a conserved combination of protein domains (subfamily XIII).
All the choanoflagellate aGPCR NTF sequences used in these analyses are found in Supplementary File 19. (D) Hypothesized evolution of the HRM/GAIN/7TM module and additional ECDs in aGPCRs. For four nodes on the eukaryotic tree of life – (1) the last common ancestor of filastereans and choanozoans, (2) stem choanozoans, (3a) stem metazoans, and (3b) stem choanoflagellates – we reconstructed the phylogenetic distribution of diverse protein domains in the NTFs of aGPCRs. Most aGPCRs in metazoans, choanoflagellates, and filastereans contain a protein domain module composed of a GAIN domain and a 7TM. Additionally, the HRM/GAIN/7TM module, which is less common than the GAIN/7TM module in aGPCRs, is nonetheless conserved across most metazoan, choanoflagellate, and filasterean lineages. The linkage of the GAIN and 7TM domains likely occurred in stem holozoans: today the linked domains are found only in the aGPCRs of extant holozoans, whereas each domain is encoded in other genes in diverse non-holozoans (Araç et al. 2012; Krishnan et al. 2012; De Mendoza et al. 2014), suggesting that the individual domains originated in earlier branches of the eukaryotic tree. The linkage of the HRM domain with the GAIN/7TM module likely occurred in the last common ancestor of filastereans and choanozoans. The HRM domain is largely restricted to holozoans, although we found six non-aGPCR proteins containing HRM domains in the proteome of the nucleariid Fonticula alba (Fig. S10D). Therefore, one of three scenarios applies: HRMs evolved in stem holozoans and were acquired by nucleariids through horizontal gene transfer; HRMs are homologous in the two lineages and were lost from most holomycotans; or the similarities between HRMs in F. alba and holozoans are the result of convergent evolution. The repertoires of ECDs inferred for each node – (1), (2), (3a), and (3b) – are shown in grey boxes labeled accordingly. Finally, ADGRV likely evolved in the last common ancestor of filastereans and choanozoans.

Taxon sampling and data sources for genomes and transcriptomes analyzed in this study

Pipeline for identifying GPCRs in choanoflagellate genomes and transcriptomes.

(1) We searched for GPCR candidates in the genome- and transcriptome-derived proteomes of 23 choanoflagellate species using hidden Markov models (HMMs) specific to 54 GPCR families in eukaryotes (GPCR_A Pfam clan CL0192) or a global HMM profile that matches the common topology of GPCRs irrespective of their families (GPCRHMM). The comparison of the two sets of candidate GPCRs obtained through these two independent approaches revealed 381 duplicate candidates, which were merged, leaving a final set of 1784 candidates. (2) The 1784 candidate choanoflagellate GPCRs were then subjected to a validation step in which false positives and highly truncated sequences were filtered out using the search tools BLASTp, InterProScan, CDvist, and the transmembrane domain predictor TMHMM (see Methods). AlphaFold-predicted structures of the GPCR candidates and subsequent structure homology searches complemented the analysis. Finally, we used CD-HIT to discard isoforms. A total of 1113 sequences were removed from our dataset, leaving 671 validated choanoflagellate GPCRs. (3a, 3c) To assess the diversity of choanoflagellate GPCRs, we then clustered the 671 GPCRs based on all-against-all pairwise sequence similarities using the clustering tool CLANS. Most GPCRs (629 receptors) were sorted into 18 clusters, with a minority of them (76 receptors) sharing no sequence similarities with any other GPCRs in the dataset. The same protocol (3a) was repeated in (3c), after refining the choanoflagellate GPCR dataset (3b) and adding diverse metazoan, amoebozoan, stramenopile, alveolate, and chlorophyte GPCRs to the analysis. (3b) Because the HMMs used in our first round of screening were mostly based on seed alignments of metazoan sequences, we built choanoflagellate cluster-specific GPCR HMMs for each of the 18 choanoflagellate GPCR clusters previously identified (3a) to detect possible additional choanoflagellate GPCRs (Supplementary File 2).
We also built individual HMMs for the GPCRs that did not fall into the 18 clusters. These HMMs were used to search the 23 choanoflagellate proteomes again. In parallel, we also selected choanoflagellate GPCR sequences from each GPCR cluster or single GPCR sequences in the case of GPCRs that did not group with other GPCRs, to use as BLAST queries against the 23 choanoflagellate proteomes. The resulting candidates were run through the filtration process in step (2), leaving 247 new GPCRs that were not predicted with the metazoan-biased HMMs. Added to the original 671, this gave a total of 918 GPCRs from the 23 choanoflagellate species (Supplementary File 1). (4) Phylogenetic trees were inferred for various choanoflagellate GPCR families, including aGPCRs and Glutamate Receptors, using IQ-TREE and MrBayes.
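
The bookkeeping behind steps (1) to (3b) can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: the function names and toy identifiers are hypothetical, and only the counts 1784, 1113, and 247 are taken from the legend.

```python
def merge_candidates(family_hmm_hits, global_hmm_hits):
    """Merge the candidate IDs returned by two independent searches.

    Returns the merged, non-redundant set and the number of IDs
    that both searches recovered (the "duplicates").
    """
    family = set(family_hmm_hits)
    global_ = set(global_hmm_hits)
    duplicates = family & global_
    merged = family | global_
    return merged, len(duplicates)


def tally(candidates, removed, rescued):
    """Return (validated, final) counts after filtering and the second search round."""
    validated = candidates - removed   # step (2): validation and isoform removal
    final = validated + rescued        # step (3b): GPCRs found with the new HMMs
    return validated, final


# Toy IDs (hypothetical) showing the merge in step (1).
merged, n_duplicates = merge_candidates(["g1", "g2", "g3"], ["g3", "g4"])
print(sorted(merged), n_duplicates)

# Counts quoted in the legend.
print(tally(candidates=1784, removed=1113, rescued=247))  # (671, 918)
```

Treating the merge as a set union means a candidate recovered by both searches is counted once, which is the behavior the legend describes for the 381 duplicates.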

Total number of GPCRs across choanoflagellates varies by species and seems to correlate with phylogenetic affiliation.

GPCR numbers tend to be higher in acanthoecids than in craspedids. Shown are GPCR numbers per species, mapped to a consensus phylogeny adapted from (Carr et al. 2017b; Ginés-Rivas and Carr 2025b). Choanoflagellate GPCR numbers range from 16 in B. monosierra to 122 in A. spectabilis. These are approximate numbers, as the transcriptomes may be incomplete and/or the predicted proteomes may include splice variants.

Conservation of metazoan GPCR transducers in choanoflagellates.

(A) Schematic representation of the core GPCR signaling pathway and associated regulators. Upon binding to a ligand, GPCRs activate downstream heterotrimeric G proteins (blue) that dissociate, giving free Gα and Gβγ subunits that transduce the signal independently from each other (grey arrows). Additional regulators ensure the fine-tuning of GPCR signaling by either promoting G protein signaling (green) or inhibiting it (red) (see Supplementary text for more information). While heterotrimeric G proteins are the canonical signal transducers downstream of GPCR activation, Arrestins act as important signaling hubs (Gurevich and Gurevich 2019). (B) Phylogenetic distribution of heterotrimeric G proteins (Gα, Gβ, and Gγ; blue), negative regulators of G protein signaling (Arrestin, GRK, RGS, and GoLoco; red) and positive regulators of G protein signaling (Ric8 and Phosducin; green) within choanoflagellates. The number of proteins recovered in our analysis that correspond to each of these different categories is specified in the matrix for each choanoflagellate species. The GPCR signaling pathway is conserved in most choanoflagellate species. nd = not detected in lineages for which only transcriptome data are available. For species with both transcriptome and genome data (S. rosetta, B. monosierra, and M. brevicollis) (King et al. 2008; Fairclough et al. 2013; Hake et al. 2024), failure to detect a GPCR subfamily member is indicated with a “0”. All the choanoflagellate sequences identified in each of these categories are provided in Supplementary File 3. (C) Schematic representation of major pathways activated by different Gα subunits. Gα proteins can be subdivided into five main families (Gαs, Gαi/o, Gαq/11, Gα12/13, and Gαv) with different signaling properties (Marinissen and Gutkind 2001; Oka and Korsching 2009; Feng et al. 2022; Liu et al. 2024). 
Gαs proteins activate Adenylyl cyclase, resulting in an accumulation of intracellular cAMP and activation of protein kinase A (PKA). In contrast, activated Gαi/o inhibits Adenylyl cyclase. Gαi/o proteins also stimulate phospholipase C-β (PLCβ), which generates diacylglycerol (DAG) and inositol trisphosphate (IP3), eventually regulating Ca2+ signaling and protein kinase C (PKC) activity; this function is shared with Gαq/11 subunits. Gα12/13 controls actomyosin contractility through the activation of the Rho/Rho kinase signaling pathway. Little is known about signal transduction downstream of Gαv, which could control ion homeostasis (Abu Obaid et al. 2024). (D) Phylogenetic distribution of the five Gα subunit families within choanoflagellates. While Gαq/11 and Gαv are shared by all 23 choanoflagellate species, Gαi/o and Gα12/13 were not detected in our dataset. We identified Gαs subunits in a monophyletic clade encompassing S. kvevrii, S. urceolata, S. macrocollata, and S. punica. Failure to detect Gαs in S. helianthica and M. fluctuans could be due to false negatives or a secondary loss in the ancestor of these two sister species. Gα proteins that were not assigned to any of these five families were counted as unclassified. The assignment of choanoflagellate Gα proteins to one of the five Gα subunit families was based on a maximum-likelihood phylogenetic tree inferred from the Gα subunits of choanoflagellates together with known metazoan Gα subunits (Supplementary Files 4 and 5).

aGPCRs dominate the GPCRome of most choanoflagellates analyzed.

Donut charts show the number and proportion of genes in the main GPCR families – Rhodopsin, Adhesion, Glutamate, cAMP, GOST (including TMEM87, GPR107/108, TMEM145, GPR180, and Hi-GOLD), GPR137, GPR155, GPRch3, RSF, GPCR TKL/K and Other GPCRs – for each of the 23 choanoflagellate species analyzed. Adhesion receptors represent, on average, 47% of the GPCRs encoded in these choanoflagellate species. The presence of a large repertoire of GPCR TKL/Ks in acanthoecids contributes, in part, to the larger size of their GPCRomes. The total number of GPCRs detected is shown in the center.

Ichthyosporean Rhodopsin shares structural similarities with metazoan peptide receptor Rhodopsin.

Predicted structural homology of model Pirum_gemmata_evm1s17431 (green) with Foldseek top hit (E-value: 4.92e-9) Pyroglutamylated RFamide peptide receptor (QRFPR)_Piaya_cayana (AF-A0A850WKN5; pink). A cytoplasmic helix 8 (H8), found in most metazoan Rhodopsins, is also predicted in the ichthyosporean Rhodopsin. Low confidence regions (<70 pLDDT) were removed for clarity in both models. View of the superimposed models from the plane of the membrane (left and center) and the top (right).

Diversity of aGPCR subfamilies in choanoflagellates.

(A) Phylogenetic analyses of the 7TMs of choanoflagellate aGPCRs reveal the presence of at least 19 subfamilies, labeled I-XIX. Maximum-likelihood inference (left tree) and Bayesian inference (right tree) include full 7TM sequences of 229 choanoflagellate aGPCRs along with a 7TM sequence of a ciliate aGPCR (Stentor_coeruleus_OMJ80129.1_SteCoe_19670), used as an outgroup to root the trees. The 7TM sequences were aligned with MAFFT V7.463 using the E-INS-I algorithm, and phylogenies were built using either IQ-TREE or MrBayes v3.2.7a (see Methods). The two phylogenies cross-verified the grouping of most aGPCRs into the same subfamilies (depicted with the same color code in the two trees; see legend). The evolutionary relatedness between most aGPCR subfamilies could not be determined with the Bayesian approach (polytomy). The width of branches scales with UFboot support (left tree) or Bootstrap support (right tree) for the ancestral node. Branch lengths do not scale with evolutionary distance in this rendering. Clades that were poorly supported (< 70% Bootstrap support) or composed of sequences from a single choanoflagellate species were collapsed in this rendering (grey triangles). All the sequences used to build the trees are listed in Supplementary File 15. The fully annotated versions of the two phylogenies, including bootstrap values, branch lengths, and all species names, are found in Supplementary File 16. (B) aGPCR subfamilies differ in their phylogenetic distribution across choanoflagellate diversity. Subfamilies VIII, XIII, XI, IX, XVII, VII, and X are detected in diverse craspedid and acanthoecid choanoflagellates and were therefore probably present in stem choanoflagellates. Notably, subfamily VIII is the most widespread of all choanoflagellate aGPCR subfamilies and is conserved in filastereans and metazoans (Ansel et al. 2024). In contrast, subfamilies III, XV, XIX, II, I, XVIII, and XVI are restricted to craspedids.
While subfamily III is broadly conserved across the diversity of craspedids, subfamilies XV, XIX, II, I, XVIII, and XVI are confined to distinct subsets of craspedids. Finally, subfamilies IV, V, VI, XII, and XIV were only observed in acanthoecids and, therefore, might be acanthoecid-specific. Asterisks (*) indicate aGPCR subfamilies that were recovered in only one of the two phylogenetic reconstructions (see (A)).

Diversity and evolution of aGPCR extracellular protein domains in choanoflagellates and other opisthokonts.

(A) Subfamilies of choanoflagellate aGPCRs exhibit distinct N-terminal protein domain signatures. The matrix summarizes the systematic identification of the protein domains (rows) present in the extracellular region of the members of the 19 previously established choanoflagellate aGPCR subfamilies (columns) (Fig. S6). Percentages represent the proportion of domain-containing genes per aGPCR subfamily. For example, 52% of aGPCRs from subfamily I contain a Hyalin Repeat (HYR) domain in their N-termini, whereas this domain is mostly absent or underrepresented in the other aGPCR subfamilies. The GAIN domain, present in all aGPCRs, is not shown in this matrix. All the aGPCR NTFs analyzed are found in Supplementary File 19. (B) Phylogenetic distribution of protein domains in the N-termini of aGPCRs in opisthokonts. While non-holozoan aGPCRs do not possess conserved extracellular protein domains, most holozoan lineages, with the possible exception of ichthyosporeans, encode additional domains in their N-termini. Clear recruitment of ECDs is observed in the NTFs of filasterean, choanoflagellate, and metazoan aGPCRs, with domains being conserved either in most of these lineages (light grey), in a subset of these lineages (e.g., black for choanozoan-specific domains), or being lineage-specific (dark grey).

Independent diversification of most aGPCR NTFs in metazoans and CRMs.

Pairwise similarity-based clustering analysis of the N-terminal fragments (NTFs) of 1074 metazoan (blue), 337 choanoflagellate (orange), and 81 filasterean/ichthyosporean aGPCRs. The NTFs of most metazoan aGPCRs cluster separately from choanoflagellate or other holozoan NTFs. A notable exception is observed for metazoan NTFs from the ADGRV family, which group with choanoflagellate and filasterean NTF sequences (delineated with a dotted oval). The choanoflagellate NTFs that cluster with metazoan ADGRV NTFs belong to the aGPCR subfamily VIII, which we found to be orthologous to metazoan ADGRV based on their 7TM domain (Fig. 5A). Edges correspond to BLAST connections of P value <1e-20. All the NTF sequences used in the clustering analysis are available in Supplementary File 19.
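
To illustrate how a P-value cutoff decides which sequences end up connected, the sketch below groups sequences into connected components, keeping only edges below the threshold. This is a deliberate simplification and not the clustering method used for the figure; the sequence names and P values are invented for the example.

```python
def cluster_by_threshold(node_ids, scored_pairs, max_pvalue):
    """Group nodes into connected components using only edges with P value < max_pvalue.

    scored_pairs is an iterable of (node_a, node_b, pvalue) tuples.
    Returns a sorted list of sorted clusters (singletons included).
    """
    parent = {node: node for node in node_ids}

    def find(node):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for node_a, node_b, pvalue in scored_pairs:
        if pvalue < max_pvalue:
            parent[find(node_a)] = find(node_b)

    clusters = {}
    for node in node_ids:
        clusters.setdefault(find(node), []).append(node)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical NTF names and P values, for illustration only.
nodes = ["met_ADGRV", "cho_VIII_a", "cho_VIII_b", "fil_1"]
pairs = [
    ("met_ADGRV", "cho_VIII_a", 1e-30),
    ("cho_VIII_a", "cho_VIII_b", 1e-25),
    ("cho_VIII_b", "fil_1", 1e-10),  # above the cutoff, so no edge is drawn
]
print(cluster_by_threshold(nodes, pairs, max_pvalue=1e-20))
```

With a cutoff of 1e-20, the last pair is not connected, so `fil_1` is reported as a singleton.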

Conservation of the GAIN domain in metazoans, choanoflagellates, filastereans, and corallochytreans.

(A) Schematic of the 7TM-proximal GAIN domain, which extends into the extramembrane milieu. Highlighted are key tryptophans (orange Ws) and cysteines (green Cs) flanking the proteolytic GPS motif (red, cleavage site indicated with an arrow) and the hydrophobic tethered agonist peptide (TA, blue). (B) Sequence logos of the metazoan, choanoflagellate, filasterean, and corallochytrean C-terminal GAIN domains. The two canonical cysteines and tryptophans, important for the proper folding of the GAIN domain (Araç et al. 2012), are conserved in choanoflagellates, filastereans, and corallochytreans. Similarly, the autoproteolytic GPS motif and most of the TA consensus sequence, both required for the activation of aGPCRs, are conserved in all these clades (Prömel et al. 2012b; Barros-Álvarez et al. 2022). An alignment of the complete repertoire of aGPCRs (30 sequences) from Mus musculus was used to generate the metazoan logo; 301 aGPCR sequences were used for the choanoflagellate logo; and 89 aGPCR sequences were used for the filasterean logo. Because we recovered only one aGPCR sequence with a GAIN domain in corallochytreans, their logo has no real statistical value and is provided as a qualitative point of comparison. All the sequences used to create the logos are found in Supplementary File 20.

Pre-metazoan origin of the HRM domain and HRM-GAIN-7TM module.

(A) Evolution of HRM-containing aGPCRs and Secretin GPCRs in Holozoa. Some aGPCRs and all Secretin GPCRs possess a Hormone Receptor Motif (HRM) domain (blue oval) in their extracellular region; in aGPCRs, the HRM domain is always found in combination with a GAIN domain (yellow circle), whereas the GAIN domain is absent from Secretin receptors. Secretin GPCRs likely evolved from aGPCRs (Nordström et al. 2009b; Scholz et al. 2019) and are only found in cnidarians (this study) and in bilaterians (Cardoso et al. 2024) (right). In contrast, HRM-containing aGPCRs are more ancient and were detected in metazoans, choanoflagellates, and filastereans (left). (B) Representative illustration of the extracellular region of HRM-containing aGPCRs. The GAIN domain (light grey) sits on top of the 7TM. The HRM domain (dark grey rectangle) is always positioned N-terminal to the GAIN domain. The distance separating the C-terminal end of the HRM domain from the start of the 7TM (HRM-7TM distance) is depicted with a double-headed arrow and is assessed in panel (C). Diverse additional protein domains (ECDs), distal to the HRM/GAIN/7TM module, are often found in HRM-containing aGPCRs and are not depicted here. (C) Conserved HRM-7TM distance in filasterean, choanoflagellate, and human HRM-containing aGPCRs. We measured an average HRM-7TM distance of 282 aa, 335 aa, and 326 aa in HRM-containing filasterean, choanoflagellate, and human aGPCRs, respectively. In contrast, human Secretin GPCRs show an average HRM-7TM distance of 14 aa because these receptors lack a GAIN domain. Four filasterean, 22 choanoflagellate, and 12 human HRM-containing aGPCRs, along with 15 human Secretin GPCRs, were assessed in this analysis. (D) Phylogenetic distribution of HRM and HRM/7TM module across eukaryotes. HRM is found in Holozoa and in the nucleariid Fonticula alba, one of the closest relatives of Fungi (Galindo et al. 2019).
In contrast, the association of an HRM domain with 7TM (HRM/7TM) is only observed in holozoans – metazoans, choanoflagellates, and filastereans. The five HRM-containing proteins identified in Fonticula alba are provided in Supplementary File 21.
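
As a rough illustration of the measurement summarized in panel (C), the sketch below computes an HRM-7TM distance per receptor and averages it over a group. The coordinate convention (1-based residue positions, counting the residues strictly between the last HRM residue and the first 7TM residue) and the domain boundaries in the example are assumptions made for illustration; they are not taken from the study's data, and the function names are hypothetical.

```python
from statistics import mean


def hrm_7tm_distance(hrm_end: int, tm_start: int) -> int:
    """Residues strictly between the HRM C-terminal end and the 7TM start.

    Assumes 1-based residue coordinates (an illustrative convention).
    """
    if tm_start <= hrm_end:
        raise ValueError("the 7TM must start after the HRM domain ends")
    return tm_start - hrm_end - 1


def mean_distance(boundaries):
    """Average HRM-7TM distance over (hrm_end, tm_start) pairs."""
    return mean(hrm_7tm_distance(h, t) for h, t in boundaries)


# Hypothetical domain boundaries for two receptors (not real annotations).
example_boundaries = [(120, 440), (95, 390)]
print(mean_distance(example_boundaries))
```

A short linker, as described for Secretin GPCRs, would simply yield a small value from the same function.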

GPR157 is an ancient GPCR family conserved in eukaryotes.

(A) Phylogenetic distribution of GPR157 across eukaryotes. GPR157 likely forms an ancient GPCR family in eukaryotes, with homologs detected in Holozoa, Holomycota (only present in nucleariids), Amoebozoa, Apusozoa, Rhizaria, Discoba, and Cryptophyceae. Homologs were identified by BLASTing the murine GPR157 sequence (Q8C206) against the entire dataset (993 species) of the EUKPROT v3 BLAST server (E-value: 1e-5) (https://evocellbio.com/eukprot/; Richter et al. 2022). We defined bona fide GPR157 hits as those with at least 70% query coverage and at least 30% sequence identity. All EUKPROT BLAST hit sequences are listed in Supplementary File 22. (B) Predicted structural homology of the 7TM region of the H.sapiens_GPR157_Q5UAW9 (orange) and C.hollandica_m.352146 (blue) receptors. An additional helix 8 (H8) is predicted in both metazoan and choanoflagellate GPR157. All structural models shown here have a confidence score >70 pLDDT. View of the superimposed models from the plane of the membrane (left) and the top (right). (C) Multiple sequence alignment of the 7TM/H8 domain of GPR157 from various eukaryotes reveals motif conservation, including a “LLxxLSL/V” motif in transmembrane helix 2 (TM2) and a “WCWI/V” motif in extracellular loop 2 (ECL2) (dotted boxes).
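
The hit-selection criteria stated in panel (A) can be sketched as a simple filter. This is a minimal illustration, assuming that query coverage is computed from a single alignment span and that the three thresholds (E-value, coverage, identity) are applied together; the record layout, the query length, and the example hits are hypothetical and do not reproduce the study's pipeline.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    subject_id: str
    pct_identity: float  # percent identity over the aligned region
    q_start: int         # 1-based alignment start on the query
    q_end: int           # 1-based alignment end on the query
    evalue: float


def query_coverage(hit: BlastHit, query_len: int) -> float:
    """Percent of the query covered by this alignment (single-span assumption)."""
    return 100.0 * (hit.q_end - hit.q_start + 1) / query_len


def is_bona_fide(hit: BlastHit, query_len: int,
                 min_cov: float = 70.0, min_ident: float = 30.0,
                 max_evalue: float = 1e-5) -> bool:
    """Apply the E-value, query-coverage, and identity thresholds together."""
    return (hit.evalue <= max_evalue
            and query_coverage(hit, query_len) >= min_cov
            and hit.pct_identity >= min_ident)


# Hypothetical query length and hits, for illustration only.
QUERY_LEN = 330
hits = [
    BlastHit("hypothetical_hit_1", 45.0, 10, 309, 1e-40),   # passes all filters
    BlastHit("hypothetical_hit_2", 52.0, 50, 149, 1e-12),   # coverage too low
    BlastHit("hypothetical_hit_3", 25.0, 5, 320, 1e-8),     # identity too low
]
kept = [h.subject_id for h in hits if is_bona_fide(h, QUERY_LEN)]
print(kept)
```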

Predicted structural similarities among metazoan and choanoflagellate GOST family receptors.

The experimentally determined human TMEM87A structure (PDB: 8CTJ; Hoel et al. 2022) and AlphaFold3 (AF3)-predicted structures of representative proteins from the five GOST subfamilies detected in choanoflagellates (GPR107/108, GPR180, TMEM87A, TMEM145, and the newly identified Hi-GOLD) are shown as cartoons, viewed from the plane of the membrane. Low-confidence (<70 pLDDT) regions of the predicted structures have been removed. Both metazoan and choanoflagellate GOST GPCRs exhibit a GOLD domain (orange) on top of a 7TM domain (blue).
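
Removing low-confidence regions from a predicted model, as mentioned above, can be sketched as follows. This minimal example assumes a PDB-format file in which the per-residue pLDDT is stored in the B-factor column, as is the convention for AlphaFold PDB output; AF3 natively writes mmCIF, so a format conversion is assumed here. The function names are hypothetical and this is not the procedure used to prepare the figure.

```python
def residue_plddt(pdb_lines):
    """Map (chain ID, residue number) -> pLDDT read from CA-atom B-factors."""
    scores = {}
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            key = (line[21], int(line[22:26]))
            scores[key] = float(line[60:66])  # B-factor field, columns 61-66
    return scores


def drop_low_confidence(pdb_lines, threshold=70.0):
    """Return the lines with ATOM records of residues below `threshold` removed.

    Residues without a CA atom are treated as low confidence (an assumption).
    """
    scores = residue_plddt(pdb_lines)
    kept = []
    for line in pdb_lines:
        if line.startswith("ATOM"):
            key = (line[21], int(line[22:26]))
            if scores.get(key, 0.0) < threshold:
                continue  # skip atoms of low-confidence residues
        kept.append(line)
    return kept
```

To color rather than remove such regions (as in legends that depict them in white), the same `residue_plddt` mapping could be passed to a molecular viewer instead of filtering the file.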

Features of choanoflagellate GPCR PIPKs.

(A) A schematic depiction of a GPCR PIPK. GPCR PIPKs present a canonical 7TM domain combined with a phosphatidylinositol phosphate kinase (PIPK; pink hexagon) at the C-terminus (Van Den Hoogen et al. 2018; van den Hoogen and Govers 2018). While experimental evidence is lacking, the PIPK domain of GPCR PIPK receptors is likely to signal via the production of phospholipid-based second messengers (e.g., PIP2). (B) AlphaFold-predicted structure of a choanoflagellate GPCR PIPK (S.urceolata_m_147488), with the 7TM and PIPK domains depicted in green and pink, respectively. Regions with a low prediction score (<70 pLDDT) are depicted in white. (C) Multiple sequence alignment showing the conservation in choanoflagellates of the diagnostic GPCR PIPK motif “LR(x)9GI” in the linker region separating the 7TM domain from the PIPK domain.
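
One way to scan sequences for the linker motif named in panel (C) is a regular expression. The sketch below assumes that “LR(x)9GI” reads as L, R, any nine residues, then G and I; the helper name and the test sequence are hypothetical, and the pattern does not account for alignment gaps or degenerate positions.

```python
import re

# "LR(x)9GI" read as: L, R, nine arbitrary residues, G, I (assumed interpretation).
PIPK_LINKER_MOTIF = re.compile(r"LR[A-Z]{9}GI")


def find_linker_motif(sequence: str):
    """Return (1-based start position, matched segment) for each motif match."""
    return [(m.start() + 1, m.group())
            for m in PIPK_LINKER_MOTIF.finditer(sequence.upper())]


# Synthetic sequence built for illustration only (not a real protein).
synthetic = "MKT" + "LR" + "A" * 9 + "GI" + "PQ"
print(find_linker_motif(synthetic))
```

Other motifs in these legends could be expressed the same way, with a character class for alternative residues (for example, `[LV]` for a position written as “L/V”).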

Structural features of GPRch3 and GPRch1 GPCRs.

(A) Predicted structural homology of the 7TM domain from choanoflagellate GPRch3 GPCR S.kvevrii_m.11301 (orange) and its top structural metazoan hit A.vulgaris_A0A0B6Y979 (blue) (E-value: 4.64e-13). View of the superimposed models from the plane of the membrane (left) and the top (right). All structural models shown here have a confidence score >70 pLDDT. (B) Superimposed structures of full-length choanoflagellate GPRch1 GPCR S.urceolata_m.56126 (blue) with its top structural hit T.trahens_A0A0L0DRJ2 (orange) (E-value: 5.92e-10). While ciliates, along with other non-choanozoan eukaryotes that encode GPRch1 GPCRs, possess an additional PIPK domain in the C-terminal region of the GPCR, no GPRch1 GPCRs appeared to have this additional cytosolic enzymatic domain in choanoflagellates. Regions with a low prediction score (<70 pLDDT) are colored in white.

Conservation of metazoan GPR155 in choanoflagellates.

(A) Predicted structure of a full-length choanoflagellate GPR155 receptor (C. perplexa_m.118130). Similar to metazoan GPR155 (Shin et al. 2022; Bayly-Jones et al. 2024), choanoflagellate GPR155 possesses a 7TM domain (pink) fused to a transporter domain (green), and an additional Dishevelled, EGL-10 and Pleckstrin (DEP) domain (orange) in the C-terminal region. Regions with a low prediction score (<70 pLDDT) are depicted in white. (B) Structural homology of the Transporter/7TM module from the AlphaFold-predicted choanoflagellate GPR155 structure (C. perplexa_m.118130; dark grey) and the experimentally solved human GPR155 structure (PDB: 8u54; light grey). View of the superimposed models from the plane of the membrane (top) and the top (bottom). (C) Superimposed structures of the DEP domain from choanoflagellate (C. perplexa_m.118130; dark grey) and human (PDB: 8u54; light grey) GPR155 GPCRs. The choanoflagellate DEP model has a confidence score >70 pLDDT.

Structural features of the newly established RSF GPCR family.

(A) Predicted structural homology of the RSF 7TM domain from S. diplocostata_m.693985 (red), C. owczarzaki_P006019 (blue), M.gubernata_dn949 (yellow), and V. vermiformis_dn7877 (green). A conserved short helix between TM6 and TM7 is predicted in all the structures depicted (dotted box). View of the superimposed models from the plane of the membrane (left) and the top (right). All structural models shown here have a confidence score >70 pLDDT. (B) A multiple sequence alignment of the sequences corresponding to the short extra helix depicted in (A) revealed a “NxLQxxMN” motif conserved across eukaryotes. (C) Mild structural similarities are shared between the predicted S. diplocostata_m.693985 structure (pink) and its top structural Foldseek hit, Z. mays_TOM1_B7ZYE1 (green) (E-value: 1.66e-5). The short extra helix (dotted box) is not observed in TOM1 GPCRs. View of the superimposed models from the plane of the membrane (left) and the top (right).

Features and evolution of choanoflagellate GPCR TK/TKL/Ks.

(A) A schematic depiction of a GPCR Kinase. GPCR Kinases present a canonical 7TM domain combined with a kinase domain (green oval) at their C-terminus (Van Den Hoogen et al. 2018; van den Hoogen and Govers 2018). The kinase domain possibly phosphorylates the GPCR itself or other proteins (van den Hoogen and Govers 2018). (B) All-against-all pairwise comparison of the catalytic domains of choanoflagellate GPCR Kinases reveals distinct kinase domain populations, with 11 tyrosine kinase (TK) domains grouping together (blue) and sharing sequence similarities with a second cluster encompassing 86 tyrosine kinase-like (TKL) domains; 6 additional domains (orange) do not share sequence similarities with any other kinase domains in this dataset (Other) (Supplementary Files 24 and 25). Each kinase domain (circle) was BLASTed against a local version of KinBase, the curated protein kinase dataset from www.kinase.com (Manning et al. 2002; Bradham et al. 2006; Goldberg et al. 2006), for identification (Supplementary File 26). The E-values depicted summarize the lowest and highest E-values associated with the top BLAST hits obtained for members of each cluster. While choanoflagellate GPCR TKs systematically recovered metazoan tyrosine kinases as their top BLAST hits, choanoflagellate GPCR TKLs preferentially matched amoebozoan tyrosine kinase-like proteins. No BLAST hits were recovered for the 6 isolated kinase domains. Connecting lines correspond to pairwise BLAST scores of p-value <1e-20. (C) Distinct families of GPCR Kinases are found in choanoflagellates. All-against-all pairwise comparison of the 7TM domains of choanoflagellate GPCR Kinases revealed that they cluster into different families, with the 86 GPCR TKL receptors (red) previously identified in (B) grouping with the 6 GPCR Ks (orange) and 84 additional choanoflagellate GPCRs that do not possess a C-terminal kinase domain (white), forming a cluster of 176 GPCRs in total (dotted line; Fig. 1). In contrast, the 11 GPCR TKs (blue) previously identified in (B) grouped independently from the GPCR TKL/Ks and were either single GPCRs or grouped with other GPCR TKs. Connecting lines correspond to pairwise BLAST scores of p-value <1e-6. The 7TM sequences analyzed and the output of the clustering analysis are found in Supplementary Files 27 and 28. (D) Phylogenetic distribution of GPCR TKs (blue) and GPCR TKL/Ks (brown) within the choanoflagellate phylogeny. While GPCR TKs are found in a large range of choanoflagellate species, including both craspedids and acanthoecids, GPCR TKL/Ks appear to be restricted to acanthoecids (Fig. 2). (E) Schematic representation of selected sponge GPCR TKs recovered in our analysis by BLASTing the kinase domains of choanoflagellate GPCR kinases against the EUKPROT database (Richter et al. 2022). GPCRs are depicted to scale. Scale bar = 100 aa.
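
The idea of grouping sequences that are connected by pairwise BLAST scores below a p-value cutoff can be illustrated with a simplified stand-in: single-linkage grouping, that is, the connected components of the thresholded similarity graph. This is not necessarily the clustering method used for the figure, whose layout may depend on a dedicated tool; the identifiers and p-values below are invented for illustration.

```python
from collections import defaultdict


def cluster_by_threshold(ids, scored_pairs, max_pvalue=1e-6):
    """Group IDs into connected components of the graph whose edges are
    pairs with p-value < max_pvalue (union-find with path compression).

    Every ID appearing in `scored_pairs` must also be listed in `ids`.
    """
    parent = {i: i for i in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, pvalue in scored_pairs:
        if pvalue < max_pvalue:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b

    groups = defaultdict(set)
    for i in ids:
        groups[find(i)].add(i)
    # Largest clusters first; singletons are sequences without any connection.
    return sorted((sorted(g) for g in groups.values()),
                  key=lambda g: (-len(g), g))


# Hypothetical kinase-domain IDs and pairwise p-values (not real data).
ids = ["tk1", "tk2", "tkl1", "tkl2", "other1"]
pairs = [("tk1", "tk2", 1e-30), ("tkl1", "tkl2", 1e-25), ("tk2", "tkl1", 1e-3)]
print(cluster_by_threshold(ids, pairs, max_pvalue=1e-20))
```

With a stricter cutoff, weakly similar groups separate; relaxing it merges them, which is why the cutoff is reported alongside each panel.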

Structural features of GPRch2 GPCRs.

(A) Predicted structural homology of the 7TM domain from choanoflagellate GPRch2 GPCR S.diplocostata_m.238228 (blue) and its top structural chlorophyte hit T.chuii_A0A7S1SKX2 (orange) (E-value: 6.83e-12). A large intracellular loop 3 (ICL3) between TM5 and TM6 (indicated with an arrowhead) is observed in choanoflagellate and, to a lesser extent, in chlorophyte GPRch2 GPCRs. ICL3 is predicted to be unstructured, with low confidence scores (<70 pLDDT) in both models. (B) Multiple sequence alignment of the 7TM region of GPRch2 GPCRs from choanoflagellates and chlorophytes revealed a conserved “LLLLIS/AxG” motif in TM3, an “ITR” motif in ICL2, and a “FRLRxxNPY” motif in the C-terminal region (dotted boxes).

Metazoan Frizzled/Smoothened homologs detected in corallochytreans, fungi, and amoebozoans.

(A) Predicted structural similarities of the 7TM region of S.multiformis_p007117_colp12_trinity150504 (red), R.norvegicus_Q8CHL0 (blue), R.irregularis_a0a2i1gj93 (green), and P.levis_p001581_dn11856 (yellow) Frizzled/Smoothened receptors. An additional helix 8 (H8) C-terminal to the 7TM domain is also depicted. All predicted regions shown here have a confidence score >70 pLDDT. View of the superimposed models from the plane of the membrane (left) and the top (right). (B) Schematic representation of Frizzled/Smoothened GPCRs found in metazoans, corallochytreans, fungi, and amoebozoans. The structural homology extends beyond their 7TMs, as a canonical Frizzled domain (Fz) is detected in the N-termini of all these receptors. The Frizzled/Smoothened homologs detected in S.multiformis are listed in Supplementary File 29. GPCRs are depicted to scale. Scale bar = 100 aa.