Research Article

Identification and classification of ion channels across the tree of life provide functional insights into understudied CALHM channels

Institute of Bioinformatics, University of Georgia, United States
Department of Biochemistry and Molecular Biology, University of Georgia, United States
Department of Molecular Biosciences, Northwestern University, United States
Department of Biochemistry & Molecular Biology, Thomas Jefferson University, United States
Department of Pharmacology, Northwestern University, United States
Chemistry of Life Processes Institute, Northwestern University, United States

May 18, 2026

https://doi.org/10.7554/eLife.106134.3


eLife Assessment

In this manuscript, Taujale et al. describe an interdisciplinary approach to mine the human channelome and further discover orthologues across diverse organisms. Further, this work provides evidence that supports a role for conserved residues in CALHM channel gating. Overall, this important work presents findings that can be helpful to the ion channel community, as well as to those interested in improved methods for mining sequence space for their protein of interest. However, further validation of the improvements their approach shows over previous approaches is needed, making this a solid contribution to the literature in this field.

https://doi.org/10.7554/eLife.106134.3.sa0

Significance of the findings:

Important: Findings that have theoretical or practical implications beyond a single subfield


Strength of evidence:

Solid: Methods, data and analyses broadly support the claims with only minor weaknesses


During the peer-review process the editor and reviewers write an eLife Assessment that summarises the significance of the findings reported in the article (on a scale ranging from landmark to useful) and the strength of the evidence (on a scale ranging from exceptional to inadequate).


Abstract

The ion channel (IC) genes encoded in the human genome play fundamental roles in cellular functions and disease, and are one of the largest classes of druggable proteins. However, limited knowledge of the diverse molecular and cellular functions carried out by ICs presents a major bottleneck in developing selective chemical probes for modulating their functions in disease states. The wealth of sequence data available on ICs from diverse organisms provides a valuable source of untapped information for illuminating the unique modes of channel regulation and functional specialization. However, the extensive diversification of IC sequences and the lack of a unified resource present a challenge in effectively using existing data for IC research. Here, we perform integrative mining of available sequence, structure, and functional data on 419 human ICs across disparate sources, including extensive literature mining by leveraging advances in LLMs to annotate and curate the full complement of the ‘channelome’. We employ a well-established orthology inference approach to identify and extend the IC orthologs across diverse organisms to more than 48,000. We show that the depth of conservation and taxonomic representation of IC sequences can further be translated to functional similarities by clustering them into functionally relevant groups, which can be used for downstream functional prediction on understudied members. We demonstrate this by delineating co-conserved patterns characteristic of the understudied calcium homeostasis modulator (CALHM) family of ICs. Through mutational analysis of co-conserved residues altered in human diseases and electrophysiological studies, we show that these evolutionarily constrained residues play an important role in channel gating functions.
Thus, by providing new tools and resources for performing large comparative analyses on ICs, this study addresses the unique needs of the IC community and provides the groundwork for accelerating the functional characterization of dark channels for therapeutic intervention.

Introduction

Ion channels (ICs) are membrane-bound proteins critical to many physiological processes in the human body, including regulating cell volume, neurotransmitter release, muscle contraction, and glandular secretion (Como et al., 2021; Meir et al., 1999; Thorneloe and Nelson, 2005; Rajan et al., 1990). Abnormal channel functions have been causally associated with ‘channelopathies’ such as Parkinson’s disease, epilepsy, cardiac arrhythmia, cancer, and cystic fibrosis, to name but a few (Edelman and Saussereau, 2012; Kim, 2014; Rivolta et al., 2020; Bagal et al., 2013). The development of selective chemical probes for channels in disease states is currently hindered by the limited knowledge of the diverse gating mechanisms, ion selectivity, and pathway associations displayed by the various channels encoded in the human genome (Bagal et al., 2013; Braun et al., 2020). While IC sequences from diverse organisms encode critical information regarding underlying functions and effective mining of sequence data can provide important context for predicting and testing understudied IC functions, traditional bioinformatic approaches have had limited success due to the challenges in consistently defining the full IC complement (Wickenden et al., 2012; Fodor and Aldrich, 2006), and accurately aligning and mining large sequence datasets (Neuwald, 2009). Online resources like the collaborative platform Channelpedia (Ranjan et al., 2011), the Guide to Pharmacology (GtoP) database (Alexander et al., 2023), the structure-centric ChanFAD (Castro et al., 2022) and ChannelsDB (Špačková et al., 2024), and several other efforts have previously cataloged and curated ICs and provide extensive information on a number of IC sequences, but they do not include all the human-encoded ICs, nor do they provide information on ICs from other organisms and taxa (Jegla et al., 2009; Gao et al., 2019; Li and Gallin, 2004; Moran et al., 2015).
Most IC studies across taxa are largely confined to closely related families or limited taxonomic groups (Lara et al., 2023; Uribe et al., 2024).

The extensive diversification and widespread abundance of ICs across cell types and their similarities to other transmembrane (TM) protein families, such as transporters, have led to an inconsistent definition of the human IC complement. For example, previous literature has reported around 230 human ICs (Jegla et al., 2009). The Kyoto Encyclopedia of Genes and Genomes (KEGG) database has cataloged around 316 human ICs (Kanehisa and Goto, 2000; Kanehisa et al., 2023), the GtoP database lists 278 ICs (Alexander et al., 2023), while Pharos (Kelleher et al., 2023) and The Human Genome Organization (HUGO) (Seal et al., 2023) list around 344 and 330 proteins, respectively. Moreover, these resources classify the human ICs based on their function, which is not known for many. Furthermore, current classification rarely accounts for the pore-forming or auxiliary roles of ICs. Channel proteins often form large macromolecular complexes in association with other IC and auxiliary subunits. Separate studies have highlighted the myriad of roles that the non-conducting auxiliary subunits play in an IC complex that control channel trafficking, gating, and pharmacology (Kato and Bredt, 2007; Li et al., 2006; Abbott, 2022), making them suitable therapeutic targets for drugs to regulate IC activity. Unfortunately, many of the existing resources fail to identify, include, and annotate such auxiliary channels, leading to their exclusion from large-scale analyses for clinical outcomes. Due to these inconsistencies, downstream analyses have been hindered and limited to certain families or groups of closely related channels. Consequently, our current knowledge of IC functions is skewed toward a subset of well-studied ‘light’ channels, while a significant portion of the channelome remains understudied and is referred to as ‘dark’ channels by the Illuminating Druggable Genome (IDG) consortium (Sharma and Nadler, 2021; Sheils et al., 2020). 
Concerted efforts such as IDG and the Structural Genomics Consortium (Abbott, 2022) help address these gaps in knowledge by systematically identifying and prioritizing such understudied members. However, it is imperative to develop a functional and an evolutionary classification framework that provides tools for researchers to extend knowledge from light members to their dark counterparts, enabling them to make meaningful biological inferences and shed light on their critical roles in function and disease states.

The current classification of channel genes is largely based on experimentally determined electrophysiological parameters such as the type of ions they transport (Na⁺, K⁺, Ca²⁺, Cl⁻) and the channel gating mechanism such as gating by voltage potential (voltage-gated) or ligand binding (ligand-gated), which has been quite helpful in placing ICs in a functional context (Alexander et al., 2023). One of the first comprehensive classifications of ICs was done by Jegla et al., 2009, where they described 235 human ICs distributed in 24 families, along with their conservation across select metazoan species, and family-specific phylogeny and evolutionary analyses. A more recent collection of ICs along with their classification can be obtained from the GtoP database (Alexander et al., 2023), a resource dedicated to providing detailed overviews of over 1900 human drug targets, spanning six major pharmacological targets – ICs being one of them. GtoP classifies 278 human ICs into 3 major groups (voltage-gated ion channels [VGICs], ligand-gated ion channels [LGICs], and other) and 31 families, and provides a detailed overview of these channels in a friendly interface. However, it does not include several now well-known IC families such as calcium homeostasis modulator channels (CALHMs) or bestrophins (Best). Additionally, being a general drug target resource, it is not IC-centric and lacks evolutionary and residue-level functional insights that might help hypothesis generation. The role or annotation of auxiliary channels, which play critical roles in the function and regulation of ICs (Kato and Bredt, 2007; Vacher and Trimmer, 2011), has not been addressed in these resources either.

On the other hand, cryo-electron microscopy (cryo-EM) studies and analysis of IC structures have shown that despite extensive variation in primary sequences, several channels adopt common structural folds and mechanisms of action (Chen et al., 2023). For example, several ICs – potassium (K⁺), sodium (Na⁺), calcium (Ca²⁺), transient receptor potential, and hyperpolarization-activated cyclic nucleotide-gated – grouped together as voltage-gated-like, are proposed to have evolved from a common ancestor because they share key sequence and structural features (Yu et al., 2005; Huffer et al., 2020). Thus, quantitative comparisons of these commonly conserved regions at the sequence and structural level can provide new residue-level insights into IC function and a stronger basis for functional classification, much like in other large protein families such as kinases and glycosyltransferases (Shrestha et al., 2020; Picado et al., 2020; Taujale et al., 2020; Taujale et al., 2021). This is imperative since selective targeting of channels in diseases and a mechanistic understanding of disease-associated mutations in channelopathies will require a residue-level understanding of function-determining features in both well-studied ‘light’ and understudied ‘dark’ channels. Thus, there is a critical need for an updated classification that places ICs in functionally and evolutionarily related groups, enabling a deeper residue-level analysis of light and dark channels alike.

To develop a comprehensive classification of ICs, we first mined literature and sequence data sources to collect information on ICs stored across various databases and in disparate data sources and formats to build a well-curated knowledgebase toward defining the human ion ‘channelome.’ A comprehensive list encompassing established and putative IC sequences in the human proteome is provided as a unified table alongside other contextual data for functional annotations such as regulatory interactions, ion selectivity, and pore-lining residues. Leveraging the identification of a pore region and pore-lining residues, we make a clear distinction between pore-containing vs auxiliary ICs. Based on this curation, we then perform a comprehensive classification of ICs into well-defined groups and families and identify the extent of understudied channels in each family, providing a premise for prioritizing studies in IC families with more understudied members.

We further performed a comprehensive evolutionary analysis to define and collect more than 48,000 well-defined IC orthologous relationships from diverse taxonomic groups, spanning metazoans, fungi, protists, bacteria, and archaea. We demonstrate the application of these sequences in annotating understudied CALHMs at residue-level resolution using Bayesian statistical approaches (Neuwald, 2014) and in predicting and experimentally validating the structural and functional impact of disease-associated mutations. These studies support our working premise that integrative mining of available sequence, structure, and functional data on ICs from diverse organisms (well-studied and understudied) will provide an important context for defining sequence and structural features associated with understudied channel functions.

Results

Defining the IC complement of the human genome using informatics approaches

We first sought to collect and curate the full IC complement across the human proteome by systematically mining all the existing data sources, collating information from literature, and running various informatics tools (see Methods) to identify related sequences based on overall similarity and pore-defining regions. In brief, we first collected all the sequences listed as ICs in the comprehensive UniProt database (UniProt Consortium, 2018), as well as Pharos (Kelleher et al., 2023), KEGG (Kanehisa and Goto, 2000; Kanehisa et al., 2023), GtoP (Alexander et al., 2023), and HUGO Gene Nomenclature Committee (HGNC) (Seal et al., 2023) databases and subjected them to a series of annotations as described in Table 1. The complete annotation table for all human ICs is available in Supplementary file 1A. Primarily, we mined the literature for functional and structural annotations on these sequences. Broadly, we have included six annotation categories. (1) The identifier labels include the UniProt ID, name, and Target Development Level (TDL) designation from Pharos (as of July 3, 2025). The IDG consortium, through Pharos, assigns TDL, which can be Tclin, Tchem, Tbio, or Tdark, based on the literature data available for proteins in terms of their potential as drug targets. Including this designation in the annotation is especially relevant for identifying and prioritizing understudied ICs for further investigation. (2) The classification labels describe the Group, Class, and Family the IC falls into. This information has been mined from literature including UniProt, Pharos, GtoP, and HGNC, and the field ‘Family designation’ provides a consensus family label for the IC. (3) The functional labels have been manually mined from various sources, relying mostly on previous literature in the ‘Lit Resource’ column. 
An important column in this section is the ‘Unit’ column, which describes whether the IC is directly involved in forming the pore region, where it can be pore-forming (directly involved in forming an ion-conducting pore), two-pore (contains two tandem pore-conducting regions), or auxiliary (does not have its own pore domain but interacts with other pore-conducting IC subunits as part of a complex) (please see below and Methods for further definition of an auxiliary IC). We also provide the ions, gating mechanism, and UniProt functional description in this annotation category. To verify the ion and gating mechanism annotations, we mined the literature using large language models (LLMs) and retrieval augmented generation (RAG) systems (Methods, Figure 1—figure supplement 1). The RAG system was able to successfully verify about 40% of the ion and gating mechanism annotations, while for others, it was unable to find supporting evidence, either because no evidence was available in the existing literature or no relevant references were found. The annotations that were not verified by the RAG system are indicated with an asterisk in Supplementary file 1A. (4) The structure-related category includes the PDB ID of any experimentally generated crystal structure or an AlphaFold ID for a computationally predicted structure of the IC. (5) The Complex/Interaction label provides information on the types of complex the IC is involved in forming, along with the interactors. (6) The last section for TM and pore domain-related labels has been curated extensively using various sources and predictors. The TM regions and pore-containing functional domains are annotated based on literature references and prediction tools, as outlined in the Methods section. TM predictions by TMHMM (Krogh et al., 2001) and Phobius (Käll et al., 2007) are provided alongside the TM annotations provided in UniProt. 
First, we supplemented this information by adding TM information based on the literature review and the source. Then, we further analyzed any ICs with an experimentally determined structure using the MOLE software (Sehnal et al., 2013) to identify and annotate the pore region and pore-lining residues. The MOLE software starts by defining the membrane region of the protein, followed by the identification of cavities and computation of the pore boundaries to predict the pore-lining residues. These predictions are used as additional evidence for defining the pore-containing functional domain region and a pore-containing IC sequence. Sequences without an identified pore region were classified as auxiliary, as defined by Gurnett and Campbell, 1996, to provide functional context for this classification. A snapshot of the list of annotations we collected through this process is shown in Table 1, using Aquaporin-1 as an example.

Table 1

List of features annotated for the collected ion channel (IC) sequences.

The labels are grouped by their annotation category. Labels for Aquaporin-1 are shown as examples of each annotation label.

Identifier labels
UniProt	P29972
Name	Aquaporin-1
Symbol	AQP1 (CHIP28)
Target Development Level (Pharos)	Tbio
Length	269
Classification labels
Family designation	Aquaporin
Group	Other
Class
Family	Aquaporin
Subfamily
Functional labels
Unit	Pore-containing
Ion	Water
Gate mechanism	Ligand-gated (cGMP)
Lit Resource	PMID:26365508, PMID:16962972
UniProt function	Form water-specific channel/plasma membranes of red cells and kidney proximal tubules
Structure-related labels
PDB ID	8CT2
AlphaFold ID
Complex/Interaction-related labels
Auxiliary	No
Characterized domains	MIP
Auxiliary domain	No
Auxiliary protein
Notable interactors	EPHB2
TM and pore domain-related labels
Pore domain start	8
Pore domain end	228
Domain length	220
Does it pass through membrane at least once	Yes
# of TM domains (predicted by TMHMM)	6
# of TM domains (predicted by Phobius)	6
TM organization	1|2|3|4|5|6
MOLE pore residue first	35
MOLE pore residue last	185
# of TMs	6
# of Transmembranes (TMs)+Intramembranes (IMs)	10
TMstart (UniProt)	8
TMend (UniProt)	228
TMsList	T:8–36,T:49–66,I:71–76,I:77–84,T:95–115,T:137–155,T:167–183,I:187–192,I:193–200,T:208–228
TM Lit Resource	PMID:10644652
Lit based TMstart_1	8
Lit based TMend_1	36
Lit based TMstart_2	49
Lit based TMend_2	66
Lit based TMstart_3	95
Lit based TMend_3	115
Lit based TMstart_4	137
Lit based TMend_4	155
Lit based TMstart_5	167
Lit based TMend_5	183
Lit based TMstart_6	208
Lit based TMend_6	228
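
The TMsList field in Table 1 encodes each segment as a type letter (T for transmembrane, I for intramembrane) followed by a residue range. As a minimal sketch, such a string could be parsed and tallied as below. The function name and tuple layout are illustrative assumptions and are not part of the published resource; only the Aquaporin-1 string itself is taken from the table.

```python
import re
from collections import Counter

# Hypothetical helper for reading a TMsList string such as the one in Table 1.
# The pattern accepts either an en dash or a plain hyphen between residue numbers.
SEGMENT = re.compile(r"([TI]):(\d+)[–-](\d+)")

def parse_tms_list(tms_list):
    """Return a list of (kind, start, end) tuples, one per listed segment."""
    segments = []
    for item in tms_list.split(","):
        match = SEGMENT.fullmatch(item.strip())
        if match is None:
            raise ValueError(f"unrecognized segment: {item!r}")
        segments.append((match.group(1), int(match.group(2)), int(match.group(3))))
    return segments

# Aquaporin-1 entry copied from Table 1.
AQP1_TMS = ("T:8–36,T:49–66,I:71–76,I:77–84,T:95–115,T:137–155,"
            "T:167–183,I:187–192,I:193–200,T:208–228")

segments = parse_tms_list(AQP1_TMS)
counts = Counter(kind for kind, _, _ in segments)
tm_only = [seg for seg in segments if seg[0] == "T"]

print(counts["T"], counts["I"], len(segments))  # prints: 6 4 10
print(tm_only[0][1], tm_only[-1][2])            # prints: 8 228
```

The tallies agree with the table: six TM segments, ten TM plus intramembrane segments, and a TM span from residue 8 to residue 228.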

In the final phase, we conducted sequence similarity searches to detect sequences exhibiting significant homology with any curated sequences, subsequently subjecting these candidates to all annotation procedures to determine their eligibility as ICs. By combining the results from this annotation process, we were able to achieve the following: (1) compile a list of human ICs based on the presence of distinct TM regions, a detectable pore, and evidence of ion conductivity, (2) distinguish between a pore-containing IC vs an auxiliary IC, (3) identify novel putative IC sequences, and (4) curate and classify the full IC complement in the human proteome. These results are summarized in Figure 1 and Supplementary file 1A.
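
The distinction between pore-forming, two-pore, and auxiliary units described above can be read as a simple decision rule. The sketch below is one possible encoding of that rule, not the authors' pipeline: the class, field names, and the example entries are hypothetical, and the actual curation also draws on literature evidence and MOLE predictions that are not modeled here.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    n_pore_regions: int               # pore-containing regions identified (assumed field)
    partners_with_pore_subunit: bool  # complexes with a pore-conducting subunit (assumed field)

def unit_label(c: Candidate) -> str:
    """Assign a 'Unit' label following the definitions given in the text."""
    if c.n_pore_regions >= 2:
        return "two-pore"
    if c.n_pore_regions == 1:
        return "pore-forming"
    if c.partners_with_pore_subunit:
        return "auxiliary"
    return "not classified as IC"

# Illustrative inputs only; the values are assumptions made for this example.
examples = [
    Candidate("AQP1", 1, False),
    Candidate("KCNK7", 2, False),
    Candidate("hypothetical_beta_subunit", 0, True),
]
labels = {c.name: unit_label(c) for c in examples}
print(labels)
```

Keeping the rule in a single function makes the order of checks explicit: a sequence is only considered auxiliary after no pore region has been found.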

Figure 1

Distribution of human ion channels (ICs) across different families.

Each circle represents a human IC family, with the symbol at the center indicating its Group. The size of the circles is proportional to the number of sequences in that family, and the colored pies indicate the proportion of their Target Development Level (TDL) status as designated by Illuminating Druggable Genome (IDG). The placement of the bubbles is based on the average coordinates of all the members within that family in a distribution of uniform manifold approximation and projection (UMAP) embeddings generated using protein embedding-based pairwise sequence alignments. An embedding-based sequence alignment approach was used to overcome the vast divergence of IC sequences with minimal sequence similarity. A family-level abstraction was done to provide an intuitive view undeterred by the relationships of individual ICs across families. A detailed view of the full UMAP plot showing the placement of individual ICs is provided in Figure 1—figure supplement 3 for reference. Auxiliary IC families at the bottom were not part of the embedding-based analysis due to the lack of a pore-containing domain, and thus placed arbitrarily at the bottom of the figure.

A total of 419 human IC sequences were curated using this approach, representing an increase of 75 sequences compared to the previous consensus of 344 ICs in humans (Bagal et al., 2013; Kelleher et al., 2023). A comparison of the curated IC sequences in this study against the list of ICs in other resources – KEGG, GtoP, and Pharos – is provided in Figure 1—figure supplement 2 and Supplementary file 1B. These sequences were classified into 4 major groups – VGICs, LGICs, chloride channels, and others – defined previously (Neuwald, 2009) and 55 families. VGICs constitute the largest group, comprising 186 sequences distributed across 21 families, followed by LGICs with 82 sequences in 10 families. Of the 419 sequences, 62 could not be assigned to the four major groups and were categorized into outlier families. Among these, 28 are pore-containing ICs, with 19 of the 28 distributed across four families (CALHM, otopetrin [OTOP], TM channel-like [TMC], and tweety homolog [TTYH]). These families, collectively referred to as ‘Unclassified’ families, are labeled as such in Figure 1. The remaining nine pore-containing sequences could not be assigned to a distinct family and were grouped together under a single ‘Unclassified’ family. Based on the evaluation of the pore-containing regions, 343 out of 419 ICs were annotated as pore-containing ICs, while the remaining 76 were classified as auxiliary and fell into 17 different families. Twenty-three of these auxiliary ICs are soluble with no detectable TM domains. Any auxiliary ICs that did not belong to a distinct family were collectively classified into an ‘Auxiliary unclassified’ family.

Next, we sought to use the curated set to define similarities across IC families. Traditional sequence-based bioinformatics approaches, such as pairwise sequence alignment (e.g. BLASTp) (Camacho et al., 2009) or profile hidden Markov models (e.g. HMMER) (Eddy, 1998), are limited in their ability to detect relationships among human ICs due to the extensive divergence in their primary sequences. These methods often fail to identify homologous ICs across different families, as the sequence similarity falls below detectable thresholds. Consequently, such approaches are inadequate for comprehensive classification or comparison of IC families in humans. Instead, we relied on representations (embeddings) derived from evolutionary scale protein language models (Lin et al., 2023) to capture sequence, structure, and evolutionary information, and used them to generate pairwise sequence alignments. Specifically, for the 343 pore-containing ICs, we passed their pore-containing functional domains to a protein sequence embedding model called DEDAL (Llinares-López et al., 2023) to perform pairwise sequence alignments. Given that sequence similarity is largely restricted to the pore-containing functional domains of ICs, we computed protein embeddings using only these regions. As auxiliary ICs lack such domains, they were excluded from this part of the analysis. The resulting all-vs-all sequence similarity scores were used to generate uniform manifold approximation and projection (UMAP) embedding scores (Figure 1—figure supplement 3). The average of the UMAP coordinates of all IC sequences within a family was calculated to place the IC family bubbles in Figure 1. Thus, families placed close to each other in Figure 1 are similar in their sequence embeddings, which capture learned representations of sequence, structure, evolutionary conservation, and functional properties. Since the auxiliary IC families did not have UMAP embedding scores, they are arbitrarily placed at the bottom of the figure.
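
The family-level placement in Figure 1 amounts to averaging the two-dimensional UMAP coordinates of a family's members. A minimal sketch of that averaging step follows, assuming the coordinates are already available as (family, x, y) records. The function name and the toy coordinates are made up for illustration; the DEDAL alignment and UMAP projection steps are not reproduced here.

```python
from collections import defaultdict

def family_centroids(points):
    """points: iterable of (family, x, y); returns {family: (mean_x, mean_y)}."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # running x sum, y sum, member count
    for family, x, y in points:
        acc = sums[family]
        acc[0] += x
        acc[1] += y
        acc[2] += 1
    return {fam: (sx / n, sy / n) for fam, (sx, sy, n) in sums.items()}

# Made-up coordinates, for illustration only.
toy_points = [("CALHM", 1.0, 2.0), ("CALHM", 3.0, 4.0), ("OTOP", -1.0, 0.5)]
centroids = family_centroids(toy_points)
print(centroids)  # prints: {'CALHM': (2.0, 3.0), 'OTOP': (-1.0, 0.5)}
```

Because a centroid is a plain mean, a family whose members are widely scattered in the full UMAP can still land near an unrelated family, which is why the per-sequence plot in the figure supplement remains the reference view.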

Most of the VGIC families group together on the top right except for inward rectifier potassium channels (K_IR), two-pore domain potassium channels (K_2P), and plasmolipin (PLLP), where K_2P and PLLP are more centrally dispersed, while K_IR falls closer to LGIC families, acid-sensing ion channels, epithelial sodium channel (ENAC), and P2X purinoceptors, all of which share similar TM topology with only two TM helices. The inositol 1,4,5-trisphosphate and ionotropic glutamate receptors are also centrally located, while the Cys loop receptor LGIC families group separately from others toward the bottom right, indicating their shared functional and structural similarities. Interestingly, pannexins, volume-regulated anion channels (VRAC), connexins, chloride channel CLIC-like (CLCC), and CALHM families group closer together centrally, indicating shared features.

We also mapped TDL annotations defined by the IDG (Sharma and Nadler, 2021) in Pharos (Kelleher et al., 2023) to highlight the depth of understudied ICs within each group and, we hope, to spark additional studies of those members. Briefly, IDG describes Tclin as targets that have an approved drug; Tchem as targets known to bind small molecules with high potency; Tbio as targets with experimentally supported Gene Ontology (GO) (Ashburner et al., 2000) annotations based on published literature; and Tdark as targets that are manually curated at the primary sequence level in UniProt but do not meet any of the Tclin, Tchem, or Tbio criteria (Kelleher et al., 2023). There are 19 Tdark, 185 Tbio, 88 Tchem, and 127 Tclin ICs spread across all families. The proportion of these levels for each family is shown as pie charts in Figure 1. The CALHM family has the highest proportion of Tdark ICs, with OTOP, TMC, connexins, calcium-activated chloride channels (CaCC), calcium voltage-gated channel (CACN), K_2P, and the unclassified families accounting for the remaining Tdark ICs, emphasizing the need for additional research on these families. On the other hand, VGICs – voltage-gated potassium channel (K_V), voltage-gated sodium channel (Na_V), voltage-gated calcium channel (Ca_V), ryanodine receptor (RyR) – and LGICs, including gamma-aminobutyric acid type A (GABA_A) receptor, nicotinic acetylcholine (nACh) receptor, and ENAC, have the highest proportion of Tclin ICs, solidifying their status as the well-studied families.

Mining and cataloging IC orthologs across the tree of life

Next, we sought to extend this annotation and collect related ICs from other organisms across different taxonomic lineages from the tree of life (UniProt Consortium, 2018) using the full complement of the annotated human IC sequences. To this end, we used a graph-based orthology inference approach (Huang et al., 2021) starting from the full-length and the pore-containing functional domains of the human IC sequences to define orthologous relationships across more than 1500 proteomes from the UniProt Proteomes database (UniProt Consortium, 2018). These selected proteome datasets include 353 Archaea, 696 Bacteria, and 547 Eukaryota organisms. This method has previously been successfully employed to define orthology across such a large collection of proteomes for protein kinases (Huang et al., 2021), and since it relies on both the full-length sequence and the annotated pore-containing domains, it can more accurately define true orthologs that are indeed evolutionarily related across organisms. Since this method relies on the pore-containing domain annotations for a domain-based orthology inference, we only used the 343 pore-containing IC sequences for this analysis, and did not include the auxiliary ICs that lack a pore-containing domain.
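
To make the idea of combining full-length and domain-based evidence concrete, the sketch below uses reciprocal best hits as a deliberately simplified stand-in for the graph-based method of Huang et al., 2021, which is not reproduced here. The assumption that a pair must be supported by both searches, the function names, the target identifiers, and all scores are hypothetical.

```python
def best_hits(scores):
    """scores: {(query, target): score}; return {query: best-scoring target}."""
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > scores[(query, best[query])]:
            best[query] = target
    return best

def reciprocal_best_hits(forward, backward):
    """Pairs (human, other) in which each sequence is the other's best hit."""
    fwd, bwd = best_hits(forward), best_hits(backward)
    return {(h, o) for h, o in fwd.items() if bwd.get(o) == h}

# Toy similarity scores (human -> other proteome, and the reverse direction).
full_fwd = {("CALHM1", "x1"): 300, ("CALHM1", "x2"): 120, ("CALHM3", "x2"): 250}
full_bwd = {("x1", "CALHM1"): 290, ("x2", "CALHM3"): 240, ("x2", "CALHM1"): 110}
dom_fwd = {("CALHM1", "x1"): 150, ("CALHM3", "x2"): 90, ("CALHM3", "x3"): 95}
dom_bwd = {("x1", "CALHM1"): 140, ("x3", "CALHM3"): 92, ("x2", "CALHM3"): 88}

full_pairs = reciprocal_best_hits(full_fwd, full_bwd)
domain_pairs = reciprocal_best_hits(dom_fwd, dom_bwd)

# Assumption: keep only pairs supported by both the full-length and domain searches.
supported = full_pairs & domain_pairs
print(supported)  # prints: {('CALHM1', 'x1')}
```

In this toy case the full-length and domain searches disagree on the partner of CALHM3, so only the CALHM1 pair survives, which illustrates how requiring both lines of evidence can filter out hits driven by regions outside the pore-containing domain.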

By combining domain and full-length searches, we could identify more than 48,000 IC orthologous relationships across diverse organisms. Figure 2A shows a heatmap depicting the phylogenetic profiling of human IC orthologs along the horizontal axis across the taxonomic lineages shown by the tree along the vertical axis. The color intensity indicates the percentage of orthologs detected for all the sequences in the IC family for a given taxonomic group. The highest number of orthologs was found for Aquaporins, including bacterial and archaeal orthologs, indicating that this family of ICs is the most conserved across lineages. Along with Aquaporins, the two paralogs of Golgi pH regulator ICs (GPHRs), both indicated as Tbio channels, were found to have the largest number of orthologs across metazoans. In addition to GPHRs, several other ICs, notably members of the K_V family and GABA_A receptors, display some of the most widespread orthologs across organisms. Some dark channels, such as members of the TMC (TMC7 and TMC3) and CALHM (CALHM5 and CALHM3) families, are well conserved across metazoans and vertebrates, respectively, yet very little is known about their functions. On the other hand, some Tdark channels, such as potassium channel K member 7 (KCNK7) and TMC4, have a lineage-specific set of orthologs that extend only up to mammals. A full list of all the orthologs detected through this analysis is provided in Supplementary file 1C.

Figure 2


Orthology profiling of human ion channels (ICs).

(A) Heatmap showing the percent of orthologs detected for each IC family within a given taxonomic lineage. The taxonomic groups are shown in the vertical axis with a tree on the left. Darker color represents a higher percentage of orthologs detected. Percentages were calculated as (total number of orthologs found for all ICs in a family)/(total number of organisms queried in the taxonomic lineage * number of sequences in the family). (B) Clustergram depicting the presence/absence of orthologous sequences of ICs across eukaryotic taxonomic lineages. ICs are clustered along the horizontal axis into nine distinct clusters. Taxonomic groups are shown on the vertical axis. Each square in the heatmap is colored based on the orthology relationship found for a specific IC in a specific organism (black: one-to-one ortholog present, red: co-ortholog detected, brown: no orthology detected). (C) Results from the enrichment analysis performed on human ICs of each cluster. The x-axis shows the number of ICs in the cluster enriched for the Gene Ontology (GO) term shown on the y-axis. The bars are colored based on their FDR values for the enriched term. For a full list of enriched terms, please refer to Supplementary file 1D.
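
The percentage defined in panel A can be written as a one-line calculation. The sketch below restates the caption's formula directly; the function name and the example numbers are illustrative assumptions and do not correspond to any value in the figure.

```python
def ortholog_percent(n_orthologs, n_organisms, n_family_members):
    """Percent of possible (organism, family member) slots filled by an ortholog.

    n_orthologs:      total orthologs found for all ICs in the family
    n_organisms:      organisms queried in the taxonomic lineage
    n_family_members: number of human sequences in the family
    """
    slots = n_organisms * n_family_members
    if slots == 0:
        raise ValueError("lineage and family must both be non-empty")
    return 100.0 * n_orthologs / slots

# Made-up example: a 6-member family queried in 50 organisms with 150 orthologs found.
value = ortholog_percent(150, 50, 6)
print(value)  # prints: 50.0
```

Normalizing by both the number of organisms and the family size is what allows small and large families, and sparsely and densely sampled lineages, to share one color scale.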

To use this evolutionary conservation for functional inference of dark ICs, we first used hierarchical clustering to group ICs with similar orthology profiles into related clusters, which led to the definition of nine clusters, as shown in Figure 2B. Since the orthology profiles in prokaryotic lineages were very sparse, only the eukaryotic orthology profiles were retained for this analysis. Within eukaryotes, each cluster has a well-defined signature of orthology conservation. For example, ICs in cluster 2 have most of their orthologs only in mammals, whereas clusters 4, 5, and 6 have orthologs spanning other vertebrates. Similarly, clusters 7 and 8 have orthologs extending to other metazoans, while seven ICs in cluster 9, including the two Golgi pH regulators A and B and the sodium leak channel NALCN, have detectable orthologs in fungi, plants, protists, and other eukaryotic lineages. To translate these patterns of orthology profiles to function, we performed a functional GO term enrichment analysis for the human ICs in each cluster (Figure 2C). As expected, the most significant GO terms were related to IC function (Supplementary file 1D). However, beyond channel function, the enriched functional GO terms obtained for each cluster correlate well with the physiological functions present in the orthologous organismal groups. For example, clusters 4, 5, and 6 are conserved in vertebrates and have six dark IC sequences, including unclassified ICs such as CALHM3 and 5, and connexins GJA10, GJD4, and GJE1, which had functional enrichment for regulation of heart contraction and heart rate, traits that could be closely related to a closed blood circulatory system with an endothelium present in vertebrates (Monahan-Earley et al., 2013). On the other hand, cluster 2 with orthologs in mammals was conserved in mammalian-specific reproductive functions such as sperm capacitation.
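To make the clustering step concrete, the sketch below groups toy presence/absence profiles by agglomerative (hierarchical) clustering. It is an illustration under stated assumptions, not the analysis code used in the study: the Hamming distance, the average linkage, the binary encoding (the study distinguishes one-to-one orthologs, co-orthologs, and absence), and all IC names and profiles are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Fraction of lineages in which two presence/absence profiles differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def average_linkage_clusters(profiles, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[name] for name in profiles]

    def dist(c1, c2):
        # Average linkage: mean pairwise distance between members.
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(hamming(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return [sorted(c) for c in clusters]


# Columns (hypothetical): mammals, other vertebrates, other metazoans, fungi/plants.
profiles = {
    "IC_a": (1, 0, 0, 0),
    "IC_b": (1, 0, 0, 0),
    "IC_c": (1, 1, 0, 0),
    "IC_d": (1, 1, 0, 0),
    "IC_e": (1, 1, 1, 1),
    "IC_f": (1, 1, 1, 1),
}
print(average_linkage_clusters(profiles, 3))
```

In this toy input, channels with the same depth of conservation end up in the same cluster, which is the property the enrichment analysis relies on.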

Using the orthology information to identify evolutionary constraints in CALHMs

Once the orthologs have been identified across different organisms, they can be leveraged to find evolutionarily conserved signals that could point to functional similarities within related groups of sequences. To this end, we classified subsets of orthologous IC sequences into evolutionarily related clusters using a Bayesian partitioning with pattern selection (BPPS) algorithm, which classifies sequences based on patterns of amino acid conservation and variation in a large multiple sequence alignment (see Methods) (Neuwald, 2014). For this analysis, we focused on the orthologs of the CALHM family of IC sequences. The CALHM proteins constitute a family of large pore channels, forming oligomeric assemblies of different sizes (Ma et al., 2016). The family has six members in humans (CALHM1–6), three of which are labeled as dark channels, making it one of the families with the highest prevalence of dark ICs. It has been well established that CALHM1 is activated by removal of extracellular Ca²⁺ and by membrane depolarization, and that the heteromeric CALHM1 and CALHM3 channels are implicated in ATP release during taste perception (Ma et al., 2018). CALHM1 has also been shown to regulate cortical neuron excitability (Ma et al., 2012) and locomotion, and to induce neurodegeneration in Caenorhabditis elegans (Tanis et al., 2013). On the other hand, CALHM6 has been shown to be important for immune system functions by facilitating induction of immune cells during infection (Danielli et al., 2023). In contrast, the activation stimuli and physiological roles of other family members remain largely unexplored. All CALHM family members share a common arrangement in the TM domain, with each subunit consisting of four TM segments (S1-S4), forming a large cylindrical pore lined by the first TM segment S1 along with a short N-terminal helix preceding S1. The exact gating mechanism of the CALHM channels is still unclear, partly because all their cryo-EM structures have been in an open state. 
However, previous studies have indicated that the N-terminal region, called the amino-terminal helix (NTH), plays a crucial role in modulating voltage dependence and stabilizing the closed channel state (Tanis et al., 2017), while a more recent study has suggested that the voltage-dependent gate is formed by the proximal regions of S1 (Ma et al., 2025). Collectively, these studies indicate that the amino termini (the NTH and S1 regions) play a crucial role in the gating mechanism and in determining the pore size of CALHMs (Choi et al., 2019).

Thus, we performed a phylogenetic analysis to find evolutionarily conserved residues that might shed more light on these critical mechanisms that govern CALHM function. Figure 3A shows a phylogenetic tree depicting the evolutionary relationship across the six members of the CALHM family across diverse taxa. CALHM1 and 3 form a distinct clade from CALHM2, 4, 5, and 6. We first analyzed 5805 CALHM homologs to identify pattern positions conserved across all these sequences, with the hypothesis that such conserved positions could point to shared functional features across all CALHMs. The Bayesian analysis identified 13 aligned positions conserved across all six CALHM homologs, which we will refer to as CALHM shared patterns (Figure 3B and C). These conserved positions were mostly hydrophobic amino acid residues, together with five conserved cysteine residues, four of which are involved in forming inter-molecular disulfide bridges (C46=C130, C48=C162). Residue position numbers reflect the numbering based on human CALHM2 (PDB ID: 6uiv). Interestingly, five of the conserved positions (F44, Y56, I61, P64, W117) are located close to the amino termini in a functionally important linker region connecting S1 and S2 (S1-S2 linker). Specifically, this linker region could regulate the dynamic conformational changes of S1, where S1 could adopt either a vertical conformation relative to the membrane plane, resulting in an enlarged pore size, or a lifted conformation, leading to a reduced pore size (Figure 3D; Choi et al., 2019). Therefore, mutations in this linker are expected to affect channel function. Based on the positioning of these conserved residues, and on previous studies that highlight the importance of this region in gating, we hypothesized that these conserved pattern positions play a role in the gating mechanism of CALHM. 
The conservation of these residues across orthologs of all six CALHM sequences further suggests that all CALHM paralogs could share this gating mechanism.

Figure 3 with 1 supplement


Evolutionary analysis of calcium homeostasis modulator (CALHM) reveals conserved pattern positions.

(A) A phylogenetic tree depicting the evolutionary relationships across all six CALHM members with orthologs across different taxa. (B) The pattern positions conserved across all six CALHM members are shown as a weblogo, with the red bars indicating the significance of conservation (longer bars indicate higher significance), and are mapped onto a representative structure of human CALHM2. The four transmembrane helices are labeled S1-S4. Conserved residues that are targeted for mutation are marked with an asterisk in the labels. (C) Schematic representing the location of transmembrane regions and identified conserved pattern positions in a representative human CALHM2 sequence. The residues targeted for mutations are labeled with their corresponding positions in CALHM2, 1, and 6. (D) Cartoon representation of the CALHM2 structure (PDB IDs: 6uiv and 6uiw) in open and closed conformations, respectively. (E) List of disease variants and mutations performed in the conserved pattern positions for functional studies.

To test our hypothesis and determine the functional importance of the CALHM shared patterns, we designed a series of mutational experiments to perturb these positions, selecting the target mutation for each position methodically. We first scanned the Genome Aggregation Database (gnomAD) (Gudmundsson et al., 2022) to check for any prevalent variations at these conserved positions within the sampled population to use as our mutational targets. We found several disease variants at these positions (listed in Figure 3E), which were used to prioritize target mutations. For positions where no variation was found, we tested their significance by introducing alanine substitutions, which truncate the side chain at the β-carbon.

Targeted mutational and electrophysiological studies of CALHM conserved residues

To assess the functional roles of the predicted conserved residues in CALHM channels, we performed targeted mutational and electrophysiological analyses in two representative human CALHMs: the well-studied human CALHM1 and the relatively understudied human CALHM6.

Building on the insights from previous structural and functional studies (Ma et al., 2025; Choi et al., 2019), which implicated the NTH/S1 region in channel gating, we prioritized the constraints located in the S1-S2 linker and at the interface between S1 and the TM domain (Figure 3B, C, and E). These two regions were hypothesized to play distinct roles in gating: the S1-S2 linker as a flexible hinge and the S1-TMD interface as a stabilizing contact for S1 movement.

Both total protein western blotting and surface biotinylation assays showed that the mutants have either comparable or substantially higher expression levels than wild type, confirming that neither protein expression nor trafficking to the plasma membrane was impaired (Figure 4—figure supplement 1). We then established electrophysiological measurements for wild-type CALHM1 and CALHM6 following protocols described in the Methods. Our results were consistent with previous findings (Ma et al., 2012; Figure 4). In addition to room temperature, which is commonly used for patch-clamp studies of CALHM channels, we also conducted measurements at physiological temperatures (37°C). We observed that CALHM1 activity was significantly higher at physiological temperature than at room temperature, as reported previously (Kwon et al., 2021; Jeon et al., 2021), while the CALHM6 currents showed only a small increase (Figure 4). Notably, both CALHM1 and CALHM6 currents were inhibited by extracellular Gd³⁺ (Figure 5C and I), a commonly used inhibitor for the CALHM family, consistent with previous studies (Ma et al., 2012; Danielli et al., 2023; Ma et al., 2025). The robust currents at 37°C, particularly those of CALHM1, provide a solid basis to interpret the phenotypes of the mutants tested, as detailed below.

Figure 4 with 1 supplement


Electrophysiological studies of human calcium homeostasis modulator CALHM1 and CALHM6.

Whole-cell voltage-clamp recordings were performed in tsA cells overexpressing wild-type CALHM1 (A–D) and wild-type CALHM6 (E–H), as well as in non-transfected cells (I–L). Currents were measured under three conditions sequentially from the same cell: 5 mM Ca²⁺ at 22°C (A, E, I; black), 0 mM Ca²⁺ at 22°C (B, F, J; blue), and 0 mM Ca²⁺ at 37°C (C, G, K; red). Voltage steps ranged from −100 mV to +140 mV, followed by a final tail pulse at −100 mV, with a holding potential of 0 mV (protocol illustrated in the box on the right). Current-voltage (I–V) relationships were plotted in D, H, and L using mean current amplitudes (averaged across independent cells) measured at the end of the voltage steps. The arrow indicates the time point at which current amplitudes were measured. Error bars represent SEM (D, n=5; H, n=5; L, n=5).

Figure 5 with 1 supplement


Functional characterization of calcium homeostasis modulator CALHM1 and CALHM6 mutants at conserved residues at 37°C.

Whole-cell voltage-clamp recordings were performed in tsA cells overexpressing wild-type CALHM1 (A–C), CALHM1 mutant I109W (D–F), wild-type CALHM6 (G–I), and CALHM6 mutants Y51A (J–L) and W113A (M–O). Currents were measured under two conditions: 0 mM Ca²⁺ at 37°C (A, D, G, J, M; red) and 0 mM Ca²⁺ at 37°C plus 100 µM Gd³⁺ (B, E, H, K, N; blue), with both conditions recorded sequentially from the same cell. Voltage steps ranged from −80 mV to +120 mV, followed by a final tail pulse at −80 mV, with a holding potential of 0 mV (protocol shown in the box on the right). Current-voltage (I–V) relationships were plotted in C, F, I, L, and O using mean current amplitudes (averaged across independent cells) measured at the end of the voltage steps. The arrow indicates the time point at which the current was measured. The numbers of independent measurements (cells) were: n=5 (C); n=5 (F); n=5 (I); n=5 (L); n=5 (O). Error bars represent SEM. (P, Q) Current amplitudes obtained using a two-step voltage protocol (from +120 mV to −80 mV; protocol shown in the box on the right) are compared between wild-type CALHM1 and its mutants (P), and between wild-type CALHM6 and its mutants (Q). Each dot represents an independent measurement (cell), and each bar represents the mean current amplitude across cells. The numbers of independent measurements (cells) for each bar in P and Q are shown from left to right: 5, 8, 5, 8, 6, 7, 5, 5, 5, 6, 7, 7 (P); 5, 6, 5, 7, 7, 5, 6, 6 (Q). Statistical analysis was performed using one-way ANOVA with Bonferroni’s post hoc test, comparing each mutant to wild type (*p<0.05; **p<0.01; ***p<0.001).

Mutations of a predicted conserved residue in the S1-S2 linker (F40 in CALHM1; F39 in CALHM6) either abolished or markedly reduced channel activity in both CALHM1 and CALHM6, presumably by impeding the conformational dynamics of S1 required for channel gating (Figure 5P and Q; Figure 5—figure supplement 1). Similarly, mutation of a conserved tyrosine residue on S2 (Y52 and Y51 in CALHM1 and CALHM6, respectively), whose side chain directly contacts the S1-S2 linker, also resulted in strong phenotypic changes: Y52A in CALHM1 abolished channel activation (Figure 5P and Q; Figure 5—figure supplement 1), while Y51A in CALHM6 converted the channel from voltage-independent to voltage-dependent gating, resulting in outward rectification (Figure 5J–L, P, and Q; Figure 5—figure supplement 1). These residues are likely key determinants controlling the conformational dynamics during gating.

Interestingly, mutations in other conserved residues, near – but not within – the S1-S2 linker (W114, L57, and P60 in CALHM1; W113 and A127 in CALHM6), also abolished or markedly reduced channel activity (Figure 5P and Q; Figure 5—figure supplement 1). Among these, W114 in CALHM1 and W113 in CALHM6 directly contact S1. Thus, substituting these bulky hydrophobic residues with smaller (cysteine or alanine) or positively charged (arginine) residues is expected to alter the conformational dynamics of S1 and therefore impair channel gating. Notably, however, the CALHM6 W113A mutant was an exception, retaining wild-type-like currents at 37°C and exhibiting voltage dependence and inhibition by extracellular Gd³⁺ similar to wild-type CALHM6 (Figure 5M–O), suggesting that this mutation does not impair fundamental gating properties.

Finally, since most mutants showed either reduced or completely abolished activity, we included a positive control in the electrophysiological experiments: I109W on CALHM1, which has been previously documented to increase channel activity. As expected, we reproduced this gain-of-function phenotype (Figure 5). To further explore channel-specific differences, we also examined the corresponding mutant in CALHM6 (L108W). Interestingly, L108W decreased CALHM6 channel activity (Figure 5), possibly reflecting inherent differences between the two channels.

Discussion

Here, for the first time, we have computationally defined the full IC complement of the human genome – the ‘channelome’. We provide this comprehensive list and a rich annotation of functional, structural, and sequence features, mapping them to 4 widely accepted IC groups and further classifying them into 55 families. We also highlight unclassified outlier groups that contain most of the understudied ‘dark’ and potentially novel unclassified IC sequences. As part of our curation, we also provide annotation for the pore-containing domain that is based on multiple sources, including an extensive literature review, further strengthened by the application of literature mining using LLMs and structural analysis tools. We use the pore-containing domain to distinguish pore-containing from auxiliary ICs, providing additional functional context. Our annotations can serve as a reference for comparing ICs within or across families, designing experiments for functional studies, and identifying potential targets to illuminate the functional relevance of understudied ICs further.

The application of LLMs, specifically RAG systems, in our research demonstrates significant potential in automating literature mining and correcting certain inaccuracies. However, the system currently faces limitations related to diverse data presentation formats, complex channel subunit relationships, and variant gene nomenclature, which hinder its effectiveness. The high frequency of responses where evidence was not found underscores the need for enhanced entity recognition capabilities, particularly for accurately identifying gene and protein names across different nomenclatures. Additionally, the RAG system’s ability to critically engage with data by identifying and amending presumed inaccuracies can be widely used to verify and correct annotation entries, though this also raises concerns about potential overcorrections or misinterpretations without adequate validation. With further data cleaning and standardization, LLMs and RAG systems can become more reliable and integral tools for curating and expanding bioinformatics databases, ultimately enhancing the accuracy and comprehensiveness of our IC knowledgebase.

To the best of our knowledge, the identification of true orthologs across the tree of life performed here is the first large-scale effort to consolidate IC orthologs across diverse organisms, and it provides a comprehensive view of the evolutionary conservation of individual ICs. Based on this analysis, we could pinpoint the Aquaporin family as the most ancient family of ICs, with orthologous sequences traceable back to bacterial and archaeal species. More importantly, the clustering of human ICs based on their depth of conservation placed them into functionally meaningful clusters with many enriched functions exclusive to their orthologous taxonomic group. The placement of understudied ICs in such groups allows for meaningful functional predictions, which can be further elucidated using experimental approaches.

The orthologous sequences identified and presented here can serve as a valuable resource for performing evolutionary and functional analysis using statistical and machine learning approaches. We demonstrate one such use by performing a Bayesian pattern-based classification on the CALHM subset of ICs to identify amino acid positions significantly conserved across all CALHM sequences. These features reside on a functionally important S1-S2 linker region that potentially governs their gating mechanism. Because these features were conserved and shared across all CALHM sequences, we also hypothesize that all CALHM sequences share this gating mechanism. By performing targeted mutations on these conserved residues, we show that mutations at nearly all of the tested positions result in a dramatic loss or alteration of channel activity, with W113A in CALHM6 as the notable exception, thus highlighting the functional importance of the identified pattern positions.

Among the conserved positions targeted for mutations, mutations of residues in, or in direct contact with, the S1-S2 linker (F40C in CALHM1; F39L in CALHM6; Y52A in CALHM1; and Y51A in CALHM6) resulted in strong phenotypic changes, most likely due to their direct involvement in the conformational dynamics of S1 for channel gating. Mutations of W114 in CALHM1, which is not directly on the S1-S2 linker, also abolished activity; however, the W113A mutation at the analogous position in CALHM6 did not impair channel function, presenting an exception compared to other residues across CALHM1 and 6. Among other residues that are near – but not within – the S1-S2 linker, the positioning of P60 on the S2 helix of CALHM1 is particularly intriguing. In the cryo-EM structure, P60 resides approximately midway along S2, inducing a distortion or bending of the helix due to the rigid ring structure of proline. This bent configuration allows extensive contact between the extracellular portion of S2 and the S1-S2 linker, while the intracellular portion of S2 interacts extensively with S1. Furthermore, we hypothesize that this bent conformation of S2 contributes to the flexibility of the S1-S2 linker, because straightening S2 would in turn straighten the S1-S2 linker, potentially reducing its flexibility. Replacing the proline with glycine eliminates the structural constraint imposed by proline’s cyclic side chain, allowing S2 to adopt a more regular, uninterrupted helical structure. Consequently, a straightened S2 may lead to reduced flexibility of the S1-S2 linker, ultimately impacting channel gating. Similarly, L57 in CALHM1 is positioned near P60 on the protrusion of the bend of S2, with its side chain facing residues on the adjacent S3. We hypothesize that the larger side chain of the L57R mutant would force S2 to straighten due to steric collision between the bulky arginine side chain and residues on S2, again reducing the flexibility of the S1-S2 linker and altering channel gating. 
A127 in CALHM6 resides in a short alpha helix within the extracellular domain. While A127 does not directly interact with the S1-S2 linker, it is adjacent to the highly conserved disulfide bond (between C41 and C126), linking the helix containing A127 to the S1-S2 linker. Thus, it is conceivable that the A127G mutant may indirectly affect the conformation of the S1-S2 linker through this disulfide bond.

These findings complement prior studies and provide additional insights into CALHM gating mechanisms. Several previous studies examined conserved residues in CALHM channels and proposed roles in gating (Tanis et al., 2013; Danielli et al., 2023; Tanis et al., 2017; Ma et al., 2025; Kwon et al., 2021; Syrjänen et al., 2023). These studies were based on comparisons of a limited set of homologs from model organisms such as C. elegans or mouse and identified residues in various parts of the protein that influence gating, providing important clues toward their functioning. Here, we have extended the analysis to encompass thousands of CALHM sequences collected from the entire tree of life, allowing us to identify residues conserved across all family members through evolution. Guided by a hypothesis from previous structural and functional studies (Ma et al., 2025; Choi et al., 2019), which highlighted the NTH/S1 region as a key element in channel gating, we focused on evolutionarily conserved residues in the S1-S2 linker and at the interface of S1 with the rest of the TMD. We reasoned that if S1 movement is critical for gating, then these two structural elements – the S1-S2 linker, acting as a hinge, and the S1 interface with the TMD, serving as a stabilizing contact – would be key determinants of the conformational dynamics of S1.

Together, our experimental data indicate that the conserved residues at the S1-S2 linker and its immediate surroundings play an important role in CALHM channel gating, supporting a model in which S1 dynamics are central to the gating mechanism (Ma et al., 2025; Choi et al., 2019). Furthermore, the contrasting phenotypes of Y52A in CALHM1 vs Y51A in CALHM6, W114A in CALHM1 vs W113A of CALHM6, and I109W in CALHM1 vs L108W in CALHM6 highlight differences in gating properties specific to each family member. Overall, the complementary results from this study and the published literature highlight the complexity of CALHM gating and suggest that distinct conserved elements contribute to both shared and lineage-specific gating mechanisms in this unique family of large-pore channels. Importantly, this was demonstrated not only for the well-studied CALHM1 but also for the relatively understudied human CALHM6, providing valuable clues into shared gating features across light and dark CALHM sequence sets.

In addition to the CALHM shared patterns, we also present a second subset of conserved positions shared only within CALHM2, 4, 5, and 6, paralogs that are evolutionarily closer compared to CALHM1 and 3 (Figure 3—figure supplement 1). These subsets of conserved residues fall in the intracellular helical segment that is involved in oligomerization and formation of the pore complex governing CALHM function. Previous studies have indicated that CALHM2 adopts a unique undecameric oligomer different from the CALHM1 octamer (Syrjanen et al., 2020). Thus, with the presence of shared patterns within the subset of CALHM2 and the understudied CALHM4, 5, and 6 in this intracellular helical region, we hypothesize that these conserved positions help maintain a similar mode of oligomerization between these evolutionarily related subsets of CALHMs. Because many disease mutations map to these conserved positions, we believe the datasets and approaches offer a powerful avenue for elucidating the functional and clinical relevance of the understudied ICs.

Methods

Identification and annotation of human ICs

The annotation of human ICs was performed using a semi-automated pipeline. First, all the protein sequences that were labeled as ICs were collected from the UniProt (UniProt Consortium, 2018), KEGG (Kanehisa and Goto, 2000; Kanehisa et al., 2023), Pharos (Kelleher et al., 2023), GtoP (Alexander et al., 2023), and HGNC (Seal et al., 2023) databases. The annotations described in Table 1 were then collected from literature sources manually and compiled together. The UniProt-specific labels and the complex formation information were extracted from the UniProt database. The sequences were run through TMHMM (Krogh et al., 2001) and Phobius (Käll et al., 2007) to predict TM and helical regions. The predictions were cross-referenced and confirmed against the CDD (Lu et al., 2020), Pfam (Mistry et al., 2021), PrositePattern, PrositeProfiles (Henschel et al., 2007), and Simple Modular Architecture Research Tool (SMART) (Letunic et al., 2021) databases, where a reference was available. Finally, the prediction of the pore region and pore-lining residues was supplemented using the MOLE software (Sehnal et al., 2013). The prediction of the pore region and the information available in previous literature was combined to annotate the final set of auxiliary IC sequences. An auxiliary IC is defined as any sequence that itself does not have a pore domain, but has experimental evidence of being part of an IC complex. Thus, any IC where a pore domain could not be found was subjected to additional literature review to find evidence for their interactions with the pore-containing ICs, and if the evidence was found, they were included as auxiliary ICs; otherwise, they were removed from our curated IC list.

RAG system for verifying ion selectivity and gating mechanism annotations

To systematically evaluate the ion selectivity and gating mechanism fields in the curated human IC dataset, we built a RAG pipeline comprising three sequential stages: corpus construction and vectorization, query formulation and similarity search, and evidence synthesis with an LLM. All scripts and configuration files are archived in the project repository (https://github.com/esbgkannan/ionchannels-final-pdf, copy archived at Kannan, 2026).

Corpus construction and vectorization

PubMed identifiers (PMIDs) linked to each IC were obtained from our annotation pipeline. Full-text PDFs were downloaded, converted to plain text with the pdfminer.six Python library, and segmented into overlapping fragments of approximately 1000 tokens (50-token overlap) using the CharacterTextSplitter module of LangChain. Each fragment was embedded with the OpenAI text-embedding-3-large model (3072-dimensional vectors). Embeddings and fragment metadata (PMID, page number, fragment index) were stored in a local Qdrant vector database. Prior to embedding, boilerplate headers, footers, and reference lists were removed to retain only article body text.
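The overlapping segmentation described above can be sketched in a few lines. This is a simplified stand-in rather than LangChain's CharacterTextSplitter: it counts whitespace-separated words as a rough proxy for tokens, and the function name and defaults are assumptions chosen to mirror the sizes quoted in the text.

```python
def split_with_overlap(text: str, chunk_size: int = 1000, overlap: int = 50) -> list[str]:
    """Split text into fragments of up to chunk_size tokens.

    Each fragment repeats the last `overlap` tokens of the previous one,
    so a sentence cut at a boundary still appears whole in one fragment.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    tokens = text.split()  # whitespace words: a rough proxy, not a model tokenizer
    step = chunk_size - overlap
    fragments = []
    for start in range(0, len(tokens), step):
        fragments.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this fragment already reaches the end of the text
    return fragments


# Small demonstration with 25 placeholder tokens, chunks of 10, overlap of 2.
demo_text = " ".join(f"t{i}" for i in range(25))
demo_fragments = split_with_overlap(demo_text, chunk_size=10, overlap=2)
print(len(demo_fragments))
```

The overlap trades a small amount of duplicated text for a lower chance that the evidence for one claim is split across two fragments.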

Query formulation and similarity search

For every IC, two natural-language questions were generated automatically: (1) ‘Is there any evidence that <ION> is the ion selectivity of the <IC-NAME> ion channel?’, and (2) ‘Does this article provide evidence for <GATING> as the gating mechanism for the <IC-NAME> ion channel?’.

Here, <IC-NAME> encompasses the UniProt primary name and all recognized synonyms (produced by the extract_alternative_names.py script in https://github.com/esbgkannan/ionchannels-final-pdf). Each query was submitted to Qdrant to retrieve relevant fragments from the database. When PMIDs were available in the dataset, an initial PMID-filtered search was performed; filtered and unfiltered results were subsequently merged and fragments with the highest cosine similarity retained.
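The merge of PMID-filtered and unfiltered results can be illustrated with a small in-memory search. The sketch below does not use the Qdrant client; the fragment records, the two-dimensional vectors, and the function names are hypothetical, and the ranking is a plain cosine similarity over a Python list.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def search(query_vec, fragments, top_k, pmids=None):
    """Rank fragments by similarity, optionally restricted to the given PMIDs."""
    pool = [f for f in fragments if pmids is None or f["pmid"] in pmids]
    scored = [(cosine_similarity(query_vec, f["vector"]), f["id"]) for f in pool]
    return sorted(scored, reverse=True)[:top_k]


def merged_search(query_vec, fragments, top_k, pmids=None):
    """Merge filtered and unfiltered hits, keeping the best-scoring unique fragments."""
    hits = search(query_vec, fragments, top_k)
    if pmids:
        hits += search(query_vec, fragments, top_k, pmids)
    best = {}
    for score, frag_id in hits:
        best[frag_id] = max(score, best.get(frag_id, -1.0))
    return sorted(((s, i) for i, s in best.items()), reverse=True)[:top_k]


# Toy corpus (hypothetical identifiers and vectors).
toy_fragments = [
    {"id": "f1", "pmid": "111", "vector": (1.0, 0.0)},
    {"id": "f2", "pmid": "222", "vector": (3.0, 4.0)},
    {"id": "f3", "pmid": "333", "vector": (0.0, 1.0)},
]
print(merged_search((1.0, 0.0), toy_fragments, top_k=2, pmids={"333"}))
```

Deduplicating by fragment identifier before the final ranking prevents a fragment that appears in both result lists from occupying two of the retained slots.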

Evidence synthesis with an LLM

The retrieved fragments were inserted into a fixed prompt and supplied to GPT-4o (temperature = 0). The model was instructed to return a JSON object with three keys: answer (Found or Not Found), confidence (a number between 0 and 1), and evidence (an explanation of the LLM’s decision for the answer). Outputs were parsed with LangChain’s JsonOutputParser, combined with fragment metadata, and compiled into a single JSON file for further analysis. Predictions with confidence ≥0.80 were accepted automatically; lower-confidence or Not Found results triggered manual review. The structured records were merged with the annotation table, enabling automatic comparison with pre-existing entries and flagging of discrepancies for curator review.
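The acceptance rule can be expressed as a small triage function. The sketch assumes the three keys named above and uses the standard json module in place of LangChain's parser; the function name, the return labels, and the example responses are hypothetical.

```python
import json

CONFIDENCE_THRESHOLD = 0.80  # cutoff stated in the text


def triage(raw_output: str) -> str:
    """Return 'accept' or 'manual_review' for one model response.

    A response is accepted only when the answer is 'Found' and the
    confidence reaches the threshold; everything else goes to a curator.
    """
    record = json.loads(raw_output)
    answer = record["answer"]
    confidence = float(record["confidence"])
    if answer == "Found" and confidence >= CONFIDENCE_THRESHOLD:
        return "accept"
    return "manual_review"


# Hypothetical responses, for illustration only.
examples = [
    '{"answer": "Found", "confidence": 0.93, "evidence": "placeholder"}',
    '{"answer": "Found", "confidence": 0.55, "evidence": "placeholder"}',
    '{"answer": "Not Found", "confidence": 0.95, "evidence": "placeholder"}',
]
print([triage(e) for e in examples])
```

Routing every Not Found answer to manual review, regardless of confidence, keeps the model from silently removing an annotation.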

Entries for which no supporting evidence could be located (model answer ‘Not Found’ confirmed on manual inspection) are marked with an asterisk in Supplementary file 1A. An illustrative exchange is provided in Figure 1—figure supplement 1.

Defining a pore-containing functional domain

We aimed to define the pore-containing functional domain so that it spanned the pore region and included all the TM domains present in an IC. This ensured that our domain definition included the pore region and any other functionally important TMs, while excluding any accessory domains on the flanking sequences. To ensure we did not miss any TM regions, four different sources of annotation information were combined to define the pore-containing functional domain for the 343 ICs. (1) The TM predictions from TMHMM (Krogh et al., 2001) and Phobius (Käll et al., 2007) were first used to identify the TM regions. (2) The UniProt-based TM annotations were matched with the predictions to verify the TM regions further. (3) Literature-based TM annotations, where available, were then used to manually verify the TM positions and organization. (4) Where experimentally resolved crystal structure coordinates were available, the MOLE software was used to identify the pore-lining residues.
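One simple way to picture a domain that includes every TM segment is to take the smallest interval covering all segments reported by any source. The sketch below illustrates that idea only; it is an assumption about how the sources could be combined, not the curation procedure itself, and the source labels and coordinates are hypothetical.

```python
def domain_span(tm_sources):
    """Smallest (start, end) interval covering every TM segment from any source.

    tm_sources maps a source label to a list of (start, end) residue ranges.
    Returns None when no source reports a TM segment.
    """
    segments = [seg for source in tm_sources.values() for seg in source]
    if not segments:
        return None
    start = min(s for s, _ in segments)
    end = max(e for _, e in segments)
    return start, end


# Hypothetical residue ranges for one channel.
toy_sources = {
    "TMHMM": [(20, 42), (60, 82)],
    "Phobius": [(18, 41), (61, 83), (100, 122)],
    "UniProt": [(20, 40), (60, 80), (101, 121)],
}
print(domain_span(toy_sources))  # (18, 122)
```

Taking the union across sources means a TM helix missed by one predictor is still included, while flanking accessory regions outside the outermost helices are left out.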

Pairwise sequence alignment using sequence embeddings

The pore-containing functional domains for the 343 pore-containing ICs were passed to DEDAL (Llinares-López et al., 2023), which was run using standard parameters and resulted in homology logit scores based on an all-vs-all pairwise sequence alignment using their sequence embeddings. These scores were used to generate a sequence similarity matrix passed to the umap function of the umap_learn package v0.5.7 (McInnes et al., 2018) in Python 3.9 to generate 2D UMAP embeddings. This was used to generate a 2D scatterplot that defined the placement of individual ICs shown in Figure 1—figure supplement 3. Finally, an average position of ICs within a family was used to define the placement of that family in Figure 1.
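The final step, placing each family at the average position of its members, amounts to computing a centroid per family. A minimal sketch, assuming 2D coordinates are already available; the coordinates and family assignments shown are hypothetical and the function name is illustrative.

```python
from collections import defaultdict


def family_centroids(coords, family_of):
    """Average the 2D positions of all ICs belonging to each family."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # running x, y, and member count
    for ic, (x, y) in coords.items():
        acc = sums[family_of[ic]]
        acc[0] += x
        acc[1] += y
        acc[2] += 1
    return {fam: (sx / n, sy / n) for fam, (sx, sy, n) in sums.items()}


# Hypothetical embedding coordinates, for illustration only.
toy_coords = {"CALHM1": (1.0, 2.0), "CALHM3": (3.0, 4.0), "TMC4": (-2.0, 0.0)}
toy_family = {"CALHM1": "CALHM", "CALHM3": "CALHM", "TMC4": "TMC"}
print(family_centroids(toy_coords, toy_family))
```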

Orthology detection and analysis

The KinOrtho pipeline (Huang et al., 2021) was used for defining the orthologs and co-orthologs of human ICs across the tree of life. KinOrtho employs a graph-based orthology inference approach using both the full-length and the pore domain regions for inferring orthologous relationships, primarily relying on sequence similarity within these regions. The pipeline began with pairwise sequence similarity searches using both full-length and pore-containing domain sequences of human ICs against the target proteomes database. Since a pore domain needs to be defined for this analysis, we only used the 343 pore-containing IC sequences to perform orthology detection. They were used as queries for running KinOrtho against a reference database of curated canonical proteomes from the UniProt Proteomes Release 2022_05 that consisted of 343 Archaea, 696 Bacteria, and 548 Eukaryota proteomes. The initial sequence similarity search was conducted using BLASTp (Camacho et al., 2009) with default parameters and an e-value cutoff of 1e-5 to retain high-confidence hits. The top hit from this forward search was then used as a query in a reciprocal BLASTp search against the human proteome, applying a more stringent e-value threshold of 1e-200 to retain only the top and high-confidence hits. Pairs of sequences were retained only if each was the top hit for the other in this reciprocal search. Such a reciprocal best-hit strategy minimizes false positives by ensuring that the candidate orthologs are each other’s most similar sequence in the respective proteomes, which is particularly important when distinguishing true orthologs from paralogs or other homologs that may share partial similarity but have diverged functionally (Hernández-Salmerón and Moreno-Hagelsieb, 2020).
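The reciprocal best-hit filter described above can be sketched on pre-parsed BLASTp hits as shown below. This is not the KinOrtho implementation: the tuple layout, the use of bit score to rank hits, and the sequence identifiers are assumptions, while the two e-value cutoffs are taken from the text.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str, float, float]  # (query, subject, e-value, bit score)


def best_hits(hits: Iterable[Hit], max_evalue: float) -> Dict[str, str]:
    """Return the top-scoring subject per query among hits passing the cutoff."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, evalue, bitscore in hits:
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward: Iterable[Hit], reverse: Iterable[Hit],
                         forward_cutoff: float = 1e-5,
                         reverse_cutoff: float = 1e-200) -> List[Tuple[str, str]]:
    """Keep (human, target) pairs in which each is the other's best hit."""
    fwd = best_hits(forward, forward_cutoff)
    rev = best_hits(reverse, reverse_cutoff)
    return sorted((q, s) for q, s in fwd.items() if rev.get(s) == q)


# Hypothetical hits: human queries against a target proteome, and the reverse.
forward_hits = [
    ("hCALHM1", "mCalhm1", 0.0, 650.0),
    ("hCALHM1", "mCalhm3", 1e-40, 150.0),
    ("hKCNA1", "xFragment", 1e-3, 40.0),  # fails the forward cutoff
]
reverse_hits = [
    ("mCalhm1", "hCALHM1", 0.0, 648.0),
    ("mCalhm3", "hCALHM3", 1e-210, 500.0),
]
print(reciprocal_best_hits(forward_hits, reverse_hits))
```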

All the retained hit pairs were then passed to cluster analysis using OrthoMCL (Li et al., 2003), followed by filtering of relationships to keep only relationships within the same cluster. This process was conducted separately using both the full-length human IC sequences and their pore domain sequences. Only sequences that were identified as orthologs in both the full-length and domain-based analyses were retained as high-confidence orthologs. This ensured that both overall sequence similarity and conservation of functionally critical domains were satisfied.

Finally, the ortholog sequence sets were subjected to a series of validation tests that serve as additional guardrails against duplicates, fragments, and extraneous hits, so that only sequences with high-quality annotation were retained as true orthologous relationships. The validation proceeded in the following steps:

Step 1: Check UniProt Entry Type: We first check the ‘entryType’ flag in UniProt to determine whether the sequence is ‘Reviewed’ or ‘Unreviewed’. ‘Reviewed’ sequences are manually annotated and reviewed, and are therefore of the highest quality and safe to include. Any sequences with this status are passed. Sequences with ‘Unreviewed’ status proceed to Step 2.

Step 2: Check for Protein Existence Evidence: For the ‘Unreviewed’ sequences, check their ‘proteinExistence’ flag. This flag can have one of five values: (1) Evidence at protein level. (2) Evidence at transcript level. (3) Inferred from homology. (4) Predicted. (5) Uncertain. Flags 1, 2, and 3 are dependable evidence of protein existence, thus are passed, whereas 4 and 5 are low confidence and subjected to further verification in Step 3.

Step 3: For proteins with low-confidence (4 or 5) protein existence values, we next check five different measures/flags to ensure their integrity:

Check the sequence version for unusual patterns. Multiple updates with high sequence versions or a very old version update indicate unstable or deprecated sequences, respectively.
Perform sequence-level checks:
1. Check for compositional bias: Sequences with compositional bias might be low-complexity regions or repeats.
2. Check for proportion of non-standard amino acids: Presence of a large number of non-standard amino acids could also indicate poor protein annotation.
3. Check sequence length: Unusually short (<30 aa) or long (>5000 aa) sequences could indicate fragments or fusion errors and misannotation.
Check for cross-references: If the protein completely lacks annotated domains or additional cross-referenced metadata, it might be a poorly characterized or erroneous protein.

Any sequence subjected to Step 3 must pass all the checks in this step to be included as an orthologous hit.
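The three-step validation can be read as a simple decision cascade, sketched below. The `Entry` fields, the version threshold, and the non-standard residue threshold are hypothetical, because the text gives no numeric cutoffs for them; only the length bounds (30 and 5000 aa) come from the text. The check for a very old version update is omitted here because it would need date information.

```python
from dataclasses import dataclass

STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")
HIGH_CONFIDENCE_PE = {1, 2, 3}  # protein, transcript, or homology-level evidence


@dataclass
class Entry:
    """Minimal, hypothetical view of a UniProt record."""
    reviewed: bool
    protein_existence: int  # 1 to 5, as listed in Step 2
    sequence: str
    sequence_version: int
    n_cross_refs: int
    has_compositional_bias: bool = False


def passes_step3(entry: Entry, max_version: int = 10,
                 max_nonstandard_frac: float = 0.05) -> bool:
    """Apply the Step 3 checks; every check must pass.

    max_version and max_nonstandard_frac are illustrative assumptions.
    """
    length = len(entry.sequence)
    nonstandard = sum(aa not in STANDARD_AA for aa in entry.sequence)
    checks = [
        entry.sequence_version <= max_version,
        not entry.has_compositional_bias,
        length > 0 and nonstandard / length <= max_nonstandard_frac,
        30 <= length <= 5000,
        entry.n_cross_refs > 0,
    ]
    return all(checks)


def is_valid_ortholog(entry: Entry) -> bool:
    """Step 1 (reviewed), then Step 2 (existence evidence), then Step 3."""
    if entry.reviewed:
        return True
    if entry.protein_existence in HIGH_CONFIDENCE_PE:
        return True
    return passes_step3(entry)


fragment = Entry(False, 4, "M" * 20, 1, 3)
print(is_valid_ortholog(fragment))
```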

48,694 unique orthologous relationships from 36,846 sequences passed the orthology pipeline and validation checks and were included in the final list of orthologous sequences. The list of all the orthologous sequences that passed the validation check is provided in Supplementary file 1C.

To illustrate the extent of these orthologous sequences across different taxonomic lineages, they were used to create a presence/absence matrix of orthologs, with rows representing each human IC and columns representing the different organisms. To reduce individual granularity and get estimates at the family and lineage level for visualization, first, the orthologs were grouped by IC families and then, by their defined taxonomic lineages. For each group representing one IC family and one taxonomic lineage, a percentage value representing the proportion of detected orthologs was calculated using the following: (total number of orthologs found for all ICs in a family)/(total number of organisms queried in the taxonomic lineage * number of human IC sequences in the family). These percentages are depicted in the heatmap in Figure 2A, where each cell represents an IC family and its proportion of orthologs in a given taxonomic lineage.
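The percentage defined above can be computed directly from a presence/absence mapping, as in this sketch. The data layout and identifiers are assumptions for illustration. Each IC is counted at most once per organism here, which is a simplification if co-orthologs are counted individually.

```python
from typing import Dict, Iterable, Set


def family_lineage_percentage(presence: Dict[str, Set[str]],
                              family_members: Iterable[str],
                              lineage_organisms: Iterable[str]) -> float:
    """Percentage of (IC, organism) pairs in one family and lineage with an ortholog.

    presence maps each human IC to the set of organisms in which an
    ortholog was detected.
    """
    members = list(family_members)
    organisms = set(lineage_organisms)
    possible = len(organisms) * len(members)
    if possible == 0:
        raise ValueError("family and lineage must both be non-empty")
    found = sum(len(presence.get(ic, set()) & organisms) for ic in members)
    return 100.0 * found / possible


# Hypothetical family of two ICs queried against a lineage of four organisms.
example_presence = {"IC_A": {"org1", "org2"}, "IC_B": {"org1"}}
print(family_lineage_percentage(example_presence, ["IC_A", "IC_B"],
                                ["org1", "org2", "org3", "org4"]))
```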

Next, the presence/absence matrix of individual ICs was used to perform an orthology profiling clustering and enrichment analysis. Since this analysis is human IC-centric, only the orthologs from eukaryotic lineages were selected. Hierarchical clustering was performed with the Ward method of clustering and Euclidean distance metric using the SciPy package (Virtanen et al., 2020) in Python. The resulting dendrogram was used to define nine clusters that group ICs with similar presence/absence patterns. Human ICs falling in these nine clusters were then subjected to a functional enrichment analysis using the Gene Ontology resource GO enrichment tool (Carbon et al., 2021). GO terms with a corrected FDR <0.01 were retained as significantly enriched terms.
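A minimal sketch of the clustering step is given below, using SciPy with Ward linkage and Euclidean distance. The toy matrix, the function name, and the choice of `fcluster` with the `maxclust` criterion to cut the dendrogram are assumptions; the text does not state how the nine clusters were delimited.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_profiles(matrix: np.ndarray, n_clusters: int) -> np.ndarray:
    """Cluster presence/absence profiles (rows) into flat cluster labels."""
    tree = linkage(matrix, method="ward", metric="euclidean")
    # Cut the dendrogram so that at most n_clusters flat clusters remain.
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Toy matrix: rows are ICs, columns are organisms (1 = ortholog present).
toy_profiles = np.array([
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
], dtype=float)
toy_labels = cluster_profiles(toy_profiles, n_clusters=2)
print(toy_labels)
```

For the analysis described above, `n_clusters` would be set to 9 and the input would be the eukaryote-only presence/absence matrix.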

CALHM evolutionary analysis

BPPS was used to perform a pattern-based classification of the CALHM homologs. First, orthologs for all six human CALHMs were collected. This ortholog dataset was supplemented with more hits from the UniProt database using MAPGAPS, a multiply-aligned profile for global alignment of protein sequences (Neuwald, 2009). Along with finding the best hits for the human CALHMs, MAPGAPS also aligns those hits to the template profile alignment to generate a large multiple-sequence alignment of the resulting 5805 CALHM sequences. This large alignment was then subjected to BPPS, which performs a hierarchical classification of the sequence sets based on conserved pattern positions shared by subsets of sequences using a Bayesian statistical procedure (Neuwald, 2014). This generates a hierarchy of clusters in which the sequences within each cluster are defined by distinct conserved patterns.

Cell lines

HEK293T cells were purchased from Sigma-Aldrich (Catalog Number: 96121229). Neuro2A cells were purchased from ATCC (Catalog Number: CCL-131). The cells were authenticated and tested negative for mycoplasma contamination by the vendor.

CALHM plasmid construction and expression by transient transfection

Full-length human CALHM1 and CALHM6 in the pEGC Bacmam vector were used. The translated product contains the human CALHM1 or CALHM6 protein, a thrombin digestion site (LVPRGS), an enhanced GFP protein, and an 8× His tag. Primers for site-directed mutagenesis were designed using SnapGene and synthesized by Eurofins Genomics. The QuikChange mutagenesis protocol was used to generate all the mutants in the study. Sanger sequencing was performed to identify positive clones.

Adherent HEK293T (ECACC, Catalog Number: 96121229) cells were grown in DMEM media supplemented with 10% fetal bovine serum. Transient transfection was conducted using Lipofectamine 2000 by following the manufacturer’s protocol. Specifically, the cells were cultured in 60 mm Petri dishes until 80% confluency. Transfection solution was made by mixing 500 ng of plasmid DNA, 4 µL of Lipofectamine 2000 reagent, and 100 µL Opti-MEM media. After 20 min incubation at room temperature, the DNA-lipid complexes were added to the cell culture and incubated at 37°C. The next day, 10 mM sodium butyrate was added to the cells to boost protein expression. The cell culture was then grown at 30°C for another day before harvesting. The cell pellet was flash-frozen with liquid nitrogen and stored at –80°C. Each CALHM1 and CALHM6 mutant was transfected in triplicate as biological replicates.

Expression analysis of CALHM1, CALHM6, and their mutants

To analyze total expression levels of wild-type CALHM1, wild-type CALHM6, and their mutants, cells were lysed in TBS buffer (20 mM Tris, pH 8.0, 150 mM NaCl) supplemented with 10% lauryl maltose neopentyl glycol and cholesterol hemisuccinate detergents on ice. Lysates were solubilized at 4°C for 1 hr and clarified by centrifugation at 13,000 rpm for 5 min. The supernatant was mixed with 4× SDS loading buffer containing 5% 2-mercaptoethanol and resolved on a 4–20% gradient SDS-PAGE gel.

In-gel fluorescence imaging of the C-terminal GFP tag was performed immediately after electrophoresis using a ChemiDoc system to visualize CALHM1 and CALHM6 proteins. Following imaging, proteins were transferred to a nitrocellulose membrane using semi-dry transfer buffer (48 mM Tris base, 39 mM glycine, 20% methanol). Membranes were blocked with TBST (20 mM Tris, pH 8.0, 150 mM NaCl, 0.1% Tween-80) containing 4% non-fat milk for 1 hr at room temperature. β-Actin was detected as a loading control by incubating membranes with HRP-conjugated anti-β-actin antibody (Proteintech, Catalog Number: HRP-60008; 1:2000 dilution) for 1 hr at 4°C. After four washes with TBST (15 min each), chemiluminescence signals were developed using Pierce ECL substrate and imaged on a ChemiDoc system. A brightfield image was overlaid to visualize protein marker positions.

Each mutant was analyzed in triplicate from independently transfected cell pellets. GFP signal intensities for CALHM1 and CALHM6 were quantified using ImageJ and normalized to β-actin chemiluminescence signals. Mean values and SEM were calculated for each mutant, and relative expression levels were compared to wild-type proteins using bar graphs generated in Microsoft Excel.
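The quantification described above amounts to a ratio followed by a mean and SEM across triplicates. The sketch below is illustrative only: the intensity values are invented, the function names are hypothetical, and the use of the sample standard deviation (n − 1) for the SEM is an assumption.

```python
from math import sqrt
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def normalized_expression(gfp: Sequence[float], actin: Sequence[float]) -> List[float]:
    """Divide each GFP band intensity by its matching beta-actin intensity."""
    if len(gfp) != len(actin):
        raise ValueError("GFP and actin measurements must be paired")
    return [g / a for g, a in zip(gfp, actin)]


def mean_sem(values: Sequence[float]) -> Tuple[float, float]:
    """Return the mean and standard error of the mean (sample SD / sqrt(n))."""
    if len(values) < 2:
        raise ValueError("at least two replicates are required")
    return mean(values), stdev(values) / sqrt(len(values))


def relative_to_wild_type(mutant: Sequence[float], wild_type: Sequence[float]) -> float:
    """Express mean mutant expression as a fraction of mean wild-type expression."""
    return mean(mutant) / mean(wild_type)


# Invented intensities for one construct measured in triplicate.
example_ratios = normalized_expression([200.0, 220.0, 180.0], [100.0, 100.0, 100.0])
print(mean_sem(example_ratios))
```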

Surface expression analysis

Surface biotinylation of CALHM1, CALHM6, and their mutants was performed using the Pierce Cell Surface Biotinylation and Isolation Kit (Thermo Fisher Scientific) following the manufacturer’s protocol. Cells were cultured and harvested 48 hr post-transfection as described above. Surface-isolated proteins were analyzed by SDS-PAGE, and in-gel fluorescence imaging of the C-terminal GFP tag was performed using a ChemiDoc system.

Electrophysiology

TsA201 cells transfected with plasmids encoding N-terminal GFP-tagged human CALHM1 or CALHM6 were used. One day after transfection with plasmid DNA (100 ng/mL) and Lipofectamine 2000 (Invitrogen, 11668019), the cells were trypsinized and replated onto poly-L-lysine-coated (Sigma P4707) glass coverslips. After cell attachment, the coverslip was transferred to a recording chamber. Whole-cell patch-clamp recordings were performed at room temperature (21–23°C) or body temperature (36–38°C). Signals were amplified using a Multiclamp 700B amplifier and digitized using a Digidata 1550B A/D converter (Molecular Devices, Sunnyvale, CA, USA). The whole-cell current was measured in cells with an access resistance of less than 10 MΩ after the whole-cell configuration was obtained. Whole-cell capacitance was compensated using the amplifier circuitry. A two-step pulse from 120 mV to –80 mV for 50 ms was continuously applied to the cell membrane every 5 s to monitor the activation of the CALHM current. A step pulse from –100 mV to 140 mV (or –80 mV to 120 mV) for 200 ms with a holding potential of 0 mV was applied to plot the current-voltage relationship. Electrical signals were digitized at 10 kHz and filtered at 2 kHz. Recordings were analyzed using Clampfit 11.3 (Axon Instruments Inc), GraphPad Prism 10 (La Jolla, CA, USA), and OriginPro 2024 (OriginLab, Northampton, MA, USA). The standard bath solution contained (in mM): 150 NaCl, 5 KCl, 1 MgCl₂, 2 CaCl₂, 12 mannitol, 10 HEPES, pH = 7.4 with NaOH. For whole-cell recording, the extracellular solution contained (in mM): 150 NaCl, 10 HEPES, 1 MgCl₂, 5 CaCl₂. To establish a zero Ca²⁺ condition, 5 mM CaCl₂ was replaced with 5 mM EGTA. For the zero Ca²⁺ condition with 100 µM Gd³⁺, 5 mM CaCl₂ was omitted entirely without adding EGTA. The intracellular solution contained (in mM): 150 NaCl, 10 HEPES, 1 MgCl₂, 5 EGTA.

All data are expressed as mean ± SEM. Multiple comparisons were performed by one-way or two-way ANOVA with Bonferroni’s post hoc test. n indicates the number of cells. Significance was defined as: *p<0.05, **p<0.01, ***p<0.001. The absence of an asterisk indicates nonsignificance.

Data availability

The human IC annotation table with all the curation information is made available with the manuscript as Supplementary file 1A. The fasta sequences for the human ICs (both full-length sequences and the pore domain sequences) are available through Zenodo. The full length sequences for all the identified orthologs are also available through Zenodo. The code and results related to the RAG annotation pipeline are available at GitHub (copy archived at Kannan, 2026).

The following data sets were generated

1. Taujale R
2. Kannan N
(2025) Zenodo
Identification and classification of ion-channels across the tree of life: Insights into understudied CALHM channels.

https://doi.org/10.5281/zenodo.16232528

References

1. Abbott GW
(2022) Kv channel ancillary subunits: where do we go from here?
Physiology 37:2022.

https://doi.org/10.1152/physiol.00005.2022
1. Alexander SPH
2. Mathie AA
3. Peters JA
4. Veale EL
5. Striessnig J
6. Kelly E
7. Armstrong JF
8. Faccenda E
9. Harding SD
10. Davies JA
11. Aldrich RW
12. Attali B
13. Baggetta AM
14. Becirovic E
15. Biel M
16. Bill RM
17. Caceres AI
18. Catterall WA
19. Conner AC
20. Davies P
21. De Clerq K
22. Delling M
23. Di Virgilio F
24. Falzoni S
25. Fenske S
26. Fortuny-Gomez A
27. Fountain S
28. George C
29. Goldstein SAN
30. Grimm C
31. Grissmer S
32. Ha K
33. Hammelmann V
34. Hanukoglu I
35. Hu M
36. Ijzerman AP
37. Jabba SV
38. Jarvis M
39. Jensen AA
40. Jordt SE
41. Kaczmarek LK
42. Kellenberger S
43. Kennedy C
44. King B
45. Kitchen P
46. Liu Q
47. Lynch JW
48. Meades J
49. Mehlfeld V
50. Nicke A
51. Offermanns S
52. Perez-Reyes E
53. Plant LD
54. Rash L
55. Ren D
56. Salman MM
57. Sieghart W
58. Sivilotti LG
59. Smart TG
60. Snutch TP
61. Tian J
62. Trimmer JS
63. Van den Eynde C
64. Vriens J
65. Wei AD
66. Winn BT
67. Wulff H
68. Xu H
69. Yang F
70. Fang W
71. Yue L
72. Zhang X
73. Zhu M
(2023) The concise guide to PHARMACOLOGY 2023/24: Ion channels
British Journal of Pharmacology 180 Suppl 2:S145–S222.

https://doi.org/10.1111/bph.16178
1. Ashburner M
2. Ball CA
3. Blake JA
4. Botstein D
5. Butler H
6. Cherry JM
7. Davis AP
8. Dolinski K
9. Dwight SS
10. Eppig JT
11. Harris MA
12. Hill DP
13. Issel-Tarver L
14. Kasarskis A
15. Lewis S
16. Matese JC
17. Richardson JE
18. Ringwald M
19. Rubin GM
20. Sherlock G
(2000) Gene Ontology: tool for the unification of biology
Nature Genetics 25:25–29.

https://doi.org/10.1038/75556
1. Bagal SK
2. Brown AD
3. Cox PJ
4. Omoto K
5. Owen RM
6. Pryde DC
7. Sidders B
8. Skerratt SE
9. Stevens EB
10. Storer RI
11. Swain NA
(2013) Ion channels as therapeutic targets: a drug discovery perspective
Journal of Medicinal Chemistry 56:593–624.

https://doi.org/10.1021/jm3011433
(2020) The current chemical biology tool box for studying ion channels
The Journal of Physiology 598:4455–4471.

https://doi.org/10.1113/JP276695
1. Camacho C
2. Coulouris G
3. Avagyan V
4. Ma N
5. Papadopoulos J
6. Bealer K
7. Madden TL
(2009) BLAST+: architecture and applications
BMC Bioinformatics 10:421.

https://doi.org/10.1186/1471-2105-10-421
1. Carbon S
2. Douglass E
3. Good BM
4. Unni DR
5. Harris NL
6. Mungall CJ
7. Basu S
8. Chisholm RL
9. Dodson RJ
10. Hartline E
11. Fey P
12. Thomas PD
13. Albou LP
14. Ebert D
15. Kesling MJ
16. Mi H
17. Muruganujan A
18. Huang X
19. Mushayahama T
20. LaBonte SA
21. Siegele DA
22. Antonazzo G
23. Attrill H
24. Brown NH
25. Garapati P
26. Marygold SJ
27. Trovisco V
28. dos Santos G
29. Falls K
30. Tabone C
31. Zhou P
32. Goodman JL
33. Strelets VB
34. Thurmond J
35. Garmiri P
36. Ishtiaq R
37. Rodríguez-López M
38. Acencio ML
39. Kuiper M
40. Lægreid A
41. Logie C
42. Lovering RC
43. Kramarz B
44. Saverimuttu SCC
45. Pinheiro SM
46. Gunn H
47. Su R
48. Thurlow KE
49. Chibucos M
50. Giglio M
51. Nadendla S
52. Munro J
53. Jackson R
54. Duesbury MJ
55. Del-Toro N
56. Meldal BHM
57. Paneerselvam K
58. Perfetto L
59. Porras P
60. Orchard S
61. Shrivastava A
62. Chang HY
63. Finn RD
64. Mitchell AL
65. Rawlings ND
66. Richardson L
67. Sangrador-Vegas A
68. Blake JA
69. Christie KR
70. Dolan ME
71. Drabkin HJ
72. Hill DP
73. Ni L
74. Sitnikov DM
75. Harris MA
76. Oliver SG
77. Rutherford K
78. Wood V
79. Hayles J
80. Bähler J
81. Bolton ER
82. De Pons JL
83. Dwinell MR
84. Hayman GT
85. Kaldunski ML
86. Kwitek AE
87. Laulederkind SJF
88. Plasterer C
89. Tutaj MA
90. Vedi M
91. Wang SJ
92. D’Eustachio P
93. Matthews L
94. Balhoff JP
95. Aleksander SA
96. Alexander MJ
97. Cherry JM
98. Engel SR
99. Gondwe F
100. Karra K
101. Miyasato SR
102. Nash RS
103. Simison M
104. Skrzypek MS
105. Weng S
106. Wong ED
107. Feuermann M
108. Gaudet P
109. Morgat A
110. Bakker E
111. Berardini TZ
112. Reiser L
113. Subramaniam S
114. Huala E
115. Arighi CN
116. Auchincloss A
117. Axelsen K
118. Argoud-Puy G
119. Bateman A
120. Blatter MC
121. Boutet E
122. Bowler E
123. Breuza L
124. Bridge A
125. Britto R
126. Bye-A-Jee H
127. Casas CC
128. Coudert E
129. Denny P
130. Estreicher A
131. Famiglietti ML
132. Georghiou G
133. Gos A
134. Gruaz-Gumowski N
135. Hatton-Ellis E
136. Hulo C
137. Ignatchenko A
138. Jungo F
139. Laiho K
140. Le Mercier P
141. Lieberherr D
142. Lock A
143. Lussi Y
144. MacDougall A
145. Magrane M
146. Martin MJ
147. Masson P
148. Natale DA
149. Hyka-Nouspikel N
150. Orchard S
151. Pedruzzi I
152. Pourcel L
153. Poux S
154. Pundir S
155. Rivoire C
156. Speretta E
157. Sundaram S
158. Tyagi N
159. Warner K
160. Zaru R
161. Wu CH
162. Diehl AD
163. Chan JN
164. Grove C
165. Lee RYN
166. Muller HM
167. Raciti D
168. Van Auken K
169. Sternberg PW
170. Berriman M
171. Paulini M
172. Howe K
173. Gao S
174. Wright A
175. Stein L
176. Howe DG
177. Toro S
178. Westerfield M
179. Jaiswal P
180. Cooper L
181. Elser J
182. The Gene Ontology Consortium
(2021) The Gene Ontology resource: enriching a GOld mine
Nucleic Acids Research 49:D325–D334.

https://doi.org/10.1093/nar/gkaa1113
1. Castro EV
2. Shepherd JW
3. Guggenheim RS
4. Sengvoravong M
5. Hall BC
6. Chappell MK
7. Hearn JA
8. Caraccio ON
9. Bissman C
10. Lantow S
11. Buehner D
12. Costlow HR
13. Prather DM
14. Zonza AM
15. Witt M
16. Zahratka JA
(2022) ChanFAD: a functional annotation database for ion channels
Frontiers in Bioinformatics 2:835805.

https://doi.org/10.3389/fbinf.2022.835805
1. Chen GL
2. Li J
3. Zhang J
4. Zeng B
(2023) To be or not to be an ion channel: cryo-EM structures have a say
Cells 12:1870.

https://doi.org/10.3390/cells12141870
1. Choi W
2. Clemente N
3. Sun W
4. Du J
5. Lü W
(2019) The structures and gating mechanism of human calcium homeostasis modulator 2
Nature 576:163–167.

https://doi.org/10.1038/s41586-019-1781-3
1. Como M
2. Koppala BR
3. Hasan MN
4. Han VL
5. Arora I
6. Sun D
(2021) Cell volume regulation in immune cell function, activation and survival
Cellular Physiology and Biochemistry 55:71–88.

https://doi.org/10.33594/000000331
1. Danielli S
2. Ma Z
3. Pantazi E
4. Kumar A
5. Demarco B
6. Fischer FA
7. Paudel U
8. Weissenrieder J
9. Lee RJ
10. Joyce S
11. Foskett JK
12. Bezbradica JS
(2023) The ion channel CALHM6 controls bacterial infection-induced cellular cross-talk at the immunological synapse
The EMBO Journal 42:e111450.

https://doi.org/10.15252/embj.2022111450
1. Eddy SR
(1998) Profile hidden Markov models
Bioinformatics 14:755–763.

https://doi.org/10.1093/bioinformatics/14.9.755
1. Edelman A
2. Saussereau E
(2012) Cystic fibrosis and other channelopathies
Archives de Pediatrie 19 Suppl 1:S13–S16.

https://doi.org/10.1016/S0929-693X(12)71101-6
1. Fodor AA
2. Aldrich RW
(2006) Statistical limits to the identification of ion channel domains by sequence similarity
The Journal of General Physiology 127:755–766.

https://doi.org/10.1085/jgp.200509419
1. Gao J
2. Miao Z
3. Zhang Z
4. Wei H
5. Kurgan L
(2019) Prediction of ion channels and their types from protein sequences: comprehensive review and comparative assessment
Current Drug Targets 20:579–592.

https://doi.org/10.2174/1389450119666181022153942
(2022) Variant interpretation using population databases: Lessons from gnomAD
Human Mutation 43:1012–1030.

https://doi.org/10.1002/humu.24309
1. Gurnett CA
2. Campbell KP
(1996) Transmembrane auxiliary subunits of voltage-dependent ion channels
The Journal of Biological Chemistry 271:27975–27978.

https://doi.org/10.1074/jbc.271.45.27975
(2007) Using structural motif descriptors for sequence-based binding site prediction
BMC Bioinformatics 8 Suppl 4:S5.

https://doi.org/10.1186/1471-2105-8-S4-S5
1. Hernández-Salmerón JE
2. Moreno-Hagelsieb G
(2020) Progress in quickly finding orthologs as reciprocal best hits: comparing blast, last, diamond and MMseqs2
BMC Genomics 21:741.

https://doi.org/10.1186/s12864-020-07132-6
1. Huang L-C
2. Taujale R
3. Gravel N
4. Venkat A
5. Yeung W
6. Byrne DP
7. Eyers PA
8. Kannan N
(2021) KinOrtho: a method for mapping human kinase orthologs across the tree of life and illuminating understudied kinases
BMC Bioinformatics 22:446.

https://doi.org/10.1186/s12859-021-04358-3
(2020) Global alignment and assessment of TRP channel transmembrane domain structures to explore functional mechanisms
eLife 9:e58660.

https://doi.org/10.7554/eLife.58660
1. Jegla TJ
2. Zmasek CM
3. Batalov S
4. Nayak SK
(2009) Evolution of the human ion channel set
Combinatorial Chemistry & High Throughput Screening 12:2–23.

https://doi.org/10.2174/138620709787047957
1. Jeon YK
2. Choi SW
3. Kwon JW
4. Woo J
5. Choi SW
6. Kim SJ
7. Kim SJ
(2021) Thermosensitivity of the voltage-dependent activation of calcium homeostasis modulator 1 (calhm1) ion channel
Biochemical and Biophysical Research Communications 534:590–596.

https://doi.org/10.1016/j.bbrc.2020.11.035
(2007) Advantages of combined transmembrane topology and signal peptide prediction--the Phobius web server
Nucleic Acids Research 35:W429–W432.

https://doi.org/10.1093/nar/gkm256
1. Kanehisa M
2. Goto S
(2000) KEGG: kyoto encyclopedia of genes and genomes
Nucleic Acids Research 28:27–30.

https://doi.org/10.1093/nar/28.1.27
(2023) KEGG for taxonomy-based analysis of pathways and genomes
Nucleic Acids Research 51:D587–D592.

https://doi.org/10.1093/nar/gkac963
Software
1. Kannan N
(2026) Ionchannels-final-pdf, version swh:1:rev:fb763748ef6bd00e53b974d83b205a156edc31b9
Software Heritage.

https://archive.softwareheritage.org/swh:1:dir:0476abbd231397406847691744a8e7af1578042f;origin=https://github.com/esbgkannan/ionchannels-final-pdf;visit=swh:1:snp:7f43e7c6998cfd914d43a978e9feeaefbc22850b;anchor=swh:1:rev:fb763748ef6bd00e53b974d83b205a156edc31b9
1. Kato AS
2. Bredt DS
(2007) Pharmacological regulation of ion channels by auxiliary subunits
Current Opinion in Drug Discovery & Development 10:565–572.
1. Kelleher KJ
2. Sheils TK
3. Mathias SL
4. Yang JJ
5. Metzger VT
6. Siramshetty VB
7. Nguyen DT
8. Jensen LJ
9. Vidović D
10. Schürer SC
11. Holmes J
12. Sharma KR
13. Pillai A
14. Bologa CG
15. Edwards JS
16. Mathé EA
17. Oprea TI
(2023) Pharos 2023: an integrated resource for the understudied human proteome
Nucleic Acids Research 51:D1405–D1416.

https://doi.org/10.1093/nar/gkac1033
1. Kim JB
(2014) Channelopathies
Korean Journal of Pediatrics 57:1–18.

https://doi.org/10.3345/kjp.2014.57.1.1
(2001) Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes
Journal of Molecular Biology 305:567–580.

https://doi.org/10.1006/jmbi.2000.4315
1. Kwon JW
2. Jeon YK
3. Kim J
4. Kim SJ
5. Kim SJ
(2021) Intramolecular disulfide bonds for biogenesis of CALHM1 ion channel are dispensable for voltage-dependent activation
Molecules and Cells 44:758–769.

https://doi.org/10.14348/molcells.2021.0131
1. Lara A
2. Simonson BT
3. Ryan JF
4. Jegla T
(2023) Genome-scale analysis reveals extensive diversification of voltage-gated K+ channels in stem cnidarians
Genome Biology and Evolution 15:evad009.

https://doi.org/10.1093/gbe/evad009
(2021) SMART: recent updates, new developments and status in 2020
Nucleic Acids Research 49:D458–D460.

https://doi.org/10.1093/nar/gkaa937
(2003) OrthoMCL: identification of ortholog groups for eukaryotic genomes
Genome Research 13:2178–2189.

https://doi.org/10.1101/gr.1224503
1. Li B
2. Gallin WJ
(2004) VKCDB: voltage-gated potassium channel database
BMC Bioinformatics 5:3.

https://doi.org/10.1186/1471-2105-5-3
1. Li Y
2. Um SY
3. McDonald TV
(2006) Voltage-gated potassium channels: regulation by accessory subunits
The Neuroscientist 12:199–210.

https://doi.org/10.1177/1073858406287717
1. Lin Z
2. Akin H
3. Rao R
4. Hie B
5. Zhu Z
6. Lu W
7. Smetanin N
8. Verkuil R
9. Kabeli O
10. Shmueli Y
11. Dos Santos Costa A
12. Fazel-Zarandi M
13. Sercu T
14. Candido S
15. Rives A
(2023) Evolutionary-scale prediction of atomic-level protein structure with a language model
Science 379:1123–1130.

https://doi.org/10.1126/science.ade2574
(2023) Deep embedding and alignment of protein sequences
Nature Methods 20:104–111.

https://doi.org/10.1038/s41592-022-01700-2
1. Lu S
2. Wang J
3. Chitsaz F
4. Derbyshire MK
5. Geer RC
6. Gonzales NR
7. Gwadz M
8. Hurwitz DI
9. Marchler GH
10. Song JS
11. Thanki N
12. Yamashita RA
13. Yang M
14. Zhang D
15. Zheng C
16. Lanczycki CJ
17. Marchler-Bauer A
(2020) CDD/SPARCLE: the conserved domain database in 2020
Nucleic Acids Research 48:D265–D268.

https://doi.org/10.1093/nar/gkz991
1. Ma Z
2. Siebert AP
3. Cheung K-H
4. Lee RJ
5. Johnson B
6. Cohen AS
7. Vingtdeux V
8. Marambaud P
9. Foskett JK
(2012) Calcium homeostasis modulator 1 (CALHM1) is the pore-forming subunit of an ion channel that mediates extracellular Ca2+ regulation of neuronal excitability
PNAS 109:E1963–E1971.

https://doi.org/10.1073/pnas.1204023109
1. Ma Z
2. Tanis JE
3. Taruno A
4. Foskett JK
(2016) Calcium homeostasis modulator (CALHM) ion channels
Pflugers Archiv 468:395–403.

https://doi.org/10.1007/s00424-015-1757-6
1. Ma Z
2. Taruno A
3. Ohmoto M
4. Jyotaki M
5. Lim JC
6. Miyazaki H
7. Niisato N
8. Marunaka Y
9. Lee RJ
10. Hoff H
11. Payne R
12. Demuro A
13. Parker I
14. Mitchell CH
15. Henao-Mejia J
16. Tanis JE
17. Matsumoto I
18. Tordoff MG
19. Foskett JK
(2018) CALHM3 is essential for rapid ion channel-mediated purinergic neurotransmission of GPCR-mediated tastes
Neuron 98:547–561.

https://doi.org/10.1016/j.neuron.2018.03.043
1. Ma Z
2. Paudel U
3. Wang M
4. Foskett JK
(2025) A mechanism of CALHM1 ion channel gating
American Journal of Physiology. Cell Physiology 328:C1109–C1124.

https://doi.org/10.1152/ajpcell.00925.2024
(2018) UMAP: uniform manifold approximation and projection
Journal of Open Source Software 3:861.

https://doi.org/10.21105/joss.00861
(1999) Ion channels in presynaptic nerve terminals and control of transmitter release
Physiological Reviews 79:1019–1088.

https://doi.org/10.1152/physrev.1999.79.3.1019
1. Mistry J
2. Chuguransky S
3. Williams L
4. Qureshi M
5. Salazar GA
6. Sonnhammer ELL
7. Tosatto SCE
8. Paladin L
9. Raj S
10. Richardson LJ
11. Finn RD
12. Bateman A
(2021) Pfam: The protein families database in 2021
Nucleic Acids Research 49:D412–D419.

https://doi.org/10.1093/nar/gkaa913
(2013) Evolutionary origins of the blood vascular system and endothelium
Journal of Thrombosis and Haemostasis 11 Suppl 1:46–66.

https://doi.org/10.1111/jth.12253
(2015) Evolution of voltage-gated ion channels at the emergence of Metazoa
The Journal of Experimental Biology 218:515–525.

https://doi.org/10.1242/jeb.110270
1. Neuwald AF
(2009) Rapid detection, classification and accurate alignment of up to a million or more related protein sequences
Bioinformatics 25:1869–1875.

https://doi.org/10.1093/bioinformatics/btp342
1. Neuwald AF
(2014) A Bayesian sampler for optimization of protein domain hierarchies
Journal of Computational Biology 21:269–286.

https://doi.org/10.1089/cmb.2013.0099
1. Picado A
2. Chaikuad A
3. Wells CI
4. Shrestha S
5. Zuercher WJ
6. Pickett JE
7. Kwarcinski FE
8. Sinha P
9. de Silva CS
10. Zutshi R
11. Liu S
12. Kannan N
13. Knapp S
14. Drewry DH
15. Willson TM
(2020) A chemical probe for dark kinase STK17B derives its potency and high selectivity through a unique P-loop conformation
Journal of Medicinal Chemistry 63:14626–14646.

https://doi.org/10.1021/acs.jmedchem.0c01174
1. Rajan AS
2. Aguilar-Bryan L
3. Nelson DA
4. Yaney GC
5. Hsu WH
6. Kunze DL
7. Boyd AE
(1990) Ion channels and insulin secretion
Diabetes Care 13:340–363.

https://doi.org/10.2337/diacare.13.3.340
1. Ranjan R
2. Khazen G
3. Gambazzi L
4. Ramaswamy S
5. Hill SL
6. Schürmann F
7. Markram H
(2011) Channelpedia: an integrative and interactive database for ion channels
Frontiers in Neuroinformatics 5:36.

https://doi.org/10.3389/fninf.2011.00036
(2020) Cardiac and neuronal HCN channelopathies
Pflugers Archiv 472:931–951.

https://doi.org/10.1007/s00424-020-02384-3
1. Seal RL
2. Braschi B
3. Gray K
4. Jones TEM
5. Tweedie S
6. Haim-Vilmovsky L
7. Bruford EA
(2023) Genenames.org: the HGNC resources in 2023
Nucleic Acids Research 51:D1003–D1009.

https://doi.org/10.1093/nar/gkac888
(2013) MOLE 2.0: advanced approach for analysis of biomacromolecular channels
Journal of Cheminformatics 5:39.

https://doi.org/10.1186/1758-2946-5-39
- PubMed
- Google Scholar
1. Sharma K
2. Nadler L
(2021) Identifying new drug targets by illuminating the druggable genome
The FASEB Journal 35:1799.

https://doi.org/10.1096/fasebj.2021.35.S1.01799
- Google Scholar
1. Sheils T
2. Mathias SL
3. Siramshetty VB
4. Bocci G
5. Bologa CG
6. Yang JJ
7. Waller A
8. Southall N
9. Nguyen D-T
10. Oprea TI
(2020) How to illuminate the druggable genome using pharos
Current Protocols in Bioinformatics 69:e92.

https://doi.org/10.1002/cpbi.92
- PubMed
- Google Scholar
1. Shrestha S
2. Byrne DP
3. Harris JA
4. Kannan N
5. Eyers PA
(2020) Cataloguing the dead: breathing new life into pseudokinase research
The FEBS Journal 287:4150–4169.

https://doi.org/10.1111/febs.15246
- PubMed
- Google Scholar
1. Špačková A
2. Vávra O
3. Raček T
4. Bazgier V
5. Sehnal D
6. Damborský J
7. Svobodová R
8. Bednář D
9. Berka K
(2024) ChannelsDB 2.0: a comprehensive database of protein tunnels and pores in AlphaFold era
Nucleic Acids Research 52:D413–D418.

https://doi.org/10.1093/nar/gkad1012
- PubMed
- Google Scholar
1. Syrjanen JL
2. Michalski K
3. Chou TH
4. Grant T
5. Rao S
6. Simorowski N
7. Tucker SJ
8. Grigorieff N
9. Furukawa H
(2020) Structure and assembly of calcium homeostasis modulator proteins
Nature Structural & Molecular Biology 27:150–159.

https://doi.org/10.1038/s41594-019-0369-9
- PubMed
- Google Scholar
(2023) Structure of human CALHM1 reveals key locations for channel regulation and blockade by ruthenium red
Nature Communications 14:3821.

https://doi.org/10.1038/s41467-023-39388-3
- PubMed
- Google Scholar
1. Tanis JE
2. Ma Z
3. Krajacic P
4. He L
5. Foskett JK
6. Lamitina T
(2013) CLHM-1 is a functionally conserved and conditionally toxic Ca2+-permeable ion channel in Caenorhabditis elegans
The Journal of Neuroscience 33:12275–12286.

https://doi.org/10.1523/JNEUROSCI.5919-12.2013
- PubMed
- Google Scholar
1. Tanis JE
2. Ma Z
3. Foskett JK
(2017) The NH₂ terminus regulates voltage-dependent gating of CALHM ion channels
American Journal of Physiology. Cell Physiology 313:C173–C186.

https://doi.org/10.1152/ajpcell.00318.2016
- PubMed
- Google Scholar
1. Taujale R
2. Venkat A
3. Huang L-C
4. Zhou Z
5. Yeung W
6. Rasheed KM
7. Li S
8. Edison AS
9. Moremen KW
10. Kannan N
(2020) Deep evolutionary analysis reveals the design principles of fold A glycosyltransferases
eLife 9:e54532.

https://doi.org/10.7554/eLife.54532
- PubMed
- Google Scholar
1. Taujale R
2. Zhou Z
3. Yeung W
4. Moremen KW
5. Li S
6. Kannan N
(2021) Mapping the glycosyltransferase fold landscape using interpretable deep learning
Nature Communications 12:5656.

https://doi.org/10.1038/s41467-021-25975-9
- PubMed
- Google Scholar
1. Thorneloe KS
2. Nelson MT
(2005) Ion channels in smooth muscle: regulators of intracellular calcium and contractility
Canadian Journal of Physiology and Pharmacology 83:215–242.

https://doi.org/10.1139/y05-016
- PubMed
- Google Scholar
1. UniProt Consortium T
(2018) UniProt: the universal protein knowledgebase
Nucleic Acids Research 46:2699.

https://doi.org/10.1093/nar/gky092
- Google Scholar
1. Uribe C
2. Nery MF
3. Zavala K
4. Mardones GA
5. Riadi G
6. Opazo JC
(2024) Evolution of ion channels in cetaceans: a natural experiment in the tree of life
Scientific Reports 14:17024.

https://doi.org/10.1038/s41598-024-66082-1
- PubMed
- Google Scholar
1. Vacher H
2. Trimmer JS
(2011) Diverse roles for auxiliary subunits in phosphorylation-dependent regulation of mammalian brain voltage-gated potassium channels
Pflugers Archiv 462:631–643.

https://doi.org/10.1007/s00424-011-1004-8
- PubMed
- Google Scholar
1. Virtanen P
2. Gommers R
3. Oliphant TE
4. Haberland M
5. Reddy T
6. Cournapeau D
7. Burovski E
8. Peterson P
9. Weckesser W
10. Bright J
11. van der Walt SJ
12. Brett M
13. Wilson J
14. Millman KJ
15. Mayorov N
16. Nelson ARJ
17. Jones E
18. Kern R
19. Larson E
20. Carey CJ
21. Polat İ
22. Feng Y
23. Moore EW
24. VanderPlas J
25. Laxalde D
26. Perktold J
27. Cimrman R
28. Henriksen I
29. Quintero EA
30. Harris CR
31. Archibald AM
32. Ribeiro AH
33. Pedregosa F
34. van Mulbregt P
35. SciPy 1.0 Contributors
(2020) SciPy 1.0: fundamental algorithms for scientific computing in Python
Nature Methods 17:261–272.

https://doi.org/10.1038/s41592-019-0686-2
- PubMed
- Google Scholar
(2012) Ion channel drug discovery: challenges and future directions
Future Medicinal Chemistry 4:661–679.

https://doi.org/10.4155/fmc.12.4
- PubMed
- Google Scholar
(2005) Overview of molecular relationships in the voltage-gated ion channel superfamily
Pharmacological Reviews 57:387–395.

https://doi.org/10.1124/pr.57.4.13
- PubMed
- Google Scholar

Article and author information

Author details

Rahil Taujale
1. Institute of Bioinformatics, University of Georgia, Athens, United States
2. Department of Biochemistry and Molecular Biology, University of Georgia, Athens, United States
Contribution
Conceptualization, Resources, Data curation, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing – original draft, Writing – review and editing

Contributed equally with
Sung Jin Park

Competing interests
No competing interests declared

ORCID iD: 0000-0003-1292-1619

Sung Jin Park

Department of Molecular Biosciences, Northwestern University, Evanston, United States

Contribution
Data curation, Formal analysis, Visualization, Methodology, Writing – original draft

Contributed equally with
Rahil Taujale

Competing interests
No competing interests declared

Nathan Gravel

Institute of Bioinformatics, University of Georgia, Athens, United States

Contribution
Resources, Formal analysis, Validation, Investigation

Competing interests
No competing interests declared

Saber Soleymani

Department of Biochemistry and Molecular Biology, University of Georgia, Athens, United States

Contribution
Data curation, Formal analysis, Visualization, Methodology, Writing – original draft

Competing interests
No competing interests declared

Rayna Carter

Department of Biochemistry and Molecular Biology, University of Georgia, Athens, United States

Contribution
Data curation, Formal analysis, Methodology

Competing interests
No competing interests declared

Kennady Boyd

Department of Biochemistry and Molecular Biology, University of Georgia, Athens, United States

Contribution
Resources, Data curation

Competing interests
No competing interests declared

Sarah I Keuning

Department of Molecular Biosciences, Northwestern University, Evanston, United States

Contribution
Formal analysis, Validation

Competing interests
No competing interests declared

ORCID iD: 0009-0008-9408-5119

Zheng Ruan

Department of Biochemistry & Molecular Biology, Thomas Jefferson University, Philadelphia, United States

Contribution
Supervision, Validation, Methodology

Competing interests
No competing interests declared

Wei Lü
1. Department of Molecular Biosciences, Northwestern University, Evanston, United States
2. Department of Pharmacology, Northwestern University, Chicago, United States
3. Chemistry of Life Processes Institute, Northwestern University, Evanston, United States
Contribution
Conceptualization, Funding acquisition, Investigation, Methodology, Writing – original draft, Project administration, Writing – review and editing

For correspondence
wei.lu@northwestern.edu

Competing interests
No competing interests declared

ORCID iD: 0000-0002-3009-1025

Natarajan Kannan
1. Institute of Bioinformatics, University of Georgia, Athens, United States
2. Department of Biochemistry and Molecular Biology, University of Georgia, Athens, United States
Contribution
Conceptualization, Supervision, Funding acquisition, Validation, Investigation, Visualization, Methodology, Writing – original draft, Project administration, Writing – review and editing

For correspondence
nkannan@uga.edu

Competing interests
No competing interests declared

ORCID iD: 0000-0002-2833-8375

Funding

National Institutes of Health (U01CA271376)

Rahil Taujale
Nathan Gravel
Saber Soleymani
Rayna Carter
Kennady Boyd
Wei Lü
Natarajan Kannan

National Institute of Neurological Disorders and Stroke (5R00NS128258)

Zheng Ruan

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank members of the Kannan Lab for feedback, rotation student William N Lantz for manually evaluating the RAG LLM-based annotations, and Maya Salcedo for onboarding the undergraduates involved in the project. NK and WL acknowledge funding from the NIH Common Fund (IDG), U01CA271376. ZR is supported by 5R00NS128258 from NINDS.

Version history

Sent for peer review: February 10, 2025
Preprint posted: February 18, 2025
Reviewed Preprint version 1: May 9, 2025
Reviewed Preprint version 2: November 18, 2025
Version of Record published: May 18, 2026

Cite all versions

You can cite all versions using the DOI https://doi.org/10.7554/eLife.106134. This DOI represents all versions, and will always resolve to the latest one.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.