Investigating the native functions of [NiFe]-CODH through genomic context analysis

eLife Assessment

This valuable work analyzes a large dataset of [NiFe]-CODHs, integrating genomic context, operon organization, and clade-specific gene neighborhoods to discern patterns of diversification and adaptation. A consistent examination of CODH genomic contexts, including CODH-HCP co-occurrence, informs interpretations of enzymatic activity, biotechnological potential, and differential functional roles, in line with current standards in genomic enzymology. With solid support, this work provides a broadly informative contribution to the field.

https://doi.org/10.7554/eLife.108780.3.sa0

Significance of the findings:

Valuable: Findings that have theoretical or practical implications for a subfield

Strength of evidence:

Solid: Methods, data and analyses broadly support the claims with only minor weaknesses

During the peer-review process the editor and reviewers write an eLife Assessment that summarises the significance of the findings reported in the article (on a scale ranging from landmark to useful) and the strength of the evidence (on a scale ranging from exceptional to inadequate).

Abstract

Carbon monoxide dehydrogenases containing nickel-iron active sites ([NiFe]-CODHs) catalyze the reversible oxidation of CO to CO₂, representing key targets for biocatalytic CO₂ reduction. Despite dramatic differences in catalytic rates and O₂ tolerance between CODH variants, the molecular basis for this functional diversity remains poorly understood. We applied comparative genomics and synteny analysis to investigate the biochemical roles of CODH clades A-F using 1376 CODH and 1545 hybrid cluster protein sequences. Around 30% of genomes encode multiple CODH isoforms. Analysis revealed distinct gene clustering patterns correlating with biochemical function. Clades A, E, and F exhibit a degree of distributional exclusivity. Clades C and D frequently co-occur with active CODHs, suggesting auxiliary roles. Operon architecture analysis revealed functional specialization: clade A links to acetyl-CoA synthase; clades A, E, and F contain essential maturation machinery (CooC, CooJ, CooT) correlating with catalytic activity; clade B associates with transporters; clade C with electron transfer partners; clade D with transcriptional regulators. High CODH-HCP co-occurrence (except clade A) suggests functional or environmental interdependency. These findings establish clades A, E, and F as primary biocatalyst targets while defining regulatory functions for clades C and D, providing a genomics framework for predicting CODH phenotypes.

Introduction

Genomic enzymology has helped to elucidate protein (super)families since the mid-1990s by connecting enzyme sequences to function through comparative genomics and genome neighborhood analysis (Babbitt et al., 1996; Knox and Allen, 2023). In this study, we employ genome neighborhood and co-occurrence analyses to better understand the reactivity and function of the family of nickel-containing carbon monoxide dehydrogenases ([NiFe]-CODHs) and their relationship to hybrid cluster proteins (HCPs).

[NiFe]-CODHs are ancient and diverse enzymes that catalyze the interconversion of carbon monoxide (CO) and carbon dioxide (CO₂), a reaction of high interest for biotechnological applications, including CO₂ capture and conversion. Research on these enzymes spans more than 60 years, and recent studies have provided important biochemical insights into properties such as turnover frequency, oxygen tolerance, and catalytic mechanism (Basak et al., 2025; Can et al., 2014). These properties vary greatly between enzymes, not only between phylogenetic clades but also within them, which makes functional prediction from sequence alone challenging. Phylogenetic analyses of available [NiFe]-CODH (hereafter referred to as CODH) sequences with different foci, such as gene transfer (Techtmann et al., 2012), primary structure (Inoue et al., 2018), biome distribution (Inoue et al., 2022), and the human gut microbiome (Katayama et al., 2024), have enriched our understanding of this old and diverse enzyme family. Dataset sizes have grown from an initial 17 sequences (Lindahl and Chang, 2001) to well above 5000 sequences in this study. Up to eight distinct phylogenetic clades (Figure 1) have been distinguished; although their sequences vary, all of them preserve the overall fold, as seen by cryo-electron microscopy (Biester et al., 2024) and X-ray crystallography (Basak et al., 2025; Domnik et al., 2017; Gong et al., 2008; Jeoung et al., 2022; Jeoung and Dobbek, 2007; Wittenborn et al., 2020).

The biochemical characterization of this enzyme family is still ongoing and reveals a wide range of turnover frequencies as well as different degrees of O₂ tolerance. For example, Carboxydothermus hydrogenoformans encodes two clade F CODHs (Figure 1) with contrasting properties. ChCODH-II is a benchmark CODH known for its high CO oxidation activity but low O₂ tolerance. ChCODH-IV, in contrast, retains 20% of its activity after 1 h of O₂ exposure but displays reduced CO oxidation capacity, with an increased activation barrier and a much lower K_M (Domnik et al., 2017). Similarly, NvCODH (formerly known as DvCODH), a less active clade E CODH from Nitratidesulfovibrio vulgaris, has been reported to fully reactivate after initial inactivation by O₂ exposure (Hadj-Saïd et al., 2015). Likewise, the two clade E CODHs from Thermococcus sp. AM4, TcCODH-I and TcCODH-II, react more slowly with O₂ than ChCODH-II but show overall equal O₂ sensitivity (Benvenuti et al., 2020).

Figure 1

Schematic phylogenetic tree of [NiFe]-CODH, with selected CODHs marked with their respective positions.

The tree was built with IQ-TREE using 1000 ultrafast bootstrap replicates and contains 5508 putative CODH sequences plus one outgroup (MBE6442607.1 hydroxylamine reductase [*Desulfovibrio desulfuricans*]) for rooting. A detailed searchable tree with bootstrap values can be found in Supplementary file 5.

Beyond this diversity in activity and O₂ tolerance within clades and even within organisms, CODHs also differ in their dependence on maturases: some rely on them for full activation, while others do not. For example, RrCODH from the phototroph Rhodospirillum rubrum must be expressed together with three maturases (CooC, CooJ, and CooT) to be isolated in an active form (Kerby et al., 1997). NvCODH likewise requires a maturase for active production, but its genomic neighborhood (Figure 2) contains only one, CooC (Hadj-Saïd et al., 2015). In contrast, ChCODH-II can be heterologously expressed without co-expression of any maturase (Merrouch et al., 2018). ChCODH-I occupies an intermediate position: it requires co-expression with CooC to reach high activity but can also be produced without it, albeit with reduced activity (Inoue et al., 2014). Notably, much of the diversity in activity, O₂ tolerance, and maturase dependence occurs not only between clades but also within them.

Figure 2

Operons of selected [NiFe]-CODH.

* NtCODH, formerly known as MtCODH and before that as CtCODH, owing to renaming of the host organism (Gtari and Ventura, 2025). ** NvCODH, formerly known as DvCODH, owing to renaming of the host organism (Waite et al., 2020).

Given the homology between the CODHs in this study and the fact that active CODHs have been demonstrated for several clades, one might assume that CODHs from all clades can interconvert CO₂ and CO. However, a recent study by Dobbek and co-workers showed that C. hydrogenoformans CODH-V (ChCODH-V) from clade D cannot catalyze this reaction (Jeoung et al., 2022). They showed that this enzyme more closely resembles the HCP family, owing to its morphing active site composed of iron, sulfur, and oxygen, which undergoes structural and stoichiometric changes upon redox shifts. A connection between HCPs and CODHs had previously been pointed out by Inoue et al., 2018, based on their close phylogenetic relationship, and was further discussed by Fujishiro and Takaoka, 2023. HCPs can be divided into three phylogenetic classes, of which class III is homodimeric like CODH. Generally, the two enzyme families share a similar overall fold, while their active sites differ greatly in both amino acid and metallocofactor composition. Like ChCODH-V, HCPs do not catalyze CO₂/CO interconversion, but they display a range of low-rate activities, including hydroxylamine reductase, peroxidase, nitric oxide reductase, and S-nitrosylase activity. The main physiological function of HCPs is debated, but it was recently argued that they most likely act as nitric oxide reductases involved in nitric oxide detoxification (Hagen, 2022).

In this study, we contribute to a holistic phylogenetic picture of CODHs by analyzing their genetic environment and by harnessing the concept of synteny, using a semi-quantitative approach to predict clade- and subclade-specific characteristics of CODHs. We present clade-specific trends in the composition of CODH operons. We restrict our search to operon composition because most of our data are prokaryotic genomes, in which functionally related genes are physically linked in operons (Yaniv, 2011). To limit noise and false positives as much as possible, we excluded data outside our defined parameters (see Methods). Additionally, because many of the genomes included in this analysis are incompletely sequenced or encode multiple CODHs from different clades, including only genes in close proximity to the target gene limits distortion of the data. Since many organisms are known to encode multiple CODH isoforms, we also analyze the co-occurrence of CODHs from different clades within an organism, as well as the co-occurrence of CODHs and HCPs. With these findings, we propose a systematic approach to the analysis of new CODHs, focused on identifying promising CO₂ reduction catalysts suitable for biotechnological application.

Results

Co-occurrence and correlation

After counting the CODHs in each assembly, we found that around 30% of all assemblies encode more than one CODH. For HCP, this proportion is much smaller, around 6%. Figure 3A shows that the occurrence of multiple isoforms from a given clade within an organism varies. Clades B, C, and D almost always occur only once per genome, whereas clades A, E, and F are more likely to co-occur with another isoform from the same clade. The observed trend most likely underestimates the number of genomes encoding multiple CODHs, since incomplete genomes are also included in these analyses. When calculating the probability that CODHs from two different clades co-occur in one organism, a pattern emerges (Figure 3B). Most obvious is the low co-occurrence of clades A and E with clade F. Conversely, clade F shows a higher co-occurrence with clade E, because the matrix is asymmetric: the clade datasets differ in size. All clades also show low co-occurrence with clade B; owing to the same asymmetry, however, clade B shows some co-occurrence with clades D and E. Clade C shows strong co-occurrence with clade E, while clade D co-occurs with clades A, E, and F. Clades C and D also have a higher probability of co-occurring with each other. As outlined in the Introduction, biochemical studies show that CODHs from clades A, E, and F are active, whereas CODH from clade D lacks CO₂/CO interconversion activity. This co-occurrence might suggest that the redox-sensing properties of clade D CODHs (and potentially of clade C CODHs) are useful for organisms that already contain a functional CODH. Interestingly, a high co-occurrence was also observed between CODH and HCP, with the exception of clade A CODH.
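The per-assembly counting described above can be sketched as follows. This is a minimal illustration, assuming each CODH gene is represented as an (assembly ID, clade) pair. The function names and the toy assembly IDs are hypothetical and are not taken from the study. Grouping by assembly rather than by organism reproduces the undercounting caveat discussed later for organisms whose CODH genes are spread over several assemblies.

```python
from collections import Counter


def multi_copy_fraction(hits):
    """Fraction of assemblies that encode more than one CODH.

    hits: iterable of (assembly_id, clade) pairs, one pair per CODH gene.
    Genes are grouped by assembly, not by organism name.
    """
    per_assembly = Counter(assembly for assembly, _ in hits)
    if not per_assembly:
        return 0.0
    multi = sum(1 for n in per_assembly.values() if n > 1)
    return multi / len(per_assembly)


def same_clade_copy_histogram(hits):
    """Per clade, how many assemblies carry 1, 2, 3, ... copies of that clade."""
    copies = Counter(hits)  # (assembly_id, clade) -> number of CODH genes
    histogram = {}
    for (_, clade), n in copies.items():
        histogram.setdefault(clade, Counter())[n] += 1
    return histogram


# Toy input with hypothetical assembly IDs
hits = [("asm1", "E"), ("asm1", "E"), ("asm1", "D"),
        ("asm2", "B"), ("asm3", "F"), ("asm3", "F")]
print(multi_copy_fraction(hits))               # 2 of 3 assemblies
print(same_clade_copy_histogram(hits)["E"])    # Counter({2: 1})
```

The second function corresponds to the kind of tally shown in Figure 3A, with one histogram of copy numbers per clade.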

Figure 3 with 1 supplement

CODH co-occurrence within organisms.

(A) Frequency with which a given number of [NiFe]-CODHs from the same clade co-occur in the same organism. (B) Probability matrix of [NiFe]-CODHs from different clades co-occurring in the same organism. The equation for the probability can be found in the Methods section. A long-table format of the matrix can be found in Supplementary file 2. Additional raw data can be found in Supplementary file 1.
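The asymmetry of the matrix in Figure 3B can be illustrated with a conditional co-occurrence frequency. This is a sketch under an assumed definition: P[i][j] is the fraction of genomes containing clade i that also contain clade j. The study's own equation is given in its Methods and may differ. Function and variable names are hypothetical.

```python
from itertools import permutations


def cooccurrence_matrix(genome_clades, clades):
    """Asymmetric matrix P[i][j] = P(clade j present | clade i present).

    genome_clades: mapping genome ID -> collection of CODH clades it encodes.
    Assumed definition for illustration; see the study's Methods for its equation.
    """
    wanted = set(clades)
    n_with = {c: 0 for c in clades}                          # genomes containing clade i
    n_both = {pair: 0 for pair in permutations(clades, 2)}   # genomes containing i and j
    for present in genome_clades.values():
        found = set(present) & wanted
        for i in found:
            n_with[i] += 1
        for pair in permutations(found, 2):
            n_both[pair] += 1
    # Each row is normalized by its own clade count, so P[i][j] != P[j][i] in general
    return {
        i: {j: (n_both[(i, j)] / n_with[i] if n_with[i] else 0.0)
            for j in clades if j != i}
        for i in clades
    }


# Toy data: clade E is common, clade F is rare, and they share one genome
genomes = {"g1": {"E", "F"}, "g2": {"E"}, "g3": {"E"}, "g4": {"E"}, "g5": {"F"}}
P = cooccurrence_matrix(genomes, ["A", "B", "C", "D", "E", "F"])
print(P["E"]["F"], P["F"]["E"])  # 0.25 0.5
```

In this toy example, a single shared genome yields a low value from the perspective of the large clade (E) and a higher value from the perspective of the small clade (F). This mirrors the kind of size-driven asymmetry described for clades E and F.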

Neighbor analysis

We semi-quantitatively evaluated the operon composition of 1351 CODHs (121 A, 130 B, 168 C, 253 D, 434 E, 245 F; Supplementary files 6 and 7) together with neighboring proteins whose function we could predict using eggNOG (Cantalapiedra et al., 2021; Huerta-Cepas et al., 2019), NCBI product predictions, and manual curation. The results are summarized in Figure 4. Even though eight CODH clades are known, we present only six in this analysis, since our quality measures excluded clade G and clade H CODHs from the initial dataset because of poor assembly quality or lack of host information. In the following, we only report neighbors that are encoded in the same operon for more than 10% of the CODHs of a clade (see Supplementary file 2). For clade A, 93% of CODHs have a one-carbon pool-related gene in their operon, followed by CooC (62%) and iron-sulfur (FeS) cluster-containing proteins (31%). We defined one-carbon pool-related genes as those associated either with the direct conversion of one-carbon compounds, such as formate dehydrogenases, or with the Wood-Ljungdahl pathway. Clade B CODH operons mainly encode ABC transporter-associated genes (64%). Furthermore, almost a quarter (24%) of all clade B CODHs could not be associated with any neighbor, and 12% are encoded close to transcriptional regulators. For clade C CODHs, the main neighbors are FeS cluster-containing proteins (such as CooF; 72%), NAD(P)- or FAD-dependent oxidoreductases (71%), and regulators of transcription (58%) or of other processes (10%). The overall diversity of neighboring proteins in clade D, together with the fact that most of these CODHs (64%) appear to have no genes in close proximity, made it challenging to summarize their different operons, and no clear pattern could be observed. Only transcriptional regulators (9.9%) and general regulatory proteins (9.5%) are worth mentioning in this context.
Clades E and F both have a larger set of proteins frequently observed in their operons. Operons encoding clade E or F CODHs contain CooC-like genes (59% E, 68% F), one-carbon pool-associated genes (49% E, 37% F), and FeS genes (29% E, 53% F). Transcriptional regulators (17% E, 35% F) and NAD(P)/FAD-dependent oxidoreductases (22% E, 42% F) are also found. The maturation protein CooT was found exclusively in operons from clades E (16%) and F (6.1%). The same holds true for CooJ, which, in clade F, occurs in even fewer operons (12% E, <5.0% F). Additional hydrogenases (25%) and their maturation machinery (17%) are encoded primarily in clade F CODH operons, as are different types of transporter proteins (11%).
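The per-clade proportions and the 10% reporting cutoff used above can be sketched as follows. This assumes that each operon's neighbor genes have already been assigned functional categories; the study did this with eggNOG, NCBI product predictions, and manual curation, which are not reproduced here. Each category is counted at most once per operon, and solitary CODH genes are tallied as "no neighbor". Function names, category strings, and toy data are hypothetical.

```python
from collections import Counter, defaultdict

CUTOFF = 0.10  # report a category only if it occurs in more than 10% of a clade's operons


def neighbor_proportions(operons):
    """operons: iterable of (clade, categories), one entry per CODH operon.

    `categories` holds the functional categories assigned to the operon's other
    genes (e.g. "CooC", "FeS"). Each category counts at most once per operon;
    an empty set is tallied as "no neighbor".
    """
    totals = Counter()
    counts = defaultdict(Counter)
    for clade, categories in operons:
        totals[clade] += 1
        for cat in set(categories) or {"no neighbor"}:
            counts[clade][cat] += 1
    return {clade: {cat: n / totals[clade] for cat, n in counts[clade].items()}
            for clade in totals}


def reported_categories(proportions, cutoff=CUTOFF):
    """Categories above the cutoff for each clade, sorted alphabetically."""
    return {clade: sorted(cat for cat, p in props.items() if p > cutoff)
            for clade, props in proportions.items()}


# Toy clade B data: 8 operons with an ABC transporter, 1 with a regulator, 1 solitary gene
operons = ([("B", {"ABC transporter"})] * 8
           + [("B", {"regulation"})] + [("B", set())])
props = neighbor_proportions(operons)
print(reported_categories(props))  # {'B': ['ABC transporter']}
```

Counting presence per operon, rather than gene copies, matches the way the proportions above are phrased: the share of CODHs whose operon contains at least one protein of a given type.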

Figure 4 with 1 supplement

Operon content of CODH encoding operons in different clades.

(A) Unrooted phylogenetic tree of 1376 putative [NiFe]-CODH sequences. The colored rings indicate whether the operon of each protein (leaf) contains one or more proteins of a given type, using the same colors as in C. From the center to the outer ring, the protein types are: one-carbon pool, CooC, CooT, CooJ, ferredoxin, FeS, hydrogenase, hydrogenase maturase, NAD(P)/FAD-dependent oxidoreductase, regulation, ABC transporter, and transporter. A detailed searchable tree with bootstrap values can be found in Supplementary files 6 and 7. (B) Distribution of operon size for CODH and HCP genes. (C) Proportion of [NiFe]-CODHs from each clade encoded near a given type of protein. The dotted line marks a proportion of 10%, which we set as our significance cutoff. Raw data can be found in Supplementary file 1.

We performed a similar analysis of HCP-encoding operons, covering a total of 1476 HCP genes (class I: 1049, class II: 23, class III: 404; Supplementary files 8 and 9), which show a low frequency of isoforms within organisms (Figure 3—figure supplement 1). HCPs exhibited a large variety of neighbors, which made it difficult to extract meaningful information from their operon composition (Figure 4—figure supplement 1). Furthermore, classes I and III had a high proportion of entries without any neighbors (49% and 75%, respectively), which is reflected in their tendency to have fewer genes in their operons (see Figure 4B). However, we observed a high frequency of FeS cluster proteins (18%) and transcriptional regulators (17%) for class I, as well as NAD(P)- or FAD-dependent oxidoreductases (96%) and transport proteins (78%) for class II HCPs. Note, however, that our sample set for class II HCPs is very small, so its information value is considerably lower than that of the other classes and clades.

Discussion

Our analysis reveals substantial diversity in the occurrence, co-occurrence, and genomic context of CODH and HCP genes, suggesting complex evolutionary and functional relationships within and across microbial lineages. The difference in the frequency of multiple isoforms per genome (~30% for CODH versus ~6% for HCP) indicates that CODHs are more often retained in multiple copies, potentially pointing to functional diversification among their isoforms. A comparable value was reported by Techtmann et al., 2012, who found that a striking 43% of organisms encoded more than one CODH. In contrast, Katayama et al., 2024, who investigated only the human gut microbiome, found a proportion as low as 5.5%. We suspect that this number underestimates the number of human gut organisms carrying multiple CODH isoforms, since Katayama and co-workers applied data refinements that exclude potential CODHs (such as the strict requirement for a [4Fe-4S] D-cluster, even though Inoue et al., 2018, had earlier reported on the diversity of the D-cluster).

There are many examples of organisms encoding multiple CODH isoforms (Figure 3; Supplementary files 10 and 11). Many of them have long been known in the literature (even though their CODH abundance has only been discussed sporadically), the most famous example being C. hydrogenoformans, which encodes five different CODHs (Wu et al., 2005). Another interesting example is Clostridium formicoaceticum, which has a total of six CODH isoforms encoded in its genome. Note that in our analysis, this organism did not show up as an organism with six CODHs (see Figure 3A), because our analysis only counts CODHs as stemming from the same organism when their genes are associated with the same genome assembly. We therefore tend to underestimate the number of organisms with multiple CODHs, such as C. formicoaceticum. The only C. formicoaceticum CODH that has so far been isolated, characterized, and discussed is one of its clade E CODHs, which is associated with acetyl-CoA synthase (ACS) (Bao et al., 2019; Diekert and Thauer, 1978). Other studies have investigated the influence of CODH isoforms on metabolism. Archaeoglobus fulgidus contains three CODH genes, two from clade A and one from clade D; its clade D CODH seems to play no role in the CO metabolism of this organism (Hocking et al., 2015). Methanosarcina acetivorans harbors three CODHs, all belonging to clade A; only two of them are associated with ACS and are believed to be involved in CO metabolism, whereas the third is a lone gene that is seemingly not involved in carbon metabolism (Matschiavelli et al., 2012). Another interesting example is Thermoanaerobacter kivui, formerly known as Acetogenium kivui (Collins et al., 1994).
When the gene encoding TkCODH-I (clade C) is deleted, the strain loses its ability to grow on CO; however, when grown on H₂+CO₂, its overall acetate production is greatly increased (Jain et al., 2022). Similar effects have been shown for Clostridium autoethanogenum, which contains three CODH isoforms. If its clade C CODH is deleted, the organism's lag phase is shortened and its growth rate is greatly increased (Liew et al., 2016). The other two CODH isoforms of this organism belong to clades E and D. Deletion of the clade D CODH showed no immediate effect on the organism, except for a moderately lower overall biomass yield (Liew et al., 2016). In our analysis, we saw an increased frequency of co-occurrence of clades C and D with clades A, E, or F, which, together with these biological data, may indicate a complementary role, possibly linked to redox sensing or regulatory functions. This is particularly evident for the clade C CODHs from T. kivui and C. autoethanogenum. Clade D lacks catalytic activity toward CO/CO₂ interconversion (as shown by Jeoung et al., 2022, through recombinant production of ChCODH-V), and its influence on metabolism remains unknown. This influence might only manifest in harsher environments, since clade D CODH is believed to be involved in the stress response (Jeoung et al., 2022), similar to HCPs (Hagen, 2022; see below). However, experimental proof for this claim is still missing.

Furthermore, clades A, E, and F rarely co-occur. Interestingly, however, many organisms contain multiple CODHs from one of these clades, such as M. acetivorans (clade A), C. hydrogenoformans (clade F), and C. formicoaceticum (clade E). We suspect an evolutionary reason behind this, as also outlined by others (Adam et al., 2018; Lindahl and Chang, 2001). As previously mentioned, biochemical data indicate that CODHs from these clades are capable of CO/CO₂ interconversion. This is also in line with the genetic context of these CODHs, which is most often tuned for CO/CO₂ interconversion chemistry (Figure 4C, see below).

The high rate of co-occurrence between CODH and HCP genes (except for clade A) suggests functional integration, a shared metabolic niche, or involvement in a coordinated response to redox stress, given that HCPs are thought to be involved in nitrosative or oxidative stress responses (Hagen, 2022). The lack of co-occurrence for clade A CODHs might point to distinct metabolic roles or evolutionary constraints, or might reflect the high proportion of archaeal genes among clade A CODHs, even though HCP genes are also known to occur in archaea (Hagen, 2022). Interestingly, the co-occurrence between clade D CODH and HCP appears to be the highest in our dataset; the reason for this remains unclear. As with clade D CODH, empirical proof that HCP influences the activity or expression of CODH is missing.
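One way to quantify the CODH-HCP co-occurrence discussed above is to compute, for each clade, the share of genomes carrying that clade that also encode at least one HCP. This is an illustrative definition chosen for the sketch, not the study's stated equation. The input layout, function name, and toy genomes are hypothetical.

```python
def hcp_cooccurrence_by_clade(genomes):
    """genomes: mapping genome ID -> (set of CODH clades, number of HCP genes).

    Returns, per clade, the fraction of genomes carrying that clade that also
    encode at least one HCP (an illustrative definition).
    """
    with_clade, with_hcp = {}, {}
    for clades, n_hcp in genomes.values():
        for clade in set(clades):
            with_clade[clade] = with_clade.get(clade, 0) + 1
            if n_hcp > 0:
                with_hcp[clade] = with_hcp.get(clade, 0) + 1
    return {clade: with_hcp.get(clade, 0) / n for clade, n in with_clade.items()}


# Hypothetical toy genomes
toy = {"g1": ({"A"}, 0), "g2": ({"A", "D"}, 1), "g3": ({"D"}, 2), "g4": ({"E"}, 0)}
print(hcp_cooccurrence_by_clade(toy))
```

Here clade A reaches 0.5, clade D reaches 1.0, and clade E reaches 0.0. A low value for one clade, as reported for clade A, would show up directly in such a per-clade summary.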

The genomic context analysis adds another layer of functional inference, as has been done before by others with different foci (Inoue et al., 2018; Katayama et al., 2024; Matson et al., 2011; Techtmann et al., 2012). We confirm that operons containing clade A CODHs are highly conserved, with one-carbon pool-related genes and CooC, as these CODHs are almost exclusively part of the Wood-Ljungdahl pathway, typically arranged as for Methanosarcina barkeri CODH (MbCODH; Figure 2, Supplementary file 7). The structure of another representative of this group, MetCODH from M. thermophila, has recently been resolved (Biester et al., 2024).

In contrast, clade B CODHs appear largely alone or associated with transport-related genes, raising the possibility of a non-canonical or even degenerate function. Their operon composition is also rather consistent, and their arrangements vary only to a small extent, with ABC transporters encoded either upstream (as for the Ruminococcus flavefaciens CODH, RfCODH; see Figure 2) or downstream of the CODH gene. Almost none of the analyzed operons contain any maturases, except for a small cluster that branches off rather early in the tree (Supplementary file 7). This might indicate that the need for a maturase was lost due to repurposing of the CODH. No clade B CODH has been biochemically characterized yet.

Clade C CODHs are associated with FeS cluster proteins, regulators, and redox enzymes, pointing toward regulatory or redox-modulatory roles, which is consistent with knockout studies (Liew et al., 2016). The only isolated example from this clade is TkCODH-I. Its operon is only partially representative of clade C CODHs, containing only one other gene, which encodes an FeS protein (Figure 2). Moreover, the TkCODH-I sequence branches off early and seems to be rather distinct (Supplementary file 5), with only one close relative from Aceticella autotrophica (Frolov et al., 2023). In addition, the isolated CODH reported by Jain et al., 2022, stems from a CO-adapted strain (Weghoff and Müller, 2016), which might harbor mutations in the protein sequence that are not accessible to us at the moment. Taken together, we conclude that TkCODH-I might currently not be an optimal representative of clade C CODHs, and more clade C CODHs should be isolated to better understand their biochemical properties.

The high operonic variability and frequency of solitary coding regions in clade D might reflect either evolutionary drift or multifunctionality not restricted to operonic structure. Clade D will therefore not be discussed further.

Operons from clades E and F are functionally more complex, including components of the Wood-Ljungdahl pathway, hydrogenases, and additional redox partners, consistent with diverse metabolic roles. That said, their operon compositions and arrangements showed some interesting clustering. Clade F has the highest proportion of CODHs that might be associated with hydrogenases, including ChCODH-I and RrCODH, which both interact with hydrogenases to produce hydrogen in vivo (Fox et al., 1996; Soboh et al., 2002), although their operon compositions differ greatly (Figure 2). ChCODH-I-like operons (see Supplementary file 7) contain their hydrogenase modules directly within the CODH operon, whereas in RrCODH-like operons the hydrogenase module is encoded upstream of the CODH gene, separated by an intergenic space of >400 bp, and is therefore not part of the operon (Fox et al., 1996). RrCODH-like operons are the only clade F operons that include the two additional maturation enzymes CooT and CooJ. Clade F also contains many ACS-associated CODHs, such as NtCODH from Neomoorella thermoacetica, formerly known as Moorella thermoacetica (Gtari and Ventura, 2025). These NtCODH-like operons all share the same arrangement, which is distinct from that of ACS-associated CODH operons in clades A and E.

In our dataset, most hydrogenases and their maturation genes are associated with clade F, suggesting active hydrogen metabolism that couples CO oxidation to H₂ production or consumption, as has been suggested earlier for a wider range of CODH clades (Inoue et al., 2018; Techtmann et al., 2012). We believe that our data substantially underestimate this relationship overall, since examples such as RrCODH and TcCODH-II show that hydrogenases associated with a CODH are not necessarily encoded in the same operon. A clade A CODH from the methanogen M. thermophila has also been reported to be associated with a hydrogenase (Terlesky and Ferry, 1988). However, later studies showed that the genome of M. thermophila does not contain a hydrogenase (Smith and Ingram-Smith, 2007), and investigation of the electron transport chain of its membrane found no hydrogen-oxidizing complex (Welte and Deppenmeier, 2011). Together with our analysis, this leads us to conclude that hydrogenase association is a trait almost exclusive to clade E and F CODHs. ChCODH-II's operon seems to be rather uniquely constructed, as a similar operon composition can only be found in other Carboxydothermus species. ChCODH-IV is similarly atypical: its operon, containing FeS proteins and NAD/FAD-dependent oxidoreductases, more closely resembles those of some clade E CODHs.

Clade E, the largest clade, is strikingly diverse. NvCODH (formerly known as DvCODH; Waite et al., 2020) from N. vulgaris has a very small operon with only two genes in close proximity, a transcriptional regulator (Zhou et al., 2012) and the maturation enzyme CooC (Figure 2). This arrangement is seen for a large number of both clade E and clade F CODHs. Neither CooJ nor CooT is common: both appear only in two very distinct parts of clade E, all of them associated with one-carbon pool metabolism, with one exception from Clostridium pasteurianum BC1, whose operon resembles the clade F RrCODH-like operon. The previously introduced archaeal clade E CODHs, TcCODH-I and TcCODH-II, are found in operons (Figure 2) that are rather specific and occur only in a few other Thermococcus or Pyrococcus species (Benvenuti et al., 2020; Kim et al., 2015). Note that TcCODH-I's CooC gene is encoded outside of its operon, on the opposite strand, and is therefore not included in our analysis. We are aware of such examples only for this type of operon and do not expect this to be a common trait of the CODH maturation machinery; nevertheless, the CooC proportion might be slightly underestimated. Interestingly, TcCODH-II-like CODHs all contain a CooT-like protein in their operon, forming the only CODH cluster that contains CooT-like proteins without CooJ. Within clade E, the unique genomic neighborhood of TkCODH-II must also be pointed out. Experimental data show that this CODH is associated with ACS (Jain et al., 2022); however, our analysis did not detect the ACS complex in TkCODH-II's operon, because the ACS subunit is encoded further downstream of the CODH gene and thus falls outside our initial parameters, which illustrates the limits of this study.
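Two operon-boundary effects are described above: TcCODH-I's CooC, which lies on the opposite strand, and TkCODH-II's ACS, which lies farther downstream. Both can be reproduced with a simple strand-and-distance rule. The sketch below assumes that an operon is the maximal run of adjacent genes on the same strand with intergenic gaps no larger than a threshold. The 300 bp default, the `Gene` class, the function names, and the toy coordinates are all hypothetical illustrations and are not the study's actual parameters, which are described in its Methods.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int   # leftmost coordinate on the contig
    end: int     # rightmost coordinate (end >= start)
    strand: str  # "+" or "-"


def _same_unit(left, right, max_gap):
    # Adjacent genes stay together if they share a strand and the number of
    # bases between them does not exceed max_gap (overlaps give negative gaps)
    gap = right.start - left.end - 1
    return left.strand == right.strand and gap <= max_gap


def operon_of(genes, target, max_gap=300):
    """Names of the genes in the putative operon around `target`.

    Hypothetical rule: extend from the target gene in both directions until a
    neighbor lies on the opposite strand or beyond `max_gap` bp. The 300 bp
    default is illustrative, not the study's parameter.
    """
    genes = sorted(genes, key=lambda g: g.start)
    idx = next(i for i, g in enumerate(genes) if g.name == target)
    lo = hi = idx
    while lo > 0 and _same_unit(genes[lo - 1], genes[lo], max_gap):
        lo -= 1
    while hi < len(genes) - 1 and _same_unit(genes[hi], genes[hi + 1], max_gap):
        hi += 1
    return [g.name for g in genes[lo:hi + 1]]


# Toy contig (hypothetical coordinates): regulator on the other strand,
# CooC and an FeS gene adjacent to the CODH, ACS about 700 bp past the FeS gene
contig = [
    Gene("regulator", 100, 900, "-"),
    Gene("cooC", 1000, 1800, "+"),
    Gene("codh", 1850, 3700, "+"),
    Gene("fes", 3720, 4300, "+"),
    Gene("acs", 5000, 7000, "+"),
]
print(operon_of(contig, "codh"))  # ['cooC', 'codh', 'fes']
```

Raising `max_gap` would pull the distant ACS gene into the operon, at the cost of admitting more unrelated neighbors. This is the trade-off between missed partners and false positives discussed in this section.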

For HCPs, the high variability and low operon density – especially in classes I and III – point toward more modular or conditionally expressed roles, similar to clade D CODH. The clear patterning in class II operons, though based on a limited sample, may reflect specialized functions, perhaps in niche-specific oxidoreductase activities.

Conclusion

As previously mentioned, the aim of this study is to identify which CODH clades harbor the most promising enzymes for future application in CO₂ reduction. The operon composition of CODHs differs distinctly between clades, and this information suggests that clades A, E, and F are the most likely to harbor CODHs able to efficiently convert CO₂ to CO. These clades are therefore the most interesting for CO₂-reducing biotechnological applications, or as inspiration for new synthetic catalysts. Moreover, the literature shows that the activity of many CODHs depends on co-expression with maturation proteins such as CooC, and in some cases CooJ and CooT are also required for full activation. Although some CODHs (most notably CODH-II from C. hydrogenoformans) can function independently of maturases, our neighborhood analysis indicates that maturase-encoding genes are predominantly found in operons from clades A, E, and F. This pattern further implies that these clades may contain more biochemically active or catalytically optimized CODHs, making them promising targets for future functional studies and biotechnological applications.

The function of clade B could not be deduced from its genomic environment, but it appears to have a distinct, self-standing function that is not shared with any other CODH clade. Its low co-occurrence with other CODH clades within the same organisms also supports a unique role for clade B. Clades C and D are more likely to show low or no activity toward CO₂/CO interconversion, as deduced from the literature and from the lack of one-carbon metabolism-related genes in their operons. However, Jain et al. recently showed low CO oxidation activity in a clade C CODH from T. kivui, but this enzyme originated from a strain that had acquired the ability to grow on CO through laboratory evolution (Jain et al., 2022). The sequence data used to classify this CODH into clade C were from the original strain (incapable of growing on CO), and data on the evolved strain are not available. It is therefore unknown whether the active CODH is the wild-type enzyme or a variant that arose during laboratory evolution, and we cannot draw conclusions regarding the activity of clade C CODHs. Taken together, this makes clades B, C, and D less promising in the hunt for CO₂ reduction catalysts. However, much remains unknown about these enzymes, including their cellular function.

Future work should focus on experimentally validating the functional differences among CODH isoforms, especially in organisms that contain members of multiple clades. Caution is warranted when extrapolating enzymatic activity or inactivity from a limited number of characterized examples to entire clades. Additionally, transcriptomic and proteomic studies could illuminate condition-dependent expression patterns and confirm the proposed regulatory functions. Once more sequence data are available, future bioinformatic work should also examine the co-occurrence of CODHs with proteins encoded outside their operons. Finally, deeper phylogenomic analyses may reveal the evolutionary drivers behind the observed distribution and diversification of these ancient redox enzymes.

Methods

Data collection and refinement

Multiple pBLAST (Madden, 2013) searches (BLOSUM62, E<0.05, blastp 2.16.1+, NCBI online web server, nr database accessed 2025-01-16) were carried out using the NCBI accession numbers provided by Inoue et al., 2018 (A-1, WP_011305243; A-2, WP_010878596; A-3, OGW06734; A-4, OIP92259; A-5, ODS42986; A-6, OIP30420; B-1, WP_026514536; B-2, WP_015485077; B-3, WP_012645460; B-4, WP_011393470; C-1, WP_039226206; C-2, WP_013237576; C-3, WP_010870233; C-4, WP_044921150; D-1, WP_011342982; D-2, WP_015926279; D-3, WP_079933214; D-4, WP_096205957; E-1, WP_012571978; E-2, WP_010939375; E-3, WP_088535808; F-1, WP_011343033; F-2, WP_011389181; G-1, OGP75751) and by Techtmann et al., 2012 (mini CooS, WP_007288589.1). Accession numbers of clade H CODHs (Inoue et al., 2022) were initially omitted due to limited host information. Duplicates were removed using seqkit rmdup v2.9.0 (Shen et al., 2016), and sequences shorter than 400 amino acids were discarded. To further reduce the data size, sequences were clustered with cd-hit v4.8.1 (Li et al., 2002; Li et al., 2001; Li and Godzik, 2006) at a global sequence identity of 99% or 90%, the latter used only for tree generation. A high identity threshold was necessary to retain distinct CODHs within the same organism, since some organisms are known to encode multiple CODHs with strikingly similar sequences; for example, Clostridium pasteurianum BC1 (taxid: 86416) contains WP_015614757.1 and WP_015615315.1 with 93.27% similarity. For the dataset used in the neighbor analysis, taxonomic information for each sequence was retrieved using the R packages taxize v0.10.0 (Chamberlain et al., 2020; Chamberlain and Szöcs, 2013) and taxizedb v0.3.2 (Chamberlain et al., 2025), and only sequences that could be linked to a recorded organism were kept (Supplementary file 3 and Supplementary file 4).
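
The duplicate-removal and length-filtering steps can be illustrated with a short, dependency-free Python sketch. It is not the authors' pipeline, which used seqkit and cd-hit, and it does not reproduce cd-hit's identity-based clustering. The text does not state whether duplicates were identified by name or by sequence, so the sketch assumes that two records are duplicates only when their sequences are identical. The helper names `parse_fasta` and `deduplicate_and_filter` are hypothetical.

```python
from typing import List, Tuple

Record = Tuple[str, str]  # (FASTA header without ">", amino acid sequence)

MIN_LENGTH = 400  # sequences shorter than 400 amino acids are discarded


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA-formatted text into (header, sequence) pairs."""
    records: List[Record] = []
    header = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def deduplicate_and_filter(records: List[Record], min_length: int = MIN_LENGTH) -> List[Record]:
    """Drop exact sequence duplicates (first occurrence wins), then short sequences."""
    seen = set()
    kept: List[Record] = []
    for header, seq in records:
        key = seq.upper()
        if key in seen:
            continue  # an identical sequence has already been seen
        seen.add(key)
        if len(seq) >= min_length:
            kept.append((header, seq))
    return kept


if __name__ == "__main__":
    # Toy input: "b" duplicates "a", and "c" is too short.
    fasta = ">a\n" + "M" * 450 + "\n>b\n" + "M" * 450 + "\n>c\n" + "M" * 120 + "\n"
    print([h for h, _ in deduplicate_and_filter(parse_fasta(fasta))])  # ['a']
```

Keeping the first occurrence mirrors the usual behavior of deduplication tools, but which copy survives only matters if headers carry information used downstream.
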
Sequences were aligned using the E-INS-i strategy of MAFFT v7.526 (Katoh and Standley, 2013), and sequences with gaps at important positions related to the D-, B-, or C-cluster or to the acid-base active-site residues were removed. The alignment was trimmed using trimAl v1.4.rev22 (Capella-Gutiérrez et al., 2009) with the automated1 option, and a tree was generated using FastTree v2.2 (Price et al., 2010). Further non-CODH sequences were removed after visual inspection of this tree. Leaves on unusually long branches were examined manually: if a sequence on such a branch was annotated as a protein other than hydroxylamine reductase or carbon monoxide dehydrogenase (CODH), it was investigated further. We assessed whether the protein length exceeded 400 amino acids and whether the key clusters were present. If these criteria still yielded ambiguous results, the protein structure was predicted using AlphaFold 3 (Abramson et al., 2024) to determine whether it adopted the characteristic CODH fold, and sequences lacking this fold were discarded. The final CODH dataset used in the neighbor and correlation analysis comprised 1376 sequences. A similar approach was applied to HCPs (class I, Q01770.2; class II, WP_000458809.1; class III, WP_013294878.1), yielding 1545 sequences for neighbor and correlation analysis. We applied the same procedure to a second set of CODH sequences clustered at 90% cd-hit identity, yielding 5508 sequences. See Appendix 1—figure 1 for a detailed flowchart. Custom code is available on GitHub (https://github.com/boehmax; Böhm, 2025a; Böhm, 2025b; Böhm, 2025c; Böhm, 2025d).
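
The gap-based filtering step can be sketched as follows. This is a hypothetical illustration, not the code of the authors' Filter-gaps tool. It assumes the alignment is held in memory as a mapping of names to equal-length aligned sequences, and that the key positions (cluster-ligating or active-site residues) are already known as 0-based alignment column indices. The column numbers in the example are placeholders, not real CODH residue positions.

```python
from typing import Dict, Iterable

GAP_CHARS = frozenset("-.")  # gap symbols: "-" and, in some alignment formats, "."


def filter_by_key_columns(alignment: Dict[str, str], key_columns: Iterable[int]) -> Dict[str, str]:
    """Keep only aligned sequences that have a residue (no gap) at every key column.

    alignment   -- mapping of sequence name to aligned sequence (all the same length)
    key_columns -- 0-based alignment column indices of the conserved positions
    """
    if not alignment:
        return {}
    widths = {len(seq) for seq in alignment.values()}
    if len(widths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    width = widths.pop()
    columns = sorted(set(key_columns))
    if columns and (columns[0] < 0 or columns[-1] >= width):
        raise ValueError("key column lies outside the alignment")
    return {
        name: seq
        for name, seq in alignment.items()
        if all(seq[c] not in GAP_CHARS for c in columns)
    }


if __name__ == "__main__":
    # Placeholder alignment and columns, not real CODH data.
    aln = {"seq1": "MC-HC", "seq2": "MCKHC", "seq3": "M--HC"}
    print(sorted(filter_by_key_columns(aln, key_columns=[1, 3])))  # ['seq1', 'seq2']
```

In practice the key columns would be located by mapping residue numbers of a structurally characterized reference sequence onto the alignment, which this sketch leaves out.
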

Neighbor analysis

Genomes corresponding to the CODH and HCP accession lists (955 and 1425 genomes, respectively) were downloaded from NCBI's genome database using the NCBI Datasets command-line tools v18.5.0 (O’Leary et al., 2024). Neighboring genes were defined as those located within a maximum of 15 genes upstream or downstream of the target gene, with an intergenic distance not exceeding 300 base pairs (bp), as done previously by Inoue et al., 2018. We chose this relatively large intergenic distance to include as many neighbors as possible, expecting unrelated genes to disappear in the noise. For the same reason, we allowed an overlap of up to 50 bp between genes in the same operon; this is generous, since overlapping genes in, for example, Escherichia coli usually overlap by only 1–4 bp (Johnson and Chisholm, 2004). Amino acid sequences of these genes were retrieved from the NCBI nr protein database using Entrez v23.5 (Sayers, 2022), and their functions were predicted using eggNOG v5.0 (Huerta-Cepas et al., 2019) and eggNOG-mapper v2.1.12 (Cantalapiedra et al., 2021). We considered results from eggNOG, as well as product predictions from NCBI, when manually assigning selected functional groups. The data were plotted using R v4.4.3 (R Development Core Team, 2023), tidyverse v2.0.0 (Wickham et al., 2019), patchwork v1.3.1 (Pedersen, 2025), ggnewscale v0.5.2 (Campitelli et al., 2025), ggtree v3.14.0 (Yu et al., 2018; Yu et al., 2017), ggtreeExtra v1.16.0 (Xu et al., 2021), treeio v1.30.0 (Wang et al., 2020), and gggenes v0.5.1 (Wilkins, 2023). Since CooJ could be identified neither from the NCBI product predictions nor via eggNOG, we selected operons from clades E and F that contained CooS and CooT, manually extracted the accession numbers of potential CooJs, and used these to search for further accession numbers with PSI-BLAST (BLOSUM45, E<0.001, NCBI online web server, nr database accessed 2025-04-16). The summary can be found in Supplementary file 1. These accessions were used to help annotate potential CooJs in our analysis, and we identified 68 potential CooJ genes in total.
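
The neighbor criteria (at most 15 genes on either side, consecutive genes separated by no more than 300 bp, overlaps of up to 50 bp tolerated) can be read as a walk outward from the target gene that stops at the first gap that is too large. The sketch below is one possible interpretation, not the code of the authors' Protein-neighbours tool. It assumes that genes on a single contig are given as 1-based inclusive (start, end) coordinates sorted by start, that strand is ignored, and that the 300 bp and 50 bp limits apply to each pair of consecutive genes. The function names are hypothetical.

```python
from typing import List, Tuple

Gene = Tuple[int, int]  # (start, end): 1-based, inclusive coordinates on one contig

MAX_GENES = 15        # at most 15 neighbors upstream and 15 downstream
MAX_GAP_BP = 300      # largest allowed intergenic distance between consecutive genes
MAX_OVERLAP_BP = 50   # largest allowed overlap between consecutive genes


def gap_between(left: Gene, right: Gene) -> int:
    """Intergenic distance in bp; a negative value is an overlap of that many bp."""
    return right[0] - left[1] - 1


def is_linked(left: Gene, right: Gene) -> bool:
    """True if two consecutive genes are close enough to count as one cluster."""
    return -MAX_OVERLAP_BP <= gap_between(left, right) <= MAX_GAP_BP


def neighbor_indices(genes: List[Gene], target: int) -> List[int]:
    """Indices of genes linked to genes[target] by an unbroken chain of close genes.

    genes must be sorted by start coordinate; strand is ignored in this sketch.
    """
    neighbors: List[int] = []
    i = target
    # Walk toward lower coordinates until a gap is too large or 15 genes are collected.
    while i > 0 and target - i < MAX_GENES and is_linked(genes[i - 1], genes[i]):
        i -= 1
        neighbors.append(i)
    j = target
    # Walk toward higher coordinates in the same way.
    while j < len(genes) - 1 and j - target < MAX_GENES and is_linked(genes[j], genes[j + 1]):
        j += 1
        neighbors.append(j)
    return sorted(neighbors)


if __name__ == "__main__":
    genes = [(1, 900), (950, 1800), (1820, 3700), (4200, 5000), (5010, 5600)]
    # The 499 bp gap after gene 2 breaks the chain on the downstream side.
    print(neighbor_indices(genes, target=2))  # [0, 1]
```

Stopping at the first gap that is too large (rather than collecting every gene within 15 positions) is what makes the intergenic-distance criterion define an operon-like block around the target gene.
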

Correlation analysis

Correlation coefficients between CODH and HCP clades/classes were calculated as conditional probabilities according to the formula

P(X \mid Y) = \frac{N_{XY}}{N_{Y}},

where N_Y is the total number of assemblies containing a protein from clade/class Y, N_XY is the total number of assemblies containing proteins from both clades/classes X and Y, and P(X|Y) is the probability that a genome coding for a protein from clade/class Y also codes for a protein from clade/class X.
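
As a worked illustration of P(X|Y), the following sketch computes all pairwise values from a mapping of assemblies to the clades/classes they encode. The input format and the toy assembly names are hypothetical. Counting is per assembly (presence/absence), so a genome with two clade E CODHs contributes once to N_E, in line with the definitions above. Note that P(X|Y) is not symmetric: it differs from P(Y|X) whenever clades X and Y occur in different numbers of assemblies.

```python
from collections import Counter
from itertools import product
from typing import Dict, Iterable, Tuple


def conditional_cooccurrence(assemblies: Dict[str, Iterable[str]]) -> Dict[Tuple[str, str], float]:
    """Return {(X, Y): P(X | Y)} with P(X | Y) = N_XY / N_Y.

    assemblies maps an assembly accession to the clades/classes of the CODHs and
    HCPs it encodes; repeated labels within one assembly are counted once.
    """
    n_y: Counter = Counter()   # N_Y: assemblies containing clade/class Y
    n_xy: Counter = Counter()  # N_XY: assemblies containing both X and Y
    for labels in assemblies.values():
        present = set(labels)
        n_y.update(present)
        n_xy.update(product(present, repeat=2))
    groups = sorted(n_y)
    return {(x, y): n_xy[(x, y)] / n_y[y] for x in groups for y in groups}


if __name__ == "__main__":
    genomes = {
        "asm1": ["A", "C"],
        "asm2": ["A"],
        "asm3": ["C", "E"],
        "asm4": ["E", "E"],  # two clade E copies still count once
        "asm5": ["A", "E"],
    }
    p = conditional_cooccurrence(genomes)
    print(p[("A", "C")])            # 0.5   (P(A | C) = 1/2)
    print(round(p[("C", "A")], 3))  # 0.333 (P(C | A) = 1/3)
```

The diagonal entries P(Y|Y) are always 1 by construction, so only the off-diagonal values carry information about co-occurrence.
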

Tree generation

In total, five trees were generated. Trees carrying phylogenetic information were inferred with IQ-TREE v2.0.7 (Minh et al., 2020) using the LG+I+R10 model and ultrafast bootstrapping with 1000 replicates for three datasets: 5508 CODH sequences, 1351 CODH sequences, and 1476 HCP sequences (see above for details on their generation). For the 5508-sequence CODH dataset, an outgroup (MBE6442607.1) was introduced to root the tree. Sequences within each dataset were aligned using the FFT-NS-2 strategy of MAFFT v7.526, the alignments were trimmed using trimAl v1.4.rev22, and trees were built using IQ-TREE v2.0.7 with the above parameters. Trees were inspected and plotted with ggtree v3.14.0 (Yu et al., 2017). The two remaining trees are taxonomic trees: one based solely on taxids, generated with a custom Python script and ete3 v3.1.3 (Huerta-Cepas et al., 2016), and one taken from WoL: Reference Phylogeny for Microbes (Zhu, 2023; Zhu et al., 2019). CODH clades were defined as described previously (Inoue et al., 2022; Inoue et al., 2018; Techtmann et al., 2012). Our tree showed a similar topology, with bootstrap support >75% for all clades.
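
The idea behind a tree built solely on taxids can be illustrated without external dependencies: each organism's root-to-leaf lineage is merged into a shared tree, which is then written out in Newick format. This is a simplified, hypothetical sketch, not the authors' ete3-based script. It assumes each organism's full lineage is already available as a list of taxids, keeps every intermediate rank as its own node, and uses placeholder taxids in the example (only taxid 1, the NCBI root, is real). Organism names must not contain Newick special characters such as spaces, commas, or parentheses.

```python
from dataclasses import dataclass, field
from typing import Dict, List

ROOT_TAXID = 1  # taxid of the root node in the NCBI taxonomy


@dataclass
class Node:
    taxid: int
    children: Dict[int, "Node"] = field(default_factory=dict)
    leaves: List[str] = field(default_factory=list)


def build_tree(lineages: Dict[str, List[int]]) -> Node:
    """Merge root-to-leaf taxid lineages into one tree; organisms become leaves."""
    root = Node(ROOT_TAXID)
    for organism, lineage in lineages.items():
        node = root
        for taxid in lineage:
            if taxid == ROOT_TAXID:
                continue  # lineages may or may not start with the root itself
            if taxid not in node.children:
                node.children[taxid] = Node(taxid)
            node = node.children[taxid]
        node.leaves.append(organism)
    return root


def to_newick(node: Node) -> str:
    """Serialize a subtree in Newick format, labelling internal nodes by taxid."""
    parts = [to_newick(child) for child in node.children.values()] + node.leaves
    if not parts:
        return str(node.taxid)
    return "(" + ",".join(parts) + ")" + str(node.taxid)


def taxid_tree_newick(lineages: Dict[str, List[int]]) -> str:
    return to_newick(build_tree(lineages)) + ";"


if __name__ == "__main__":
    # Placeholder lineages; only taxid 1 (root) is a real NCBI identifier here.
    example = {"org_A": [1, 10, 11], "org_B": [1, 10, 12], "org_C": [1, 20]}
    print(taxid_tree_newick(example))  # (((org_A)11,(org_B)12)10,(org_C)20)1;
```

Because such a tree reflects only the taxonomy, its branch lengths carry no evolutionary distance, which is why a reference phylogeny such as WoL is useful as a complement.
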

Appendix 1

Appendix 1—figure 1

Flowchart for the data collection and refinement process.

Numbers in braces give the number of sequences at each step; the first number refers to putative CODH sequences and the second to putative HCP sequences. Custom scripts are available online (see Böhm, 2025a; Böhm, 2025b; Böhm, 2025c; Böhm, 2025d).

Data availability

All code for the bioinformatic analyses presented in this paper is openly available on Zenodo under the following DOIs: https://doi.org/10.5281/zenodo.16736767, https://doi.org/10.5281/zenodo.16736754, https://doi.org/10.5281/zenodo.16736722, https://doi.org/10.5281/zenodo.16744414.

References

Abramson J, Adler J, Dunger J, Evans R, Green T, Pritzel A, Ronneberger O, Willmore L, Ballard AJ, Bambrick J, Bodenstein SW, Evans DA, Hung C-C, O’Neill M, Reiman D, Tunyasuvunakool K, Wu Z, Žemgulytė A, Arvaniti E, Beattie C, Bertolli O, Bridgland A, Cherepanov A, Congreve M, Cowen-Rivers AI, Cowie A, Figurnov M, Fuchs FB, Gladman H, Jain R, Khan YA, Low CMR, Perlin K, Potapenko A, Savy P, Singh S, Stecula A, Thillaisundaram A, Tong C, Yakneen S, Zhong ED, Zielinski M, Žídek A, Bapst V, Kohli P, Jaderberg M, Hassabis D, Jumper JM (2024) Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630:493–500. https://doi.org/10.1038/s41586-024-07487-w
(2018) Evolutionary history of carbon monoxide dehydrogenase/acetyl-CoA synthase, one of the oldest enzymatic complexes. PNAS 115:E5836–E5837. https://doi.org/10.1073/pnas.1716667115
Babbitt PC, Hasson MS, Wedekind JE, Palmer DRJ, Barrett WC, Reed GH, Rayment I, Ringe D, Kenyon GL, Gerlt JA (1996) The enolase superfamily: a general strategy for enzyme-catalyzed abstraction of the alpha-protons of carboxylic acids. Biochemistry 35:16489–16501. https://doi.org/10.1021/bi9616413
Bao T, Cheng C, Xin X, Wang J, Wang M, Yang ST (2019) Deciphering mixotrophic Clostridium formicoaceticum metabolism and energy conservation: genomic analysis and experimental studies. Genomics 111:1687–1694. https://doi.org/10.1016/j.ygeno.2018.11.020
Basak Y, Lorent C, Jeoung JH, Zebger I, Dobbek H (2025) Metalloradical-driven enzymatic CO2 reduction by a dynamic Ni–Fe cluster. Nature Catalysis 8:794–803. https://doi.org/10.1038/s41929-025-01388-5
Benvenuti M, Meneghello M, Guendon C, Jacq-Bailly A, Jeoung JH, Dobbek H, Léger C, Fourmond V, Dementin S (2020) The two CO-dehydrogenases of Thermococcus sp. AM4. Biochimica et Biophysica Acta (BBA) - Bioenergetics 1861:148188. https://doi.org/10.1016/j.bbabio.2020.148188
(2024) Capturing a methanogenic carbon monoxide dehydrogenase/acetyl-CoA synthase complex via cryogenic electron microscopy. PNAS 121:e2410995121. https://doi.org/10.1073/pnas.2410995121
Böhm M (2025a) Protein-to-genome. Zenodo. https://doi.org/10.5281/zenodo.16736767
Böhm M (2025b) Protein-per-organism. Zenodo. https://doi.org/10.5281/zenodo.16736754
Böhm M (2025c) Protein-neighbours. Zenodo. https://doi.org/10.5281/zenodo.16736722
Böhm M (2025d) Filter-gaps. Zenodo. https://doi.org/10.5281/zenodo.16744414
(2025) ggnewscale: multiple fill and color scales in ggplot2, version v0.5.2. Zenodo. https://doi.org/10.5281/zenodo.2543762
(2014) Structure, function, and mechanism of the nickel metalloenzymes, CO dehydrogenase, and acetyl-CoA synthase. Chemical Reviews 114:4149–4174. https://doi.org/10.1021/cr400461p
(2021) eggNOG-mapper v2: functional annotation, orthology assignments, and domain prediction at the metagenomic scale. Molecular Biology and Evolution 38:5825–5829. https://doi.org/10.1093/molbev/msab293
(2009) trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25:1972–1973. https://doi.org/10.1093/bioinformatics/btp348
Chamberlain SA, Szöcs E (2013) taxize: taxonomic search and retrieval in R. F1000Research 2:191. https://doi.org/10.12688/f1000research.2-191.v2
Chamberlain S, Szoecs E, Foster Z, Arendsee Z, Boettiger C, Ram K, Bartomeus I, Baumgartner J, O’Donnell J, Oksanen J, Tzovaras BG, Marchand P, Tran V, Salmon M, Li G, Grenié M (2020) taxize: taxonomic information from around the web (manual), version 4c3e4a1. GitHub. https://github.com/ropensci/taxize
(2025) taxizedb: tools for working with “taxonomic” databases, version v0.3.1. Zenodo. https://doi.org/10.5281/zenodo.1158055
Collins MD, Lawson PA, Willems A, Cordoba JJ, Fernandez-Garayzabal J, Garcia P, Cai J, Hippe H, Farrow JA (1994) The phylogeny of the genus Clostridium: proposal of five new genera and eleven new species combinations. International Journal of Systematic Bacteriology 44:812–826. https://doi.org/10.1099/00207713-44-4-812
Diekert GB, Thauer RK (1978) Carbon monoxide oxidation by Clostridium thermoaceticum and Clostridium formicoaceticum. Journal of Bacteriology 136:597–606. https://doi.org/10.1128/jb.136.2.597-606.1978
Domnik L, Merrouch M, Goetzl S, Jeoung J-H, Léger C, Dementin S, Fourmond V, Dobbek H (2017) CODH-IV: a high-efficiency CO-scavenging CO dehydrogenase with resistance to O₂. Angewandte Chemie 56:15466–15469. https://doi.org/10.1002/anie.201709261
Fox JD, He Y, Shelver D, Roberts GP, Ludden PW (1996) Characterization of the region encoding the CO-induced hydrogenase of Rhodospirillum rubrum. Journal of Bacteriology 178:6200–6208. https://doi.org/10.1128/jb.178.21.6200-6208.1996
(2023) Obligate autotrophy at the thermodynamic limit of life in a new acetogenic bacterium. Frontiers in Microbiology 14:1185739. https://doi.org/10.3389/fmicb.2023.1185739
Fujishiro T, Takaoka K (2023) Class III hybrid cluster protein homodimeric architecture shows evolutionary relationship with Ni, Fe-carbon monoxide dehydrogenases. Nature Communications 14:5609. https://doi.org/10.1038/s41467-023-41289-4
Gong W, Hao B, Wei Z, Ferguson DJ, Tallant T, Krzycki JA, Chan MK (2008) Structure of the alpha2epsilon2 Ni-dependent CO dehydrogenase component of the Methanosarcina barkeri acetyl-CoA decarbonylase/synthase complex. PNAS 105:9558–9563. https://doi.org/10.1073/pnas.0800415105
Gtari M, Ventura S (2025) Proposal of Neomoorella gen. nov. as a replacement name for the illegitimate prokaryotic genus name Moorella Collins et al. 1994. International Journal of Systematic and Evolutionary Microbiology 75:006779. https://doi.org/10.1099/ijsem.0.006779
(2015) The carbon monoxide dehydrogenase from Desulfovibrio vulgaris. Biochimica et Biophysica Acta 1847:1574–1583. https://doi.org/10.1016/j.bbabio.2015.08.002
Hagen WR (2022) Structure and function of the hybrid cluster protein. Coordination Chemistry Reviews 457:214405. https://doi.org/10.1016/j.ccr.2021.214405
(2015) Assessment of the carbon monoxide metabolism of the hyperthermophilic sulfate-reducing archaeon Archaeoglobus fulgidus VC-16 by comparative transcriptome analyses. Archaea 2015:235384. https://doi.org/10.1155/2015/235384
(2016) ETE 3: reconstruction, analysis, and visualization of phylogenomic data. Molecular Biology and Evolution 33:1635–1638. https://doi.org/10.1093/molbev/msw046
Huerta-Cepas J, Szklarczyk D, Heller D, Hernández-Plaza A, Forslund SK, Cook H, Mende DR, Letunic I, Rattei T, Jensen LJ, von Mering C, Bork P (2019) eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Research 47:D309–D314. https://doi.org/10.1093/nar/gky1085
Inoue T, Takao K, Fukuyama Y, Yoshida T, Sako Y (2014) Over-expression of carbon monoxide dehydrogenase-I with an accessory protein co-expression: a key enzyme for carbon dioxide reduction. Bioscience, Biotechnology, and Biochemistry 78:582–587. https://doi.org/10.1080/09168451.2014.890027
Inoue M, Nakamoto I, Omae K, Oguro T, Ogata H, Yoshida T, Sako Y (2018) Structural and phylogenetic diversity of anaerobic carbon-monoxide dehydrogenases. Frontiers in Microbiology 9:3353. https://doi.org/10.3389/fmicb.2018.03353
Inoue M, Omae K, Nakamoto I, Kamikawa R, Yoshida T, Sako Y (2022) Biome-specific distribution of Ni-containing carbon monoxide dehydrogenases. Extremophiles 26:9. https://doi.org/10.1007/s00792-022-01259-y
Jain S, Katsyv A, Basen M, Müller V (2022) The monofunctional CO dehydrogenase CooS is essential for growth of Thermoanaerobacter kivui on carbon monoxide. Extremophiles 26:4. https://doi.org/10.1007/s00792-021-01251-y
Jeoung JH, Dobbek H (2007) Carbon dioxide activation at the Ni,Fe-cluster of anaerobic carbon monoxide dehydrogenase. Science 318:1461–1464. https://doi.org/10.1126/science.1148481
Jeoung JH, Fesseler J, Domnik L, Klemke F, Sinnreich M, Teutloff C, Dobbek H (2022) A morphing [4Fe-3S-nO]-cluster within a carbon monoxide dehydrogenase scaffold. Angewandte Chemie 61:e202117000. https://doi.org/10.1002/anie.202117000
Johnson ZI, Chisholm SW (2004) Properties of overlapping genes are conserved across microbial genomes. Genome Research 14:2268–2272. https://doi.org/10.1101/gr.2433104
(2024) Phylogenetic diversity of putative nickel-containing carbon monoxide dehydrogenase-encoding prokaryotes in the human gut microbiome. Microbial Genomics 10:001285. https://doi.org/10.1099/mgen.0.001285
Katoh K, Standley DM (2013) MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Molecular Biology and Evolution 30:772–780. https://doi.org/10.1093/molbev/mst010
(1997) In vivo nickel insertion into the carbon monoxide dehydrogenase of Rhodospirillum rubrum: molecular and physiological characterization of cooCTJ. Journal of Bacteriology 179:2259–2266. https://doi.org/10.1128/jb.179.7.2259-2266.1997
Kim MS, Choi AR, Lee SH, Jung HC, Bae SS, Yang TJ, Jeon JH, Lim JK, Youn H, Kim TW, Lee HS, Kang SG (2015) A novel CO-responsive transcriptional regulator and enhanced H2 production by an engineered Thermococcus onnurineus NA1 strain. Applied and Environmental Microbiology 81:1708–1714. https://doi.org/10.1128/AEM.03019-14
Knox HL, Allen KN (2023) Expanding the viewpoint: leveraging sequence information in enzymology. Current Opinion in Chemical Biology 72:102246. https://doi.org/10.1016/j.cbpa.2022.102246
(2001) Clustering of highly homologous sequences to reduce the size of large protein databases. Bioinformatics 17:282–283. https://doi.org/10.1093/bioinformatics/17.3.282
(2002) Tolerating some redundancy significantly speeds up clustering of large protein databases. Bioinformatics 18:77–82. https://doi.org/10.1093/bioinformatics/18.1.77
Li W, Godzik A (2006) Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22:1658–1659. https://doi.org/10.1093/bioinformatics/btl158
Liew F, Henstra AM, Winzer K, Köpke M, Simpson SD, Minton NP (2016) Insights into CO2 fixation pathway of Clostridium autoethanogenum by targeted mutagenesis. mBio 7:00427-16. https://doi.org/10.1128/mBio.00427-16
Lindahl PA, Chang B (2001) The evolution of acetyl-CoA synthase. Origins of Life and Evolution of the Biosphere 31:403–434. https://doi.org/10.1023/a:1011809430237
Madden T (2013) The BLAST sequence analysis tool. In: Madden T, editors. The NCBI Handbook. National Center for Biotechnology Information (US). pp. 1–15.
(2012) Function and regulation of isoforms of carbon monoxide dehydrogenase/acetyl coenzyme A synthase in Methanosarcina acetivorans. Journal of Bacteriology 194:5377–5387. https://doi.org/10.1128/JB.00881-12
(2011) Anaerobic carbon monoxide dehydrogenase diversity in the homoacetogenic hindgut microbial communities of lower termites and the wood roach. PLOS ONE 6:e19316. https://doi.org/10.1371/journal.pone.0019316
(2018) Maturation of the [Ni-4Fe-4S] active site of carbon monoxide dehydrogenases. Journal of Biological Inorganic Chemistry 23:613–620. https://doi.org/10.1007/s00775-018-1541-0
(2020) IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Molecular Biology and Evolution 37:1530–1534. https://doi.org/10.1093/molbev/msaa015
O’Leary NA, Cox E, Holmes JB, Anderson WR, Falk R, Hem V, Tsuchiya MTN, Schuler GD, Zhang X, Torcivia J, Ketter A, Breen L, Cothran J, Bajwa H, Tinne J, Meric PA, Hlavina W, Schneider VA (2024) Exploring and retrieving sequence and metadata for species across the tree of life with NCBI Datasets. Scientific Data 11:732. https://doi.org/10.1038/s41597-024-03571-y
Pedersen TL (2025) patchwork: the composer of plots, version 1.3.2.9000. Patchwork. https://patchwork.data-imaginist.com
(2010) FastTree 2 – approximately maximum-likelihood trees for large alignments. PLOS ONE 5:e9490. https://doi.org/10.1371/journal.pone.0009490
R Development Core Team (2023) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
Sayers E (2022) Entrez Programming Utilities: a general introduction to the E-utilities. National Center for Biotechnology Information (US).
Shen W, Le S, Li Y, Hu F (2016) SeqKit: a cross-platform and ultrafast toolkit for FASTA/Q file manipulation. PLOS ONE 11:e0163962. https://doi.org/10.1371/journal.pone.0163962
Smith KS, Ingram-Smith C (2007) Methanosaeta, the forgotten methanogen? Trends in Microbiology 15:150–155. https://doi.org/10.1016/j.tim.2007.02.002
(2002) Purification and catalytic properties of a CO-oxidizing:H2-evolving enzyme complex from Carboxydothermus hydrogenoformans. European Journal of Biochemistry 269:5712–5721. https://doi.org/10.1046/j.1432-1033.2002.03282.x
Techtmann SM, Lebedinsky AV, Colman AS, Sokolova TG, Woyke T, Goodwin L, Robb FT (2012) Evidence for horizontal gene transfer of anaerobic carbon monoxide dehydrogenases. Frontiers in Microbiology 3:132. https://doi.org/10.3389/fmicb.2012.00132
Terlesky KC, Ferry JG (1988) Ferredoxin requirement for electron transport from the carbon monoxide dehydrogenase complex to a membrane-bound hydrogenase in acetate-grown Methanosarcina thermophila. The Journal of Biological Chemistry 263:4075–4079. https://doi.org/10.1016/S0021-9258(18)68892-1
Waite DW, Chuvochina M, Pelikan C, Parks DH, Yilmaz P, Wagner M, Loy A, Naganuma T, Nakai R, Whitman WB, Hahn MW, Kuever J, Hugenholtz P (2020) Proposal to reclassify the proteobacterial classes Deltaproteobacteria and Oligoflexia, and the phylum Thermodesulfobacteria into four phyla reflecting major functional capabilities. International Journal of Systematic and Evolutionary Microbiology 70:5972–6016. https://doi.org/10.1099/ijsem.0.004213
Wang LG, Lam TTY, Xu S, Dai Z, Zhou L, Feng T, Guo P, Dunn CW, Jones BR, Bradley T, Zhu H, Guan Y, Jiang Y, Yu G (2020) treeio: an R package for phylogenetic tree input and output with richly annotated and associated data. Molecular Biology and Evolution 37:599–603. https://doi.org/10.1093/molbev/msz240
Weghoff MC, Müller V (2016) CO metabolism in the thermophilic acetogen Thermoanaerobacter kivui. Applied and Environmental Microbiology 82:2312–2319. https://doi.org/10.1128/AEM.00122-16
Welte C, Deppenmeier U (2011) Membrane-bound electron transport in Methanosaeta thermophila. Journal of Bacteriology 193:2868–2870. https://doi.org/10.1128/JB.00162-11
Wickham H, Averick M, Bryan J, Chang W, McGowan L, François R, Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen T, Miller E, Bache S, Müller K, Ooms J, Robinson D, Seidel D, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H (2019) Welcome to the tidyverse. Journal of Open Source Software 4:1686. https://doi.org/10.21105/joss.01686
Wilkins D (2023) gggenes: draw gene arrow maps in “ggplot2”, version 0.6.0. CRAN. https://wilkox.org/gggenes/
(2020) The solvent-exposed Fe-S D-cluster contributes to oxygen-resistance in Desulfovibrio vulgaris Ni-Fe carbon monoxide dehydrogenase. ACS Catalysis 10:7328–7335. https://doi.org/10.1021/acscatal.0c00934
Wu M, Ren Q, Durkin AS, Daugherty SC, Brinkac LM, Dodson RJ, Madupu R, Sullivan SA, Kolonay JF, Nelson WC, Tallon LJ, Jones KM, Ulrich LE, Gonzalez JM, Zhulin IB, Robb FT, Eisen JA (2005) Life in hot carbon monoxide: the complete genome sequence of Carboxydothermus hydrogenoformans Z-2901. PLOS Genetics 1:e65. https://doi.org/10.1371/journal.pgen.0010065
Xu S, Dai Z, Guo P, Fu X, Liu S, Zhou L, Tang W, Feng T, Chen M, Zhan L, Wu T, Hu E, Jiang Y, Bo X, Yu G (2021) ggtreeExtra: compact visualization of richly annotated phylogenetic data. Molecular Biology and Evolution 38:4039–4042. https://doi.org/10.1093/molbev/msab166
Yaniv M (2011) The 50th anniversary of the publication of the operon theory in the Journal of Molecular Biology: past, present and future. Journal of Molecular Biology 409:1–6. https://doi.org/10.1016/j.jmb.2011.03.041
Yu G, Smith DK, Zhu H, Guan Y, Lam TTY (2017) ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods in Ecology and Evolution 8:28–36. https://doi.org/10.1111/2041-210X.12628
Yu G, Lam TTY, Zhu H, Guan Y (2018) Two methods for mapping and visualizing associated data on phylogeny using ggtree. Molecular Biology and Evolution 35:3041–3043. https://doi.org/10.1093/molbev/msy194
Zhou A, Chen YI, Zane GM, He Z, Hemme CL, Joachimiak MP, Baumohl JK, He Q, Fields MW, Arkin AP, Wall JD, Hazen TC, Zhou J (2012) Functional characterization of Crp/Fnr-type global transcriptional regulators in Desulfovibrio vulgaris Hildenborough. Applied and Environmental Microbiology 78:1168–1177. https://doi.org/10.1128/AEM.05666-11
Zhu Q, Mai U, Pfeiffer W, Janssen S, Asnicar F, Sanders JG, Belda-Ferre P, Al-Ghalith GA, Kopylova E, McDonald D, Kosciolek T, Yin JB, Huang S, Salam N, Jiao JY, Wu Z, Xu ZZ, Cantrell K, Yang Y, Sayyari E, Rabiee M, Morton JT, Podell S, Knights D, Li WJ, Huttenhower C, Segata N, Smarr L, Mirarab S, Knight R (2019) Phylogenomics of 10,575 genomes reveals evolutionary proximity between domains Bacteria and Archaea. Nature Communications 10:5477. https://doi.org/10.1038/s41467-019-13443-4
Zhu Q (2023) WoL: reference phylogeny for microbes. Accessed April 5, 2019. https://biocore.github.io/wol/

Article and author information

Author details

Maximilian Böhm

Molecular Biomimetics, Department of Chemistry – Ångström Laboratory, Uppsala University, Uppsala, Sweden

Contribution
Conceptualization, Investigation, Methodology, Writing – original draft, Writing – review and editing

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0003-0205-8030
Henrik Land

Molecular Biomimetics, Department of Chemistry – Ångström Laboratory, Uppsala University, Uppsala, Sweden

Contribution
Conceptualization, Supervision, Funding acquisition, Project administration, Writing – review and editing

For correspondence
henrik.land@kemi.uu.se

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0003-3073-5641

Funding

Novo Nordisk Fonden (NNF21OC0066716)

Maximilian Böhm
Henrik Land

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

The Novo Nordisk Foundation (Grant reference number NNF21OC0066716) is gratefully acknowledged for funding.

Version history

Sent for peer review: September 14, 2025
Preprint posted: September 19, 2025
Reviewed Preprint version 1: January 2, 2026
Reviewed Preprint version 2: March 2, 2026
Version of Record published: April 7, 2026

Cite all versions

You can cite all versions using the DOI https://doi.org/10.7554/eLife.108780. This DOI represents all versions, and will always resolve to the latest one.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.