Investigating the native functions of [NiFe]-CODH through genomic context analysis

Maximilian Böhm; Henrik Land

doi:10.7554/eLife.108780.2

eLife Assessment

This valuable work analyzes a large dataset of [NiFe]-CODHs, integrating genomic context, operon organization, and clade-specific gene neighborhoods to discern patterns of diversification and adaptation. A consistent examination of CODH genomic contexts, including CODH-HCP co-occurrence, informs interpretations of enzymatic activity, biotechnological potential, and differential functional roles, in line with current standards in genomic enzymology. With solid support, this work provides a broadly informative contribution to the field.

https://doi.org/10.7554/eLife.108780.2.sa3

Significance of findings

valuable: Findings that have theoretical or practical implications for a subfield

Strength of evidence

solid: Methods, data and analyses broadly support the claims with only minor weaknesses

During the peer-review process the editor and reviewers write an eLife assessment that summarises the significance of the findings reported in the article (on a scale ranging from landmark to useful) and the strength of the evidence (on a scale ranging from exceptional to inadequate).

Abstract

Carbon monoxide dehydrogenases containing nickel-iron active sites ([NiFe]-CODHs) catalyze the reversible oxidation of CO to CO₂, representing key targets for biocatalytic CO₂ reduction. Despite dramatic differences in catalytic rates and O₂ tolerance between CODH variants, the molecular basis for this functional diversity remains poorly understood. We applied comparative genomics and synteny analysis to investigate the biochemical roles of CODH clades A-F using 1,376 CODH and 1,545 hybrid cluster protein sequences. Around 30% of genomes encode multiple CODH isoforms. Analysis revealed distinct gene clustering patterns correlating with biochemical function. Clades A, E, and F exhibit a degree of distributional exclusivity. Clades C and D frequently co-occur with active CODHs, suggesting auxiliary roles. Operon architecture analysis revealed functional specialization: clade A links to acetyl-CoA synthase; clades A, E and F contain essential maturation machinery (CooC, CooJ, CooT) correlating with catalytic activity; clade B associates with transporters; clade C with electron transfer partners; clade D with transcriptional regulators. High CODH-HCP co-occurrence (except clade A) suggests functional or environmental interdependency. These findings establish clades A, E and F as primary biocatalyst targets while defining regulatory functions for clades C and D, providing a genomics framework for predicting CODH phenotypes.

Introduction

Since the mid-1990s, genomic enzymology has helped to elucidate protein (super)families by connecting enzyme sequences to function through comparative genomics and neighborhood analysis (Babbitt et al., 1996; Knox and Allen, 2023). In this study, we employ genome neighborhood and co-occurrence analysis to better understand the reactivity and functionality of the family of nickel-containing carbon monoxide dehydrogenases ([NiFe]-CODHs) and their relationship to hybrid cluster proteins (HCPs).

[NiFe]-CODHs are ancient and diverse enzymes that catalyze the interconversion of carbon monoxide (CO) and carbon dioxide (CO₂), a reaction of high interest for biotechnological applications, including CO₂ capture and conversion. Research on this enzyme spans over 60 years, and recent studies have provided important biochemical insights into properties such as turnover frequency, oxygen tolerance and catalytic mechanism (Basak et al., 2025; Can et al., 2014). These properties vary greatly between enzymes, not only between separate phylogenetic clades but also within clades, making functional prediction from sequence alone challenging. Phylogenetic analyses of available [NiFe]-CODH (hereafter referred to as CODH) sequences with different foci, such as gene transfer (Techtmann et al., 2012), primary structure (Inoue et al., 2018), biome distribution (Inoue et al., 2022) and the human gut microbiome (Katayama et al., 2024), have enriched our understanding of this old and diverse enzyme family. Datasets have grown from an initial 17 sequences (Lindahl and Chang, 2001) to well above 5000 sequences (this study). Up to eight distinct phylogenetic clades (Figure 1) can be distinguished, all of which show sequence variation while preserving the overall fold, as seen with cryo-electron microscopy (Biester et al., 2024) and X-ray crystallography (Basak et al., 2025; Domnik et al., 2017; Gong et al., 2008; Jeoung et al., 2022; Jeoung and Dobbek, 2007; Wittenborn et al., 2020).

Schematic phylogenetic tree of [NiFe]-CODH, with selected CODHs marked with their respective position. Tree was built using IQ-TREE, 1000 ultrafast bootstrap, containing 5508 putative CODH sequences and one outgroup (MBE6442607.1 hydroxylamine reductase [*Desulfovibrio desulfuricans*]) for rooting. Detailed searchable tree with bootstrap values can be found in Supplementary File 9_Tree5.

The biochemical characterization of this enzyme family is still ongoing and has revealed a wide range of turnover frequencies as well as different degrees of O₂ tolerance. Two CODHs from Carboxydothermus hydrogenoformans, both belonging to clade F (Fig. 1), illustrate this: ChCODH-II, a benchmark CODH known for its high CO oxidation activity but low O₂ tolerance, contrasts with ChCODH-IV, which retains 20% of its activity after 1 h of O₂ exposure but displays reduced CO oxidation capacity with an increased activation barrier and a much lower K_M (Domnik et al., 2017). Similarly, a less active CODH from clade E, NvCODH (formerly known as DvCODH) from Nitratidesulfovibrio vulgaris, has been reported to fully reactivate after initial inactivation by O₂ exposure (Hadj-Said et al., 2015). In addition, the two CODHs from Thermococcus sp. AM4, TcCODH-I and TcCODH-II, both belonging to clade E, react more slowly with O₂ than ChCODH-II but have overall equal O₂ sensitivity (Benvenuti et al., 2020).

In addition to the diversity in activity and oxygen tolerance within CODH clades and organisms described above, some CODHs rely on maturases for full activation while others do not. For example, RrCODH, from the phototroph Rhodospirillum rubrum, needs to be expressed together with three maturases (CooC, CooJ and CooT) in order to be isolated in an active form (Kerby et al., 1997). A similar situation arises for NvCODH; however, its genomic neighborhood (Fig. 2) contains only one maturase (CooC), which is required for production of the active enzyme (Hadj-Said et al., 2015). In contrast, ChCODH-II can be heterologously expressed without co-expression of any maturases (Merrouch et al., 2018). ChCODH-I needs to be co-expressed with CooC to reach high activity, but it can also be expressed without it, albeit with reduced activity (Inoue et al., 2014). Interestingly, much of the diversity with regard to activity, O₂ tolerance and maturase dependence occurs not only between the different clades but also within them.

Operons of selected [NiFe]-CODH. * NtCODH, formerly known as MtCODH and before that as CtCODH, due to renaming of the host organism (Gtari and Ventura, 2025). ** NvCODH, formerly known as DvCODH, due to renaming of the host organism (Waite et al., 2020).

Due to the homology between the CODHs in this study and the fact that active CODHs have been demonstrated from several of the clades, it is reasonable to assume that CODHs from all clades are able to interconvert CO₂/CO. However, a recent study by Dobbek and co-workers showed that C. hydrogenoformans CODH-V (ChCODH-V) from clade D was not able to perform this reaction (Jeoung et al., 2022). They showed that this enzyme is more similar to the family of hybrid cluster proteins (HCPs), because its morphing active site, composed of iron, sulphur and oxygen, responds to redox shifts with structural and stoichiometric changes. A connection between HCPs and CODHs has been pointed out previously by Inoue et al. (Inoue et al., 2018) due to their close phylogenetic relationship, and was further discussed by Fujishiro et al. (Fujishiro and Takaoka, 2023). HCPs can be divided into three phylogenetic classes, of which class III exhibits a homodimeric structure like CODH. Generally, the two enzyme families share a similar overall fold while their active sites differ greatly, both in terms of amino acid and metallocofactor composition. Similar to ChCODH-V, HCPs do not catalyze CO₂/CO interconversion, but they do display a range of activities at low rates, such as hydroxylamine reductase, peroxidase, nitric oxide reductase and S-nitrosylase activity. The main natural function of HCPs is debated, but it was recently concluded that they most likely act as nitric oxide reductases involved in nitric oxide detoxification (Hagen, 2022).

In this study, we contribute to a holistic phylogenetic picture of CODHs by analyzing their genetic environment and by harnessing the concept of synteny, using a semi-quantitative approach to predict clade- and subclade-specific characteristics of CODHs. We present clade-specific trends in the operon composition of CODHs. We limit our search to operon composition because the majority of our data are prokaryotic genomes, in which functionally related genes are physically linked in operons (Yaniv, 2011). We furthermore wanted to limit noise and false positives as much as possible and therefore excluded data outside our defined parameters (see Methods). Additionally, since many of the genomes included in this analysis are not completely sequenced or include multiple CODHs from different clades, including only genes in close proximity to the target gene ensures that the data are not distorted. Since many organisms are known to encode multiple CODH isoforms in their genome, we analyze the co-occurrence of CODHs of different clades in an organism, as well as the co-occurrence of CODH and HCP. With our findings we propose a systematic approach to the analysis of new CODHs, with a focus on identifying promising CO₂ reduction catalysts suitable for biotechnological application.

Results

Co-occurrence and Correlation

After evaluating the assemblies with regard to their CODH count, we found that around 30% of all assemblies encode more than one CODH. For HCP this number is much smaller, around 6%. Fig. 3A shows that the occurrence of multiple isoforms from specific clades within organisms varies. Clades B, C and D almost exclusively occur only once within a genome, while clades A, E and F are more likely to co-occur with another isoform from the same clade. The overall trend that we observe most likely underrepresents the number of genomes encoding multiple CODHs, since incomplete genomes are also included in these analyses. When calculating the correlation of the co-occurrence of CODHs from two different clades in one organism, a pattern emerges (Fig. 3B). Most obvious is the low co-occurrence of clades A and E with F. Conversely, clade F has a higher co-occurrence with clade E due to the asymmetrical nature of the data, which is an effect of the different sizes of the clade datasets. Furthermore, all clades seem to have low co-occurrence with clade B. Due to the effect stated above, clade B does, however, show some co-occurrence with clades D and E. Clade C shows strong co-occurrence with clade E, while clade D co-occurs with clades A, E and F. Clades C and D also have a higher probability of co-occurring with each other. As outlined in the introduction, biochemical studies have shown that CODHs from clades A, E and F are active, whereas CO₂/CO interconversion activity is missing in CODH from clade D. This co-occurrence might suggest that the redox-sensing properties of CODH from clade D (and potentially clade C) are useful for organisms already containing a functional CODH. Interestingly, a high co-occurrence was also seen for CODH and HCP, with the exception of clade A CODH.
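The two quantities discussed in this paragraph, the share of assemblies with more than one CODH and the asymmetric clade co-occurrence, can be illustrated with a minimal Python sketch. The toy input and the function names are hypothetical, and the conditional-probability definition used here (assemblies containing both clades divided by assemblies containing the first clade) is an assumption chosen because it reproduces the described asymmetry; the equation actually used is given in the Methods section.

```python
from collections import Counter
from itertools import permutations

# Hypothetical toy input: assembly ID -> clade of each CODH found in it.
assemblies = {
    "asm1": ["A", "A", "D"],
    "asm2": ["E"],
    "asm3": ["E", "C"],
    "asm4": ["F", "F", "D"],
    "asm5": ["E", "C", "D"],
}

def fraction_multi(assemblies):
    """Fraction of assemblies encoding more than one CODH."""
    multi = sum(1 for clades in assemblies.values() if len(clades) > 1)
    return multi / len(assemblies)

def cooccurrence(assemblies):
    """Return {(x, y): P(y present | x present)} over all assemblies.

    Assumed definition, not necessarily the paper's equation: number of
    assemblies containing both clades divided by the number containing x.
    """
    has = Counter()   # number of assemblies containing clade x
    both = Counter()  # number of assemblies containing clades x and y
    for clades in assemblies.values():
        present = set(clades)  # presence/absence, isoform counts ignored
        has.update(present)
        both.update(permutations(present, 2))  # ordered pairs (x, y), x != y
    return {pair: n / has[pair[0]] for pair, n in both.items()}

probs = cooccurrence(assemblies)
print(fraction_multi(assemblies))
print(probs[("C", "E")], probs[("E", "C")])
```

In this toy input every assembly with a clade C CODH also has a clade E CODH, but not the other way around, so the two directions of the same clade pair give different values. This is the kind of asymmetry that arises when clade datasets differ in size.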

(A) Frequency of X amounts of [NiFe]-CODH from the same clade co-occurring in the same organism. (B) Probability matrix of [NiFe]-CODH from different clades co-occurring in the same organism. The equation for probability can be found in the Methods section. A long table format of the matrix can be found in Supplementary File 2_TableS1. Additional raw data can also be found in Supplementary File 1_Table S1.

Neighbor Analysis

We semi-quantitatively evaluated the operon composition of 1351 CODHs (121 A, 130 B, 168 C, 253 D, 434 E, 245 F, Supplementary File 5_Tree1 and Supplementary File 6_Tree2), considering neighboring proteins whose function we could predict using eggNOG (Cantalapiedra et al., 2021; Huerta-Cepas et al., 2019), the NCBI product prediction and manual curation. The results are summarized in Fig. 4. Even though eight clades of CODH are known, we only present six in this analysis, since our quality measures excluded clade G and clade H CODHs from the initial dataset because of poor assembly quality or lack of host information. In the following we only report on neighbors that are encoded in the same operon as more than 10% of CODHs per clade (see Supplementary file 2_Table S1). Starting with CODHs from clade A, 93% contain a one carbon pool-related gene in their operon, followed by CooC (62%) and iron-sulfur (FeS) cluster-containing proteins (31%). We defined one carbon pool-related genes as genes associated either with direct conversion of one-carbon compounds, such as formate dehydrogenases, or with the Wood-Ljungdahl pathway. Clade B CODH operons mainly encode ABC transporter-associated genes (64%). Furthermore, almost a quarter (24%) of all CODHs from clade B could not be associated with any neighbor, and 12% are encoded close to transcriptional regulators. For CODHs from clade C, the three main neighbor types are FeS cluster-containing proteins (such as CooF) (72%), NAD(P)- or FAD-dependent oxidoreductases (71%) and proteins involved in transcriptional (58%) or other (10%) regulation. The overall diversity of neighboring proteins in clade D, and the fact that a major part of those CODHs are seemingly not encoded close to any other genes (64%), made it challenging to summarize their operon compositions, and no clear pattern could be observed. Only transcription regulation proteins (9.9%) and general regulatory proteins (9.5%) are worth mentioning in this context.
Clades E and F both have a larger set of proteins frequently observed in their associated operons. Operons encoding either clade E or F CODHs contain CooC-like genes (59% E, 68% F), one carbon pool-associated genes (49% E, 37% F) and FeS genes (29% E, 53% F). Transcription regulators (17% E, 35% F) and NAD(P)/FAD-dependent oxidoreductases (22% E, 42% F) have also been found. The maturation protein CooT was exclusively found in operons from clades E (16%) and F (6.1%). The same holds true for CooJ, although in clade F it was seen in even fewer operons (12% E, <5.0% F). Additionally, hydrogenases (25%) and their maturation machinery (17%) are encoded primarily in clade F CODH operons, as are different types of transporter proteins (11%).
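The percentages reported above are per-clade presence/absence proportions filtered at a 10% cut-off. A minimal Python sketch of that bookkeeping follows; the toy operon list, the category labels and the function names are hypothetical and only illustrate the calculation, not the actual pipeline.

```python
from collections import defaultdict

CUTOFF = 0.10  # reporting threshold stated in the text (10% of CODHs per clade)

# Hypothetical toy input: (clade, set of neighbor categories in the operon).
operons = [
    ("A", {"one carbon pool", "CooC"}),
    ("A", {"one carbon pool"}),
    ("B", {"ABC transporter"}),
    ("B", set()),  # CODH without any annotated neighbor
    ("B", {"ABC transporter", "regulation"}),
]

def neighbor_proportions(operons):
    """Return {clade: {category: share of that clade's operons containing it}}."""
    totals = defaultdict(int)
    hits = defaultdict(lambda: defaultdict(int))
    for clade, categories in operons:
        totals[clade] += 1
        for category in categories:  # presence/absence: counted once per operon
            hits[clade][category] += 1
    return {
        clade: {cat: n / totals[clade] for cat, n in hits[clade].items()}
        for clade in totals
    }

def reportable(proportions, cutoff=CUTOFF):
    """Keep only categories found in more than `cutoff` of a clade's operons."""
    return {
        clade: {cat: p for cat, p in cats.items() if p > cutoff}
        for clade, cats in proportions.items()
    }

props = neighbor_proportions(operons)
print(reportable(props))
```

Because each operon is scored for presence or absence of a category, an operon with several genes of the same type still counts once, and the proportions for one clade do not need to sum to 100%.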

(A) Phylogenetic tree of putative [NiFe]-CODH unrooted with 1376 sequences. The coloured rings mark if the operon from a protein/leaf contains one or more of a certain type of protein using the same coloring as in Fig. 4C. The types of proteins are from centre to outer ring: one carbon pool, CooC, CooT, CooJ, Ferredoxin, FeS, hydrogenase, hydrogenase-maturase, NAD(P)/FAD-dependent oxidoreductase, regulation, ABC transporter, transporter. Detailed searchable tree with bootstrap values can be found in Supplementary File 5_Tree1 and Supplementary File 6_Tree2. (B) Distribution of operon size for CODH and HCP genes. (C) Proportion of [NiFe]-CODH from one clade being coded near a certain type of protein. The dotted line marks a proportion of 10%, which we set as our significance cut-off. Raw data can be found in Supplementary File 1_Table S1.

We performed a similar analysis of the operons encoding HCP, analyzing a total of 1476 HCP genes (class I: 1049, class II: 23, class III: 404, Supplementary File 7_Tree3 and Supplementary File 8_Tree4), which showed a low frequency of isoforms within organisms (Supplementary file 1_Fig. S1). HCPs exhibited a large variety of neighbors, making it difficult to extract meaningful information from their operon composition (Supplementary file 1_Fig. S2). Furthermore, classes I and III had a high proportion of entries without any neighbors (49% and 75%, respectively), which is reflected in their tendency to have fewer proteins encoded in their operons (see Fig. 4B). However, we observed a high frequency of FeS cluster proteins (18%) and transcription regulators (17%) for class I, as well as NAD(P) or FAD oxidoreductases (96%) and transport proteins (78%) for class II HCP. It needs to be noted, however, that our sample set for class II HCP is very small, so its information value is considerably lower than that of the other classes/clades.

Discussion

Our analysis reveals substantial diversity in the occurrence, co-occurrence, and genomic context of CODH and HCP genes, suggesting complex evolutionary and functional relationships within and across microbial lineages. The observed differences in the frequency of multiple isoforms per genome (∼30% for CODH versus ∼6% for HCP) indicate that CODHs are more often retained in multiple copies, potentially pointing to functional diversification among their isoforms. Similar values have been reported by Techtmann et al., who found that a striking 43% of organisms encode more than one CODH (Techtmann et al., 2012). On the other hand, Katayama et al., investigating only the human gut microbiome, found a number as low as 5.5% (Katayama et al., 2024). We suspect that this number underrepresents the number of organisms carrying multiple isoforms of CODH in the human gut, since Katayama and co-workers performed data refinements that exclude potential CODHs (such as the strict requirement for a [4Fe-4S] D-cluster, even though Inoue et al. reported on the diversity of the D-cluster earlier (Inoue et al., 2018)).

There are many examples of organisms encoding multiple CODH isoforms, as outlined in Fig. 3 (Supplementary File 10_Tree6 and Supplementary File 11_Tree7). Many of them have been known in the literature for a long time (even though their CODH abundance has only been discussed sporadically), with the most famous example being C. hydrogenoformans, which encodes five different CODHs (Wu et al., 2005). Another interesting example is Clostridium formicoaceticum, since this organism has a total of six CODH isoforms encoded in its genome. It needs to be noted that in our analysis this organism did not show up as an organism with six CODHs, see Fig. 3A. This is because our analysis only counts CODHs as stemming from the same organism when their genes are associated with the same genome assembly. We therefore tend to underestimate the number of organisms with multiple CODHs, such as the aforementioned one. The only CODH from C. formicoaceticum that has so far been isolated, characterized and discussed is one of its clade E CODHs, which is associated with acetyl-CoA synthase (ACS) (Bao et al., 2019; Diekert and Thauer, 1978). Other studies in the literature have investigated the influence of CODH isoforms on metabolism. Archaeoglobus fulgidus contains three CODH genes, two from clade A and one from clade D. Its clade D CODH seems to have no role in the CO metabolism of this organism (Hocking et al., 2015). Methanosarcina acetivorans has also been reported to harbor three CODHs, all of them belonging to clade A; only two of them are associated with ACS and are believed to be involved in CO metabolism, while the third is a lone gene that is seemingly not involved in carbon metabolism (Matschiavelli et al., 2012). Another interesting example is Thermoanaerobacter kivui, formerly known as Acetogenium kivui (Collins et al., 1994).
When TkCODH-I (clade C) is deleted from the organism, the strain loses its ability to grow on CO; however, if grown on H₂+CO₂, the overall acetate production is greatly increased (Jain et al., 2021). Similar effects have been shown for Clostridium autoethanogenum, which contains three isoforms of CODH. If its clade C CODH is deleted, the organism’s lag phase is reduced and its growth rate is greatly increased (Liew et al., 2016). The other two CODH isoforms from this organism are from clades E and D. Deletion of the clade D CODH showed no immediate effect on the organism, except for a moderately lower overall biomass yield (Liew et al., 2016). In our analysis we saw an increased frequency of co-occurrence of clades C and D with A, E, or F, which, together with the biological data, may indicate a complementary role, possibly linked to redox sensing or regulatory functions. This is especially evident in the examples of clade C CODHs from T. kivui and C. autoethanogenum. Clade D, which lacks catalytic activity towards CO/CO₂ interconversion (as reported by Jeoung et al. (Jeoung et al., 2022) through their recombinant production of ChCODH-V), has a so far unknown influence on metabolism that might only manifest in harsher environments, since it is believed to be involved in stress response (Jeoung et al., 2022) (similar to HCPs (Hagen, 2022), see below). However, experimental proof for this claim is still missing.

Furthermore, clades A, E, and F rarely co-occur. Interestingly, however, many organisms contain multiple copies of CODHs from one of these clades, such as M. acetivorans (clade A), C. hydrogenoformans (clade F), and C. formicoaceticum (clade E). We suspect an evolutionary reason behind this, as has also been outlined by others (Adama et al., 2018; Lindahl and Chang, 2001). Biochemical data indicate that CODHs from these clades possess CO/CO₂ interconversion capability, as previously mentioned. This is also in line with the genetic context of these CODHs, which is most often tuned for this CO/CO₂ interconversion chemistry (Fig. 4C), see below.

The high rate of co-occurrence between CODH and HCP genes (except for clade A) suggests functional integration, a shared metabolic niche, or involvement in a coordinated response to redox stress, given that HCPs are thought to regulate nitric oxide or oxidative stress (Hagen, 2022). The lack of co-occurrence for clade A CODHs might point to distinct metabolic roles or evolutionary constraints, or might result from the high proportion of archaeal genes among clade A CODHs, even though HCP genes are known to also be found in archaea (Hagen, 2022). Interestingly, the co-occurrence between clade D CODH and HCP seems to be the highest in our dataset; the reason for this is unclear. As for clade D CODH, empirical proof that HCP influences the activity or expression of CODH is missing.

The genomic context analysis adds another layer of functional inference, as has been done before by others with different foci (Inoue et al., 2018; Katayama et al., 2024; Matson et al., 2011; Techtmann et al., 2012). We could show again that operons containing clade A CODHs are highly conserved with respect to one carbon pool-related genes and CooC, as they are almost exclusively found as part of the Wood-Ljungdahl pathway, with a typical arrangement similar to that of the Methanosarcina barkeri CODH operon (MbCODH, Fig. 2, Supplementary File 6_Tree2). Recently, another representative from this group from M. thermophila (MetCODH) has been resolved (Biester et al., 2024).

In contrast, clade B CODHs appear largely alone or associated with transport-related genes, raising the possibility of a non-canonical or even degenerated function. Their operon composition is also rather consistent and the arrangement varies only to a small extent, as ABC transporters are encoded either upstream (as for Ruminococcus flavefaciens’ CODH, RfCODH, see Fig. 2) or downstream of the CODH gene. Almost none of the operons analyzed contain any maturases, except for a small cluster that branches off rather early in the tree (Supplementary File 6_Tree2). This might indicate that the need for a maturase was lost due to re-purposing of the CODH. Biochemical characterization of a clade B CODH is still awaited.

Clade C CODHs are associated with FeS cluster proteins, regulators and redox enzymes, pointing towards more regulatory or redox-modulatory roles, as is also indicated by knock-out studies (Liew et al., 2016). The only isolated example from this clade is TkCODH-I. Its operon composition is only partially representative of clade C CODHs, containing only one other gene, which encodes an FeS protein (Fig. 2). Furthermore, TkCODH-I’s sequence branches off early and seems to be rather distinct (Supplementary File 9_Tree5), having only one other close relative, from Aceticella autotrophica (Frolov et al., 2023). In addition, the isolated CODH reported by Jain and co-workers (Jain et al., 2021) stems from a CO-adapted strain (Weghoff and Müller, 2016), which might harbor mutations in the protein sequence that are not accessible to us at the moment. Taken together, we conclude that TkCODH-I might currently not be an optimal representative for clade C CODHs and that more clade C CODHs should be isolated to help us better understand their biochemical properties.

The high operonic variability and frequency of solitary coding regions in clade D might reflect either evolutionary drift or multifunctionality not restricted to operonic structure. Clade D will therefore not be discussed further.

Operons from clades E and F are more functionally complex, including components from the Wood-Ljungdahl pathway, hydrogenases, and additional redox partners, consistent with a diverse metabolic role. That being said, their operon compositions and arrangements showed some interesting clustering. Clade F has the highest proportion of CODHs that might be associated with hydrogenases, including the aforementioned ChCODH-I and RrCODH, which both interact with hydrogenases to produce hydrogen in vivo (Fox et al., 1996; Soboh et al., 2002) but have greatly differing operon compositions (Fig. 2). ChCODH-I-like operons (see Supplementary File 6_Tree2) contain their hydrogenase modules directly within the CODH operon, whereas RrCODH-like operons do not include the hydrogenase module, which is encoded upstream of the CODH gene with an intergenic space of > 400 bp (Fox et al., 1996). RrCODH-like operons are the only clade F operons that include two additional maturation enzymes, CooT and CooJ. Clade F also contains many ACS-associated CODHs, such as the CODH from Neomoorella thermoacetica (NtCODH), an organism formerly known as Moorella thermoacetica (Gtari and Ventura, 2025). Those NtCODH-like operons all have the same arrangement, which is distinct from that of clade A and E ACS-associated CODH operons.

In our dataset most of the hydrogenases and their maturation genes are associated with clade F, suggesting active hydrogen metabolism that couples CO oxidation to H₂ production or consumption, as has been suggested earlier for a wider range of CODH clades (Inoue et al., 2018; Techtmann et al., 2012). We believe that our data grossly underestimate this relationship overall, since operon examples such as RrCODH and TcCODH-II show that hydrogenases associated with a CODH are not necessarily encoded in the same operon. There has also been a report of a clade A CODH from the methanogen M. thermophila (Terlesky and Ferry, 1988) being associated with a hydrogenase. However, later studies showed that the genome of M. thermophila does not contain a hydrogenase (Smith and Ingram-Smith, 2007), and an investigation of the electron transport chain of its membrane could not find a hydrogen-oxidizing complex (Welte and Deppenmeier, 2011). Together with our analysis, we conclude that hydrogenase association is a trait almost exclusive to clade E and F CODHs. ChCODH-II’s operon seems to be rather uniquely constructed, as a similar operon composition can only be found in other Carboxydothermus species. A similar situation can be seen for ChCODH-IV, whose operon, containing FeS and NAD/FAD-dependent oxidoreductases, is more similar to those of some clade E CODHs.

Regarding the biggest clade, clade E, its diversity is striking. NvCODH, formerly known as DvCODH (Waite et al., 2020), from the organism Nitratidesulfovibrio vulgaris, has a very small operon with only two genes in close proximity, a transcriptional regulator (Zhou et al., 2012) and a maturation enzyme (CooC), see Fig. 2. This is seen for a large number of both clade E and F CODHs. Neither CooJ nor CooT occurs prominently; both appear only in two very distinct parts of clade E, in operons that are all associated with one carbon pool metabolism, with one exception from Clostridium pasteurianum BC1. This operon resembles the clade F RrCODH-like operon. The previously introduced archaeal CODHs, TcCODH-I and TcCODH-II from clade E, are contained in operons (Fig. 2) that are rather specific and only found in a few other Thermococcus or Pyrococcus species (Benvenuti et al., 2020; Kim et al., 2015). It needs to be noted that TcCODH-I’s CooC gene is encoded outside of its operon and on the opposite strand. The CooC gene is therefore not included in our analysis. We are aware of such examples only from this type of operon and do not expect this to be a common trait of the CODH maturation machinery. However, the CooC proportion might be slightly underestimated as a result. Interestingly, TcCODH-II-like CODHs all contain a CooT-like protein in their operon, forming the only cluster of CODHs that contain only CooT-like proteins without CooJ. Within clade E, another unique genomic neighborhood, that of TkCODH-II, must be pointed out. From experimental data it is known that this CODH is associated with ACS (Jain et al., 2021); however, in our analysis we did not see this ACS complex in TkCODH-II’s operon. This is because the ACS subunit is encoded further downstream of the CODH gene and was not taken into account due to our initial parameters, which showcases the limits of this study.
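The exclusions described above (a maturase on the opposite strand, or a partner gene encoded too far from the CODH gene) follow from how an operon is delimited. The following minimal Python sketch shows one common way to do this, by extending from the target gene while neighbors lie on the same strand and within a maximum intergenic distance. The `Gene` type, the example coordinates and the `max_gap` default of 300 bp are hypothetical; the parameters actually used in this study are defined in the Methods.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # start coordinate on the contig
    end: int     # end coordinate (start < end)
    strand: str  # "+" or "-"

def operon_around(genes, target, max_gap=300):
    """Collect genes contiguous with `target` on the same strand.

    `max_gap` (bp) is a hypothetical threshold, not the value used in the study.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    i = next(k for k, g in enumerate(ordered) if g.name == target)
    strand = ordered[i].strand
    lo = hi = i
    # Extend towards lower coordinates while strand and gap criteria hold.
    while (lo > 0 and ordered[lo - 1].strand == strand
           and ordered[lo].start - ordered[lo - 1].end <= max_gap):
        lo -= 1
    # Extend towards higher coordinates in the same way.
    while (hi < len(ordered) - 1 and ordered[hi + 1].strand == strand
           and ordered[hi + 1].start - ordered[hi].end <= max_gap):
        hi += 1
    return [g.name for g in ordered[lo:hi + 1]]

# Hypothetical neighborhood: a distant hydrogenase and an opposite-strand CooC.
genes = [
    Gene("hydrogenase", 100, 1500, "+"),
    Gene("cooF", 2000, 2600, "+"),  # 500 bp away from the hydrogenase
    Gene("codh", 2650, 4550, "+"),
    Gene("cooC_opposite", 4600, 5400, "-"),
]
print(operon_around(genes, "codh"))
```

With these toy coordinates the hydrogenase is dropped because of the large intergenic gap and the CooC gene because of its strand, mirroring the RrCODH and TcCODH-I cases. Raising `max_gap` would recover the hydrogenase at the cost of more false positives.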

For HCPs, the high variability and low operon density, especially in classes I and III, point towards more modular or conditionally expressed roles, similar to clade D CODH. The clear patterning in class II operons, though based on a limited sample, may reflect specialized functions, perhaps in niche-specific oxidoreductase activities.

Conclusion

As previously mentioned, the aim of this study is to identify which CODH clades harbor the most promising enzymes for future application in CO₂ reduction. The operon composition of CODHs differs distinctly between clades, and from this information we conclude that clades A, E and F are the most likely to harbor CODHs able to efficiently convert CO₂ to CO. These clades are therefore the most interesting for CO₂-reducing biotechnological applications, or as inspiration for new synthetic catalysts. The literature has also shown that the activity of many CODHs depends on co-expression with maturation proteins such as CooC. In some cases, CooJ and CooT are also required for full activation. Although some CODHs (most notably CODH-II from C. hydrogenoformans) can function independently of maturases, our neighborhood analysis indicates that maturase-coding genes are predominantly found in operons from clades A, E, and F. This pattern again implies that these clades may represent more biochemically active or catalytically optimized CODHs, making them promising targets for future functional studies and biotechnological applications.

The function of clade B could not be deduced from its genomic environment, but it appears to have a distinct, stand-alone function that is not shared with any other CODHs. Its low co-occurrence with other CODH clades within organisms also supports this unique role for clade B. Clades C and D are more likely to show low or even no activity towards CO₂/CO interconversion, as deduced from the literature and the lack of C1 metabolism-related genes in their operons. Jain et al. recently showed low CO oxidation activity in a clade C CODH from T. kivui, but this enzyme originated from a strain that had acquired the ability to grow on CO through laboratory evolution (Jain et al., 2021). The sequence data used to classify the CODH into clade C were from the original strain (incapable of growing on CO), and data on the engineered strain are not available. It is therefore not known whether the active CODH is the wild-type or an engineered enzyme, and we cannot draw any conclusions regarding the activity of clade C CODHs. Taken together, this makes clades B, C and D less promising in the hunt for CO₂ reduction catalysts. However, much is still unknown about these enzymes, such as their cellular function.

Future work should focus on experimentally validating the functional differences among CODH isoforms, especially in organisms that contain members of multiple clades. Caution is warranted when extrapolating enzymatic activity or inactivity from a limited number of characterized examples to entire clades. Additionally, transcriptomic and proteomic studies could illuminate condition-dependent expression patterns and confirm proposed regulatory functions. Future bioinformatic work should look at the co-occurrence of CODH with other proteins outside their operons, once more sequence data is available. Finally, deeper phylogenomic analyses may reveal the evolutionary drivers behind the observed distribution and diversification of these ancient redox enzymes.

Methods

Data collection and refinement

Multiple BLASTp (Madden, 2013) searches (BLOSUM62, E < 0.05, blastp 2.16.1+, NCBI online web server, nr database accessed 2025-01-16) were carried out using NCBI accession numbers provided by Inoue et al. (Inoue et al., 2018) (A-1, WP_011305243; A-2, WP_010878596; A-3, OGW06734; A-4, OIP92259; A-5, ODS42986; A-6, OIP30420; B-1, WP_026514536; B-2, WP_015485077; B-3, WP_012645460; B-4, WP_011393470; C-1, WP_039226206; C-2, WP_013237576; C-3, WP_010870233; C-4, WP_044921150; D-1, WP_011342982; D-2, WP_015926279; D-3, WP_079933214; D-4, WP_096205957; E-1, WP_012571978; E-2, WP_010939375; E-3, WP_088535808; F-1, WP_011343033; F-2, WP_011389181; G-1, OGP75751) and Techtmann et al. (Techtmann et al., 2012) (mini CooS, WP_007288589.1). Accession numbers of CODHs from clade H (Inoue et al., 2022) were initially omitted due to limited host information. Duplicates were removed using seqkit’s (Shen et al., 2016) rmdup v2.9.0. Sequences shorter than 400 amino acids were removed. To further reduce the data size, clustering was performed using cd-hit v4.8.1 (Li et al., 2002, 2001; Li and Godzik, 2006) at a global sequence identity of 99% or 90%, the latter used only for tree generation. A high identity threshold was necessary for clustering within organisms, since some organisms are known to carry multiple CODHs with strikingly similar sequences in their genome, such as Clostridium pasteurianum BC1 (taxid: 86416), which contains WP_015614757.1 and WP_015615315.1 with 93.27% similarity. For the dataset used in the neighbor analysis, taxonomic information for each sequence was retrieved using the R packages taxize v0.10.0 (Chamberlain et al., 2020; Chamberlain and Szocs, 2013) and taxizedb v0.3.2 (Chamberlain et al., 2025), and only sequences that could be related to a recorded organism were kept (Supplementary File 3_Table S3 and Supplementary File 4_Table S4).
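
The deduplication and length-filtering steps described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline (which used seqkit and cd-hit); the function names are hypothetical, and treating two records as duplicates when their sequences are identical is an assumption, since the text does not state which seqkit rmdup mode was used. Only the 400 amino acid threshold is taken from the text.

```python
from typing import Dict, Iterable, Iterator, Tuple

MIN_LENGTH = 400  # amino acids; threshold stated in the Methods text


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_sequences(
    records: Iterable[Tuple[str, str]], min_length: int = MIN_LENGTH
) -> Dict[str, str]:
    """Drop short sequences and exact sequence duplicates (assumed criterion).

    The first record carrying a given sequence is kept; later copies are skipped.
    """
    kept: Dict[str, str] = {}
    seen = set()
    for header, seq in records:
        if len(seq) < min_length:
            continue  # shorter than the length threshold
        if seq in seen:
            continue  # identical sequence already kept
        seen.add(seq)
        kept[header] = seq
    return kept
```

Clustering at 99% or 90% identity is a separate step and is deliberately left to cd-hit here; an exact-match filter such as this one cannot reproduce identity-based clustering.
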
Sequences were aligned using E-INS-I from mafft v7.526 (Katoh and Standley, 2013), and sequences that had gaps in important positions related to the D-, B- or C-cluster or to acid-base active site residues were removed. The alignment was trimmed using the automated1 option of trimAl v1.4.rev22 (Capella-Gutiérrez et al., 2009), and a tree was generated using FastTree v2.2 (Price et al., 2010). Further sequences that were not CODH sequences were removed by visual inspection. Leaves from unusually long branches were examined manually. If a sequence from such a branch was annotated as a protein other than hydroxylamine reductase or carbon monoxide dehydrogenase (CODH), it was investigated further. We assessed whether the protein length exceeded 400 amino acids and whether the key clusters were present. If these criteria still yielded ambiguous results, the protein structure was predicted using AlphaFold 3 (Abramson et al., 2024) to determine whether it adopted the characteristic CODH fold. Sequences lacking this fold were discarded. The final list of CODH sequences used in the neighbor and correlation analysis comprised 1,376 sequences. A similar approach was used for HCP (class I, Q01770.2; class II, WP_000458809.1; class III, WP_013294878.1), yielding a final count of 1,545 sequences for neighbor and correlation analysis. We applied the same procedure described for the previous dataset to a second set of CODH genes curated at 90% cd-hit identity, yielding 5,508 sequences. See Fig. S3 for a detailed flowchart. Custom code is available on GitHub (https://github.com/boehmax) (Böhm, 2025a, 2025b, 2025c, 2025d).
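
The removal of sequences with gaps at key alignment positions can be sketched as follows. This is an illustrative Python example and not the authors' filter-gaps code; the function names and the column indices in the usage note are hypothetical, and in practice the columns would be the alignment positions of the cluster-ligating and acid-base residues.

```python
from typing import Dict, Iterable


def has_gap_at(aligned_seq: str, columns: Iterable[int], gap_chars: str = "-.") -> bool:
    """Return True if the aligned sequence has a gap in any of the given columns."""
    return any(aligned_seq[col] in gap_chars for col in columns)


def filter_alignment(alignment: Dict[str, str], key_columns: Iterable[int]) -> Dict[str, str]:
    """Keep only sequences without gaps at the key (0-based) alignment columns."""
    key_columns = list(key_columns)
    return {
        name: seq
        for name, seq in alignment.items()
        if not has_gap_at(seq, key_columns)
    }
```

For a toy alignment `{"s1": "MC-HC", "s2": "MCAHC"}` with hypothetical key columns `[1, 2, 4]`, only `s2` is retained, because `s1` carries a gap in column 2.
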

Neighbor analysis

Genome information was downloaded for the accession lists generated for CODH and HCP. In total, 955 and 1425 genomes, respectively, were downloaded from NCBI’s genome database using NCBI-datasets command line tools v18.5.0 (O’Leary et al., 2024). Neighboring genes were defined as those located within a maximum of 15 genes upstream or downstream of the target gene, with an intergenic distance not exceeding 300 base pairs (bp), as was done previously by Inoue et al. (Inoue et al., 2018). We decided to use this relatively large intergenic distance to include as many neighbors as possible, and we expect that unrelated genes will disappear in the noise. For the same reason, we allowed an overlap of up to 50 bp between genes in the same operon, which is rather high, as genes in E. coli, for example, usually overlap by 1–4 bp (Johnson and Chisholm, 2004). Amino acid sequences for those genes were retrieved from the NCBI nr protein database using Entrez v23.5 (Sayers, 2022). Their function was predicted using eggNOG v5.0 (Huerta-Cepas et al., 2019) and eggNOG-mapper v2.1.12 (Cantalapiedra et al., 2021). We considered results from eggNOG, as well as product predictions from NCBI, when manually assigning selected functional groups. The data were plotted using R v4.4.3 (R Core Team, 2023), tidyverse v2.0.0 (Wickham et al., 2019), patchwork v1.3.1 (Pedersen, 2025), ggnewscale v0.5.2 (Elio Campitelli et al., 2025), ggtree v3.14.0 (Yu et al., 2018, 2017), ggtreeExtra v1.16.0 (Xu et al., 2021), treeio v1.30.0 (Wang et al., 2020) and gggenes v0.5.1 (Wilkins, 2023). Since CooJ could be identified neither from the NCBI prediction nor via eggNOG, we selected operons from clades E and F that contained CooS and CooT, and manually extracted accession numbers of potential CooJs, which were used to search for further accession numbers using PSI-BLAST (BLOSUM45, E < 0.001, NCBI online web server, nr database accessed 2025-04-16).
The summary can be found in Supplementary File 1_Table S1. These accessions were used to help annotate potential CooJs in our analysis. We could identify 68 potential CooJ genes.
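
The neighbor definition above can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not the authors' protein-neighbours code: the `Gene` class and function names are hypothetical, the 300 bp distance and 50 bp overlap are applied between consecutive genes, and the walk stops at the first gene that violates a limit. Requiring neighbors to lie on the same strand as the adjacent gene is also an assumption (suggested by the TcCODH-I CooC case in the Discussion) and can be switched off.

```python
from dataclasses import dataclass
from typing import List

MAX_GENES = 15        # genes considered on each side of the target
MAX_GAP_BP = 300      # maximum intergenic distance between consecutive genes
MAX_OVERLAP_BP = 50   # maximum tolerated overlap between consecutive genes


@dataclass(frozen=True)
class Gene:
    gene_id: str
    start: int   # 1-based, inclusive
    end: int     # inclusive
    strand: str  # "+" or "-"


def _linked(left: Gene, right: Gene, same_strand: bool) -> bool:
    """Check whether two consecutive genes satisfy the distance/overlap limits."""
    gap = right.start - left.end - 1  # negative values mean the genes overlap
    if gap > MAX_GAP_BP or -gap > MAX_OVERLAP_BP:
        return False
    return left.strand == right.strand or not same_strand


def find_neighbors(genes: List[Gene], target_id: str, same_strand: bool = True) -> List[Gene]:
    """Collect up to MAX_GENES linked genes on each side of the target gene."""
    ordered = sorted(genes, key=lambda g: g.start)
    idx = next(i for i, g in enumerate(ordered) if g.gene_id == target_id)

    upstream: List[Gene] = []
    i = idx
    while i > 0 and len(upstream) < MAX_GENES and _linked(ordered[i - 1], ordered[i], same_strand):
        upstream.append(ordered[i - 1])
        i -= 1

    downstream: List[Gene] = []
    i = idx
    while i < len(ordered) - 1 and len(downstream) < MAX_GENES and _linked(ordered[i], ordered[i + 1], same_strand):
        downstream.append(ordered[i + 1])
        i += 1

    # Return neighbors in genomic order, excluding the target itself.
    return upstream[::-1] + downstream
```

With these settings a gene that lies 499 bp downstream of the last accepted neighbor ends the walk, which mirrors how an ACS subunit encoded further downstream of a CODH gene can fall outside the detected neighborhood.
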

Correlation analysis

Correlation coefficients of CODH and HCP from different clades/classes were calculated according to the formula

P(X|Y) = N_XY / N_Y

where N_Y is the total number of assemblies containing a protein from clade/class Y, N_XY is the total number of assemblies containing proteins from both clade/class X and Y, and P(X|Y) is the probability that a genome coding for a protein from clade/class Y also codes for a protein from clade/class X.
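
Reading P(X|Y) as N_XY divided by N_Y, which follows from the definitions above, the co-occurrence values can be computed with a few lines of Python. This is a minimal sketch rather than the authors' implementation; the function name and the input layout (a mapping from assembly accession to the set of clades/classes found in that assembly) are assumptions made for illustration.

```python
from typing import Dict, Set


def cooccurrence(assemblies: Dict[str, Set[str]]) -> Dict[str, Dict[str, float]]:
    """Return result[X][Y] = P(X|Y) = N_XY / N_Y over all observed clades/classes."""
    groups = sorted(set().union(*assemblies.values()))
    result: Dict[str, Dict[str, float]] = {}
    for x in groups:
        result[x] = {}
        for y in groups:
            # N_Y: assemblies containing Y; N_XY: assemblies containing both X and Y.
            n_y = sum(1 for found in assemblies.values() if y in found)
            n_xy = sum(1 for found in assemblies.values() if x in found and y in found)
            result[x][y] = n_xy / n_y  # n_y >= 1 because y was observed at least once
    return result
```

Note that the measure is asymmetric: if clade E occurs in three assemblies and clade A in two, with one assembly shared, then P(A|E) is 1/3 while P(E|A) is 1/2.
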

Tree generation

In total, five trees were generated. Trees carrying phylogenetic information were generated with IQ-TREE v2.0.7 (Minh et al., 2020) using the LG+I+R10 model and ultrafast bootstrapping with 1000 replicates for a dataset of 5508 CODH sequences, a dataset of 1351 CODH sequences, and a dataset of 1476 HCP sequences (see above for details on their generation). For the 5508-sequence CODH dataset, an outgroup was introduced to root the tree (MBE6442607.1). Sequences were aligned within their dataset using mafft’s FFT-NS-2 v7.526. The alignment was again trimmed using trimAl v1.4.rev22, and the tree was built using IQ-TREE v2.0.7 with the above parameters. For tree inspection and plotting, ggtree v3.14.0 (Yu et al., 2017) was used. The two other trees are taxonomic trees, based either only on taxid, using a custom Python script and ete3 v3.1.3 (Huerta-Cepas et al., 2016), or on WoL: Reference Phylogeny for Microbes (Zhu, 2023; Zhu et al., 2019). Clades of CODH were defined as described previously (Inoue et al., 2022, 2018; Techtmann et al., 2012). Our tree showed a similar topology, with bootstrap support of >75% for all clades.

Data availability

All code for the bioinformatic analysis presented in this paper is openly accessible on GitHub and under the following DOIs: https://doi.org/10.5281/zenodo.16736767 https://doi.org/10.5281/zenodo.16736754 https://doi.org/10.5281/zenodo.16736722 https://doi.org/10.5281/zenodo.16744414

Acknowledgements

The Novo Nordisk Foundation (Grant reference number NNF21OC0066716) is gratefully acknowledged for funding.

Additional information

Funding

Novo Nordisk Fonden (NNF) (NNF21OC0066716): Henrik Land, Maximilian Böhm

References

Abramson J, Adler J, Dunger J, Evans R, Green T, Pritzel A, Ronneberger O, Willmore L, Ballard AJ, Bambrick J, Bodenstein SW, Evans DA, Hung C-C, O’Neill M, Reiman D, Tunyasuvunakool K, Wu Z, Žemgulytė A, Arvaniti E, Beattie C, Bertolli O, Bridgland A, Cherepanov A, Congreve M, Cowen-Rivers AI, Cowie A, Figurnov M, Fuchs FB, Gladman H, Jain R, Khan YA, Low CMR, Perlin K, Potapenko A, Savy P, Singh S, Stecula A, Thillaisundaram A, Tong C, Yakneen S, Zhong ED, Zielinski M, Žídek A, Bapst V, Kohli P, Jaderberg M, Hassabis D, Jumper JM (2024) Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630:493–500. https://doi.org/10.1038/s41586-024-07487-w
Adam PS, Borrel G, Gribaldo S (2018) Evolutionary history of carbon monoxide dehydrogenase/acetyl-CoA synthase, one of the oldest enzymatic complexes. Proc Natl Acad Sci U S A 115:E5836–E5837. https://doi.org/10.1073/pnas.1716667115
Babbitt PC, Hasson MS, Wedekind JE, Palmer DRJ, Barrett WC, Reed GH, Rayment I, Ringe D, Kenyon GL, Gerlt JA (1996) The enolase superfamily: A general strategy for enzyme-catalyzed abstraction of the α-protons of carboxylic acids. Biochemistry 35:16489–16501. https://doi.org/10.1021/bi9616413
Bao T, Cheng C, Xin X, Wang J, Wang M, Yang S-T (2019) Deciphering mixotrophic Clostridium formicoaceticum metabolism and energy conservation: Genomic analysis and experimental studies. Genomics 111:1687–1694. https://doi.org/10.1016/j.ygeno.2018.11.020
Basak Y, Lorent C, Jeoung J-H, Zebger I, Dobbek H (2025) Metalloradical-driven enzymatic CO2 reduction by a dynamic Ni–Fe cluster. Nature Catalysis :1–10. https://doi.org/10.1038/s41929-025-01388-5
Benvenuti M, Meneghello M, Guendon C, Jacq-Bailly A, Jeoung JH, Dobbek H, Leger C, Fourmond V, Dementin S (2020) The two CO-dehydrogenases of Thermococcus sp. AM4. Biochim Biophys Acta Bioenerg 1861:148188. https://doi.org/10.1016/j.bbabio.2020.148188
Biester A, Grahame DA, Drennan CL (2024) Capturing a methanogenic carbon monoxide dehydrogenase/acetyl-CoA synthase complex via cryogenic electron microscopy. Proceedings of the National Academy of Sciences 121:e2410995121. https://doi.org/10.1073/pnas.2410995121
Böhm M (2025a) protein-to-genome. https://doi.org/10.5281/zenodo.16736767
Böhm M (2025b) protein-per-organism. https://doi.org/10.5281/zenodo.16736754
Böhm M (2025c) protein-neighbours. https://doi.org/10.5281/zenodo.16736722
Böhm M (2025d) filter-gaps. https://doi.org/10.5281/zenodo.16744414
Can M, Armstrong FA, Ragsdale SW (2014) Structure, function, and mechanism of the nickel metalloenzymes, CO dehydrogenase, and acetyl-CoA synthase. Chem Rev 114:4149–74. https://doi.org/10.1021/cr400461p
Cantalapiedra CP, Hernández-Plaza A, Letunic I, Bork P, Huerta-Cepas J (2021) eggNOG-mapper v2: Functional Annotation, Orthology Assignments, and Domain Prediction at the Metagenomic Scale. Molecular Biology and Evolution 38:5825–5829. https://doi.org/10.1093/molbev/msab293
Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T (2009) trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25:1972–1973. https://doi.org/10.1093/bioinformatics/btp348
Chamberlain S, Arendsee Z, Stirling T (2025) taxizedb: Tools for Working with “Taxonomic” Databases. https://doi.org/10.5281/zenodo.1158055
Chamberlain S, Szocs E (2013) taxize - taxonomic search and retrieval in R. 2.
Chamberlain S, Szoecs E, Foster Z, Arendsee Z, Boettiger C, Ram K, Bartomeus I, Baumgartner J, O’Donnell J, Oksanen J, Tzovaras BG, Marchand P, Tran V, Salmon M, Li G (2020) taxize: Taxonomic information from around the web (manual). https://github.com/ropensci/taxize
Collins MD, Lawson PA, Willems A, Cordoba JJ, Fernandez-Garayzabal J, Garcia P, Cai J, Hippe H, Farrow JAE (1994) The Phylogeny of the Genus Clostridium: Proposal of Five New Genera and Eleven New Species Combinations. International Journal of Systematic and Evolutionary Microbiology 44:812–826. https://doi.org/10.1099/00207713-44-4-812
Diekert GB, Thauer RK (1978) Carbon Monoxide Oxidation by Clostridium thermoaceticum and Clostridium formicoaceticum. Journal of Bacteriology 136:597–606. https://doi.org/10.1128/jb.136.2.597-606.1978
Domnik L, Merrouch M, Goetzl S, Jeoung JH, Leger C, Dementin S, Fourmond V, Dobbek H (2017) CODH-IV: A High-Efficiency CO-Scavenging CO Dehydrogenase with Resistance to O2. Angew Chem Int Ed Engl 56:15466–15469. https://doi.org/10.1002/anie.201709261
Campitelli E, van den Brand T (2025) ggnewscale: Multiple Fill and Color Scales in ggplot2. https://doi.org/10.5281/ZENODO.2543762
Fox JD, He Y, Shelver D, Roberts GP, Ludden PW (1996) Characterization of the region encoding the CO-induced hydrogenase of Rhodospirillum rubrum. Journal of Bacteriology 178:6200–6208. https://doi.org/10.1128/jb.178.21.6200-6208.1996
Frolov EN, Elcheninov AG, Gololobova AV, Toshchakov SV, Novikov AA, Lebedinsky AV, Kublanov IV (2023) Obligate autotrophy at the thermodynamic limit of life in a new acetogenic bacterium. Frontiers in Microbiology 14. https://doi.org/10.3389/fmicb.2023.1185739
Fujishiro T, Takaoka K (2023) Class III hybrid cluster protein homodimeric architecture shows evolutionary relationship with Ni, Fe-carbon monoxide dehydrogenases. Nature Communications 14:5609. https://doi.org/10.1038/s41467-023-41289-4
Gong W, Hao B, Wei Z, Ferguson DJ, Tallant T, Krzycki JA, Chan MK (2008) Structure of the α2ε2 Ni-dependent CO dehydrogenase component of the Methanosarcina barkeri acetyl-CoA decarbonylase/synthase complex. Proceedings of the National Academy of Sciences 105:9558–9563. https://doi.org/10.1073/pnas.0800415105
Gtari M, Ventura S (2025) Proposal of Neomoorella gen. nov. as a replacement name for the illegitimate prokaryotic genus name Moorella Collins et al. 1994. International Journal of Systematic and Evolutionary Microbiology 75:006779. https://doi.org/10.1099/ijsem.0.006779
Hadj-Said J, Pandelia ME, Leger C, Fourmond V, Dementin S (2015) The Carbon Monoxide Dehydrogenase from Desulfovibrio vulgaris. Biochim Biophys Acta 1847:1574–83. https://doi.org/10.1016/j.bbabio.2015.08.002
Hagen WR (2022) Structure and function of the hybrid cluster protein. Coordination Chemistry Reviews 457. https://doi.org/10.1016/j.ccr.2021.214405
Hocking WP, Roalkvam I, Magnussen C, Stokke R, Steen IH (2015) Assessment of the Carbon Monoxide Metabolism of the Hyperthermophilic Sulfate-Reducing Archaeon Archaeoglobus fulgidus VC-16 by Comparative Transcriptome Analyses. Archaea 2015:235384. https://doi.org/10.1155/2015/235384
Huerta-Cepas J, Serra F, Bork P (2016) ETE 3: Reconstruction, Analysis, and Visualization of Phylogenomic Data. Molecular Biology and Evolution 33:1635–1638. https://doi.org/10.1093/molbev/msw046
Huerta-Cepas J, Szklarczyk D, Heller D, Hernández-Plaza A, Forslund SK, Cook H, Mende DR, Letunic I, Rattei T, Jensen LJ, von Mering C, Bork P (2019) eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Research 47:D309–D314. https://doi.org/10.1093/nar/gky1085
Inoue M, Nakamoto I, Omae K, Oguro T, Ogata H, Yoshida T, Sako Y (2018) Structural and Phylogenetic Diversity of Anaerobic Carbon-Monoxide Dehydrogenases. Front Microbiol 9:3353. https://doi.org/10.3389/fmicb.2018.03353
Inoue M, Omae K, Nakamoto I, Kamikawa R, Yoshida T, Sako Y (2022) Biome-specific distribution of Ni-containing carbon monoxide dehydrogenases. Extremophiles 26:9. https://doi.org/10.1007/s00792-022-01259-y
Inoue T, Takao K, Fukuyama Y, Yoshida T, Sako Y (2014) Over-expression of carbon monoxide dehydrogenase-I with an accessory protein co-expression: a key enzyme for carbon dioxide reduction. Biosci Biotechnol Biochem 78:582–7. https://doi.org/10.1080/09168451.2014.890027
Jain S, Katsyv A, Basen M, Muller V (2021) The monofunctional CO dehydrogenase CooS is essential for growth of Thermoanaerobacter kivui on carbon monoxide. Extremophiles 26:4. https://doi.org/10.1007/s00792-021-01251-y
Jeoung J-H, Dobbek H (2007) Carbon Dioxide Activation at the Ni,Fe-Cluster of Anaerobic Carbon Monoxide Dehydrogenase. Science 318:1461–1464. https://doi.org/10.1126/science.1148481
Jeoung JH, Fesseler J, Domnik L, Klemke F, Sinnreich M, Teutloff C, Dobbek H (2022) A Morphing [4Fe-3S-nO]-Cluster within a Carbon Monoxide Dehydrogenase Scaffold. Angew Chem Int Ed Engl 61:e202117000. https://doi.org/10.1002/anie.202117000
Johnson ZI, Chisholm SW (2004) Properties of overlapping genes are conserved across microbial genomes. Genome Research 14:2268–2272. https://doi.org/10.1101/gr.2433104
Katayama YA, Kamikawa R, Yoshida T (2024) Phylogenetic diversity of putative nickel-containing carbon monoxide dehydrogenase-encoding prokaryotes in the human gut microbiome. Microbial Genomics 10. https://doi.org/10.1099/mgen.0.001285
Katoh K, Standley DM (2013) MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Molecular Biology and Evolution 30:772–780. https://doi.org/10.1093/molbev/mst010
Kerby RL, Ludden PW, Roberts GP (1997) In vivo nickel insertion into the carbon monoxide dehydrogenase of Rhodospirillum rubrum: molecular and physiological characterization of cooCTJ. Journal of Bacteriology 179:2259–2266. https://doi.org/10.1128/jb.179.7.2259-2266.1997
Kim M-S, Choi AR, Lee SH, Jung H-C, Bae SS, Yang T-J, Jeon JH, Lim JK, Youn H, Kim TW, Lee HS, Kang SG (2015) A Novel CO-Responsive Transcriptional Regulator and Enhanced H2 Production by an Engineered Thermococcus onnurineus NA1 Strain. Applied and Environmental Microbiology 81:1708–1714. https://doi.org/10.1128/AEM.03019-14
Knox HL, Allen KN (2023) Expanding the viewpoint: Leveraging sequence information in enzymology. Current Opinion in Chemical Biology 72:102246. https://doi.org/10.1016/j.cbpa.2022.102246
Li W, Godzik A (2006) Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22:1658–1659. https://doi.org/10.1093/bioinformatics/btl158
Li W, Jaroszewski L, Godzik A (2002) Tolerating some redundancy significantly speeds up clustering of large protein databases. Bioinformatics 18:77–82. https://doi.org/10.1093/bioinformatics/18.1.77
Li W, Jaroszewski L, Godzik A (2001) Clustering of highly homologous sequences to reduce the size of large protein databases. Bioinformatics 17:282–283. https://doi.org/10.1093/bioinformatics/17.3.282
Liew F, Henstra AM, Winzer K, Köpke M, Simpson SD, Minton NP (2016) Insights into CO2 Fixation Pathway of Clostridium autoethanogenum by Targeted Mutagenesis. mBio 7. https://doi.org/10.1128/mbio.00427-16
Lindahl PA, Chang B (2001) The Evolution of Acetyl-CoA Synthase. Origins of Life and Evolution of the Biosphere 31:403–434.
Madden T (2013) The BLAST Sequence Analysis Tool. In: The NCBI Handbook [Internet]. National Center for Biotechnology Information (US). https://www.ncbi.nlm.nih.gov/books/NBK143764/
Matschiavelli N, Oelgeschläger E, Cocchiararo B, Finke J, Rother M (2012) Function and Regulation of Isoforms of Carbon Monoxide Dehydrogenase/Acetyl Coenzyme A Synthase in Methanosarcina acetivorans. Journal of Bacteriology 194:5377–5387. https://doi.org/10.1128/JB.00881-12
Matson EG, Gora KG, Leadbetter JR (2011) Anaerobic Carbon Monoxide Dehydrogenase Diversity in the Homoacetogenic Hindgut Microbial Communities of Lower Termites and the Wood Roach. PLOS One 6:e19316. https://doi.org/10.1371/journal.pone.0019316
Merrouch M, Benvenuti M, Lorenzi M, Leger C, Fourmond V, Dementin S (2018) Maturation of the [Ni-4Fe-4S] active site of carbon monoxide dehydrogenases. J Biol Inorg Chem 23:613–620. https://doi.org/10.1007/s00775-018-1541-0
Minh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, von Haeseler A, Lanfear R (2020) IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era. Molecular Biology and Evolution 37:1530–1534. https://doi.org/10.1093/molbev/msaa015
O’Leary NA, Cox E, Holmes JB, Anderson WR, Falk R, Hem V, Tsuchiya MTN, Schuler GD, Zhang X, Torcivia J, Ketter A, Breen L, Cothran J, Bajwa H, Tinne J, Meric PA, Hlavina W, Schneider VA (2024) Exploring and retrieving sequence and metadata for species across the tree of life with NCBI Datasets. Scientific Data 11:732. https://doi.org/10.1038/s41597-024-03571-y
Pedersen TL (2025) patchwork: The Composer of Plots. https://patchwork.data-imaginist.com
Price MN, Dehal PS, Arkin AP (2010) FastTree 2 – Approximately Maximum-Likelihood Trees for Large Alignments. PLOS One 5:e9490. https://doi.org/10.1371/journal.pone.0009490
R Core Team (2023) R: A Language and Environment for Statistical Computing. https://www.R-project.org/
Sayers E (2022) A General Introduction to the E-utilities. In: Entrez Programming Utilities Help [Internet]. National Center for Biotechnology Information (US). https://www.ncbi.nlm.nih.gov/books/NBK25497/
Shen W, Le S, Li Y, Hu F (2016) SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation. PLOS One 11:e0163962. https://doi.org/10.1371/journal.pone.0163962
Smith KS, Ingram-Smith C (2007) Methanosaeta, the forgotten methanogen? Trends in Microbiology 15:150–155. https://doi.org/10.1016/j.tim.2007.02.002
Soboh B, Linder D, Hedderich R (2002) Purification and catalytic properties of a CO-oxidizing:H2-evolving enzyme complex from Carboxydothermus hydrogenoformans. Eur J Biochem 269:5712–21. https://doi.org/10.1046/j.1432-1033.2002.03282.x
Techtmann SM, Lebedinsky AV, Colman AS, Sokolova TG, Woyke T, Goodwin L, Robb FT (2012) Evidence for horizontal gene transfer of anaerobic carbon monoxide dehydrogenases. Front Microbiol 3:132. https://doi.org/10.3389/fmicb.2012.00132
Terlesky KC, Ferry JG (1988) Ferredoxin requirement for electron transport from the carbon monoxide dehydrogenase complex to a membrane-bound hydrogenase in acetate-grown Methanosarcina thermophila. Journal of Biological Chemistry 263:4075–4079. https://doi.org/10.1016/S0021-9258(18)68892-1
Waite DW, Chuvochina M, Pelikan C, Parks DH, Yilmaz P, Wagner M, Loy A, Naganuma T, Nakai R, Whitman WB, Hahn MW, Kuever J, Hugenholtz P (2020) Proposal to reclassify the proteobacterial classes Deltaproteobacteria and Oligoflexia, and the phylum Thermodesulfobacteria into four phyla reflecting major functional capabilities. International Journal of Systematic and Evolutionary Microbiology 70:5972–6016. https://doi.org/10.1099/ijsem.0.004213
Wang L-G, Lam TT-Y, Xu S, Dai Z, Zhou L, Feng T, Guo P, Dunn CW, Jones BR, Bradley T, Zhu H, Guan Y, Jiang Y, Yu G (2020) Treeio: An R Package for Phylogenetic Tree Input and Output with Richly Annotated and Associated Data. Molecular Biology and Evolution 37:599–603. https://doi.org/10.1093/molbev/msz240
Weghoff MC, Müller V (2016) CO Metabolism in the Thermophilic Acetogen Thermoanaerobacter kivui. Applied and Environmental Microbiology 82:2312–2319. https://doi.org/10.1128/AEM.00122-16
Welte C, Deppenmeier U (2011) Membrane-Bound Electron Transport in Methanosaeta thermophila. Journal of Bacteriology 193:2868–2870. https://doi.org/10.1128/jb.00162-11
Wickham H, Averick M, Bryan J, Chang W, McGowan L, François R, Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen T, Miller E, Bache S, Müller K, Ooms J, Robinson D, Seidel D, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H (2019) Welcome to the Tidyverse. Journal of Open Source Software 4:1686. https://doi.org/10.21105/joss.01686
Wilkins D (2023) gggenes: Draw Gene Arrow Maps in “ggplot2.”
Wittenborn EC, Guendon C, Merrouch M, Benvenuti M, Fourmond V, Leger C, Drennan CL, Dementin S (2020) The Solvent-Exposed Fe-S D-Cluster Contributes to Oxygen-Resistance in Desulfovibrio vulgaris Ni-Fe Carbon Monoxide Dehydrogenase. ACS Catal 10:7328–7335. https://doi.org/10.1021/acscatal.0c00934
Wu M, Ren Q, Durkin AS, Daugherty SC, Brinkac LM, Dodson RJ, Madupu R, Sullivan SA, Kolonay JF, Haft DH, Nelson WC, Tallon LJ, Jones KM, Ulrich LE, Gonzalez JM, Zhulin IB, Robb FT, Eisen JA (2005) Life in hot carbon monoxide: the complete genome sequence of Carboxydothermus hydrogenoformans Z-2901. PLoS Genet 1:e65. https://doi.org/10.1371/journal.pgen.0010065
Xu S, Dai Z, Guo P, Fu X, Liu S, Zhou L, Tang W, Feng T, Chen M, Zhan L, Wu T, Hu E, Jiang Y, Bo X, Yu G (2021) ggtreeExtra: Compact Visualization of Richly Annotated Phylogenetic Data. Molecular Biology and Evolution 38:4039–4042. https://doi.org/10.1093/molbev/msab166
Yaniv M (2011) The 50th Anniversary of the Publication of the Operon Theory in the Journal of Molecular Biology: Past, Present and Future. Journal of Molecular Biology 409:1–6. https://doi.org/10.1016/j.jmb.2011.03.041
Yu G, Lam TT-Y, Zhu H, Guan Y (2018) Two Methods for Mapping and Visualizing Associated Data on Phylogeny Using Ggtree. Molecular Biology and Evolution 35:3041–3043. https://doi.org/10.1093/molbev/msy194
Yu G, Smith DK, Zhu H, Guan Y, Lam TT-Y (2017) ggtree: an r package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods in Ecology and Evolution 8:28–36. https://doi.org/10.1111/2041-210X.12628
Zhou A, Chen YI, Zane GM, He Z, Hemme CL, Joachimiak MP, Baumohl JK, He Q, Fields MW, Arkin AP, Wall JD, Hazen TC, Zhou J (2012) Functional Characterization of Crp/Fnr-Type Global Transcriptional Regulators in Desulfovibrio vulgaris Hildenborough. Applied and Environmental Microbiology 78:1168–1177. https://doi.org/10.1128/AEM.05666-11
Zhu Q (2023) WoL: Reference Phylogeny for Microbes. WoL. https://biocore.github.io/wol/
Zhu Q, Mai U, Pfeiffer W, Janssen S, Asnicar F, Sanders JG, Belda-Ferre P, Al-Ghalith GA, Kopylova E, McDonald D, Kosciolek T, Yin JB, Huang S, Salam N, Jiao J-Y, Wu Z, Xu ZZ, Cantrell K, Yang Y, Sayyari E, Rabiee M, Morton JT, Podell S, Knights D, Li W-J, Huttenhower C, Segata N, Smarr L, Mirarab S, Knight R (2019) Phylogenomics of 10,575 genomes reveals evolutionary proximity between domains Bacteria and Archaea. Nature Communications 10:5477. https://doi.org/10.1038/s41467-019-13443-4
Sayers EW, Beck J, Bolton EE, Brister JR, Chan J, Connor R, Feldgarden M, Fine AM, Funk K, Hoffman J, Kannan S, Kelly C, Klimke W, Kim S, Lathrop S, Marchler-Bauer A, Murphy TD, O'Sullivan C, Schmieder E, Skripchenko Y, Stine A, Thibaud-Nissen F, Wang J, Ye J, Zellers E, Schneider VA, Pruitt KD (2025) Database resources of the National Center for Biotechnology Information in 2025. GenBank.

Article and author information

Author information

Maximilian Böhm
Molecular Biomimetics, Department of Chemistry – Ångström Laboratory, Uppsala University, Uppsala SE-75120, Sweden
ORCID iD: 0000-0003-0205-8030
Henrik Land
Molecular Biomimetics, Department of Chemistry – Ångström Laboratory, Uppsala University, Uppsala SE-75120, Sweden
ORCID iD: 0000-0003-3073-5641
- For correspondence: henrik.land@kemi.uu.se

Author Notes

Competing interests: No competing interests declared

Version history

Sent for peer review: September 14, 2025
Preprint posted: September 19, 2025
Reviewed Preprint version 1: January 2, 2026
Reviewed Preprint version 2: March 2, 2026
Version of Record published: April 7, 2026

Cite all versions

You can cite all versions using the DOI https://doi.org/10.7554/eLife.108780. This DOI represents all versions, and will always resolve to the latest one.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
